Thursday, January 11, 2018

Statistician's nightmare - mistakes in statistical analyses of clinical trials

A statistician’s job can be risky, too.

In recent announcements, sponsors have had to disclose errors in their statistical analyses. Errors like these can affect a company’s value, or even its fate. I hope the study team members who made these mistakes still have jobs at their companies. I did have a friend who lost their job because of an incorrectly reported p-value.

Here are two examples. In the first, the p-value was calculated incorrectly, announced, and later had to be corrected, which was very embarrassing for the statistician who made the mistake. In the second, the mistake was more on the programming and data management side. Had the initial results been positive, the sponsor might never have gone back to re-assess the outcomes, and the errors might never have been found.

Example #1:
Axovant Sciences (NASDAQ:AXON) today announced a correction to the data related to the Company’s investigational drug nelotanserin previously reported in its January 8, 2018 press release. In the results of the pilot Phase 2 Visual Hallucination study, the post-hoc subset analysis of patients with a baseline Scale for the Assessment of Positive Symptoms - Parkinson's Disease (SAPS-PD) score of greater than 8.0 was misreported. The previously reported data for this population (n=19) that nelotanserin treatment at 40 mg for two weeks followed by 80 mg for two weeks resulted in a 1.21 point improvement (p=0.011, unadjusted) were incorrect. While nelotanserin treatment at 40 mg for two weeks followed by 80 mg for two weeks did result in a 1.21 point improvement, the p-value was actually 0.531, unadjusted. Based on these updated results, the Company will continue to discuss a larger confirmatory nelotanserin study with the U.S. Food and Drug Administration (FDA) that is focused on patients with dementia with Lewy bodies (DLB) with motor function deficits. The Company may further evaluate nelotanserin for psychotic symptoms in DLB and Parkinson’s disease dementia (PDD) patients in future clinical studies.
Example #2:  
(note: PE: pulmonary exacerbation; PEBAC: pulmonary exacerbation blinded adjudication committee) 
Re-Assessment of Outcomes
Following database lock and unblinding of treatment assignment, the Applicant performed additional data assessments due to errors identified in the programming/data entry that impacted identification of PEs. This led to changes in the final numbers of PEs. Based on discussion the Applicant had with the PEBAC Chair, it was decided that 10 PEs initially adjudicated by the PEBAC were to be re-adjudicated by the PEBAC using complete and final subject-level information. This led to a re-adjudication by the PEBAC who were blinded to subject ID, site ID, and treatment. Result(s) of prior adjudication were not provided to the PEBAC.
 Efficacy results presented in Section 7.3 reflect the revised numbers. Further details regarding the reassessment by the PEBAC are discussed in Section 7.3.6.
7.3.6 Primary Endpoint Changes after Database Lock and Un-Blinding
Following database lock and treatment assignment un-blinding, the Applicant performed additional data assessments leading to changes in the final numbers of PEs. Specifically, per the Applicant, during a review of the ORBIT-3 and ORBIT-4 data occurring after database locking and data un-blinding (for persons involved in the data maintenance and analyses), ‘personnel identified errors in the programming done by Accenture Inc. (data analysis contract research organization (CRO)) and one data entry error that impacted identification of PEs.’ Because of the programming errors, the Applicant states that they chose to conduct a ‘comprehensive audit of all electronic Case Report Forms (eCRFs) entries for signs, symptoms or laboratory abnormalities as entered in the PE worksheets for all patients in ARD-3150-1201 and ARD-3150-1202’ (ORBIT-3 and ORBIT-4). From this audit, the Applicant notes ‘that no further programming errors’ were identified but instead 10 PE events (three from ORBIT-4 and seven from ORBIT-3) were found for which the PE assessment by the PEBAC was considered potentially incorrect. This was based on the premise that subject-level data provided to the PEBAC during the original PE adjudication were updated at the time of the database lock. Reasons provided are: 1) the clinical site provided update information to the eCRF after the initial PEBAC review (2 PEs), 2) incorrect information was supplied to the PEBAC during initial adjudication process (2 PEs), 3) inconsistency between visit dates and reported signs and symptoms (6 PEs). After discussion with the PEBAC Chair, it was decided that these 10 PEs initially deemed PEs by the PEBAC were to be re-assessed by the PEBAC using complete and final subject-level information. This led to a re-adjudication by the PEBAC during a closed session on January 25, 2017. This re-adjudication was coordinated by Synteract (Applicant’s CRO) who provided data to the PEBAC that were blinded to subject ID, site ID, and treatment. In addition, result(s) of prior adjudication were not provided. While the PEBAC was provided with subject profiles for other relevant study visits, the PEBAC focus was only on the selected visits for which data were updated or corrected.
 Because of the identified programming errors and PEBAC re-adjudication, there were two new first PEs added to the Cipro arm in ORBIT-3 and two new first PEs added to the placebo arm in ORBIT-4. Given these changes, the log-rank p-value in ORBIT-4 changed from 0.058 to 0.032 (when including sex and prior PEs strata). The p-value in ORBIT-3 changed from 0.826 to 0.974 remaining insignificant. These changes are summarized in Table 9. Note that there were no overall changes in the results of the secondary endpoints analyses from changes in PE status described above.
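To see why a handful of reclassified events can move a log-rank p-value across 0.05, here is a minimal stratified log-rank test in plain Python. It is a sketch, not the sponsor's or FDA's analysis: the function name `stratified_logrank`, the (time, event, arm, stratum) record layout, the single sex stratum, and all of the data are hypothetical. The quoted ORBIT analysis stratified by sex and prior PEs.

```python
import math
from collections import defaultdict


def stratified_logrank(records):
    """Stratified log-rank test (hypothetical helper, for illustration only).

    records: iterable of (time, event, arm, stratum) tuples, where event is 1
    for a first PE and 0 for censoring, and arm is 1 (active) or 0 (placebo).
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    by_stratum = defaultdict(list)
    for time, event, arm, stratum in records:
        by_stratum[stratum].append((time, event, arm))

    o_minus_e = 0.0  # observed minus expected first PEs in the active arm
    variance = 0.0
    for subjects in by_stratum.values():
        # Risk sets are formed within each stratum; the sums are pooled across strata.
        for t in sorted({tm for tm, ev, _ in subjects if ev}):
            n = sum(1 for tm, _, _ in subjects if tm >= t)
            n1 = sum(1 for tm, _, a in subjects if tm >= t and a == 1)
            d = sum(1 for tm, ev, _ in subjects if tm == t and ev)
            d1 = sum(1 for tm, ev, a in subjects if tm == t and ev and a == 1)
            o_minus_e += d1 - d * n1 / n
            if n > 1:
                # Hypergeometric variance, allowing for tied event times.
                variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("zero variance: the test is not informative")
    chi2 = o_minus_e ** 2 / variance
    # For a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical data: (days to first PE or censoring, event, arm, sex stratum).
before = [
    (30, 1, 0, "F"), (45, 1, 0, "F"), (60, 0, 0, "F"),
    (80, 1, 1, "F"), (90, 0, 1, "F"), (90, 0, 1, "F"),
    (20, 1, 0, "M"), (50, 0, 0, "M"), (90, 0, 0, "M"),
    (70, 1, 1, "M"), (90, 0, 1, "M"), (90, 0, 1, "M"),
]
# Same data after two censored placebo subjects are re-classified as first PEs.
after = [
    (t, 1, a, s) if (t, a, s) in {(60, 0, "F"), (50, 0, "M")} else (t, e, a, s)
    for t, e, a, s in before
]

print("before: chi2=%.3f p=%.3f" % stratified_logrank(before))
print("after:  chi2=%.3f p=%.3f" % stratified_logrank(after))
```

In this made-up data set, changing just two placebo subjects from censored to first PE moves the p-value from roughly 0.16 to roughly 0.02. A real submission would rely on a validated implementation, such as SAS PROC LIFETEST with a STRATA statement, rather than hand-rolled code.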

Mistakes in a statistical analysis are almost inevitable without adequate procedures to prevent them. The following procedures can minimize the chance of mistakes like those in the examples above.
  • Independent validation process (double programming): a second programmer independently reproduces the analysis from the same specifications, and the two sets of results are compared. The probability that two people working independently make the same mistake is very low.
  • Dry-run process: using the not-yet-cleaned (dirty) data, perform the statistical analysis with a dummy randomization schedule, i.e., with the real data but fake treatment assignments. The purpose is to do the programming work and check the data up front, so that issues and mistakes can be identified and corrected before database lock and unblinding.
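The two procedures above can be combined. In a dry run, the primary and validation programmers each compute the same summary from the real (possibly dirty) data merged with a dummy schedule, and the outputs are then compared automatically. The sketch below rests on several assumptions: a simple responder-rate endpoint, a permuted-block dummy schedule with block size 4 and a fixed seed, and hypothetical function names (`dummy_schedule`, `primary_rates`, `validation_rates`, `compare`). In practice the two programs would be written by different people, often in SAS with PROC COMPARE, rather than side by side in one file.

```python
import random
from collections import Counter


def dummy_schedule(subject_ids, arms=("A", "B"), block_size=4, seed=2018):
    """Hypothetical dummy randomization: real subject IDs, fake arm labels.

    Uses permuted blocks so the dummy arms are balanced like a real schedule.
    """
    if block_size % len(arms):
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed seed so the dry run is reproducible
    schedule = {}
    for start in range(0, len(subject_ids), block_size):
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        for sid, arm in zip(subject_ids[start:start + block_size], block):
            schedule[sid] = arm
    return schedule


def primary_rates(responded, schedule):
    """Primary programmer: responder rate per arm by explicit accumulation."""
    totals, hits = {}, {}
    for sid, resp in responded.items():
        arm = schedule[sid]  # a KeyError flags a subject missing from the schedule
        totals[arm] = totals.get(arm, 0) + 1
        hits[arm] = hits.get(arm, 0) + int(resp)
    return {arm: hits[arm] / totals[arm] for arm in totals}


def validation_rates(responded, schedule):
    """Validation programmer: same endpoint, written independently."""
    n = Counter(schedule[sid] for sid in responded)
    r = Counter(schedule[sid] for sid, resp in responded.items() if resp)
    return {arm: r[arm] / n[arm] for arm in n}


def compare(primary, validation, tol=1e-9):
    """Return a list of discrepancies; an empty list means the outputs match."""
    issues = []
    for key in sorted(set(primary) | set(validation)):
        if key not in primary or key not in validation:
            issues.append(f"{key}: present in only one output")
        elif abs(primary[key] - validation[key]) > tol:
            issues.append(f"{key}: {primary[key]} vs {validation[key]}")
    return issues


# Hypothetical dry-run data: subject ID -> responded (True/False).
subjects = [f"S{i:02d}" for i in range(1, 9)]
responded = {sid: i % 3 == 0 for i, sid in enumerate(subjects, start=1)}
schedule = dummy_schedule(subjects)

print(compare(primary_rates(responded, schedule),
              validation_rates(responded, schedule)))
```

When the two programs agree, `compare` returns an empty list. Any entry it returns is a discrepancy to resolve before the real randomization schedule is applied.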
