Saturday, January 01, 2022

Futility Analysis and Conditional Power When Two Phase 3 Studies are Simultaneously Conducted

In late-phase clinical trials, an independent Data Monitoring Committee (DMC) is usually set up. If the clinical program includes multiple late-phase studies, the same DMC is usually responsible for the entire program. With a DMC in place, interim analyses can be performed for different purposes:
  • The interim analysis for safety
    • with a pre-specified stopping rule (for example, stop the trial if there is a significant imbalance in the number of Serious Adverse Events or in the number of deaths)
    • without pre-specified stopping rule (rely on DMC members to review the overall safety)
  • The interim analysis for efficacy: to see if the new treatment is overwhelmingly better than the control, in which case the trial is stopped for efficacy
  • The interim analysis for futility (futility analysis): to see if the new treatment is unlikely to be better than the control, or the study is unlikely to achieve its objective given the data at the interim, in which case the trial is stopped for futility
Studies with a built-in futility analysis but no interim analysis for overwhelming efficacy seem to be more common, mainly because of concerns about alpha-spending for efficacy. A futility analysis affects the beta-spending and the statistical power, but not the alpha-spending. For decision-making, regulatory agencies are usually more concerned about the alpha level (the risk of incorrectly approving a drug that does not work) and its inflation. Sponsors are more concerned about the statistical power, i.e., about incorrectly concluding that a drug does not work when it actually does.

Futility analysis usually requires calculating the Conditional Power (CP) that is defined as the probability that the final study result will be statistically significant, given the data observed thus far at the time of the interim data cut and a specific assumption about the pattern of the data to be observed in the remainder of the study, such as assuming the original design effect (alternative hypothesis) or the effect estimated from the interim data.  

If there is one single pivotal trial, the stopping rule and the CP are relatively straightforward. However, it is not uncommon that the sponsor needs to conduct two pivotal (phase 3) studies (two adequate and well-controlled (A&WC) trials in FDA's terms) to demonstrate substantial evidence of effectiveness, as outlined in the FDA guidance for industry "Demonstrating Substantial Evidence of Effectiveness for Human Drug and Biological Products Guidance for Industry".

For a clinical program with two independent A&WC trials (usually with identical design), the futility analysis and CP calculation are a little more complicated. Two independent A&WC trials may have an identical design but be executed differently (e.g., they may not start at the same time, may be conducted in different geographic regions or countries, and may have different enrollment speeds).

When futility analysis is performed for two A&WC trials, should the conditional powers be calculated for individual studies separately or should the conditional powers be calculated for both studies together (i.e. pooled data from both studies)? 

When there are two identical A&WC trials, the interim analysis for safety should be based on the pooled data from both studies because it will give a more definitive answer to the safety issues. The interim analysis for efficacy should be based on the individual study data because the decision about overwhelming efficacy should be made for each study, not for the integrated data from two studies. The interim analysis for futility is a little more complicated: whether to use the data from an individual study or the pooled data seems to depend on how close the observed results from the two A&WC trials are at the time of the interim analysis.

For futility analysis using a stochastic curtailment procedure, the CP can be calculated for each individual study assuming that the treatment effect in the remaining subjects of that study will follow the treatment effect estimated from that same study's data at the time of the interim data cut.

There is an alternative way to calculate the CP, i.e., to calculate the CP for each individual study, but use the observed treatment effect from the pooled data at the interim from both studies to project the trend and pattern for the remaining subjects. 

According to the paper by Lan and Wittes (1988), "The B-Value: A Tool for Monitoring Data", the B-value at information fraction t is Bt = Zt × sqrt(t), where Zt is the interim Z-statistic. The CP calculation involves decomposing the final B-value (B1, which equals the final Z-statistic) into the sum of two statistically independent increments:
  • Bt, the value of B accumulated up to time t, when the interim analysis is conducted; and
  • (B1 - Bt), the incremental value of B that accumulates from time t through the end of the study. The legitimacy of the decomposition follows from the independence of the distributions of the outcomes for successive study subjects
At the time t when the interim analysis is conducted, Bt is known because it is computed from the data observed up to time t. (B1 - Bt) is a random variable that has not yet been observed, so its distribution has to be assumed. The conditional power is derived by fixing Bt and calculating the probability that Bt + (B1 - Bt) will exceed the critical value Z(1-α/2).
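To make the decomposition concrete, here is a minimal Python sketch of the CP calculation. It assumes the standard B-value setup in which (B1 - Bt) is normally distributed with mean θ(1 - t) and variance (1 - t), where the drift θ is the expected final Z-statistic, so that CP = 1 - Φ((Z(1-α/2) - Bt - θ(1 - t)) / sqrt(1 - t)). The function name, its arguments and the interim numbers are hypothetical and only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def conditional_power(z_t, t, theta=None, alpha=0.05):
    """Conditional power from an interim Z-statistic (B-value formulation).

    z_t   : interim Z-statistic (positive values favor the new treatment)
    t     : information fraction at the interim, strictly between 0 and 1
    theta : assumed drift (expected final Z-statistic) for the remainder of
            the study; None means "current trend", estimated as B_t / t
    alpha : two-sided significance level of the final analysis
    """
    if not 0.0 < t < 1.0:
        raise ValueError("t must be strictly between 0 and 1")
    b_t = z_t * sqrt(t)          # observed part, B_t
    if theta is None:
        theta = b_t / t          # current-trend estimate of the drift
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Assumption: B_1 - B_t ~ Normal(theta * (1 - t), 1 - t), independent of B_t
    shortfall = z_crit - b_t - theta * (1 - t)
    return 1 - STD_NORMAL.cdf(shortfall / sqrt(1 - t))


# Hypothetical interim: Z = 1.0 at 50% of the information
cp_trend = conditional_power(z_t=1.0, t=0.5)            # remainder follows current trend
cp_null = conditional_power(z_t=1.0, t=0.5, theta=0.0)  # no effect in the remainder
print(f"current trend: {cp_trend:.3f}, null trend: {cp_null:.3f}")
```

Passing the design-stage drift as theta instead would give the CP under the original alternative hypothesis, the other common assumption mentioned earlier.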

To calculate the CPs when there are two identical A&WC studies, t, as a measure of the information fraction, will generally differ between the studies. At the time of the interim data cut, maybe 60% of the planned information has accrued in study #1 but only 50% in study #2. In the CP calculations, the Bt part is obtained from the individual study. The (B1 - Bt) part is estimated assuming that the remaining data follow the effect observed up to the interim. The question is whether that observed effect should be based on the data from the individual study or on the pooled data.

It turns out both approaches can be used: 
  • estimate the treatment difference for each individual study and calculate the CP assuming that the remaining data follow the trend and pattern observed in that individual study
  • estimate the treatment difference from both studies and calculate the CP assuming that the remaining data follow the trend and pattern observed in the pooled data of the two studies
For both of these approaches, the CPs will be calculated for each individual study (therefore one CP for each study). The difference between these two approaches is in the calculation of the (B1-Bt) part - based on the individual study itself or based on the pooled data from both studies. 
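The two approaches can be compared side by side in a short Python sketch. The interim Z-statistics and information fractions below are made up. The pooled drift is computed as the sum of the Bt values divided by the sum of the information fractions, which is what an inverse-variance pooled effect estimate reduces to under the assumption that both studies have the same design and the same maximum information; that simplification is mine, not something taken from any particular SAP.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cp_given_drift(b_t, t, theta, alpha=0.05):
    """P(B_1 > z_crit | B_t = b_t), assuming B_1 - B_t ~ N(theta*(1-t), 1-t)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 1 - STD_NORMAL.cdf((z_crit - b_t - theta * (1 - t)) / sqrt(1 - t))


# Hypothetical interim results: (interim Z-statistic, information fraction)
interim = {"study_1": (1.8, 0.6), "study_2": (-0.2, 0.5)}

# The B_t part always comes from the individual study
b = {name: z * sqrt(t) for name, (z, t) in interim.items()}

# Approach 1: drift estimated from each study's own interim data (B_t / t)
own_drift = {name: b[name] / interim[name][1] for name in interim}

# Approach 2: a single drift estimated from the pooled interim data
# (assumes identical designs with equal maximum information)
pooled_drift = sum(b.values()) / sum(t for _, t in interim.values())

cp_individual = {n: cp_given_drift(b[n], interim[n][1], own_drift[n]) for n in interim}
cp_pooled = {n: cp_given_drift(b[n], interim[n][1], pooled_drift) for n in interim}

for n in interim:
    print(f"{n}: individual CP = {cp_individual[n]:.3f}, pooled CP = {cp_pooled[n]:.3f}")
```

In this made-up example the pooled trend pulls the CP of the better-looking study down and the CP of the worse-looking study up, which is exactly where the two approaches can lead to different futility decisions.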

We can take a look at the famous and controversial case of Biogen's aducanumab program in Alzheimer's disease. The program consisted of two pivotal phase 3 studies, ENGAGE (study 301) and EMERGE (study 302). Both studies were identically designed and conducted simultaneously and globally. Each study had two active arms (low dose and high dose of aducanumab) versus placebo, and therefore two hypothesis tests (low dose vs. placebo and high dose vs. placebo). There was a total of four hypothesis tests (two for each study). The protocol and SAP specified the interim analysis for futility.

An interim analysis for futility of the primary endpoint was performed after approximately 50% of the subjects had had the opportunity to complete the Week 78 visit in both the EMERGE and ENGAGE studies, to allow early termination of the studies if it was evident that aducanumab was unlikely to demonstrate efficacy. The futility criteria were based on conditional power, which was the chance that the primary efficacy endpoint analysis would be statistically significant in favor of aducanumab at the planned final analysis, given the data at the interim analysis. The CP was calculated assuming that the future unobserved effect was equal to the maximum likelihood estimate of the effect observed in the interim data.

For each study, two CPs were calculated. The pre-specified CP calculation used the pooled interim data from both the EMERGE and ENGAGE studies for the (B1 - Bt) part and assumed that the treatment effect for the remainder of the study would follow the treatment effect observed at the interim analysis. At the interim analysis, the CPs were calculated to be 13% for low dose vs. placebo and 0% for high dose vs. placebo in the ENGAGE study, and 11% for low dose vs. placebo and 12% for high dose vs. placebo in the EMERGE study. Given that all four CPs were lower than the threshold of 20% (a criterion for futility), the DMC recommended stopping both studies for futility. Biogen followed the DMC recommendation and stopped both the EMERGE and ENGAGE studies for futility.
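As I read it, the stopping rule amounts to a simple check: futility is declared only if every one of the four CPs falls below the threshold. A minimal Python sketch of that reading follows; the function name is hypothetical and the actual SAP wording may differ.

```python
def recommend_futility_stop(cps, threshold=0.20):
    """Return True only if every conditional power is below the threshold."""
    return all(cp < threshold for cp in cps)


# The four pre-specified (pooled-trend) interim CPs quoted above
prespecified_cps = [0.13, 0.00, 0.11, 0.12]
print(recommend_futility_stop(prespecified_cps))  # prints True

# Hypothetical values: a single CP at or above the threshold blocks the stop
print(recommend_futility_stop([0.10, 0.25]))      # prints False
```

Under this rule the outcome is sensitive to how each individual CP is calculated, because one value above 20% is enough to change the recommendation.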

Only after the two terminated studies were wrapped up did the reanalyses of the final data indicate statistically significant treatment differences in one of the studies (the EMERGE study). With the help of the FDA, Biogen was able to submit the BLA and obtain approval for aducanumab for Alzheimer's disease. Leading up to the FDA approval, there was an advisory committee meeting to review the aducanumab data. In FDA's presentation, the conditional powers were retrospectively re-calculated. This time, the conditional powers were calculated for each individual study, assuming that the future unobserved effect would be similar to the interim data of that individual study (not the pooled interim data). FDA claimed that the CPs using this approach were more appropriate and would have put one of the four CPs above the threshold of 20% (CP=59% for high dose vs. placebo in the EMERGE study), so the studies would not have been recommended for stopping for futility.


In retrospect, CPs calculated for each study independently (not using the pooled interim data to project the trend and pattern for the remaining data) seem to have been the better choice for the Biogen aducanumab program with its two A&WC trials.

However, in a paper by Deng et al "Superiority of combining two independent trials in interim futility analysis", CP calculation using the observed treatment effects from the pooled interim data from two studies was considered a better approach. It concluded, "it is demonstrated that by leveraging data from the other study, the probability of making correct interim decision is increased if the treatment effects are similar between the two studies, and such benefit remains even if there is small to moderate between-study difference."

It is probably true that CP calculation using the pooled interim data to project the trend and pattern for the remaining data is a better approach if the two studies are conducted in the same way and the results at the time of the interim analysis are similar. However, the CP calculation and the statistical analysis plan for the interim analysis are usually pre-specified before seeing the unblinded data. When the plan is written, it is unknown whether or not the results (treatment effects) observed from the two identical studies will be similar. Even though two A&WC studies are designed the same, the operation and execution of the trials can still differ: the two studies may be conducted in different countries, enrollment speed may be different, and so on. As evidenced by Biogen's EMERGE and ENGAGE trials, two identically designed studies may have different results. Calculating the CP entirely independently for each study may therefore be more appropriate when two identical A&WC trials are conducted.
