On Biostatistics and Clinical Trials: Futility Analysis and Conditional Power

Adaptive designs have been used to make drug development programs more efficient. According to FDA's guidance Adaptive Designs for Clinical Trials of Drugs and Biologics Guidance for Industry, an adaptive design is defined as a clinical trial design that allows for prospectively planned modifications to one or more aspects of the design based on accumulating data from subjects in the trial. Modifications to the design based on accumulating data from an ongoing study are made through 'interim analysis'. An interim analysis is any examination of data obtained from subjects in a trial while that trial is ongoing and is not restricted to cases in which there are formal between-group comparisons. The observed data used in the interim analysis can include one or more types, such as baseline data, safety outcome data, pharmacokinetic, pharmacodynamic, or biomarker data, or efficacy outcome data.

When an adaptive design is proposed, the aspect(s) of the trial to be adapted need to be pre-specified and agreed upon with regulatory agencies such as FDA. Among the types of adaptations listed below, the most common is 'group sequential design'.

Group sequential design
Adaptations to the sample size
Adaptations to the patient population (e.g., adaptive enrichment)
Adaptations to treatment arm selection
Adaptations to patient allocation
Adaptations to endpoint selection
Adaptations to multiple design features

Group sequential design is probably the most commonly used adaptive design, and it was in use even before the adaptive design concept came out. Group sequential design was once categorized as a 'well-understood' adaptive design. Ironically, many studies with a group sequential design are not called 'adaptive', and the term 'group sequential design' may not appear in the study protocol at all.

According to FDA's Adaptive Designs for Clinical Trials of Drugs and Biologics Guidance for Industry:

"Group sequential designs may include rules for stopping the trial when there is sufficient evidence of efficacy to support regulatory decision-making or when there is evidence that the trial is unlikely to demonstrate efficacy, which is often called stopping for futility."
"There are a number of additional considerations for ensuring the appropriate design, conduct, and analysis of a group sequential trial. First, for group sequential methods to be valid, it is important to adhere to the prospective analytic plan and terminate the trial for efficacy only if the stopping criteria are met. Second, guidelines for stopping the trial early for futility should be implemented appropriately. Trial designs often employ nonbinding futility rules, in that the futility stopping criteria are guidelines that may or may not be followed, depending on the totality of the available interim results. The addition of such nonbinding futility guidelines to a fixed sample trial, or to a trial with appropriate group sequential stopping rules for efficacy, does not increase the Type I error probability and is often appropriate. Alternatively, a group sequential design may include binding futility rules, in that the trial should always stop if the futility criteria are met. Binding futility rules can provide some advantages in efficacy analyses (e.g., a relaxed threshold for a determination of efficacy), but the Type I error probability is controlled only if the stopping rules are followed. Therefore, if a trial continues despite meeting prespecified binding futility rules, the Agency will likely consider that trial to have failed to provide evidence of efficacy, regardless of the outcome at the final analysis. Note also that some DMCs might prefer the flexibility of nonbinding futility guidelines."

With a group sequential design, interim analyses are performed during the study to evaluate early evidence of efficacy or early evidence of futility. To stop the study for efficacy, the most common approach is so-called 'repeated significance testing'. To stop the study for futility, the most common approach is to calculate the conditional power.

Group sequential design:
The interim analysis for efficacy: To see if the new treatment is overwhelmingly better than control - then stop the trial for efficacy
Repeated significance testing
Pocock
O'Brien-Fleming
Alpha-spending by Lan and DeMets
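
To make the alpha-spending idea concrete, here is a minimal Python sketch of the two Lan-DeMets spending functions that approximate the O'Brien-Fleming and Pocock boundaries. The function names and the four equally spaced looks are illustrative assumptions, not taken from any particular protocol. The sketch only shows how much cumulative type I error may be spent by each information fraction; turning that into actual stopping boundaries needs recursive numerical integration, which is left to dedicated group sequential software.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def obf_type_spending(t, alpha=0.025):
    """Lan-DeMets O'Brien-Fleming-type spending function.

    t     : information fraction, 0 < t <= 1
    alpha : total type I error to be spent by the final analysis
    Returns the cumulative alpha allowed to be spent by time t.
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction must be in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(z / sqrt(t)))


def pocock_type_spending(t, alpha=0.025):
    """Lan-DeMets Pocock-type spending function (same arguments as above)."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction must be in (0, 1]")
    return alpha * log(1.0 + (exp(1.0) - 1.0) * t)


# Hypothetical schedule: four equally spaced looks.
for frac in (0.25, 0.50, 0.75, 1.00):
    print(f"t={frac:.2f}  OBF-type={obf_type_spending(frac):.5f}  "
          f"Pocock-type={pocock_type_spending(frac):.5f}")
```

Both functions spend exactly alpha at t = 1. The O'Brien-Fleming-type function spends very little early on, which makes early stopping for efficacy hard, while the Pocock-type function spends alpha more evenly across looks.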

The interim analysis for futility (futility analysis): To see if the new treatment is unlikely to be superior to the control - then stop the trial for futility
Repeated significance testing
Stochastic curtailment approach with three families of stochastic curtailment tests
Conditional power tests (frequentist approach)
Predictive power tests (mixed Bayesian-frequentist approach)
Predictive probability tests (Bayesian approach)
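
The difference between the first two stochastic curtailment families can be illustrated with a short Python sketch. It assumes the usual normal (Brownian motion) approximation to the test statistic, a final one-sided critical value of about 1.96 with no adjustment for alpha spent at interim looks, and, for predictive power, a flat (non-informative) prior on the treatment effect. The function names and the interim values are hypothetical. Conditional power plugs in a single assumed effect (here, the interim estimate), whereas predictive power averages conditional power over the posterior distribution of the effect. The fully Bayesian predictive probability is not covered here.

```python
from math import sqrt
from statistics import NormalDist

_PHI = NormalDist().cdf
Z_CRIT = 1.959964  # assumed one-sided 2.5% critical value at the final analysis


def cp_current_trend(z_interim, t, z_crit=Z_CRIT):
    """Conditional power if the effect estimated at interim holds for the
    remainder of the study (frequentist; one fixed value of the effect)."""
    return _PHI((z_interim / sqrt(t) - z_crit) / sqrt(1.0 - t))


def predictive_power_flat_prior(z_interim, t, z_crit=Z_CRIT):
    """Predictive power: conditional power averaged over the posterior of the
    effect, assuming a flat prior (mixed Bayesian-frequentist)."""
    return _PHI((z_interim - z_crit * sqrt(t)) / sqrt(1.0 - t))


# Hypothetical interim look: Z = 1.0 at half of the planned information.
z, t = 1.0, 0.5
print(f"CP (current trend):       {cp_current_trend(z, t):.3f}")
print(f"Predictive power (flat):  {predictive_power_flat_prior(z, t):.3f}")
```

Under these assumptions the predictive power is always pulled toward 50% relative to the current-trend conditional power, because it accounts for the uncertainty in the interim estimate instead of treating it as the true effect.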

The most common futility analysis requires the calculation of the conditional power (CP), which is the probability that the study will demonstrate statistical significance at the end of the study (i.e., at the final analysis to claim superiority), conditional on the data observed in the study thus far and on an assumption about the trend of the data to be observed in the remainder of the study.

According to the paper by Lachin "A review of methods for futility stopping based on conditional power":

“Conditional power (CP) is the probability that the final study result will be statistically significant, given the data observed thus far and a specific assumption about the pattern of the data to be observed in the remainder of the study, such as assuming the original design effect, or the effect estimated from the current data, or under the null hypothesis.”

In a conditional power calculation, the assumption about the trend of the data in the remainder of the study can be any of the three listed below. Assuming that the remaining data follow the observed data is probably the most reasonable choice. Assuming that the remaining data follow the alternative hypothesis can overestimate the overall treatment effect (especially if the alternative hypothesis was based on aggressive, over-optimistic assumptions), resulting in inflated conditional power. On the other hand, assuming that the remaining data follow the null hypothesis can underestimate the overall treatment effect, resulting in deflated conditional power.

Observed data - the effect estimated from the current data so far
The alternative hypothesis - assuming the original design effect
The null hypothesis - assuming no effect in the remainder of the study
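
The three assumptions listed above can be compared numerically. The following Python sketch uses the standard B-value (Brownian motion) formulation of conditional power. It assumes a normally distributed test statistic, a one-sided final test at a fixed critical value (no adjustment for alpha spent at interim looks), and a trial originally designed for 90% power. All function names and the interim inputs are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

_ND = NormalDist()


def conditional_power(z_interim, info_frac, drift, alpha_one_sided=0.025):
    """P(final Z exceeds the critical value | interim Z), given an assumed drift.

    z_interim : Z statistic observed at the interim analysis
    info_frac : information fraction t at the interim (0 < t < 1)
    drift     : expected final Z if the assumed effect held for the whole study;
                only the part drift * (1 - t) is applied to the remaining data
    """
    z_crit = _ND.inv_cdf(1.0 - alpha_one_sided)
    b_value = z_interim * sqrt(info_frac)  # B(t) = Z(t) * sqrt(t)
    return _ND.cdf((b_value + drift * (1.0 - info_frac) - z_crit)
                   / sqrt(1.0 - info_frac))


def cp_under_three_assumptions(z_interim, info_frac,
                               alpha_one_sided=0.025, design_power=0.90):
    """Conditional power under the observed trend, the design effect, and the null."""
    observed_drift = z_interim / sqrt(info_frac)
    design_drift = (_ND.inv_cdf(1.0 - alpha_one_sided)
                    + _ND.inv_cdf(design_power))
    drifts = {
        "observed trend": observed_drift,
        "design effect": design_drift,
        "null": 0.0,
    }
    return {name: conditional_power(z_interim, info_frac, d, alpha_one_sided)
            for name, d in drifts.items()}


# Hypothetical interim look: Z = 1.0 at 50% of the planned information.
for name, cp in cp_under_three_assumptions(1.0, 0.5).items():
    print(f"{name:15s} CP = {cp:.3f}")
```

With these illustrative inputs the conditional power is roughly 4% under the null, 22% under the observed trend, and 70% under the design effect, so the same interim data can look futile or promising depending on which assumption is used.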

To summarize, a futility analysis uses an interim analysis to determine whether the trial data indicate that the clinical trial is unlikely to achieve its objectives. Futility analysis usually requires the calculation of the conditional power (CP), defined as the probability that the final study result will be statistically significant, given the data observed up to the interim data cut and a specific assumption about the pattern of the data to be observed in the remainder of the study, such as the original design effect (alternative hypothesis) or the effect estimated from the current data. A commonly used futility threshold is a CP of less than 20%, meaning that the probability of a statistically significant final result is less than 20% given the data observed at the time of the interim analysis. If the CP is less than 20% at the interim analysis, the Data Monitoring Committee may recommend that the sponsor stop the trial for futility.

In a book chapter by Ming T. Tan, "Conditional Power in Clinical Trial Monitoring", the pros and cons of conditional power were discussed.

"To put things in perspective, the conditional power approach attempts to assess whether evidence for efficacy or the lack of it based on the interim data is consistent with that at the planned end of the trial by projecting forward or using conditional likelihood given the eventuality. Thus it substantially alleviates the major inconsistency in all other group sequential tests where different sequential procedures applied to the same data yield different answers. ..."
"The advantage of the conditional power approach for trial monitoring is its flexibility. It can be used for unplanned analysis and even analysis whose timing depends on previous data. For example, it allows inferences from overrunning or underrunning (namely, more data come in after the sequential boundary is crossed, or the trial is stopped before the stopping boundary is reached). Conditional power can be used to aid the decision for early termination of a clinical trial to complement the use of other methods or when other methods are not applicable."

The caveat is that the conditional power can be calculated with different assumptions about the remaining data. Depending on whether the remaining data are assumed to follow the observed data, the alternative hypothesis (original design effect), or something else, the conditional power can differ considerably, leading to different conclusions about futility.

Some examples:

In the SAP for "Randomized, Open-Label Study of Abiraterone Acetate (JNJ-212082) plus Prednisone with or without Exemestane in Postmenopausal Women with ER+ Metastatic Breast Cancer Progressing after Letrozole or Anastrozole Therapy", conditional power was calculated under two assumptions: that the remaining data follow the original hazard ratio (alternative hypothesis), and that the remaining data follow the hazard ratio observed at the interim.

3.1.2 Conditional Power

Conditional power is the probability that the study will demonstrate statistical significance at the end of the study (i.e. final analysis to claim superiority on PFS), conditioning on the data observed in the study thus far, and an assumption about the trend of the data to be observed in the remainder of the study. Two assumptions about the trend of the data were presented below: The futility boundary corresponds to a conditional power of approximately 39% if the original hazard ratio assumption is true, while only 4% conditional power will be achieved if the observed hazard ratio at interim is true for the remainder of the study. The efficacy boundary corresponds to a conditional power of approximately 90% if the original hazard ratio assumption is true, and 92% conditional power will be achieved if the observed hazard ratio at interim is true for the remainder of the study. The conditional power of stopping boundaries was computed using method of Lan (2009).
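
The SAP excerpt above does not give the inputs behind its figures, so the following Python sketch uses made-up event counts and hazard ratios purely to show how the two assumptions can be evaluated for a time-to-event endpoint. It relies on the common log-rank approximation for 1:1 randomization (variance of the log hazard ratio of about 4 divided by the number of events) and a one-sided final test at a fixed critical value. It is not necessarily the Lan (2009) method cited in the SAP, and the function name and all numbers are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

_ND = NormalDist()


def cp_logrank(events_interim, events_final, hr_observed, hr_future,
               alpha_one_sided=0.025):
    """Approximate conditional power for a log-rank test, 1:1 randomization.

    events_interim : number of events at the interim analysis
    events_final   : number of events planned for the final analysis
    hr_observed    : hazard ratio estimated at the interim (HR < 1 favors treatment)
    hr_future      : hazard ratio assumed for the remainder of the study
    """
    z_crit = _ND.inv_cdf(1.0 - alpha_one_sided)
    t = events_interim / events_final                 # information fraction
    z_interim = -log(hr_observed) * sqrt(events_interim) / 2.0
    drift = -log(hr_future) * sqrt(events_final) / 2.0
    b_value = z_interim * sqrt(t)
    return _ND.cdf((b_value + drift * (1.0 - t) - z_crit) / sqrt(1.0 - t))


# Hypothetical interim: 100 of 200 planned events, observed HR 0.95,
# original design HR 0.70. These are NOT the values from the SAP.
cp_design = cp_logrank(100, 200, hr_observed=0.95, hr_future=0.70)
cp_trend = cp_logrank(100, 200, hr_observed=0.95, hr_future=0.95)
print(f"CP assuming design HR:   {cp_design:.3f}")
print(f"CP assuming observed HR: {cp_trend:.3f}")
```

As in the SAP, a weak interim result gives a much lower conditional power when the observed hazard ratio is carried forward than when the original design hazard ratio is assumed for the remaining events.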

In Gilead's trial "A Multicenter, Adaptive, Randomized Blinded Controlled Trial of the Safety and Efficacy of Investigational Therapeutics for the Treatment of COVID-19 in Hospitalized Adults", a repeated significance testing procedure (using an alpha spending function) was used to evaluate potential stopping for overwhelming efficacy, and the stochastic curtailment approach (conditional power) was used to evaluate potential stopping for futility.