Monday, December 27, 2021

Futility Analysis and Conditional Power

Adaptive designs have been used to make drug development programs more efficient. According to FDA's guidance Adaptive Designs for Clinical Trials of Drugs and Biologics: Guidance for Industry, an adaptive design is defined as a clinical trial design that allows for prospectively planned modifications to one or more aspects of the design based on accumulating data from subjects in the trial. These modifications, based on accumulating data from an ongoing study, are made through interim analyses. An interim analysis is any examination of data obtained from subjects in a trial while that trial is ongoing and is not restricted to cases in which there are formal between-group comparisons. The observed data used in an interim analysis can include one or more types, such as baseline data, safety outcome data, pharmacokinetic, pharmacodynamic, biomarker data, or efficacy outcome data.

When an adaptive design is proposed, the aspect(s) of the trial to be adapted need to be prespecified and agreed upon with regulatory agencies such as FDA. Among the types of adaptations listed in the guidance (below), the most common is the group sequential design.

    • Group sequential design
    • Adaptations to the sample size
    • Adaptations to the patient population (e.g., adaptive enrichment)
    • Adaptations to treatment arm selection
    • Adaptations to patient allocation 
    • Adaptations to endpoint selection
    • Adaptations to multiple design features

Group sequential design is probably the most commonly used adaptive design, and it was in use well before the adaptive design concept emerged. Group sequential design was once categorized as a 'well-understood' adaptive design. Ironically, many studies with a group sequential design may not be called 'adaptive designs', and the term 'group sequential design' may not appear in the study protocol at all.

According to the FDA guidance Adaptive Designs for Clinical Trials of Drugs and Biologics (Guidance for Industry):

"Group sequential designs may include rules for stopping the trial when there is sufficient evidence of efficacy to support regulatory decision-making or when there is evidence that the trial is unlikely to demonstrate efficacy, which is often called stopping for futility."

"There are a number of additional considerations for ensuring the appropriate design, conduct, and analysis of a group sequential trial. First, for group sequential methods to be valid, it is important to adhere to the prospective analytic plan and terminate the trial for efficacy only if the stopping criteria are met. Second, guidelines for stopping the trial early for futility should be implemented appropriately. Trial designs often employ nonbinding futility rules, in that the futility stopping criteria are guidelines that may or may not be followed, depending on the totality of the available interim results. The addition of such nonbinding futility guidelines to a fixed sample trial, or to a trial with appropriate group sequential stopping rules for efficacy, does not increase the Type I error probability and is often appropriate. Alternatively, a group sequential design may include binding futility rules, in that the trial should always stop if the futility criteria are met. Binding futility rules can provide some advantages in efficacy analyses (e.g., a relaxed threshold for a determination of efficacy), but the Type I error probability is controlled only if the stopping rules are followed. Therefore, if a trial continues despite meeting prespecified binding futility rules, the Agency will likely consider that trial to have failed to provide evidence of efficacy, regardless of the outcome at the final analysis. Note also that some DMCs might prefer the flexibility of nonbinding futility guidelines."

With a group sequential design, interim analyses are performed during the study to evaluate early evidence of efficacy or early evidence of futility. To stop the study for efficacy, the most common approach is so-called 'repeated significance testing'. To stop the study for futility, the most common approach is to calculate the conditional power.

Group sequential design:
  • The interim analysis for efficacy: to see if the new treatment is overwhelmingly better than control; if so, stop the trial for efficacy
    • Repeated significance testing
      • Pocock
      • O'Brien-Fleming
      • Alpha-spending by Lan and DeMets
  • The interim analysis for futility (futility analysis): to see if the new treatment is unlikely to be superior to control; if so, stop the trial for futility
    • Repeated significance testing
    • Stochastic curtailment approach, with three families of stochastic curtailment tests
      • Conditional power tests (frequentist approach)
      • Predictive power tests (mixed Bayesian-frequentist approach)
      • Predictive probability tests (Bayesian approach)
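The Lan-DeMets alpha-spending approach listed above can be sketched with the two spending functions commonly used to mimic Pocock and O'Brien-Fleming boundaries. This is a minimal sketch assuming a one-sided overall α of 0.025; the function names are hypothetical, and it returns only the cumulative α spent at information fraction t. Turning the spent α into actual Z boundaries requires recursive numerical integration, which dedicated group sequential software handles and which is not shown here.

```python
from math import e, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def obrien_fleming_spending(t: float, alpha: float = 0.025) -> float:
    """Lan-DeMets O'Brien-Fleming-type spending function (one-sided alpha).

    Cumulative alpha spent by information fraction t:
        alpha(t) = 2 * (1 - Phi(z_{1 - alpha/2} / sqrt(t)))
    """
    if not 0 < t <= 1:
        raise ValueError("information fraction t must be in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _STD_NORMAL.cdf(z / sqrt(t)))


def pocock_spending(t: float, alpha: float = 0.025) -> float:
    """Lan-DeMets Pocock-type spending function.

    Cumulative alpha spent by information fraction t:
        alpha(t) = alpha * ln(1 + (e - 1) * t)
    """
    if not 0 < t <= 1:
        raise ValueError("information fraction t must be in (0, 1]")
    return alpha * log(1 + (e - 1) * t)


if __name__ == "__main__":
    # Hypothetical schedule: analyses at 25%, 50%, 75%, and 100% information.
    for t in (0.25, 0.5, 0.75, 1.0):
        print(f"t={t:.2f}  OBF-type={obrien_fleming_spending(t):.5f}  "
              f"Pocock-type={pocock_spending(t):.5f}")
```

Both functions spend the full α at t = 1. The O'Brien-Fleming-type function spends very little α early, so an early efficacy stop requires overwhelming evidence, while the Pocock-type function spends α more evenly across analyses.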

The most common futility analysis relies on the conditional power (CP): the probability that the study will demonstrate statistical significance at the end of the study (i.e., at the final analysis, to claim superiority), conditional on the data observed in the study thus far and an assumption about the trend of the data to be observed in the remainder of the study.
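Under a normal approximation, this definition has a closed form. The sketch below uses the B-value (Brownian motion) formulation of Lan and Wittes, assuming a one-sided test with final-analysis critical value c, an interim Z statistic Z_t observed at information fraction t, and a drift θ equal to the expected final Z statistic under whatever effect is assumed for the remaining data. It is one common formulation, not necessarily the one used in any specific SAP.

```latex
\begin{aligned}
B(t) &= Z_t\sqrt{t}, \qquad B(1) = Z_1, \\
B(1) - B(t) \mid B(t) &\sim N\!\big(\theta\,(1-t),\; 1-t\big), \\
\mathrm{CP}(\theta) = P\big(Z_1 \ge c \mid Z_t\big)
  &= 1 - \Phi\!\left(\frac{c - Z_t\sqrt{t} - \theta\,(1-t)}{\sqrt{1-t}}\right).
\end{aligned}
```

For a fixed-sample design with one-sided level α, c = z_{1-α}; with a group sequential design, c is the critical value of the final analysis.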

According to the paper by Lachin "A review of methods for futility stopping based on conditional power":
“Conditional power (CP) is the probability that the final study result will be statistically significant, given the data observed thus far and a specific assumption about the pattern of the data to be observed in the remainder of the study, such as assuming the original design effect, or the effect estimated from the current data, or under the null hypothesis.”

In the conditional power calculation, the data in the remainder of the study can be assumed to follow one of the trends listed below, and assuming that the remaining data follow the currently observed trend is probably the most reasonable choice. Assuming that the remaining data follow the alternative hypothesis can overestimate the overall treatment effect (especially if the alternative hypothesis was based on aggressive, over-optimistic assumptions), inflating the conditional power. On the other hand, assuming that the remaining data follow the null hypothesis can underestimate the overall treatment effect, deflating the conditional power.

  • Observed data - the effect estimated from the current data so far
  • The alternative hypothesis - assuming the original design effect
  • The null hypothesis - assuming no effect in the remainder of the study 
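To make the three assumptions concrete, the following Python sketch computes conditional power with the B-value formula under a normal approximation. It assumes a one-sided test with α = 0.025 and a fixed-sample design powered at 90%, so the final critical value is z_{1-α}, the design-alternative drift is z_{1-α} + z_{1-β}, the current-trend drift is Z_t/√t, and the null drift is 0. The function names and the interim inputs in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def conditional_power(z_interim: float, t: float, drift: float, crit: float) -> float:
    """CP = P(final Z >= crit | interim Z), B-value formulation.

    z_interim: observed Z statistic at the interim analysis
    t:         information fraction at the interim (0 < t < 1)
    drift:     assumed expected value of the final Z statistic (theta)
    crit:      one-sided critical value at the final analysis
    """
    if not 0 < t < 1:
        raise ValueError("t must be strictly between 0 and 1")
    b_t = z_interim * sqrt(t)      # B-value observed at the interim
    mean_rest = drift * (1 - t)    # expected increment of B over the remaining information
    return 1 - N01.cdf((crit - b_t - mean_rest) / sqrt(1 - t))


def cp_three_assumptions(z_interim: float, t: float,
                         alpha: float = 0.025, power: float = 0.90) -> dict:
    """CP under the three assumptions about the remaining data (fixed-sample design)."""
    crit = N01.inv_cdf(1 - alpha)
    design_drift = crit + N01.inv_cdf(power)  # z_{1-alpha} + z_{1-beta}
    return {
        "current trend": conditional_power(z_interim, t, z_interim / sqrt(t), crit),
        "design alternative": conditional_power(z_interim, t, design_drift, crit),
        "null hypothesis": conditional_power(z_interim, t, 0.0, crit),
    }


if __name__ == "__main__":
    # Hypothetical interim: Z = 1.0 at 50% of the planned information.
    for name, cp in cp_three_assumptions(1.0, 0.5).items():
        print(f"{name:>18}: CP = {cp:.3f}")
```

With these hypothetical inputs, the three assumptions give roughly 0.04 (null hypothesis), 0.22 (current trend), and 0.70 (design alternative), which shows how strongly the futility assessment depends on the assumption chosen.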

To summarize, a futility analysis is an interim analysis that assesses whether the accumulating data indicate that the trial is unlikely to achieve its objectives. Futility analysis usually relies on the conditional power (CP), defined as the probability that the final study result will be statistically significant, given the data observed up to the interim data cut and a specific assumption about the pattern of the data in the remainder of the study, such as the original design effect (alternative hypothesis) or the effect estimated from the current data. A common futility threshold is CP < 20%, meaning that, given the data observed at the interim analysis, the probability that the final result will be statistically significant is less than 20%. If the CP is below 20% at the interim analysis, the Data Monitoring Committee may recommend that the sponsor stop the trial for futility.

In the book chapter "Conditional Power in Clinical Trial Monitoring" by Ming T. Tan, the pros and cons of conditional power are discussed:

To put things in perspective, the conditional power approach attempts to assess whether evidence for efficacy or the lack of it based on the interim data is consistent with that at the planned end of the trial by projecting forward or using conditional likelihood given the eventuality. Thus it substantially alleviates the major inconsistency in all other group sequential tests where different sequential procedures applied to the same data yield different answers. ...

The advantage of the conditional power approach for trial monitoring is its flexibility. It can be used for unplanned analyses and even for analyses whose timing depends on previous data. For example, it allows inferences from overrunning or underrunning (namely, more data come in after the sequential boundary is crossed, or the trial is stopped before the stopping boundary is reached). Conditional power can be used to aid the decision for early termination of a clinical trial, to complement the use of other methods, or when other methods are not applicable.

The caveat is that the conditional power can be calculated under different assumptions about the remaining data. Depending on whether the remaining data are assumed to follow the observed trend, the alternative hypothesis (original design effect), or something else, the conditional power can differ substantially, leading to different conclusions in the futility assessment.

Some examples: 

In the SAP for "Randomized, Open-Label Study of Abiraterone Acetate (JNJ-212082) plus Prednisone with or without Exemestane in Postmenopausal Women with ER+ Metastatic Breast Cancer Progressing after Letrozole or Anastrozole Therapy", conditional power is calculated under two assumptions: that the remaining data follow the original hazard ratio (alternative hypothesis), and that the remaining data follow the hazard ratio observed at the interim.
3.1.2 Conditional Power

Conditional power is the probability that the study will demonstrate statistical significance at the end of the study (i.e. final analysis to claim superiority on PFS), conditioning on the data observed in the study thus far, and an assumption about the trend of the data to be observed in the remainder of the study. Two assumptions about the trend of the data were presented below: The futility boundary corresponds to a conditional power of approximately 39% if the original hazard ratio assumption is true, while only 4% conditional power will be achieved if the observed hazard ratio at interim is true for the remainder of the study. The efficacy boundary corresponds to a conditional power of approximately 90% if the original hazard ratio assumption is true, and 92% conditional power will be achieved if the observed hazard ratio at interim is true for the remainder of the study. The conditional power of stopping boundaries was computed using method of Lan (2009).
In Gilead's trial "A Multicenter, Adaptive, Randomized Blinded Controlled Trial of the Safety and Efficacy of Investigational Therapeutics for the Treatment of COVID-19 in Hospitalized Adults", repeated significance testing (an alpha-spending function) was used to evaluate a potential stop for overwhelming efficacy, and the stochastic curtailment approach (conditional power) was used to evaluate a potential stop for futility.


In a trial by Incyte, "GRAVITAS-301: A Randomized, Double-Blind, Placebo-Controlled Phase 3 Study of Itacitinib or Placebo in Combination With Corticosteroids for the Treatment of First-Line Acute Graft-Versus-Host Disease", interim data are monitored for a potential stop for efficacy or futility, and a conditional power of 20% is used as the threshold for declaring futility:


Further reading: 

Saturday, December 11, 2021

Interventional Study (Clinical Trial), Non-interventional Study (Observational Study), and Registry Study

 FDA recently issued two separate guidance documents for industry: 

While these two guidance documents focus on real-world data (RWD) and real-world evidence (RWE), they also provide definitions that distinguish interventional studies, non-interventional studies, and registry studies.

Some of these terms are confusing and often used loosely. For example, we use "clinical study" and "clinical trial" interchangeably, and we use "registry" and "non-interventional study" interchangeably. Based on the FDA guidance documents, however, these terms describe different types of studies.

The term clinical study means research that evaluates human health outcomes associated with taking a drug of interest. Clinical studies include interventional (clinical trial) designs and non-interventional (observational) designs. 

Interventional Study (also referred to as a Clinical Trial)
The term interventional study (also referred to as a clinical trial) refers to a study in which participants, either healthy volunteers or volunteers with the disease being studied, are assigned to one or more interventions, according to a study protocol, to evaluate the effects of those interventions on subsequent health-related biomedical or behavioral outcomes. One example of an interventional study is a traditional randomized controlled trial, in which some participants are randomly assigned to receive a drug of interest (test article), whereas others receive an active comparator drug or placebo. Clinical trials with pragmatic elements (e.g., broad eligibility criteria, recruitment of participants in usual care settings) and single-arm trials are other types of interventional study designs.

Non-interventional study (also referred to as an observational study)

A non-interventional study (also referred to as an observational study) is a type of study in which patients receive the marketed drug of interest during routine medical practice and are not assigned to an intervention according to a protocol. Examples of non-interventional study designs include (1) observational cohort studies, in which patients are identified as belonging to a study group according to the drug or drugs received or not received during routine medical practice, and subsequent biomedical or health outcomes are identified, and (2) case-control studies, in which patients are identified as belonging to a study group based on having or not having a health-related biomedical or behavioral outcome, and antecedent treatments received are identified.

Registry study

A registry is defined as an organized system that collects clinical and other data in a standardized format for a population defined by a particular disease, condition, or exposure. Establishing registries involves enrolling a predefined population and collecting prespecified health-related data for each patient in that population (patient-level data). Data about this population can be entered directly into the registry (e.g., clinician-reported outcomes) and can also include additional data linked from other sources that characterize registry participants. Such external data sources can include data from medical claims, from pharmacy and/or laboratory databases, and from EHRs, blood banks, and/or medical device outputs. Trained staff should follow standard operating procedures to aggregate data for a registry and carry out data curation.

Registries range in complexity regarding the extent and detail of the data captured and how the data are curated. For example, registries used for quality assurance purposes related to the delivery of care for a particular health care institution or health care system tend to collect limited data related to the provision of care. Registries designed to address specific research questions tend to systematically collect longitudinal data in a defined population, on factors characterizing patients’ clinical status, treatments received, and subsequent clinical events. The data collected in a given registry and the procedures for data collection are relevant when considering how registry data can be used. 

Registries have the potential to support medical product development, and registry data can ultimately be used, when appropriate, to inform the design and support the conduct of either interventional studies (clinical trials) or non-interventional (observational) studies. Examples of such uses include, but are not limited to: 

  • Characterizing the natural history of a disease
  • Providing information that can help determine sample size, selection criteria, and study endpoints when planning an interventional study 
  • Selecting suitable study participants, based on factors such as demographic characteristics, disease duration or severity, and past history or response to prior therapy, to include in an interventional study (e.g., randomized trial) that will assign a drug to assess that drug’s safety or effectiveness
  • Identifying biomarkers or clinical characteristics that are associated with important clinical outcomes of relevance to the planning of interventional and non-interventional studies
  • Supporting, in appropriate clinical circumstances, inferences about safety and effectiveness in the context of:
    • A non-interventional study evaluating a drug received during routine medical practice and captured by the registry
    • An externally controlled trial including registry data as an external control arm

An existing registry can be used to collect data for purposes other than those originally intended, and reusing a registry’s infrastructure to support multiple interventional and non-interventional studies can generate efficiencies. Before designing and initiating an interventional or non-interventional study using registry data for regulatory decisions, sponsors should consult with the appropriate FDA review division regarding the appropriateness of using a specific registry as a real-world data source. 

Registries can generally be categorized as

(1) disease registries that use the state of a particular disease or condition as the inclusion criterion,

(2) health services registries where the patient is exposed to a specific health care service, or

(3) product registries where the patient is exposed to a specific health care product. 

The guidance documents also provided definitions for other types of studies: 

Natural history study

A natural history study is a non-interventional (observational) study intended to track the course of the disease for purposes such as identifying demographic, genetic, environmental, and other (e.g., treatment) variables that correlate with disease development and outcomes. Natural history studies are likely to include patients receiving the current standard of care and/or emergent care, which may alter some manifestations of the disease. Disease registries are common platforms to acquire the data for natural history studies.

Externally controlled trial

An externally controlled trial, as one type of clinical trial, compares outcomes in a group of participants receiving the test treatment with outcomes in a group external to the trial, rather than to an internal control group from the same trial population assigned to a different treatment. The external control arm can be a group, treated or untreated, from an earlier time (historical control) or a group, treated or untreated, during the same time period (concurrent control) but in another setting.

Further Reading: 


Wednesday, December 01, 2021

Clinical Trial Design: Double-Blind Fixed Duration Trial with Long-term Double-Blind Variable Treatment Duration

The randomized, double-blind, parallel-group design is the most common type of design for clinical trials (especially confirmatory clinical trials). This type of design can be further classified into trials with a fixed treatment duration and trials with variable treatment durations. In a fixed-duration trial, all patients are treated with the study drug for a fixed duration (for example, 16, 24, or 52 weeks), and the primary efficacy endpoint is estimated at the end of that fixed duration (for example, change from baseline in xxx measure at week 16, week 24, or week 52). In a variable-duration trial, such as an event-driven study, patients are treated with the study drug for variable durations: early-enrolled patients may receive the study drug for much longer than later-enrolled patients, and patients stay in the study and continue the study drug, as long as no protocol-defined event occurs, until the required number of events has accrued and the entire study is closed. The event can be a clinical worsening event, MACE, exacerbation, hospitalization, disease progression (for progression-free survival), death, etc.

In the informed consent form, patients who participate in the clinical trial are informed how long they may be treated with the experimental drug or placebo. An ethical issue arises if patients are randomly assigned to the placebo group and treated with placebo for a prolonged period of time.

Lately, we have seen several clinical trials with a hybrid approach: a double-blind fixed-duration period followed by a double-blind variable-duration period. The primary efficacy endpoint was measured at the end of the fixed-duration period (i.e., week 52 for the INBUILD and ISABELA trials and week 24 for the STELLAR trial). The double-blind variable-duration period was added to collect information for secondary and exploratory endpoints that need longer exposure. Its length depends on the enrollment speed and on when each patient enters the study: early-enrolled patients stay in the study much longer than later-enrolled patients, and the slower the enrollment, the longer the double-blind variable-duration period lasts.
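The dependence of the variable-duration period on enrollment can be sketched with simple arithmetic. This minimal Python sketch assumes, as a simplification, that the variable period ends for every patient when the last randomized patient completes the fixed period, with an optional cap on the variable period (as in the STELLAR description). The function name and the enrollment times in the usage lines are hypothetical, and real protocols may add follow-up visits or transitions to open-label extensions.

```python
def double_blind_durations(enroll_weeks, fixed_weeks=52, cap_weeks=None):
    """Return (fixed, variable) weeks of double-blind treatment per patient.

    enroll_weeks: randomization time of each patient, in weeks from the first randomization
    fixed_weeks:  length of the fixed double-blind period
    cap_weeks:    optional maximum length of the variable period (None = no cap)

    Simplifying assumption: the variable period ends for everyone when the last
    randomized patient completes the fixed period.
    """
    last_start = max(enroll_weeks)
    durations = []
    for start in enroll_weeks:
        # Time between this patient finishing the fixed period and the
        # last patient finishing it equals the gap in randomization times.
        variable = last_start - start
        if cap_weeks is not None:
            variable = min(variable, cap_weeks)
        durations.append((fixed_weeks, variable))
    return durations


if __name__ == "__main__":
    # Hypothetical enrollment spread over 60 weeks, 52-week fixed period.
    print(double_blind_durations([0, 20, 40, 60]))
    # Hypothetical 24-week fixed period with the variable period capped at 72 weeks.
    print(double_blind_durations([0, 50, 100], fixed_weeks=24, cap_weeks=72))
```

Stretching the same enrollment over more weeks increases the gap between the first and last randomization, and therefore the variable-period exposure of early-enrolled patients, unless a cap applies.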

INBUILD study: Nintedanib in Progressive Fibrosing Interstitial Lung Diseases 

For each patient, the trial consisted of two parts: Part A, which was conducted during the first 52 weeks, and Part B, which was a variable treatment period beyond week 52 during which patients continued to receive either nintedanib or placebo until all the patients had completed Part A. 


The primary assessment of benefit-risk of nintedanib in patients with PF-ILD will be based on efficacy and safety data over 52 weeks.

The primary analysis of this study will therefore be performed once the last randomized patient reaches the Week 52 Visit (Visit 9 at the end of Part A). At that time, a database lock will occur and all the data will be unblinded. Efficacy and safety analyses will be performed on the data from Part A of the trial to assess the benefit-risk of nintedanib over 52 weeks. In addition, data collected in Part B of the trial (after 52 weeks) and available at the time of data cut-off for the primary analysis will be reported together with data from Part A (i.e. over the whole trial).

Once the benefit-risk assessment of nintedanib over 52 weeks is confirmed to be positive, all patients receiving trial medication in Part B will be offered open-label treatment with nintedanib in a separate study.

Trial 1199.247 i.e. Part B will continue until all patients have been switched to open-label nintedanib or completed the Follow-up Visit. A final database lock will then occur and Part B data collected between the data cut off for the primary analysis and the final database lock will be reported together with data from Part A i.e. over the whole trial.

ISABELA Studies: GLPG1690, a novel autotaxin inhibitor, in idiopathic pulmonary fibrosis

See the paper: Rationale, design and objectives of two phase III, randomised, placebo controlled studies of GLPG1690, a novel autotaxin inhibitor, in idiopathic pulmonary fibrosis (ISABELA 1 and 2)

In each study, approximately 750 subjects will be randomized 1:1:1 to receive oral GLPG1690 600 mg, GLPG1690 200 mg or matching placebo, once daily, in addition to local SOC. SOC is defined as either pirfenidone or nintedanib, or neither pirfenidone nor nintedanib (for any reason). Treatment will continue for at least 52 weeks (subjects will continue to receive randomized treatment until the last patient reaches 52 weeks in the study). A follow-up visit will be conducted 4 weeks after the end-of-study visit (figure 1 below).


STELLAR Study: Sotatercept in Pulmonary Arterial Hypertension

According to Acceleron's investor and analyst call at the ATS 2021 International Conference, the STELLAR study was designed as follows:


The double-blind fixed-duration period is 24 weeks, and the primary efficacy endpoint (6MWD) is measured at week 24. The double-blind variable-duration period is capped at 72 weeks (i.e., the maximum duration of that period is 72 weeks). Patients can be in the long-term double-blind treatment period for anywhere from 0 weeks (the last enrolled patient) to 72 weeks (early-enrolled patients).