Monday, February 16, 2026

The Statistical Magic Trick: How Trials Share Results While Staying Blind

 

In the world of clinical research, "breaking the blind" is typically a cardinal sin. Yet, high-stakes Phase 3 trials like ORIGIN 3 (atacicept), PROTECT (sparsentan), and ATTRIBUTE-CM (acoramidis) have all successfully navigated the complex path of publishing interim results in the New England Journal of Medicine while keeping their long-term studies scientifically intact.

How do they pull off this statistical magic trick? It comes down to a rigorous architectural separation of data and people, including the statistical analysis team. Here is a look at the techniques used to maintain the "blind" during interim disclosures.

1. The "Firewall" Strategy: Independent Reporting Teams

The most critical technique used across all three studies is the creation of a "Firewall."

While a trial is ongoing, the people running the study (Sponsor clinical teams, investigators at hospitals, and the patients) must remain blinded. To perform an interim analysis, the Sponsor appoints an Independent Reporting Team (IRT) or an Independent Statistical Center.

  • How it works: This group has no contact with the study sites. They receive the raw "unblinded" data, perform the calculations for the primary endpoint (like the 36-week proteinuria reduction in PROTECT or ORIGIN 3), and prepare the manuscript for publication.
  • The Result: The people actually treating the patients remain in the dark, ensuring their medical decisions aren't influenced by knowing who is on the "winning" drug.

2. Safeguarding Against "Functional Unblinding"

In some trials, the drug’s effect is so obvious it could accidentally reveal the treatment.

  • In ORIGIN 3: Atacicept significantly lowers serum IgA and IgG levels. If a doctor saw these lab results, they would immediately know the patient was on the active drug. To prevent this, these specific lab values are suppressed. The results are sent to the Independent Reporting Team but are hidden from the investigators and the Sponsor’s site monitors.
  • In PROTECT: This study compared two active drugs (sparsentan vs. irbesartan). To ensure the difference in pill appearance didn't tip anyone off, they used a Double-Dummy design. Every patient took two sets of pills—one active and one placebo—so the physical routine remained identical for everyone.

3. Aggregate vs. Individual Disclosure

A common misconception is that "publishing the results" means everyone knows who got what. In reality, the NEJM publications for these trials disclose only aggregate data (group averages), not individual patient-level data.

  • In ATTRIBUTE-CM: When the Part A results (12-month 6-minute walk distance) were disclosed, the public learned how the group performed. However, the individual treatment assignments for each patient remained locked in the secure database.
  • The Benefit: Even if an investigator reads the NEJM article and sees that acoramidis is effective, they still do not know if the specific patient sitting in their office is receiving acoramidis or the placebo.

4. Prespecified Alpha Spending and "The Gatekeeper"

To maintain the statistical integrity of the final results (like the 104-week kidney function in ORIGIN 3 or the 30-month clinical outcomes in ATTRIBUTE-CM), the Statistical Analysis Plan (SAP) dictates exactly how much "statistical credit" (alpha) is spent during the interim look (a short sketch follows the bullet below).

  • The Independent Data Monitoring Committee (iDMC) acts as the gatekeeper. They review the unblinded data behind closed doors and only allow the trial to proceed if the interim disclosure doesn't compromise the "power" of the final analysis.
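As a concrete illustration, here is a minimal sketch of a Lan-DeMets O'Brien-Fleming-type spending function, a common choice for exactly this kind of interim look. The one-sided alpha of 0.025 and the information fractions are illustrative assumptions, not values taken from any of the three trials above.

    # Minimal sketch of a Lan-DeMets O'Brien-Fleming-type alpha-spending
    # function: alpha*(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t))) for a
    # one-sided overall level alpha, where t is the information fraction.
    from scipy.stats import norm

    def obf_spending(alpha, t):
        z = norm.ppf(1 - alpha / 2)
        return 2 * (1 - norm.cdf(z / t ** 0.5))

    alpha = 0.025                       # illustrative one-sided overall alpha
    for t in (0.25, 0.50, 0.75, 1.00):  # illustrative information fractions
        print(f"t = {t:.2f}: cumulative alpha spent = {obf_spending(alpha, t):.5f}")
    # Early looks spend almost none of the alpha, preserving the power of the
    # final analysis, which is exactly the property the iDMC relies on.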

Why go through all this trouble?

The goal is Accelerated Approval. By using these techniques, sponsors can show the FDA (and the medical community) that a drug works on a "surrogate marker" (like proteinuria) at an interim stage. This allows life-saving drugs to reach patients years earlier, while the "blinded" portion of the trial continues to gather the long-term data needed for full, traditional approval.

By combining physical dummies, suppressed lab data, and strict "firewalls" between statistical teams, researchers prove that you can indeed share the news of a trial's success without ruining the science that supports it.

Some Extra Words on the ATTRIBUTE-CM Study

ATTRIBUTE-CM was a Phase 3, double-blind trial in which 632 patients with transthyretin amyloid cardiomyopathy were randomly assigned in a 2:1 ratio to receive acoramidis hydrochloride at a dose of 800 mg twice daily or matching placebo for 30 months. The study contained two parts: Part A, with a primary endpoint of change from baseline to Month 12 in distance walked during the 6-minute walk test (6MWT), and Part B, with a primary endpoint of a hierarchical combination of all-cause mortality and cardiovascular (CV)-related hospitalization over a 30-month period.
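For readers curious how a "hierarchical combination" endpoint is evaluated, below is a minimal sketch of a Finkelstein-Schoenfeld-style pairwise comparison (closely related to the win ratio), using invented data and assuming complete follow-up; the actual analysis also handles censoring, stratification, and the trial's exact comparison rules.

    # Minimal sketch of a Finkelstein-Schoenfeld-style hierarchical pairwise
    # comparison (related to the win ratio). Data are invented; complete
    # follow-up is assumed, so there is no censoring to handle.
    from itertools import product

    # Each patient: (died, time_to_death_in_months, n_cv_hospitalizations)
    active  = [(False, 30, 0), (True, 24, 1), (False, 30, 2)]
    placebo = [(True, 12, 3), (False, 30, 2), (True, 20, 1)]

    def compare(a, p):
        """+1 if the active patient 'wins' the pair, -1 if loses, 0 if tied."""
        a_died, a_time, a_hosp = a
        p_died, p_time, p_hosp = p
        if a_died != p_died:               # tier 1: all-cause mortality
            return 1 if p_died else -1
        if a_died and a_time != p_time:    # both died: longer survival wins
            return 1 if a_time > p_time else -1
        if a_hosp != p_hosp:               # tier 2: CV-related hospitalizations
            return 1 if a_hosp < p_hosp else -1
        return 0                           # tied on both tiers

    scores = [compare(a, p) for a, p in product(active, placebo)]
    wins, losses = scores.count(1), scores.count(-1)
    print(f"wins = {wins}, losses = {losses}, win ratio = {wins / losses:.2f}")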

There were two readouts for the study, and the Part A readout was based on an interim analysis overseen by the independent DMC.

In December 2021, BridgeBio Pharma experienced a major setback when its Phase 3 ATTRibute-CM trial for acoramidis (a treatment for transthyretin amyloid cardiomyopathy - ATTR-CM) failed to meet its primary endpoint of improving the 6-minute walk distance (6MWD) at Month 12 (Part A primary efficacy endpoint). In the initial 12-month data, patients taking acoramidis did not show a statistically significant improvement in their 6MWD compared to those on a placebo.

Despite the failure of the 6MWD endpoint at 12 months, the study continued because the independent data monitoring committee noted encouraging trends in other areas. By July 2023, BridgeBio reported positive top-line results from the full study (Month 30), where acoramidis demonstrated a highly statistically significant improvement in a hierarchical analysis that included mortality, hospitalization, and 6MWD (Part B primary efficacy endpoint). 

Following the successful long-term data (Part B), which showed a 25% reduction in all-cause mortality and a 50% reduction in cardiovascular hospitalization frequency, the drug (marketed as Attruby) was approved by the FDA, with 3,751 prescriptions filled as of August 2025.

The study included an embedded Part A readout that required unblinding for the interim analysis. The sponsor specified safeguards, along the lines of the techniques described above, for maintaining the blind for the overall study while the Part A results were analyzed and disclosed.


Sunday, January 18, 2026

Maximal Tolerated Dose (MTD) to Recommended Phase 2 Dose (RP2D) - a shift in early oncology trial designs

As the field of oncology moves from systemic cytotoxic chemotherapies to targeted agents and immunotherapies, the paradigm for dose selection is undergoing a historic shift. For decades, the Maximum Tolerated Dose (MTD) was the "gold standard" for early-phase trials, but today’s clinical trialists and statisticians are increasingly prioritizing the Recommended Phase 2 Dose (RP2D) as a more robust and patient-centric metric.

This evolution is spearheaded by the FDA’s Project Optimus, which emphasizes "dose optimization" rather than simply finding the highest dose a patient can survive.

From "More is Better" to "The Optimal Balance"

The traditional MTD-centric approach was built on the assumption that a drug's efficacy increases linearly with its toxicity—a rule that often held true for classical chemotherapy. However, for modern targeted therapies, the Optimal Biologic Dose (OBD)—the dose that achieves maximum target saturation—often occurs well below the MTD.
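A toy calculation makes the point: below, a hypothetical Emax dose-response for efficacy is paired with a toxicity score that keeps climbing, so efficacy saturates well before the highest dose. All doses and parameters are invented for illustration.

    # Toy illustration: efficacy plateaus (Emax model) while toxicity keeps
    # rising, so the Optimal Biologic Dose can sit well below the MTD.
    # All doses and parameters are hypothetical.
    doses = [25, 50, 100, 200, 400, 800]   # mg, illustrative
    emax, ed50 = 100.0, 50.0               # assumed Emax-model parameters
    for d in doses:
        efficacy = emax * d / (ed50 + d)   # saturates toward emax
        toxicity = 0.05 * d                # keeps rising linearly (assumed)
        print(f"{d:>3} mg: efficacy {efficacy:5.1f}, toxicity score {toxicity:4.1f}")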

Feature | Maximum Tolerated Dose (MTD) | Recommended Phase 2 Dose (RP2D)
Focus | Toxicity-driven; finding the safety ceiling. | Value-driven; finding the therapeutic "sweet spot".
Observation | Short-term (Cycle 1) Dose-Limiting Toxicities (DLTs). | Long-term tolerability, PK/PD, and cumulative safety.
Assumption | Efficacy increases with dose ("More is Better"). | Efficacy may plateau while toxicity continues to rise.
Clinical Utility | A safety guardrail to prevent overdosing. | A strategic decision for registrational success.

Why RP2D is Preferred over MTD

For the modern statistician, the RP2D represents a "totality of evidence" that the MTD simply cannot provide:

  • Sustainability vs. Intensity: MTD focuses on what a patient can tolerate for 21 days. In contrast, RP2D considers the long-term tolerability necessary for chronic treatment, preventing premature discontinuations that can derail a trial's efficacy results.
  • The Sotorasib Lesson: FDA reviews, such as those for sotorasib, have highlighted the "dosing conundrum" where initial MTD-based doses led to excessive toxicity, eventually requiring post-market studies to find a more optimal, lower dose.
  • Target Saturation: Modern agents often reach a Pharmacokinetic (PK) plateau where increasing the dose adds no therapeutic benefit but significantly increases the rate of low-grade, chronic toxicities.
  • Dose-Response Nuance: As discussed in previous explorations of Determining the Dose in Clinical Trials, while the MTD is a safety limit identified through escalation, the RP2D is a comprehensive recommendation for further evaluation that aims to expose as few patients as possible to intolerable doses.

The Statistical Shift: Beyond 3+3

To find a true RP2D, statisticians are moving away from the rigid "3+3" rule-based designs to more flexible, model-informed approaches. These include:

  • Bayesian Optimal Interval (BOIN) designs that allow for a more nuanced exploration of the therapeutic window (a minimal sketch of the BOIN decision boundaries follows this list).
  • Randomized Dose-Ranging Studies: Encouraged by Project Optimus, these trials evaluate multiple doses early to compare safety and efficacy side-by-side.
  • Dose Expansion Cohorts: Used to refine the RP2D by gathering deeper data on preliminary efficacy and late-onset toxicities in specific patient subgroups.
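As a concrete example of the first bullet, the sketch below computes the standard BOIN escalation/de-escalation boundaries from Liu and Yuan (2015), using their default underdosing and overdosing limits; the 30% target DLT rate is an illustrative choice.

    # Minimal sketch of BOIN escalation/de-escalation boundaries.
    # phi is the target DLT rate; phi1 and phi2 are the default
    # underdosing (0.6*phi) and overdosing (1.4*phi) limits.
    import math

    def boin_boundaries(phi, phi1=None, phi2=None):
        phi1 = 0.6 * phi if phi1 is None else phi1
        phi2 = 1.4 * phi if phi2 is None else phi2
        lam_e = math.log((1 - phi1) / (1 - phi)) / \
                math.log(phi * (1 - phi1) / (phi1 * (1 - phi)))
        lam_d = math.log((1 - phi) / (1 - phi2)) / \
                math.log(phi2 * (1 - phi) / (phi * (1 - phi2)))
        return lam_e, lam_d

    lam_e, lam_d = boin_boundaries(0.30)   # 30% target DLT rate (illustrative)
    # At the current dose with x DLTs among n patients:
    # escalate if x/n <= lam_e, de-escalate if x/n >= lam_d, otherwise stay.
    print(f"escalate if DLT rate <= {lam_e:.3f}, de-escalate if >= {lam_d:.3f}")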

Conclusion

The shift from MTD to RP2D is more than a regulatory requirement; it is a clinical necessity. By identifying an optimized RP2D early, sponsors can avoid the "safety pitfalls" of MTD, improve patient quality of life, and build a stronger evidence chain for final approval. In the era of precision medicine, finding the right dose for the right patient is just as important as finding the right drug.


Sunday, January 04, 2026

Excessive number of clinical trial protocol amendments due to complex trial design

In a previous blog post, "Protocol amendment in clinical trials", I discussed the impact of protocol amendments on clinical trial performance and cost, and the reasons driving them. Protocol amendments are unavoidable, but we can think about study design and execution proactively to minimize their number. Sometimes, an excessive number of protocol amendments is driven by a Complex Innovative Trial Design, or CID for short (for example, adaptive design, basket/umbrella/platform trial design, expansion cohort design, Bayesian design...).

We noticed an extreme case of a clinical trial whose study protocol was amended 50 times. This refers to Study P001 (also known as KEYNOTE-001, NCT01295827) by Merck, a large, multi-cohort Phase 1 trial with numerous expansion cohorts that supported the initial accelerated approval of pembrolizumab. In the Statistical Review and Evaluation for BLA 125514 (FDA Center for Drug Evaluation and Research, August 2014), the FDA reviewer noted this high number of amendments while discussing the complexity of the trial design. KEYNOTE-001 was a massive "seamless" adaptive trial that evolved from a traditional Phase 1 dose-escalation study into a large study with multiple expansion cohorts (Part A, A1, A2, B, C, D, etc.) covering different tumor types (melanoma, NSCLC) and dosing regimens. The "50 times" figure likely includes all global and country-specific amendments up to the time of the BLA submission in February 2014.

The high number of protocol amendments for KEYNOTE-001 was a direct result of its innovative, "seamless" adaptive study design. Initially launched as a standard Phase 1 dose-escalation trial, the study evolved into a massive, multi-cohort trial that eventually enrolled 1,235 patients.

The 50 amendments occurred primarily due to the following reasons:
  • Addition of Expansion Cohorts: As early data showed promising results, the protocol was repeatedly amended to add new expansion cohorts for specific tumor types, most notably melanoma and non-small cell lung cancer (NSCLC).
  • Sample Size Increases: Striking patient responses led investigators to increase sample sizes within existing cohorts to better evaluate efficacy endpoints like overall response rate (ORR).
  • Adaptive Dosing Changes: The protocol was amended to change dosing regimens based on emerging safety and efficacy data. For example, Amendment 7 changed dosing from every two weeks (Q2W) to every three weeks (Q3W), and Amendment 10 shifted all participants to a fixed dose of 200 mg.
  • Biomarker Integration: Amendments were used to add co-primary endpoints related to PD-L1 expression after researchers observed its correlation with drug efficacy. This included the validation of a companion diagnostic assay.
  • Regulatory Speed: This "seamless" approach allowed Merck to skip traditional Phase 2 and 3 steps for certain indications, leading to the first-ever FDA approval of an anti-PD-1 therapy.
While the approach was efficient, the FDA's statistical reviewers noted that such frequent changes (averaging more than one amendment per month during the most active phases) created significant operational and analytical complexity for the trial. The main challenges in analyzing the KEYNOTE-001 trial data, as noted in the FDA's statistical and medical reviews, stemmed from the extreme complexity of a "seamless" design that was modified more than 50 times.

The primary analytical hurdles included:
  • Statistical Integrity and Type I Error Risk: The frequent addition of new cohorts and subgroups—often based on emerging data—increased the number of statistical comparisons. This raised concerns about "multiplicity," where the probability of finding a significant result by chance (Type I error) increases with every new hypothesis tested (see the sketch after this list).
  • Operational and Data Management Complexity: Maintaining data quality was difficult when different sites were often operating under different versions of the protocol simultaneously. The FDA noted that this led to potential adherence issues and made it difficult to isolate single cohorts for clean, standalone submissions.
  • Shifting Dosing and Regimens: The trial transitioned from weight-based dosing (2 mg/kg or 10 mg/kg) to a fixed dose (200 mg) and changed the frequency of administration (every 2 weeks to every 3 weeks) mid-study. This required complex "pooled analyses" to prove that efficacy and safety were consistent across these varying schedules.
  • Biomarker Selection and Validation: The protocol was amended to include a PD-L1 companion diagnostic while the study was already underway. This created a challenge in defining "training" vs. "validation" sets within the same trial population to establish the diagnostic's cutoff levels without introducing bias.
  • Lack of a Control Arm: Because the trial was essentially a massive Phase 1 expansion, it lacked a randomized control arm for several indications. This forced reviewers to rely on cross-trial comparisons and historical data, which are inherently more prone to bias than randomized controlled trials (RCTs).
  • Patient Selection Bias: The "adaptive" nature allowed for rapid accrual in specific successful cohorts, which, while beneficial for speed, made it difficult to ensure the final patient population was representative of the broader real-world population.
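To put a number on the multiplicity concern in the first bullet above, the short sketch below shows how the family-wise error rate grows with the number of comparisons, assuming independent tests each run at alpha = 0.05; correlated tests inflate less, so this is a worst-case illustration.

    # Family-wise error rate (FWER) for m independent tests at alpha = 0.05:
    # the chance of at least one false positive is 1 - (1 - alpha)^m.
    alpha = 0.05
    for m in (1, 5, 10, 20, 50):
        fwer = 1 - (1 - alpha) ** m
        print(f"{m:>2} comparisons: FWER = {fwer:.2f}")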
Despite the excessive number of protocol amendments, the results from KEYNOTE-001 supported the FDA approval of pembrolizumab for the treatment of multiple tumor types. The KEYNOTE-001 study was also the basis for the NEJM article "Seamless Oncology-Drug Development" by Prowell, Theoret, and Pazdur.

Thursday, January 01, 2026

One-way versus two-way tipping point analysis for robustness assessment of the missing data

Tipping point analysis (TPA) is a key sensitivity analysis mandated by regulatory agencies like the FDA to assess the robustness of clinical trial results to untestable assumptions about missing data. Specifically, it explores how much the assumption about the missing not at random (MNAR) mechanism would have to change to overturn the study's primary conclusion (e.g., a statistically significant treatment effect becoming non-significant). See a previous blog post, "Tipping point analysis - multiple imputation for stress test under missing not at random (MNAR)".

One-Way Tipping Point Analysis for Robustness Assessment

A one-way tipping point analysis is a sensitivity method used to evaluate the robustness of a study’s primary findings by systematically altering the missing data assumption for only one treatment group at a time—most commonly the active treatment arm. While the missing outcomes in the control group are typically handled under a standard Missing at Random (MAR) or Jump to Reference assumption, the missing outcomes in the active arm are subjected to a varying "shift parameter" (δ). This parameter progressively penalizes the imputed values (e.g., making them increasingly worse) until the statistically significant treatment effect disappears, or "tips." By identifying this specific value, researchers can present a clear, one-dimensional threshold to clinical experts and regulators, who then judge whether such a drastic deviation from the observed data is clinically plausible or an unlikely extreme.
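Here is a minimal sketch of such a one-way delta scan, assuming a continuous endpoint where a negative change is improvement, invented data, and a single MAR-style imputation for brevity; a real analysis would use multiple imputation with Rubin's rules and the trial's prespecified models.

    # Minimal sketch of a one-way tipping point scan (illustrative data).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    active  = rng.normal(-5.0, 8.0, 100)   # observed change from baseline
    control = rng.normal(-1.0, 8.0, 100)
    imputed = rng.normal(active.mean(), active.std(), 20)  # MAR-style draw for 20 dropouts

    for delta in np.arange(0.0, 20.1, 0.5):
        # Penalize the imputed active-arm outcomes by an increasing shift.
        full_active = np.concatenate([active, imputed + delta])
        t, p = stats.ttest_ind(full_active, control)
        if p >= 0.05:
            print(f"tipping point: delta = {delta:.1f} (p = {p:.3f})")
            break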

Two-Way Tipping Point Analysis for Robustness Assessment

A two-way TPA is an advanced method to assess robustness by independently varying the missing data assumptions for both treatment groups (e.g., the active treatment arm and the control/reference arm).

Missing Data Assumptions (MAR vs. MNAR)

The two-way TPA is used to assess the robustness of the primary analysis, which is typically conducted under the assumption of Missing at Random (MAR).

  • Missing at Random (MAR): Assumes that the probability of data being missing depends only on the observed data (e.g., a patient with a worse baseline condition is more likely to drop out, and we have observed the baseline data).

  • Missing Not at Random (MNAR): Assumes that the probability of data being missing depends on the unobserved missing outcome data itself (e.g., a patient drops out because their unobserved outcome has worsened more than what is predicted by their observed data).

Robustness Assessment

The two-way TPA evaluates robustness to plausible MNAR scenarios. This is done by imputing the missing outcomes (often starting with an MAR method like Multiple Imputation) and then applying a systematic, independent "shift parameter" (or δ) to the imputed values in each arm.

  • Process: The shift parameters (δActive and δControl) are varied systematically across a two-dimensional grid, typically in a direction that reduces the observed treatment effect (see the sketch after this list).

  • Tipping Point: The δActive and δControl values at which the primary conclusion (e.g., statistical significance) is "tipped" or overturned define the tipping point.

  • Robustness: The larger and/or more clinically implausible the combination of shift parameters required to overturn the conclusion, the more robust the original result is considered to be under different MNAR assumptions.
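Extending the one-way sketch from the previous section, the code below scans a two-dimensional grid of independent shifts and records the p-value in every cell; again, the data and shift ranges are invented for illustration.

    # Minimal sketch of a two-way tipping point grid (illustrative data).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    active, control = rng.normal(-5, 8, 100), rng.normal(-1, 8, 100)
    imp_a = rng.normal(active.mean(), active.std(), 20)    # active dropouts
    imp_c = rng.normal(control.mean(), control.std(), 20)  # control dropouts

    deltas = np.arange(0.0, 12.1, 2.0)
    grid = np.empty((len(deltas), len(deltas)))
    for i, d_a in enumerate(deltas):       # worsen imputed active outcomes
        for j, d_c in enumerate(deltas):   # improve imputed control outcomes
            a = np.concatenate([active, imp_a + d_a])
            c = np.concatenate([control, imp_c - d_c])
            grid[i, j] = stats.ttest_ind(a, c).pvalue

    # Cells with p >= 0.05 mark the (deltaActive, deltaControl) combinations
    # that overturn significance; their boundary is the tipping point frontier.
    print((grid >= 0.05).astype(int))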

Two-Way Tipping Point Result Tables

The results of a two-way TPA are typically presented as a grid or heat map table where:

  • One axis represents the shift parameter applied to the missing outcomes in the Active Treatment arm (δActive).

  • The other axis represents the shift parameter applied to the missing outcomes in the Control/Reference arm (δControl).

  • The cells of the table contain the resulting p-value or estimated treatment difference for that specific combination of assumptions.

The goal is to find the boundary of the grid where the result crosses the significance threshold (e.g., p >= 0.05 or the lower bound of the confidence interval crosses the null value).


Comparison: One-Way vs. Two-Way Tipping Point Analysis

The choice between one-way and two-way TPA is a trade-off between simplicity and comprehensiveness.

Feature | One-Way Tipping Point Analysis | Two-Way Tipping Point Analysis
Missingness Assumption | The shift parameter (δ) is applied to only one arm, usually the active treatment group, while the missing data in the control arm are imputed under a standard assumption (MAR, or a reference-based method such as Jump to Reference). | Independent shift parameters (δActive and δControl) are applied to both arms simultaneously.
Sensitivity Explored | Explores MNAR scenarios where dropouts in one arm have systematically worse/better outcomes than assumed by MAR, relative to the other arm's standard assumption. | Explores a two-dimensional space of MNAR scenarios, allowing dropouts in both arms to vary independently.
Complexity | Simpler to calculate and interpret (one dimension). | More computationally intensive and complex to interpret (two-dimensional grid).
Plausibility | Often viewed as less comprehensive, as it does not model the possibility of simultaneous, independent MNAR mechanisms in both arms. | Considered more comprehensive, as it allows for a wider range of clinically plausible and implausible MNAR scenarios.
Result Presentation | A line plot or simple table with a single "tipping point" value. | A grid/matrix table or heat map showing the boundary of non-significance.

In essence, the two-way TPA is generally preferred by regulatory agencies for its superior ability to assess robustness because it explores a more realistic and exhaustive range of asymmetric MNAR mechanisms.