Sunday, January 18, 2026

Maximum Tolerated Dose (MTD) to Recommended Phase 2 Dose (RP2D) - a shift in early oncology trial designs

As the field of oncology moves from systemic cytotoxic chemotherapies to targeted agents and immunotherapies, the paradigm for dose selection is undergoing a historic shift. For decades, the Maximum Tolerated Dose (MTD) was the "gold standard" for early-phase trials, but today’s clinical trialists and statisticians are increasingly prioritizing the Recommended Phase 2 Dose (RP2D) as a more robust and patient-centric metric.

This evolution is spearheaded by the FDA’s Project Optimus, which emphasizes "dose optimization" rather than simply finding the highest dose a patient can survive.

From "More is Better" to "The Optimal Balance"

The traditional MTD-centric approach was built on the assumption that a drug's efficacy increases linearly with its toxicity—a rule that often held true for classical chemotherapy. However, for modern targeted therapies, the Optimal Biologic Dose (OBD)—the dose that achieves maximum target saturation—often occurs well below the MTD.

| Feature | Maximum Tolerated Dose (MTD) | Recommended Phase 2 Dose (RP2D) |
|---|---|---|
| Focus | Toxicity-driven; finding the safety ceiling. | Value-driven; finding the therapeutic "sweet spot". |
| Observation | Short-term (Cycle 1) Dose-Limiting Toxicities (DLTs). | Long-term tolerability, PK/PD, and cumulative safety. |
| Assumption | Efficacy increases with dose ("More is Better"). | Efficacy may plateau while toxicity continues to rise. |
| Clinical Utility | A safety guardrail to prevent overdosing. | A strategic decision for registrational success. |

Why RP2D is Preferred over MTD

For the modern statistician, the RP2D represents a "totality of evidence" that the MTD simply cannot provide:

  • Sustainability vs. Intensity: MTD focuses on what a patient can tolerate during a single cycle (e.g., 21 days). In contrast, RP2D considers the long-term tolerability necessary for chronic treatment, preventing premature discontinuations that can derail a trial's efficacy results.
  • The Sotorasib Lesson: FDA reviews, such as those for sotorasib, have highlighted the "dosing conundrum" where initial MTD-based doses led to excessive toxicity, eventually requiring post-market studies to find a more optimal, lower dose.
  • Target Saturation: Modern agents often reach a Pharmacokinetic (PK) plateau where increasing the dose adds no therapeutic benefit but significantly increases the rate of low-grade, chronic toxicities.
  • Dose-Response Nuance: As discussed in previous explorations of Determining the Dose in Clinical Trials, while the MTD is a safety limit identified through escalation, the RP2D is a comprehensive recommendation for further evaluation that aims to expose as few patients as possible to intolerable doses.

The Statistical Shift: Beyond 3+3

To find a true RP2D, statisticians are moving away from the rigid "3+3" rule-based designs to more flexible, model-informed approaches. These include:

  • Bayesian Optimal Interval (BOIN) designs that allow for a more nuanced exploration of the therapeutic window (a short sketch of the BOIN decision boundaries follows this list).
  • Randomized Dose-Ranging Studies: Encouraged by Project Optimus, these trials evaluate multiple doses early to compare safety and efficacy side-by-side.
  • Dose Expansion Cohorts: Used to refine the RP2D by gathering deeper data on preliminary efficacy and late-onset toxicities in specific patient subgroups.
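
As a concrete illustration of the BOIN decision rule referenced above, the escalation and de-escalation boundaries have a simple closed form. Below is a minimal Python sketch, assuming the default interval bounds φ1 = 0.6φ and φ2 = 1.4φ from Liu and Yuan's BOIN formulation; the function name is illustrative, not from any specific package.

```python
import math

def boin_boundaries(phi, phi1=None, phi2=None):
    """Closed-form escalation/de-escalation boundaries for a BOIN design.

    phi  : target DLT probability (e.g., 0.30)
    phi1 : highest DLT rate still considered subtherapeutic (default 0.6 * phi)
    phi2 : lowest DLT rate considered overly toxic (default 1.4 * phi)
    Escalate if the observed DLT rate at the current dose is <= lambda_e;
    de-escalate if it is >= lambda_d; otherwise stay at the current dose.
    """
    phi1 = 0.6 * phi if phi1 is None else phi1
    phi2 = 1.4 * phi if phi2 is None else phi2
    lambda_e = math.log((1 - phi1) / (1 - phi)) / math.log(phi * (1 - phi1) / (phi1 * (1 - phi)))
    lambda_d = math.log((1 - phi) / (1 - phi2)) / math.log(phi2 * (1 - phi) / (phi * (1 - phi2)))
    return lambda_e, lambda_d

# For a 30% target DLT rate this reproduces the familiar 0.236 / 0.358 cutoffs.
print(boin_boundaries(0.30))
```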

Conclusion

The shift from MTD to RP2D is more than a regulatory requirement; it is a clinical necessity. By identifying an optimized RP2D early, sponsors can avoid the "safety pitfalls" of MTD, improve patient quality of life, and build a stronger evidence chain for final approval. In the era of precision medicine, finding the right dose for the right patient is just as important as finding the right drug.


Sunday, January 04, 2026

Excessive number of clinical trial protocol amendments due to complex trial design

In a previous blog post "Protocol amendment in clinical trials", I discussed the impact of protocol amendments on clinical trial performance and cost and the reasons driving the protocol amendments. Protocol amendments are unavoidable, but we can think about the study design and execution proactively to minimize the number of protocol amendments. Sometimes, an excessive number of protocol amendments is driven by a Complex Innovative Trial Design (CID), for example, adaptive design, basket/umbrella/platform trial design, expansion cohort design, or Bayesian design.

We noticed an extreme case of a clinical trial with the study protocol amended 50 times. This refers to Study P001 (also known as KEYNOTE-001, NCT01295827) by Merck, which was a large, multi-cohort Phase 1 trial with numerous expansion cohorts that supported the initial accelerated approval of pembrolizumab. In the Statistical Review and Evaluation (BLA 125514, FDA Center for Drug Evaluation and Research, August 2014), the FDA reviewer noted this high number of amendments while discussing the complexity of the trial design. KEYNOTE-001 was a massive "seamless" adaptive trial that evolved from a traditional Phase 1 dose-escalation study into a large study with multiple expansion cohorts (Part A, A1, A2, B, C, D, etc.) covering different tumor types (melanoma, NSCLC) and dosing regimens. The "50 times" figure likely includes all global and country-specific amendments up to the time of the BLA submission in February 2014.

The high number of protocol amendments for KEYNOTE-001 was a direct result of its innovative, "seamless" adaptive study design. Initially launched as a standard Phase 1 dose-escalation trial, the study evolved into a massive, multi-cohort trial that eventually enrolled 1,235 patients.

The 50 amendments occurred primarily due to the following reasons:
  • Addition of Expansion Cohorts: As early data showed promising results, the protocol was repeatedly amended to add new expansion cohorts for specific tumor types, most notably melanoma and non-small cell lung cancer (NSCLC).
  • Sample Size Increases: Striking patient responses led investigators to increase sample sizes within existing cohorts to better evaluate efficacy endpoints like overall response rate (ORR).
  • Adaptive Dosing Changes: The protocol was amended to change dosing regimens based on emerging safety and efficacy data. For example, Amendment 7 changed dosing from every two weeks (Q2W) to every three weeks (Q3W), and Amendment 10 shifted all participants to a fixed dose of 200 mg.
  • Biomarker Integration: Amendments were used to add co-primary endpoints related to PD-L1 expression after researchers observed its correlation with drug efficacy. This included the validation of a companion diagnostic assay.
  • Regulatory Speed: This "seamless" approach allowed Merck to skip traditional Phase 2 and 3 steps for certain indications, leading to the first-ever FDA approval of an anti-PD-1 therapy.
While the approach was efficient, the FDA's statistical reviewers noted that such frequent changes (averaging more than one amendment per month during the most active phases) created significant operational and analytical complexity for the trial. The main challenges in analyzing the KEYNOTE-001 trial data, as noted in the FDA's statistical and medical reviews, stemmed from the extreme complexity of a "seamless" design that was modified more than 50 times.

The primary analytical hurdles included:
  • Statistical Integrity and Type I Error Risk: The frequent addition of new cohorts and subgroups—often based on emerging data—increased the number of statistical comparisons. This raised concerns about "multiplicity," where the probability of finding a significant result by chance (Type I error) increases with every new hypothesis tested (see the short calculation after this list).
  • Operational and Data Management Complexity: Maintaining data quality was difficult when different sites were often operating under different versions of the protocol simultaneously. The FDA noted that this led to potential adherence issues and made it difficult to isolate single cohorts for clean, standalone submissions.
  • Shifting Dosing and Regimens: The trial transitioned from weight-based dosing (2 mg/kg or 10 mg/kg) to a fixed dose (200 mg) and changed the frequency of administration (every 2 weeks to every 3 weeks) mid-study. This required complex "pooled analyses" to prove that efficacy and safety were consistent across these varying schedules.
  • Biomarker Selection and Validation: The protocol was amended to include a PD-L1 companion diagnostic while the study was already underway. This created a challenge in defining "training" vs. "validation" sets within the same trial population to establish the diagnostic's cutoff levels without introducing bias.
  • Lack of a Control Arm: Because the trial was essentially a massive Phase 1 expansion, it lacked a randomized control arm for several indications. This forced reviewers to rely on cross-trial comparisons and historical data, which are inherently more prone to bias than randomized controlled trials (RCTs).
  • Patient Selection Bias: The "adaptive" nature allowed for rapid accrual in specific successful cohorts, which, while beneficial for speed, made it difficult to ensure the final patient population was representative of the broader real-world population.
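
To make the multiplicity concern concrete, here is a minimal calculation of how the familywise Type I error rate grows with the number of hypotheses tested at α = 0.05, under the simplifying assumption that the tests are independent:

```python
# Familywise error rate (FWER) for k independent tests at level alpha:
# P(at least one false positive) = 1 - (1 - alpha) ** k
alpha = 0.05
for k in (1, 5, 10, 20):
    print(f"{k:>2} hypotheses -> FWER = {1 - (1 - alpha) ** k:.2f}")
# Output: 0.05, 0.23, 0.40, 0.64 -- each added cohort/endpoint inflates the risk
```
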
Despite the excessive number of protocol amendments, the results from KEYNOTE-001 led to the FDA approval of pembrolizumab for the treatment of multiple tumor types. The KEYNOTE-001 study was also the basis for the NEJM article "Seamless Oncology-Drug Development" by Prowell, Theoret, and Pazdur.

Thursday, January 01, 2026

One-way versus two-way tipping point analysis for robustness assessment of the missing data

Tipping point analysis (TPA) is a key sensitivity analysis mandated by regulatory agencies like the FDA to assess the robustness of clinical trial results to untestable assumptions about missing data. Specifically, it explores how much the assumption about the missing not at random (MNAR) mechanism would have to change to overturn the study's primary conclusion (e.g., a statistically significant treatment effect becoming non-significant). See a previous blog post "Tipping point analysis - multiple imputation for stress test under missing not at random (MNAR)".

One-Way Tipping Point Analysis for Robustness Assessment

A one-way tipping point analysis is a sensitivity method used to evaluate the robustness of a study’s primary findings by systematically altering the missing data assumption for only one treatment group at a time—most commonly the active treatment arm. While the missing outcomes in the control group are typically handled under a standard Missing at Random (MAR) or Jump to Reference assumption, the missing outcomes in the active arm are subjected to a varying "shift parameter" (δ). This parameter progressively penalizes the imputed values (e.g., making them increasingly worse) until the statistically significant treatment effect disappears, or "tips." By identifying this specific value, researchers can present a clear, one-dimensional threshold to clinical experts and regulators, who then judge whether such a drastic deviation from the observed data is clinically plausible or an unlikely extreme.
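
A minimal sketch of the one-way mechanics, assuming a continuous endpoint compared with a two-sample t-test and a single imputed dataset for brevity (a real analysis would repeat this over multiply imputed datasets and combine results with Rubin's rules; all data and names are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
active_obs = rng.normal(2.0, 1.0, 80)    # observed outcomes, active arm
active_imp = rng.normal(2.0, 1.0, 20)    # MAR-imputed outcomes, active arm
control = rng.normal(1.2, 1.0, 100)      # control arm (MAR handling, held fixed)

# Progressively penalize only the active arm's imputed values by delta
for delta in np.arange(0.0, 3.01, 0.25):
    active = np.concatenate([active_obs, active_imp - delta])
    p = stats.ttest_ind(active, control).pvalue
    if p >= 0.05:
        print(f"Tipping point reached at delta = {delta:.2f} (p = {p:.3f})")
        break
```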

Two-Way Tipping Point Analysis for Robustness Assessment

A two-way TPA is an advanced method to assess robustness by independently varying the missing data assumptions for both treatment groups (e.g., the active treatment arm and the control/reference arm).

Missing Data Assumptions (MAR vs. MNAR)

The two-way TPA is used to assess the robustness of the primary analysis, which is typically conducted under the assumption of Missing at Random (MAR).

  • Missing at Random (MAR): Assumes that the probability of data being missing depends only on the observed data (e.g., a patient with a worse baseline condition is more likely to drop out, and we have observed the baseline data).

  • Missing Not at Random (MNAR): Assumes that the probability of data being missing depends on the unobserved missing outcome data itself (e.g., a patient drops out because their unobserved outcome has worsened more than what is predicted by their observed data).

Robustness Assessment

The two-way TPA evaluates robustness to plausible MNAR scenarios. This is done by imputing the missing outcomes (often starting with an MAR method like Multiple Imputation) and then applying a systematic, independent "shift parameter" (or δ) to the imputed values in each arm.

  • Process: The shift parameters (δActive and δControl) are varied systematically across a two-dimensional grid, typically in a direction that reduces the observed treatment effect.

  • Tipping Point: The δActive and δControl values at which the primary conclusion (e.g., statistical significance) is "tipped" or overturned define the tipping point.

  • Robustness: The larger and/or more clinically implausible the combination of shift parameters required to overturn the conclusion, the more robust the original result is considered to be under different MNAR assumptions.

Two-Way Tipping Point Result Tables

The results of a two-way TPA are typically presented as a grid or heat map table where:

  • One axis represents the shift parameter applied to the missing outcomes in the Active Treatment arm (δActive).

  • The other axis represents the shift parameter applied to the missing outcomes in the Control/Reference arm (δControl).

  • The cells of the table contain the resulting p-value or estimated treatment difference for that specific combination of assumptions.

The goal is to find the boundary of the grid where the result crosses the significance threshold (e.g., p >= 0.05 or the lower bound of the confidence interval crosses the null value).
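
A hedged sketch of how such a grid can be produced, again using a single imputed dataset and a t-test for brevity (the δ ranges, data, and names are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
act_obs, act_imp = rng.normal(2.0, 1.0, 80), rng.normal(2.0, 1.0, 20)
ctl_obs, ctl_imp = rng.normal(1.2, 1.0, 80), rng.normal(1.2, 1.0, 20)

deltas = np.arange(0.0, 2.01, 0.5)
grid = np.empty((len(deltas), len(deltas)))
for i, d_act in enumerate(deltas):       # worsen the active arm's imputations
    for j, d_ctl in enumerate(deltas):   # improve the control arm's imputations
        act = np.concatenate([act_obs, act_imp - d_act])
        ctl = np.concatenate([ctl_obs, ctl_imp + d_ctl])
        grid[i, j] = stats.ttest_ind(act, ctl).pvalue

print(np.round(grid, 3))  # cells with p >= 0.05 form the "tipped" region
```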


Comparison: One-Way vs. Two-Way Tipping Point Analysis

The choice between one-way and two-way TPA is a trade-off between simplicity and comprehensiveness.

| Feature | One-Way Tipping Point Analysis | Two-Way Tipping Point Analysis |
|---|---|---|
| Missingness Assumption | The shift parameter (δ) is only applied to one arm, usually the active treatment group, while the missing data in the control arm are imputed based on the MAR assumption (e.g., Jump to Reference). | Independent shift parameters (δActive and δControl) are applied to both arms simultaneously. |
| Sensitivity Explored | Explores MNAR scenarios where dropouts in one arm have systematically worse/better outcomes than assumed by MAR, relative to the other arm's MAR assumption. | Explores a two-dimensional space of MNAR scenarios, allowing dropouts in both arms to vary independently. |
| Complexity | Simpler to calculate and interpret (one dimension). | More computationally intensive and complex to interpret (two-dimensional grid). |
| Plausibility | Often viewed as less comprehensive, as it does not model the possibility of simultaneous, independent MNAR mechanisms in both arms. | Considered more comprehensive as it allows for a wider range of clinically plausible and implausible MNAR scenarios. |
| Result Presentation | A line plot or simple table with a single "tipping point" value. | A grid/matrix table or heat map showing the boundary of non-significance. |

In essence, the two-way TPA is generally preferred by regulatory agencies for its superior ability to assess robustness because it explores a more realistic and exhaustive range of asymmetric MNAR mechanisms.

Monday, December 29, 2025

FDA guidance "Sponsor Responsibilities - Safety Reporting Requirements and Safety Assessment for IND and Bioavailability/Bioequivalence Studies"

Earlier this month, FDA issued its guidance "Sponsor Responsibilities - Safety Reporting Requirements and Safety Assessment for IND and Bioavailability/Bioequivalence Studies". As a clinical trialist, I see the updated FDA guidance (or the 2025 guidance) as a major step forward, primarily by refining the focus on safety assessment and introducing key operational elements.

The 2025 guidance is not a complete rewrite of the 2012 version ("Safety Reporting Requirements for INDs and BA/BE Studies"), but rather a merger of the 2012 guidance content with the principles from the 2015 draft guidance on safety assessment.

Here is a comparison highlighting the key new elements the sponsor must now consider:

Key New Elements in the 2025 Guidance

The most significant change is a shift from focusing solely on individual case safety reports (ICSRs) to a greater emphasis on proactive, systematic safety assessment and the analysis of aggregate data.

| New Concept | Description and Implication for Trialists | Relevant Section in New Guidance |
|---|---|---|
| Focus on Sponsor Responsibilities Only | The new guidance is strictly limited to sponsor responsibilities for safety reporting. All recommendations for investigator responsibilities found in the 2012 guidance have been moved to a separate document, reflecting a clear split in regulatory oversight. | Section I, II (Preamble) |
| Aggregate Data Assessment | This is the central update. The guidance expands significantly on the requirement to perform regular, proactive aggregate analyses of all accumulating safety data. The goal is to identify new or increased risks that would trigger expedited reporting, rather than relying only on individual case reports. | Section III (Definitions) and Section IV (Aggregate Analyses) |
| Mandatory Safety Surveillance Plan (SSP) | The guidance introduces the term Safety Surveillance Plan (SSP) as a systematic and organized approach to safety monitoring. The plan should include: 1) clearly defined roles and responsibilities; 2) a plan for the regular review and evaluation of Serious Adverse Events (SAEs); and 3) the process for performing aggregate safety reviews. | Section IV.C (Safety Surveillance Plan) |
| Sole Sponsor Causality Determination | The guidance emphasizes that the final responsibility for determining whether an event meets the criteria for expedited reporting (i.e., a "Suspected Adverse Reaction," or SUSAR) lies solely with the sponsor. While the sponsor should consider the investigator's opinion, the sponsor bears the ultimate responsibility for the causality judgment for regulatory submission purposes. | Section III.B (Suspected Adverse Reaction) |
| Flexibility in Safety Review | The new guidance offers greater flexibility by allowing sponsors to choose which individual, group, or entity (e.g., Safety Monitoring Committee, Data Monitoring Committee) is responsible for reviewing, analyzing, and making decisions regarding IND safety reporting. | Section IV.C.1 (Features and Composition of the Entity) |

This shift aims to reduce the "noise" of over-reporting uninformative individual adverse events, which was a concern under the old paradigm. Instead, the focus is placed on the sponsor's expert medical review and comprehensive analysis of the overall safety data package.

Here is a side-by-side comparison table summarizing the main discussion points and key changes between the 2012 and 2025 FDA guidance documents on safety reporting.


Safety Reporting Guidance: 2012 vs. 2025 Comparison

| Discussion Point | 2012 Final Guidance: Safety Reporting Requirements for INDs and BA/BE Studies | 2025 Final Guidance: Sponsor Responsibilities — Safety Reporting Requirements and Safety Assessment for IND and BA/BE Studies |
|---|---|---|
| Primary Scope and Focus | Focused on procedural requirements for expedited reporting of individual Serious Adverse Events (SAEs). | Mandatory emphasis on safety assessment and aggregate data analysis to identify new, significant risks. Merges content with principles from the 2015 draft guidance on safety assessment. |
| Division of Responsibilities | Contained recommendations for both sponsor and investigator safety reporting responsibilities. | Exclusively focuses on sponsor responsibilities. Investigator reporting recommendations are placed in a separate, concurrently issued guidance document. |
| Safety Surveillance/Planning | Implicit in the sponsor's duties, but lacked a formalized planning requirement. | Introduces the new term "Safety Surveillance Plan (SSP)" to describe a required systematic and organized approach. |
| Plan Components (SSP) | Did not specify formal plan components. | Requires the plan to include clearly defined roles and responsibilities, a process for regular review of SAEs, and a process for aggregate safety reviews. |
| Requirement for Review | Focused primarily on individual case review to determine if the reporting criteria (Serious, Unexpected, Suspected Adverse Reaction — SUSAR) were met. | Explicitly requires sponsors to review and evaluate all accumulating safety data at regular intervals (aggregate review) to update the overall safety profile. |
| Decision-Making Body | Lacked specific recommendations for the structure of the internal safety review process. | Offers greater flexibility by allowing the sponsor to choose the individual, group, or entity (e.g., Safety Assessment Committee) responsible for safety reporting and decision-making. |
| Source of Safety Data | Focused mainly on reports from the clinical trial itself. | Emphasizes that sponsors must review information from any source (e.g., animal studies, scientific literature, foreign reports, and commercial experience) to identify new significant risks to trial participants. |
| Expedited Reporting Rationale | The concern was the overreporting of uninformative individual Adverse Events (AEs), which hindered the IRB's ability to focus on true risks. | Seeks to reduce overreporting by clarifying that the decision for a 7- or 15-day expedited report must be based on the sponsor's professional judgment of causality (i.e., a reasonable possibility). |

Summary of the Shift

The 2025 guidance strongly emphasizes a shift in the regulatory burden from volume-based individual reporting (the 2012 paradigm) to quality-based, comprehensive safety analysis by the sponsor. The overall goal is to enhance patient protection by focusing the FDA, IRBs, and investigators on truly meaningful safety signals derived from cumulative data, rather than individual case reports.

Monday, December 01, 2025

Handling "Median Not Reached": A Guide to Analyzing and Presenting Low Event Rate Survival Data

In the era of highly effective therapies for many diseases, clinical researchers are increasingly encountering a "good" problem in time-to-event analyses: the Kaplan-Meier survival curves are flattening out well above the 50% mark. While this represents a triumph for patient outcomes, it creates a headache for statistical reporting. When the event rate is low (below 50%), the Median Time to Event (e.g., Median Overall Survival) and its 95% Confidence Interval (CI) cannot be estimated and are often reported as "NE" (not estimable), "NR" (not reached), or "NC" (not calculable).

So, how do we robustly describe the efficacy of a treatment when the standard metric fails? This post outlines the best-practice alternatives for summarizing, analyzing, and visualizing survival data in low event settings.


1. The Limitation of the Median

The median survival time is simply the time point at which the survival probability drops to 0.50. If the Kaplan-Meier curve plateaus at 70% or 80% because fewer than half the patients experienced the event, the median is mathematically undefined. Reporting it merely as "Not Reached" (NR) is accurate but clinically uninformative—it tells us what the survival is not, but not what it is.

To provide a complete picture, we must pivot to alternative metrics that describe different parts of the survival distribution.

2. Primary Summary Measures

A. Landmark Survival Probabilities

When we cannot answer "When will half the patients die?", we should ask, "What proportion of patients are event-free at time t?"

Landmark analysis reports the Kaplan-Meier survival probability (with 95% CIs) at clinically relevant, fixed timepoints (e.g., 24 weeks, 12 months, 24 months, 5 years).

  • Best Practice: Pre-specify these timepoints in the Statistical Analysis Plan (SAP) to avoid data dredging.

  • Example Reporting: "Event free rate was 93% at week 24 in the treatment group", "The 3-year recurrence-free survival rate was 88.4% (95% CI: 85.1–91.0) in the treatment arm compared to 82.1% (95% CI: 78.4–85.2) in the placebo arm."
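
A minimal sketch of extracting landmark estimates, assuming the lifelines package and toy data constructed so that the median is never reached:

```python
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(3)
t = np.minimum(rng.exponential(120, 200), 36.0)  # months, censored at 36
e = (t < 36.0) & (rng.random(200) < 0.5)         # few events: curve plateaus high

kmf = KaplanMeierFitter().fit(t, event_observed=e)
print(kmf.median_survival_time_)                        # inf -> report as "NR"
print(kmf.survival_function_at_times([6, 12, 24, 36]))  # landmark rates
# matching 95% bands: kmf.confidence_interval_survival_function_
```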

B. Lower-Percentile Survival Times (10th and 25th)

Just because the 50th percentile (median) is missing doesn't mean all percentiles are.

  • 25th Percentile: The time at which 25% of patients have experienced the event (or survival drops to 75%).

  • 10th Percentile: The time at which 10% of patients have experienced the event (or survival drops to 90%).

These metrics characterize the "early failures" or the worst-performing subset of the cohort. They are particularly useful for showing that a treatment delays early progression even if the long-term survival is high.

| Metric | Treatment Group | Control Group |
|---|---|---|
| Median (50th) | NR (95% CI: NR, NR) | NR (95% CI: 36.7, NR) |
| 25th Percentile | 18.4 months (14.2, 22.1) | 12.1 months (9.8, 14.5) |
| 10th Percentile | 5.4 months (4.1, 6.8) | 3.2 months (2.8, 3.9) |

Note: In the table above, while the median is NR for both, the 25th percentile clearly demonstrates a 6-month delay in progression for the treatment group.
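
A short follow-on sketch, again assuming lifelines, whose univariate fitters expose a percentile(p) method returning the time at which the survival function first drops to p (toy data are illustrative):

```python
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(4)
t = np.minimum(rng.exponential(70, 200), 36.0)  # months, censored at 36
e = t < 36.0                                    # ~40% of patients have events

kmf = KaplanMeierFitter().fit(t, event_observed=e)
print(kmf.percentile(0.50))  # inf -> median not reached
print(kmf.percentile(0.75))  # 25th-percentile event time (S drops to 0.75)
print(kmf.percentile(0.90))  # 10th-percentile event time (S drops to 0.90)
```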


3. Robust Analytical Alternatives

A. The "Reverse Kaplan-Meier" Method for Follow-Up

In low event trials, it is critical to prove that the "NR" result is due to drug efficacy, not just because patients left the study early. The Reverse Kaplan-Meier method is the gold standard for calculating median follow-up.

  • How it works: You reverse the censoring indicator (Event = Censored; Censored = Event) and run a standard Kaplan-Meier analysis. The resulting median is the median potential follow-up time (see the sketch after this list).

  • Why use it: Unlike the "median time on study," it is not biased by early deaths or events, providing a true measure of how long the trial centers monitored the patients.
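
A minimal sketch, assuming lifelines: the only change from a standard analysis is flipping the event indicator before fitting (toy data; here every patient's potential follow-up is 36 months, so the reverse-KM median comes out at 36):

```python
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(5)
t = np.minimum(rng.exponential(70, 200), 36.0)
event = t < 36.0                      # 1 = event, 0 = administratively censored

# Reverse KM: censorings become the "events" and events become "censored"
rkm = KaplanMeierFitter().fit(t, event_observed=~event)
print(rkm.median_survival_time_)      # median potential follow-up time
```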

B. Restricted Mean Survival Time (RMST)

RMST is rapidly becoming the preferred alternative to the Hazard Ratio (HR) in low event trials, especially when the Proportional Hazards assumption is violated (e.g., crossing curves).

  • Definition: RMST is the "area under the survival curve" up to a specific time point (τ). It represents the average survival time a patient lives during that window (a short sketch follows this list).

  • Reporting: You can report the Difference in RMST (Treatment minus Control) or the Ratio.

  • Interpretation: "Over the 5-year follow-up period, patients on the new therapy lived, on average, 4.2 months longer than those on the control (RMST difference = 4.2 months, p=0.003)."
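
A minimal sketch, assuming lifelines (its utils module provides a restricted_mean_survival_time helper); the toy data and truncation time τ = 36 months are illustrative:

```python
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.utils import restricted_mean_survival_time

rng = np.random.default_rng(6)
t_trt = np.minimum(rng.exponential(90, 150), 36.0)  # treatment arm, months
t_ctl = np.minimum(rng.exponential(50, 150), 36.0)  # control arm, months

tau = 36.0  # pre-specified truncation time
rmst = []
for t in (t_trt, t_ctl):
    kmf = KaplanMeierFitter().fit(t, event_observed=t < 36.0)
    rmst.append(restricted_mean_survival_time(kmf, t=tau))  # area under KM curve

print(f"RMST difference over {tau:.0f} months: {rmst[0] - rmst[1]:.1f} months")
```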


4. Visualization Best Practices

A. The Kaplan-Meier Plot: Handling the Y-Axis

In trials with very high survival (e.g., >90%), the survival curves may be squeezed into the top 10% of the graph, making it hard to see separation.

  • Line Break (Axis Break): It is acceptable to "break" the y-axis to focus on the relevant range (e.g., from 80% to 100%), provided this is clearly marked.

  • Inverted Plot (Failure Plot): Alternatively, plot the Cumulative Incidence of Events (1 - Survival) on a y-axis ranging from 0% to 20%. This often visualizes the difference in event rates more clearly than a survival curve stuck at the top of the chart.

B. The "Number at Risk" Table

Always include a table below the x-axis aligned with the tick marks. In low event trials, this table reveals whether the "flat tail" of the curve is based on hundreds of patients or just a few who haven't been followed long enough.
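
A hedged plotting sketch with lifelines and matplotlib, combining the inverted (cumulative incidence) view with the number-at-risk table via lifelines.plotting.add_at_risk_counts (toy data):

```python
import numpy as np
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from lifelines.plotting import add_at_risk_counts

rng = np.random.default_rng(7)
t1 = np.minimum(rng.exponential(250, 150), 36.0)  # high-survival treatment arm
t2 = np.minimum(rng.exponential(120, 150), 36.0)  # control arm
kmf1 = KaplanMeierFitter().fit(t1, event_observed=t1 < 36.0, label="Treatment")
kmf2 = KaplanMeierFitter().fit(t2, event_observed=t2 < 36.0, label="Control")

fig, ax = plt.subplots()
kmf1.plot_cumulative_density(ax=ax)    # failure plot: 1 - S(t)
kmf2.plot_cumulative_density(ax=ax)
ax.set_ylim(0, 0.3)                    # zoom on the event-rate range in use
ax.set_ylabel("Cumulative incidence of events")
add_at_risk_counts(kmf1, kmf2, ax=ax)  # number-at-risk table under the x-axis
plt.tight_layout()
plt.show()
```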


5. Optional Exploratory Methods

If pre-specified in the protocol, Parametric Modeling can be used to estimate the median survival even if it has not been reached in the observed data (a short sketch follows the list below).

  • Weibull Distribution: By fitting a Weibull model to the observed data, you can extrapolate the curve to predict when the median would be reached, assuming the risk profile remains constant.

  • Caution: This is a prediction, not an observation. It should be labeled clearly as "Estimated Median (Parametric)" and treated as exploratory evidence.
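
A minimal sketch, assuming lifelines' WeibullFitter and toy data whose observed KM median is not reached; any extrapolated value must be labeled as a model-based estimate:

```python
import numpy as np
from lifelines import KaplanMeierFitter, WeibullFitter

rng = np.random.default_rng(8)
t = np.minimum(rng.exponential(70, 200), 36.0)  # follow-up truncated at 36 months
e = t < 36.0

print(KaplanMeierFitter().fit(t, e).median_survival_time_)  # inf -> "NR"

wf = WeibullFitter().fit(t, e)
print(wf.median_survival_time_)  # "Estimated Median (Parametric)", beyond follow-up
```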

Summary Checklist for Reporting Low Event Data

  1. State clearly that the median is NE/NR.

  2. Report Landmark Rates (e.g., 3-year survival) with CIs.

  3. Report Lower Percentiles (25th, 10th) to show early separation.

  4. Use RMST to quantify the average time gained.

  5. Calculate Follow-up using the Reverse Kaplan-Meier method.

  6. Adjust Plots (zoom/break y-axis) to make differences visible, but keep the full context clear.

Note: AI-assisted writing for this blog article.

Sunday, November 09, 2025

When external controls collide with the FDA: two recent cases and what they teach us

When trials don’t use a traditional control — What the FDA guidance says

In the usual drug-development pathway, regulators expect a trial where patients are randomly assigned either to receive the investigational drug or a comparator (placebo or active treatment). This randomized-controlled-trial (RCT) design is the “gold standard” for showing that a drug works (i.e., that the benefit is due to the drug not to other factors). For NDA/BLA approvals, FDA issued two guidance documents:

However, there are settings—rare diseases, rapidly progressive conditions, or severe unmet need—where recruiting a traditional randomized control arm is difficult or ethically questionable. In those settings, sponsors may propose to compare patients treated under an investigational protocol against a group of patients drawn from an outside dataset (e.g., prior trials, patient registries, real-world data) who did not receive the investigational drug. Such a comparison group is often called an external control (sometimes “synthetic control” or “historical control”).

In this design:

  • The treatment arm is prospective, treated under a defined protocol.

  • The control arm comes from a separate dataset (regardless of when or how collected) and did not receive the investigational product under the same protocol.

Because the groups were not randomized together, there are extra risks of bias and confounding. The key question is: can the FDA rely on such evidence to reach its statutory standard of "substantial evidence of effectiveness"?

In February 2023, the FDA issued a draft guidance for industry titled Considerations for the Design and Conduct of Externally Controlled Trials for Drug and Biological Products.

Here are some of the most important take-aways from that guidance:
  • The guidance explicitly recognizes that externally-controlled trials may, in some circumstances, serve as an adequate and well-controlled investigation to support approval of a drug or biologic. 

  • But the guidance is cautious: it states that in many cases “the likelihood of credibly demonstrating the effectiveness of a drug of interest with an external control is low.” 

  • The guidance puts heavy emphasis on using patient-level data (not just summary published numbers) for the external control. 

  • It emphasizes key threats: unmeasured confounding, bias, differences in assessment or follow-up, intercurrent events, immortal time bias, and temporal changes in standard of care or diagnostics. 

  • It recommends early and frequent communication between sponsor and FDA if an external control design is under consideration. 

  • It doesn’t endorse a single statistical adjustment method (propensity scores, Bayesian modelling, etc.). Instead, it says the analytic plan must be prespecified, transparent, and justified in the context of the specific trial. 

  • The guidance makes clear this is not the default approach—it remains a specialized tool rather than the standard route.

In short: the FDA is signaling openness to external controls in selected settings—but the bar remains high.


How this ties to the two recent cases

Here’s how the guidance connects to the two real-world situations.

Case 1: uniQure and AMT-130 in Huntington’s disease

  • uniQure reported a treated cohort using AMT-130 (a one-time gene therapy) and compared outcomes to a natural-history/external cohort.

  • The agency essentially pushed back: although the signals were strong, the design raised concerns (especially given the lack of randomization, external control issues) and the company reported that pre-BLA discussions did not confirm a clean path.

  • Under the FDA external-control guidance, the concerns would include: are the treated and external groups sufficiently similar (baseline disease features, prior therapies, timing)? Was the index date (“time zero”) aligned? Are the outcome assessments the same, and is the follow-up and endpoint definition comparable? Are missing data or unmeasured confounders plausibly influencing the result? The guidance states that if any of these are weak, the design may not credibly distinguish drug effect from other influences. 

  • The uniQure case illustrates: even with promising effect size, the agency may conclude that uncertainties tied to external control reduce confidence in the evidence.

Case 2: Biohaven Pharmaceuticals and Vyglxia (troriluzole) in Spinocerebellar ataxia

  • Biohaven’s NDA included a large externally-controlled dataset (real-world evidence for the control arm) reporting disease-slowing.

  • The FDA issued a Complete Response Letter (CRL) citing concerns with the external-control-based evidence (e.g., bias, data quality, missing/unmeasured information). According to Biohaven, the agency rejected its drug, called troriluzole, due to issues that can be "inherent to real-world evidence and external control studies, including potential bias, design flaws, lack of pre-specification and unmeasured confounding factors."

  • According to the guidance, when the anticipated treatment effect size is modest, an externally controlled trial is less likely to be acceptable unless the design is very strong. 

  • Also the guidance emphasizes that outcome ascertainment, timing, and data source differences between arms are key threats. The Biohaven case appears to reflect those issues.

  • This case reinforces that external controls are not a way to bypass rigorous design—they still demand high‐quality data, pre-specification, and detailed justification.


What this means for patients, clinicians and developers

  • For patients & clinicians: When you see a newly approved drug whose pivotal evidence is based on an external control rather than a randomized trial, ask: how comparable were the groups? How solid was the external data (patient-level, good follow-up, similar measurement)? Has a confirmatory trial been required or planned?

  • For developers: If you plan to rely on an external control, start very early: identify and curate the external dataset, lock the eligibility and analytic plan, engage FDA in frequent meetings, anticipate the agency will probe data quality, comparability, missing data and bias. Treat the external control design as a rigorous undertaking—not a shortcut.

  • For the field of rare diseases and high-unmet-need areas (for example your interest area of pulmonary arterial hypertension): External controls offer a promising option when randomization is infeasible—but you must make the case strongly. The FDA’s guidance gives you the checklist: patient-level data, alignment of arms, clear index date, consistent endpoint measurement, robust analytics. If you can’t satisfy those, a randomized or concurrent control may still be needed.


Final thoughts

The FDA’s external‐control guidance is a signal that regulators recognize the realities of these challenging settings—but it does not mean external controls are easy, or will automatically yield approval. The recent uniQure and Biohaven cases are a clear reminder: you may have impressive effect size or compelling unmet need—but the agency will still closely scrutinize design, data source, comparability, missing data, and bias.