Sunday, January 04, 2026

Excessive number of clinical trial protocol amendments due to complex trial design

In a previous blog post, "Protocol amendment in clinical trials", I discussed the impact of protocol amendments on clinical trial performance and cost and the reasons driving them. Protocol amendments are unavoidable, but we can think about study design and execution proactively to minimize their number. Sometimes, an excessive number of protocol amendments is driven by a complex innovative trial design, or CID for short (for example, adaptive designs, basket/umbrella/platform trial designs, expansion cohort designs, Bayesian designs...).

We noticed an extreme case of a clinical trial whose study protocol was amended 50 times. This was Study P001 (also known as KEYNOTE-001, NCT01295827) by Merck, a large, multi-cohort Phase 1 trial with numerous expansion cohorts that supported the initial accelerated approval of pembrolizumab. In the Statistical Review and Evaluation for BLA 125514 (FDA Center for Drug Evaluation and Research, August 2014), the FDA reviewer noted this high number of amendments while discussing the complexity of the trial design. KEYNOTE-001 was a massive "seamless" adaptive trial that evolved from a traditional Phase 1 dose-escalation study into a large study with multiple expansion cohorts (Parts A, A1, A2, B, C, D, etc.) covering different tumor types (melanoma, NSCLC) and dosing regimens. The "50 times" figure likely includes all global and country-specific amendments up to the time of the BLA submission in February 2014.

The high number of protocol amendments for KEYNOTE-001 was a direct result of its innovative, "seamless" adaptive study design. Initially launched as a standard Phase 1 dose-escalation trial, the study evolved into a massive, multi-cohort trial that eventually enrolled 1,235 patients.

The 50 amendments occurred primarily for the following reasons:
  • Addition of Expansion Cohorts: As early data showed promising results, the protocol was repeatedly amended to add new expansion cohorts for specific tumor types, most notably melanoma and non-small cell lung cancer (NSCLC).
  • Sample Size Increases: Striking patient responses led investigators to increase sample sizes within existing cohorts to better evaluate efficacy endpoints like overall response rate (ORR).
  • Adaptive Dosing Changes: The protocol was amended to change dosing regimens based on emerging safety and efficacy data. For example, Amendment 7 changed dosing from every two weeks (Q2W) to every three weeks (Q3W), and Amendment 10 shifted all participants to a fixed dose of 200 mg.
  • Biomarker Integration: Amendments were used to add co-primary endpoints related to PD-L1 expression after researchers observed its correlation with drug efficacy. This included the validation of a companion diagnostic assay.
  • Regulatory Speed: This "seamless" approach allowed Merck to skip traditional Phase 2 and 3 steps for certain indications, leading to the first-ever FDA approval of an anti-PD-1 therapy.
While this approach was efficient, the FDA's statistical reviewers noted that such frequent changes (averaging more than one amendment per month during the most active phases) created significant operational and analytical complexity for the trial. The main challenges in analyzing the KEYNOTE-001 trial data, as noted in the FDA's statistical and medical reviews, stemmed from the extreme complexity of a "seamless" design that was modified more than 50 times.

The primary analytical hurdles included:
  • Statistical Integrity and Type I Error Risk: The frequent addition of new cohorts and subgroups (often based on emerging data) increased the number of statistical comparisons. This raised concerns about "multiplicity," where the probability of finding a significant result by chance (Type I error) increases with every new hypothesis tested; this is quantified in the note after this list.
  • Operational and Data Management Complexity: Maintaining data quality was difficult when different sites were often operating under different versions of the protocol simultaneously. The FDA noted that this led to potential adherence issues and made it difficult to isolate single cohorts for clean, standalone submissions.
  • Shifting Dosing and Regimens: The trial transitioned from weight-based dosing (2 mg/kg or 10 mg/kg) to a fixed dose (200 mg) and changed the frequency of administration (every 2 weeks to every 3 weeks) mid-study. This required complex "pooled analyses" to prove that efficacy and safety were consistent across these varying schedules.
  • Biomarker Selection and Validation: The protocol was amended to include a PD-L1 companion diagnostic while the study was already underway. This created a challenge in defining "training" vs. "validation" sets within the same trial population to establish the diagnostic's cutoff levels without introducing bias.
  • Lack of a Control Arm: Because the trial was essentially a massive Phase 1 expansion, it lacked a randomized control arm for several indications. This forced reviewers to rely on cross-trial comparisons and historical data, which are inherently more prone to bias than randomized controlled trials (RCTs).
  • Patient Selection Bias: The "adaptive" nature allowed for rapid accrual in specific successful cohorts, which, while beneficial for speed, made it difficult to ensure the final patient population was representative of the broader real-world population.
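To see the multiplicity concern concretely (a stylized calculation assuming independent tests, which overlapping cohorts only approximate): with $k$ hypotheses each tested at $\alpha = 0.05$, the familywise error rate is $1 - (1 - \alpha)^k$; for $k = 10$ comparisons, $1 - 0.95^{10} \approx 0.40$, i.e., a roughly 40% chance of at least one false-positive finding even if the drug has no effect in any cohort.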
Despite the excessive number of protocol amendments, the results from KEYNOTE-001 supported the FDA approval of pembrolizumab in the treatment of multiple tumor types. The KEYNOTE-001 study was also the basis for the NEJM article "Seamless Oncology-Drug Development" by Prowell, Theoret, and Pazdur.

Thursday, January 01, 2026

One-way versus two-way tipping point analysis for robustness assessment of the missing data

Tipping point analysis (TPA) is a key sensitivity analysis mandated by regulatory agencies like the FDA to assess the robustness of clinical trial results to untestable assumptions about missing data. Specifically, it explores how much the assumption about the missing not at random (MNAR) mechanism would have to change to overturn the study's primary conclusion (e.g., a statistically significant treatment effect becoming non-significant). See a previous blog post "Tipping point analysis - multiple imputation for stress test under missing not at random (MNAR)".

One-Way Tipping Point Analysis for Robustness Assessment

A one-way tipping point analysis is a sensitivity method used to evaluate the robustness of a study’s primary findings by systematically altering the missing data assumption for only one treatment group at a time—most commonly the active treatment arm. While the missing outcomes in the control group are typically handled under a standard Missing at Random (MAR) or Jump to Reference assumption, the missing outcomes in the active arm are subjected to a varying "shift parameter" (δ). This parameter progressively penalizes the imputed values (e.g., making them increasingly worse) until the statistically significant treatment effect disappears, or "tips." By identifying this specific value, researchers can present a clear, one-dimensional threshold to clinical experts and regulators, who then judge whether such a drastic deviation from the observed data is clinically plausible or an unlikely extreme.
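To make the mechanics concrete, here is a minimal Python sketch of a one-way tipping point search. It is illustrative only: the data are simulated, single mean imputation stands in for the multiple imputation (combined via Rubin's rules) that a real analysis would use, and the effect sizes, missingness rates, and δ grid are arbitrary assumptions.

# One-way tipping point search (illustrative sketch; simulated data).
# Negative outcomes = improvement, so adding a positive delta penalizes
# the imputed values in the active arm.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2026)
active = rng.normal(loc=-5.0, scale=8.0, size=100)    # simulated active arm
control = rng.normal(loc=-2.0, scale=8.0, size=100)   # simulated control arm
active[rng.choice(100, 15, replace=False)] = np.nan   # 15% dropout
control[rng.choice(100, 10, replace=False)] = np.nan  # 10% dropout

def impute_mar(y):
    # Single mean imputation as a crude stand-in for an MAR-based imputation.
    out = y.copy()
    out[np.isnan(y)] = np.nanmean(y)
    return out

control_imp = impute_mar(control)   # control arm stays under MAR
miss = np.isnan(active)

for delta in np.arange(0.0, 15.1, 0.5):
    active_imp = impute_mar(active)
    active_imp[miss] += delta       # progressively worsen active imputations
    p = stats.ttest_ind(active_imp, control_imp).pvalue
    if p >= 0.05:
        print(f"Tipping point reached at delta = {delta:g} (p = {p:.4f})")
        break
else:
    print("No tipping point within the examined delta range.")

In a real submission, δ would be applied within each of many multiply imputed datasets before pooling, and the tipping value would be reported in the endpoint's clinical units so that experts can judge its plausibility.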

Two-Way Tipping Point Analysis for Robustness Assessment

A two-way TPA is an advanced method to assess robustness by independently varying the missing data assumptions for both treatment groups (e.g., the active treatment arm and the control/reference arm).

Missing Data Assumptions (MAR vs. MNAR)

The two-way TPA is used to assess the robustness of the primary analysis, which is typically conducted under the assumption of Missing at Random (MAR).

  • Missing at Random (MAR): Assumes that the probability of data being missing depends only on the observed data (e.g., a patient with a worse baseline condition is more likely to drop out, and we have observed the baseline data).

  • Missing Not at Random (MNAR): Assumes that the probability of data being missing depends on the unobserved missing outcome data itself (e.g., a patient drops out because their unobserved outcome has worsened more than what is predicted by their observed data). This distinction is formalized in the note below.
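In notation (a standard formalization, not specific to any one trial): writing $Y = (Y_{obs}, Y_{mis})$ for the outcome data and $R$ for the missingness indicator, MAR assumes $P(R \mid Y_{obs}, Y_{mis}) = P(R \mid Y_{obs})$, whereas MNAR allows $P(R \mid Y_{obs}, Y_{mis})$ to depend on $Y_{mis}$ itself. The shift parameter δ in a tipping point analysis indexes a family of such MNAR departures, with δ = 0 corresponding to MAR.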

Robustness Assessment

The two-way TPA evaluates robustness to plausible MNAR scenarios. This is done by imputing the missing outcomes (often starting with an MAR method like Multiple Imputation) and then applying a systematic, independent "shift parameter" (or δ) to the imputed values in each arm.

  • Process: The shift parameters (δActive and δControl) are varied systematically across a two-dimensional grid, typically in a direction that reduces the observed treatment effect.

  • Tipping Point: The δActive and δControl values at which the primary conclusion (e.g., statistical significance) is "tipped" or overturned define the tipping point.

  • Robustness: The larger and/or more clinically implausible the combination of shift parameters required to overturn the conclusion, the more robust the original result is considered to be under different MNAR assumptions.

Two-Way Tipping Point Result Tables

The results of a two-way TPA are typically presented as a grid or heat map table where:

  • One axis represents the shift parameter applied to the missing outcomes in the Active Treatment arm (δActive).

  • The other axis represents the shift parameter applied to the missing outcomes in the Control/Reference arm (δControl).

  • The cells of the table contain the resulting p-value or estimated treatment difference for that specific combination of assumptions.

The goal is to find the boundary of the grid where the result crosses the significance threshold (e.g., p >= 0.05 or the lower bound of the confidence interval crosses the null value).
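As a hedged sketch of how such a grid can be generated, the Python snippet below varies δActive and δControl over a small two-dimensional grid and stores the p-value in each cell. The simulated data, single mean imputation (in place of a full multiple-imputation workflow), and the 2-unit grid spacing are simplifying assumptions for demonstration only.

# Two-way tipping point grid (illustrative sketch; simulated data).
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(seed=2026)
active = rng.normal(loc=-5.0, scale=8.0, size=100)
control = rng.normal(loc=-2.0, scale=8.0, size=100)
active[rng.choice(100, 15, replace=False)] = np.nan
control[rng.choice(100, 10, replace=False)] = np.nan

def impute_shift(y, delta):
    # Mean-impute the missing values, then shift the imputations by delta.
    out = y.copy()
    out[np.isnan(y)] = np.nanmean(y) + delta
    return out

deltas_active = np.arange(0.0, 10.1, 2.0)     # worsen active imputations
deltas_control = np.arange(0.0, -10.1, -2.0)  # improve control imputations

grid = pd.DataFrame(index=deltas_active, columns=deltas_control, dtype=float)
for da in deltas_active:
    for dc in deltas_control:
        p = stats.ttest_ind(impute_shift(active, da),
                            impute_shift(control, dc)).pvalue
        grid.loc[da, dc] = p

print(grid.round(4))  # rows: deltaActive; columns: deltaControl

Shading the cells where p >= 0.05 turns this table directly into the heat map described above, with the boundary of the shaded region marking the tipping point frontier.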


Comparison: One-Way vs. Two-Way Tipping Point Analysis

The choice between one-way and two-way TPA is a trade-off between simplicity and comprehensiveness.

Feature | One-Way Tipping Point Analysis | Two-Way Tipping Point Analysis
Missingness Assumption | The shift parameter (δ) is applied to only one arm, usually the active treatment group, while missing data in the control arm are imputed under a fixed reference assumption (MAR or, e.g., Jump to Reference). | Independent shift parameters (δActive and δControl) are applied to both arms simultaneously.
Sensitivity Explored | Explores MNAR scenarios where dropouts in one arm have systematically worse/better outcomes than assumed, relative to the other arm's fixed assumption. | Explores a two-dimensional space of MNAR scenarios, allowing dropouts in both arms to vary independently.
Complexity | Simpler to calculate and interpret (one dimension). | More computationally intensive and complex to interpret (two-dimensional grid).
Plausibility | Often viewed as less comprehensive, as it does not model the possibility of simultaneous, independent MNAR mechanisms in both arms. | Considered more comprehensive, as it allows for a wider range of clinically plausible and implausible MNAR scenarios.
Result Presentation | A line plot or simple table with a single "tipping point" value. | A grid/matrix table or heat map showing the boundary of non-significance.

In essence, the two-way TPA is generally preferred by regulatory agencies for its superior ability to assess robustness because it explores a more realistic and exhaustive range of asymmetric MNAR mechanisms.

Monday, December 29, 2025

FDA guidance "Sponsor Responsibilities - Safety Reporting Requirements and Safety Assessment for IND and Bioavailability/Bioequivalence Studies"

Earlier this month, FDA issued its guidance "Sponsor Responsibilities - Safety Reporting Requirements and Safety Assessment for IND and Bioavailability/Bioequivalence Studies". As a clinical trialist, I see the updated FDA guidance (the 2025 guidance) as a major step forward, primarily for refining the focus on safety assessment and introducing key operational elements.

The 2025 guidance is not a complete rewrite of the 2012 version ("Safety Reporting Requirements for INDs and BA/BE Studies"), but rather a merger of the 2012 guidance content with the principles from the 2015 draft guidance on safety assessment.

Here is a comparison highlighting the key new elements the sponsor must now consider:

Key New Elements in the 2025 Guidance

The most significant change is a shift from focusing solely on individual case safety reports (ICSRs) to a greater emphasis on proactive, systematic safety assessment and the analysis of aggregate data.

New Concept | Description and Implication for Trialists | Relevant Section in New Guidance
Focus on Sponsor Responsibilities Only | The new guidance is strictly limited to sponsor responsibilities for safety reporting. All recommendations for investigator responsibilities found in the 2012 guidance have been moved to a separate document, reflecting a clear split in regulatory oversight. | Sections I, II (Preamble)
Aggregate Data Assessment | This is the central update. The guidance expands significantly on the requirement to perform regular, proactive aggregate analyses of all accumulating safety data. The goal is to identify new or increased risks that would trigger expedited reporting, rather than relying only on individual case reports. | Section III (Definitions) and Section IV (Aggregate Analyses)
Mandatory Safety Surveillance Plan (SSP) | The guidance introduces the term Safety Surveillance Plan (SSP) for a systematic and organized approach to safety monitoring. The plan should include: 1) clearly defined roles and responsibilities; 2) a plan for the regular review and evaluation of serious adverse events (SAEs); and 3) the process for performing aggregate safety reviews. | Section IV.C (Safety Surveillance Plan)
Sole Sponsor Causality Determination | The guidance emphasizes that the final responsibility for determining whether an event meets the criteria for expedited reporting (i.e., a serious and unexpected suspected adverse reaction, or SUSAR) lies solely with the sponsor. While the sponsor should consider the investigator's opinion, the sponsor bears the ultimate responsibility for the causality judgment for regulatory submission purposes. | Section III.B (Suspected Adverse Reaction)
Flexibility in Safety Review | The new guidance offers greater flexibility by allowing sponsors to choose which individual, group, or entity (e.g., safety monitoring committee, data monitoring committee) is responsible for reviewing, analyzing, and making decisions regarding IND safety reporting. | Section IV.C.1 (Features and Composition of the Entity)

This shift aims to reduce the "noise" of over-reporting uninformative individual adverse events, which was a concern under the old paradigm. Instead, the focus is placed on the sponsor's expert medical review and comprehensive analysis of the overall safety data package.

Here is a side-by-side comparison table summarizing the main discussion points and key changes between the 2012 and 2025 FDA guidance documents on safety reporting.


Safety Reporting Guidance: 2012 vs. 2025 Comparison

Discussion Point | 2012 Final Guidance: Safety Reporting Requirements for INDs and BA/BE Studies | 2025 Final Guidance: Sponsor Responsibilities - Safety Reporting Requirements and Safety Assessment for IND and BA/BE Studies
Primary Scope and Focus | Focused on procedural requirements for expedited reporting of individual Serious Adverse Events (SAEs). | Mandatory emphasis on safety assessment and aggregate data analysis to identify new, significant risks; merges content with principles from the 2015 draft guidance on safety assessment.
Division of Responsibilities | Contained recommendations for both Sponsor and Investigator safety reporting responsibilities. | Exclusively focuses on Sponsor responsibilities; Investigator reporting recommendations are placed in a separate, concurrently issued guidance document.
Safety Surveillance/Planning | Implicit in the sponsor's duties, but lacked a formalized planning requirement. | Introduces the new term "Safety Surveillance Plan (SSP)" to describe a required systematic and organized approach.
Plan Components (SSP) | Did not specify formal plan components. | Requires the plan to include clearly defined roles and responsibilities, a process for regular review of SAEs, and a process for aggregate safety reviews.
Requirement for Review | Focused primarily on individual case review to determine whether the reporting criteria (serious, unexpected, suspected adverse reaction: SUSAR) were met. | Explicitly requires sponsors to review and evaluate all accumulating safety data at regular intervals (aggregate review) to update the overall safety profile.
Decision-Making Body | Lacked specific recommendations for the structure of the internal safety review process. | Offers greater flexibility by allowing the sponsor to choose the individual, group, or entity (e.g., Safety Assessment Committee) responsible for safety reporting and decision-making.
Source of Safety Data | Focused mainly on reports from the clinical trial itself. | Emphasizes that sponsors must review information from any source (e.g., animal studies, scientific literature, foreign reports, and commercial experience) to identify new significant risks to trial participants.
Expedited Reporting Rationale | The concern was overreporting of uninformative individual Adverse Events (AEs), which hindered the IRB's ability to focus on true risks. | Seeks to reduce overreporting by clarifying that the decision for a 7- or 15-day expedited report must be based on the sponsor's professional judgment of causality (i.e., a reasonable possibility).

Summary of the Shift

The 2025 guidance strongly emphasizes a shift in the regulatory burden from volume-based individual reporting (the 2012 paradigm) to quality-based, comprehensive safety analysis by the sponsor. The overall goal is to enhance patient protection by focusing the FDA, IRBs, and investigators on truly meaningful safety signals derived from cumulative data, rather than individual case reports.

Monday, December 01, 2025

Handling "Median Not Reached": A Guide to Analyzing and Presenting Low Event Rate Survival Data

In the era of highly effective therapies for many diseases, clinical researchers are increasingly encountering a "good" problem in time-to-event analyses: the Kaplan-Meier survival curves are flattening out well above the 50% mark. While this represents a triumph for patient outcomes, it creates a headache for statistical reporting. When the event rate is low (below 50%), the Median Time to Event (e.g., Median Overall Survival) and its 95% Confidence Interval (CI) cannot be estimated (often reported as "NE" (not estimable), "NR" (not reached), or "NC" (not calculable)).

So, how do we robustly describe the efficacy of a treatment when the standard metric fails? This post outlines the best-practice alternatives for summarizing, analyzing, and visualizing survival data in low event settings.


1. The Limitation of the Median

The median survival time is simply the time point at which the survival probability drops to 0.50. If the Kaplan-Meier curve plateaus at 70% or 80% because fewer than half the patients experienced the event, the median is mathematically undefined. Reporting it merely as "Not Reached" (NR) is accurate but clinically uninformative—it tells us what the survival is not, but not what it is.

To provide a complete picture, we must pivot to alternative metrics that describe different parts of the survival distribution.

2. Primary Summary Measures

A. Landmark Survival Probabilities

When we cannot answer "When will half the patients die?", we should ask, "What proportion of patients are event-free at time t?"

Landmark analysis reports the Kaplan-Meier survival probability (with 95% CIs) at clinically relevant, fixed timepoints (e.g., 24 weeks, 12 months, 24 months, 5 years). A minimal computational sketch follows the bullets below.

  • Best Practice: Pre-specify these timepoints in the Statistical Analysis Plan (SAP) to avoid data dredging.

  • Example Reporting: "The event-free rate was 93% at week 24 in the treatment group"; "The 3-year recurrence-free survival rate was 88.4% (95% CI: 85.1–91.0) in the treatment arm compared to 82.1% (95% CI: 78.4–85.2) in the placebo arm."
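As promised, here is a short sketch of the landmark computation using the lifelines Python package; the dataset is simulated and the landmark timepoints are hypothetical stand-ins for whatever the SAP pre-specifies.

# Landmark (fixed-timepoint) Kaplan-Meier estimates (simulated data).
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(seed=1)
t_event = rng.exponential(scale=120.0, size=300)  # months to event
t_censor = rng.uniform(0, 60, size=300)           # administrative censoring
duration = np.minimum(t_event, t_censor)
observed = t_event <= t_censor                    # event indicator

kmf = KaplanMeierFitter()
kmf.fit(duration, event_observed=observed, label="treatment")

landmarks = [6, 12, 24, 36]  # months; pre-specified in the SAP
print(kmf.survival_function_at_times(landmarks))
# Greenwood-based 95% CIs are available via kmf.confidence_interval_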

B. Lower-Percentile Survival Times (10th and 25th)

Just because the 50th percentile (median) is missing doesn't mean all percentiles are.

  • 25th Percentile: The time at which 25% of patients have experienced the event (or survival drops to 75%).

  • 10th Percentile: The time at which 10% of patients have experienced the event (or survival drops to 90%).

These metrics characterize the "early failures" or the worst-performing subset of the cohort. They are particularly useful for showing that a treatment delays early progression even if the long-term survival is high; a sketch of extracting these percentiles follows the table below.

Metric | Treatment Group | Control Group
Median (50th) | NR (95% CI: NR, NR) | NR (95% CI: 36.7, NR)
25th Percentile | 18.4 months (14.2, 22.1) | 12.1 months (9.8, 14.5)
10th Percentile | 5.4 months (4.1, 6.8) | 3.2 months (2.8, 3.9)

Note: In the table above, while the median is NR for both, the 25th percentile clearly demonstrates a 6-month delay in progression for the treatment group.
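For completeness, a brief lifelines sketch of the percentile extraction on simulated single-arm data; note the convention that the 25th percentile of event times is where the survival function first drops to 0.75.

# Lower-percentile event times when the median is not reached (simulated).
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.utils import qth_survival_times

rng = np.random.default_rng(seed=1)
t_event = rng.exponential(scale=120.0, size=300)
t_censor = rng.uniform(0, 60, size=300)
kmf = KaplanMeierFitter().fit(np.minimum(t_event, t_censor),
                              event_observed=t_event <= t_censor)

print("KM median:", kmf.median_survival_time_)  # inf => report as "NR"
print("25th percentile:", qth_survival_times(0.75, kmf.survival_function_))
print("10th percentile:", qth_survival_times(0.90, kmf.survival_function_))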


3. Robust Analytical Alternatives

A. The "Reverse Kaplan-Meier" Method for Follow-Up

In low event trials, it is critical to prove that the "NR" result is due to drug efficacy, not just because patients left the study early. The Reverse Kaplan-Meier method is the gold standard for calculating median follow-up.

  • How it works: You reverse the censoring indicator (Event = Censored; Censored = Event) and run a standard Kaplan-Meier analysis, as sketched in the code below. The resulting median is the median potential follow-up time.

  • Why use it: Unlike the "median time on study," it is not biased by early deaths or events, providing a true measure of how long the trial centers monitored the patients.
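A minimal reverse Kaplan-Meier sketch (lifelines, simulated data); the only change from a standard analysis is flipping the event indicator.

# Reverse Kaplan-Meier median follow-up (simulated data).
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(seed=1)
t_event = rng.exponential(scale=120.0, size=300)
t_censor = rng.uniform(0, 60, size=300)
duration = np.minimum(t_event, t_censor)
observed = t_event <= t_censor

kmf_rev = KaplanMeierFitter()
kmf_rev.fit(duration, event_observed=~observed)  # censoring is the "event"
print("Median potential follow-up (months):", kmf_rev.median_survival_time_)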

B. Restricted Mean Survival Time (RMST)

RMST is rapidly becoming the preferred alternative to the Hazard Ratio (HR) in low event trials, especially when the Proportional Hazards assumption is violated (e.g., crossing curves). A short computational sketch follows the bullets below.

  • Definition: RMST is the "area under the survival curve" up to a specific time point ($\tau$). It represents the average survival time a patient lives during that window.

  • Reporting: You can report the Difference in RMST (Treatment minus Control) or the Ratio.

  • Interpretation: "Over the 5-year follow-up period, patients on the new therapy lived, on average, 4.2 months longer than those on the control (RMST difference = 4.2 months, p=0.003)."
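The sketch below computes an RMST difference at an assumed $\tau$ = 36 months with lifelines' restricted_mean_survival_time on simulated two-arm data; the arm-level distributions and the choice of $\tau$ are illustrative assumptions only.

# RMST difference at tau = 36 months (simulated two-arm data).
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.utils import restricted_mean_survival_time

rng = np.random.default_rng(seed=1)

def fit_arm(scale):
    # Simulate one arm and return its fitted Kaplan-Meier curve.
    t = rng.exponential(scale=scale, size=300)
    c = rng.uniform(0, 60, size=300)
    return KaplanMeierFitter().fit(np.minimum(t, c), event_observed=t <= c)

tau = 36.0
rmst_trt = restricted_mean_survival_time(fit_arm(140.0), t=tau)
rmst_ctl = restricted_mean_survival_time(fit_arm(90.0), t=tau)
print(f"RMST difference at tau = {tau}: {rmst_trt - rmst_ctl:.2f} months")

Formal inference on the difference would use the RMST standard errors (lifelines can return a variance estimate) or a dedicated RMST package; $\tau$ should be pre-specified, not chosen after seeing the curves.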


4. Visualization Best Practices

A. The Kaplan-Meier Plot: Handling the Y-Axis

In trials with very high survival (e.g., >90%), the survival curves may be squeezed into the top 10% of the graph, making it hard to see separation.

  • Line Break (Axis Break): It is acceptable to "break" the y-axis to focus on the relevant range (e.g., from 80% to 100%), provided this is clearly marked.

  • Inverted Plot (Failure Plot): Alternatively, plot the Cumulative Incidence of Events (1 - Survival) on a y-axis ranging from 0% to 20%. This often visualizes the difference in event rates more clearly than a survival curve stuck at the top of the chart.

B. The "Number at Risk" Table

Always include a table below the x-axis aligned with the tick marks. In low event trials, this table reveals whether the "flat tail" of the curve is based on hundreds of patients or just a few who haven't been followed long enough. A combined sketch of the failure plot and the at-risk table follows.
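Putting the two recommendations together, here is a hedged matplotlib/lifelines sketch that draws the cumulative-incidence (failure) version of the curve on a y-axis zoomed to 0-20% and attaches a number-at-risk table; all data are simulated.

# Failure plot (1 - survival) with a number-at-risk table (simulated data).
import numpy as np
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from lifelines.plotting import add_at_risk_counts

rng = np.random.default_rng(seed=1)
t_event = rng.exponential(scale=400.0, size=300)   # low event rate
t_censor = rng.uniform(0, 60, size=300)
kmf = KaplanMeierFitter().fit(np.minimum(t_event, t_censor),
                              event_observed=t_event <= t_censor,
                              label="treatment")

ax = kmf.plot_cumulative_density()  # events accumulate upward from 0
ax.set_ylim(0, 0.20)                # focus on the 0-20% event range
ax.set_xlabel("Months")
ax.set_ylabel("Cumulative incidence of events")
add_at_risk_counts(kmf, ax=ax)      # number-at-risk table under the x-axis
plt.tight_layout()
plt.show()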


5. Optional Exploratory Methods

If pre-specified in the protocol, Parametric Modeling can be used to estimate the median survival even if it has not been reached in the observed data; a brief sketch follows the bullets below.

  • Weibull Distribution: By fitting a Weibull model to the observed data, you can extrapolate the curve to predict when the median would be reached, assuming the fitted hazard pattern continues to hold beyond the observed follow-up.

  • Caution: This is a prediction, not an observation. It should be labeled clearly as "Estimated Median (Parametric)" and treated as exploratory evidence.
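A brief sketch of the parametric approach with lifelines' WeibullFitter on simulated data; per the caution above, the extrapolated value should be labeled "Estimated Median (Parametric)" and treated as exploratory.

# Weibull extrapolation of an unreached median (simulated data).
import numpy as np
from lifelines import KaplanMeierFitter, WeibullFitter

rng = np.random.default_rng(seed=1)
t_event = rng.exponential(scale=120.0, size=300)
t_censor = rng.uniform(0, 60, size=300)
duration = np.minimum(t_event, t_censor)
observed = t_event <= t_censor

km = KaplanMeierFitter().fit(duration, event_observed=observed)
print("Observed (KM) median:", km.median_survival_time_)  # inf => "NR"

wf = WeibullFitter().fit(duration, event_observed=observed)
print("Estimated Median (Parametric):", round(wf.median_survival_time_, 1))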

Summary Checklist for Reporting Low Event Data

  1. State clearly that the median is NE/NR.

  2. Report Landmark Rates (e.g., 3-year survival) with CIs.

  3. Report Lower Percentiles (25th, 10th) to show early separation.

  4. Use RMST to quantify the average time gained.

  5. Calculate Follow-up using the Reverse Kaplan-Meier method.

  6. Adjust Plots (zoom/break y-axis) to make differences visible, but keep the full context clear.

                                                          Note: AI-assisted writing for this blog article. 

Sunday, November 09, 2025

When external controls collide with the FDA: two recent cases and what they teach us

When trials don’t use a traditional control — What the FDA guidance says

In the usual drug-development pathway, regulators expect a trial where patients are randomly assigned either to receive the investigational drug or a comparator (placebo or active treatment). This randomized-controlled-trial (RCT) design is the "gold standard" for showing that a drug works (i.e., that the benefit is due to the drug, not to other factors), and the FDA's guidance on demonstrating substantial evidence of effectiveness for NDA/BLA approvals is built around it.

However, there are settings—rare diseases, rapidly progressive conditions, or severe unmet need—where recruiting a traditional randomized control arm is difficult or ethically questionable. In those settings, sponsors may propose to compare patients treated under an investigational protocol against a group of patients drawn from an outside dataset (e.g., prior trials, patient registries, real-world data) who did not receive the investigational drug. Such a comparison group is often called an external control (sometimes “synthetic control” or “historical control”).

In this design:

  • The treatment arm is prospective, treated under a defined protocol.

  • The control arm comes from a separate dataset (regardless of when or how collected) and did not receive the investigational product under the same protocol.
Because the groups were not randomized together, there are extra risks of bias and confounding. The key question is: can the FDA rely on such evidence to reach its statutory standard of "substantial evidence of effectiveness"?

In February 2023, the FDA issued a draft guidance for industry titled Considerations for the Design and Conduct of Externally Controlled Trials for Drug and Biological Products.

Here are some of the most important take-aways from that guidance:
  • The guidance explicitly recognizes that externally-controlled trials may, in some circumstances, serve as an adequate and well-controlled investigation to support approval of a drug or biologic. 

  • But the guidance is cautious: it states that in many cases “the likelihood of credibly demonstrating the effectiveness of a drug of interest with an external control is low.” 

  • The guidance puts heavy emphasis on using patient-level data (not just summary published numbers) for the external control. 

  • It emphasizes key threats: unmeasured confounding, bias, differences in assessment or follow-up, intercurrent events, immortal time bias, and temporal changes in standard of care or diagnostics. 

  • It recommends early and frequent communication between sponsor and FDA if an external control design is under consideration. 

  • It doesn’t endorse a single statistical adjustment method (propensity scores, Bayesian modelling, etc.). Instead, it says the analytic plan must be prespecified, transparent, and justified in the context of the specific trial. 

  • The guidance makes clear this is not the default approach—it remains a specialized tool rather than the standard route.

In short: the FDA is signaling openness to external controls in selected settings—but the bar remains high.


How this ties to the two recent cases

Here’s how the guidance connects to the two real-world situations.

Case 1: uniQure and AMT-130 in Huntington’s disease

  • uniQure reported a treated cohort using AMT-130 (a one-time gene therapy) and compared outcomes to a natural-history/external cohort.

  • The agency essentially pushed back: although the signals were strong, the design raised concerns (especially the lack of randomization and reliance on an external control), and the company reported that pre-BLA discussions did not confirm a clean path.

  • Under the FDA external-control guidance, the concerns would include: are the treated and external groups sufficiently similar (baseline disease features, prior therapies, timing)? Was the index date (“time zero”) aligned? Are the outcome assessments the same, and is the follow-up and endpoint definition comparable? Are missing data or unmeasured confounders plausibly influencing the result? The guidance states that if any of these are weak, the design may not credibly distinguish drug effect from other influences. 

  • The uniQure case illustrates: even with promising effect size, the agency may conclude that uncertainties tied to external control reduce confidence in the evidence.

Case 2: Biohaven Pharmaceuticals and Vyglxia (troriluzole) in Spinocerebellar ataxia

  • Biohaven’s NDA included a large externally-controlled dataset (real-world evidence for the control arm) reporting disease-slowing.

  • The FDA issued a Complete Response Letter (CRL) citing concerns with the external-control-based evidence (e.g., bias, data quality, missing/unmeasured information). According to Biohaven, the agency rejected its drug, called troriluzole, due to issues that can be "inherent to real-world evidence and external control studies, including potential bias, design flaws, lack of pre-specification and unmeasured confounding factors."

  • According to the guidance, when the anticipated treatment effect size is modest, an externally controlled trial is less likely to be acceptable unless the design is very strong. 

  • Also the guidance emphasizes that outcome ascertainment, timing, and data source differences between arms are key threats. The Biohaven case appears to reflect those issues.

  • This case reinforces that external controls are not a way to bypass rigorous design—they still demand high‐quality data, pre-specification, and detailed justification.


What this means for patients, clinicians and developers

  • For patients & clinicians: When you see a newly approved drug whose pivotal evidence is based on an external control rather than a randomized trial, ask: how comparable were the groups? How solid was the external data (patient-level, good follow-up, similar measurement)? Has a confirmatory trial been required or planned?

  • For developers: If you plan to rely on an external control, start very early: identify and curate the external dataset, lock the eligibility and analytic plan, engage FDA in frequent meetings, anticipate the agency will probe data quality, comparability, missing data and bias. Treat the external control design as a rigorous undertaking—not a shortcut.

  • For the field of rare diseases and high-unmet-need areas (for example, pulmonary arterial hypertension): External controls offer a promising option when randomization is infeasible, but you must make the case strongly. The FDA's guidance gives you the checklist: patient-level data, alignment of arms, clear index date, consistent endpoint measurement, robust analytics. If you can't satisfy those, a randomized or concurrent control may still be needed.


Final thoughts

The FDA's external-control guidance is a signal that regulators recognize the realities of these challenging settings, but it does not mean external controls are easy or will automatically yield approval. The recent uniQure and Biohaven cases are a clear reminder: you may have an impressive effect size or a compelling unmet need, but the agency will still closely scrutinize design, data source, comparability, missing data, and bias.

Thursday, November 06, 2025

FDA Commissioner's National Priority Voucher (CNPV) program versus Priority Review Voucher (PRV)

FDA created a new voucher program called the "Commissioner's National Priority Voucher (CNPV) Pilot Program" to accelerate drug review for companies supporting U.S. national interests. The CNPV pilot program offers an unprecedented opportunity to reduce drug and biological product application or efficacy supplement (ES) review times from 10-12 months to just 1-2 months. Announced in June 2025, this innovative program uses a collaborative, tumor-board-style review process to accelerate approvals for companies aligned with critical U.S. national health priorities.

On October 16, 2025, the FDA awarded the first-ever national priority vouchers to nine sponsors. The following 9 products were selected:

  • Pergoveris for infertility
  • Teplizumab for Type I diabetes
  • Cytisinicline for nicotine vaping addiction
  • DB-OTO for deafness
  • Cenegermin-bkbj for blindness
  • RMC-6236 for pancreatic cancer
  • Bitopertin for porphyria
  • Ketamine for domestic manufacturing of a critical drug for general anesthesia
  • Augmentin XR for domestic manufacturing of a common antibiotic 
On November 6, 2025, the FDA awarded a second batch of national priority vouchers. The following 6 products were selected following external applications and internal nominations from FDA review divisions:
  • Zongertinib for HER2 lung cancer
  • Bedaquiline for drug-resistant tuberculosis in young children
  • Dostarlimab for rectal cancer
  • Casgevy for sickle cell disease
  • Orforglipron for obesity and related health conditions
  • Wegovy for obesity and related health conditions

The new voucher program was criticized in this NEJM article “Flaws in the FDA’s New Priority Voucher Program” by Carpenter, Hwang, and Kesselheim. The authors argue that while the program seeks to promote innovation and address public health needs, it has significant shortcomings. They note that regulatory review already represents a small portion of total drug-development time and that past voucher programs have shown little evidence of spurring innovation while straining FDA resources. The paper warns that the CNPV program, created without congressional authorization and with vague eligibility criteria, risks politicizing FDA decisions, fostering conflicts of interest, and undermining public trust. Moreover, the compressed review timelines could compromise drug safety and overburden staff. The authors suggest that if implemented, the program should focus on generic drugs where expedited review could have tangible benefits, and should incorporate strong transparency, conflict-of-interest safeguards, and legislative oversight to maintain the integrity of the FDA’s regulatory process.

In a previous post, the FDA's Priority Review Voucher (PRV) programs were discussed. Here’s a side-by-side comparison table summarizing the key similarities and differences between the Commissioner’s National Priority Voucher (CNPV) and the Priority Review Voucher (PRV) programs such as the Rare Pediatric Disease PRV:

Feature | Commissioner's National Priority Voucher (CNPV) | Priority Review Voucher (PRV), e.g., Rare Pediatric Disease
Origin / Authorization | Created by the FDA Commissioner in 2025 as a pilot program without explicit congressional authorization. | Created by Congress through legislative action (e.g., the 2012 FDA Safety and Innovation Act for rare pediatric diseases).
Purpose | To accelerate review for drugs aligned with U.S. national health priorities, including unmet needs, national security, innovation, and affordability. | To incentivize development of drugs for neglected or rare conditions (e.g., tropical or rare pediatric diseases).
Review Time Benefit | Shortens FDA review time for selected drugs to 1-2 months, much faster than any existing program. | Converts a standard 10-month review to a priority 6-month review for another drug application.
Eligibility Criteria | Broad and somewhat opaque: drugs must align with "national priorities," such as public health crises or domestic manufacturing. | Clearly defined by statute: applies to drugs for designated rare pediatric diseases (or other legislatively defined conditions).
Voucher Transferability | Nontransferable: can be used only by the same sponsor (remains valid if company ownership changes). | Transferable: can be sold or traded to another company, often for hundreds of millions of dollars.
Voucher Expiration | Expires after 2 years if unused. | Does not expire.
Selection and Oversight | Controlled by the FDA Commissioner's office, raising concerns about political influence and conflicts of interest. | Statutory oversight and criteria limit discretion; implementation and tracking are agency-administered but legislatively mandated.
Scope / Eligible Products | Focused on drugs supporting U.S. national interests (innovation, security, affordability). | Focused on drugs addressing specific rare or neglected diseases.
Impact on FDA Workload | Could significantly burden FDA staff due to extremely short review timelines and unfunded mandates. | Adds workload but within a manageable 6-month priority window, and user fees help fund FDA resources.
Pilot Structure / Limits | Initially limited to five awardees in the first year; long-term limits unclear. | Ongoing statutory program with defined eligibility and reporting mechanisms.
Evidence of Effectiveness | No evidence yet; experts predict limited impact on overall development time and potential risks to review quality. | Empirical studies show limited evidence of incentivizing innovation but measurable market and financial impacts.
Key Concerns | Politicization, conflicts of interest, safety risks from rushed reviews, lack of transparency, and legal vulnerability due to lack of statutory basis. | Limited innovation incentive, windfall profits for some sponsors, and added workload, but more transparent and regulated.
Suggested Improvements | Apply to generic drugs, establish clear criteria, ensure independent review, legislative authorization, and transparency. | Better alignment between voucher incentives and public health impact; continued monitoring of postmarket safety.

While both CNPV and PRV programs aim to accelerate access to important therapies, the CNPV is a politically driven, non-statutory pilot emphasizing national priorities, with potentially risky acceleration timelines and limited oversight. In contrast, the PRV system—though imperfect—has a clear legislative framework, defined eligibility, and established oversight mechanisms that maintain greater regulatory transparency and accountability.