Monday, February 16, 2026

The Statistical Magic Trick: How Trials Share Results While Staying Blind

 

In the world of clinical research, "breaking the blind" is typically a cardinal sin. Yet, high-stakes Phase 3 trials like ORIGIN 3 (atacicept), PROTECT (sparsentan), and ATTRIBUTE-CM (acoramidis) have all successfully navigated the complex path of publishing interim results in the New England Journal of Medicine while keeping their long-term studies scientifically intact.

How do they pull off this statistical magic trick? It comes down to a rigorous architectural separation of data and people, including the statistical analysis team. Here is a look at the techniques used to maintain the "blind" during interim disclosures.

1. The "Firewall" Strategy: Independent Reporting Teams

The most critical technique used across all three studies is the creation of a "Firewall."

While a trial is ongoing, the people running the study (Sponsor clinical teams, investigators at hospitals, and the patients) must remain blinded. To perform an interim analysis, the Sponsor appoints an Independent Reporting Team (IRT) or an Independent Statistical Center.

  • How it works: This group has no contact with the study sites. They receive the raw "unblinded" data, perform the calculations for the primary endpoint (like the 36-week proteinuria reduction in PROTECT or ORIGIN 3), and prepare the manuscript for publication.
  • The Result: The people actually treating the patients remain in the dark, ensuring their medical decisions aren't influenced by knowing who is on the "winning" drug.

2. Safeguarding Against "Functional Unblinding"

In some trials, the drug’s effect is so obvious it could accidentally reveal the treatment.

  • In ORIGIN 3: Atacicept significantly lowers serum IgA and IgG levels. If a doctor saw these lab results, they would immediately know the patient was on the active drug. To prevent this, these specific lab values are suppressed. The results are sent to the Independent Reporting Team but are hidden from the investigators and the Sponsor’s site monitors.
  • In PROTECT: This study compared two active drugs (sparsentan vs. irbesartan). To ensure the difference in pill appearance didn't tip anyone off, they used a Double-Dummy design. Every patient took two sets of pills—one active and one placebo—so the physical routine remained identical for everyone.

3. Aggregate vs. Individual Disclosure

A common misconception is that "publishing the results" means everyone knows who got what. In reality, the NEJM publications for these trials disclose only aggregate data (group averages), not individual patient-level data.

  • In ATTRIBUTE-CM: When the Part A results (12-month 6-minute walk distance) were disclosed, the public learned how the group performed. However, the individual treatment assignments for each patient remained locked in the secure database.
  • The Benefit: Even if an investigator reads the NEJM article and sees that acoramidis is effective, they still do not know if the specific patient sitting in their office is receiving acoramidis or the placebo.

4. Prespecified Alpha Spending and "The Gatekeeper"

To maintain the statistical integrity of the final results (like the 104-week kidney function in ORIGIN 3 or the 30-month clinical outcomes in ATTRIBUTE-CM), the Statistical Analysis Plan (SAP) dictates exactly how much "statistical credit" is used during the interim look.

  • The Independent Data Monitoring Committee (iDMC) acts as the gatekeeper. They review the unblinded data behind closed doors and only allow the trial to proceed if the interim disclosure doesn't compromise the "power" of the final analysis. A minimal alpha-spending sketch follows below.
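To make the "statistical credit" idea concrete, here is a minimal sketch (in Python, using only numpy/scipy; the parameters are illustrative and not taken from any of the trials above) of the Lan-DeMets O'Brien-Fleming-type alpha-spending function, one common choice for pre-specifying how much alpha an interim look may consume:

```python
import numpy as np
from scipy.stats import norm

def ld_obf_spending(t, alpha=0.025):
    """Cumulative one-sided alpha 'spent' by information fraction t under the
    Lan-DeMets O'Brien-Fleming-type spending function:
        alpha*(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))."""
    t = np.asarray(t, dtype=float)
    return 2.0 * (1.0 - norm.cdf(norm.ppf(1.0 - alpha / 2.0) / np.sqrt(t)))

for frac in (0.25, 0.50, 0.75, 1.00):
    print(f"information fraction {frac:.2f}: alpha spent = {ld_obf_spending(frac):.5f}")
# An interim look at half the information spends only ~0.0015 of the one-sided
# 0.025, leaving nearly all of the alpha for the final analysis.
```

Because the interim look spends so little alpha, the final analysis (the 104-week or 30-month readout) retains essentially full statistical power, which is exactly what the gatekeeper is protecting.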

Why go through all this trouble?

The goal is Accelerated Approval. By using these techniques, sponsors can show the FDA (and the medical community) that a drug works on a "surrogate marker" (like proteinuria) at an interim stage. This allows life-saving drugs to reach patients years earlier, while the "blinded" portion of the trial continues to gather the long-term data needed for full, traditional approval.

By combining physical dummies, suppressed lab data, and strict "firewalls" between statistical teams, researchers prove that you can indeed share the news of a trial's success without ruining the science that supports it.

Some Extra Words on the ATTRIBUTE-CM Study

The ATTRIBUTE-CM study is a Phase 3, double-blind trial in which 632 patients with transthyretin amyloid cardiomyopathy were randomly assigned in a 2:1 ratio to receive acoramidis hydrochloride at a dose of 800 mg twice daily or matching placebo for 30 months. The study contained two parts: Part A, with a primary endpoint of change from baseline to Month 12 in distance walked during the six-minute walk test (6MWT), and Part B, with a primary endpoint of a hierarchical combination of all-cause mortality and cardiovascular-related hospitalization over a 30-month period.

There were two readouts for the study, and the Part A readout was based on an interim analysis conducted by the independent DMC.

In December 2021, BridgeBio Pharma experienced a major setback when its Phase 3 ATTRibute-CM trial for acoramidis (a treatment for transthyretin amyloid cardiomyopathy - ATTR-CM) failed to meet its primary endpoint of improving the 6-minute walk distance (6MWD) at Month 12 (Part A primary efficacy endpoint). In the initial 12-month data, patients taking acoramidis did not show a statistically significant improvement in their 6MWD compared to those on a placebo.

Despite the failure of the 6MWD endpoint at 12 months, the study continued because the independent data monitoring committee noted encouraging trends in other areas. By July 2023, BridgeBio reported positive top-line results from the full study (Month 30), where acoramidis demonstrated a highly statistically significant improvement in a hierarchical analysis that included mortality, hospitalization, and 6MWD (Part B primary efficacy endpoint). 

Following the successful long-term (Part B) data, which showed a 25% reduction in all-cause mortality and a 50% reduction in cardiovascular hospitalization frequency, the drug (marketed as Attruby) was approved by the FDA, with 3,751 prescriptions filled as of August 2025.

The study included an embedded Part A readout that required unblinding for the interim analysis. The sponsor specified measures, like those described above, for maintaining the blinding of the overall study while the Part A results were analyzed and disclosed.


Sunday, January 18, 2026

Maximum Tolerated Dose (MTD) to Recommended Phase 2 Dose (RP2D) - a shift in early oncology trial designs

 As the field of oncology moves from systemic cytotoxic chemotherapies to targeted agents and immunotherapies, the paradigm for dose selection is undergoing a historic shift. For decades, the Maximum Tolerated Dose (MTD) was the "gold standard" for early-phase trials, but today’s clinical trialists and statisticians are increasingly prioritizing the Recommended Phase 2 Dose (RP2D) as a more robust and patient-centric metric.

This evolution is spearheaded by the FDA’s Project Optimus, which emphasizes "dose optimization" rather than simply finding the highest dose a patient can survive.

From "More is Better" to "The Optimal Balance"

The traditional MTD-centric approach was built on the assumption that a drug's efficacy increases linearly with its toxicity—a rule that often held true for classical chemotherapy. However, for modern targeted therapies, the Optimal Biologic Dose (OBD)—the dose that achieves maximum target saturation—often occurs well below the MTD.

Feature | Maximum Tolerated Dose (MTD) | Recommended Phase 2 Dose (RP2D)
Focus | Toxicity-driven; finding the safety ceiling. | Value-driven; finding the therapeutic "sweet spot".
Observation | Short-term (Cycle 1) Dose-Limiting Toxicities (DLTs). | Long-term tolerability, PK/PD, and cumulative safety.
Assumption | Efficacy increases with dose ("More is Better"). | Efficacy may plateau while toxicity continues to rise.
Clinical Utility | A safety guardrail to prevent overdosing. | A strategic decision for registrational success.

Why RP2D is Preferred over MTD

For the modern statistician, the RP2D represents a "totality of evidence" that the MTD simply cannot provide:

  • Sustainability vs. Intensity: MTD focuses on what a patient can tolerate for 21 days. In contrast, RP2D considers the long-term tolerability necessary for chronic treatment, preventing premature discontinuations that can derail a trial's efficacy results.
  • The Sotorasib Lesson: FDA reviews, such as those for sotorasib, have highlighted the "dosing conundrum" where initial MTD-based doses led to excessive toxicity, eventually requiring post-market studies to find a more optimal, lower dose.
  • Target Saturation: Modern agents often reach a Pharmacokinetic (PK) plateau where increasing the dose adds no therapeutic benefit but significantly increases the rate of low-grade, chronic toxicities.
  • Dose-Response Nuance: As discussed in previous explorations of Determining the Dose in Clinical Trials, while the MTD is a safety limit identified through escalation, the RP2D is a comprehensive recommendation for further evaluation that aims to expose as few patients as possible to intolerable doses.

The Statistical Shift: Beyond 3+3

To find a true RP2D, statisticians are moving away from the rigid "3+3" rule-based designs to more flexible, model-informed approaches. These include:

  • Bayesian Optimal Interval (BOIN) designs that allow for a more nuanced exploration of the therapeutic window (a minimal boundary sketch follows this list).
  • Randomized Dose-Ranging Studies: Encouraged by Project Optimus, these trials evaluate multiple doses early to compare safety and efficacy side-by-side.
  • Dose Expansion Cohorts: Used to refine the RP2D by gathering deeper data on preliminary efficacy and late-onset toxicities in specific patient subgroups.
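As a concrete illustration of the BOIN bullet above, here is a minimal Python sketch of the BOIN decision rule from Liu and Yuan (2015); the function names and the cohort numbers in the example are hypothetical. The design escalates when the observed DLT rate falls at or below the boundary λe, de-escalates at or above λd, and otherwise stays at the current dose:

```python
import math

def boin_boundaries(phi, phi1=None, phi2=None):
    """BOIN escalation/de-escalation boundaries (Liu & Yuan, 2015).
    phi  : target DLT probability (e.g., 0.30)
    phi1 : highest DLT rate considered sub-therapeutic (default 0.6 * phi)
    phi2 : lowest DLT rate considered overly toxic     (default 1.4 * phi)"""
    phi1 = 0.6 * phi if phi1 is None else phi1
    phi2 = 1.4 * phi if phi2 is None else phi2
    lam_e = math.log((1 - phi1) / (1 - phi)) / math.log(phi * (1 - phi1) / (phi1 * (1 - phi)))
    lam_d = math.log((1 - phi) / (1 - phi2)) / math.log(phi2 * (1 - phi) / (phi * (1 - phi2)))
    return lam_e, lam_d

def boin_decision(n_dlt, n_treated, phi=0.30):
    """Dose-finding action for the current cohort at the current dose."""
    lam_e, lam_d = boin_boundaries(phi)
    p_hat = n_dlt / n_treated
    if p_hat <= lam_e:
        return "escalate"
    if p_hat >= lam_d:
        return "de-escalate"
    return "stay"

lam_e, lam_d = boin_boundaries(0.30)
print(f"lambda_e = {lam_e:.3f}, lambda_d = {lam_d:.3f}")   # ~0.236 and ~0.358
print(boin_decision(n_dlt=1, n_treated=9))                 # 1/9 ~ 0.11 -> escalate
```

For a target DLT rate of 30%, the boundaries work out to roughly 0.236 and 0.358, so a cohort with 1 DLT among 9 patients (an 11% observed rate) would trigger escalation.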

Conclusion

The shift from MTD to RP2D is more than a regulatory requirement; it is a clinical necessity. By identifying an optimized RP2D early, sponsors can avoid the "safety pitfalls" of MTD, improve patient quality of life, and build a stronger evidence chain for final approval. In the era of precision medicine, finding the right dose for the right patient is just as important as finding the right drug.


Sunday, January 04, 2026

Excessive number of clinical trial protocol amendments due to complex trial design

In a previous blog post, "Protocol amendment in clinical trials", I discussed the impact of protocol amendments on clinical trial performance and cost and the reasons driving protocol amendments. Protocol amendments are unavoidable, but we can think about study design and execution proactively to minimize their number. Sometimes, an excessive number of protocol amendments is driven by Complex Innovative Trial Design (CID for short) - for example, adaptive design, basket/umbrella/platform trial design, expansion cohort design, Bayesian design...

We noticed an extreme case of a clinical trial whose study protocol was amended 50 times. This refers to Study P001 (also known as KEYNOTE-001, NCT01295827) by Merck, a large, multi-cohort Phase 1 trial with numerous expansion cohorts that supported the initial accelerated approval of pembrolizumab. In the Statistical Review and Evaluation (BLA 125514, FDA Center for Drug Evaluation and Research, August 2014), the FDA reviewer noted this high number of amendments while discussing the complexity of the trial design. KEYNOTE-001 was a massive "seamless" adaptive trial that evolved from a traditional Phase 1 dose-escalation study into a large study with multiple expansion cohorts (Parts A, A1, A2, B, C, D, etc.) covering different tumor types (melanoma, NSCLC) and dosing regimens. The "50 times" figure likely includes all global and country-specific amendments up to the time of the BLA submission in February 2014.

The high number of protocol amendments for KEYNOTE-001 was a direct result of its innovative, "seamless" adaptive study design. Initially launched as a standard Phase 1 dose-escalation trial, the study evolved into a massive, multi-cohort trial that eventually enrolled 1,235 patients.

The 50 amendments occurred primarily due to the following reasons:
  • Addition of Expansion Cohorts: As early data showed promising results, the protocol was repeatedly amended to add new expansion cohorts for specific tumor types, most notably melanoma and non-small cell lung cancer (NSCLC).
  • Sample Size Increases: Striking patient responses led investigators to increase sample sizes within existing cohorts to better evaluate efficacy endpoints like overall response rate (ORR).
  • Adaptive Dosing Changes: The protocol was amended to change dosing regimens based on emerging safety and efficacy data. For example, Amendment 7 changed dosing from every two weeks (Q2W) to every three weeks (Q3W), and Amendment 10 shifted all participants to a fixed dose of 200 mg.
  • Biomarker Integration: Amendments were used to add co-primary endpoints related to PD-L1 expression after researchers observed its correlation with drug efficacy. This included the validation of a companion diagnostic assay.
  • Regulatory Speed: This "seamless" approach allowed Merck to skip traditional Phase 2 and 3 steps for certain indications, leading to the first-ever FDA approval of an anti-PD-1 therapy.
While the approach was efficient, the FDA's statistical reviewers noted that such frequent changes (averaging more than one amendment per month during the most active phases) created significant operational and analytical complexity for the trial. The main challenges in analyzing the KEYNOTE-001 trial data, as noted in the FDA's statistical and medical reviews, stemmed from the extreme complexity of a "seamless" design that was modified more than 50 times.

The primary analytical hurdles included:
  • Statistical Integrity and Type I Error Risk: The frequent addition of new cohorts and subgroups—often based on emerging data—increased the number of statistical comparisons. This raised concerns about "multiplicity," where the probability of finding a significant result by chance (Type I error) increases with every new hypothesis tested.
  • Operational and Data Management Complexity: Maintaining data quality was difficult when different sites were often operating under different versions of the protocol simultaneously. The FDA noted that this led to potential adherence issues and made it difficult to isolate single cohorts for clean, standalone submissions.
  • Shifting Dosing and Regimens: The trial transitioned from weight-based dosing (2 mg/kg or 10 mg/kg) to a fixed dose (200 mg) and changed the frequency of administration (every 2 weeks to every 3 weeks) mid-study. This required complex "pooled analyses" to prove that efficacy and safety were consistent across these varying schedules.
  • Biomarker Selection and Validation: The protocol was amended to include a PD-L1 companion diagnostic while the study was already underway. This created a challenge in defining "training" vs. "validation" sets within the same trial population to establish the diagnostic's cutoff levels without introducing bias.
  • Lack of a Control Arm: Because the trial was essentially a massive Phase 1 expansion, it lacked a randomized control arm for several indications. This forced reviewers to rely on cross-trial comparisons and historical data, which are inherently more prone to bias than randomized controlled trials (RCTs).
  • Patient Selection Bias: The "adaptive" nature allowed for rapid accrual in specific successful cohorts, which, while beneficial for speed, made it difficult to ensure the final patient population was representative of the broader real-world population.
Despite the excessive number of protocol amendments, the results from KEYNOTE-001 led to the FDA approval of pembrolizumab for the treatment of multiple tumor types. The KEYNOTE-001 study was also the basis for the NEJM article "Seamless Oncology-Drug Development" by Prowell, Theoret, and Pazdur.

Thursday, January 01, 2026

One-way versus two-way tipping point analysis for robustness assessment of the missing data

Tipping point analysis (TPA) is a key sensitivity analysis mandated by regulatory agencies like the FDA to assess the robustness of clinical trial results to untestable assumptions about missing data. Specifically, it explores how much the assumption about the missing not at random (MNAR) mechanism would have to change to overturn the study's primary conclusion (e.g., a statistically significant treatment effect becoming non-significant). See a previous blog post, "Tipping point analysis - multiple imputation for stress test under missing not at random (MNAR)".

One-Way Tipping Point Analysis for Robustness Assessment

A one-way tipping point analysis is a sensitivity method used to evaluate the robustness of a study’s primary findings by systematically altering the missing data assumption for only one treatment group at a time—most commonly the active treatment arm. While the missing outcomes in the control group are typically handled under a standard Missing at Random (MAR) or Jump to Reference assumption, the missing outcomes in the active arm are subjected to a varying "shift parameter" (δ). This parameter progressively penalizes the imputed values (e.g., making them increasingly worse) until the statistically significant treatment effect disappears, or "tips." By identifying this specific value, researchers can present a clear, one-dimensional threshold to clinical experts and regulators, who then judge whether such a drastic deviation from the observed data is clinically plausible or an unlikely extreme.
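A minimal sketch of the delta scan (Python, with simulated data; the arm means, dropout rates, and a deliberately crude MAR imputation step are all invented for illustration, and a real analysis would use a proper multiple-imputation model with Rubin's rules rather than a median p-value) might look like this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2026)

# Hypothetical trial: continuous endpoint (change from baseline), higher = better
n = 100
active = rng.normal(5.0, 8.0, n)
control = rng.normal(1.0, 8.0, n)
miss_a = rng.random(n) < 0.20          # ~20% dropout in each arm
miss_c = rng.random(n) < 0.20
obs_a, obs_c = active[~miss_a], control[~miss_c]

def one_way_tip(deltas, n_imp=50, alpha=0.05):
    """Scan a shift parameter applied to MAR imputations in the ACTIVE arm only;
    the control arm is always imputed under MAR (delta = 0). Draws from the
    observed-data distribution stand in for a full imputation model."""
    for delta in deltas:
        pvals = []
        for _ in range(n_imp):
            imp_a = np.concatenate([obs_a, rng.normal(obs_a.mean(), obs_a.std(ddof=1), miss_a.sum()) + delta])
            imp_c = np.concatenate([obs_c, rng.normal(obs_c.mean(), obs_c.std(ddof=1), miss_c.sum())])
            pvals.append(stats.ttest_ind(imp_a, imp_c).pvalue)
        if np.median(pvals) >= alpha:               # crude summary in place of Rubin's rules
            return delta, float(np.median(pvals))   # first delta that "tips" the result
    return None, None

tip_delta, tip_p = one_way_tip(deltas=np.arange(0, -20.5, -0.5))
if tip_delta is not None:
    print(f"Result tips at delta = {tip_delta:.1f} (median p = {tip_p:.3f})")
else:
    print("No tipping point found in the scanned range")
```

The clinical question is then whether active-arm dropouts could plausibly have outcomes that much worse than their MAR-imputed values; if not, the result is judged robust.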

Two-Way Tipping Point Analysis for Robustness Assessment

A two-way TPA is an advanced method to assess robustness by independently varying the missing data assumptions for both treatment groups (e.g., the active treatment arm and the control/reference arm).

Missing Data Assumptions (MAR vs. MNAR)

The two-way TPA is used to assess the robustness of the primary analysis, which is typically conducted under the assumption of Missing at Random (MAR).

  • Missing at Random (MAR): Assumes that the probability of data being missing depends only on the observed data (e.g., a patient with a worse baseline condition is more likely to drop out, and we have observed the baseline data).

  • Missing Not at Random (MNAR): Assumes that the probability of data being missing depends on the unobserved missing outcome data itself (e.g., a patient drops out because their unobserved outcome has worsened more than what is predicted by their observed data).

Robustness Assessment

The two-way TPA evaluates robustness to plausible MNAR scenarios. This is done by imputing the missing outcomes (often starting with an MAR method like Multiple Imputation) and then applying a systematic, independent "shift parameter" (or δ) to the imputed values in each arm.

  • Process: The shift parameters (δActive and δControl) are varied systematically across a two-dimensional grid, typically in a direction that reduces the observed treatment effect.

  • Tipping Point: The δActive and δControl values at which the primary conclusion (e.g., statistical significance) is "tipped" or overturned define the tipping point.

  • Robustness: The larger and/or more clinically implausible the combination of shift parameters required to overturn the conclusion, the more robust the original result is considered to be under different MNAR assumptions.

Two-Way Tipping Point Result Tables

The results of a two-way TPA are typically presented as a grid or heat map table where:

  • One axis represents the shift parameter applied to the missing outcomes in the Active Treatment arm (δActive).

  • The other axis represents the shift parameter applied to the missing outcomes in the Control/Reference arm (δControl).

  • The cells of the table contain the resulting p-value or estimated treatment difference for that specific combination of assumptions.

The goal is to find the boundary of the grid where the result crosses the significance threshold (e.g., p >= 0.05 or the lower bound of the confidence interval crosses the null value).
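Under the same simplifying assumptions as the one-way sketch above (simulated data, a crude MAR imputation step, and a median p-value standing in for Rubin's rules), a two-way grid can be produced like this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2026)

# Same hypothetical setup as the one-way sketch: continuous endpoint, higher = better
n = 100
active = rng.normal(5.0, 8.0, n)
control = rng.normal(1.0, 8.0, n)
miss_a = rng.random(n) < 0.20      # ~20% dropout per arm
miss_c = rng.random(n) < 0.20
obs_a, obs_c = active[~miss_a], control[~miss_c]

def mi_pvalue(d_a, d_c, n_imp=50):
    """Median p-value across crude MAR imputations shifted by (d_a, d_c)."""
    pvals = []
    for _ in range(n_imp):
        imp_a = np.concatenate([obs_a, rng.normal(obs_a.mean(), obs_a.std(ddof=1), miss_a.sum()) + d_a])
        imp_c = np.concatenate([obs_c, rng.normal(obs_c.mean(), obs_c.std(ddof=1), miss_c.sum()) + d_c])
        pvals.append(stats.ttest_ind(imp_a, imp_c).pvalue)
    return np.median(pvals)        # Rubin's rules would replace this in a real SAP

d_active = np.arange(0, -10.5, -2.5)    # penalize active-arm dropouts
d_control = np.arange(0, 10.5, 2.5)     # reward control-arm dropouts
grid = np.array([[mi_pvalue(da, dc) for dc in d_control] for da in d_active])
print(np.round(grid, 3))                # cells with p >= 0.05 trace the tipping boundary
```

The printed matrix is exactly the grid described above: one axis per shift parameter, a p-value in each cell, and the tipping boundary where the cells cross 0.05.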


Comparison: One-Way vs. Two-Way Tipping Point Analysis

The choice between one-way and two-way TPA is a trade-off between simplicity and comprehensiveness.

Feature | One-Way Tipping Point Analysis | Two-Way Tipping Point Analysis
Missingness Assumption | The shift parameter (δ) is applied to only one arm, usually the active treatment group, while the missing data in the control arm are imputed under MAR (or a reference-based method such as Jump to Reference). | Independent shift parameters (δActive and δControl) are applied to both arms simultaneously.
Sensitivity Explored | Explores MNAR scenarios in which dropouts in one arm have systematically worse/better outcomes than assumed under MAR, relative to the other arm's assumption. | Explores a two-dimensional space of MNAR scenarios, allowing dropouts in both arms to vary independently.
Complexity | Simpler to calculate and interpret (one dimension). | More computationally intensive and complex to interpret (two-dimensional grid).
Plausibility | Often viewed as less comprehensive, as it does not model simultaneous, independent MNAR mechanisms in both arms. | Considered more comprehensive, as it covers a wider range of clinically plausible and implausible MNAR scenarios.
Result Presentation | A line plot or simple table with a single "tipping point" value. | A grid/matrix table or heat map showing the boundary of non-significance.

In essence, the two-way TPA is generally preferred by regulatory agencies for its superior ability to assess robustness because it explores a more realistic and exhaustive range of asymmetric MNAR mechanisms.

Monday, December 29, 2025

FDA guidance "Sponsor Responsibilities - Safety Reporting Requirements and Safety Assessment for IND and Bioavailability/Bioequivalence Studies"

Earlier this month, FDA issued its guidance "Sponsor Responsibilities - Safety Reporting Requirements and Safety Assessment for IND and Bioavailability/Bioequivalence Studies". As a clinical trialist, I see this updated FDA guidance (the 2025 guidance) as a major step forward, primarily in its refined focus on safety assessment and its introduction of key operational elements.

The 2025 guidance is not a complete rewrite of the 2012 version ("Safety Reporting Requirements for INDs and BA/BE Studies"), but rather a merger of the 2012 guidance content with the principles from the 2015 draft guidance on safety assessment.

Here is a comparison highlighting the key new elements the sponsor must now consider:

Key New Elements in the 2025 Guidance

The most significant change is a shift from focusing solely on individual case safety reports (ICSRs) to a greater emphasis on proactive, systematic safety assessment and the analysis of aggregate data.

New Concept | Description and Implication for Trialists | Relevant Section in New Guidance
Focus on Sponsor Responsibilities Only | The new guidance is strictly limited to sponsor responsibilities for safety reporting. All recommendations for investigator responsibilities found in the 2012 guidance have been moved to a separate document, reflecting a clear split in regulatory oversight. | Section I, II (Preamble)
Aggregate Data Assessment | This is the central update. The guidance expands significantly on the requirement to perform regular, proactive aggregate analyses of all accumulating safety data. The goal is to identify new or increased risks that would trigger expedited reporting, rather than relying only on individual case reports. | Section III (Definitions) and Section IV (Aggregate Analyses)
Mandatory Safety Surveillance Plan (SSP) | The guidance introduces the term Safety Surveillance Plan (SSP) as a systematic and organized approach to safety monitoring. The plan should include: 1) clearly defined roles and responsibilities; 2) a plan for the regular review and evaluation of serious adverse events (SAEs); and 3) the process for performing aggregate safety reviews. | Section IV.C (Safety Surveillance Plan)
Sole Sponsor Causality Determination | The guidance emphasizes that the final responsibility for determining whether an event meets the criteria for expedited reporting (i.e., a serious and unexpected suspected adverse reaction, or SUSAR) lies solely with the sponsor. While the sponsor should consider the investigator's opinion, the sponsor holds the ultimate responsibility for the causality judgment for regulatory submission purposes. | Section III.B (Suspected Adverse Reaction)
Flexibility in Safety Review | The new guidance offers greater flexibility by allowing sponsors to choose which individual, group, or entity (e.g., Safety Monitoring Committee, Data Monitoring Committee) is responsible for reviewing, analyzing, and making decisions regarding IND safety reporting. | Section IV.C.1 (Features and Composition of the Entity)

This shift aims to reduce the "noise" of over-reporting uninformative individual adverse events, which was a concern under the old paradigm. Instead, the focus is placed on the sponsor's expert medical review and comprehensive analysis of the overall safety data package.

Here is a side-by-side comparison table summarizing the main discussion points and key changes between the 2012 and 2025 FDA guidance documents on safety reporting.


Safety Reporting Guidance: 2012 vs. 2025 Comparison

Discussion Point | 2012 Final Guidance: Safety Reporting Requirements for INDs and BA/BE Studies | 2025 Final Guidance: Sponsor Responsibilities — Safety Reporting Requirements and Safety Assessment for IND and BA/BE Studies
Primary Scope and Focus | Focused on procedural requirements for expedited reporting of individual Serious Adverse Events (SAEs). | Mandatory emphasis on safety assessment and aggregate data analysis to identify new, significant risks. Merges content with principles from the 2015 draft guidance on safety assessment.
Division of Responsibilities | Contained recommendations for both sponsor and investigator safety reporting responsibilities. | Exclusively focuses on sponsor responsibilities. Investigator reporting recommendations are placed in a separate, concurrently issued guidance document.
Safety Surveillance/Planning | Implicit in the sponsor's duties, but lacked a formalized planning requirement. | Introduces the new term "Safety Surveillance Plan (SSP)" to describe a required systematic and organized approach.
Plan Components (SSP) | Did not specify formal plan components. | Requires the plan to include clearly defined roles and responsibilities, a process for regular review of SAEs, and a process for aggregate safety reviews.
Requirement for Review | Focused primarily on individual case review to determine if the reporting criteria (SUSAR: serious, unexpected, suspected adverse reaction) were met. | Explicitly requires sponsors to review and evaluate all accumulating safety data at regular intervals (aggregate review) to update the overall safety profile.
Decision-Making Body | Lacked specific recommendations for the structure of the internal safety review process. | Offers greater flexibility by allowing the sponsor to choose the individual, group, or entity (e.g., Safety Assessment Committee) responsible for safety reporting and decision-making.
Source of Safety Data | Focused mainly on reports from the clinical trial itself. | Emphasizes that sponsors must review information from any source (e.g., animal studies, scientific literature, foreign reports, and commercial experience) to identify new significant risks to trial participants.
Expedited Reporting Rationale | The concern was the overreporting of uninformative individual adverse events (AEs), which hindered the IRB's ability to focus on true risks. | Seeks to reduce overreporting by clarifying that the decision for a 7- or 15-day expedited report must be based on the sponsor's professional judgment of causality (i.e., a reasonable possibility).

Summary of the Shift

The 2025 guidance strongly emphasizes a shift in the regulatory burden from volume-based individual reporting (the 2012 paradigm) to quality-based, comprehensive safety analysis by the sponsor. The overall goal is to enhance patient protection by focusing the FDA, IRBs, and investigators on truly meaningful safety signals derived from cumulative data, rather than individual case reports.

Monday, December 01, 2025

Handling "Median Not Reached": A Guide to Analyzing and Presenting Low Event Rate Survival Data

In the era of highly effective therapies for many diseases, clinical researchers are increasingly encountering a "good" problem in time-to-event analyses: the Kaplan-Meier survival curves flatten out well above the 50% mark. While this represents a triumph for patient outcomes, it creates a headache for statistical reporting. When the event rate is low (below 50%), the median time to event (e.g., median overall survival) and its 95% confidence interval (CI) cannot be estimated and are often reported as "NE" (not estimable), "NR" (not reached), or "NC" (not calculable).

So, how do we robustly describe the efficacy of a treatment when the standard metric fails? This post outlines the best-practice alternatives for summarizing, analyzing, and visualizing survival data in low event settings.


1. The Limitation of the Median

The median survival time is simply the time point at which the survival probability drops to 0.50. If the Kaplan-Meier curve plateaus at 70% or 80% because fewer than half the patients experienced the event, the median is mathematically undefined. Reporting it merely as "Not Reached" (NR) is accurate but clinically uninformative—it tells us what the survival is not, but not what it is.

To provide a complete picture, we must pivot to alternative metrics that describe different parts of the survival distribution.

2. Primary Summary Measures

A. Landmark Survival Probabilities

When we cannot answer "When will half the patients die?", we should ask, "What proportion of patients are event-free at time t?"

Landmark analysis reports the Kaplan-Meier survival probability (with 95% CIs) at clinically relevant, fixed timepoints (e.g., 24 weeks, 12 months, 24 months, 5 years); a minimal code sketch follows the examples below.

  • Best Practice: Pre-specify these timepoints in the Statistical Analysis Plan (SAP) to avoid data dredging.

  • Example Reporting: "Event free rate was 93% at week 24 in the treatment group", "The 3-year recurrence-free survival rate was 88.4% (95% CI: 85.1–91.0) in the treatment arm compared to 82.1% (95% CI: 78.4–85.2) in the placebo arm."
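As a sketch of how landmark estimates can be pulled from a fitted curve, here is a small Python example using the lifelines package on simulated single-arm data (the event-time distribution, censoring time, and seed are all invented for illustration, and the confidence-interval lookup leans on lifelines' step-function `confidence_interval_` attribute):

```python
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(1)

# Hypothetical single-arm data: exponential event times (mean 120 months),
# administrative censoring at 36 months -> ~26% event rate, median not reached
t_event = rng.exponential(120, 200)
durations = np.minimum(t_event, 36.0)
events = (t_event <= 36.0).astype(int)

kmf = KaplanMeierFitter().fit(durations, events, label="treatment")
for t in (12, 24, 36):                                # pre-specified landmark times (months)
    s = kmf.survival_function_at_times(t).iloc[0]
    ci = kmf.confidence_interval_.loc[:t].iloc[-1]    # step-function CI at/just before t
    print(f"{t}-month event-free rate: {s:.1%} (95% CI: {ci.iloc[0]:.1%}, {ci.iloc[1]:.1%})")
```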

B. Lower-Percentile Survival Times (10th and 25th)

Just because the 50th percentile (median) is missing doesn't mean all percentiles are.

  • 25th Percentile: The time at which 25% of patients have experienced the event (or survival drops to 75%).

  • 10th Percentile: The time at which 10% of patients have experienced the event (or survival drops to 90%).

These metrics characterize the "early failures" or the worst-performing subset of the cohort. They are particularly useful for showing that a treatment delays early progression even if the long-term survival is high.

Metric | Treatment Group | Control Group
Median (50th) | NR (95% CI: NR, NR) | NR (95% CI: 36.7, NR)
25th Percentile | 18.4 months (14.2, 22.1) | 12.1 months (9.8, 14.5)
10th Percentile | 5.4 months (4.1, 6.8) | 3.2 months (2.8, 3.9)

Note: In the table above, while the median is NR for both, the 25th percentile clearly demonstrates a 6-month delay in progression for the treatment group.
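Continuing the lifelines sketch above (same fitted `kmf` object), the lower percentiles remain estimable even though the median is not:

```python
# kmf.percentile(p) returns the time t at which S(t) = p (inf if never reached),
# so the 25th percentile of event times corresponds to p = 0.75.
p25 = kmf.percentile(0.75)   # time by which 25% of patients have had the event
p10 = kmf.percentile(0.90)   # time by which 10% of patients have had the event
print(f"25th percentile: {p25:.1f} months; 10th percentile: {p10:.1f} months")
print(f"Median: {kmf.median_survival_time_}")   # inf -> report as 'NR'
```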


3. Robust Analytical Alternatives

A. The "Reverse Kaplan-Meier" Method for Follow-Up

In low event trials, it is critical to prove that the "NR" result is due to drug efficacy, not just because patients left the study early. The Reverse Kaplan-Meier method is the gold standard for calculating median follow-up.

  • How it works: You reverse the censoring indicator (Event = Censored; Censored = Event) and run a standard Kaplan-Meier analysis, as sketched after this list. The resulting median is the median potential follow-up time.

  • Why use it: Unlike the "median time on study," it is not biased by early deaths or events, providing a true measure of how long the trial centers monitored the patients.
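In lifelines, the reverse Kaplan-Meier takes one line once the data are set up (a sketch continuing the simulated example above, where `durations` and `events` are already defined):

```python
# Median potential follow-up: treat censoring as the "event" and events as censored.
kmf_fu = KaplanMeierFitter().fit(durations, 1 - events, label="follow-up")
print(f"Median potential follow-up: {kmf_fu.median_survival_time_:.1f} months")
```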

B. Restricted Mean Survival Time (RMST)

RMST is rapidly becoming the preferred alternative to the Hazard Ratio (HR) in low event trials, especially when the Proportional Hazards assumption is violated (e.g., crossing curves).

  • Definition: RMST is the "area under the survival curve" up to a specific time point (τ). It represents the average survival time a patient experiences during that window (a minimal sketch follows this list).

  • Reporting: You can report the Difference in RMST (Treatment minus Control) or the Ratio.

  • Interpretation: "Over the 5-year follow-up period, patients on the new therapy lived, on average, 4.2 months longer than those on the control (RMST difference = 4.2 months, p=0.003)."
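lifelines exposes this directly via `restricted_mean_survival_time` in `lifelines.utils`; here is a sketch that adds a hypothetical control arm to the simulated example above and reports the RMST difference (all numbers illustrative):

```python
from lifelines.utils import restricted_mean_survival_time

# Hypothetical control arm: faster event times, same 36-month administrative censoring
t_event_c = rng.exponential(60, 200)
dur_c = np.minimum(t_event_c, 36.0)
ev_c = (t_event_c <= 36.0).astype(int)
kmf_ctl = KaplanMeierFitter().fit(dur_c, ev_c, label="control")

tau = 36.0   # pre-specified restriction time (should not exceed the follow-up)
rmst_trt = restricted_mean_survival_time(kmf, t=tau)
rmst_ctl = restricted_mean_survival_time(kmf_ctl, t=tau)
print(f"RMST difference over {tau:.0f} months: {rmst_trt - rmst_ctl:.1f} months")
```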


4. Visualization Best Practices

A. The Kaplan-Meier Plot: Handling the Y-Axis

In trials with very high survival (e.g., >90%), the survival curves may be squeezed into the top 10% of the graph, making it hard to see separation.

  • Line Break (Axis Break): It is acceptable to "break" the y-axis to focus on the relevant range (e.g., from 80% to 100%), provided this is clearly marked.

  • Inverted Plot (Failure Plot): Alternatively, plot the Cumulative Incidence of Events (1 - Survival) on a y-axis ranging from 0% to 20%. This often visualizes the difference in event rates more clearly than a survival curve stuck at the top of the chart.

B. The "Number at Risk" Table

Always include a table below the x-axis aligned with the tick marks. In low event trials, this table reveals whether the "flat tail" of the curve is based on hundreds of patients or just a few who haven't been followed long enough.
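Both recommendations, the failure plot and the number-at-risk table, are straightforward with lifelines and matplotlib (a sketch reusing the fitted `kmf` and `kmf_ctl` objects from the examples above):

```python
import matplotlib.pyplot as plt
from lifelines.plotting import add_at_risk_counts

fig, ax = plt.subplots()
kmf.plot_cumulative_density(ax=ax)       # failure plot: 1 - S(t)
kmf_ctl.plot_cumulative_density(ax=ax)
ax.set_ylim(0, 0.6)                      # zoom to the informative range
ax.set_ylabel("Cumulative incidence of events")
add_at_risk_counts(kmf, kmf_ctl, ax=ax)  # number-at-risk table below the x-axis
plt.tight_layout()
plt.show()
```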


5. Optional Exploratory Methods

If pre-specified in the protocol, parametric modeling can be used to estimate the median survival even if it has not been reached in the observed data.

  • Weibull Distribution: By fitting a Weibull model to the observed data, you can extrapolate the curve to predict when the median would be reached, assuming the risk profile remains constant (a minimal sketch follows this list).

  • Caution: This is a prediction, not an observation. It should be labeled clearly as "Estimated Median (Parametric)" and treated as exploratory evidence.
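A sketch of the Weibull extrapolation with lifelines, continuing the simulated example above; per the caution, the output should be labeled as a parametric estimate, not an observed median:

```python
from lifelines import WeibullFitter

wf = WeibullFitter().fit(durations, events)
# The fitted model extends beyond the observed 36 months of follow-up,
# so this median is an extrapolation, not an observation.
print(f"Estimated median (parametric, Weibull): {wf.median_survival_time_:.1f} months")
```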

Summary Checklist for Reporting Low Event Data

  1. State clearly that the median is NE/NR.

  2. Report Landmark Rates (e.g., 3-year survival) with CIs.

  3. Report Lower Percentiles (25th, 10th) to show early separation.

  4. Use RMST to quantify the average time gained.

  5. Calculate Follow-up using the Reverse Kaplan-Meier method.

  6. Adjust Plots (zoom/break y-axis) to make differences visible, but keep the full context clear.

                                                          Note: AI-assisted writing for this blog article.