Monday, October 26, 2020

Real World Evidence in Regulatory Submissions - Story of IBRANCE for male breast cancer

In April 2019, FDA approved Pfizer's IBRANCE for male breast cancer. The approval was a label extension for an already approved product, based on real-world data (RWD). The IBRANCE approval in male breast cancer patients is considered the first case in which an FDA approval was mainly based on RWD, and it has been used as an example in many lectures discussing real-world evidence (RWE). Here is what was said in the press release about IBRANCE's approval for male breast cancer:
“Men with breast cancer have limited treatment options, making access to medicines such as IBRANCE critically important,” said Bret Miller, founder of the Male Breast Cancer Coalition. “We applaud the use of real-world data, a new approach to drug review, to make IBRANCE available to certain men with metastatic breast cancer and help address an unmet need for these patients.”

Real-world data is playing an increasingly important role in expanding the use of already approved innovative medicines. Due to the rarity of breast cancer in males, fewer clinical trials are conducted that include men, resulting in fewer approved treatment options. In the U.S. in 2019, it is estimated that there will be 2,670 new cases of invasive breast cancer and about 500 deaths from metastatic breast cancer in males. The 21st Century Cures Act, enacted in 2016, was created to help accelerate medical product development, allowing new innovations and advances to become available to patients who need them faster and more efficiently. This law places additional focus on the use of real-world data to support regulatory decision-making.

Detailed analysis of the use of IBRANCE in men with HR+, HER2- advanced or metastatic breast cancer will be presented at an upcoming medical meeting. Based on limited data from postmarketing reports and electronic health records, the safety profile for men treated with IBRANCE is consistent with the safety profile in women treated with IBRANCE.
In the original approval of IBRANCE for breast cancer, a randomized, controlled clinical trial was conducted; however, male breast cancer patients were excluded from the study. In order to obtain the label extension to male breast cancer patients, RWE from electronic health records (EHRs) on the off-label use of IBRANCE in male breast cancer patients was used. According to the FDA's review documents:
Male patients with breast cancer were ineligible in studies that provided the data to demonstrate the clinical benefit to support prior approvals of palbociclib (IBRANCE®). According to the current clinical practice standards, in the absence of safety and efficacy data from adequate and well-controlled studies, male patients with breast cancer are treated similarly to women with breast cancer. In this submission, the applicant provided the results of an analysis of real-world data (RWD) from electronic health records (EHRs) as additional supportive data to characterize the use of palbociclib in combination with endocrine therapy (aromatase inhibitor or fulvestrant) in male patients with breast cancer based on observed tumor responses in this rare subset of patients with breast cancer.
In a recorded webinar "Applying Real-World Evidence to Regulatory and Drug Development Challenges", Dr. Rebecca Miksad from Flatiron Health (the CRO that performed the RWD analyses for Pfizer) summarized five key learnings from the IBRANCE example for RWE in regulatory submissions: 
Pre-specification of study protocol & analysis plans 

“Without having reviewed and consented to a protocol and SAP, FDA cannot be certain that the protocol and SAP were pre-specified and unchanged during the data selection and analyses” 

- ODAC Briefing Document 

Appropriate Cohort selection for the research question 

Real-world patient cohorts need to be representative of the population of interest 
Appropriate cohort selection criteria are context- and disease-dependent 
It is important to understand the feasibility of capturing each clinically meaningful variable 
A documented and traceable selection process is needed (e.g., a detailed cohort diagram) 
Missingness impacts the ability of RWD to align with typical trial inclusion/exclusion (I/E) criteria 
Agreement is an additional layer of quality that needs to be assessed for abstracted data 

Suitability of real-world endpoints 

Data quality suitability for the use case is critical for drawing confident conclusions from real-world endpoints 
Depending on context, data quality considerations for real-world endpoints include: 
Can the endpoint be benchmarked to a reference or gold standard? 
How reproducible is the variable’s performance? 

Traceability back to source data 

Fit for purpose analytical methodologies 
Good analysis cannot fix low-quality data, but bad analysis wastes high-quality data 
Potential RWD quality issues need to be considered as part of the analysis plan. For example, 
Address potential bias in the data due to data quality issues (selection/ascertainment/confounding/immortal time) 

Assess the impact of missing data (sensitivity analysis) 

The Cancer Letter published an article, "How real-world evidence was used to support approval of Ibrance for male breast cancer", including responses from the regulatory agency (FDA), the sponsor (Pfizer), and the CRO (Flatiron Health). Some of the responses are copied here:

The Cancer Letter:
Was this the first approval based at least in part on real world evidence in oncology?
Ibrance (palbociclib) was initially approved in 2015. It is a kinase inhibitor, now approved in combination with an aromatase inhibitor as the first hormonal-based therapy in women who have gone through menopause and in men, or with fulvestrant in patients whose disease progressed following hormonal therapy.
Pfizer provided the results of an analysis of real world data (RWD) from electronic health records (EHRs) as additional supportive data to characterize the use of Ibrance in combination with endocrine therapy (aromatase inhibitor or fulvestrant) in male patients with breast cancer based on observed tumor responses in this rare subset of patients with breast cancer.
Leveraging RWD to improve regulatory decisions is a key strategic priority for the FDA. This data may be derived from a variety of sources, such as electronic health records, medical claims, product and disease registries, laboratory test results and even cutting-edge technology paired with mobile devices.
These types of data are being used to develop real world evidence (RWE) that can better inform regulatory decisions.
Because they include data covering the experience of physicians and patients with the actual use of new treatments in practice, and not just in research studies, the collective evaluation of these data sources has the potential to inform clinical decision-making by patients and providers, develop new hypotheses for further testing of new products to drive continued innovation and inform us about the performance of medical products.
FDA has previously accepted RWD to support drug product approvals, primarily in the setting of oncology and rare diseases.
RWD has been used to determine prognosis or natural history of disease in order to help inform regulatory decision-making, for example, data on historical response rates drawn from expanded access, practice settings, or chart reviews.


The Cancer Letter:
What were the RWE endpoints being used here?
The RWE endpoints used were real world tumor response and safety data. Real world tumor response was taken from the electronic health record as part of routine clinical care and information about each response event was retrospectively collected.
Therefore, this response included several factors, such as physical exam, symptom improvement, and pathology reports, which were used to supplement descriptions of radiology findings in the overall clinicians’ assessment of response.
Additional data on use and durations of prescriptions were also provided.
The expanded indication in breast cancer is based on limited data from post-marketing reports and electronic health records sourced from three databases: IQVIA Insurance database, Flatiron Health Breast Cancer database and the Pfizer global safety database.
Based on these limited data, the safety profile for men treated with IBRANCE is consistent with the safety profile in women treated with IBRANCE.
A detailed analysis of the use of IBRANCE in men with HR+, HER2- advanced or metastatic breast cancer will be presented at an upcoming medical meeting.
For this dataset, Pfizer engaged Flatiron to explore baseline characteristics, treatment patterns and clinical outcomes from patient-level, de-identified data for a group of male patients with metastatic breast cancer.

As with any project in which a partner is considering the inclusion of RWE as part of a regulatory submission, we consider it critical to ensure the data is “fit-for-purpose,” that is, ensuring that the dataset is fit for the intended use and can provide adequate scientific evidence.

Using RWE/RWD to support regulatory submissions is a trend. Statisticians are meeting the challenge of developing methods for integrating RWD into clinical trials and regulatory submissions. We are seeing popular terms such as 'historical control', 'external control', 'synthetic control arm', 'digital twin', 'propensity score', 'Bayesian dynamic borrowing', and 'causal inference'. We hope to see the regulatory agencies become more receptive to RWE in supporting regulatory approvals. 

Friday, October 23, 2020

Randomly Select a Subset from a Population Using SAS Proc Surveyselect and Dividing a Population into Multiple Subsets Using SAS Proc Rank

During a clinical trial, sometimes we are asked to select a subset of subjects from all trial participants. For example, for a clinical trial with 500 subjects enrolled, we may need to randomly select 10% of subjects for quality control (i.e., select 50 subjects from the total of 500 enrolled subjects).

This can be easily accomplished by using SAS Proc Surveyselect. The full SAS manual about Proc SurveySelect can be found here.

Proc surveyselect data=AllEnrolled method=srs n=50 seed=20201023 out=random50; *method=srs requests simple random sampling without replacement, and seed= makes the selection reproducible;
run;
Proc print data=random50 noobs;
Var SubjectID;
run;
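For readers without SAS, the same simple random sampling can be sketched in Python; the helper `select_random_subset` and the subject-ID format below are made up for illustration:

```python
import random

def select_random_subset(subject_ids, n, seed=None):
    """Simple random sampling without replacement, analogous to
    PROC SURVEYSELECT with METHOD=SRS and N=n."""
    rng = random.Random(seed)
    return sorted(rng.sample(subject_ids, n))

# Example: pick 50 of 500 enrolled subjects for quality control
all_enrolled = [f"SUBJ-{i:03d}" for i in range(1, 501)]
qc_sample = select_random_subset(all_enrolled, 50, seed=2020)
print(len(qc_sample))  # 50
```

Fixing the seed plays the same role as SEED= in PROC SURVEYSELECT: the same subset is selected every time the program runs.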

During the statistical analyses, especially the post hoc analyses, we may be asked to perform subgroup analyses by median (2 groups), tertiles (3 groups), or quartiles (4 groups). For example, we may want to do a subgroup analysis of treatment comparison by baseline BMI where subjects are split into three groups (tertiles) with an equal number of subjects in each subset. This can be easily implemented using SAS Proc Rank.

Proc Rank data=allEnrolled groups=3 out=test;
var BMI;
ranks rank_BMI; *rank_BMI = 0, 1, 2 for the three tertile groups;
run;
proc print data=test;
run;
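The rank-based grouping that PROC RANK performs can be sketched in Python; `assign_groups` is a hypothetical helper, and it does not reproduce PROC RANK's default tie handling (TIES=MEAN):

```python
def assign_groups(values, n_groups):
    """Rank-based grouping analogous to PROC RANK with GROUPS=n:
    group = floor(rank * n_groups / n), giving near-equal group sizes."""
    n = len(values)
    # rank each observation (0-based; ties broken by original order
    # for simplicity in this sketch)
    order = sorted(range(n), key=lambda i: values[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // n
    return groups

bmi = [18.5, 31.2, 24.7, 27.9, 22.1, 35.0]
print(assign_groups(bmi, 3))  # → [0, 2, 1, 1, 0, 2]
```

Each subject gets the tertile index of its rank, so the three subgroups are as close to equal-sized as the data allow.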

The full SAS manual about Proc Rank can be found here. Two papers from SAS blogs discussed this related issue. 

Sunday, October 18, 2020

Visual Inspection and Statistical Tests for Proportional Hazard Assumption

To analyze time-to-event data (called survival analysis in the early days), the most commonly used approaches are the non-parametric method (such as the log-rank test) and the semi-parametric method (the Cox proportional hazards regression model). There are a lot of discussions about checking the proportional hazards assumption, which is defined as the time independence of the covariates in the hazard function; that is, the ratio of the hazard functions for two treatment groups with different regression covariates does not vary with time.

If the proportional hazard assumption does not hold,

  • the non-parametric log-rank test is still valid, but the statistical power will be decreased
  • the semi-parametric method of Cox proportional hazard regression will be invalid
There are several methods to check the proportional hazards assumption, primarily the visual check of the various plots. 

Kaplan-Meier Plot:

For time to event variables, the first thing we will do is to generate the Kaplan-Meier plot. Kaplan-Meier curves can tell if there is an unusual pattern suggesting non-proportional hazards. For example, three Kaplan-Meier plots below suggest different types of non-proportional hazards that are commonly seen in clinical trials: delayed treatment effect, crossing hazard, and diminishing treatment effect. 

The Kaplan-Meier plot can be easily generated using SAS Proc Lifetest:
Proc lifetest plots=(s);
Time aval*cnsr(1);
Strata trt01p;
run;
where (s) specifies the request for the survival plot (i.e., the Kaplan-Meier plot), aval is the time-to-event variable, and cnsr is the indicator for the censored values. 

Log Log Survival Plot 

This plots the log of the negative log of the estimated survivor functions versus the log of time. The y-axis is log(−log(Survival)) and the x-axis is log(time). We can visually check whether there are two straight parallel lines, suggesting proportional hazards. The log-log survival plot below may suggest a minor deviation from the proportional hazards assumption, but may still be OK (since there is no line crossing). 
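The reason parallel lines indicate proportional hazards can be verified numerically: under PH, S1(t) = S0(t)^HR, so the two log(−log S) curves differ by the constant log(HR) at every time point. A minimal Python check, assuming exponential survival in the control group:

```python
import math

# Under proportional hazards, S1(t) = S0(t)**HR, so
# log(-log S1(t)) - log(-log S0(t)) = log(HR) for every t:
# the two log-log survival curves are parallel (constant vertical offset).
hr = 2.0
lam = 0.1  # assumed baseline (control) hazard, exponential survival
times = [0.5, 1, 2, 5, 10]
offsets = []
for t in times:
    s0 = math.exp(-lam * t)
    s1 = s0 ** hr
    offsets.append(math.log(-math.log(s1)) - math.log(-math.log(s0)))
print([round(o, 6) for o in offsets])  # every offset equals log(2) ≈ 0.693147
```

If the hazard ratio changed over time, the offsets would vary with t and the plotted curves would converge, diverge, or cross.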

The log-log plot can be easily generated using SAS Proc Lifetest:
Proc lifetest plots=(lls);
Time aval*cnsr(1);
Strata trt01p;
run;
where (lls) requests the log-log survival plot. 

Schoenfeld Residual Plot

The Schoenfeld residual plot displays a residual at each event time to check the proportional hazards assumption. A straight line passing through a residual value of 0 with gradient 0 indicates that the variable satisfies the PH assumption and therefore does not depend on time.

According to Schoenfeld's paper "Partial Residuals for The Proportional Hazards Regression Model"
Residuals are defined for the proportional hazards regression model introduced by Cox
(1972). These residuals can be plotted against time to test the proportional hazards
assumption. Histograms of these residuals can be used to examine fit and detect outlying
covariate values.

The Schoenfeld residual plot below suggests that the proportional hazard assumption holds (the horizontal line with slope = 0), but there seems to be an outlier (circled in yellow).  

Schoenfeld residual plot can be generated with two steps: obtain the Schoenfeld residuals from the model fit and then use a graphic tool to draw the plots. 

Proc phreg;
Class trt01p (ref='N');
Model aval*cnsr(1) = trt01p;
Output out=phcheck ressch=schres;
run;
Proc sgplot data=phcheck;
Loess x=aval y=schres / clm;
run;

Besides these plots, there are also some statistical tests (i.e., tests giving a p-value) for checking the proportional hazards assumption. Unfortunately, none of these statistical tests can give a definitive answer about the proportional hazards assumption (very similar to the commonly used normality tests such as the Shapiro-Wilk test or the Kolmogorov-Smirnov test).

Tests by Including Time-Dependent Covariates in the Cox Model

Generate the time-dependent covariates by creating interactions of the predictors and a function of survival time and include in the model. If any of the time-dependent covariates are significant then those predictors are not proportional.

In SAS it is possible to create all the time-dependent variables inside Proc Phreg. Furthermore, by using the test statement, it is possible to test all the time-dependent covariates all at once.
Proc phreg; 
 Model aval*cnsr(1) = age treat aget treatt; *treat has to be numeric;
 aget = age*log(aval);  *time-dependent age variable (time*age interaction);
 treatt = treat*log(aval); *time-dependent treatment variable (time*treatment interaction); 
 TestAll: TEST aget, treatt; *test PH for all time-dependent covariates;
run;

Where 'TestAll' is the label and the TEST statement tests linear hypotheses about the regression coefficients. PROC PHREG performs a Wald test for the joint hypothesis specified in a single TEST statement. Each equation specifies a linear hypothesis; multiple equations (rows of the joint hypothesis) are separated by commas.

Using ASSESS with the PH option (the supremum test) to check proportional hazards

This approach is based on the paper by Lin DY, Wei LJ, and Ying Z. (1993), “Checking the Cox Model with Cumulative Sums of Martingale-Based Residuals” Biometrika, 80, 557–572 and is built into SAS Proc PHREG with ASSESS statement.

The procedure can detect violations of proportional hazards by using a transform of the martingale residuals known as the empirical score process. The empirical score process under the null hypothesis of no model misspecification can be approximated by zero-mean Gaussian processes, and the observed score process can be compared to the simulated processes to assess departures from proportional hazards.

In SAS, the ASSESS statement with the PH option provides an easy method to assess the proportional hazards assumption both graphically and numerically for many covariates at once. 

proc phreg;
  class trt01p(ref='N');
  model aval*cnsr(1)=trt01p age / rl ties=efron;
  assess ph / resample seed=123456;
run;

The option PH in ASSESS statement tells SAS that we would like to assess proportional hazards in addition to checking functional forms. The resample option is to request the supremum tests of the null hypothesis that proportional hazards holds. These tests calculate the proportion of simulated score processes that yielded a maximum score larger than the maximum observed score process. A very small proportion (p-value) suggests a violation of proportional hazards.
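The resampling p-value described above, the proportion of simulated supremum statistics at least as large as the observed one, can be sketched generically in Python; the `toy_sup` null process below is a stand-in coin-flip walk, not the actual martingale-residual machinery inside PROC PHREG:

```python
import random

def resampling_pvalue(observed_sup, simulate_sup, n_sim=1000, seed=1):
    """Generic resampling p-value, as in the supremum test: the proportion of
    simulated supremum statistics that are at least the observed one."""
    rng = random.Random(seed)
    exceed = sum(simulate_sup(rng) >= observed_sup for _ in range(n_sim))
    return exceed / n_sim

# toy null process: sup of |cumulative sum| of 20 coin flips
def toy_sup(rng):
    s, m = 0, 0
    for _ in range(20):
        s += rng.choice([-1, 1])
        m = max(m, abs(s))
    return m

p = resampling_pvalue(10, toy_sup)  # small p suggests the observed sup is extreme
print(0 <= p <= 1)
```

A very small proportion (p-value) means few simulated processes ever reach the observed maximum, which is evidence against proportional hazards.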

Grambsch-Therneau test or G-T test based on the scaled Schoenfeld residuals

Schoenfeld residual plot can be visually checked to assess the proportional hazard assumption. In addition, Grambsch-Therneau test can be used to do a hypothesis testing for the proportional hazard assumption. This approach is based on the paper by Grambsch-Therneau "Proportional Hazards Tests and Diagnostics Based on Weighted Residuals". This is essentially a goodness-of-fit test. 

The Grambsch-Therneau test is not readily available in SAS, but there are SAS and R programs available to do the G-T test. 
When analyzing time-to-event variables, checking the proportional hazards assumption is a required step. However, we also need to realize that there is no single method that is acceptable to everybody. The visual check of various plots and the statistical tests all contain subjective components in judging whether the proportional hazards assumption holds. In most situations, a violation of the proportional hazards assumption is obvious. Only when there is a minor or borderline violation may different approaches give different conclusions; in that case, the impact of the non-proportional hazards on the estimation may not be very big. However, sensitivity analyses assuming non-proportional hazards, or non-parametric analyses (such as the log-rank test), should be performed.  

Thursday, October 15, 2020

The fate of confirmatory clinical trials for Remdesivir for treatment of COVID-19

Remdesivir, the first highly touted drug to treat COVID-19 patients, has now been granted emergency use authorization in several countries. The focus of fighting COVID-19 seems to have shifted to safe and effective vaccine development. Unfortunately, the efficacy of Remdesivir has not been confirmed, due to flaws in study design (for example, no placebo control) or issues in study conduct (for example, early discontinuation resulting in underpowered studies).
In a previous post, six pivotal studies were listed for Remdesivir. These six studies were also listed in the article by Singh et al Remdesivir in COVID-19: A critical review of pharmacology, pre-clinical and clinical studies.

The table below lists the status (fate) of these studies.
 Protocol Title
Study Features
Fate of Studies
Gilead Sciences

Phase III, initially planned for 400 subjects, then increased to 2400, and then 6000 subjects

Two  arms: Standard of care + Remdesivir for 5 days, Standard of care + Remdesivir for 10 days

Enrolment was stopped early after 397 subjects were randomized.  

Results were published in NEJM (Goldman et al, “Remdesivir for 5 or 10 Days in Patients with Severe Covid-19”)

Conclusion: “In patients with severe Covid-19 not requiring mechanical ventilation, our trial did not show a significant difference between a 5-day course and a 10-day course of remdesivir. With no placebo control, however, the magnitude of benefit cannot be determined.”

It is noted that the study design was flawed and should have included a third arm with Standard of Care without Remdesivir.

The data from this study was further compared to the external control group and results were announced in a recent press release "Comparative Analysis of Phase 3 SIMPLE-Severe Study and Real-World Retrospective Cohort of Patients Diagnosed with Severe COVID-19 Receiving Standard of Care" to show the statistically significant reduction in mortality in Remdesivir group. 

Gilead Sciences

Phase III, initially planned for 600 subjects, then increased to 1600 subjects

Three arms: Remdesivir for 5 days, Remdesivir for 10 days, Standard of care

The study is active, but not recruiting new patients; enrolment has stopped. Topline results were announced in a press release:

  • Study Demonstrates 5-Day Treatment Course of Remdesivir Resulted in Significantly Greater Clinical Improvement Versus Treatment with Standard of Care Alone
  • Data Add to Body of Evidence from Prior Studies Demonstrating Benefit of Remdesivir in Hospitalized Patients with COVID-19
The results were later published in JAMA "Effect of Remdesivir vs Standard Care on Clinical Status at 11 Days in Patients with Moderate Covid-19"


Capital Medical University/Chinese Academy of Medical Sciences

Phase III, 308 Subjects
Two arms: Remdesivir, placebo

Mainland China only

The study was suspended (The epidemic of COVID-19 has been controlled well at present, no eligible patients can be recruited). 

The results have not been published yet. 

Capital Medical University

Phase III, 453 Subjects
Two arms: Remdesivir, placebo

Mainland China only

The study was terminated after 237 patients were enrolled and randomized (158 in remdesivir and 79 in placebo) (The epidemic of COVID-19 has been controlled well in China, no eligible patients can be enrolled at present.)

Results were published in The Lancet

Conclusion: “In this study of adult patients admitted to hospital for severe COVID-19, remdesivir was not associated with statistically significant clinical benefits. However, the numerical reduction in time to clinical improvement in those treated earlier requires confirmation in larger studies.”
National Institute of Allergy and Infectious Diseases (NIAID)

Phase II, planned 440 subjects (protocol specified 394 subjects); actual enrolment: 1,063 subjects at the time of DMC review

Two arms: Placebo, Remdesivir with additional arms to be added

Multi-National: US, Japan, South Korea, Singapore

The study was stopped after the interim analyses. 

Preliminary results were published in NEJM by Beigel et al. Remdesivir for the Treatment of Covid-19 - Preliminary Report

Conclusion: “Remdesivir was superior to placebo in shortening the time to recovery in adults hospitalized with Covid-19 and evidence of lower respiratory tract infection.”

The results from this study were the basis for FDA to issue Emergency Use Authorization for Remdesivir. Subsequently, several other countries followed suit.

It is disappointing that the study was stopped after inconclusive or unconvincing results from the interim analyses. There was no mention of whether there was a pre-specified stopping rule or whether the boundaries for stopping the study had been crossed. 

Also see: Inside the NIH’s controversial decision to stop its big remdesivir study

The final report was later published in NEJM (on Oct 8). Results were better than those reported in the preliminary report (median recovery time was shortened by 5 days). The final report concluded "Our data show that remdesivir was superior to placebo in shortening the time to recovery in adults who were hospitalized with Covid-19 and had evidence of lower respiratory tract infection."
Institut National de la Santé Et de la Recherche Médicale, France

Phase III, 3100 Subjects
Five arms: Remdesivir, Lopinavir/ritonavir, Interferon Beta-1A, Hydroxychloroquine, Standard of care

France Only

This study is funded by WHO; it is called DIsCoVeRy in the ClinicalTrials.gov registration and the SOLIDARITY trial in the ISRCTN registration.

The interim results were published in a paper "Repurposed antiviral drugs for COVID-19; interim WHO SOLIDARITY trial results". It concludes "These Remdesivir, Hydroxychloroquine, Lopinavir and Interferon regimens appeared to have little or no effect on hospitalized COVID-19, as indicated by overall mortality, initiation of ventilation and duration of hospital stay. The mortality findings contain most of the randomized evidence on Remdesivir and Interferon, and are consistent with meta-analyses of mortality in all major trials."

The results were disputed by Gilead, the manufacturer of Remdesivir. 

Monday, October 12, 2020

Randomized Controlled Trial in Sheep and Methodological Rigor in Preclinical Studies

Miller et al published a paper in the blue journal (AJRCCM), "Combined Mesenchymal Stromal Cell Therapy and Extracorporeal Membrane Oxygenation in Acute Respiratory Distress Syndrome: A Randomized Controlled Trial in Sheep." At first glance, I thought it was a randomized controlled clinical trial. Then I realized it was a randomized controlled trial in sheep, and the word 'clinical' was not in the title. Whether the study is conducted in humans or in sheep, it can still be called a 'randomized controlled trial', or RCT for short. 

It is great to see pre-clinical studies conducted with scientific rigor. The results from RCTs in animals will be more reliable and provide more definitive evidence for deciding whether an additional RCT in humans is warranted. 

The presentation of the paper by Miller et al followed exactly the same format in which a randomized controlled clinical trial would be presented. 

Study Design

Ethical approvals were obtained from University Animal Ethics Committees of Queensland University of Technology and the University of Queensland and authorization for in vivo use of hMSCs was granted by the Australian Department of Agriculture. The study was conducted in accordance with Australian Code for the Care and Use of Animals for Scientific Purposes and is reported in compliance with Animal Research: Reporting of In Vivo Experiments guidelines. ......

Statistical Analysis

A priori sample size calculations, based on the primary outcome of PaO2/FiO2 ratio at 24 hours, are detailed in the online supplement. Data are expressed as mean (+/-SD) or median (interquartile range [IQR]) if nonnormally distributed. Analysis was undertaken in Graphpad Prism. Longitudinal data were analyzed by fitting a mixed model. This model uses a compound symmetry covariance matrix and is fit using restricted maximum likelihood. Where a significant interaction was observed, post hoc comparisons were undertaken. Correction for multiple comparisons was made using the Benjamini-Hochberg method (false discovery rate restricted to 5%). ......
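The Benjamini-Hochberg correction mentioned in the quoted analysis section can be sketched in Python; this is the standard step-up procedure, not code from the paper:

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Benjamini-Hochberg step-up procedure: with p-values sorted ascending,
    reject hypotheses 1..k for the largest k with p(k) <= (k/m) * fdr."""
    m = len(pvalues)
    indexed = sorted(enumerate(pvalues), key=lambda kv: kv[1])
    k_max = 0
    for rank, (_, p) in enumerate(indexed, start=1):
        if p <= rank / m * fdr:
            k_max = rank
    rejected = {idx for idx, _ in indexed[:k_max]}
    return [i in rejected for i in range(m)]

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals))  # → [True, True, False, False, False, False, False, False]
```

Unlike a Bonferroni correction, the procedure controls the false discovery rate rather than the family-wise error rate, which is why it is popular for post hoc comparisons like these.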

Last July, I attended a symposium in Paris where Dr. Sébastien Bonnet from Université Laval in Canada gave a presentation titled "Improving the rigour of preclinical studies to identify promising therapies". His presentation resonated with me.

Preclinical studies are usually small in sample size, conducted at a single institute, not randomized, and not reproducible. For years, there has been a push for improving the methodological rigor in preclinical studies. The methods used in the design and analysis of clinical trials can also be used in pre-clinical studies. We hope to see the terms (such as randomized, controlled, multi-center, meta-analysis) applied in more pre-clinical studies.   
Flawed preclinical studies can produce misleading results that may be used as the basis for clinical trials. A lot of effort and money may be spent on clinical trials that are based on the results from shady preclinical studies. 

Tuesday, October 06, 2020

Covid-19 Vaccine: Is 50% Vaccine Efficacy (VE) Too Low?

Ever since FDA issued its guidance “Development and Licensure of Vaccines to Prevent COVID-19,” to help facilitate timely development of safe, effective COVID-19 Vaccines, the question arose whether the threshold of 50% vaccine efficacy (VE) was set too low.

As cited in an article “FDA to Require 50 Percent Efficacy for COVID-19 Vaccines”:

Gregory Poland, the director of the Mayo Vaccine Research Group, tells Reuters the efficacy guidelines are standard compared to other vaccines. “They look pretty much like influenza vaccine guidelines,” Poland says. “I don’t think that’s a high bar. I think that’s a low to . . . appropriate bar for a first-generation COVID-19 vaccine.” The effectiveness of the annual flu shot, for example, generally ranges between 40 percent and 60 percent, according to The Washington Post.

Peter Hotez, a vaccine expert at the Baylor College of Medicine, tells the Post the 50 percent threshold is low, a sign that the FDA recognizes “our first vaccine won’t be our best.” Ultimately, he says, vaccine developers should aim for 70–75 percent efficacy. 

If we just look at the face value of the 50% VE, it does look like the bar is low. Some people may interpret 50% VE as meaning that the COVID-19 vaccine only needs to be effective in 50% of people – which is not true. Even in the FDA's announcement about the issuance of its guidance, the statement about the requirement of 50% VE was incorrectly stated:

“The guidance also discusses the importance of ensuring that the sizes of clinical trials are large enough to demonstrate the safety and effectiveness of a vaccine. It conveys that the FDA would expect that a COVID-19 vaccine would prevent disease or decrease its severity in at least 50% of people who are vaccinated.”

Let’s see how the vaccine efficacy is calculated and what the 50% VE means.

According to Wikipedia, Vaccine efficacy (VE) is the percentage reduction of disease in a vaccinated group of people compared to an unvaccinated group, using the most favorable conditions.

The outcome data (vaccine efficacy) generally are expressed as a proportionate reduction in disease attack rate (AR) between the unvaccinated (ARU) and vaccinated (ARV), or can be calculated from the relative risk (RR) of disease among the vaccinated group.

The basic formula is written as:

VE = (ARU − ARV) / ARU × 100%

where:


VE = Vaccine efficacy,

ARU = Attack rate of unvaccinated people,

ARV = Attack rate of vaccinated people.

An alternative, equivalent formulation of vaccine efficacy is:

VE = (1 − RR) × 100%

where RR is the relative risk of developing the disease for vaccinated people compared to unvaccinated people.

In the actual calculation of VE, we will need to consider the total exposure time (usually measured by the total person-time). One person observed for one year = 1 person-year; one person observed for 3 months = 0.25 person-years. The total person-time in years (or total person-years) will be the summation of person-years across all participants in the vaccine group, and similarly across all participants in the placebo group. 

The point estimate of the VE can be written as:

VE = 1 − (cases in the vaccine group / total person-time in the vaccine group) / (cases in the placebo group / total person-time in the placebo group)

If the clinical trial has a 1:1 randomization ratio (all participants are randomized equally into the vaccine group and the placebo group), the 'total person-time' in the vaccine group will be approximately equal to the 'total person-time' in the placebo group, and the point estimate of VE can then be easily calculated as:

VE = 1 − (cases in the vaccine group / cases in the placebo group)
If we know the number of cases (here COVID-19 cases) in the vaccine group and in the placebo group, we can easily calculate the VE.  For example, if the total number of cases is 150 (50 cases observed in the vaccine group and 100 cases observed in the placebo group), the VE will be 1 - (50/100) = 0.5 = 50%.

If an interim analysis is performed after a total of 75 cases are observed, VE will be 50% if 25 cases are observed in the vaccine group and 50 cases are observed in the placebo group. 
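Under 1:1 randomization, where the person-time roughly cancels, the point estimate reduces to the case split. A small helper, using the counts from the two examples above:

```python
def ve_point_estimate(cases_vaccine: int, cases_placebo: int) -> float:
    """Point estimate of VE under 1:1 randomization with equal person-time."""
    return 1 - cases_vaccine / cases_placebo

print(ve_point_estimate(50, 100))  # final analysis at 150 cases: 0.5, i.e. 50% VE
print(ve_point_estimate(25, 50))   # interim analysis at 75 cases: also 50% VE
```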

Here is a comparison of VE calculations from three Phase III protocols of COVID-19 vaccines (Moderna's mRNA-1273, Pfizer/BioNTech's vaccine, and AstraZeneca/Oxford's AZD1222):

Primary efficacy endpoint:
  • Moderna: VE will be estimated as 1 - HR (mRNA-1273 vs placebo) using a Cox proportional hazards regression model with treatment group as a fixed effect and adjusting for the stratification factor.
  • Pfizer/BioNTech: VE will be estimated by 100 × (1 - IRR), where IRR is the calculated ratio of confirmed COVID-19 illness per 1000 person-years of follow-up in the active vaccine group to the corresponding illness rate in the placebo group 7 days after the last dose.
  • AstraZeneca/Oxford: VE is calculated as RRR = 100 × (1 - relative risk), where the relative risk is the incidence of infection in the vaccine group relative to the incidence of infection in the control group, expressed as a percentage.

Statistical model for calculating the VE and its 95% confidence interval:
  • Moderna: Cox proportional hazards model
  • Pfizer/BioNTech: Beta-binomial model
  • AstraZeneca/Oxford: Modified Poisson regression model with robust variance

Each protocol also specifies the sample size (number of volunteers to be recruited) and the number of cases needed to be observed.
With the COVID-19 pandemic still not under control in the US and a large number of volunteers participating in these Phase III clinical trials, we hope that the total number of COVID-19 cases can be reached quickly so that we can have a readout of the vaccine's efficacy. All three studies include at least one interim analysis to allow a possible readout much earlier.

In addition to the requirement of at least 50% VE, FDA guidance also requires that the lower bound of the 95% confidence interval of the VE must be greater than 30%. 

The sample size (the number of COVID-19 infection cases) is largely dictated by this criterion of 30% for the lower bound of the 95% CI. Otherwise, with just 3 cases (1 case in the vaccine group and 2 cases in the placebo group), we would already have a point estimate of VE = 50% and meet the requirement.
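A quick way to see why 3 cases are nowhere near enough: under 1:1 randomization, the number of vaccine-group cases among the total is roughly binomial with p = RR/(1 + RR), and VE = 1 - p/(1 - p). The sketch below applies an exact (Clopper-Pearson) interval to the case split (this is an illustration, not any sponsor's actual method) and shows the VE lower bound falling far below 30%:

```python
from scipy.stats import beta

def ve_exact_ci(cases_vaccine, cases_total, alpha=0.05):
    """Exact CI for VE from the binomial split of cases (1:1 randomization)."""
    x, n = cases_vaccine, cases_total
    # Clopper-Pearson interval for p = share of all cases in the vaccine group
    p_lo = beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
    p_hi = beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
    ve = lambda p: 1 - p / (1 - p)   # VE = 1 - RR, with RR = p / (1 - p)
    return ve(p_hi), ve(p_lo)        # large p means small VE, so bounds swap

lo, hi = ve_exact_ci(1, 3)
print(f"point VE = 50%, 95% CI lower bound = {lo:.0%}")  # far below 30%
```

With 1 case out of 3 the interval is enormous; only with case counts in the dozens to hundreds does the lower bound have a chance of clearing 30%.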

A 50% VE implies that the vaccine can decrease the risk of COVID-19 by 50%. Compared with other clinical trials, a 50% reduction is substantial and meaningful. In Moderna's trial, the VE will be estimated using the Cox proportional hazards model, where the time to the first case of COVID-19 infection is also considered. To meet the criterion of at least 50% VE, the estimated hazard ratio (HR) needs to be equal to or less than 0.5 (since VE = 1 - HR). In oncology trials or other clinical trials with time-to-event variables, if we can achieve an HR of 0.5 or lower, we will claim that the experimental treatment reduces the risk of death or of the event by at least 50% - a result to die for.

I agree with the statement about the COVID-19 vaccine from a Lancet paper:

“A vaccine that has 50% efficacy could appreciably reduce incidence of COVID-19 in vaccinated individuals, and might provide useful herd immunity. Hence, although efficacy far greater than 50% would be better, efficacy of about 50% would represent substantial progress.”
SARS-CoV-2 (the virus causing COVID-19) has a very high basic reproduction number R0 (about 2.5), which measures the speed at which a disease can spread in a population. We hope we will have a vaccine that meets the efficacy requirements: at least 50% VE in the point estimate and at least 30% VE at the lower bound of the 95% confidence interval. With an effective vaccine and the majority of people vaccinated, we may be able to drop the effective reproduction number below 1 and prevent the spread of SARS-CoV-2.

Saturday, October 03, 2020

Should We Follow ICH E9 Addendum to Include the Estimands in all Clinical Trial Protocols?

ICH E9 "Statistical Principles for Clinical Trials" was issued in 1998 - more than 20 years ago. While the principles specified in ICH E9 are still being followed, a call for a revision or addendum had been there for many years. In 2017, the draft version of ICH E9 (R1) “Addendum on Estimands and Sensitivity Analysis in Clinical Trials to the Guideline on Statistical Principles for Clinical Trials” was released, and at the end of 2019, ICH E9 (R1) was finalized. The E9 (R1) guideline is now gradually being adopted by various regulatory agencies. In terms of implementation, EMA seems to be ahead of the US FDA in requiring sponsors to include the concept of estimands in regulatory submissions.

Purpose and scope of the addendum to ICH E9:
  • Provides a framework for describing with precision a treatment effect of interest
  • Precision in describing a treatment effect of interest is facilitated by constructing the “estimand”
  • Estimand: A precise description of the treatment effect reflecting the clinical question posed by the trial objective. It summarises at a population-level what the outcomes would be in the same patients under different treatment conditions being compared
  • Clarity requires a thoughtful envisioning of “intercurrent events” such as discontinuation of assigned treatment, use of additional or alternative treatment, and terminal events such as death
  • Intercurrent Events: Events occurring after treatment initiation that affect either the interpretation or the existence of the measurements associated with the clinical question of interest
  • It is necessary to address intercurrent events when describing the clinical question of interest in order to precisely define the treatment effect that is to be estimated
  • Addendum introduces strategies to reflect different questions of interest that might be posed
  • Attributes used to construct the estimand are also introduced in the addendum
  • Addendum clarifies the definition and the role of sensitivity analysis
  • Sensitivity Analysis: A series of analyses conducted with the intent to explore the robustness of inferences from the main estimator to deviations from its underlying modeling assumptions and limitations in the data
Estimand attributes:
  • Treatment: The treatment condition of interest and, as appropriate, the alternative treatment condition to which comparison will be made
  • Population: Patients targeted by the clinical question
  • Variable (or endpoint): Obtained for each patient and required to address the clinical question
  • Population-level summary: Provides a basis for comparison between treatment conditions for the variable
  • Handling of intercurrent events
While FDA has not mandated the implementation of ICH E9 (R1), there seems to be a trend in the industry that ICH E9 (R1) is gradually being adopted, and the concept of ‘estimands’ is being mentioned in study protocols and statistical analysis plans (SAPs).

Looking at the clinical trial protocol and SAP templates developed by TransCelerate Biopharma, both contain a section about estimands, with the estimands listed together with the endpoints.

In my previous post "Should Clinical Trial Protocol be Made Public While the Trial is still ongoing?", I mentioned that the protocols for three Phase III Covid-19 vaccine studies were all made public because of the demand for transparency. I examined all three protocols, and they all described the 'estimands'. The concepts of 'intercurrent events', 'principal stratum strategy', and 'treatment policy' were also mentioned.

In Phase III study protocol by Moderna "A Phase 3, Randomized, Stratified, Observer-Blind, Placebo-Controlled Study to Evaluate the Efficacy, Safety, and Immunogenicity of mRNA-1273 SARS-CoV-2 Vaccine in Adults Aged 18 Years and Older":

In the estimand of the primary analysis on the primary endpoint, a treatment policy strategy will be used to address the intercurrent events of 1) withdrawal from the study or death unrelated to COVID-19, where the time to COVID-19 will be censored at the date of withdrawal from the study or death; 2) early COVID-19, where the time to COVID-19 will be censored at the time of early infection. Principal stratum strategy will be used to address the other 2 types of intercurrent events in the primary analysis based on the PP Set. The details of intercurrent event description and estimand strategies are presented in Section 11.4.1.
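As a rough illustration of the censoring rules quoted above (a simplified sketch with hypothetical field names, not Moderna's actual programming), each participant contributes either an observed event or a time censored at withdrawal/unrelated death or at early infection, whichever comes first:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Participant:
    followup_days: int                     # days from randomization to data cutoff
    covid_day: Optional[int] = None        # day of confirmed COVID-19, if any
    withdrawal_day: Optional[int] = None   # withdrawal or death unrelated to COVID-19
    early_covid_day: Optional[int] = None  # COVID-19 before the efficacy window opens

def time_and_event(p: Participant) -> tuple:
    """Derive (time, event) per treatment-policy-style censoring rules."""
    candidates = [(p.followup_days, False)]                # administrative censoring
    if p.withdrawal_day is not None:
        candidates.append((p.withdrawal_day, False))       # censor at withdrawal/death
    if p.early_covid_day is not None:
        candidates.append((p.early_covid_day, False))      # censor at early infection
    if p.covid_day is not None:
        candidates.append((p.covid_day, True))             # observed event
    return min(candidates)  # earliest of the event/censoring times wins

print(time_and_event(Participant(followup_days=120, covid_day=80)))       # (80, True)
print(time_and_event(Participant(followup_days=120, withdrawal_day=60)))  # (60, False)
```

The resulting (time, event) pairs are what a Cox model like the one in the protocol would consume; the actual derivation rules live in the protocol's Section 11.4.1 and the SAP.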

In Phase III study protocol by AstraZeneca / Oxford "A Phase III Randomized, Double-blind, Placebo-controlled Multicenter Study in Adults to Determine the Safety, Efficacy, and Immunogenicity of AZD1222, a Non-replicating ChAdOx1 VectorVaccine, for the Prevention of COVID-19":

The primary estimand will be used for the analysis of the primary efficacy endpoint. It will be based on participants in the full analysis set, defined as all randomized participants who received at least 1 dose of study intervention excluding those participants who are seropositive at baseline, analyzed according to their randomized treatment. For participants with multiple events, only the first occurrence will be used for the primary efficacy endpoint analysis. The set of intercurrent events for this estimand consists of participants who withdraw from the study prior to having met the primary efficacy endpoint. The intercurrent events will be handled using the treatment policy strategy and the absence of data following these participants’ withdrawal will be treated as missing (ie, counted as not having met the criteria). Participants who withdraw before 15 days post second dose or who have a case prior to 15 days post second dose will be excluded from primary endpoint analysis.
Additional estimands will be specified for the primary efficacy endpoint to carry out sensitivity analyses for assessing the robustness of results. These sensitivity analyses will explore different methods for handling intercurrent events and different assumptions for missing data. Estimands will also be specified for the analysis of secondary endpoints. Full details will be provided in the SAP.

It looks like the concept of 'estimands' has not yet been widely accepted by the medical community - it is indeed a new concept and a new term for clinical trialists to digest. Take a recent paper in the New England Journal of Medicine (Rabe et al 2020 "Triple Inhaled Therapy at Two Glucocorticoid Doses in Moderate-to-Very-Severe COPD"): the main body of the article made no mention of the concept of 'estimands', even though the attached protocol and SAP contained a section about 'estimands':