Monday, April 26, 2021

Within Patient Benefit-Risk Evaluation? Using Outcomes to Analyze Patients versus Using Patients to Analyze Outcomes?

In our daily life, benefit-risk evaluation is something we always do whether we realize it or not. Benefit-risk evaluation is especially critical in drug development and in the regulator's decision process. We often hear that a drug is approved because the benefits outweigh the risks. In the recent decision to resume use of the J&J Covid-19 vaccine, the CDC and the FDA stated that the benefits of rolling out the vaccine outweigh the risk of a rare blood clot (cerebral venous sinus thrombosis, or CVST) seen in some young women who received it.

In a recent New York Times article, "Irrational Covid Fears", the benefit-risk tradeoff of the Covid-19 vaccine is illustrated with a fable about the automobile:
A fable for our times
Guido Calabresi, a federal judge and Yale law professor, invented a little fable that he has been telling law students for more than three decades.
He tells the students to imagine a god coming forth to offer society a wondrous invention that would improve everyday life in almost every way. It would allow people to spend more time with friends and family, see new places and do jobs they otherwise could not do. But it would also come with a high cost. In exchange for bestowing this invention on society, the god would choose 1,000 young men and women and strike them dead.
Calabresi then asks: Would you take the deal? Almost invariably, the students say no. The professor then delivers the fable’s lesson: “What’s the difference between this and the automobile?”
In truth, automobiles kill many more than 1,000 young Americans each year; the total U.S. death toll hovers at about 40,000 annually. We accept this toll, almost unthinkingly, because vehicle crashes have always been part of our lives. We can’t fathom a world without them.
It’s a classic example of human irrationality about risk. We often underestimate large, chronic dangers, like car crashes or chemical pollution, and fixate on tiny but salient risks, like plane crashes or shark attacks.
One way for a risk to become salient is for it to be new. That’s a core idea behind Calabresi’s fable. He asks students to consider whether they would accept the cost of vehicle travel if it did not already exist. That they say no underscores the very different ways we treat new risks and enduring ones.
I have been thinking about the fable recently because of Covid-19. Covid certainly presents a salient risk: It’s a global pandemic that has upended daily life for more than a year. It has changed how we live, where we work, even what we wear on our faces. Covid feels ubiquitous.
Fortunately, it is also curable. The vaccines have nearly eliminated death, hospitalization and other serious Covid illness among people who have received shots. The vaccines have also radically reduced the chances that people contract even a mild version of Covid or can pass it on to others.
Yet many vaccinated people continue to obsess over the risks from Covid — because they are so new and salient.
This article reminds me of the seminars presented by Scott Evans. In one of them, posted on YouTube, he started with a hypothetical question:
Suppose you are given a choice between drug A and drug B. Drug A increases your intelligence but decreases your good looks; drug B increases your good looks but decreases your intelligence. Which drug will you choose?
This is a typical question about benefit-risk evaluation, or the benefit-risk tradeoff. With this question, he introduced an alternative (and arguably better) way to perform the benefit-risk evaluation: assess benefit and risk within each individual patient first, and only then aggregate the data at the group level.
Currently, in clinical trials, the benefit (efficacy) evaluation and the risk (safety) evaluation are performed independently. The study protocol is designed to show the benefit (efficacy): selecting a sensitive and clinically meaningful efficacy endpoint, ensuring a sufficiently large sample size for statistical power, and choosing sound statistical analysis methods all serve to ensure that the efficacy results can be used to demonstrate the benefit of the new drug. FDA has issued specific guidance for efficacy only: "Demonstrating Substantial Evidence of Effectiveness for Human Drug and Biological Products".

Risk (safety) is usually assessed separately from efficacy. While we collect data for the safety analysis (adverse events, serious adverse events, deaths, clinical laboratory results, ECG results, vital signs, ...), the analyses of safety data are usually based on descriptive summaries (no hypothesis testing) to assess the nature and pattern of the serious adverse events, whether they are related to the investigational new drug, whether certain laboratory parameters are elevated, and so on. Safety analyses involve a lot of subjective judgment, and different reviewers may come to different conclusions.

There is no separate FDA guidance specifically about risk (safety) assessment. Instead, safety assessment is covered in FDA's Good Review Practice: Clinical Review Template, a checklist for FDA reviewers in evaluating safety.

Only after efficacy and safety have been separately analyzed and evaluated is a benefit-risk section written as a formal evaluation of the benefit-risk; this is usually in CTD Modules 1 and 2.

This approach of assessing efficacy and safety separately evaluates the average effect (efficacy or safety) in the entire study population. The benefit or risk cannot easily be translated to the individual patient level. In clinical trials, it is almost impossible to decide whether a drug is good (the benefit outweighs the risk) for a specific patient. We have to wait for the aggregate data to determine the benefit and risk at the group level.

With advances in precision medicine and pharmacogenomics, we hope that within-patient benefit-risk evaluation can be performed in the future. At present (and perhaps for the foreseeable future), the benefit-risk evaluation (or efficacy-safety evaluation) will still be primarily based on the population level to assess the average group effect. The progression looks like this:
  • Average effect (Using Patients to Analyze Outcomes)
  • Subgroup analyses to identify the prognostic factors (phenotypes) that help identify the patients who are more likely to respond to the therapy with fewer side effects
  • Targeted therapies and precision medicine to identify the genetic biomarkers (genes) that help identify the subgroup of patients who are more likely to respond to the therapy with fewer side effects
  • Individual effect - within patient benefit-risk evaluation 
Even with targeted therapy, it is still not possible to be certain if a therapy will be good (the benefit outweighs the risks) for a specific patient. 

For the J&J Covid-19 vaccine issue, it seems clear that the vaccine does increase the risk of the rare blood clot, CVST. Since CVST is so rare, the benefit of receiving the Covid-19 vaccine outweighs the risk of the rare blood clot; this assessment is for the population as a whole. When it comes to the individual person, it will be his or her own choice: the risk is small, but it may be there.

Monday, April 19, 2021

Restricted Mean Survival Time (RMST) for Handling the Non-Proportional Hazards Time to Event Data

Time to event analysis (or traditionally survival analysis) is one of the most common analyses in clinical trials. In general, time to event analysis relies on the assumption of proportional hazards. However, quite frequently, we may find that the proportional hazards assumption is violated, especially in many immuno-oncology trials. When the proportional hazards assumption is violated, alternative approaches may be needed to analyze the data and preserve statistical power. As discussed in the previous post "Non-proportional Hazards: how to analyze the time-to-event data?", one of the alternative approaches is the restricted mean survival time (RMST) method.

RMST is one of the Kaplan-Meier-based methods; it essentially calculates and compares the areas under the Kaplan-Meier curves (AUCs) for different treatment groups or other comparative groups. It has been said that RMST analysis has the following advantages:
  • Model-free, robust, and easily interpretable treatment effect information
  • Produces radically powerful patterns of difference as has been observed in some recent Oncology clinical trials
  • Accepted approach by regulatory agencies and industry leaders
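To make the "area under the Kaplan-Meier curve" idea concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the function names `km_curve` and `rmst`, the truncation time `tau`, and the toy data are all made up, and the sketch gives point estimates only (no standard errors or confidence intervals, which validated software would provide).

```python
def km_curve(times, events):
    """Kaplan-Meier estimate as a list of (event_time, survival_probability) steps.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / n_at_risk
            steps.append((t, surv))
        n_at_risk -= n_leaving
    return steps


def rmst(times, events, tau):
    """Restricted mean survival time: area under the KM step function on [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in km_curve(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)   # rectangle up to the next drop in the curve
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)     # last rectangle, truncated at tau
    return area


# Hypothetical toy data (months); not from any real trial.
arm_a_times, arm_a_events = [2, 3, 5, 7, 8, 10], [1, 1, 0, 1, 1, 0]
arm_b_times, arm_b_events = [4, 6, 9, 10, 10, 10], [1, 1, 1, 0, 0, 0]
tau = 10
difference = rmst(arm_b_times, arm_b_events, tau) - rmst(arm_a_times, arm_a_events, tau)
print("RMST difference (B minus A) up to tau:", difference)
```

With no censoring, the RMST is simply the average of min(T, tau) across patients, which is a handy sanity check for any implementation.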
RMST has been mentioned in the latest FDA Guidance for Industry (2020), "Acute Myeloid Leukemia: Developing Drugs and Biological Products for Treatment", as an alternative approach to analyzing the data when non-proportional hazards occur (e.g., a plateauing effect).

"Plateauing Effect

Trials designed to cure AML often result in survival contours characterized by an initial drop followed by a plateauing effect after some time point post randomization. This is an example of nonproportional hazards. While the log-rank test is somewhat robust to nonproportionality, it generally results in loss of power. Furthermore, nonproportionality can cause difficulty in describing the treatment effect. FDA is open to discussion about analyses based on other approaches, such as weighted Cox regression or other weighted methods, or summarizing the treatment effect using restricted mean survival time (RMST) or landmark survival analysis. Plans that use these alternative approaches should include:
    • justification for what constitutes clinically meaningful difference,
    • justification of design parameters, such as sample size and follow-up duration, based on this endpoint, and
    • justification for the value of the threshold that will be used to calculate the RMST."
RMST analysis has also been used as a primary analysis approach or for sensitivity analysis in FDA reviews: 

In the NDA for baloxavir marboxil in the treatment of acute, uncomplicated influenza, both the applicant and the FDA reviewer analyzed the data using RMST. The review stated:
Restricted mean survival time (RMST) up to Day 10 was estimated for each treatment group along with the difference between RMST in the two treatment groups. RMST is a measurement of the average survival from time 0 to a specified time point (e.g., 10 days) which is equivalent to the area under the Kaplan-Meier curve from the beginning of the study through that time point.

At an FDA CDRH Medical Devices Advisory Committee Circulatory System Panel meeting in 2019, the independent statistical consultant addressed the analysis issue when the proportional hazards assumption is violated:

The proposal they made was the restricted mean survival time. The restricted mean survival time is area under curve. Please note the word restricted. Mean survival time is over a period of time, according to the rules that have been laid out, so that you're not looking, like with proportional hazards, over all the follow-up that could have possibly happened or in binary where you're only looking at the patients that survive. The restricted mean would say we're going to look between, let's say, 0 and 5 years because we have sufficient information to make that kind of assessment.

The paper showed that the restricted mean has just as much power as proportional hazards when the assumptions are there for proportional hazards, and then has more power when the assumptions are violated.

There's also some advantages in terms for clinicians, in terms of explaining this to the patient. It's hard to talk about hazards or number needed to treat. But if you could say to a patient over a 60-month period the average survival time is 55 months with Device A versus 52 months with Device B, now they can look at what their life is going to look like in the next 60 months and make a decision.

Unfortunately, it was not me who noticed this. This was actually from a presentation by FDA. Several very smart statisticians had talked about the restricted mean and have made recommendations on using it for both proportional violations and for its interpretation.

In the FDA Briefing Document for the Oncologic Drugs Advisory Committee Meeting (December 17, 2019) to review Olaparib for the maintenance treatment of adult patients with deleterious or suspected deleterious germline BRCA mutated (gBRCAm) metastatic adenocarcinoma of the pancreas, the FDA wrote:

FDA performed a test to evaluate whether the proportional hazard assumption was met. This test failed to detect evidence of non-proportionality; however, such a test may lack power to detect non-proportionality due to the small sample size. The Kaplan-Meier curves of PFS appear to show some degree of nonproportionality. The curves did not show separation until approximately 4 months, after approximately 53% of patients either had events or were censored. FDA performed additional sensitivity analyses by applying the restricted mean survival time (RMST) method using different truncation points (15 months and 18 months). The truncated time was selected (15 or 18 months) such that approximately 8-12% patients remained at risk. Based on the truncation times, the estimated RMST difference in PFS between arms ranged from 2.6 months (95% CI: 0.9, 4.3) to 3.1 months (95% CI: 1.0, 5.2). The range of the RMST differences again demonstrated great variation in the difference in PFS and the lower ends did not suggest that there was a clinically meaningful difference.

Thanks to the software, RMST analyses can be easily implemented in SAS or R. In the latest version (version 15.1 or above) of SAS/Stat, RMST is included in SAS Proc LIFETEST with RMST option and Proc RMSTREG. See a nice paper by 
In R, the package for RMST analysis is survRM2, developed by Hajime Uno of the Dana-Farber Cancer Institute.

For RMST analysis, it is important to select the cut-off value (tau) for the truncation time. Different choices of tau will give different results, and the selection of tau can sometimes be arbitrary. In the FDA briefing document above, the FDA statistician chose the truncation time such that approximately 8-12% of patients remained at risk.
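As a rough illustration of an "at-risk fraction" rule for tau, the sketch below finds the latest observed time at which at least a chosen share of patients is still being followed. This is my own simplified reading, not the FDA reviewer's actual procedure: the function names, the 12% threshold in the example, and the toy follow-up times are all assumptions.

```python
def fraction_at_risk(times, t):
    """Share of patients still at risk just before time t.

    A patient is at risk at t if the follow-up time (event or censoring) is >= t.
    """
    return sum(1 for x in times if x >= t) / len(times)


def latest_tau_with_at_risk(times, min_fraction):
    """Largest observed time at which at least min_fraction of patients remain at risk.

    Returns None if no observed time satisfies the requirement.
    """
    candidates = [t for t in sorted(set(times))
                  if fraction_at_risk(times, t) >= min_fraction]
    return candidates[-1] if candidates else None


# Hypothetical pooled follow-up times (months) from both arms.
pooled_times = [1, 2, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
tau = latest_tau_with_at_risk(pooled_times, 0.12)
print("Candidate tau:", tau, "with fraction at risk", fraction_at_risk(pooled_times, tau))
```

Whatever rule is used, tau should be fixed in the analysis plan before unblinding, together with its clinical justification, rather than tuned to the observed curves.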

There are different ways to calculate the RMST:

  • Non-parametric method
  • Regression Analysis Method
  • Pseudo-value Regression Method
  • IPCW Regression - Inverse Probability of Censoring Weighting (IPCW) regression
  • Conditional restricted mean survival time (CRMST)

According to the paper by Guo and Liang (2019) "Analyzing Restricted Mean Survival Time Using SAS/STAT®", non-parametric analysis can be implemented using Proc Lifetest; regression analysis, pseudo-value regression, and IPCW regression can be implemented using SAS Proc RMSTREG. 

FDA statisticians have also proposed an approach called 'conditional restricted mean survival time', or CRMST. This approach was described in the paper by Qiu et al (2019) "Estimation on conditional restricted mean survival time with counting process" and also in a presentation by Lawrence and Qiu (2020), "Novel Survival Analysis When Hazards Are Nonproportional and/or There Are Multiple Types of Events". CRMST allows the area under the K-M curves to be calculated over a time interval that does not necessarily start at time 0. They claim CRMST is better for event-driven studies where the time to the first event is of interest. They concluded the following:
CRMST possesses all the desirable statistical properties of RMST. In particular, it does not rely on proportional hazard assumption. In addition, CRMST measures an average event-free time in the time range at issue and has straightforward interpretation. In case that two survival curves cross, CRMST can be estimated separately before and after crossing and the CRMST differences can be used to assess benefit versus harm.

Monday, April 05, 2021

Non-proportional Hazards: how to analyze the time-to-event data?

Time to event data is one of the most common data types in clinical trials. Traditionally, the log-rank test is used to compare the survival curves of two treatment groups; the Kaplan-Meier survival plot is used to illustrate the totality of time-to-event kinetics, including the estimated median survival time; and the Cox proportional hazards model is employed to provide the estimated relative effect (i.e., the hazard ratio) between treatment arms. The performance of these analyses largely depends on the proportional hazards (PH) assumption: that the hazard ratio is constant over time. When the assumption does not hold, the estimated hazard ratio is only an average relative treatment effect over time.
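In symbols, the PH assumption can be written as follows. The notation is my own choice for illustration: h_0 and h_1 are the hazard functions in the control and treatment arms, S_0 and S_1 the corresponding survival functions, and theta the hazard ratio.

```latex
h_1(t) = \theta \, h_0(t) \quad \text{for all } t \ge 0
\qquad \Longleftrightarrow \qquad
S_1(t) = S_0(t)^{\theta}
```

Under non-proportional hazards the ratio h_1(t)/h_0(t) changes with t (for example, delayed separation or crossing curves), so no single theta describes the treatment effect.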

Before the time to event data is analyzed, it is typical for statisticians to check the proportional hazards assumption. Various methods can be used to check the proportional hazards assumptions - see a previous post "Visual Inspection and Statistical Tests for Proportional Hazard Assumption".

Recently we have seen more examples of time to event data not following the proportional hazards assumption, especially in immuno-oncology clinical trials.

It is not the end of the world if the proportional hazards assumption is violated. Various approaches have been proposed to handle time to event data with non-proportional hazards.

In practice, it is pretty common that in the statistical analysis plan, we prespecify the log-rank test to calculate the p-value and then use the Cox proportional hazards regression model to calculate the hazard ratio, its 95% confidence interval, and a p-value. I call this "splitting the p-value and the estimate of the treatment difference". Two different p-values will be calculated: one from the log-rank test and one from the Cox regression. If the proportional hazards assumption is met, it is better to use the p-value from the Cox regression since all estimates and the p-value then come from the same model. However, when the proportional hazards assumption is violated, the Cox proportional hazards model may no longer be the optimal approach to determine the treatment effect, and the Kaplan-Meier estimate of median survival may not be the most valid measure to summarize the results.

In a website post "Testing equality of two survival distributions: log-rank/Cox versus RMST", the author stated:
“One thing to note is that the log-rank test does not assume proportional hazards per se. It is a valid test of the null hypothesis of equality of the survival functions without any assumptions (save assumptions regarding censoring). It is however most powerful for detecting alternative hypotheses in which the hazards are proportional.”
It is true that the log-rank test does not depend on the proportional hazards assumption. It remains a valid test of the null hypothesis of equal survival functions, even though it may not be optimal (most powerful) under non-proportional hazards.
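For readers who want to see what the log-rank test actually computes, here is a minimal two-sample sketch in plain Python. It is an illustration only, with a made-up function name and toy data; it uses the usual observed-minus-expected sum with the hypergeometric variance and a chi-square approximation with 1 degree of freedom. Note that nothing in the calculation refers to a hazard ratio.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi-square statistic, p-value).

    events_* : 1 if the event was observed, 0 if censored.
    Assumes at least one event time at which both arms still have patients at risk.
    """
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
             [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})
    obs_minus_exp = 0.0   # observed minus expected events in arm A
    variance = 0.0
    for t in event_times:
        n = sum(1 for x, _, _ in pooled if x >= t)                    # at risk, both arms
        n_a = sum(1 for x, _, g in pooled if x >= t and g == 0)       # at risk, arm A
        d = sum(1 for x, e, _ in pooled if x == t and e == 1)         # events at t, both arms
        d_a = sum(1 for x, e, g in pooled if x == t and e == 1 and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical toy data (months); not from any real trial.
chi2, p = logrank_test([2, 3, 5, 7, 8, 10], [1, 1, 0, 1, 1, 0],
                       [4, 6, 9, 10, 10, 10], [1, 1, 1, 0, 0, 0])
print("log-rank chi-square:", chi2, "p-value:", p)
```

Because every event time receives the same weight, the test is most sensitive when the hazards are proportional and loses power when the treatment effect is delayed or the curves cross.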

In a public workshop "Oncology Clinical Trials in the Presence of Non-Proportional Hazards" organized by Duke in 2018, Dr. Rajeshwari Sridhara from the Division of Biometrics V, CDER/FDA stated (@40:45 of the YouTube video) that in the non-proportional hazards situation, FDA is OK with presenting the p-value from the log-rank test and the hazard ratio to measure the treatment difference.

At this same workshop "Oncology Clinical Trials in the Presence of Non-Proportional Hazards", ASA Biopharmaceutical Section Regulatory-Industry Statistics Workshop presented their work and proposed the 'max-combo' test as an alternative method to address the non-proportional hazards situation. The max-combo test is based on Fleming-Harrington (FH) weighted log-rank statistics. It tackles some of the challenges due to non-proportional hazards: it can robustly handle a range of non-proportional hazard types, it can be pre-specified at the design stage, and it chooses the appropriate weight in an adaptive manner while still controlling the family-wise Type I error. The workshop summaries describe the Max-Combo test design in more detail.

Knezevic & Patil have a paper describing a SAS macro to perform the Max-Combo test (or combination weighted log-rank tests): "Combination weighted log-rank tests for survival analysis with non-proportional hazards" (2020 SAS Global Forum).

The NPH workshop has presented or published their work on numerous occasions.
In addition to the Max-Combo test, there are several other methods for handling the non-proportional hazards situation. 
  • RMST (restricted mean survival time): according to a presentation by Lawrence et al from FDA, the idea of Restricted Mean Survival Time (RMST) goes back to Irwin (1949) and is further implemented in survival analysis by Uno et al. (2014). RMST is defined as the area under the survival curve up to t*, which should be pre-specified for a randomized trial. RMST may be loosely described as the event-free expectancy over the restricted period between randomization and a defined, clinically relevant time horizon, called t*. RMST analyses are now built into the SAS procedures Proc LIFETEST and Proc RMSTREG. See the paper by Guo and Liang (2019) "Analyzing Restricted Mean Survival Time Using SAS/STAT®".
  • Piecewise exponential regression allows for a comparison of early and late treatment effects. It is especially useful when the non-proportional hazards pattern is a crossing of the curves. Piecewise exponential regression can be fitted with SAS Proc MCMC and the R package pch.
  • Estimation via the average hazard ratios (AHR) method of Schemper (2009) and the average regression effects (ARE) method of Xu and O’Quigley (2000). These methods can be implemented using the coxphw package in R. The coxphw package is described as:
This package implements weighted estimation in Cox regression as proposed by Schemper, Wakounig and Heinze (Statistics in Medicine, 2009, doi: 10.1002/sim.3623). Weighted Cox regression provides unbiased average hazard ratio estimates also in case of non-proportional hazards. The package provides options to estimate time-dependent effects conveniently by including interactions of covariates with arbitrary functions of time, with or without making use of the weighting option. For more details we refer to Dunkler, Ploner, Schemper and Heinze (Journal of Statistical Software, 2018, doi: 10.18637/jss.v084.i02).
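To illustrate the piecewise exponential idea from the list above without any modelling software, here is a minimal Python sketch. It is not what Proc MCMC or the pch package does internally; it only uses the closed-form result that, with no covariates, the maximum-likelihood hazard in each interval is the number of events divided by the person-time at risk in that interval. The function name, the single cut point at 6 months, and the toy data are assumptions.

```python
def piecewise_exponential_hazards(times, events, cut_points):
    """Estimated constant hazard within each interval defined by cut_points.

    Intervals are (0, c1], (c1, c2], ..., (ck, infinity).
    Each hazard is events in the interval divided by person-time in the interval.
    """
    edges = [0.0] + list(cut_points) + [float("inf")]
    hazards = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Person-time each patient contributes inside (lo, hi].
        exposure = sum(max(0.0, min(t, hi) - lo) for t in times)
        n_events = sum(1 for t, e in zip(times, events) if e == 1 and lo < t <= hi)
        hazards.append(n_events / exposure if exposure > 0 else float("nan"))
    return hazards


# Hypothetical toy data (months); not from any real trial.
control_times, control_events = [2, 3, 5, 7, 8, 10], [1, 1, 0, 1, 1, 0]
treated_times, treated_events = [1, 2, 9, 10, 10, 10], [1, 1, 1, 0, 0, 0]
cuts = [6]   # assumed boundary between the "early" and "late" periods

h_control = piecewise_exponential_hazards(control_times, control_events, cuts)
h_treated = piecewise_exponential_hazards(treated_times, treated_events, cuts)
for label, h_t, h_c in zip(["early", "late"], h_treated, h_control):
    print(label, "hazard ratio (treated / control):", h_t / h_c)
```

Reporting one hazard ratio per period makes an early versus late difference in treatment effect visible, which a single overall hazard ratio would average away. The cut points should be pre-specified rather than chosen after looking at the curves.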

A presentation by Kaur et al, "Analytical Methods Under Non-Proportional Hazards: A Dilemma of Choice", also describes a range of methods for this situation.

Earlier this year, Mehrotra and West published a paper describing their proposed method (5-STAR) to handle the heterogeneity of the patient population and potential non-proportional hazards (Lin et al (2021) Survival Analysis Using a 5-Step Stratified Testing and Amalgamation Routine (5-STAR) in Randomized Clinical Trials):

"The power of the ubiquitous logrank test for a between-treatment comparison of survival times in randomized clinical trials can be notably less than desired if the treatment hazard functions are non-proportional, and the accompanying hazard ratio estimate from a Cox proportional hazards model can be hard to interpret. Increasingly popular approaches to guard against the statistical adverse effects of non-proportional hazards include the MaxCombo test (based on a versatile combination of weighted logrank statistics) and a test based on a between-treatment comparison of restricted mean survival time (RMST). Unfortunately, neither the logrank test nor the latter two approaches are designed to leverage what we refer to as structured patient heterogeneity in clinical trial populations, and this can contribute to suboptimal power for detecting a between-treatment difference in the distribution of survival times. Stratified versions of the logrank test and the corresponding Cox proportional hazards model based on pre-specified stratification factors represent steps in the right direction. However, they carry unnecessary risks associated with both a potential suboptimal choice of stratification factors and with potentially implausible dual assumptions of proportional hazards within each stratum and a constant hazard ratio across strata.
We have developed and described a novel alternative to the aforementioned current approaches for survival analysis in randomized clinical trials. Our approach envisions the overall patient population as being a finite mixture of subpopulations (risk strata), with higher to lower ordered risk strata comprised of patients having shorter to longer expected survival regardless of treatment assignment. Patients within a given risk stratum are deemed prognostically homogeneous in that they have in common certain pre-treatment characteristics that jointly strongly associate with survival time. Given this conceptualization and motivated by a reasonable expectation that detection of a true treatment difference should get easier as the patient population gets prognostically more homogeneous, our proposed method follows naturally. Starting with a pre-specified set of baseline covariates (Step 1), elastic net Cox regression (Step 2) and a subsequent conditional inference tree algorithm (Step 3) are used to segment the trial patients into ordered risk strata; importantly, both steps are blinded to patient-level treatment assignment. After unblinding, a treatment comparison is done within each formed risk stratum (Step 4) and stratum-level results are combined for overall estimation and inference (Step 5)."
Non-proportional hazards and the NPH pattern are usually identified only after study unblinding, which makes it challenging to pre-specify the best approach for analyzing time to event data with non-proportional hazards. The safest way is to pre-specify both the log-rank test and the Cox proportional hazards regression. If the proportional hazards assumption is violated, the p-value from the log-rank test will be used as the measure of significance. One can also pre-specify the Max-Combo method as the primary method regardless of whether the proportional hazards assumption holds.