On Biostatistics and Clinical Trials: Non-proportional Hazards: how to analyze the time-to-event data?

Time-to-event data is one of the most common data types in clinical trials. Traditionally, the log-rank test is used to compare the survival curves of two treatment groups; the Kaplan-Meier survival plot is used to illustrate the totality of the time-to-event kinetics, including the estimated median survival time; and the Cox proportional hazards model is employed to provide an estimate of the relative effect (i.e., the hazard ratio) between treatment arms. The performance of these analyses largely depends on the proportional hazards (PH) assumption: that the hazard ratio is constant over time. In other words, a single hazard ratio is used to summarize the relative treatment effect over the whole follow-up period.
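To make the Kaplan-Meier piece concrete, here is a minimal pure-Python sketch of the product-limit estimate and the median survival time read off it. The function names and the five-subject data set are made up for illustration; in practice one would use SAS PROC LIFETEST or an established R or Python survival package.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, S(t)) steps.

    times:  follow-up time for each subject
    events: 1 if the event was observed, 0 if the subject was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # events and censorings both leave the risk set
    n_at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths[t]
        if d > 0:
            # product-limit step; subjects censored at t still count as at risk at t
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving[t]
    return curve


def km_median(curve):
    """Smallest event time with S(t) <= 0.5, or None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical toy data: 5 subjects, two of them censored
times = [1, 2, 2, 3, 4]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in curve])  # [(1, 0.8), (2, 0.6), (3, 0.3)]
print(km_median(curve))  # 3
```

If fewer than half of the subjects have an event, `km_median` returns `None`, which mirrors the familiar "median not reached" in trial reports.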

Before time-to-event data are analyzed, statisticians typically check the proportional hazards assumption. Various methods can be used to check it; see a previous post, "Visual Inspection and Statistical Tests for Proportional Hazard Assumption".

Recently, we have seen more examples of time-to-event data that do not follow the proportional hazards assumption, especially in immuno-oncology clinical trials.

It is not the end of the world if the proportional hazards assumption is violated; various approaches have been proposed to handle time-to-event data with non-proportional hazards.

In practice, it is common for the statistical analysis plan to prespecify the log-rank test for calculating the p-value and the Cox proportional hazards regression model for calculating the hazard ratio, its 95% confidence interval, and a p-value. I call this "splitting the p-value and the estimate of the treatment difference". Two different p-values will then be calculated: one from the log-rank test and one from the Cox regression. If the proportional hazards assumption is met, it is better to use the p-value from the Cox regression, since the estimate, confidence interval, and p-value then all come from the same model. However, when the proportional hazards assumption is violated, the Cox proportional hazards model may no longer be the optimal approach to determine the treatment effect, and the Kaplan-Meier estimate of median survival may not be the most valid measure to summarize the results.
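The two p-values can differ even on the same data. The sketch below, assuming a single 0/1 treatment indicator and the Breslow approximation for tied event times (both simplifications chosen for brevity), computes the log-rank Z statistic and p-value and fits the corresponding one-covariate Cox model by Newton-Raphson, returning the hazard ratio, a Wald 95% confidence interval, and a Wald p-value. The toy data and all function names are hypothetical; this illustrates the calculations and is not a validated implementation.

```python
import math


def event_table(times, events, arm):
    """One row per distinct event time: (d, d1, n0, n1).

    d = events at t, d1 = events at t in arm 1,
    n0 / n1 = subjects still at risk (time >= t) in arm 0 / arm 1.
    """
    rows = []
    for t in sorted({x for x, e in zip(times, events) if e == 1}):
        n0 = sum(1 for x, g in zip(times, arm) if x >= t and g == 0)
        n1 = sum(1 for x, g in zip(times, arm) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(1 for x, e, g in zip(times, events, arm) if x == t and e == 1 and g == 1)
        rows.append((d, d1, n0, n1))
    return rows


def logrank(times, events, arm):
    """Two-sided log-rank test: returns (Z, p-value)."""
    o_minus_e = var = 0.0
    for d, d1, n0, n1 in event_table(times, events, arm):
        n = n0 + n1
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in arm 1
        if n > 1:
            var += d * (n1 / n) * (n0 / n) * (n - d) / (n - 1)  # hypergeometric variance
    z = o_minus_e / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


def cox_score_info(table, beta):
    """Score and information of the Cox partial likelihood (Breslow ties), 0/1 covariate."""
    score = info = 0.0
    for d, d1, n0, n1 in table:
        # exp(beta)-weighted share of arm 1 in the risk set
        p = n1 * math.exp(beta) / (n0 + n1 * math.exp(beta))
        score += d1 - d * p
        info += d * p * (1 - p)
    return score, info


def cox_binary(times, events, arm, tol=1e-10, max_iter=50):
    """Newton-Raphson fit; returns (HR, Wald 95% CI, Wald p-value)."""
    table = event_table(times, events, arm)
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(table, beta)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1 / math.sqrt(cox_score_info(table, beta)[1])  # information at the final estimate
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return math.exp(beta), ci, math.erfc(abs(beta / se) / math.sqrt(2))


# Hypothetical toy data: arm 0 = control, arm 1 = experimental; events: 1 = observed, 0 = censored
times = [2, 3, 4, 5, 6, 8, 9, 11, 12, 15, 3, 5, 7, 9, 10, 13, 14, 16, 18, 20]
events = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
arm = [0] * 10 + [1] * 10
print("log-rank Z, p:", logrank(times, events, arm))
print("Cox HR, 95% CI, p:", cox_binary(times, events, arm))
```

One useful sanity check on this design: with no tied event times, the Cox score statistic at beta = 0 equals the log-rank Z, so the two p-values diverge mainly because the Cox p-value is usually reported from the Wald (or likelihood-ratio) statistic and because of how ties are handled.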

A website post, "Testing equality of two survival distributions: log-rank/Cox versus RMST", states:

“One thing to note is that the log-rank test does not assume proportional hazards per se. It is a valid test of the null hypothesis of equality of the survival functions without any assumptions (save assumptions regarding censoring). It is however most powerful for detecting alternative hypotheses in which the hazards are proportional.”

It is true that the log-rank test does not depend on the proportional hazards assumption. The log-rank test remains a valid test of the null hypothesis of equal survival functions without further assumptions (apart from those about censoring), even though it may not be the most powerful test under non-proportional hazards.

At a public workshop, "Oncology Clinical Trials in the Presence of Non-Proportional Hazards", organized by Duke in 2018, Dr. Rajeshwari Sridhara from the Division of Biometrics V, CDER/FDA, stated (at 40:45 of the YouTube video) that in non-proportional hazards situations, FDA is OK with presenting the p-value from the log-rank test together with the hazard ratio as the measure of the treatment difference.

At the same workshop, the NPH working group from the ASA Biopharmaceutical Section Regulatory-Industry Statistics Workshop presented its work and proposed the max-combo test as an alternative method for non-proportional hazards. The max-combo test is based on Fleming-Harrington (FH) weighted log-rank statistics. It tackles some of the challenges of non-proportional hazards: it robustly handles a range of non-proportional hazard types, can be pre-specified at the design stage, and selects the most appropriate weight adaptively while still controlling the family-wise Type I error. The max-combo test design is also described in the workshop summaries.

Knezevic and Patil have a paper describing a SAS macro to perform the max-combo test (combination weighted log-rank tests): "Combination weighted log-rank tests for survival analysis with non-proportional hazards" (SAS Global Forum 2020).
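The idea behind the max-combo test can be sketched in a few lines. Each FH(rho, gamma) statistic weights the log-rank "observed minus expected" terms by w(t) = S(t-)^rho * (1 - S(t-))^gamma, where S is the pooled Kaplan-Meier estimate just before t; FH(0,0) is the ordinary log-rank test, FH(1,0) emphasizes early differences and FH(0,1) late ones. The max-combo statistic is the largest standardized statistic, and its p-value comes from the joint multivariate normal distribution of the correlated statistics. In this hypothetical sketch, the four (rho, gamma) pairs, the two-sided max |Z| form, and the Monte Carlo approximation of the multivariate normal probability (used only to stay within the Python standard library) are all assumptions made for illustration. Because the FH(0,0) weight equals the sum of the FH(0,1) and FH(1,0) weights, the covariance matrix is singular, so the Cholesky factorization is written to tolerate a rank-deficient matrix.

```python
import math
import random

# (rho, gamma) pairs for the Fleming-Harrington weights: an assumed, commonly used set
FH_PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def event_table(times, events, arm):
    """Rows (d, d1, n0, n1) at each distinct event time, as for a log-rank test."""
    rows = []
    for t in sorted({x for x, e in zip(times, events) if e == 1}):
        n0 = sum(1 for x, g in zip(times, arm) if x >= t and g == 0)
        n1 = sum(1 for x, g in zip(times, arm) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(1 for x, e, g in zip(times, events, arm) if x == t and e == 1 and g == 1)
        rows.append((d, d1, n0, n1))
    return rows


def fh_statistics(times, events, arm, pairs=FH_PAIRS):
    """Weighted O-E sums and their covariance matrix, one entry per FH(rho, gamma)."""
    k = len(pairs)
    num = [0.0] * k
    cov = [[0.0] * k for _ in range(k)]
    s_minus = 1.0  # pooled Kaplan-Meier estimate just before the current event time
    for d, d1, n0, n1 in event_table(times, events, arm):
        n = n0 + n1
        o_minus_e = d1 - d * n1 / n
        v = d * (n1 / n) * (n0 / n) * (n - d) / (n - 1) if n > 1 else 0.0
        w = [s_minus ** rho * (1 - s_minus) ** gamma for rho, gamma in pairs]
        for a in range(k):
            num[a] += w[a] * o_minus_e
            for b in range(k):
                cov[a][b] += w[a] * w[b] * v
        s_minus *= 1 - d / n  # update the pooled Kaplan-Meier estimate
    return num, cov


def psd_cholesky(m, eps=1e-10):
    """Lower-triangular L with L L^T = m; tolerates a singular (rank-deficient) m."""
    k = len(m)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                low[i][i] = math.sqrt(s) if s > eps else 0.0
            else:
                low[i][j] = s / low[j][j] if low[j][j] > 0 else 0.0
    return low


def max_combo(times, events, arm, n_sim=20000, seed=2021):
    """Returns (Z per FH weight, max |Z|, Monte Carlo two-sided p-value)."""
    num, cov = fh_statistics(times, events, arm)
    k = len(num)
    z = [num[a] / math.sqrt(cov[a][a]) for a in range(k)]
    corr = [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(k)] for a in range(k)]
    low = psd_cholesky(corr)
    observed = max(abs(x) for x in z)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        draw = [sum(low[i][p] * e[p] for p in range(i + 1)) for i in range(k)]
        if max(abs(x) for x in draw) >= observed:
            hits += 1
    return z, observed, hits / n_sim


# Hypothetical toy data: arm 0 = control, arm 1 = experimental
times = [2, 3, 4, 5, 6, 8, 9, 11, 12, 15, 3, 5, 7, 9, 10, 13, 14, 16, 18, 20]
events = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
arm = [0] * 10 + [1] * 10
z, observed, p = max_combo(times, events, arm)
print("Z by FH weight:", [round(x, 2) for x in z])
print("max |Z| =", round(observed, 2), "Monte Carlo p =", p)
```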

The NPH working group has presented or published its work on numerous occasions; here is a list:

Roychoudhury et al (2021) Robust Design and Analysis of Clinical Trials With Nonproportional Hazards: A Straw Man Guidance From a Cross-Pharma Working Group. Statistics in Biopharmaceutical Research.
Lin et al (2020) Alternative Analysis Methods for Time to Event Endpoints Under Nonproportional Hazards: A Comparative Analysis. Statistics in Biopharmaceutical Research.
Anderson & Roychoudhury (2018) Design and Analysis of Clinical Trials in the Presence of Non-Proportional Hazards. JSM 2018.
Roychoudhury & Anderson (2020) Robust Design and Analysis of Clinical Trials with Non-proportional Hazards: Methodology and Implementation with R. RISW 2020.

In addition to the Max-Combo test, there are several other methods for handling the non-proportional hazards situation.

https://ww2.amstat.org/meetings/biopharmworkshop/2018/onlineprogram/ViewPresentation.cfm?file=300719.pdf

RMST (restricted mean survival time): according to a presentation by Lawrence et al from FDA, the idea of restricted mean survival time goes back to Irwin (1949) and was further developed for survival analysis by Uno et al. (2014). RMST is defined as the area under the survival curve up to a time t*, which should be pre-specified for a randomized trial. RMST may be loosely described as the event-free expectancy over the restricted period between randomization and a defined, clinically relevant time horizon t*. RMST analyses are now built into SAS through PROC LIFETEST and PROC RMSTREG; see the paper by Guo and Liang (2019), "Analyzing Restricted Mean Survival Time Using SAS/STAT®".
Piecewise exponential regression allows early and late treatment effects to be compared. It is especially useful when the hazard functions cross. Piecewise exponential regression can be fitted with SAS PROC MCMC and the R package pch.
Estimation via the average hazard ratio (AHR) method of Schemper (2009) and the average regression effect (ARE) method of Xu and O’Quigley (2000): these methods can be implemented using the coxphw package in R, which is described as follows:

This package implements weighted estimation in Cox regression as proposed by Schemper, Wakounig and Heinze (Statistics in Medicine, 2009, doi: 10.1002/sim.3623). Weighted Cox regression provides unbiased average hazard ratio estimates also in case of non-proportional hazards. The package provides options to estimate time-dependent effects conveniently by including interactions of covariates with arbitrary functions of time, with or without making use of the weighting option. For more details we refer to Dunkler, Ploner, Schemper and Heinze (Journal of Statistical Software, 2018, doi: 10.18637/jss.v084.i02).
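Two of the alternatives listed above are easy to compute by hand. The sketch below estimates RMST as the area under each arm's Kaplan-Meier curve up to a pre-specified tau (t* in the text), and fits a saturated piecewise exponential model in which each arm's hazard within an interval is estimated by events divided by person-time, giving one hazard ratio per interval. Function names, the cut points, tau, and the data are hypothetical; standard errors for the RMST difference are omitted, and `piecewise_hr` assumes every arm has at least one event in every interval.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """(event time, S(t)) steps of the Kaplan-Meier estimate; events: 1 = event, 0 = censored."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # events and censorings both leave the risk set
    n_at_risk, surv, curve = len(times), 1.0, []
    for t in sorted(leaving):
        if deaths[t]:
            surv *= 1.0 - deaths[t] / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving[t]
    return curve


def rmst(times, events, tau):
    """Restricted mean survival time: area under the Kaplan-Meier step curve on [0, tau]."""
    area, last_t, last_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += last_s * (t - last_t)
        last_t, last_s = t, s
    return area + last_s * (tau - last_t)


def piecewise_hr(times, events, arm, cuts):
    """Interval-specific hazard ratios (arm 1 vs arm 0) from a saturated piecewise exponential model.

    In each interval (lo, hi] the hazard of each arm is events / person-time,
    and the standard error of the log HR is sqrt(1/d1 + 1/d0).
    """
    results = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        d, pt = [0, 0], [0.0, 0.0]
        for t, e, g in zip(times, events, arm):
            pt[g] += max(0.0, min(t, hi) - lo)    # time at risk inside the interval
            d[g] += int(e == 1 and lo < t <= hi)  # events inside the interval
        hr = (d[1] / pt[1]) / (d[0] / pt[0])
        se = math.sqrt(1 / d[1] + 1 / d[0])
        results.append(((lo, hi), hr, se))
    return results


# Hypothetical toy data: arm 0 = control, arm 1 = experimental
times = [2, 3, 4, 5, 6, 8, 9, 11, 12, 15, 3, 5, 7, 9, 10, 13, 14, 16, 18, 20]
events = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
arm = [0] * 10 + [1] * 10


def by_arm(g):
    """Split the toy data into (times, events) for one arm."""
    return ([t for t, a in zip(times, arm) if a == g],
            [e for e, a in zip(events, arm) if a == g])


t1, e1 = by_arm(1)
t0, e0 = by_arm(0)
print("RMST difference up to tau = 12:", rmst(t1, e1, 12) - rmst(t0, e0, 12))
for (lo, hi), hr, se in piecewise_hr(times, events, arm, [0, 6, float("inf")]):
    print(f"({lo}, {hi}]: HR = {hr:.2f}, 95% CI {hr * math.exp(-1.96 * se):.2f} to {hr * math.exp(1.96 * se):.2f}")
```

Without censoring, RMST reduces to the average of min(T, tau), which is a quick way to check an implementation.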

In a presentation by Kaur et al, "Analytical Methods Under Non-Proportional Hazards: A Dilemma of Choice", several analytical methods for non-proportional hazards were also described.

Mehrotra and Marceau West recently published a paper describing their proposed method (5-STAR) to handle heterogeneity of the patient population and potential non-proportional hazards (Mehrotra and Marceau West (2020) "Survival Analysis Using a 5-Step Stratified Testing and Amalgamation Routine (5-STAR) in Randomized Clinical Trials"):

"The power of the ubiquitous logrank test for a between-treatment comparison of survival times in randomized clinical trials can be notably less than desired if the treatment hazard functions are non-proportional, and the accompanying hazard ratio estimate from a Cox proportional hazards model can be hard to interpret. Increasingly popular approaches to guard against the statistical adverse effects of non-proportional hazards include the MaxCombo test (based on a versatile combination of weighted logrank statistics) and a test based on a between-treatment comparison of restricted mean survival time (RMST). Unfortunately, neither the logrank test nor the latter two approaches are designed to leverage what we refer to as structured patient heterogeneity in clinical trial populations, and this can contribute to suboptimal power for detecting a between-treatment difference in the distribution of survival times. Stratified versions of the logrank test and the corresponding Cox proportional hazards model based on pre-specified stratification factors represent steps in the right direction. However, they carry unnecessary risks associated with both a potential suboptimal choice of stratification factors and with potentially implausible dual assumptions of proportional hazards within each stratum and a constant hazard ratio across strata.

We have developed and described a novel alternative to the aforementioned current approaches for survival analysis in randomized clinical trials. Our approach envisions the overall patient population as being a finite mixture of subpopulations (risk strata), with higher to lower ordered risk strata comprised of patients having shorter to longer expected survival regardless of treatment assignment. Patients within a given risk stratum are deemed prognostically homogeneous in that they have in common certain pre-treatment characteristics that jointly strongly associate with survival time. Given this conceptualization and motivated by a reasonable expectation that detection of a true treatment difference should get easier as the patient population gets prognostically more homogeneous, our proposed method follows naturally. Starting with a pre-specified set of baseline covariates (Step 1), elastic net Cox regression (Step 2) and a subsequent conditional inference tree algorithm (Step 3) are used to segment the trial patients into ordered risk strata; importantly, both steps are blinded to patient-level treatment assignment. After unblinding, a treatment comparison is done within each formed risk stratum (Step 4) and stratum-level results are combined for overall estimation and inference (Step 5)."
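Step 5 combines the stratum-level results. The exact weighting used by 5-STAR should be taken from the paper; the sketch below only illustrates the generic idea of amalgamating independent stratum-level results with fixed, pre-specified weights (stratum sample sizes are assumed here). It forms a weighted average of stratum log hazard ratios with its standard error, plus a weighted Z statistic (Stouffer's method). The function name and the stratum-level numbers are hypothetical, not results from any trial.

```python
import math


def amalgamate(estimates, variances, z_stats, weights):
    """Combine independent stratum-level results with fixed, pre-specified weights.

    estimates: stratum-level treatment effects (e.g., log hazard ratios)
    variances: their estimated variances
    z_stats:   stratum-level test statistics, each approximately N(0, 1) under the null
    weights:   non-negative stratum weights (assumed here to be stratum sample sizes)
    """
    total = sum(weights)
    w = [x / total for x in weights]  # normalize so the weights sum to 1
    est = sum(wi * e for wi, e in zip(w, estimates))
    se = math.sqrt(sum(wi ** 2 * v for wi, v in zip(w, variances)))
    # weighted Z: weighted sum of independent N(0, 1) statistics rescaled to unit variance
    z = sum(wi * zi for wi, zi in zip(w, z_stats)) / math.sqrt(sum(wi ** 2 for wi in w))
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return est, se, z, p


# Hypothetical stratum-level results for three risk strata
log_hr = [-0.40, -0.25, -0.10]
var = [0.04, 0.03, 0.05]
z = [e / math.sqrt(v) for e, v in zip(log_hr, var)]
n = [60, 100, 40]
est, se, z_all, p = amalgamate(log_hr, var, z, n)
print(f"combined HR {math.exp(est):.2f}, 95% CI {math.exp(est - 1.96 * se):.2f} "
      f"to {math.exp(est + 1.96 * se):.2f}, p = {p:.3g}")
```

Because patients belong to exactly one risk stratum, the stratum-level statistics are independent, which is what justifies simply adding the weighted variances.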

Non-proportional hazards and the specific NPH pattern are usually identified only after study unblinding, which makes it challenging to pre-specify the best approach for analyzing the time-to-event data. The safest way is to pre-specify both the log-rank test and the Cox proportional hazards regression: if the proportional hazards assumption is violated, the p-value from the log-rank test will be used as the measure of significance. Alternatively, one can pre-specify the max-combo test as the primary method regardless of whether the proportional hazards assumption holds.

Monday, April 05, 2021
