Saturday, August 11, 2018

Splitting p-value and estimate of the treatment difference

When we perform a statistical test to compare two treatment groups, we usually construct a test statistic, estimate the treatment difference, and then obtain the p-value that corresponds to that test statistic and treatment difference.

For example, for a study with a primary efficacy endpoint of the proportion of subjects with hemostasis, the treatment difference was estimated using the risk ratio, and the corresponding p-value was calculated to indicate whether the risk ratio was statistically significant.
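To make the "same statistic" idea concrete, here is a minimal sketch in which the risk ratio and its p-value both come from one Wald statistic on the log scale. This is only one common way to do it, not necessarily the method used in the hemostasis study; the function name `risk_ratio_wald` and the counts are hypothetical and chosen purely for illustration.

```python
import math

def risk_ratio_wald(events_trt, n_trt, events_ctl, n_ctl):
    """Risk ratio, approximate 95% CI, and two-sided Wald p-value (log scale)."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    # Standard error of log(RR) from the delta method
    se = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    z = math.log(rr) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci = (math.exp(math.log(rr) - 1.96 * se), math.exp(math.log(rr) + 1.96 * se))
    return rr, ci, p_value

# Hypothetical counts: 45/60 subjects with hemostasis on treatment, 30/60 on control
rr, ci, p = risk_ratio_wald(45, 60, 30, 60)
print(f"RR = {rr:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p:.4f}")
```

Because the estimate, the confidence interval, and the p-value are all built from the same log(RR) and standard error, they cannot disagree with each other: the confidence interval excludes 1 exactly when the p-value is below 0.05.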


Another example is a study with the 6-minute walk distance (6MWD) as the primary endpoint. The treatment difference was estimated using the least-squares mean difference, and the corresponding p-value was calculated using the analysis of covariance (ANCOVA) approach.








In these examples, the p-value and the estimate of the treatment difference were from the same test statistic. 


However, we also see many examples where two different methods are used for estimating the treatment difference and for calculating the p-value - I call it 'splitting the p-value and the estimate of the treatment difference'. Here are two situations where this splitting occurs.

Analysis of Time to Event: log-rank test for calculating the p-value and proportional hazard model for estimating the hazard ratio 

In clinical trials with a time-to-event endpoint, it is very common to present the Kaplan-Meier estimates and calculate the p-value using the log-rank test - a non-parametric method. The magnitude of the treatment effect is usually measured using the hazard ratio. The Kaplan-Meier method does not give an estimate of the hazard ratio, so the hazard ratio needs to be estimated using the proportional hazard model (Cox regression model).
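The split can be sketched in a few lines of plain Python. The sketch below computes the log-rank p-value from observed minus expected events, and separately fits a Cox model with a single binary treatment covariate (Breslow handling of ties, Newton-Raphson iterations) to get the hazard ratio and its own Wald p-value. The function name `logrank_and_cox` and the data are hypothetical; in practice one would use a validated statistical package rather than this hand-rolled version.

```python
import math

def logrank_and_cox(times, events, group):
    """times: follow-up times; events: 1 = event, 0 = censored;
    group: 1 = treatment, 0 = control."""
    data = list(zip(times, events, group))
    event_times = sorted({t for t, e, _ in data if e == 1})

    # One 2x2 summary per distinct event time:
    # n at risk, n1 at risk on treatment, d events, d1 events on treatment
    tables = []
    for t in event_times:
        n = sum(1 for s, _, _ in data if s >= t)
        n1 = sum(1 for s, _, g in data if s >= t and g == 1)
        d = sum(1 for s, e, _ in data if s == t and e == 1)
        d1 = sum(1 for s, e, g in data if s == t and e == 1 and g == 1)
        tables.append((n, n1, d, d1))

    # Log-rank test: (observed - expected)^2 / variance, chi-square with 1 df
    o_minus_e = sum(d1 - d * n1 / n for n, n1, d, d1 in tables)
    var = sum(d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
              for n, n1, d, d1 in tables if n > 1)
    chi2 = o_minus_e ** 2 / var
    p_logrank = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df

    # Cox partial likelihood (Breslow ties) for one binary covariate
    def score_and_info(beta):
        score = info = 0.0
        for n, n1, d, d1 in tables:
            w = n1 * math.exp(beta)
            pi = w / (w + (n - n1))  # risk-weighted share of the treatment group
            score += d1 - d * pi
            info += d * pi * (1 - pi)
        return score, info

    beta = 0.0
    for _ in range(50):  # Newton-Raphson iterations
        score, info = score_and_info(beta)
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    _, info = score_and_info(beta)
    se = 1 / math.sqrt(info)
    p_wald = math.erfc(abs(beta / se) / math.sqrt(2))

    return {"hazard_ratio": math.exp(beta),
            "p_logrank": p_logrank,
            "p_cox_wald": p_wald}

# Hypothetical data: first six subjects on treatment, last six on control
times = [5, 8, 12, 16, 20, 24, 2, 3, 4, 6, 9, 11]
events = [1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0]
group = [1] * 6 + [0] * 6
result = logrank_and_cox(times, events, group)
print(result)
```

The two p-values in the output are related but not identical. When there are no tied event times, the log-rank statistic equals the score test of this same Cox model evaluated at a log hazard ratio of zero, whereas the Wald p-value is evaluated at the estimated hazard ratio.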

In a study by Sitbon et al, the primary efficacy endpoint was the time to a composite endpoint of death or a complication related to PAH. The hazard ratio and p-value were provided in the primary efficacy table of the paper. However, it should be noted that two different methods were used to calculate the hazard ratio and the p-value. The Statistical Analysis section described the methods as follows:
"In time-to-event analyses, end points were estimated with the use of the Kaplan–Meier method and were analyzed with the use of the log-rank test. Hazard ratios with 99% confidence intervals (for primary and secondary end points) and 95% confidence intervals (for exploratory end points) were estimated with the use of proportional-hazard models."
 

A p-value corresponding to the hazard ratio can also be obtained from the proportional hazard model, but it is usually not presented alongside the p-value from the log-rank test - to avoid confusion about two different p-values.

Why do we present the hazard ratio from one method and the p-value from another method? Why can't we present both the hazard ratio and the corresponding p-value from the proportional hazard model?
   
Wilcoxon Rank Sum Test to calculate the p-value and Hodges-Lehmann method to calculate the difference in median

The Wilcoxon rank sum test is also called the Mann-Whitney U test. It is a non-parametric method commonly used to compare two groups (often described as comparing medians) when the data are not normally distributed. The test converts the original data into ranks, and the p-value is calculated by comparing the sum of ranks between the groups. However, the rank sum is not on the scale of the original measurement, so it has no meaning as a measure of the magnitude of the treatment effect. Therefore, when the Wilcoxon method is used to calculate the p-value, a different method needs to be employed to estimate the treatment difference (magnitude of the treatment effect). The Hodges-Lehmann method is now commonly used to estimate the treatment difference - the location shift between the two treatment groups.
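As a rough sketch of this split, the code below gets the p-value from the Wilcoxon rank sum statistic (normal approximation, no tie or continuity correction) and gets the treatment difference from the Hodges-Lehmann estimator, which is the median of all pairwise differences between the two groups. The function name `wilcoxon_and_hodges_lehmann` and the 6MWD-style numbers are hypothetical and only for illustration.

```python
import math
from statistics import median

def wilcoxon_and_hodges_lehmann(x, y):
    """x: treatment values, y: control values."""
    combined = sorted(x + y)

    # Assign midranks: tied values share the average of their rank positions
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j

    n1, n2 = len(x), len(y)
    w = sum(ranks[v] for v in x)            # rank sum of the treatment group
    mean_w = n1 * (n1 + n2 + 1) / 2         # expected rank sum under H0
    var_w = n1 * n2 * (n1 + n2 + 1) / 12    # variance under H0, ignoring ties
    z = (w - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal approximation

    # Hodges-Lehmann estimate of location shift: median of all pairwise differences
    hl_shift = median(a - b for a in x for b in y)
    return w, p_value, hl_shift

# Hypothetical change in 6MWD (meters) for treatment and control
treatment = [35, 42, 18, 50, 27, 61, 33, 44]
control = [10, 22, -5, 15, 30, 8, 19, 25]
w, p, shift = wilcoxon_and_hodges_lehmann(treatment, control)
print(f"rank sum = {w}, p = {p:.4f}, Hodges-Lehmann shift = {shift}")
```

Note that the rank sum is only used for the p-value, while the reported treatment effect is in the original units (meters here). The Hodges-Lehmann shift is also not, in general, the same number as the simple difference between the two group medians.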

One example is FDA's statistical review for Xermelo (telotristat ethyl) oral tablets in the indication of carcinoid syndrome. The p-value was calculated from the Wilcoxon rank sum test, and the treatment difference (location shift in median) and its confidence interval were calculated using the Hodges-Lehmann method.
The primary efficacy endpoints in studies LX301 and LX303 were analyzed by the blocked 2-sample Wilcoxon rank sum statistic stratified by the baseline urinary 5-HIAA levels (≤ upper limit of normal reference range [ULN], >ULN, and Unknown). Descriptive statistics of the primary endpoints and the Hodges-Lehmann estimator of location shift with its respective CLs were reported for each comparison.
In a paper by Jing et al, "Efficacy and Safety of Oral Treprostinil Monotherapy for the Treatment of Pulmonary Arterial Hypertension: A Randomized, Controlled Trial", the endpoint of 6MWD was analyzed using the Wilcoxon method for the p-values and the Hodges-Lehmann estimator for the treatment effect (location shift in medians).



1 comment:

yellow bridge said...

Do you mean the log-rank test is a non-parametric test that cannot provide a treatment difference, like the Wilcoxon test?

However, the log-rank test does not rank the data; it is based on the difference between the expected number of deaths and the actual number of deaths.