Saturday, April 20, 2019

Hodges-Lehmann estimator of location shift: Median of Differences versus Difference in Medians or Median Difference

The Hodges-Lehmann estimator is often used to estimate the treatment effect when the data are not normally distributed. See my previous posts.
Many journal articles describe the Hodges-Lehmann estimator as an estimate of the difference between two medians (the "median difference").
In a study by Perkins et al., "A Randomized Trial of Epinephrine in Out-of-Hospital Cardiac Arrest":
"The Hodges–Lehmann method was used to estimate median differences with 95% confidence intervals for length-of-stay outcomes"
In a study by Devinsky et al., "Trial of Cannabidiol for Drug-Resistant Seizures in the Dravet Syndrome":
"Analysis of the primary end point was performed with the use of a Wilcoxon rank-sum test. An estimate of the median difference between cannabidiol and placebo, together with the 95% confidence interval, was calculated with the use of the Hodges–Lehmann approach. Sensitivity analyses of this primary end point were prespecified in the trial protocol and statistical analysis plan"
Similarly, the Hodges-Lehmann estimator was used to estimate the treatment effect in licensure trials:

FDA Clinical/Statistical Review for Vascepa (icosapent ethyl) for reduction of triglycerides in patients with very high triglycerides
The median differences between the treatment groups and 95% CIs were estimated with the Hodges-Lehmann method. P-value is from the Wilcoxon rank-sum test.
FDA Statistical review for RLY5016 for Oral Suspension (Veltassa) for Hyperkalemia
To compare Veltassa with placebo, the difference between the mean ranks was tested using a two-sided t-test. The difference and 95% CI between the treatment groups in median change from baseline was estimated using a Hodges-Lehmann estimator.
FDA Medical Review of Oral Treprostinil for Pulmonary Arterial Hypertension
The magnitude of the treatment effects was defined by the Hodges-Lehmann method to estimate the median difference between treatment groups for the change from baseline in 6MWD.
It sounds as if we have found a way to estimate the difference in medians when the data are not normally distributed. However, a look at how the Hodges-Lehmann estimator is calculated shows that it is not accurate to describe it as the difference in medians. It is actually an estimator of the location shift (the term originally used by the authors), which is the median of all pairwise differences between the two groups (further explained below).
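
To make the distinction concrete, the two quantities can be written side by side. The notation is an assumption introduced here for illustration and is not taken from the cited papers: $a_1, \dots, a_m$ are the Group A observations and $b_1, \dots, b_n$ are the Group B observations.

```latex
\hat{\Delta}_{\mathrm{HL}} = \operatorname{median}\{\, a_i - b_j : i = 1,\dots,m;\ j = 1,\dots,n \,\}
\qquad\text{versus}\qquad
\hat{\Delta}_{\mathrm{med}} = \operatorname{median}\{a_i\} - \operatorname{median}\{b_j\}
```

The first takes the median after forming all $m \times n$ pairwise differences; the second takes each group's median before subtracting. The two can coincide for a particular data set, but they are not the same quantity in general.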

Let's check how medians are calculated using a very simple example:

Median and the difference in Medians:

| | Group A | Group B |
|---|---|---|
| Original Measures | 4, 7, 5, 3, 6 | 3, 2, 5, 1, 4 |
| Rank the original measures in order | 3, 4, 5, 6, 7 | 1, 2, 3, 4, 5 |
| Median | 5 | 3 |
| The difference in Medians (A-B) | 2 | |

Hodges-Lehmann Estimator of Location Shift (median of differences)

| | Group A | Group B |
|---|---|---|
| Original Measures | 4, 7, 5, 3, 6 | 3, 2, 5, 1, 4 |
| Rank the original measures in order | 3, 4, 5, 6, 7 | 1, 2, 3, 4, 5 |

Each number in Group A is compared to each number in Group B:

3 is compared to numbers in Group B:    2, 1, 0, -1, -2
4 is compared to numbers in Group B:    3, 2, 1, 0, -1
5 is compared to numbers in Group B:    4, 3, 2, 1, 0
6 is compared to numbers in Group B:    5, 4, 3, 2, 1
7 is compared to numbers in Group B:    6, 5, 4, 3, 2

Rank the differences from these pair comparisons in order:

-2, -1, -1, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6

Hodges-Lehmann estimator of location shift: the median of all these differences. In this case, the Hodges-Lehmann estimator is 2.
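
The pairwise-difference calculation above can also be reproduced by hand in a few lines. This is a minimal sketch in Python rather than SAS, using only the standard library; the variable names are illustrative assumptions, and it computes the point estimate only (no confidence interval).

```python
from itertools import product
from statistics import median

# Data from the simple example above
group_a = [4, 7, 5, 3, 6]
group_b = [3, 2, 5, 1, 4]

# Difference in medians: take each group's median first, then subtract.
diff_in_medians = median(group_a) - median(group_b)

# Hodges-Lehmann location shift: form every A - B difference
# (5 x 5 = 25 pairs), then take the median of those differences.
pairwise_diffs = sorted(a - b for a, b in product(group_a, group_b))
hl_estimate = median(pairwise_diffs)

print(len(pairwise_diffs))  # 25
print(diff_in_medians)      # 2
print(hl_estimate)          # 2
```

With 25 differences, the median is the 13th value in sorted order, which is 2 here.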

The calculations above can be implemented in the following SAS code:

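/* Simple example: five observations per group, read as group/value pairs */
data HodgesLehmann;
  input group $ number @@;
  datalines;
  A 3 A 4 A 5 A 6 A 7
  B 1 B 2 B 3 B 4 B 5
;
run;

/* Median of each group */
proc means data=hodgeslehmann median maxdec=0;
   class group;
   var number;
run;

/* The HL option requests the Hodges-Lehmann estimate of location shift */
proc npar1way data=hodgeslehmann hl;
   class group;
   var number;
run;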

The Hodges-Lehmann estimate of the location shift is confirmed to be 2. In this example, it is exactly the same as the difference between the two medians (5 - 3 = 2).

However, in many situations, the Hodges-Lehmann estimate of the location shift will differ from the difference between the two medians. The Hodges-Lehmann estimator should really be called the median of differences between the two groups, or the location shift (the term the original authors used).

The example below shows that the Hodges-Lehmann estimate of the location shift can be very different from the difference between the two medians.



| | Group A | Group B |
|---|---|---|
| Original Measures | 50.6, 39.2, 35.2, 17.0, 11.2, 14.2, 24.2, 37.4, 35.2 | 38.0, 18.6, 23.2, 19.0, 6.6, 16.4, 14.4, 37.6, 24.4 |
| Rank the original measures in order | 11.2, 14.2, 17.0, 24.2, 35.2, 35.2, 37.4, 39.2, 50.6 | 6.6, 14.4, 16.4, 18.6, 19.0, 23.2, 24.4, 37.6, 38.0 |
| Median | 35.2 | 19.0 |
| The difference in Medians (A-B) | 16.2 | |


data HodgesLehmann2;                   
   input Group $ number@@;
   datalines;
A 50.6
A 39.2
A 35.2
A 17.0
A 11.2
A 14.2 
A 24.2 
A 37.4 
A 35.2 
B 38.0 
B 18.6 
B 23.2 
B 19.0 
B 6.6 
B 16.4 
B 14.4 
B 37.6 
B 24.4
;
run;

proc means data=hodgeslehmann2 median maxdec=1;
  class group;
  var number;
run;

proc npar1way data=hodgeslehmann2 hl;
  class group;
  var number;
run;

As illustrated above, the Hodges-Lehmann estimate of the location shift is 7.8, whereas the difference between the two medians is 35.2 - 19.0 = 16.2 (the median for Group A is 35.2 and the median for Group B is 19.0).
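
The two numbers quoted above can be cross-checked outside SAS. This is a minimal sketch in Python, standard library only; `hodges_lehmann_shift` is a hypothetical helper name introduced here, not a library function, and it returns the point estimate only.

```python
from itertools import product
from statistics import median


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences x_i - y_j (hypothetical helper)."""
    return median(a - b for a, b in product(x, y))


# Data from the second example above
group_a = [50.6, 39.2, 35.2, 17.0, 11.2, 14.2, 24.2, 37.4, 35.2]
group_b = [38.0, 18.6, 23.2, 19.0, 6.6, 16.4, 14.4, 37.6, 24.4]

# Difference in medians: summarize each group first, then subtract.
diff_in_medians = median(group_a) - median(group_b)

# Hodges-Lehmann: median of the 9 x 9 = 81 pairwise differences.
hl_estimate = hodges_lehmann_shift(group_a, group_b)

# Round to one decimal to hide floating-point noise in the subtraction.
print(round(diff_in_medians, 1))  # 16.2
print(round(hl_estimate, 1))      # 7.8
```

The median of the 81 differences is the 41st value in sorted order, which comes from the pair 24.2 - 16.4 = 7.8. That is less than half of the 16.2 obtained by subtracting the group medians.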

While the Hodges-Lehmann estimator is often used to measure the treatment difference when the data are not normally distributed, we need to understand how it is calculated and that it can be very different from the simple difference between two medians.

2 comments:

yellow bridge said...

That is the best way to describe and illustrate the difference

Helmut Schütz said...

Hi Dr. Deng,

THX for clarifying an all too common misconception. However, assessing the shift in location is only unbiased if distributions are identical. An alternative not requiring identical distributions was proposed by Brunner and Munzel in 2000: Brunner E, Munzel U. The Nonparametric Behrens‐Fisher Problem: Asymptotic Theory and a Small‐Sample Approximation. Biom. J. 2000; 42(1): 17–25. doi:10.1002/(SICI)1521-4036(200001)42:1%3C17::AID-BIMJ17%3E3.0.CO;2-U.

Best regards,
Helmut