Sunday, May 16, 2010

Hodges-Lehmann Estimator

According to Wikipedia, "the Hodges–Lehmann estimator is a method of robust estimation. The principal form of this estimator is used to give an estimate of the difference between the values in two sets of data. If the two sets of data contain m and n data points respectively, m × n pairs of points (one from each set) can be formed and each pair gives a difference of values. The Hodges–Lehmann estimator for the difference is defined as the median of the m × n differences.
A second type of estimate which has also been called by the name "Hodges–Lehmann" relates to defining a location estimate for a single dataset. In this case, if the dataset contains n data points, it is possible to define n(n + 1)/2 pairs within the data set, allowing each item to pair with itself. The average value is calculated for each pair and the final estimate of location is the median of the n(n + 1)/2 averages. (Note that the two-sample Hodges–Lehmann estimator does not estimate the difference of the means or the difference of the medians (it estimates the median of the differences, which, if the underlying distributions are asymmetric, is a different quantity), while the one-sample Hodges–Lehmann estimator does not estimate either the mean or the median.)"
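To make the two definitions above concrete, here is a minimal Python sketch using only the standard library. The function names (hl_two_sample, hl_one_sample) and the Tmax values are made up for illustration; they do not come from any study or package.

```python
from itertools import product
from statistics import median


def hl_two_sample(x, y):
    """Two-sample Hodges-Lehmann estimate: median of all m*n differences y_j - x_i."""
    return median(b - a for a, b in product(x, y))


def hl_one_sample(x):
    """One-sample Hodges-Lehmann estimate: median of the n(n+1)/2 pairwise
    averages, where each item is also paired with itself."""
    n = len(x)
    return median((x[i] + x[j]) / 2 for i in range(n) for j in range(i, n))


# Hypothetical Tmax values (hours) for a reference and a test formulation.
reference = [1.0, 1.5, 2.0, 2.0]
test = [1.5, 2.0, 3.0, 4.0]

print(hl_two_sample(reference, test))      # median of the 16 differences: 1.0
print(median(test) - median(reference))    # difference in medians: 0.75
print(hl_one_sample([1, 2, 4]))            # median of the 6 pairwise averages: 2.25
```

With these made-up numbers the median of the differences (1.0) is not the same as the difference in medians (0.75), which is the point made in the note at the end of the quotation.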

I first heard of this estimator in a pharmacokinetic bioequivalence study where we had to compare Tmax between two groups. Typically, Tmax does not need to be compared between treatment groups, since bioequivalence is usually based on AUC (area under the plasma concentration-time curve) and/or Cmax (maximum concentration). Assessment of Tmax was mandatory only if
  • either a clinical claim was made (e.g., rapid onset like for some analgesics),
  • or based on safety grounds (e.g., IR nifedipine).

Tmax is the time to reach the maximum concentration (Cmax) after drug administration. Tmax data certainly do not follow a normal distribution: they usually take only a few values, namely the pre-specified sampling time points (how many depends on the number of time points specified for obtaining the PK profile). In this case, a distribution-free nonparametric method needs to be used, and the Hodges-Lehmann estimator fits this situation. In addition to Tmax, the Hodges-Lehmann estimator can also be used to estimate the difference in Thalf (t1/2).

In the old days, we had to write the SAS program ourselves. In the latest version, SAS 9.2, Proc NPAR1WAY can be used to calculate the Hodges-Lehmann estimator and its confidence interval. See "Hodges-Lehmann Estimation of Location Shift" for details about the calculation, and the "Hodges-Lehmann Estimation" example on the SAS website.

With the HL option and the EXACT HL statement in SAS Proc NPAR1WAY, the Hodges-Lehmann estimate of location shift can be obtained, along with its confidence intervals (asymptotic (Moses) for large samples and exact for small samples). However, the procedure does not provide a p-value for the estimate. The p-value may be obtained from the Wilcoxon rank sum test.
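For readers without SAS, the asymptotic (Moses) interval can be sketched in a few lines of Python. This is my reading of the usual large-sample formula, not the SAS implementation: sort the m × n differences, compute C = floor(mn/2 - z * sqrt(mn(m + n + 1)/12)), and take the C-th smallest and C-th largest differences as the limits. The function name hl_moses_ci and the data are hypothetical, and the rounding of C is an assumption that may differ between software packages.

```python
from itertools import product
from math import floor, sqrt
from statistics import NormalDist


def hl_moses_ci(x, y, alpha=0.05):
    """Asymptotic (Moses) confidence limits for the location shift y - x.

    Assumes the large-sample formula with C rounded down; other software
    may round differently, so treat this as a sketch.
    """
    m, n = len(x), len(y)
    diffs = sorted(b - a for a, b in product(x, y))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    c = floor(m * n / 2 - z * sqrt(m * n * (m + n + 1) / 12))
    if c < 1:
        raise ValueError("samples too small for an asymptotic interval at this alpha")
    # C-th smallest and C-th largest of the ordered differences (1-based ranks).
    return diffs[c - 1], diffs[m * n - c]


# Hypothetical Tmax values (hours); far too few for the asymptotic interval
# to be trustworthy, they only show the mechanics.
reference = [1.0, 1.5, 2.0, 2.0]
test = [1.5, 2.0, 3.0, 4.0]
print(hl_moses_ci(reference, test))
```

With samples this small, the exact interval is the appropriate choice; the sketch covers only the large-sample case.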

Also see a newer post regarding "Hodges-Lehmann estimator of location shift: median of differences versus the difference in medians or median difference"

4 comments:

Anonymous said...

Hi,
I was trying to find some information on the Hodges-Lehmann estimator and ran into your blog. Would you be able to shed some light on which value to report in a scientific journal - the HL estimator or the simple median?

I have done hypothesis testing (in MINITAB) using the Wilcoxon signed rank test, which computes the HL estimator. This HL estimator then becomes the "Estimated Median", and is different from the simple median. I am confused about which of these values to report in a scientific journal.

Thanks,
S Giri,
The Ohio State University

Web blog from Dr. Deng said...

This is the same issue as whether to report the mean or the least squares mean (the mean and the LS mean may not be the same).

It might be better to report both (median and HL median).

Anonymous said...

Thank you.

Best regards,
S Giri

Allen Fleishman said...

The Wilcoxon test compares the mean rank of the two groups. Is the H-L estimator, which is the median, related to the difference in mean ranks?