Thursday, December 17, 2015

Median of Differences versus Difference in Medians

Both mean and median are used as location parameters to measure the central tendency of the data. If there is an intervention (such as drug treatment in a clinical trial), the mean and the median for the change from baseline can be used as the point estimate for measuring the magnitude of the effect by the intervention. The statistical test will then be performed to see if the point estimate of the effect is statistically significant or not. 

One mistake people can make is to calculate the difference in medians while the correct way should be to calculate the median of differences. The example below is a typical data presentation for a pre-post study design. The change from baseline will be calculated for each subject. The mean and median will be calculated for change from baseline values across all subjects. One temptation is to calculate the difference in medians as the median for postbaseline - the median for baseline. However, the median of differences and the difference of medians can be very different especially when data is skewed. 

Subject
Baseline
Post
Baseline
Change From Baseline
1
50.6
38
-12.6
2
39.2
18.6
-20.6
3
35.2
23.2
-12
4
17
19
2
5
11.2
6.6
-4.6
6
14.2
16.4
2.2
7
24.2
14.4
-9.8
8
37.4
37.6
0.2
9
35.2
24.4
-10.8





Mean
29.36
22.02
-7.33
Median
35.2
19
-9.8

The median of differences is calculated as the 50th percentile of all individual differences (change from baseline). The Median of differences (the last column) is -9.8. However, the difference in medians = Median of Postbaseline Measures – Median of Baseline Measures = 19 – 35.2 = 16.2

The median of differences (-9.8) and the difference in medians (-16.2) are quite different especially for skewed data.

The median of differences is the correct number to be used and is the number that corresponding to the signed rank test.

It would be ok if we do this for mean. The mean of differences is equal to the difference in means, i.e., -7.33 = 22.02 (mean for postbaseline) – 29.36 (mean for baseline). However, if we need to perform a statistical test such as the paired t-test, the numbers in the last column for change from baseline should be the basis. 

Suppose we have "change from baseline" for two treatment groups, we would need to calculate the median for each treatment group in the same way as above. For treatment comparison, we may use the non-parametric Wilcoxon rank-sum test and calculate the magnitude of the difference in medians using the Hodges-Lehmann estimator.  Hodges Lehmann's estimation of location shift can be calculated in SAS using Proc NPAR1WAY.


Thursday, December 03, 2015

Dose Response Modeling: calculating EC50, ED50 by Fitting Emax Model using SAS Proc NLIN

Dose response data can come from the laboratory test, pre-clinical, and clinical. The responses can be the assay results, fluorescence output, cell counts, hormone concentrations, efficacy measures.
This type of dose response data can be analyzed using model-based approach - assuming a functional relationship between the response and the dose following a pre-specified parametric model. There are many different models used to characterize a dose-response: linear, quadratic, orthogonal polynomials, exponential, linear in log-dose, Emax. If the response is discreet or dichotomous (success/failure, survival/death,...), it is called quantal response. The different set of models will need to be used such as probit, logit, ordinal logistic, and extreme value (or gompit) regression models that can be easily fit using SAS Proc Probit

For continuous response data, one of the most common parametric model is Emax model. There is a 3-parameter Emax model by fitting the dose response function g(D)
where E0 is the response Y at baseline (absence of dose), Emax is the asymptotic maximum dose effect (maximum effect attributable to the drug) and ED50 is the dose which produces 50% of the maximal effect. A generalization is the 4-parameter Emax model for

where 
is the 4th parameter which is sometimes called the Hill parameter or Hill factor, or slope factor. The Hill parameter affects the shape of the curve and is in some cases very difficult to estimate.

The Emax model may be referred as three-parameter logistic model and four-parameter logistic model, or simply three-parameter model and four-parameter model. 

SAS Proc NLIN can be used to fit the Emax model (three-parameter or four-parameter). We can use the data in an online paper "How can I generate a dose response curve in SAS?". The concentration-response is similar to the dose-response from the modeling purpose. In the data below, for some concentration level, there are two duplicates.  

Concentration Response
0.1 21.125
0.1 20.575
0.25 40.525
0.5 26.15
0.75 26.35
0.75 44.275
1 49.725
1 63.6
10 49.35
10 68.875
100 58.025
100 58.075
1000 68.025
1000 52.3
We can read the data into SAS data set as following:

data dr;
input concentration response;
datalines;
.1 21.125
.1 20.575
.25 40.525
.5 26.15
.75 26.35
.75 44.275
1 49.725
1 63.6
10 49.35
10 68.875
100 58.025
100 58.075
1000 68.025
1000 52.3
;

In order to fit the four parameter Emax model above, we will need to provide the initial values for all four parameters. The initial values provided do not need to be precise. Usually, the same results can be obtained with different initial values. However, we want to provide the initial values close to the data. For example, from the data set above, we can choose the min as the E0 (minimum response),  max as Emax (maximum response), a median dose as ED50 (dose corresponding to 50% of Emax). For the fourth parameter (Hill slope), we can run a simple linear regression to obtain an initial slope. It is not critical if the concentration response really follows the linear relationship. The purpose here is just to obtain an initial Hill slope value for the non linear model.

If we run the following simple regression, we will get a slope of 0.01831 and we can use this value as the initial value for the fourth parameter.

proc reg data=dr;
  model response=concentration;
run;   

From the data set, we now have the initial values for all four parameters: E0 = 20.575, Emax = 68.875, ED50 = 1, and slope factor = 0.01831. Again, these initial values do not have to be very accurate.

We can then run the following SAS program to fit the non linear (four parameter Emax) model described above:

proc nlin data = dr method=marquardt;
  parms E0 = 20.575 Emax = 68.875 ED50 = 1 hill = 0.01831;
  model response = Emax + (E0 * concentration**hill) / (ED50**hill + concentration**hill);
run;

From the outputs, we will get an estimate of ED50 = 0.8171.

Notice that in online paper "How can I generate a dose response curve in SAS?", a different four parameter model was presented. However, if we fit the nonlinear model, we will get the same estimate of ED50. The four-parameter model was written as:
The SAS program can be written as:

proc nlin data = dr method=marquardt;
  parms E0 = 20.575 Emax = 68.875 ED50 = 1 hill = 0.01831;
  model response = E0 + (Emax - E0) / (1 + (concentration / ED50)**hill);
run;

Reference/Further reading: