Sunday, April 05, 2009

Least squares means (marginal means) vs. means


If you work with SAS, you have probably heard and used the term 'least squares means' many times. Least squares means (LS means) are really a piece of SAS jargon; outside SAS they are usually called marginal means (or estimated marginal means, EMMs). In an analysis of covariance model, they are the group means after controlling for a covariate, i.e., holding the covariate constant at some typical value, such as its mean.
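To make the covariate adjustment concrete, here is a minimal Python sketch of the simplest ANCOVA case: two groups, one covariate, and a common slope. The data are hypothetical and have nothing to do with Table 1. Each group's raw mean is moved along the pooled within-group regression line to the overall covariate mean, which is how adjusted means are formed for this particular model.

```python
from statistics import mean

# Hypothetical data, made up for illustration: (group, covariate x, response y).
rows = [
    ("A", 1.0, 2.0), ("A", 2.0, 3.0), ("A", 3.0, 4.0),
    ("B", 4.0, 4.0), ("B", 5.0, 5.0), ("B", 6.0, 6.0),
]

def ancova_means(rows):
    """Return (raw means, covariate-adjusted means, pooled slope) for the
    one-way ANCOVA model y = group intercept + common slope * x."""
    groups = sorted({g for g, _, _ in rows})
    x_bar_all = mean(x for _, x, _ in rows)  # overall covariate mean
    group_stats = {}
    sxy = sxx = 0.0
    for g in groups:
        xs = [x for gg, x, _ in rows if gg == g]
        ys = [y for gg, _, y in rows if gg == g]
        xb, yb = mean(xs), mean(ys)
        group_stats[g] = (xb, yb)
        # Pool the within-group cross-products to estimate the common slope.
        sxy += sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
        sxx += sum((x - xb) ** 2 for x in xs)
    slope = sxy / sxx
    raw = {g: yb for g, (xb, yb) in group_stats.items()}
    # Shift each group mean along the common slope to the overall covariate mean.
    adjusted = {g: yb - slope * (xb - x_bar_all) for g, (xb, yb) in group_stats.items()}
    return raw, adjusted, slope

raw, adjusted, slope = ancova_means(rows)
print(raw)       # {'A': 3.0, 'B': 5.0}
print(adjusted)  # {'A': 4.5, 'B': 3.5}
```

With these made-up numbers, group A has the lower raw mean (3 vs. 5) but the higher adjusted mean (4.5 vs. 3.5), because group B's advantage comes from its higher covariate values.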

I often find it necessary to use a very simple example to illustrate the difference between LS means and means to my non-statistician colleagues. I made up the data in Table 1 above. There are two treatment groups (treatment A and treatment B), each measured at two centers (Center 1 and Center 2).

The mean for treatment A is simply the sum of all its observations divided by the number of observations (mean for treatment A = 24/5 = 4.8); similarly, the mean for treatment B = 26/5 = 5.2. So the mean for treatment A < the mean for treatment B.
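Since Table 1 is not reproduced here, this minimal Python sketch works from cell totals and counts rather than individual observations. Those totals and counts are reconstructed from the numbers quoted in the text (for example, treatment A at center 2 has total 24 - 9 = 15 over 2 observations, and treatment B needs 2 observations at center 1 and 3 at center 2 to match cell means of 5.5 and 5 with a total of 26 over 5); they are the only values consistent with those numbers.

```python
# Per-cell (total, count), reconstructed from the numbers quoted in the post.
cells = {
    ("A", 1): (9.0, 3),
    ("A", 2): (15.0, 2),
    ("B", 1): (11.0, 2),
    ("B", 2): (15.0, 3),
}

def raw_mean(cells, trt):
    """Ordinary mean: all observations for a treatment pooled, centers ignored."""
    total = sum(s for (t, _), (s, n) in cells.items() if t == trt)
    count = sum(n for (t, _), (s, n) in cells.items() if t == trt)
    return total / count

print(raw_mean(cells, "A"), raw_mean(cells, "B"))  # 4.8 5.2
```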

Table 2 shows the calculation of least squares means. The first step is to calculate the mean for each treatment-by-center cell: 9/3 = 3 for treatment A at center 1; 7.5 for treatment A at center 2; 5.5 for treatment B at center 1; and 5 for treatment B at center 2.

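Once the cell means are calculated, the least squares mean for each treatment is simply the average of its cell means. For treatment A, the LS mean is (3 + 7.5)/2 = 5.25; for treatment B, it is (5.5 + 5)/2 = 5.25. The LS means for the two treatment groups are identical, even though the raw means differ. The reason is the weighting: the raw mean weights each cell mean by its number of observations (for treatment A, (3×3 + 2×7.5)/5 = 4.8), whereas the LS mean gives each center equal weight.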
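A minimal Python sketch of the Table 2 arithmetic, again working from the cell totals and counts reconstructed from the text. It computes each treatment's LS mean as the plain average of its cell means and, for comparison, the count-weighted average of the same cell means, which gives back the ordinary means.

```python
# Per-cell (total, count), reconstructed from the numbers quoted in the post.
cells = {
    ("A", 1): (9.0, 3),
    ("A", 2): (15.0, 2),
    ("B", 1): (11.0, 2),
    ("B", 2): (15.0, 3),
}

# Step 1: mean of each treatment-by-center cell.
cell_means = {key: total / n for key, (total, n) in cells.items()}

def ls_mean(trt):
    """Unweighted average of the treatment's cell means (equal weight per center)."""
    means = [m for (t, _), m in cell_means.items() if t == trt]
    return sum(means) / len(means)

def count_weighted_mean(trt):
    """Cell means weighted by cell counts; this reproduces the ordinary mean."""
    pairs = [(cell_means[k], n) for k, (_, n) in cells.items() if k[0] == trt]
    return sum(m * n for m, n in pairs) / sum(n for _, n in pairs)

for trt in ("A", "B"):
    print(trt, ls_mean(trt), count_weighted_mean(trt))
# A 5.25 4.8
# B 5.25 5.2
```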

It is easy to show the calculation of means and LS means by hand in a two-factor example like this one. In clinical trials, however, the statistical model often needs to adjust for multiple factors, including both categorical factors (treatment, center, gender) and continuous covariates (baseline measures), and the LS mean calculation is no longer easy to demonstrate. Nevertheless, LS means should be used when inferential comparisons need to be made. Typically, the means and LS means point in the same direction for a treatment comparison, although their values differ. Occasionally they point in opposite directions (treatment A better than treatment B according to the means, but treatment B better than treatment A according to the LS means).
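For models with more terms, LS means are usually described as predictions from the fitted model, averaged with equal weight over the levels of the other categorical factors, with continuous covariates held at their means. The minimal Python sketch below applies that recipe to the example data under one stated assumption that differs from the hand calculation: a main-effects model (treatment + center, no interaction), fitted by ordinary least squares through the normal equations. The normal equations are accumulated directly from cell counts and totals, which is valid because every observation in a cell shares the same design row. The helper names (design_row, solve, fit_main_effects, ls_means) are hypothetical.

```python
# Per-cell (total, count), reconstructed from the numbers quoted in the post.
cells = {
    ("A", 1): (9.0, 3),
    ("A", 2): (15.0, 2),
    ("B", 1): (11.0, 2),
    ("B", 2): (15.0, 3),
}

def design_row(trt, center):
    """Reference coding: intercept, indicator for treatment B, indicator for center 2."""
    return [1.0, 1.0 if trt == "B" else 0.0, 1.0 if center == 2 else 0.0]

def solve(a, b):
    """Solve the small system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_main_effects(cells):
    """OLS for y = intercept + treatment + center, built from cell counts and totals."""
    p = 3
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for (trt, center), (total, n) in cells.items():
        x = design_row(trt, center)
        for i in range(p):
            xty[i] += x[i] * total          # X'y only needs the cell total
            for j in range(p):
                xtx[i][j] += n * x[i] * x[j]  # X'X only needs the cell count
    return solve(xtx, xty)

def ls_means(beta, treatments=("A", "B"), centers=(1, 2)):
    """Average the model's predictions over centers, giving each center equal weight."""
    result = {}
    for trt in treatments:
        preds = [sum(b * x for b, x in zip(beta, design_row(trt, c))) for c in centers]
        result[trt] = sum(preds) / len(preds)
    return result

beta = fit_main_effects(cells)
print(ls_means(beta))  # both values are 5.0, up to floating-point rounding
```

Under this additive model both LS means come out at 5.0 rather than 5.25. The hand calculation in Table 2 corresponds to a model that also includes the treatment-by-center interaction, where the fitted cell values are just the observed cell means. In other words, LS means depend on the model that is fitted.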

The SAS documentation for PROC GLM has a nice discussion comparing least squares means with means. A short article, "Means vs LS Means and Type I vs Type III Sum of Squares" by Dan, may also help.

18 comments:

yun said...

Yes, you are right about LS means and means. For a balanced design, i.e., if the final data structure is balanced, mean = LS mean. But in general they differ.
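A quick check of this point in Python: keep the post's cell means but give every cell the same number of observations (the count of 2 per cell is hypothetical), and the count-weighted mean and the unweighted average of cell means coincide.

```python
# Cell means from the post; the equal count of 2 per cell is hypothetical (balanced layout).
cell_means = {("A", 1): 3.0, ("A", 2): 7.5, ("B", 1): 5.5, ("B", 2): 5.0}
counts = {key: 2 for key in cell_means}

def raw_mean(trt):
    """Ordinary mean: cell means weighted by cell counts."""
    keys = [k for k in cell_means if k[0] == trt]
    return sum(cell_means[k] * counts[k] for k in keys) / sum(counts[k] for k in keys)

def ls_mean(trt):
    """LS mean: unweighted average of the cell means."""
    keys = [k for k in cell_means if k[0] == trt]
    return sum(cell_means[k] for k in keys) / len(keys)

for trt in ("A", "B"):
    print(trt, raw_mean(trt), ls_mean(trt))
```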

Anonymous said...

Thank you very much for posting this blog. That was exactly the explanation I needed.

Anonymous said...

SAS folk have never understood experimental design. They've always had a bias of coming from the regression side of the coin.

These terms are unnecessary, and as you state, exist only in the minds of SAS.

What you describe is the addition of a second "blocking variable" in a design. You could describe it as a factor in a 2-way ANOVA, or control it out with ANCOVA.

But to make two different terms for something that has already existed for a hundred years or so is SAS being SAS.

Furthermore, when I run a posthoc in JMP for a one-way ANOVA with more than 2 levels, "SAS" gives me LS Means as the group means, just because there's unequal 'n'. This is incorrect. I have to go through and generate descriptives to get the actual group means.

Anonymous said...

Brilliant explanation.

BAten said...

Thanks for this example. Do you have one showing when it is possible to calculate a mean but not an LSM?

白羽 said...

Thx so much! This is exactly what I need!

Anonymous said...

Looks simple. How about a regression model? It seems LS means are defined only for effects, not for covariates. Is that right?
Thanks.

ss said...

Got it. So you mean that, when calculating LS means, we are taking the center effect into account?

Anonymous said...

Thank you. Good explanation!

Joe Locascio said...

Yes, SAS's "LSMeans" are means adjusted for the covariate(s). In an unbalanced factorial ANOVA design, the factors are essentially confounded "covariates", and the LS means adjust for that, giving you an average of cell averages rather than just the marginal means, which are blind to (and confounded with) the other factor(s). (This can be viewed from a regression/general linear model perspective, with categorical factors being dummy coded.) Neither kind of mean is right or wrong; they answer different questions. I typically request both in SAS. You can come up with all kinds of combinations of means, covariate means, and correlations of covariates with the dependent variable, resulting in covariate-adjusted means being in the same or opposite ordinal relation as the raw descriptive means, or in covariate-adjusted means that don't change the descriptive means at all. You can map these things graphically with little group ellipses representing scatterplots and their respective regression lines.

Anonymous said...

Great explanation. Clear, and it incorporates a familiar concept that most folks understand: the calculation of a mean score. Linking a new concept to a familiar concept is a great way to teach. Thanks!

Anonymous said...

Thanks so much, made it so easy to understand!

Anonymous said...

Thank you for this explanation. Simple and easy to understand!

Anonymous said...

Thanks. It helped me a lot.

Anonymous said...

Thank you so much! It really helped me.

Angus McLean said...

Least squares means are used in SAS for bioequivalence parameters such as peak drug concentration (Cmax).

Can you outline in simple terms how it is calculated? Can I do the calculation in Excel? I know that for a balanced study with all subjects completing, it is the geometric mean, but suppose one subject drops out.

Angus McLean said...

Can you outline for me, in the simplest terms, how LS means are calculated in SAS for bioequivalence parameters such as Cmax (peak drug concentration in plasma)? The design to consider is the usual crossover design.

Anonymous said...

To Angus's question:

Please see a separate article "Cookbook SAS Codes for Bioequivalence Test in 2x2x2 Crossover Design"

http://onbiostatistics.blogspot.com/2012/04/cookbook-sas-codes-for-bioequivalence.html