## Sunday, April 05, 2009

### Least squares means (marginal means) vs. means

If you work with SAS, you have probably heard and used the term 'least squares means' very often. Least squares means (LS Means) are largely SAS jargon; elsewhere they are usually called marginal means (or sometimes EMM, estimated marginal means). In an analysis of covariance model, they are the group means after controlling for a covariate (i.e., holding it constant at some typical value, such as its mean).

I often find that it is necessary to use a very simple example to illustrate the difference between LS Means and Means to my non-statistician colleagues. I made up the data in Table 1. There are two treatment groups (treatment A and treatment B) that are measured at two centers (Center 1 and Center 2).

The mean value for treatment A is simply the sum of all measures divided by the total number of observations (mean for treatment A = 24/5 = 4.8); similarly, the mean for treatment B = 26/5 = 5.2. The mean for treatment A is therefore lower than the mean for treatment B.

Table 2 shows the calculation of least squares means. The first step is to calculate the mean for each cell, that is, each treatment and center combination: 9/3 = 3 for treatment A at center 1; 7.5 for treatment A at center 2; 5.5 for treatment B at center 1; and 5 for treatment B at center 2.

After the mean for each cell is calculated, the least squares means are simply the average of these cell means. For treatment A, the LS mean is (3+7.5)/2 = 5.25; for treatment B, it is (5.5+5)/2 = 5.25. The LS means for the two treatment groups are identical.
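The two calculations can be sketched in a few lines of Python. This is only an illustration of the arithmetic above, not how SAS computes LS means internally. The cell sums and counts are reconstructed from the numbers quoted in the text (for example, the counts of 2 and 3 for treatment B are inferred from the total 26/5 and the cell means 5.5 and 5), so treat them as assumptions.

```python
# Cell sums and counts keyed by (treatment, center).
# Reconstructed from the figures in the text; the individual observations
# are not needed for either calculation.
cells = {
    ("A", 1): (9.0, 3),   # 9/3 = 3
    ("A", 2): (15.0, 2),  # 15/2 = 7.5
    ("B", 1): (11.0, 2),  # 11/2 = 5.5
    ("B", 2): (15.0, 3),  # 15/3 = 5
}


def raw_mean(treatment):
    """Ordinary mean: pool every observation for the treatment."""
    total = sum(s for (t, _), (s, n) in cells.items() if t == treatment)
    count = sum(n for (t, _), (s, n) in cells.items() if t == treatment)
    return total / count


def ls_mean(treatment):
    """LS mean as in Table 2: unweighted average of the cell means."""
    cell_means = [s / n for (t, _), (s, n) in cells.items() if t == treatment]
    return sum(cell_means) / len(cell_means)


for trt in ("A", "B"):
    print(trt, "mean =", raw_mean(trt), "LS mean =", ls_mean(trt))
```

The raw mean weights each center by how many observations it contributed, while the LS mean gives each center equal weight. That difference in weighting is the whole reason the two can disagree when the design is unbalanced.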

It is easy to show the calculation of means and LS means in the above table with two factors. In clinical trials, the statistical model often needs to be adjusted for multiple factors, including both categorical factors (treatment, center, gender) and continuous covariates (baseline measures), and the calculation of LS means is then not easy to demonstrate by hand. However, LS means should be used when an inferential comparison needs to be made. Typically, the means and LS means point in the same direction (with different values) for a treatment comparison. Occasionally, they point in different directions (treatment A better than treatment B according to the means; treatment B better than treatment A according to the LS means).

The documentation for SAS procedure GLM has a nice discussion of the comparison of least squares means vs. means. A small article, "Means vs LS Means and Type I vs Type III Sum of Squares" by Dan, may also help.

#### 29 comments:

Unknown said...

Yes, you are right on lsmeans and means. Actually, for a balanced design, if the final data structure is balanced, then mean=lsmean. But generally they differ.

Anonymous said...

Thank you very much for posting this blog. That was exactly the explanation I needed.

Anonymous said...

SAS folk have never understood experimental design. They've always had a bias of coming from the regression side of the coin.

These terms are unnecessary, and as you state, exist only in the minds of SAS.

What you describe is the addition of a second "blocking variable" in a design. You could describe it as a factor in a 2-way ANOVA, or control it out with ANCOVA.

But to make two different terms for something that has already existed for a hundred years or so, is SAS being SAS.

Furthermore, when I run a posthoc in JMP for a one-way ANOVA with more than 2 levels, "SAS" gives me LS Means as the group means, just because there's unequal 'n'. This is incorrect. I have to go through and generate descriptives to get the actual group means.

Anonymous said...

Brilliant explanation.

Bettina said...

Thanks for this example. Do you have an example showing when one is able to calculate a mean, but not an LSM?

Unknown said...

Thx so much! This is exactly what I need!

Anonymous said...

Looks simple. How about for a regression model? It seems lsmeans is defined only for effects, not for covariates. Is that right?
Thanks.

ss said...

Got it, but you mean to say that, while calculating lsmeans, we are considering the center effect ...

Anonymous said...

Thank you. Good explanation!

Joe Locascio said...

Yes, SAS's "LSMeans" are means adjusted for the covariate(s). In an imbalanced factorial ANOVA design, the factors are essentially confounded "covariates" and the LSMeans adjust for that, giving you an average of cell averages, rather than just the marginal means blind to (and confounded with) the other factor(s). (This can be viewed from a regression/general linear model perspective, with categorical factors being dummy coded.) Neither kind of mean is right or wrong; they answer different questions. I typically request both in SAS. You can come up with all kinds of combinations of means, covariate means, and correlations of covariates with the dependent variable, resulting in covariate adjusted means being in the same or opposite ordinal relation as the raw descriptive means, or where the covariate adjusted means don't change the descriptive means at all. You can map these things graphically with little group ellipses representing scatterplots and their respective regression lines.

Anonymous said...

Great explanation. Clear and incorporates the use of a familiar concept that most folks understand: the calculation of a mean score. Linking a new concept to a familiar concept is a great way to teach. Thanks!

Anonymous said...

thanks so much, made it so easy to understand!

Anonymous said...

Thank you for this explanation. Simple and easy to understand!

Anonymous said...

Thanks. It helped me a lot.

Anonymous said...

Thank you so much! It really helped me.

Unknown said...

Least square means is used in SAS for bioequivalence parameters such as peak drug concentrations (Cmax).

Can you outline in simple terms how it is calculated? Can I do the calculation in Excel? I know that for a balanced study with all subjects completing it is the geometric mean, but suppose one subject drops out.

Unknown said...

Can you outline for me in the most simple terms how the calculation for LS means is done in SAS as applies to bioequivalence parameters such as Cmax (peak drug concentration in plasma). The design to consider is the usual cross over design.

Anonymous said...

To Angus's question:

Please see a separate article "Cookbook SAS Codes for Bioequivalence Test in 2x2x2 Crossover Design"

http://onbiostatistics.blogspot.com/2012/04/cookbook-sas-codes-for-bioequivalence.html

Anonymous said...

great explanation - thanks

Anonymous said...

Can anyone explain the difference between fixed effects estimates and lsmeans in SAS output? In SAS, the highest level is the reference level for fixed effects estimates. It seems that the difference between each level's lsmeans estimate and the lsmeans estimate of the highest level (the reference level for the fixed effects) equals the fixed effects estimate. Is this right, and why?

Anonymous said...

Thank you for your explanation! However, I still have a question.
If I want to compare the efficacy of treatment A and treatment B, which statistic should I choose: the mean or the LS-mean?

Web blog from Dr. Deng said...

It depends on which statistical method you are using to do the comparison. For a t-test, you will simply compare the means. For analysis of variance or analysis of covariance, you will likely compare the LS means.

Anonymous said...

Your explanation about the LS-means was incorrect as it does not account for the sample size (n) in each cell when you took the simple average of the two centers in Step 2 (Table 2). For example, if n=10000 in the cell Center_1/Treatment_A with each response=3, then the LS-mean for treatment A will be close to 3 as the data in the cell Center_2/Treatment_A are almost negligible. But it would still be 5.25 based on your method.

Anonymous said...

Take your example. Let the variables be TREATMNT, CENTER and VAL. In SAS, if the statements are "MODEL VAL=TREATMNT CENTER TREATMNT*CENTER; LSMEANS TREATMNT;", then the LSMEANs are 5.25, 5.25.

But if the model statement is "MODEL VAL=TREATMNT CENTER;", then the LSMEANs for the variable TREATMNT are 5 and 5.
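Both sets of numbers in this comment can be checked without SAS. The sketch below is not SAS's algorithm; it fits the main-effects model by weighted least squares on the four cell means (weight = cell count), which gives the same coefficients as fitting the raw observations because the design matrix is constant within a cell. The cell counts are inferred from the totals in the post, and the dummy coding (treatment A and center 1 as reference levels) is an arbitrary choice made for this illustration.

```python
# (treatment, center, cell mean, count); counts inferred from the post's totals.
cells = [("A", 1, 3.0, 3), ("A", 2, 7.5, 2), ("B", 1, 5.5, 2), ("B", 2, 5.0, 3)]


def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Build the weighted normal equations (X'WX) beta = X'Wy.
# Columns: intercept, indicator for treatment B, indicator for center 2.
xtx = [[0.0] * 3 for _ in range(3)]
xty = [0.0] * 3
for trt, ctr, mean, n in cells:
    x = [1.0, float(trt == "B"), float(ctr == 2)]
    for i in range(3):
        xty[i] += n * x[i] * mean
        for j in range(3):
            xtx[i][j] += n * x[i] * x[j]

intercept, trt_b, ctr_2 = solve(xtx, xty)

# LS mean = model prediction averaged over the two centers with equal weight,
# i.e. the center indicator is set to 0.5.
lsmean_a = intercept + 0.5 * ctr_2
lsmean_b = intercept + trt_b + 0.5 * ctr_2
print("main-effects LS means:", round(lsmean_a, 4), round(lsmean_b, 4))
```

With the interaction term included, the model reproduces each cell mean exactly, so the LS means reduce to the simple average of cell means (5.25 and 5.25). Without it, the fitted cell values change, and the same averaging gives 5 and 5.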

patadler said...

The above example makes perfect sense. BUT... for those of us who are non-statistician clinicians, I don't know why studies using LSM help me make a better decision regarding a treatment for my patient. It seems (in the example above) to overstate the benefit of Treatment A.

Can anyone suggest a good primer for non-statistician clinicians??

Patrick

Jerry081008 said...

Thank you so much for your informative explanation.

Riad Ayech said...

I should be grateful if someone could provide an explanation to the following situation:

The outcome of a statistical analysis of a bioequivalence study (2 arms, generic product, population of 38) is difficult for me to understand.

One subject, in one arm of the study, showed very limited absorption of the tested drug. When the data of that one individual in that one arm were included, the statistical analysis showed that bioequivalence failed, with a very wide confidence interval.

When the data of that one individual were removed, the confidence interval narrowed significantly and the test/reference products showed bioequivalence.

Is that possible? Can data from 1 of 38 subjects, in one arm, alter the CI so dramatically?

I should be grateful for any assistance

Yours

Riad Ayech
ayechc@aol.com

Anonymous said...

Hi Riad,

In this case, we usually perform and provide the analyses with and without this subject. You will probably need to investigate whether there was any situation (for example, sampling error, dosing error, ...) that caused the subject's PK profile to be quite different from the others.