Sunday, April 05, 2009

Least squares means (marginal means) vs. means


If you work with SAS, you have probably heard and used the term 'least squares means' very often. Least squares means (LS means) are essentially SAS jargon; outside SAS they are usually called marginal means (or EMMs, estimated marginal means). In an analysis of covariance model, they are the group means after controlling for a covariate, i.e., holding the covariate constant at some typical value, such as its mean.

I often find it necessary to use a very simple example to illustrate the difference between LS means and means to my non-statistician colleagues. I made up the data in Table 1 above. There are two treatment groups (treatment A and treatment B), each measured at two centers (Center 1 and Center 2).

The mean for treatment A is simply the sum of all measurements divided by the total number of observations (mean for treatment A = 24/5 = 4.8); similarly, the mean for treatment B = 26/5 = 5.2. So the mean for treatment A < the mean for treatment B.
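
Pooling every observation of a treatment makes the ordinary mean a weighted average of the treatment-by-center cell means (3, 7.5, 5.5 and 5, as computed for Table 2), with each center weighted by how many observations it contributes. The minimal Python sketch below shows that arithmetic. It works from per-cell counts and means rather than the individual values in Table 1; the count for treatment A at center 1 (3) comes from the text, while the other counts (2 for A at center 2; 2 and 3 for B at centers 1 and 2) are inferred from the stated totals and cell means.

```python
# (count, cell mean) keyed by (treatment, center).
# Only the count for ("A", 1) is stated in the post; the others are inferred
# from the totals 24/5 and 26/5 together with the cell means.
cells = {
    ("A", 1): (3, 3.0),
    ("A", 2): (2, 7.5),
    ("B", 1): (2, 5.5),
    ("B", 2): (3, 5.0),
}

def raw_mean(cells, treatment):
    """Ordinary mean: pool all of a treatment's observations, ignoring center.
    Equivalent to averaging the cell means weighted by cell size."""
    pairs = [(n, m) for (trt, _), (n, m) in cells.items() if trt == treatment]
    total = sum(n * m for n, m in pairs)   # sum of all observations
    count = sum(n for n, _ in pairs)       # number of observations
    return total / count

print(raw_mean(cells, "A"))  # (3*3 + 2*7.5) / 5 = 24 / 5 = 4.8
print(raw_mean(cells, "B"))  # (2*5.5 + 3*5) / 5 = 26 / 5 = 5.2
```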

Table 2 shows the calculation of least squares means. The first step is to calculate the mean for each combination of treatment and center: 9/3 = 3 for treatment A at center 1; 7.5 for treatment A at center 2; 5.5 for treatment B at center 1; and 5 for treatment B at center 2.

Once the cell means are calculated, the least squares mean for each treatment is simply the average of its cell means. For treatment A, the LS mean is (3 + 7.5)/2 = 5.25; for treatment B, it is (5.5 + 5)/2 = 5.25. The LS means for the two treatment groups are identical.
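
The difference from the ordinary mean lies only in the weights: the LS mean gives every center the same weight, however many observations it has. A minimal Python sketch using the cell means from Table 2 follows. This simple averaging matches a model-based LS mean only when the model includes the treatment-by-center interaction and every cell has data (a point one of the comments below also makes); in that saturated model, the fitted value for each cell is just its cell mean.

```python
# Cell means from Table 2, keyed by (treatment, center).
cell_means = {("A", 1): 3.0, ("A", 2): 7.5, ("B", 1): 5.5, ("B", 2): 5.0}

def ls_mean(cell_means, treatment):
    """LS mean under a treatment-by-center interaction model:
    the unweighted average of the treatment's cell means."""
    means = [m for (trt, _), m in cell_means.items() if trt == treatment]
    return sum(means) / len(means)

print(ls_mean(cell_means, "A"))  # (3 + 7.5) / 2 = 5.25
print(ls_mean(cell_means, "B"))  # (5.5 + 5) / 2 = 5.25
```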

With only two factors, as in the tables above, it is easy to show how means and LS means are calculated. In clinical trials, however, the statistical model often adjusts for multiple factors, including categorical factors (treatment, center, gender) and continuous covariates (baseline measures), and the LS mean calculation is no longer easy to demonstrate by hand. Even so, the LS mean is the one to use when an inferential comparison is made. Typically, means and LS means point in the same direction for a treatment comparison, although their values differ. Occasionally they point in opposite directions (treatment A better than treatment B according to the means, but treatment B better than treatment A according to the LS means).
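
In general, an LS mean is a model prediction: fit the linear model, predict the response for a treatment at every level of the other factor(s), and average those predictions with equal weight (continuous covariates are typically held at their overall means). The Python sketch below does this for a main-effects model (treatment + center, no interaction) on the same example. It is an illustration under stated assumptions, not SAS's implementation: reference coding with treatment A and center 1 as baselines, a hand-written Gauss-Jordan solver using exact fractions, and cell counts inferred from the post's totals and cell means (3 and 2 for A at centers 1 and 2; 2 and 3 for B).

```python
from fractions import Fraction

# (count, cell mean) keyed by (treatment, center); counts inferred from the post.
cells = {
    ("A", 1): (3, Fraction(3)),
    ("A", 2): (2, Fraction(15, 2)),
    ("B", 1): (2, Fraction(11, 2)),
    ("B", 2): (3, Fraction(5)),
}

def design_row(treatment, center):
    """Main-effects model: intercept, indicator for treatment B, indicator for center 2."""
    return [Fraction(1),
            Fraction(1 if treatment == "B" else 0),
            Fraction(1 if center == 2 else 0)]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(cells):
    """Ordinary least squares via the normal equations X'X b = X'y.
    All observations in a cell share one design row, so each cell adds
    its count n to X'X and n * (cell mean) to X'y."""
    p = 3
    xtx = [[Fraction(0)] * p for _ in range(p)]
    xty = [Fraction(0)] * p
    for (trt, ctr), (n, mean) in cells.items():
        x = design_row(trt, ctr)
        for i in range(p):
            xty[i] += n * x[i] * mean
            for j in range(p):
                xtx[i][j] += n * x[i] * x[j]
    return solve(xtx, xty)

def ls_mean(beta, treatment, centers=(1, 2)):
    """Average the fitted values for one treatment over all centers, weighted equally."""
    preds = [sum(b * x for b, x in zip(beta, design_row(treatment, c))) for c in centers]
    return sum(preds) / len(preds)

beta = fit(cells)
print([float(b) for b in beta])                              # [4.0, 0.0, 2.0]
print(float(ls_mean(beta, "A")), float(ls_mean(beta, "B")))  # 5.0 5.0
```

Without the interaction term, the fitted cell values are no longer the cell means, so both treatments come out at 5.0 rather than 5.25. This agrees with the values a commenter below reports from SAS for `MODEL VAL=TREATMNT CENTER;`.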

The SAS PROC GLM documentation has a nice discussion comparing least squares means with means. A short article, "Means vs LS Means and Type I vs Type III Sum of Squares" by Dan, may also help.

34 comments:

  1. Yes, you are right about LS means and means. Actually, for a balanced design (if the final data structure is balanced), mean = LS mean. But generally they differ.

  2. Anonymous, 7:18 PM

    Thank you very much for posting this blog. That was exactly the explanation I needed.

  3. Anonymous, 3:19 PM

    SAS folk have never understood experimental design. They've always had a bias of coming from the regression side of the coin.

    These terms are unnecessary, and as you state, exist only in the minds of SAS.

    What you describe is the addition of a second "blocking variable" in a design. You could describe it as a factor in a 2-way ANOVA, or control it out with ANCOVA.

    But to make two different terms for something that has already existed for a hundred years or so, is SAS being SAS.

    Furthermore, when I run a posthoc in JMP for a one-way ANOVA with more than 2 levels, "SAS" gives me LS Means as the group means, just because there's unequal 'n'. This is incorrect. I have to go through and generate descriptives to get the actual group means.

  4. Anonymous, 7:39 AM

    Brilliant explanation.

  5. Thanks for this example. Do you have any showing when one is able to calculate a mean, but not a LSM?

  6. Thx so much! This is exactly what I need!

  7. Anonymous, 6:58 PM

    Looks simple. How about for a regression model? It seems LS means are defined only for effects, not for covariates. Is that right?
    Thanks.

  8. Got it, but you mean to say that, while calculating LS means, we are considering the center effect?

  9. Anonymous, 4:07 PM

    Thank you. Good explanation!

  10. Joe Locascio, 5:29 PM

    Yes, SAS's "LSMeans" are means adjusted for the covariate(s). In an imbalanced factorial anova design, the factors are essentially confounded "covariates" and the LSmeans are adjusting for that, giving you an average of cell averages, rather than just the marginal means blind to (and confounded with the other factor(s)). (This can be viewed from a regression/general linear model perspective, with categorical factors being dummy coded). Neither kind of means are right or wrong - they answer different questions. I typically request both in SAS. You can come up with all kinds of combinations of means, covariate means, and correlations of covariates with the dependent variable, resulting in covariate adjusted means being in the same or opposite ordinal relation as the raw descriptive means, or where the covariate adjusted means don't change the descriptive means at all. You can map these things graphically with little group ellipses representing scatterplots and their respective regression lines.

  11. Anonymous, 12:37 PM

    Great explanation. Clear, and it incorporates a familiar concept that most folks understand: the calculation of a mean score. Linking a new concept to a familiar concept is a great way to teach. Thanks!

  12. Anonymous, 12:46 PM

    thanks so much, made it so easy to understand!

  13. Anonymous, 2:21 PM

    Thank you for this explanation. Simple and easy to understand!

  14. Anonymous, 12:22 PM

    Thanks. It helped me a lot.

  15. Anonymous, 12:24 PM

    Thank you so much! It really helped me.

  16. Least squares means are used in SAS for bioequivalence parameters such as peak drug concentration (Cmax).

    Can you outline in simple terms how they are calculated? Can I do the calculation in Excel? I know that for a balanced study in which all subjects complete, it is the geometric mean, but suppose one subject drops out.

  17. Can you outline for me, in the simplest terms, how the LS means calculation is done in SAS as it applies to bioequivalence parameters such as Cmax (peak drug concentration in plasma)? The design to consider is the usual crossover design.

  18. Anonymous, 3:00 PM

    To Angus's question:

    Please see a separate article "Cookbook SAS Codes for Bioequivalence Test in 2x2x2 Crossover Design"

    http://onbiostatistics.blogspot.com/2012/04/cookbook-sas-codes-for-bioequivalence.html

  19. Anonymous, 12:00 PM

    great explanation - thanks

  20. Anonymous, 8:00 AM

    Can anyone explain the difference between the fixed effects estimates and the LS means in SAS output? In SAS, the highest level is the reference level for the fixed effects estimates. It seems that the difference between an LS mean and the LS mean of the highest level (the reference level of the fixed effects) equals the fixed effect estimate. Is this right, and why?

  21. Anonymous, 11:12 PM

    Thank you for your explanation! However, I still have a question.
    If I want to compare the efficacy of treatment A and treatment B, which statistic should I choose: the mean or the LS mean?

  22. It depends on which statistical method you are using for the comparison. For a t-test, you simply compare the means; for analysis of variance or analysis of covariance, you will likely compare the LS means.

  23. Anonymous, 8:08 PM

    Your explanation of the LS means was incorrect, as it does not account for the sample size (n) in each cell when you took the simple average of the two centers in Step 2 (Table 2). For example, if n=10000 in the cell Center_1/Treatment_A with each response=3, then the LS mean for treatment A will be close to 3, as the data in the cell Center_2/Treatment_A are almost negligible. But it would still be 5.25 based on your method.

  24. Anonymous, 7:28 AM

    Take your example. Let the variables be TREATMNT, CENTER and VAL. In SAS, if the statements are "MODEL VAL=TREATMNT CENTER TREATMNT*CENTER; LSMEANS TREATMNT;", then the LSMEANs are 5.25, 5.25.

    But if the model statement is "MODEL VAL=TREATMNT CENTER;", then the LSMEANs for the variable TREATMNT are 5 and 5.

  25. The above example makes perfect sense. BUT... for those of us who are non-statistician clinicians, I don't know why studies using LSM help me make a better decision regarding a treatment for my patient. It seems (in the example above) to overstate the benefit of Treatment A.

    Can anyone suggest a good primer for non-statistician clinicians??

    Patrick

  27. Thank you so much for your informative explanation.

  28. I should be grateful if someone could provide an explanation to the following situation:

    The outcome of the statistical analysis of a bioequivalence study (2 arms of a generic product, population 38) is difficult for me to understand.

    One subject, in one arm of the study, showed very limited absorption of the tested drug. With that one individual's data from that arm included, the statistical analysis failed to show bioequivalence, with a very wide confidence interval.

    When that individual's data were removed, the confidence interval narrowed significantly and the test/reference products showed bioequivalence.

    Is that possible? Can data from 1 of 38 subjects, in one leg, dramatically change the CI?

    I should be grateful for any assistance

    Yours

    Riad Ayech
    ayechc@aol.com

  29. Anonymous, 9:09 AM

    Hi Riad,

    In this case, we usually perform and provide the analyses both with and without this subject. You will probably also need to investigate whether any problem (for example, a sampling error or a dosing error) caused this subject's PK profile to be so different from the others.

  30. Biostatistics plays a crucial role in the field of clinical trials, as it provides the analytical framework to interpret and draw meaningful conclusions from complex healthcare data. By employing statistical methods, biostatisticians can design rigorous clinical studies that adhere to ethical and scientific principles. They contribute to the development of study protocols, sample size calculations, randomization procedures, and analysis plans. Biostatisticians also play a critical role in analyzing trial data, assessing drug efficacy and safety endpoints, and quantifying overall treatment effects.

  31. Anonymous, 3:14 AM

    Thank you so much! This was really helpful.

  32. Excellent explanation. You have explained it step by step, so anyone can understand it easily. The mean for treatment A is 4.8 and the mean for treatment B is 5.2, yet you have written that Mean for treatment A > Mean for treatment B.
    Please elaborate on this.

  33. The simple way you calculated the LS means is correct ONLY IF the ANOVA model includes the interaction effect. Otherwise, a solution to the normal equation is required.
