Tuesday, April 11, 2023

Obtaining standard deviations from standard errors and confidence intervals for group means

Sometimes we need a standard deviation (SD) in order to calculate the sample size for a new clinical trial where the primary efficacy endpoint is a continuous measure. However, when we look at the literature, the SD may not be reported; instead, the standard error (SE) or confidence intervals (CIs) are presented. In that case, the SD can be obtained from the SE or the CIs.

Below is the relevant text from the Cochrane Handbook:

A standard deviation can be obtained from the standard error of a mean by multiplying by the square root of the sample size:

SD = SE × √N

When making this transformation, standard errors must be of means calculated from within an intervention group and not standard errors of the difference in means computed between intervention groups.
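The conversion can be sketched in a few lines of Python. The function name `sd_from_se` and the numbers in the usage line are illustrative assumptions, not values from the Handbook.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Convert the standard error of a single group mean to a standard deviation.

    Only valid for an SE computed within one group, not for the SE of a
    difference between groups.
    """
    if n < 1:
        raise ValueError("n must be a positive sample size")
    return se * math.sqrt(n)


# Hypothetical input: a group of 25 participants with a reported SE of 1.2
print(round(sd_from_se(1.2, 25), 2))
```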

 

Confidence intervals for means can also be used to calculate standard deviations. Again, the following applies to confidence intervals for mean values calculated within an intervention group and not for estimates of differences between interventions (for these, see Section 7.7.3.3). Most confidence intervals are 95% confidence intervals. If the sample size is large (say bigger than 100 in each group), the 95% confidence interval is 3.92 standard errors wide (3.92 = 2 × 1.96). The standard deviation for each group is obtained by dividing the length of the confidence interval by 3.92, and then multiplying by the square root of the sample size:

SD = √N × (upper limit – lower limit) / 3.92

For 90% confidence intervals 3.92 should be replaced by 3.29, and for 99% confidence intervals it should be replaced by 5.15.
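The divisors 3.29, 3.92 and 5.15 are simply twice the corresponding standard normal quantiles. The sketch below recomputes them with Python's standard library; the function names are illustrative assumptions, and it covers the large-sample (normal) case only.

```python
import math
from statistics import NormalDist


def ci_divisor_normal(level: float) -> float:
    """Width of a two-sided normal-based CI, in standard errors (e.g. 3.92 for 95%)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z


def sd_from_ci_normal(lower: float, upper: float, n: int, level: float = 0.95) -> float:
    """SD of one group from the CI of its mean, assuming a normal-based interval."""
    return math.sqrt(n) * (upper - lower) / ci_divisor_normal(level)


# Reproduce the three divisors quoted in the text
for level in (0.90, 0.95, 0.99):
    print(level, round(ci_divisor_normal(level), 2))
```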

 

If the sample size is small (say less than 60 in each group) then confidence intervals should have been calculated using a value from a t distribution. The numbers 3.92, 3.29 and 5.15 need to be replaced with slightly larger numbers specific to the t distribution, which can be obtained from tables of the t distribution with degrees of freedom equal to the group sample size minus 1. Relevant details of the t distribution are available as appendices of many statistical textbooks, or using standard computer spreadsheet packages. For example the t value for a 95% confidence interval from a sample size of 25 can be obtained by typing =tinv(1-0.95,25-1) in a cell in a Microsoft Excel spreadsheet (the result is 2.0639). The divisor, 3.92, in the formula above would be replaced by 2 × 2.0639 = 4.128.
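For readers without Excel, the same t value can be approximated in plain Python. This is only a numerical sketch under stated assumptions: it integrates the t density with Simpson's rule and inverts the result by bisection, and the helper names (`t_pdf`, `t_cdf`, `t_critical`) are made up for illustration. In practice a statistics library routine such as `scipy.stats.t.ppf` would normally be used instead.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """CDF for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps must be even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(level: float, df: int) -> float:
    """Two-sided critical value, e.g. level=0.95 gives the 97.5th percentile."""
    target = 1 - (1 - level) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t_value = t_critical(0.95, 25 - 1)
# The Excel example in the text reports 2.0639 and a divisor of 4.128
print(round(t_value, 4), round(2 * t_value, 3))
```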

 

For moderate sample sizes (say between 60 and 100 in each group), either a t distribution or a standard normal distribution may have been used. Review authors should look for evidence of which one, and might use a t distribution if in doubt.

 

As an example, consider data presented as follows:

Group                       Sample size   Mean   95% CI
Experimental intervention   25            32.1   (30.0, 34.2)
Control intervention        22            28.3   (26.5, 30.1)

The confidence intervals should have been based on t distributions with 24 and 21 degrees of freedom respectively. The divisor for the experimental intervention group is 4.128, from above. The standard deviation for this group is √25 × (34.2 – 30.0)/4.128 = 5.09. Calculations for the control group are performed in a similar way.
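The calculation for both groups can be sketched as follows. The t value for 24 degrees of freedom (2.0639) comes from the text; the value for 21 degrees of freedom (2.0796) is taken from a standard t table and is not stated in the Handbook excerpt. The function name is an illustrative assumption.

```python
import math


def sd_from_t_ci(lower: float, upper: float, n: int, t_value: float) -> float:
    """SD of one group from a t-based CI of its mean; the divisor is 2 * t_value."""
    return math.sqrt(n) * (upper - lower) / (2 * t_value)


# Experimental group: n = 25, 95% CI (30.0, 34.2), t = 2.0639 with 24 df
sd_experimental = sd_from_t_ci(30.0, 34.2, 25, 2.0639)

# Control group: n = 22, 95% CI (26.5, 30.1), t = 2.0796 with 21 df (from a t table)
sd_control = sd_from_t_ci(26.5, 30.1, 22, 2.0796)

print(round(sd_experimental, 2))  # 5.09, matching the text
print(round(sd_control, 2))       # 4.06
```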

 

It is important to check that the confidence interval is symmetrical about the mean (the distance between the lower limit and the mean is the same as the distance between the mean and the upper limit). If this is not the case, the confidence interval may have been calculated on transformed values (see Section 7.7.3.4).
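A quick symmetry check along these lines might look like the sketch below. The function name and the default tolerance of 0.1 (meant to absorb rounding in values reported to one decimal place) are assumptions, and the second example uses made-up numbers.

```python
def is_symmetric_ci(mean: float, lower: float, upper: float, tol: float = 0.1) -> bool:
    """True if the mean sits in the middle of the interval, within a rounding tolerance."""
    return abs((mean - lower) - (upper - mean)) <= tol


# Experimental group from the table: 2.1 below and 2.1 above the mean
print(is_symmetric_ci(32.1, 30.0, 34.2))

# Hypothetical asymmetric interval, as might arise from log-transformed data
print(is_symmetric_ci(32.1, 27.0, 38.9))
```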

In the literature, the SEs and CIs are usually calculated from more sophisticated models (analysis of covariance, mixed models, and so on), that is, from analyses adjusted for additional covariates. The methods described above can still be used to obtain the SDs; the calculated SDs should still provide a good approximation of the SDs that are needed for planning future trials.


In a paper by Jastreboff et al. (2022), "Tirzepatide Once Weekly for the Treatment of Obesity", the sample size calculation was based on the group mean difference and a common standard deviation.

"We calculated that a sample size of 2400 participants would provide an effective power of greater than 90% to demonstrate the superiority of tirzepatide (10 mg, 15 mg, or both) to placebo, relative to the coprimary end points, each at a two-sided significance level of 0.025. The sample-size calculation assumed at least an 11-percentage-point difference in the mean percentage weight reduction from baseline at 72 weeks for tirzepatide (10 mg, 15 mg, or both) as compared with placebo, a common standard deviation of 10%, and a dropout rate of 25%."
However, the study results were presented as LS mean percentage changes in body weight and LS mean differences from placebo, each with a 95% confidence interval. Standard deviations were not provided, but they can be easily calculated using the method described above.

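Using the tirzepatide 15 mg group (n = 630) as an example, the SD for 'percent change in body weight' can be calculated as SD = sqrt(630) × (-19.9 - (-21.8))/3.92 = 12.2. Applying the same formula to the 'difference from placebo in percentage change in body weight' gives sqrt(630) × (-16.3 - (-19.3))/3.92 = 19.2, but this interval is for a difference between groups, so, as the Cochrane text above cautions, 19.2 should not be read as a group SD. When the two arms are of similar size, the SE of a difference is about √2 times the SE of a single group mean, which would put the implied common SD near 19.2/√2 ≈ 13.6. Either way, the SDs implied by the study results are somewhat higher than the assumed SD of 10%.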
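The within-group calculation for the tirzepatide 15 mg arm can be reproduced with the short sketch below. It uses only the numbers quoted in this post (n = 630, 95% CI of -21.8 to -19.9) and the large-sample divisor of 3.92; the variable names are illustrative.

```python
import math

n = 630                        # tirzepatide 15 mg group size used in the post
lower, upper = -21.8, -19.9    # 95% CI for percent change in body weight
assumed_sd = 10.0              # common SD assumed in the sample-size calculation

# Large-sample formula: SD = sqrt(N) * (upper - lower) / 3.92
implied_sd = math.sqrt(n) * (upper - lower) / 3.92

print(round(implied_sd, 1))               # 12.2
print(round(implied_sd / assumed_sd, 2))  # ratio of implied to assumed SD
```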
 

1 comment:

Yonggang Yao said...

A common standard deviation (SD) can be approximately calculated from a "standard error of the difference in means computed between intervention groups" by using the formula:
common SD = SE / sqrt(1/N1 + 1/N2)
where N1 is the size for intervention group 1 and N2 is the size for intervention group 2.

This formula assumes that the two intervention groups share the common SD.
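As a rough illustration of the commenter's formula, the sketch below applies it to the difference-from-placebo interval quoted in the post (-19.3 to -16.3). The placebo group size is not given in this post, so the code assumes, purely for illustration, that it equals the 630 participants of the tirzepatide 15 mg group; the function names are also assumptions.

```python
import math


def se_from_ci(lower: float, upper: float, divisor: float = 3.92) -> float:
    """Standard error implied by a CI; 3.92 is the large-sample 95% divisor."""
    return (upper - lower) / divisor


def common_sd_from_diff_se(se_diff: float, n1: int, n2: int) -> float:
    """Common SD implied by the SE of a difference in means between two groups."""
    return se_diff / math.sqrt(1 / n1 + 1 / n2)


se_diff = se_from_ci(-19.3, -16.3)

# ASSUMPTION: n2 = 630 is a placeholder; the placebo group size is not stated in the post
print(round(common_sd_from_diff_se(se_diff, 630, 630), 1))  # 13.6
```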