Monday, October 12, 2015

Sample Size Estimation Based on Precision for Survey and Clinical Studies such as Immunogenicity Studies

Sometimes we need to calculate the sample size required to estimate a population proportion or a population mean with a given precision or margin of error. Here we use the terms ‘precision’ and ‘margin of error’ interchangeably. Depending on the sample size calculation software, the precision may also be referred to as “half of the confidence interval”, “half of the width of the CI”, or “distance from mean to limit”.

A statistician may need to estimate the sample size in situations such as the following:

Example 1: A survey estimated that 20% of all Americans aged 16 to 20 drove under the influence of drugs or alcohol. A similar survey is planned for New Zealand. The researchers want to estimate a sample size for the survey and they want a 95% confidence interval to have a margin of error of 0.04.

Example 2: An immunogenicity study is planned to investigate the occurrence of antibodies to a therapeutic protein. There is no prior information about the percentage of patients who may develop antibodies to the therapeutic protein. How many patients are needed for the study with a 95% confidence interval and a precision of 10%?

Example 3: A tax assessor wants to assess the mean property tax bill for all homeowners in Madison, Wisconsin. A survey ten years ago got a sample mean and standard deviation of $1400 and $1000. How many tax records should be sampled for a 95% confidence interval to have a margin of error of $100?

These are situations where the sample size estimation is based on the confidence interval and the margin of error. Examples #1 and #2 deal with a one-sample proportion, where we would like to estimate the sample size needed to obtain an estimate of the population proportion with a certain precision. Example #3 deals with a one-sample mean, where we would like to estimate the sample size needed to obtain an estimate of the population mean with a certain precision.

Sample Size to Estimate A Proportion With a Precision

The usual formula is:

N = z^2 p(1-p) / d^2

where p is the proportion (which may be obtained from a previous study), d is the precision or margin of error, and z is the Z-score, e.g. 1.645 for a 90% confidence interval, 1.96 for a 95% confidence interval, and 2.58 for a 99% confidence interval.

For example #1, the sample size will be calculated as:
          N = 1.96^2 x 0.2 x 0.8 / 0.04^2 = 384.2, rounded up to 385
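
As a cross-check, the hand calculation can be sketched in a few lines of Python. This is only an illustration of the normal-approximation formula above; the function name `sample_size_proportion` is made up for this post, and the z-score is taken from the standard library rather than rounded to 1.96.

```python
import math
from statistics import NormalDist

def sample_size_proportion(p, d, conf_level=0.95):
    """Smallest n for which the normal-approximation CI for p has half-width <= d."""
    # Two-sided critical value, e.g. about 1.96 for a 95% confidence level
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # N = z^2 p(1-p) / d^2, rounded up to the next whole subject
    return math.ceil(z**2 * p * (1 - p) / d**2)

# Example #1: p = 0.2, margin of error 0.04, 95% confidence
n_example1 = sample_size_proportion(p=0.2, d=0.04)
print(n_example1)  # 385
```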

Similarly, if we use PASS, the input parameters will be
         Confidence Interval:  Simple Asymptotic
         Interval Type: Two-sided
         Confidence level (1-alpha): 0.95
         Confidence Interval Width (two-sided): 0.08      (note: 0.04 x 2)
         P (Proportion): 0.2

For example #2, there is no prior information about the proportion. The practical approach when no estimate of p is available is to assume p = 0.50, which maximizes p(1-p) and therefore gives a sample that is big enough to ensure the precision whatever the true proportion is.

If we use the formula, the sample size will be calculated as:

          N = 1.96^2 x 0.5 x 0.5 / 0.1^2 = 96.04, rounded up to 97
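
To see why p = 0.50 is the conservative choice, the short sketch below evaluates the same formula over a grid of candidate proportions at a precision of 0.10. The grid and the helper name `sample_size_proportion` are illustrative assumptions. Since 1.96^2 x 0.25 / 0.1^2 = 96.04, rounding up gives 97 at p = 0.50.

```python
import math
from statistics import NormalDist

def sample_size_proportion(p, d, conf_level=0.95):
    """N = z^2 p(1-p) / d^2, rounded up (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return math.ceil(z**2 * p * (1 - p) / d**2)

# Hypothetical grid of assumed proportions; precision fixed at 0.10 as in example #2
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
sizes = {p: sample_size_proportion(p, d=0.10) for p in candidates}

# The largest required sample size occurs at p = 0.5
worst_case_p = max(sizes, key=sizes.get)
print(sizes)
print(worst_case_p, sizes[worst_case_p])  # 0.5 97
```

Any other assumed proportion gives a smaller N, so a study sized at p = 0.50 meets the precision target regardless of the true proportion.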

Similarly, if we use PASS, the input parameters will be
          Confidence Interval:  Simple Asymptotic
          Interval Type: Two-sided
          Confidence level (1-alpha): 0.95
          Confidence Interval Width (two-sided): 0.2    (note: 0.1 x 2)
          P (Proportion): 0.5

Sample Size to Estimate A Mean With a Precision

The usual formula is:

          N = (s t/d)^2

where s is the standard deviation, t is the t-score (approximated by the Z-score if assuming normality) and d is the precision or margin of error.

For example #3:

          N = (1000 x 1.96 / 100)^2 = 384.2, rounded up to 385
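
The same calculation can be sketched in Python. This is a minimal illustration that uses the Z-score in place of the t-score, as in the hand calculation; the function name `sample_size_mean` is made up for this post.

```python
import math
from statistics import NormalDist

def sample_size_mean(s, d, conf_level=0.95):
    """N = (s z / d)^2, rounded up, using the Z-score in place of the t-score."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return math.ceil((s * z / d) ** 2)

# Example #3: standard deviation $1000, margin of error $100, 95% confidence
n_example3 = sample_size_mean(s=1000, d=100)
print(n_example3)  # 385
```

Because the t-score is slightly larger than the Z-score for finite samples, a calculation based on the t distribution can be expected to return a somewhat larger N.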

Similarly, if we use PASS, the input parameters will be:
                    Solved for: Sample size
                    Interval type: two-sided
                    Population size: infinite
                    Confidence Level (1-alpha): 0.95
                    Distance from mean to limits: 100
                    S (standard deviation): 1000

Sample size calculation based on precision is popular in surveys in epidemiology studies and in polling in political science. In clinical trials, it seems to be common in immunogenicity studies. In immunogenicity studies, it is not limited to the one-sample situation; it may also be used in the two-sample situation. In the book “Biosimilars: Design and Analysis of Follow-on Biologics” by Dr. Chow, the sample size section describes the calculation based on precision:
In immunogenicity studies, the incidence rate of immune response is expected to be low. In this case, the usual pre-study power analysis for sample size calculation for detecting a clinically meaningful difference may not be feasible. Alternatively, we may consider selecting an appropriate sample size based on precision analysis rather than power analysis to provide some statistical inference.
The half of the width of the CI is given by w = Z(1-alpha/2) * sigma-hat, which is usually referred to as the maximum error margin allowed for a given sample size n. In practice, the maximum error margin allowed represents the precision that one would expect for the selected sample size. The precision analysis for sample size determination is to consider the maximum error margin allowed. In other words, we are confident that the true difference delta = pR - pT would fall within the margin of w = Z(1-alpha/2) * sigma-hat for a given sample size of n. Thus, the sample size required for achieving the desired precision can be chosen.
This approach, based on the interest in only the type I error, is to specify precision while estimating the true delta for selecting n.
Under a fixed power and significance level, the sample size based on power analysis is much larger than the sample size based on precision analysis with extremely low infection rate difference or large allowed error margin.
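
For the two-sample situation, the precision idea can be sketched as follows. This is only an illustration under stated assumptions, not necessarily the exact formula from the book. It assumes two independent groups of equal size n and the usual unpooled variance for the difference of two proportions, so that w = z * sqrt((pR(1-pR) + pT(1-pT)) / n), which is then solved for n. The incidence rates and the margin in the example are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def sample_size_per_group_difference(p_ref, p_test, w, conf_level=0.95):
    """Per-group n so that the CI for p_ref - p_test has half-width <= w.

    Assumes two independent groups of equal size and the unpooled variance
    p_ref(1-p_ref)/n + p_test(1-p_test)/n (an assumption of this sketch).
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    variance_sum = p_ref * (1 - p_ref) + p_test * (1 - p_test)
    # Solve w = z * sqrt(variance_sum / n) for n and round up
    return math.ceil(z**2 * variance_sum / w**2)

# Hypothetical low incidence rates (2% and 3%) and an allowed error margin of 0.05
n_per_group = sample_size_per_group_difference(p_ref=0.02, p_test=0.03, w=0.05)
print(n_per_group)  # 75
```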
SAS Proc Power can also calculate the sample size. The exact method is used for sample size calculation in SAS. The obtained sample size is usually greater than the ones calculated by hand (formula) or using PASS.

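For the confidence interval for a one-sample proportion, the SAS code will be something like this:
      proc power;
           /* Wilson score interval; probwidth = . asks for the probability */
           /* that the half-width is at most 0.1 when ntotal = 70           */
           onesamplefreq ci=wilson
                halfwidth = 0.1
                proportion = 0.3
                ntotal = 70
                probwidth = .;
      run;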
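
To get a feel for what a precision probability of this kind means, here is a rough Python sketch. It is an illustration only, not a reproduction of the Proc Power algorithm. It assumes the quantity of interest is the probability, over binomial outcomes with the given true proportion, that the Wilson interval half-width is at most the target. The function names are made up for this post.

```python
import math
from statistics import NormalDist

def wilson_half_width(x, n, conf_level=0.95):
    """Half-width of the Wilson score interval for x successes out of n."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    p_hat = x / n
    return z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)

def prob_half_width_at_most(target, p, n, conf_level=0.95):
    """Sum the binomial probabilities of all outcomes whose half-width <= target."""
    total = 0.0
    for x in range(n + 1):
        if wilson_half_width(x, n, conf_level) <= target:
            total += math.comb(n, x) * p**x * (1 - p) ** (n - x)
    return total

# Same inputs as the SAS snippet: half-width 0.1, proportion 0.3, n = 70
prob_n70 = prob_half_width_at_most(target=0.1, p=0.3, n=70)
print(round(prob_n70, 3))
```

Increasing n until this probability reaches the desired level is one way to turn the calculation into a sample size search.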

For the confidence interval for a one-sample mean, refer to the example provided in the SAS online documentation: SAS 9.22 User’s Guide, Example 68.7 Confidence Interval Precision.
