Tuesday, September 13, 2011

Confidence Interval for Difference in Two Proportions

In many clinical trials the outcome is binary, so the data can be summarized in a 2 x 2 table and the analysis can be based on the difference between two proportions (treatment group vs. control group). SAS PROC FREQ can compute this difference together with its asymptotic confidence interval, (p1 - p2) +/- Z(alpha/2)*sqrt(p1*q1/n1 + p2*q2/n2), where qi = 1 - pi and ni is the number of subjects in group i.
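Written out in standard notation, the asymptotic (Wald) interval looks like this. Here x_i is introduced only for illustration: it is the number of responders among the n_i subjects in group i, with group 1 = treatment and group 2 = control.

$$
p_i = \frac{x_i}{n_i}, \qquad q_i = 1 - p_i, \qquad
(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}
$$

For a 95% interval, z_{alpha/2} = 1.96.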
However, the asymptotic confidence interval produced by PROC FREQ requires a reasonably large sample size (say, cell counts of at least 12); this is the case at least for SAS versions up to 9.2. For moderately small samples, it is better to use the formula given in Fleiss (1981, p. 29) and Stokes et al. (2000, pp. 29-30), which widens the half-width of the interval by the continuity correction 0.5*(1/n1 + 1/n2). The interval reported directly by PROC FREQ is therefore a little narrower than the one obtained from this formula. In practice, the statistician has to decide which interval to report for the difference in proportions based on the sample sizes at hand.
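The continuity-corrected interval from Fleiss and Stokes et al. keeps the same Wald standard error and simply adds the correction term to the half-width:

$$
(p_1 - p_2) \pm \left[ z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} + \frac{1}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \right]
$$

Because the correction depends only on the group sizes, its effect fades as the samples grow. With n_1 = n_2 = 20 each limit moves outward by 0.05, while with n_1 = n_2 = 200 it moves by only 0.005.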

Fleiss, JL (1981) Statistical Methods for Rates and Proportions. New York: John Wiley & Sons, Inc.
Stokes, ME, Davis, CS, and Koch, GG (2000) Categorical Data Analysis Using the SAS System, 2nd edition. Cary, NC: SAS Institute Inc.
The FDA draft guidance on tazarotene describes calculating the 90% confidence interval for establishing bioequivalence with a clinical endpoint using the second (continuity-corrected) approach above.
 
The example from Stokes et al. (2000) can be implemented in SAS with the following code:

data respire2;
  input treat $ outcome $ count @@;
  datalines;
test    f 40
test    u 20
placebo f 16
placebo u 48
;

*** the confidence interval directly from SAS PROC FREQ;
proc freq data=respire2 order=data;
  weight count;
  tables treat*outcome / riskdiff;
run;

*** the confidence interval calculated from the formula (see section 2.4, Difference in Proportions,
     in Stokes et al., 'Categorical Data Analysis Using the SAS System', 2nd edition);

*** step 1: total number of subjects in each treatment group (bign);
proc freq data=respire2 order=data;
  weight count;
  tables treat / noprint out=tots(drop=percent rename=(count=bign));
run;

*** step 2: cell counts for each treatment/outcome combination;
proc freq data=respire2;
  weight count;
  tables treat*outcome / noprint out=outcome(drop=percent);
run;

*** step 3: sort both data sets by treatment so they can be merged;
proc sort data=tots;
  by treat;
run;

proc sort data=outcome;
  by treat;
run;
 
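*** step 4: proportion of each outcome within each treatment group;
data prop;
  merge outcome tots;
  by treat;
  if treat='test' then p1=count/bign;
  if treat='placebo' then p2=count/bign;
run;

*** step 5: split test and placebo records into separate data sets;
data prop1(rename=(count=count1 bign=bign1)) prop2(rename=(count=count2 bign=bign2));
  set prop;
  if treat='test' then output prop1;
  if treat='placebo' then output prop2;
run;

*** step 6: one-to-one merge (no BY statement) puts the test and placebo
    proportions for the same outcome level on one record;
data proportion;
  merge prop1(drop=p2 treat) prop2(drop=p1 treat);
run;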

*** calculate the difference in proportions, SE, and 95% confidence interval
    with the continuity-corrected formula from Fleiss (1981);
data cal;
  set proportion;
  diff     = p1 - p2;
  variance = p1*(1-p1)/bign1 + p2*(1-p2)/bign2;
  se       = sqrt(variance);
  cc       = 0.5*(1/bign1 + 1/bign2);   /* continuity correction */
  lower    = diff - (1.96*se + cc);
  upper    = diff + (1.96*se + cc);
  drop cc;
run;

proc print data=cal;
  format p1 p2 variance diff lower upper se 5.3;
run;
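As a rough check, here is the same arithmetic done by hand for the outcome f row of the respire2 data, with test = group 1 and placebo = group 2. This is a hand calculation, not output copied from a SAS run:

$$
\begin{aligned}
p_1 &= 40/60 = 0.66667, \qquad p_2 = 16/64 = 0.25000, \qquad p_1 - p_2 = 0.41667 \\
\mathrm{SE} &= \sqrt{\frac{0.66667 \times 0.33333}{60} + \frac{0.25 \times 0.75}{64}} = \sqrt{0.0037037 + 0.0029297} = 0.08145 \\
\tfrac{1}{2}\left(\tfrac{1}{60} + \tfrac{1}{64}\right) &= 0.01615 \\
\text{Wald:}\quad & 0.41667 \pm 1.96 \times 0.08145 = 0.41667 \pm 0.15964 \;\Rightarrow\; (0.257,\ 0.576) \\
\text{corrected:}\quad & 0.41667 \pm (0.15964 + 0.01615) = 0.41667 \pm 0.17579 \;\Rightarrow\; (0.241,\ 0.592)
\end{aligned}
$$

The Wald line should agree, up to rounding, with the asymptotic limits that RISKDIFF reports for the first column (f). The corrected line should agree with lower and upper on the f row of DATA CAL. Because the PROC FREQ output data sets hold one record per treatment/outcome combination, DATA CAL also contains a u row. That row is the mirror image, with a difference of -0.417 and an interval of (-0.592, -0.241).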

1 comment:

Cody L. Custis said...

Wouldn't one be better using Wilson style adjustments?