Sunday, April 06, 2014

Simulation Approach to Estimate the Sample Size for the Poisson Distribution

The FDA Guidance for Industry, “Safety, Efficacy, and Pharmacokinetic Studies to Support Marketing of Immune Globulin Intravenous (Human) as Replacement Therapy for Primary Humoral Immunodeficiency,” requires the efficacy endpoint to be the serious bacterial infection rate. The guidance states:

“…Based on our examination of historical data, we believe that a statistical demonstration of a serious infection rate per person-year less than 1.0 is adequate to provide substantial evidence of efficacy. You may test the null hypothesis that the serious infection rate is greater than or equal to 1.0 per person-year at the 0.01 level of significance or, equivalently, the upper one-sided 99% confidence limit would be less than 1.0.

You should employ a sufficient number of subjects to provide at least 80% power with one-sided hypothesis testing and an alpha = 0.01. Although the responsibility for choosing the sample size rests with you, we anticipate that studies employing a total of approximately 40 to 50 subjects would generally prove adequate to achieve the requisite statistical power, given the design specifications listed in this same section.”

“Statistical considerations

The number of subjects to be included into the study might exceed 40 patients as the study should provide at least 80% power to reject the null-hypothesis of a serious infection rate greater or equal 1 by means of a one-sided test and a Type I error of 0.01.”
If we plan a study to meet the above regulatory requirements, the sample size needs to be estimated, and that estimation requires a power and sample size calculation based on the Poisson distribution.
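As a point of reference for this requirement, the exact upper confidence limit for a Poisson rate can be computed from the chi-square distribution: with x total events observed over T total person-years, the one-sided 99% upper limit is the 0.99 quantile of a chi-square distribution with 2(x+1) degrees of freedom, divided by 2T. Below is a minimal SAS sketch; the event count and person-year totals are made-up illustration values, not figures from the guidance (the macro later in this post instead uses a likelihood-ratio confidence limit from PROC GENMOD):

 data ci_check;
    x = 28;   *hypothetical total number of serious infections observed;
    t = 45;   *hypothetical total person-years of follow-up;
    *exact one-sided 99% upper confidence limit for the infection rate per person-year;
    upper99 = quantile('CHISQ', 0.99, 2*(x + 1)) / (2*t);
    meets_criterion = (upper99 < 1);   *1 if the upper limit is below 1.0 per person-year;
 run;

 proc print data=ci_check noobs;
 run;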

In a webinar, “Design and Analysis of Count Data,” by Mani Lakshminarayanan, the following points were made:

  • If the endpoint is count data, then that should be taken into consideration in the sample size calculation
  • The most commonly used sample size software (nQuery Advisor, PASS 2002) does not have options for discrete data; Poisson regression is available in PASS 2002
  • If a normal approximation is used, the sample size estimates might be too high, which increases the cost and time of subject recruitment

The latest version of PASS has a module for Poisson regression that allows sample size calculation when the purpose is to compare two Poisson response rates.

Cytel’s EAST version 6.2 offers power analysis and sample size calculations for count data in fixed (not adaptive) sample designs. EAST provides design capabilities for:
  • Test of a single Poisson rate
  • Test for a ratio of Poisson rates
  • Test for a ratio of Negative Binomial rates

However, none of these GUI sample size programs can be readily used to calculate the sample size for the serious bacterial infection rate situation, where the requirement is based on the upper bound of a confidence interval for a Poisson rate.

In the SAS community, there have been some discussions about using SAS to calculate the sample size for count/Poisson data; however, there is no easy answer to the question.

For the serious bacterial infection rate situation described above, a simulation approach can be used. Several years ago, we described a simulation approach to estimate the sample size for studies comparing two different slopes using a random coefficient model (see Chen, Stock, and Deng (2008)). A similar approach can be used here: simulate a large number of trials of size n with an assumed true infection rate lambda, fit an intercept-only Poisson model to each simulated trial, and estimate the power as the proportion of trials whose upper confidence limit falls below the 1.0 threshold. The SAS macro is attached below.

options nonotes;   *suppress notes in the SAS log;
%macro samplesize(n=, lambda=);
  *generate the first simulated data set;
  data one;
     do i = 1 to &n;
        x = rand('POISSON', &lambda);   *simulated infection count per subject (one person-year of exposure implied, since no offset is used);
        output;
     end;
  run;

   ODS SELECT NONE;                            *suppress printed output;
   ods output parameterestimates=parmest;      *capture the parameter estimates;
   proc genmod data=one;
       model x = / dist=poisson link=log scale=deviance lrci alpha=0.02;   *98% two-sided CI is equivalent to a 99% one-sided upper limit;
   run;


   data parmest;
     set parmest;
     est   = exp(estimate);     *back-transform from the log scale to the rate scale;
     lower = exp(lowerlrcl);
     upper = exp(upperlrcl);
   run;

  data un;
    set parmest;
    iteration = 1;
    run;

*iteration macro to generate and analyze the remaining simulated data sets;
%macro loop(iteration);

%do i = 2 %to &iteration;
   data one;
      do i = 1 to &n;
         x = rand('POISSON', &lambda);
         output;
      end;
   run;

   ODS SELECT NONE;
   ods output parameterestimates=parmest ;  
   proc genmod data=one ;
       model x = / dist=poisson link=log scale=deviance lrci alpha=0.02 ;   *98% two-sided CI is equivalent to a 99% one-sided upper limit;
   run ;

    data parmest;
     set parmest;
     est = exp(estimate);
     lower = exp(lowerlrcl) ;
     upper = exp(upperlrcl) ;
    run;
       
  data parmest;
    set parmest;
    iteration = &i;
    run;

*append the current results to the cumulative data set;
   data un;
     set un parmest;
    run;
  
%end;
%mend;
%loop(100);    *for a real application, this needs to be a much larger number of iterations;
*calculate and print the empirical power;
 data power;
    set un;
    if parameter = 'Intercept';       *keep only the intercept (rate) estimates;
    if upper >= 1 then flag = 1;      *upper confidence limit is at or above 1;
    else flag = 0;                    *upper confidence limit is below 1;
   run;

 ODS SELECT ALL;
 proc freq data=power;
    tables flag;                      *the percentage with flag=0 is the empirical power;
    title "n=&n; lambda=&lambda";
 run;
%mend;

*try different sample sizes and values of lambda;
%samplesize(n=10, lambda=0.5);
%samplesize(n=20, lambda=0.5);
%samplesize(n=25, lambda=0.5);
%samplesize(n=30, lambda=0.5);
%samplesize(n=40, lambda=0.5);

%samplesize(n=50, lambda=0.5);
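
To read the results: in each PROC FREQ table, the percentage of iterations with flag = 0 (upper confidence limit below 1.0) is the empirical power for that combination of n and lambda, and the sample size to carry forward is the smallest n for which this percentage reaches at least 80%. If a single number is preferred, one possible addition (not part of the original macro) is to compute the proportion directly from the power data set left behind by the most recent %samplesize call, for example:

 proc sql;
    select mean(flag = 0) as power label='Empirical power (upper limit < 1)'
    from power;
 quit;

Because the power data set is recreated inside each %samplesize call, this snippet reflects only the last call above (n=50, lambda=0.5); to summarize all runs in one place, the results would need to be accumulated across calls.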