Sunday, March 30, 2014

Computing Confidence Interval for Poisson Mean

For Poisson distribution, there are many different ways for calculating the confidence interval. The paper by Patil and Kulkarni discusses 19 different ways to calculate a confidence interval for the mean of a Poisson distribution.

The most commonly used method is the normal approximation (for large sample size) and the exact method (for small sample size)

Normal Approximation:

For Poisson, the mean and the variance are both lambda (λ).
The standard error is calculated as: sqrt(
λ /n) where λ is Poisson mean and n is sample size or total exposure (total person years, total time observed,…)
The confidence interval can be calculated as:
                  λ ±z(α/2)*sqrt(λ/n).
The 95-percent confidence interval is calculated as: λ ±1.96*sqrt(λ/n).
The 99-percent confidence interval is calculated as: λ ±2.58*sqrt(λ/n).
  
EXACT method:

Refer to the following paper for the description of this method: 

The confidence interval for event X is calculated as: 

           (qchisq(α/2, 2*x)/2, qchisq(1-α/2, 2*(x+1))/2 )

Where x is the number of events occurred under Poisson distribution.

In order to calculate the exact confidence interval for Poisson mean, the obtained confidence interval for the number of events need to be converted to the confidence interval for Poisson mean.

Here are two examples from the internet:

Example 1:
Would like to know how confident I can be in my λ. Anyone know of a way to set upper and lower confidence levels for a Poisson distribution?
  • Observations (n) = 88
  • Sample mean (λ) = 47.18182
what would the 95% confidence look like for this?
With Normal Approximation, the 95% confidence interval is calculated as:

                47.18182 +/- 1.96* sqrt(47.18182/88)

This gives 45.7467, 48.617

With Exact method, we first need to calculate x (# of events):
                X = n * λ = 88 * 48.18182 = 4152

The compute the 95% confidence interval for X = 4152. This will give the 95% confidence interval for X as (4026.66, 4280.25)

The 95% confidence interval for mean (λ) is therefore:
lower bound = 4026.66 / 88 = 45.7575
upper bound = 4280.25 /88 = 48.6392

  
Say that 14 events are observed in 200 people studied for 1 year and 100 people studies for 2 years. Calculate the 95% confidence interval for Poisson mean

In this example, the number of events (X) is given, the Poisson rate (λ) or mean needs to be calculated.

First step is to calculate the person year:
The person time at risk is 200 + 100 x 2 = 400 person years
The poisson rate / poisson mean (λ) is :
  • Events observed = 14
  • Time at risk of event = 400
  • Poisson (e.g. incidence) rate estimate = 14/400 = 0.035

Normal Approximation: 95% confidence interval is calculated as: 
                0.035 +/- 1.96* sqrt(0.035/400)
This will give the 95% confidence interval of (0.0167, 0.0533)

Exact approach:  Calculate the 95% confidence interval for the number of events (X) using: 
(qchisq(0.025, 2*x)/2, qchisq(0.975, 2*(x+1))/2 )

and the result is: [7.65, 23.49]
Exact 95% confidence interval for Poisson mean is:
         Lower bound = 7.65 / 400 =0.019135 for lower bound and
         Upper bound = 23.49 / 400 = 0.058724 for upper bound


We will then say the Poisson mean is 0.035 with 95% confidence interval of (0.019, 0.059).

The following SAS programs can illustrate the calculations above: 

data normal;
  input lambda n ;
  lower = lambda - probit(0.975)*sqrt(lambda/n);
  upper = lambda + probit(0.975)*sqrt(lambda/n);
  datalines;
  47.18182 88
  0.035       400
;
proc print data=normal;
  title 'Normal Approximation for 95% confidence interval for Poisson mean';
run;

data exact;
  input X;
  lower = quantile('CHISQ',.025,2*x)/2;
  upper = quantile('CHISQ',.975,2*(x+1))/2;
  datalines;
 4152
 14
;
proc print data=exact;
  title 'Exact method for 95% confidence interval for Poisson mean';
run;

Thursday, March 06, 2014

Intra-Subject Coefficient of Variation (CV%) for Sample Size Estimation for Crossover Design

To calculate the sample size for a crossover design for bioequivalence study, a key assumption is the intra-subject variation. The intra-subject variation is usually expressed with coefficient of variation (COV).

In sample size calculation software ‘PASS’, “Equivalence Test for Two Means in 2x2 Crossover Design - Specify Using Ratio” procedure requires COV (Coefficient of Variation) as the following:


COV (COEFFICENT OF VARIATION):“...For log-normal data, the following relationship exists: COV(Y) = SQR(Exp(SX*SX)-1) where SX is the square root of the within mean square error in the ANOVA table of the log-transformed values.” 
 In Sample size calculation software Nquery, “T-tests (TOST) of equivalence in ratio of means for crossover design (natural log scale)” procedure specifies that
“the square root of the mean square error is needed, the square root of the mean squared error can be obtained from the crossover ANOVA computed using the natural log scale. sqrt(MSE) is equal to sd/sqrt(2), where sd is the standard deviation of the period difference computed using the natural log scale. To compare results from this table to those in the Diletti, E., et al (1991) paper note that CV = sqrt(exp(s^2-1)”

Suppose we analyze the data using Proc Mixed as following:

proc mixed data=pk ;
    class sequence period treatment subject;
    model logauc = sequence period treatment;
    random subject(sequence);
    lsmeans treatment/pdiff cl alpha=0.1;
run;

Covariance parameter estimates in the SAS outputs will provide the information for calculating intra-subject coeffiicient. 

                                                                The Mixed Procedure
[...]
 
           
Covariance Parameter Estimates
Cov Parm
Estimate
SUBJECT(SEQUENCE)
0.06800
Residual
0.1856

[...]


Here, Subject(Sequence) is the inter-subject variability s2inter and residual the within-subject (or intra-subject) variability s2within.

If we need to design a new study with crossover design, we will convert the intra-subject variability to CV for sample size calculation. CVintra can be calculated with the formula CV=100*sqrt(exp(S2
within)-1) or CV=100*sqrt(exp(Residual)-1). From the table above, s2within=0.1856, CV can be calculated as 45.16%

If we use Proc GLM (for a balanced study with no missing data) as specified below, the within-subject (or intra-subject) variability s2within is the Mean Square Error (0.1856) – identical to the residual in Proc Mixed. Intra-subject CV is then calculated using CV=100*sqrt(exp(MSE)-1).

PROC GLM data=pk;
 CLASS trtsEQ period trtgrp patno;
 MODEL lauc = trtSEQ patno(trtSEQ) period TRtgrp; * TRtgrp*PERIOD;
 LSMEANS TRTgrp / PDIFF CL alpha=0.1;
RUN;


Source
DF
Sum of Squares
Mean Square
F Value
Pr > F
Model
23
6.93036415
0.30132018
1.62
0.1387
Error
20
3.71219721
0.18560986


Corrected Total
43
10.64256136





Further Reading:

Sunday, March 02, 2014

Analysis of 2x2x2 crossover design data: Proc Mixed or Proc GLM

In an early article "Cookbook SAS Codes for Bioequivalence Test in 2x2x2 Crossover Design", I provided the analysis using Proc Mixed model. There is a question about the necessity of using Proc Mixed while the Proc GLM can be used for analyzing the data from a 2x2x2 crossover design. As a matter of fact, FDA guidance  "Statistical Approaches to Establishing Bioequivalence” indeed mentioned the use of Proc GLM for analyzing the non-replicated Crossover Design:

b. Nonreplicated Crossover Designs
For nonreplicated crossover designs, this guidance recommends parametric (normal-theory) procedures to analyze log-transformed BA measures. General linear model procedures available in PROC GLM in SAS or equivalent software are preferred, although linear mixed-effects model procedures can also be indicated for analysis of nonreplicated crossover studies. For example, for a conventional two-treatment, two-period, two-sequence (2 x2) randomized crossover design, the statistical model typically includes factors accounting for the following sources of variation: sequence, subjects nested in sequences, period, and treatment. The Estimate statement in SAS PROC GLM, or equivalent statement in other software, should be used to obtain estimates for the adjusted differences between treatment means and the standard error associated with these differences.
Put it into the SAS statements, the SAS codes will be something like the following:

 PROC GLM data=pk;
    CLASS SEQUENCE PERIOD TREATMENT PATIENT;
    MODEL LogAUC = SEQUENCE PATIENT(SEQUENCE) PERIOD TREATMENT; 
    LSMEANS TREATMENT / PDIFF CL ALPHA=0.1;
RUN;

The SAS output below will provide the mean difference and its 90% confidence interval. Since the data is in log scale, anti-log of the mean difference (1.076) is the ratio of the geometric mean and anti-log of the 90% confidence interval is our 90% confidence interval for the geometric mean ratio (0.860, 1.346). 

Least Squares Means for Effect TREATMENT
i j Difference Between
Means
90% Confidence Limits for LSMean(i)-LSMean(j)
1 2 0.073113 -0.150925 0.297152

If it is balanced design (i.e., there is no missing value) and all subjects have data available for period 1 and period 2, the results from the Proc GLM will be identical to teh results obtained from Proc Mixed as described in my previous article "Cookbook SAS Codes for Bioequivalence Test in 2x2x2 Crossover Design".

However, it is very often, there is possibility of missing data in one of the periods. In this case, if Proc GLM is used, the entire subject will be excluded from the analysis. If Proc Mixed is used, the subject who contributes only the data for one of the two periods will still be included in the analysis.