Saturday, June 21, 2014

Is MMRM Good Enough in Handling the Missing Data in Longitudinal Clinical Trials?

In the last 7-8 years, there has been a gradual shift from single-imputation methods such as LOCF to MMRM (mixed model for repeated measures), especially after the PhRMA initiative published the paper "Recommendations for the Primary Analysis of Continuous Endpoints in Longitudinal Clinical Trials" in 2008 and the National Research Council (NRC) published its report "The Prevention and Treatment of Missing Data in Clinical Trials" in 2010. The MMRM approach does not employ formal imputation to handle missing data. Instead, the analysis makes use of all available data, including data from subjects with partial (i.e., missing) observations, to arrive at an estimate of the mean treatment effect without filling in the missing items. 


The use of MMRM is based on the assumption of missing at random (MAR) and the assumption that dropouts would behave similarly to other patients in the same treatment group, and possibly with similar covariate values, had they not dropped out. While MAR seems to be a compromise between missing completely at random (MCAR) and missing not at random (MNAR), the MAR assumption as the basis for using MMRM has been scrutinized and criticized in two recent FDA advisory committee meetings.

In a paper by Dr. Lisa LaVange, "The Role of Statistics in Regulatory Decision Making", these two advisory committee meetings were mentioned as examples of how FDA statisticians helped with the interpretation of results from MMRM analyses.

“When a patient discontinues therapy for intolerability, however, such an assumption may not be reasonable. More important, the estimated treatment effect will be biased in this case, leaving regulators with some uncertainty as to what information should be included in the product label, if approved.”

In both advisory committee meetings she cited, FDA statisticians challenged the assumptions behind using MMRM to handle the missing data.

The first advisory committee meeting was the January 30, 2013 meeting of the Pulmonary-Allergy Drugs Advisory Committee to review the New Drug Application (NDA) from Pharmaxis Ltd. seeking approval of mannitol inhalation powder for cystic fibrosis (CF).

“In January 2013, an advisory committee meeting was held to review mannitol inhalation powder for the treatment of cystic fibrosis, and missing data issues were an important part of the discussion. Results in favor of mannitol were highly significant in one phase 3 study, but differential dropout rates were observed, indicating that some patients receiving mannitol could not tolerate therapy. The prespecified primary analysis involved a comparison between treatment groups of the average outcome across visits in the context of an MMRM. This analysis was flawed in the presence of informatively missing data, but a sensitivity analysis using baseline values in lieu of missing observations was significantly in favor of mannitol. Citing the NAS report, committee members raised questions about the usefulness of this analysis during the committee’s discussion. The statistical review team provided alternative sensitivity analyses based on empirical distribution functions that appropriately addressed the tolerability issue and provided clarity to the discussion.”
The second advisory committee meeting was the September 10, 2013 meeting of the Pulmonary-Allergy Drugs Advisory Committee to review GSK's New Drug Application (NDA) 203975 for umeclidinium and vilanterol powder for inhalation in treating COPD.
“At a second advisory committee meeting held in September 2013 to discuss umeclidinium and vilanterol inhalation powder for the treatment of chronic obstructive pulmonary disease, differential dropout rates were observed in the phase 3 studies, with more placebo patients discontinuing due to lack of efficacy compared with those treated with the investigational drug. The statistical reviewer presented a sensitivity analysis using a ‘‘jump to reference’’ method that assumed any patient discontinuing treatment early would behave similarly to placebo patients post discontinuation, arguing that such an assumption was reasonable given the drug’s mechanism of action. The results of this and other analyses helped inform committee members about the impact of missing data on the primary results and also helped focus the discussion on the importance of not only how much data were missing but the reasons why and the way in which postdiscontinuation data were incorporated in the analysis”

The transcript for the September 10, 2013 meeting of the Pulmonary-Allergy Drugs Advisory Committee (PADAC) details the discussion of the issues with the application of MMRM.

“Given the large amount of patient dropout in the primary efficacy studies, it is important to consider the potential effect of missing data on the reliability of efficacy results. Exploratory analyses showed that patients who dropped out on the active treatment arms tended to show benefit over placebo with respect to FEV1 prior to withdrawal. The primary MMRM model assumes that data after dropout are missing at random. Therefore, if the interest is in the effectiveness of the treatment assignment in all randomized subjects, regardless of adherence, i.e., the intention-to-treat estimand, then the primary analysis assumes that patients who dropped out went on to maintain that early treatment effect, even after treatment discontinuation. This assumption is not scientifically plausible because bronchodilators are symptomatic and not disease-modifying treatments, and thus any effect of the treatment will go away within a few days of stopping therapy. Therefore, a sensitivity analysis to evaluate the intention-to-treat estimand should not assume that any early treatment effect was maintained through 24 weeks in patients who prematurely stopped treatment. The sensitivity analysis carried out by the applicant that we considered most reasonable was a Jump to Reference (J2R) multiple imputation approach. The assumptions of this approach in comparison to those of the MMRM are illustrated in this figure, which displays hypothetical data for patients dropping out after week eight.
 Average FEV1 values over time are displayed by circles for patients on UMEC/VI and by triangles for patients on placebo. The red lines display observed data prior to dropout, illustrating an early treatment benefit. The green lines display the trends in pulmonary function after dropout that are assumed by the MMRM model, i.e., an assumption that the early benefit was maintained throughout the 24 weeks. The blue lines display the trends assumed by the Jump to Reference sensitivity analysis. This analysis, like the MMRM, assumes that placebo patients continued on the trend observed prior to dropout. However, unlike the MMRM, the Jump to Reference (J2R) approach multiply imputes missing data in patients on active treatment under the assumption that any treatment effect observed prior to dropout would have gone away by the time of the next visit. In other words, the assumption is that pulmonary function in these patients after dropout tends to look like that observed in patients on placebo.

The results of this sensitivity analysis approach as compared to those of the primary analysis are shown in this table for Study 373. In all relative treatment comparisons, statistical significance was maintained in the sensitivity analysis. However, estimated magnitudes of treatment effects were approximately 20 to 30 percent smaller in the sensitivity analyses than in the primary analyses. For example, the estimated effect size relative to placebo for UMEC/VI at the proposed 62.5/25 dose was about 0.13 liters, as compared to 0.17 liters in the primary analysis. Notably, all sensitivity analyses to address the missing data are based on untestable assumptions about the nature of the unobserved data."
The caveat in using the MMRM approach is also mentioned in EMA's "Guideline on Missing Data in Confirmatory Clinical Trials" released in 2009. The guideline notes that different types of variance-covariance matrices for the MMRM model, and different assumptions for modeling the unobserved measurements, could lead to different conclusions. It suggests that "the precise option settings must be fully justified and predefined in advance in detail, so that the results could be replicated by an external analyst".

"The methods above (e.g. MMRM and GLMM) are unbiased under the MAR assumption and can be thought of as aiming to estimate the treatment effect that would have been observed if all patients had continued on treatment for the full study duration. Therefore, for effective treatments these methods have the potential to overestimate the size of the treatment effect likely to be seen in practice and hence to introduce bias in favour of experimental treatment in some circumstances. In light of this the point estimates obtained can be similar to those from a complete cases analysis. This is problematic in the context of a regulatory submission as confirmatory clinical trials should estimate the effect of the experimental intervention in the population of patients with greatest external validity and not the effect in the unrealistic scenario where all patients receive treatment with full compliance to the treatment schedule and with a complete follow-up as per protocol. The appropriateness of these methods will be judged by the same standards as for any other approach to missing data (i.e. absence of important bias in favour of the experimental treatment) but in light of the concern above, the use of only these methods to investigate the efficacy of a medicinal product in a regulatory submission will only be sufficient if missing data are negligible. The use of these methods as a primary analysis can only be endorsed if the absence of important bias in favour of the experimental treatment can be substantiated"



When the study endpoint is continuous and measured longitudinally, and MMRM is used, the assumptions underlying the MMRM may be challenged. Some good practices in using MMRM are: 1) always think through the assumptions behind the use of MMRM; 2) use more than one imputation approach for sensitivity analyses; 3) consider including a conservative imputation approach such as J2R among the sensitivity analyses.

Tuesday, June 10, 2014

Mixed-Effect Model Repeated Measures (MMRM) and Random Coefficient Model Using SAS

The clinical trial data presented to us are often in longitudinal format with repeated measurements. For continuous endpoints in longitudinal clinical trials, both the mixed-effect model repeated measures (MMRM) and the random coefficient model can be used for data analysis.

These two models are very similar, but there are differences. MMRM is used when we compare the treatment difference at specific time points (for example, at the end of the study), while the random coefficient model is used when we compare the treatment difference in slopes. If SAS PROC MIXED is used, the key difference is the REPEATED statement for the MMRM versus the RANDOM statement for the random coefficient model.

MMRM

MMRM has been extensively used in the analysis of longitudinal data, especially when missing data are a concern and missing at random (MAR) is assumed.

In a paper by Mallinckrodt et al, "Recommendations for the primary analysis of continuous endpoints in longitudinal clinical trials", MMRM is recommended over single-imputation methods such as LOCF. The companion slides provide further explanation of how to use MMRM in the analysis of longitudinal data.

In a recent paper by Mallinckrodt et al (2013), "Recent Developments in the Prevention and Treatment of Missing Data", MMRM is again mentioned as one of the preferred methods when missing data follow MAR. In this paper, an example is provided and the detailed implementation of the MMRM is described as follows:
 “The primary analysis used a restricted maximum likelihood (REML)–based repeated-measures approach. The analyses included the fixed, categorical effects of treatment, investigative site, visit, and treatment-by-visit interaction as well as the continuous, fixed covariates of baseline score and baseline score-by-visit interaction. An unstructured (co)variance structure shared across treatment groups was used to model the within-patient errors. The Kenward-Roger approximation was used to estimate denominator degrees of freedom and adjust standard errors. Analyses were implemented with SAS PROC MIXED.20 The primary comparison was the contrast (difference in least squares mean [LSMEAN]) between treatments at the last visit (week 8).”

For MMRM, if SAS mixed model is used, the sample SAS codes will be like the following:

If the time variable is continuous (as a covariate):
 
proc mixed;
   class subject treatment site;
   model Y = baseline treatment site
             treatment*time baseline*time / ddfm=kr;
   repeated / sub=subject type=un;   * with a continuous time variable, the repeated effect is omitted; records must be in time order within each subject;
   lsmeans treatment / cl diff at time=t1;
   lsmeans treatment / cl diff at time=t2;
   lsmeans treatment / cl diff at time=tx;   * one statement per time point of interest;
run;
where the treatment difference at time tx is obtained with the LSMEANS statement.

If the time variable is categorical (in the CLASS statement):

proc mixed;
   class subject treatment time site;
   model Y = baseline treatment time site
             treatment*time baseline*time / ddfm=kr;
   repeated time / sub=subject type=un;
   lsmeans treatment*time / slice=time cl;
   estimate 'treatment difference at tx'
            treatment -1 1
            treatment*time 0 0 0 0 -1
                           0 0 0 0 1 / cl;
run;

The estimate statement depends on the levels of treatment and time variables. Refer to "Examples of writing CONTRAST and ESTIMATE statements in SAS Proc Mixed".

Random Coefficient Model

A longitudinal model using the RANDOM statement is called a random coefficient model because the regression coefficients for one or more covariates are assumed to be a random sample from some population of possible coefficients. Random coefficient models may also be called hierarchical linear models or multi-level models and are useful for highly unbalanced data with many repeated measurements per subject. In random coefficient models, the fixed-effect parameter estimates represent the expected values of the population of intercepts and slopes. The random effect for the intercept represents the difference between the intercept for the ith subject and the overall intercept; the random effect for the slope represents the difference between the slope for the ith subject and the overall slope. The SAS documentation provides an example of using a random coefficient model.


If we intend to compare the difference in slopes between two treatment groups, the MMRM model above can be rewritten as:

proc mixed;
   class subject treatment site;
   model Y = baseline treatment time site
             treatment*time baseline*time / ddfm=kr;
   random intercept time / sub=subject type=un;
   estimate 'slope, trt'     time 1 treatment*time 1 0 / cl;
   estimate 'slope, placebo' time 1 treatment*time 0 1 / cl;
   estimate 'slope diff'     treatment*time 1 -1 / cl;
run;

From the model, the ESTIMATE statement is used to obtain the difference in slopes between the two treatment groups. In some cases, the main effect (treatment) may not be significant, but the interaction term (treatment*time), which reflects the difference between the two slopes, may be statistically significant.


In a paper by Dirksen et al (2009), "Exploring the role of CT densitometry: a randomised study of augmentation therapy in α1-antitrypsin deficiency", a random coefficient model was employed to estimate the difference in slopes between two treatment groups:
"In Methods 1 and 2 for the densitometric analysis, treatment differences (Prolastin® versus placebo) were tested by linear regression on time of PD15 measurement in a random coefficient regression model as follows. Method 1: TLC-adjusted PD15 from CT scan as the dependent variable; treatment, centre and treatment by time interaction as the fixed effects; and intercept and time as the random effects. Method 2: PD15 from CT scan as the dependent variable; treatment, centre and treatment by time interaction as the fixed effects; logarithm of TLV as a time-dependent covariate; and intercept and time as the random effects. The estimated mean slope for each treatment group represented the rate of lung density change with respect to time. The tested treatment difference was the estimated difference in slope between the two groups, considered to be equivalent to the difference in the rates of emphysema progression."

Sunday, June 08, 2014

Meetings Materials on FDA's website

Because of the FDA Transparency Initiative, the meetings, conferences, and workshops sponsored or co-sponsored by FDA are now open to the public. Shortly after the meetings, conferences, and workshops, the materials and webcasts are usually made available to the public. This is a good step forward.

The list of meetings, conferences, and workshops sponsored or co-sponsored by the Center for Drug Evaluation and Research (CDER) can be found at the following website:

Meetings, Conferences, & Workshops (Drugs)


For example, for several public meetings that I am interested in, the meeting materials are all available to the public on the website. 

Periodically, FDA organizes the Advisory Committee (AC) meetings. While FDA may or may not follow the recommendations from Advisory Committees, the materials presented at the advisory committee meetings are always important. Sometimes, statistical issues such as endpoint selection, missing data handling, and interpretation of safety results are extensively discussed during the AC meeting and in the meeting materials. Fortunately, the AC meeting materials are usually posted on the web and sometimes the webcasts are available too. 

Advisory Committees & Meeting Materials


Saturday, May 03, 2014

Some quotes about statistics or from statisticians

I often see people cite interesting quotes in their presentation. Citing a good quote can entice the audience. I recently saw a presentation with the following quote about the statistics in dealing with uncertainties:
One way of defining statistics is…
The science of quantifying uncertainty, dealing with uncertainty, and making decisions in the face of uncertainty…
…and drug development is a series of decisions under huge uncertainty.
There is a website that lists famous quotes from or about statisticians.

Here are some quotes with top ranking:  

All models are wrong, but some are useful. (George E. P. Box)

An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem. (John Tukey)

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. (Ronald Fisher (1938))

Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital. (Aaron Levenstein)

Statisticians, like artists, have the bad habit of falling in love with their models. (George Box)

I think it is much more interesting to live with uncertainty than to live with answers that might be wrong. (Richard Feynman)

If you torture the data enough, nature will always confess. (Ronald Coase)

Absence of evidence is not evidence of absence. (Carl Sagan)

It's easy to lie with statistics; it is easier to lie without them. (Frederick Mosteller)
Then there is the following cartoon:


Thursday, May 01, 2014

New books about 'missing data in clinical trials'

Missing data is one of the critical issues in statistics and in clinical trials. Three new books about missing data have been written by biostatisticians working directly in the pharmaceutical and drug development fields. These three books are worth recommending.



    For free books regarding the missing data, the following resources are available: 
    Does FDA or EMA have a preferred method for dealing with missing data for when companies submit new drugs for approval?
    The general trends in handling the missing data:



    Sunday, April 06, 2014

    Simulation Approach to estimate the sample size for Poisson distribution

    FDA Guidance for Industry: Safety, Efficacy, and Pharmacokinetic Studies to Support Marketing of Immune Globulin Intravenous (Human) as Replacement Therapy for Primary Humoral Immunodeficiency requires the efficacy endpoint to be the serious bacterial infection rate. The guidance states:

    “…Based on our examination of historical data, we believe that a statistical demonstration of a serious infection rate per person-year less than 1.0 is adequate to provide substantial evidence of efficacy. You may test the null hypothesis that the serious infection rate is greater than or equal to 1.0 per person-year at the 0.01 level of significance or, equivalently, the upper one-sided 99% confidence limit would be less than 1.0.

    You should employ a sufficient number of subjects to provide at least 80% power with one-sided hypothesis testing and an alpha = 0.01. Although the responsibility for choosing the sample size rests with you, we anticipate that studies employing a total of approximately 40 to 50 subjects would generally prove adequate to achieve the requisite statistical power, given the design specifications listed in this same section.”

    The corresponding European guideline contains a similar requirement under "Statistical considerations":

    "The number of subjects to be included into the study might exceed 40 patients as the study should provide at least 80% power to reject the null-hypothesis of a serious infection rate greater or equal 1 by means of a one-sided test and a Type I error of 0.01."
    If we plan a study to meet the above requirements from regulatory agencies, the sample size will need to be estimated, and the estimation requires power and sample size calculations based on the Poisson distribution.

    In a webinar  “Design and Analysis of Count Data” by Mani Lakshminarayanan, the following statements were made:

    • If the endpoint is count data then that should be taken into consideration for sample size Calculation
    • Most commonly used sample size software (nQuery Advisor, PASS 2002) do not have the options for discrete data. Poisson regression is available in PASS 2002.
    • If normal approximation is used then the sample size estimates might be too high, which increases cost and time of subject recruitment

    The latest version of PASS has a module for Poisson regression that allows sample size calculation when the purpose is to compare two Poisson response rates.

    Cytel’s EAST version 6.2 offers power analysis and sample size calculations for count data in fixed (not adaptive) sample designs. EAST provides design capabilities for:
    • Test of a single Poisson rate
    • Test for a ratio of Poisson rates
    • Test for a ratio of Negative Binomial rates

    However, none of these GUI-based sample size programs can be readily used for the serious bacterial infection rate situation, where the requirement is based on the upper bound of a confidence interval for a Poisson mean.

    In the SAS community, there have been some discussions of using SAS to calculate the sample size for count / Poisson data. However, there is no easy answer to the question.

    For the serious bacterial infection rate situation described above, a simulation approach can be used. Several years ago, we described using simulation to estimate the sample size for studies comparing two different slopes with a random coefficient model (see Chen, Stock, and Deng (2008)). A similar approach can be used to estimate the sample size needed to meet the regulatory requirement that the upper bound of the one-sided 99% confidence interval be below the threshold. The SAS macro is attached below.  

    options nonotes ;   *suppress the SAS log;
    %macro samplesize(n=, lambda=);
      *generating the first data set;
      data one;
             do i = 1 to &n;
                x = RAND('POISSON',&lambda);
              output;
             end;
       run;

       ODS SELECT NONE;
       ods output parameterestimates=parmest ;  
       proc genmod data=one ;
        model x = / dist=poisson link=log scale=deviance lrci alpha=0.02;   *98% two-sided CI is equivalent to 99% one-sided;
       run ;


       data parmest;
         set parmest;
         est = exp(estimate);
         lower = exp(lowerlrcl) ;
         upper = exp(upperlrcl) ;
        run;

      data un;
        set parmest;
        iteration = 1;
        run;

    *Iteration macro to generate the rest of data sets;
    %macro loop(iteration);

    %do i = 2 %to &iteration;
       data one;
             do i = 1 to &n;
                x = RAND('POISSON',&lambda);
              output;
             end;
       run;

       ODS SELECT NONE;
       ods output parameterestimates=parmest ;  
       proc genmod data=one ;
           model x = / dist=poisson link=log scale=deviance lrci alpha=0.02 ;   *98% CI for two sides ;
       run ;

        data parmest;
         set parmest;
         est = exp(estimate);
         lower = exp(lowerlrcl) ;
         upper = exp(upperlrcl) ;
        run;
           
      data parmest;
        set parmest;
        iteration = &i;
        run;

    *combine the cumulative data sets;
       data un;
         set un parmest;
        run;
      
    %end;
    %mend;
    %loop(100);    * for real application, it needs to be a much larger number (with more iterations);
    *Calculate and print out power (power = proportion of iterations with flag=0, i.e., upper bound below 1);
     Data power;
        set un;
        if parameter = 'Intercept' then do;
          if upper >=1 then flag =1;     *upper bound is above 1;
            else if upper <1 then flag =0;
         end;
        if parameter  = 'Intercept';
       run;

     ODS SELECT all;
     proc freq data = power ;
        table flag;
          title "n=&n; lambda=&lambda";
     run;
    %mend;

    *try different sample size and lambda;
    %samplesize(n=10, lambda=0.5);
    %samplesize(n=20, lambda=0.5);
    %samplesize(n=25, lambda=0.5);
    %samplesize(n=30, lambda=0.5);
    %samplesize(n=40, lambda=0.5);

    %samplesize(n=50, lambda=0.5);
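The same simulation can also be sketched outside SAS. Below is a hypothetical Python analogue of the macro (the function names are mine, not from the original post); instead of PROC GENMOD's likelihood-ratio interval, it uses the exact one-sided upper bound, solved by bisection on the Poisson CDF, so power estimates may differ slightly from the SAS results:

```python
import math
import random

def poisson_upper_bound(total_events, exposure, conf=0.99):
    # Exact one-sided upper bound for a Poisson mean: solve
    # P(X <= total_events | lam) = 1 - conf for lam by bisection,
    # then divide by exposure. Equivalent to qchisq(conf, 2*(x+1))/2 / exposure.
    alpha = 1.0 - conf

    def cdf(k, lam):
        # P(X <= k) for X ~ Poisson(lam); direct summation is fine for modest k
        term = math.exp(-lam)
        total = term
        for i in range(1, k + 1):
            term *= lam / i
            total += term
        return total

    lo, hi = 0.0, total_events + 10.0 * math.sqrt(total_events + 1.0) + 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if cdf(total_events, mid) > alpha:  # candidate bound still too small
            lo = mid
        else:
            hi = mid
    return lo / exposure

def poisson_draw(lam, rng):
    # Knuth's multiplication method; adequate for a small mean such as 0.5
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def simulate_power(n, lam, iterations=1000, seed=20140406):
    # Fraction of simulated trials whose one-sided 99% upper bound
    # for the infection rate falls below the threshold of 1.0
    rng = random.Random(seed)
    hits = 0
    for _ in range(iterations):
        total = sum(poisson_draw(lam, rng) for _ in range(n))
        if poisson_upper_bound(total, n) < 1.0:
            hits += 1
    return hits / iterations
```

With n = 50 and λ = 0.5, the simulated power is well above 80%, in line with the guidance's expectation that roughly 40 to 50 subjects would generally suffice.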

    Sunday, March 30, 2014

    Computing Confidence Interval for Poisson Mean

    For Poisson distribution, there are many different ways for calculating the confidence interval. The paper by Patil and Kulkarni discusses 19 different ways to calculate a confidence interval for the mean of a Poisson distribution.

    The most commonly used methods are the normal approximation (for large samples) and the exact method (for small samples).

    Normal Approximation:

    For the Poisson distribution, the mean and the variance are both lambda (λ).
    The standard error is calculated as sqrt(λ/n), where λ is the Poisson mean and n is the sample size or total exposure (total person-years, total time observed, ...).
    The confidence interval is calculated as:
                      λ ± z(α/2)*sqrt(λ/n)
    The 95-percent confidence interval is calculated as λ ± 1.96*sqrt(λ/n).
    The 99-percent confidence interval is calculated as λ ± 2.58*sqrt(λ/n).
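As a quick cross-check of the formula above, here is a minimal Python sketch (the function name is mine, added for illustration):

```python
import math

def poisson_normal_ci(lam, n, z=1.96):
    # Normal-approximation CI for a Poisson mean lam, with n the
    # sample size or total exposure: lam +/- z * sqrt(lam / n)
    half_width = z * math.sqrt(lam / n)
    return lam - half_width, lam + half_width
```

For instance, poisson_normal_ci(47.18182, 88) reproduces the interval of Example 1 below; pass z=2.58 for a 99-percent interval.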
      
    EXACT method:

    Refer to the following paper for the description of this method: 

    The confidence interval for event X is calculated as: 

               (qchisq(α/2, 2*x)/2, qchisq(1-α/2, 2*(x+1))/2 )

    Where x is the number of events occurred under Poisson distribution.

    In order to calculate the exact confidence interval for the Poisson mean, the confidence interval obtained for the number of events needs to be converted to a confidence interval for the Poisson mean.

    Here are two examples from the internet:

    Example 1:
    Would like to know how confident I can be in my λ. Anyone know of a way to set upper and lower confidence levels for a Poisson distribution?
    • Observations (n) = 88
    • Sample mean (λ) = 47.18182
    what would the 95% confidence look like for this?
    With Normal Approximation, the 95% confidence interval is calculated as:

                    47.18182 +/- 1.96* sqrt(47.18182/88)

    This gives 45.7467, 48.617

    With the exact method, we first need to calculate x (# of events):
                    X = n * λ = 88 * 47.18182 = 4152

    Then compute the 95% confidence interval for X = 4152. This gives the 95% confidence interval for X as (4026.66, 4280.25).

    The 95% confidence interval for mean (λ) is therefore:
    lower bound = 4026.66 / 88 = 45.7575
    upper bound = 4280.25 /88 = 48.6392

      
    Example 2: Say that 14 events are observed in 200 people studied for 1 year and 100 people studied for 2 years. Calculate the 95% confidence interval for the Poisson mean.

    In this example, the number of events (X) is given, the Poisson rate (λ) or mean needs to be calculated.

    First step is to calculate the person year:
    The person time at risk is 200 + 100 x 2 = 400 person years
    The poisson rate / poisson mean (λ) is :
    • Events observed = 14
    • Time at risk of event = 400
    • Poisson (e.g. incidence) rate estimate = 14/400 = 0.035

    Normal Approximation: 95% confidence interval is calculated as: 
                    0.035 +/- 1.96* sqrt(0.035/400)
    This will give the 95% confidence interval of (0.0167, 0.0533)

    Exact approach:  Calculate the 95% confidence interval for the number of events (X) using: 
    (qchisq(0.025, 2*x)/2, qchisq(0.975, 2*(x+1))/2 )

    and the result is: [7.65, 23.49]
    Exact 95% confidence interval for Poisson mean is:
             Lower bound = 7.65 / 400 =0.019135 for lower bound and
             Upper bound = 23.49 / 400 = 0.058724 for upper bound


    We will then say the Poisson mean is 0.035 with 95% confidence interval of (0.019, 0.059).
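The exact method can likewise be sketched in Python. Since the Python standard library has no chi-square quantile function, the sketch below (function names are mine) solves the equivalent equations directly by bisection on the Poisson CDF; direct summation of the CDF is adequate for modest event counts such as x = 14, though it would underflow for counts as large as Example 1's x = 4152:

```python
import math

def poisson_cdf(k, lam):
    # P(X <= k) for X ~ Poisson(lam), by direct summation
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def exact_poisson_ci(x, alpha=0.05):
    # Exact CI for a Poisson count x, equivalent to
    # (qchisq(alpha/2, 2x)/2, qchisq(1-alpha/2, 2(x+1))/2)
    def bisect(f, lo, hi):
        # f is decreasing in lam; locate its root on [lo, hi]
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    hi = x + 10.0 * math.sqrt(x + 1.0) + 10.0
    # lower bound L solves P(X >= x | L) = alpha/2, i.e. CDF(x-1; L) = 1 - alpha/2
    lower = 0.0 if x == 0 else bisect(
        lambda lam: poisson_cdf(x - 1, lam) - (1 - alpha / 2), 0.0, hi)
    # upper bound U solves P(X <= x | U) = alpha/2
    upper = bisect(lambda lam: poisson_cdf(x, lam) - alpha / 2, 0.0, hi)
    return lower, upper
```

For x = 14 this returns approximately (7.65, 23.49); dividing by the 400 person-years gives the (0.019, 0.059) interval above.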

    The following SAS programs can illustrate the calculations above: 

    data normal;
      input lambda n ;
      lower = lambda - probit(0.975)*sqrt(lambda/n);
      upper = lambda + probit(0.975)*sqrt(lambda/n);
      datalines;
      47.18182 88
      0.035       400
    ;
    proc print data=normal;
      title 'Normal Approximation for 95% confidence interval for Poisson mean';
    run;

    data exact;
      input X;
      lower = quantile('CHISQ',.025,2*x)/2;
      upper = quantile('CHISQ',.975,2*(x+1))/2;
      datalines;
     4152
     14
    ;
    proc print data=exact;
      title 'Exact method for 95% confidence interval for Poisson mean';
    run;