Sunday, June 08, 2014

Meetings Materials on FDA's website

Because of the FDA Transparency Initiative, the meetings, conferences, and workshops sponsored or co-sponsored by FDA are not open to the public. Shortly after the meetings, conferences, and workshops, the materials and webcasts are usually available to the public. This is a good step forward.

The list of meetings, conferences, and workshops sponsored or co-sponsored by the Center for Drug Evaluation and Research (CDER) can be found at the following website:

Meetings, Conferences, & Workshops (Drugs)


For example, for several public meetings that I am interested in, the meeting materials are all available to the public on the website. 

Periodically, FDA organizes the Advisory Committee (AC) meetings. While FDA may or may not follow the recommendations from Advisory Committees, the materials presented at the advisory committee meetings are always important. Sometimes, the statistical issues such as the endpoint selection, missing data handling, interpretation of the safety results are extensively discussed during the AC meeting and in the meeting materials. Fortunately, the AC meeting materials are usually posted on the web and sometimes, the webcasts are available too. 

Advisory Committees & Meeting Materials


Saturday, May 03, 2014

Some quotes about statistics or from statisticians

I often see people cite interesting quotes in their presentation. Citing a good quote can entice the audience. I recently saw a presentation with the following quote about the statistics in dealing with uncertainties:
One way of defining statistics is…
The science of quantifying uncertainty, dealing with uncertainty, and making decisions in the face of uncertainty…
…and drug development is a series of decisions under huge uncertainty.
There is a website that lists the famous quotes from Statisticians or about Statisticians

Here are some quotes with top ranking:  

All models are wrong, but some are useful. (George E. P. Box)

An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem. (John Tukey)

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. (Ronald Fisher (1938))

Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital. (Aaron Levenstein)

Statisticians, like artists, have the bad habit of falling in love with their models. (George Box)

I think it is much more interesting to live with uncertainty than to live with answers that might be wrong. (Richard Feynman)

If you torture the data enough, nature will always confess. (Ronald Coase)

Absence of evidence is not evidence of absence. (Carl Sagan)

It's easy to lie with statistics; it is easier to lie without them. (Frederick Mosteller)
Then there is a following cartoon::


Thursday, May 01, 2014

New books about 'missing data in clinical trials'

Missing data is one of the critical issues in statistics and in clinical trials. Three new books about the missing data were written by the biostatisticians working directly in pharmaceutical and drug development fields. These three books are worth recommending.



    For free books regarding the missing data, the following resources are available: 
    Does FDA or EMA have a preferred method for dealing with missing data for when companies submit new drugs for approval?
    The general trends in handling the missing data:



    Sunday, April 06, 2014

    Simulation Approach to estimate the sample size for Poisson distribution

    FDA Guidance for Industry: Safety, Efficacy, and Pharmacokinetic Studies to Support Marketing of Immune Globulin Intravenous (Human) as Replacement Therapy for Primary Humoral Immunodeficiency requires the efficacy endpoint to be the serious bacterial infection rate. The guidance states:

    “…Based on our examination of historical data, we believe that a statistical demonstration of a serious infection rate per person-year less than 1.0 is adequate to provide substantial evidence of efficacy. You may test the null hypothesis that the serious infection rate is greater than or equal to 1.0 per person-year at the 0.01 level of significance or, equivalently, the upper one-sided 99% confidence limit would be less
    than 1.0.

    You should employ a sufficient number of subjects to provide at least 80% power with one-sided hypothesis testing and an alpha = 0.01. Although the responsibility for choosing the sample size rests with you, we anticipate that studies employing a total of approximately 40 to 50 subjects would generally prove adequate to achieve the requisite statistical power, given the design specifications listed in this same section.”

    “Statistical considerations

    The number of subjects to be included into the study might exceed 40 patients as the study should provide at least 80% power to reject the null-hypothesis of a serious infection rate greater or equal 1 by means of a one-sided test and a Type I error of 0.01. “
    If we plan a study to meet the above requirements by regulatory agencies, sample size will need to be estimated and the sample size estimation requires the use of power and sample size calculation using Poisson distribution."

    In a webinar  “Design and Analysis of Count Data” by Mani Lakshminarayanan, the following statements were made:

    • If the endpoint is count data then that should be taken into consideration for sample size Calculation
    • Most commonly used sample size software (nQuery Advisor, PASS 2002) do not have the options for discrete data. Poisson regression is available in PASS 2002.
    • If normal approximation is used then the sample size estimates might be too high, which increases cost and time of subject recruitment

    The latest version of PASS has a module for poisson regression that allows the sample size calculation when the purpose is to compare two poisson response rates.

    Cytel’s EAST version 6.2 offers power analysis and sample size calculations for count data in fixed (not adaptive) sample designs. EAST provides design capabilities for:
    • Test of a single Poisson rate
    • Test for a ratio of Poisson rates
    • Test for a ratio of Negative Binomial rates

    However, none of the GUI sample size calculation software can be readily used for calculating the sample size in serious bacterial infection rate situation where the requirement is based on the upper bound of confidence interval for Poisson distribution.

    In SAS community, there are some discussions using SAS to calculate the sample size for count / Poisson data. However, there is no easy answer for the question.

    For serious bacterial infection rate situation discribed above, the simulation approach can be used. Several years ago, we described that the simulation approach can be used to  estimate the sample size for studies comparing two different slopes using random coefficient model (see Chen, Stock, and Deng (2008)). Similar approach can be used to estimate the sample size to meet the regulatory requirement for the upper bound of the 99% confidence interval meeting a threshold. The SAS macro is attached below.  

    options nonotes ;   *suppress the SAS log;
    %macro samplesize(n=, lambda=);
      *generating the first data set;
      data one;
             do i = 1 to &n;
                x = RAND('POISSON',&lambda);
              output;
             end;
       run;

       ODS SELECT NONE;
       ods output parameterestimates=parmest ;  
       proc genmod data=one ;
           model x = / dist=poisson link=log scale=deviance lrci alpha=0.02*98% CI for two sides is equivalent to 99% for one-sided;
       run ;


       data parmest;
         set parmest;
         est = exp(estimate);
         lower = exp(lowerlrcl) ;
         upper = exp(upperlrcl) ;
        run;

      data un;
        set parmest;
        iteration = 1;
        run;

    *Iteration macro to generate the rest of data sets;
    %macro loop(iteration);

    %do i = 2 %to &iteration;
       data one;
             do i = 1 to &n;
                x = RAND('POISSON',&lambda);
              output;
             end;
       run;

       ODS SELECT NONE;
       ods output parameterestimates=parmest ;  
       proc genmod data=one ;
           model x = / dist=poisson link=log scale=deviance lrci alpha=0.02 ;   *98% CI for two sides ;
       run ;

        data parmest;
         set parmest;
         est = exp(estimate);
         lower = exp(lowerlrcl) ;
         upper = exp(upperlrcl) ;
        run;
           
      data parmest;
        set parmest;
        iteration = &i;
        run;

    *combined the cumulative data sets;
       data un;
         set un parmest;
        run;
      
    %end;
    %mend;
    %loop(100);    * for real application, it needs to be a much larger number (with more iterations);
    *Calculate and print out power;
     Data power;
        set un;
        if parameter = 'Intercept' then do;
          if upper >=1 then flag =1;     *upper bound is above 1;
            else if upper <1 then flag =0;
         end;
        if parameter  = 'Intercept';
       run;

     ODS SELECT all;
     proc freq data = power ;
        table flag;
          title "n=&n; lambda=&lambda";
     run;
    %mend;

    *try different sample size and lambda;
    %samplesize(n=10, lambda=0.5);
    %samplesize(n=20, lambda=0.5);
    %samplesize(n=25, lambda=0.5);
    %samplesize(n=30, lambda=0.5);
    %samplesize(n=40, lambda=0.5);

    %samplesize(n=50, lambda=0.5);

    Sunday, March 30, 2014

    Computing Confidence Interval for Poisson Mean

    For Poisson distribution, there are many different ways for calculating the confidence interval. The paper by Patil and Kulkarni discusses 19 different ways to calculate a confidence interval for the mean of a Poisson distribution.

    The most commonly used method is the normal approximation (for large sample size) and the exact method (for small sample size)

    Normal Approximation:

    For Poisson, the mean and the variance are both lambda (λ).
    The standard error is calculated as: sqrt(
    λ /n) where λ is Poisson mean and n is sample size or total exposure (total person years, total time observed,…)
    The confidence interval can be calculated as:
                      λ ±z(α/2)*sqrt(λ/n).
    The 95-percent confidence interval is calculated as: λ ±1.96*sqrt(λ/n).
    The 99-percent confidence interval is calculated as: λ ±2.58*sqrt(λ/n).
      
    EXACT method:

    Refer to the following paper for the description of this method: 

    The confidence interval for event X is calculated as: 

               (qchisq(α/2, 2*x)/2, qchisq(1-α/2, 2*(x+1))/2 )

    Where x is the number of events occurred under Poisson distribution.

    In order to calculate the exact confidence interval for Poisson mean, the obtained confidence interval for the number of events need to be converted to the confidence interval for Poisson mean.

    Here are two examples from the internet:

    Example 1:
    Would like to know how confident I can be in my λ. Anyone know of a way to set upper and lower confidence levels for a Poisson distribution?
    • Observations (n) = 88
    • Sample mean (λ) = 47.18182
    what would the 95% confidence look like for this?
    With Normal Approximation, the 95% confidence interval is calculated as:

                    47.18182 +/- 1.96* sqrt(47.18182/88)

    This gives 45.7467, 48.617

    With Exact method, we first need to calculate x (# of events):
                    X = n * λ = 88 * 48.18182 = 4152

    The compute the 95% confidence interval for X = 4152. This will give the 95% confidence interval for X as (4026.66, 4280.25)

    The 95% confidence interval for mean (λ) is therefore:
    lower bound = 4026.66 / 88 = 45.7575
    upper bound = 4280.25 /88 = 48.6392

      
    Say that 14 events are observed in 200 people studied for 1 year and 100 people studies for 2 years. Calculate the 95% confidence interval for Poisson mean

    In this example, the number of events (X) is given, the Poisson rate (λ) or mean needs to be calculated.

    First step is to calculate the person year:
    The person time at risk is 200 + 100 x 2 = 400 person years
    The poisson rate / poisson mean (λ) is :
    • Events observed = 14
    • Time at risk of event = 400
    • Poisson (e.g. incidence) rate estimate = 14/400 = 0.035

    Normal Approximation: 95% confidence interval is calculated as: 
                    0.035 +/- 1.96* sqrt(0.035/400)
    This will give the 95% confidence interval of (0.0167, 0.0533)

    Exact approach:  Calculate the 95% confidence interval for the number of events (X) using: 
    (qchisq(0.025, 2*x)/2, qchisq(0.975, 2*(x+1))/2 )

    and the result is: [7.65, 23.49]
    Exact 95% confidence interval for Poisson mean is:
             Lower bound = 7.65 / 400 =0.019135 for lower bound and
             Upper bound = 23.49 / 400 = 0.058724 for upper bound


    We will then say the Poisson mean is 0.035 with 95% confidence interval of (0.019, 0.059).

    The following SAS programs can illustrate the calculations above: 

    data normal;
      input lambda n ;
      lower = lambda - probit(0.975)*sqrt(lambda/n);
      upper = lambda + probit(0.975)*sqrt(lambda/n);
      datalines;
      47.18182 88
      0.035       400
    ;
    proc print data=normal;
      title 'Normal Approximation for 95% confidence interval for Poisson mean';
    run;

    data exact;
      input X;
      lower = quantile('CHISQ',.025,2*x)/2;
      upper = quantile('CHISQ',.975,2*(x+1))/2;
      datalines;
     4152
     14
    ;
    proc print data=exact;
      title 'Exact method for 95% confidence interval for Poisson mean';
    run;

    Thursday, March 06, 2014

    Intra-Subject Coefficient of Variation (CV%) for Sample Size Estimation for Crossover Design

    To calculate the sample size for a crossover design for bioequivalence study, a key assumption is the intra-subject variation. The intra-subject variation is usually expressed with coefficient of variation (COV).

    In sample size calculation software ‘PASS’, “Equivalence Test for Two Means in 2x2 Crossover Design - Specify Using Ratio” procedure requires COV (Coefficient of Variation) as the following:


    COV (COEFFICENT OF VARIATION):“...For log-normal data, the following relationship exists: COV(Y) = SQR(Exp(SX*SX)-1) where SX is the square root of the within mean square error in the ANOVA table of the log-transformed values.” 
     In Sample size calculation software Nquery, “T-tests (TOST) of equivalence in ratio of means for crossover design (natural log scale)” procedure specifies that
    “the square root of the mean square error is needed, the square root of the mean squared error can be obtained from the crossover ANOVA computed using the natural log scale. sqrt(MSE) is equal to sd/sqrt(2), where sd is the standard deviation of the period difference computed using the natural log scale. To compare results from this table to those in the Diletti, E., et al (1991) paper note that CV = sqrt(exp(s^2-1)”

    Suppose we analyze the data using Proc Mixed as following:

    proc mixed data=pk ;
        class sequence period treatment subject;
        model logauc = sequence period treatment;
        random subject(sequence);
        lsmeans treatment/pdiff cl alpha=0.1;
    run;

    Covariance parameter estimates in the SAS outputs will provide the information for calculating intra-subject coeffiicient. 

                                                                    The Mixed Procedure
    [...]
     
               
    Covariance Parameter Estimates
    Cov Parm
    Estimate
    SUBJECT(SEQUENCE)
    0.06800
    Residual
    0.1856

    [...]


    Here, Subject(Sequence) is the inter-subject variability s2inter and residual the within-subject (or intra-subject) variability s2within.

    If we need to design a new study with crossover design, we will convert the intra-subject variability to CV for sample size calculation. CVintra can be calculated with the formula CV=100*sqrt(exp(S2
    within)-1) or CV=100*sqrt(exp(Residual)-1). From the table above, s2within=0.1856, CV can be calculated as 45.16%

    If we use Proc GLM (for a balanced study with no missing data) as specified below, the within-subject (or intra-subject) variability s2within is the Mean Square Error (0.1856) – identical to the residual in Proc Mixed. Intra-subject CV is then calculated using CV=100*sqrt(exp(MSE)-1).

    PROC GLM data=pk;
     CLASS trtsEQ period trtgrp patno;
     MODEL lauc = trtSEQ patno(trtSEQ) period TRtgrp; * TRtgrp*PERIOD;
     LSMEANS TRTgrp / PDIFF CL alpha=0.1;
    RUN;


    Source
    DF
    Sum of Squares
    Mean Square
    F Value
    Pr > F
    Model
    23
    6.93036415
    0.30132018
    1.62
    0.1387
    Error
    20
    3.71219721
    0.18560986


    Corrected Total
    43
    10.64256136





    Further Reading:

    Sunday, March 02, 2014

    Analysis of 2x2x2 crossover design data: Proc Mixed or Proc GLM

    In an early article "Cookbook SAS Codes for Bioequivalence Test in 2x2x2 Crossover Design", I provided the analysis using Proc Mixed model. There is a question about the necessity of using Proc Mixed while the Proc GLM can be used for analyzing the data from a 2x2x2 crossover design. As a matter of fact, FDA guidance  "Statistical Approaches to Establishing Bioequivalence” indeed mentioned the use of Proc GLM for analyzing the non-replicated Crossover Design:

    b. Nonreplicated Crossover Designs
    For nonreplicated crossover designs, this guidance recommends parametric (normal-theory) procedures to analyze log-transformed BA measures. General linear model procedures available in PROC GLM in SAS or equivalent software are preferred, although linear mixed-effects model procedures can also be indicated for analysis of nonreplicated crossover studies. For example, for a conventional two-treatment, two-period, two-sequence (2 x2) randomized crossover design, the statistical model typically includes factors accounting for the following sources of variation: sequence, subjects nested in sequences, period, and treatment. The Estimate statement in SAS PROC GLM, or equivalent statement in other software, should be used to obtain estimates for the adjusted differences between treatment means and the standard error associated with these differences.
    Put it into the SAS statements, the SAS codes will be something like the following:

     PROC GLM data=pk;
        CLASS SEQUENCE PERIOD TREATMENT PATIENT;
        MODEL LogAUC = SEQUENCE PATIENT(SEQUENCE) PERIOD TREATMENT; 
        LSMEANS TREATMENT / PDIFF CL ALPHA=0.1;
    RUN;

    The SAS output below will provide the mean difference and its 90% confidence interval. Since the data is in log scale, anti-log of the mean difference (1.076) is the ratio of the geometric mean and anti-log of the 90% confidence interval is our 90% confidence interval for the geometric mean ratio (0.860, 1.346). 

    Least Squares Means for Effect TREATMENT
    i j Difference Between
    Means
    90% Confidence Limits for LSMean(i)-LSMean(j)
    1 2 0.073113 -0.150925 0.297152

    If it is balanced design (i.e., there is no missing value) and all subjects have data available for period 1 and period 2, the results from the Proc GLM will be identical to teh results obtained from Proc Mixed as described in my previous article "Cookbook SAS Codes for Bioequivalence Test in 2x2x2 Crossover Design".

    However, it is very often, there is possibility of missing data in one of the periods. In this case, if Proc GLM is used, the entire subject will be excluded from the analysis. If Proc Mixed is used, the subject who contributes only the data for one of the two periods will still be included in the analysis.