Friday, July 18, 2008

Some words about adaptive design

Four rules to adapt by:
  • Allocation rule: how subjects will be allocated to the available arms
  • Sampling rule: how many subjects will be sampled in subsequent stages
  • Stopping rule: when to drop an arm or stop the trial (for efficacy, harm or futility)
  • Decision rule: the final decision and interim decisions pertaining to design changes not covered in the previous three rules. Examples of the modifications that can result from these decision-making rules include modifying the sample size, dropping a treatment arm, stopping a study early for success or failure, combining phases, and/or adaptive randomization

Adaptive trials can involve any one of these rules, or a combination of them.
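As an illustration of the allocation rule, here is a minimal sketch of response-adaptive allocation using Thompson sampling (one of several possible adaptive allocation schemes; the response rates and sample size below are hypothetical):

```python
import random

def thompson_allocate(successes, failures, rng=random):
    """Allocate the next subject by drawing from each arm's Beta posterior
    and choosing the arm with the highest draw."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return draws.index(max(draws))

# Toy simulation: two arms with (hypothetical) true response rates 0.3 and 0.5.
random.seed(42)
true_p = [0.3, 0.5]
succ, fail = [0, 0], [0, 0]
for _ in range(200):
    arm = thompson_allocate(succ, fail)
    if random.random() < true_p[arm]:
        succ[arm] += 1
    else:
        fail[arm] += 1

# As responses accumulate, allocation drifts toward the better-performing arm.
n_per_arm = [succ[i] + fail[i] for i in range(2)]
```

The point of the sketch is only that the allocation probabilities adapt to the accumulating data, unlike the fixed 1:1 allocation of a traditional design.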

Top three misconceptions of adaptive trials:
  • There are certain areas of confirmatory clinical research where adaptive designs are more applicable and other areas where adaptive designs are less or not applicable
  • Adaptive trial designs are characterized by unmanageable complexity and less careful planning
  • Adaptive designs require smaller sample sizes than traditional designs

Human beings lean toward wishful thinking. On average, drug effects are overestimated and their variability is underestimated. As a result, trials have historically been either unknowingly underpowered or intentionally overpowered. In the latter case, an adaptive design is more or less unnecessary (aside from the questionable ethics of overpowered trials). The unknowingly underpowered trials, however, are where adaptive designs come into play. By using an adaptive design, a potentially underpowered trial can be rescued. Overall, adaptive designs make better use of the patient as a resource: trials no longer need to be overpowered, and the number of underpowered trials is reduced.
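The arithmetic behind "rescuing" an underpowered trial can be sketched with the usual two-sample normal approximation (the effect sizes below are hypothetical, chosen only to show how an optimistic planning assumption erodes power):

```python
from math import ceil, sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a two-sample comparison."""
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2)

def achieved_power(n, delta, sigma, alpha=0.05):
    """Power of the two-sample z test with n subjects per arm."""
    return NormalDist().cdf(delta / (sigma * sqrt(2 / n)) - z(1 - alpha / 2))

# Planned under an optimistic effect of 0.5 SD -> 63 subjects per arm ...
n_planned = n_per_arm(delta=0.5, sigma=1.0)
# ... but if the true effect is only 0.35 SD, power drops to roughly 50%.
p_true = achieved_power(n_planned, delta=0.35, sigma=1.0)
# An adaptive sample size re-estimation at interim restores the target power.
n_rescued = n_per_arm(delta=0.35, sigma=1.0)
```

(The sketch ignores the alpha adjustment that a real adaptive re-estimation procedure would require.)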

Thursday, July 17, 2008

The heavy burden of the modern clinical trial protocols

A recent article by Mr. Getz in Applied Clinical Trials elaborated on the heavy burden of protocol design in modern clinical trials. Clinical trial designs have become more and more complicated, requiring more study procedures and collecting more data items. More complex and demanding protocols are hurting clinical trial performance and success.

I absolutely agree with his assessment. I can add that protocol designs may also require more blood draws from participants for hematology, chemistry, viral testing, biomarkers, pharmacokinetics, pharmacogenetics,… In some studies, patients may be exposed to more radioactive material than ever. I even heard that a sponsor provided a comprehensive protocol with an adaptive study design, including many pages of appendices describing how the Bayesian algorithms are applied. We can imagine the reaction from investigators, CRAs, and even CRO statisticians. I don't know how the investigators can understand these Bayesian algorithms. One question that needs to be answered is the target audience of the study protocol: is it for investigators? For regulatory authorities? For IRBs? For CRAs? Or for all of them?

Everything has a balance. Eventually we will get to a point where a too complex and too demanding protocol actually hurts the study from every aspect: cost, resources, patient enrollment, data quality, generalizability of the study results,…

Who is to blame? I can think of the following:
  • Regulatory requirements are getting tighter and tighter.
  • Sponsors are getting more conservative.
  • Sponsors try to collect as much information as they can, without distinguishing the items that are really necessary from the items that are merely nice to have.
  • Key opinion leaders often ask for additional items to be added to the protocol for their own interest.

Wednesday, July 16, 2008

Communication with non-statisticians

My fellow colleague expressed his frustration about explaining a statistical concept to our clinical operations colleagues. I fully understood his feelings. Sometimes it is not easy to communicate with non-statisticians. The problem could be on both sides: the statistician did not use plain English, or the non-statisticians lacked an understanding of very basic statistics.

Considering my medical background, I feel a little lucky when communicating with physicians or other non-statisticians. Perhaps also because of my teaching experience, I know how to explain complicated statistical issues to non-statisticians in plain language. So this is one area I am proud of.

Below is an example demonstrating how differently statistical terms can be explained.

Regarding the three types of missing data mechanisms, here are the definitions from a recent article in the Drug Information Journal:
  • Data are considered missing completely at random (MCAR) if, conditional upon the independent variables in the analytic model, the missingness does not depend on either the observed or unobserved outcomes of the variable being analyzed (Y)
  • Data are missing at random (MAR) if, conditional upon the independent variables in the analytic model, the missingness depends on the observed outcomes of the variable being analyzed (Yobs) but does not depend on the unobserved outcomes of the variable being analyzed (Ymiss).
  • Data are missing not at random (MNAR) if, conditional upon the independent variables in the analytic model, the missingness depends on the unobserved outcomes of the variable being analyzed.
Now, for the same concepts, the following definitions seem easier to understand.
  • MCAR (data are missing completely at random): A "missing" value does not depend on the variable itself or on the values of other variables in the database.
  • MAR (data are missing at random): The probability of missing data on any variable is not related to its particular value. The pattern of missing data is traceable or predictable from other variables in the database.
  • NMAR (not missing at random): Missing data are not random and depend on the values that are missing.
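The three mechanisms can also be made concrete with a small simulation (a sketch with hypothetical data: X is a fully observed covariate and Y is the analysis variable):

```python
import random

random.seed(1)
# Toy data set: X is always observed; Y (= X + noise) is the analysis variable.
data = [(x, x + random.gauss(0, 1)) for x in (random.gauss(0, 1) for _ in range(2000))]

def drop(pairs, missing_if):
    """Return the Y values, with None wherever missing_if(x, y) holds."""
    return [None if missing_if(x, y) else y for x, y in pairs]

def observed_mean(ys):
    obs = [y for y in ys if y is not None]
    return sum(obs) / len(obs)

# MCAR: missingness depends on neither X nor Y (pure chance).
y_mcar = drop(data, lambda x, y: random.random() < 0.3)
# MAR: missingness depends only on the observed covariate X.
y_mar = drop(data, lambda x, y: x > 0.5)
# MNAR: missingness depends on the unobserved value of Y itself.
y_mnar = drop(data, lambda x, y: y > 0.5)
```

The observed mean of Y stays unbiased under MCAR; under MAR it is biased but recoverable by modeling Y given X; under MNAR it is biased and the bias cannot be diagnosed from the observed data alone.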

Sunday, July 13, 2008

Rule of three

The rule of three states that, for a Bernoulli random variable with unknown probability p, if no events occur in n independent trials, then a quick-and-ready approximation to the upper 95% confidence bound for p is 3/n.
This rule is particularly used in pre-licensure clinical trials where the adverse event rate is very low. Sample sizes of pivotal trials for licensure are set for an efficacy endpoint and vary according to the indication. Therefore, pivotal confirmatory studies provide adequate denominators for detecting adverse events that occur at a frequency higher than or similar to the clinical efficacy outcome. However, these sample sizes are not sufficient for detecting rare events. Only reliable post-marketing surveillance systems will allow detection of a rare adverse event or a small increase in an adverse event rate.
The rule of three provides a quick calculation of the upper confidence bound for the observed rate (the observed rate is zero when no event has occurred). It is based on the estimated upper limit of the 95% confidence interval when the event in question has not occurred during a clinical trial or clinical development program. As an example from a vaccine development program: if no event has been observed with a sample size of 100, the upper limit of the 95% CI for the rate of this event is 3%. A sample size in the range of 10,000 subjects can be considered adequate for establishing the protective efficacy of a new vaccine. If no event of a particular sort has been observed during a pre-licensure clinical program involving 10,000 individuals, it can be estimated that this event rate has an upper limit of 3 per 10,000. Rarer adverse events, those occurring at a lower frequency than the vaccine-targeted disease, or an increase in rare adverse events are unlikely to be detected before licensure; their assessment must rely on post-marketing studies (phase IV).
However, if the rare event does occur during the clinical trial (the event rate is not zero), the rule of three should not be used. Instead, the confidence interval should be calculated with an exact approach (not a formula based on the normal approximation).
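To see where 3/n comes from: with zero events, the exact one-sided upper 95% bound solves (1 - p)^n = 0.05, giving p = 1 - 0.05^(1/n) ≈ -ln(0.05)/n ≈ 3/n. A quick numerical check:

```python
def exact_upper_95(n):
    """Exact one-sided upper 95% bound for p when 0 events are seen in n
    trials: solve (1 - p)^n = 0.05 for p."""
    return 1 - 0.05 ** (1.0 / n)

def rule_of_three(n):
    """The rule-of-three approximation to the same bound."""
    return 3.0 / n

# The two agree closely even for moderate n, e.g. n = 100 gives
# 0.0295 (exact) vs 0.03 (rule of three).
for n in (100, 1000, 10000):
    print(n, exact_upper_95(n), rule_of_three(n))
```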

Also, we should not attempt to calculate the sample size based on the rule of three. The sample size should still be based on the efficacy endpoint rather than on a comparison of rare events; otherwise, the sample size could be huge.

1. Ernst Eypasch et al. Probability of adverse events that have not yet occurred: a statistical reminder. BMJ 1995.
2. Steve Simon's web blog: Statistical confidence interval with zero events.

Sunday, July 06, 2008

Comparing treatment difference in slopes

In a regulatory setting, can we show a treatment difference by comparing the slopes between two treatment groups?
In a COPD study (e.g., a two-arm, parallel-group study with the primary efficacy variable measured at baseline and every 6 months thereafter), one can fit a random coefficient model and compare the treatment difference between the two slopes. Alternatively, we can compare the treatment difference in terms of change from baseline to the endpoint (the last measurement).
To test the difference in slopes, we need to test whether the treatment*time interaction term is statistically significant. The assumption is that at the beginning of the trial the intercepts for both groups are the same: both groups start at the same level. If the treatment slows disease progression, the treatment group should show a flatter slope compared with the placebo group.
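A random coefficient model of this kind can be sketched with statsmodels' MixedLM on simulated data (everything below, from the sample size to the slopes, is hypothetical; note that omitting the main effect of treatment enforces the equal-intercepts assumption):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulate a two-arm parallel study: visits every 6 months for 3 years.
rows = []
for subj in range(120):
    treat = subj % 2                                   # 0 = placebo, 1 = active
    slope = -2.0 + 1.0 * treat + rng.normal(0, 0.3)    # active slows the decline
    intercept = 50 + rng.normal(0, 3)                  # same expected baseline
    for t in np.arange(0, 3.5, 0.5):
        rows.append({"subject": subj, "treat": treat, "time": t,
                     "y": intercept + slope * t + rng.normal(0, 1)})
df = pd.DataFrame(rows)

# Random coefficient model: random intercept and slope per subject.
# The time:treat interaction is the difference in slopes between arms;
# its test answers the "are the slopes different?" question.
model = smf.mixedlm("y ~ time + time:treat", df, groups="subject",
                    re_formula="~time")
fit = model.fit()
slope_diff = fit.params["time:treat"]
```

The same model could be fit in SAS with PROC MIXED using a RANDOM intercept time / SUBJECT=subject statement.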
If all patients are followed up to the end of the study and the slopes are different, the endpoint (change from baseline) analysis should also show a statistically significant difference. However, with a smaller sample size, the results of the slope comparison approach and the endpoint analysis approach could be inconsistent. For a given study, a decision has to be made about which approach constitutes the primary analysis. Why don't we analyze the data using both approaches? Then we would have to deal with the adjustment for multiplicity.
I used to comment that "some regulatory authorities such as FDA recommend the simpler endpoint analysis"; then I was asked to provide references to support my statement. I did a quite extensive search but could not find any truly relevant reference. However, from reviewing the statistical reviews in BLAs and NDAs in the US, it is very rare to see a product approval based on a comparison of slopes. Many product approvals are based on a comparison of change from baseline.
So this is really a regulatory question. Every indication has its accepted endpoints, so tradition takes precedence. According to my colleague, there is a movement in the Alzheimer's arena to look at differences in slopes, but this is based on trying to claim disease modification. If this is the case, we may also apply it to the COPD area, since for certain types of COPD we could claim disease modification by showing differences in slopes. Has this approach been used in COPD before?
On the other hand, it seems that the slope model (random coefficient model) may be preferred in the academic setting, while the endpoint approach, change from baseline (with the last value carried forward), may be more practical in the industry setting.
From a statistical point of view, the slope approach makes a lot of sense; however, we need to be cautious about some potential issues:
1. For some endpoint measures there may be a plateau. If the plateau is reached before the end of the study, comparing slopes will lose power compared with a comparison of just the endpoint results or some type of general repeated measures assessment of the average treatment difference.
2. If the slope comparison is used as the primary efficacy measure, the number of measurements per year on the primary efficacy variable is relevant. One might think that more frequent measurements will increase the power to show a treatment difference in slopes. The question arises when designing the study: do you choose a shorter trial with more frequent measurements, or a longer trial with less frequent measurements?

Saturday, July 05, 2008

Geometric Statistics, geometric CV, intra-subject variation

In bioavailability and bioequivalence studies, the pharmacokinetic parameters (AUC, Cmax) are often assumed to follow a log-normal distribution.

The common technique is to calculate geometric statistics (geometric mean, geometric CV, and geometric SD). Notice that the geometric CV is independent of the geometric mean (unlike the arithmetic CV, which depends on the arithmetic mean), and the geometric CV is used in the sample size calculation. When calculating geometric statistics, the data on the original scale are log-transformed, the statistics are computed on the log scale, and the results are anti-logged to transform back.

In a crossover design, the geometric CV can be estimated from a mixed model and is used to gauge the intra-subject variation: geometric CV = sqrt(exp(std^2) - 1), or CV = sqrt(exp(variance) - 1), where std^2 is estimated by the MSE. The variance comes from the ODS 'CovParms' table of SAS PROC MIXED. A related quantity is the inter-subject CV, for which std^2 is estimated by the variance estimate for the random subject effect from PROC MIXED.
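The log-transform / anti-log recipe and the CV formula can be sketched as follows (the AUC values are hypothetical, purely for illustration):

```python
from math import exp, log, sqrt

def geometric_stats(values):
    """Geometric mean and geometric CV, computed on the log scale."""
    logs = [log(v) for v in values]
    n = len(logs)
    mean_log = sum(logs) / n
    var_log = sum((x - mean_log) ** 2 for x in logs) / (n - 1)
    gmean = exp(mean_log)            # anti-log of the log-scale mean
    gcv = sqrt(exp(var_log) - 1)     # geometric CV = sqrt(exp(sigma^2) - 1)
    return gmean, gcv

# Hypothetical AUC values from five subjects:
auc = [100, 120, 90, 150, 110]
gmean, gcv = geometric_stats(auc)
```

In the crossover setting, var_log would instead be the intra-subject (residual) variance estimate from the mixed model, but the CV formula is the same.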

It should be cautioned that the geometric CV is sometimes just called the CV or the intra-subject variability. I have heard that some large pharmaceutical companies include 'intra-subject variability' in their standard data presentation for pharmacokinetic parameters.

The topic of the CV and geometric CV was discussed on a mailing list devoted to bioavailability and bioequivalence, which used to be a great resource for PK-related discussions. Recently, however, the discussion group has been dominated by a lot of junk posts, I guess because of the booming generic drug development industry in India.

Friday, July 04, 2008

Good Clinical Practice: A question & answer reference guide

I recently found the book edited by Parexel to be extremely useful.

Good Clinical Practice: A Question & Answer Reference Guide
Edited by Mark P. Mathieu, Parexel International Corporation

Because I work side by side with study managers and medical directors, and because I oversee data management activities (in addition to my responsibilities for biostatistics), I am involved in a lot of discussions about data collection and data quality. In many situations, a decision has to be made on whether an event should be collected as an adverse event, or how an event should be collected,…

Good Clinical Practice is just like the law: much of the guidance really depends on interpretation. "A Question & Answer Reference Guide" is a book that attempts to provide interpretations of the GCPs through practical questions.

Here are two examples extracted from this book:

Q. Assuming that it is a study exclusion criterion, is a pregnancy while on study considered an AE? Is it considered an SAE?
A. In and of itself, a pregnancy is not considered an AE or SAE. However, an abortion, whether accidental, therapeutic, or spontaneous, should always be classified as an SAE and expeditiously reported to the sponsor. Similarly, any congenital anomaly/birth defect in a child born to a female subject exposed to the investigational product should be recorded and reported as an SAE.

Q. Should expected clinical outcomes of the disease under study, which are efficacy endpoints, be reported as AEs/SAEs?
A. Some protocols instruct investigators to record and report all untoward events that occur during a study as AEs/SAEs, which could include common symptoms of the disease under study and/or other expected clinical outcomes. This approach enables frequency comparisons of all events between treatment groups, but can make event recording in the CRF burdensome, result in more expedited reports from investigators to sponsors, and fill the safety database with many untoward events that most likely have no relationship to study treatment and that could obscure signal identification.
In some clinical trials, disease symptoms and/or other expected clinical outcomes associated with the disease under study, which might technically meet the ICH definition of an AE or SAE, are collected and assessed as efficacy parameters rather than safety parameters.

Recently, we had a study in which several subjects had elective procedures (breast augmentation, mole removal,...). To show diligence, we might be tempted to record them as adverse events (even though they are not drug related); however, elective procedures should not be considered adverse events. They can be collected on a separate CRF page, but should not be reported on the AE page.

The best way is to specify the details either in the study protocol or in the initial training provided to the investigational sites prior to study start, so that the same criteria are followed and all investigators are clear about what should and should not be reported.

Have we become slaves of the Intent-to-Treat Principle?

The intent-to-treat (or intention-to-treat) principle was invented by statisticians about 30 years ago. It took a while for the clinical trial community to accept the concept. Nowadays, the intent-to-treat principle is well accepted by people far beyond statisticians. However, I don't think everybody really understands the concept, even though they may mention the intent-to-treat principle every time they can. I have been really bothered by comments from regulatory reviewers suggesting that we define an intent-to-treat population for studies without randomization and without a placebo or active control (for example, a dose escalation study). We seem to have become slaves of intent-to-treat.

In many situations, the intent-to-treat principle is misunderstood. The intent-to-treat concept is tied to randomization for treatment allocation: no randomization, no intent-to-treat.
The intent-to-treat concept is really for large-scale, confirmatory, pivotal studies. For very early-stage studies (for example, dose escalation studies) with very few subjects, there is no need to follow the intent-to-treat principle.

The intent-to-treat population includes all randomized patients in the groups to which they were randomly assigned, regardless of their adherence to the entry criteria, regardless of the treatment they actually received, and regardless of subsequent withdrawal from treatment or deviation from the protocol. Strictly following the intent-to-treat principle: if a subject is randomized but never receives study medication, the subject is still included in the statistical analysis; if a subject is randomized to drug A but wrongly takes drug B, the subject is analyzed in treatment group A, not B (so-called "as randomized, not as treated"); if a subject is randomized but has no outcome measures, the subject is included in the analysis and considered a treatment failure.

Intention-to-treat analyses are done to avoid the effects of crossover and drop-out, which may break the randomization of the treatment groups in a study. An intention-to-treat analysis provides information about the potential effects of a treatment policy rather than the potential effects of a specific treatment.

To apply the intent-to-treat principle, an appropriate method for handling missing data needs to be specified. A popular practical approach (not an ideal approach from a statistical standpoint) is last observation carried forward (LOCF).
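LOCF amounts to a per-subject forward fill. A minimal sketch with pandas (the data below are hypothetical):

```python
import pandas as pd

# Toy longitudinal data: one row per subject per visit,
# with NaN where the subject dropped out.
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2],
    "visit":   [1, 2, 3, 1, 2, 3],
    "score":   [10.0, 12.0, None, 8.0, None, None],
})

# LOCF: within each subject, carry the last observed value forward.
df["score_locf"] = df.groupby("subject")["score"].ffill()
print(df["score_locf"].tolist())   # [10.0, 12.0, 12.0, 8.0, 8.0, 8.0]
```

Note that the rows must be sorted by visit within subject before filling, and that LOCF implicitly assumes no change after drop-out, which is exactly why it is not ideal statistically.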

The intent-to-treat principle is not needed for all clinical trials and should not be interpreted as "include all enrolled subjects" or "all subjects who signed informed consent". Intent-to-treat is defined from the randomization standpoint; it has nothing to do with "the study subject has the intention to be treated in the clinical trial".

1. ICH guidance E9: Statistical Principles for Clinical Trials
2. My presentation on ITT vs. mITT
3. Wikipedia