On Biostatistics and Clinical Trials: May 2017

Saturday, May 27, 2017

Clinical Trial with Insufficient Sample Size: under power or detecting a trend?

When planning a clinical trial, an important step is to estimate the sample size (the number of patients needed to detect the treatment difference) for the study. In calculating the sample size, it is conventional to set the significance level at 0.05 and the statistical power at 80% or above. Sometimes, we need to design a clinical trial with an insufficient sample size. This occurs fairly often in early phase clinical trials, in investigator-initiated trials (IITs), and in rare disease drug development because of constraints in resources, budget, and the number of patients available to participate in the study. We could design a study without a formal sample size calculation and simply state that the sample size of xxx is based on ‘clinical considerations’, even though we don’t know exactly what ‘clinical considerations’ means.

If there are biomarkers or surrogate endpoints, and treatment effects on them are easier to detect than effects on the clinical endpoints, we could design a proof-of-concept or early phase study using the biomarkers or surrogate endpoints. The sample size can then be formally calculated based on the treatment effect on the biomarkers or surrogate endpoints. For example, in solid tumor clinical trials, we could design a study with a smaller sample size based on the effect in shrinking the tumor. In studies of inhaled antibiotics in non-CF bronchiectasis, the early phase study could use the bacterial density in sputum as the endpoint, so that a smaller sample size is required to demonstrate the effect before the late stage study, where a clinically meaningful endpoint such as exacerbations should be used.

We can run into the situation where there are no good or reliable biomarkers or surrogate endpoints and the clinical endpoint is the only one available. The endpoint for the early phase study and the late phase study is then the same. In order to design an early phase study with a smaller sample size, we will need to do one of the following:

Increase the significance level (alpha level) to allow a greater type I error. Instead of testing the hypothesis at the conventional alpha = 0.05, we can test the hypothesis at alpha = 0.10 or 0.20 – we would say that we are trying to detect a trend.
Lower the statistical power to allow a greater type II error – design an underpowered study.
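
To see how much each option shrinks the trial, here is a minimal sketch using the standard normal-approximation formula for comparing two means, n per group = 2 (z_alpha + z_beta)^2 / d^2. The standardized effect size d = 0.5 and the function name n_per_group are illustrative assumptions, not values taken from any of the studies discussed in this post.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80, two_sided=True):
    """Approximate per-group sample size for a two-arm comparison of means.

    Normal approximation; effect_size is the standardized difference
    (mean difference divided by the common standard deviation).
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Hypothetical standardized effect size (an assumption for illustration only).
d = 0.5
print(n_per_group(d, alpha=0.05, power=0.80))  # conventional design: 63
print(n_per_group(d, alpha=0.20, power=0.80))  # "detect a trend": 37
print(n_per_group(d, alpha=0.05, power=0.70))  # underpowered: 50
```

Under these assumptions, relaxing the two-sided alpha to 0.20 (equivalently, one-sided 0.10) reduces the sample size more than lowering the power to 70%. An exact calculation based on the t distribution would give slightly larger numbers.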

While both approaches have been used in the literature, I prefer increasing the significance level to detect a trend. Intentionally designing an underpowered study raises ethical concerns.

Here are some examples of clinical trials designed to detect a trend using alpha = 0.20 (or one-sided alpha = 0.10):

Geisler (2015) Azithromycin versus Doxycycline for Urogenital Chlamydia trachomatis Infection

“…It was estimated that for the study to have 90% power to test the hypothesis at a one-sided 0.10 significance level, the per-protocol population would need to include 153 participants in each group. The failure rate was estimated with binomial proportion and 95% confidence intervals. One-sided 90% exact confidence intervals were used to estimate the difference in the failure rates between the two treatments, which is appropriate for a noninferiority study and which is consistent with the one-sided significance level of 0.10 that was used for the determination of the sample size.”

Bible (2012) A Multiinstitutional Phase 2 Trial of Pazopanib Monotherapy in Advanced Anaplastic Thyroid Cancer

“A three-outcome (promising, inconclusive, not promising), one-stage modified Simon optimal phase II clinical trial study design with an interim analysis was chosen so that there would be a 90% chance of detecting a tumor response rate of at least 20% when the true tumor response rate was at least 5% at a 0.10 significance level, deeming that a RECIST response rate of less than 20% would be of little clinical importance in ATC.”

Virgil (2013) Final analysis of a phase IB/randomized phase II study of gemcitabine (G) plus placebo (P) or vismodegib (V), a hedgehog (Hh) pathway inhibitor, in patients (pts) with metastatic pancreatic cancer (PC): A University of Chicago phase II consortium study.

“Assuming a mPFS of 3.5 months for GP and 5.7 months for GV (HR=0.61), a sample size of 106 subjects (53 per group) provided 85% power to detect this difference, using a one-sided test at the 0.10 significance level.”

Kaufmann (2009) Phase II trial of CoQ10 for ALS finds insufficient evidence to justify Phase III

“The primary null hypothesis was that CoQ10 reduces the mean ALSFRSr decline over 9 months by at least 20% compared to placebo—in short, that CoQ10 is “promising.” It was tested against the alternative that CoQ10 reduces the mean ALSFRSr decline by less than 20% over 9 months compared to placebo, at one-sided alpha = 0.10”

Here are some studies designed with insufficient power (less than 80%). Notice that these studies still have 70% power. I can't imagine people's reaction if we designed a study with 50% power.

Bashutski (2010) Teriparatide and Osseous Regeneration in the Oral Cavity

"A detailed calculation of sample size was difficult, since few studies have evaluated medications intended to augment local osseous repair in periodontal therapy. However, in one study of a selective cyclooxygenase-2 inhibitor in periodontal therapy, a sample of 22 patients per group was sufficient for the study to have 70% power to detect a 1-mm difference between the groups in the gain in clinical attachment level and reduction in probing depth, with a type I error rate of 5%."

Rouse (2007) A Trial of 17 Alpha-Hydroxyprogesterone Caproate to Prevent Prematurity in Twins

“We estimated that a sample size of 600 would provide at least 70% power to detect a 33% reduction in the rate of the composite of the following serious adverse fetal or neonatal outcomes”

Hauser (2008) B-Cell Depletion with Rituximab in Relapsing–Remitting Multiple Sclerosis

“With the sample of 99 patients, the study would have 70% power at a two-sided significance level of 0.05”

Thursday, May 04, 2017

Final Version of Protocol Template by FDA/NIH and TransCelerate

Previously, I discussed the protocol template for clinical trials. This week, FDA/NIH and TransCelerate simultaneously released the final version of the protocol template.

FDA/NIH's protocol template is intended for clinical investigators who are writing protocols for phase 2 and phase 3 NIH-funded studies requiring an investigational new drug (IND) or investigational device exemption (IDE) application, but could also be helpful to other investigators conducting studies of medical products that are not regulated by the FDA.

The final protocol template by TransCelerate is for industry-sponsored clinical trials for licensure.

FDA/NIH protocol template (final)

Word Version of Final Template

TransCelerate protocol template (final)

Common Protocol Template Core Template – Basic Word Edition
Common Protocol Template – Technology-Enabled Edition

REFERENCE: FDA, NIH & Industry Advance Templates for Clinical Trial Protocols | RAPS

Monday, May 01, 2017

Betting on Death: Moral Dilemma

I recently re-read the book “What Money Can't Buy: The Moral Limits of Markets” by Michael J. Sandel. The book's example of the viatical industry and the moral dilemma associated with it made me think about a similar dilemma we face in event-driven clinical trials where the event is an unfortunate outcome (for example, morbidity or mortality).

A viatical settlement (from the Latin "viaticum") is the sale of a policy owner's existing life insurance policy to a third party for more than its cash surrender value, but less than its net death benefit. Such a sale provides the policy owner with a lump sum. The third party becomes the new owner of the policy, pays the monthly premiums, and receives the full benefit of the policy when the insured dies.

"Viatical settlement" typically is the term used for a settlement involving an insured who is terminally or chronically ill.

The viatical industry started in the 1980s and 1990s, prompted by the AIDS epidemic. It consisted of a market in the life insurance policies of people with AIDS and others who had been diagnosed with a terminal illness. Here is how it worked: Suppose someone with a $100,000 life insurance policy is told by his doctor that he has only a year to live. And suppose he needs money now for medical care, or perhaps simply to live well in the short time he has remaining. An investor offers to buy the policy from the ailing person at a discount, say, $50,000, and takes over payment of the annual premiums. When the original policyholder dies, the investor collects the $100,000.

It seems like a good deal all around. The dying policyholder gains access to the cash he needs, and the investor turns a handsome profit – provided the person dies on schedule.

With viaticals, the financial risk creates a moral complication not present in most other investments: the investor must hope that the person whose life insurance he buys dies sooner rather than later. The longer the person hangs on, the lower the rate of return.

The anti-HIV drugs that extended the lives of tens of thousands of people with AIDS scrambled the calculations of the viatical industry.

The viatical industry can extend to people with other terminal diseases such as cancer. However, the concept and the moral issues are the same: betting that people will die sooner rather than later.

In clinical trials with an event-driven design where the event is bad (such as death, cancer recurrence, pulmonary exacerbation, or transplant rejection), we may face the same dilemma. While the intention of the new treatment is to prevent the bad event from happening, as the trial sponsor we also hope that these bad events occur more often so that we can finish the study sooner and have the results available earlier.

Suppose there is a cancer clinical trial where the primary efficacy endpoint is time to death, and suppose we design a randomized, double-blind study to compare two treatment groups: an experimental treatment group and a control group. We will calculate how many death events are needed to have at least 80% statistical power to show the treatment difference. Then, based on the accrual rate and dropout rate, we can further calculate the number of subjects needed to obtain the desired number of death events. During the study, we can check the aggregate death rate to see if the actual results are in line with the assumptions. If the death rate is below our assumptions, we should be happy, since the lower death rate could indicate that the experimental treatment works. However, as the trial sponsor, we would not be happy, since the lower death rate means a longer trial to accrue the required number of death events.
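
The two-step calculation described above can be sketched as follows. This is a minimal illustration using Schoenfeld's approximation for the number of events under 1:1 randomization, d = 4 (z_alpha + z_beta)^2 / (ln HR)^2. The hazard ratio of 0.70 and the overall event probabilities of 60% and 40% are hypothetical inputs; in a real design the event probability would come from the assumed accrual, follow-up, and dropout pattern.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80, two_sided=True):
    """Schoenfeld approximation for the number of events (1:1 randomization)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


def required_subjects(events, prob_event):
    """Subjects needed when each subject has probability prob_event
    of contributing an observed event by the end of the study."""
    return ceil(events / prob_event)


# Hypothetical design inputs (assumptions for illustration only).
events = required_events(0.70)          # two-sided alpha 0.05, 80% power
print(events)                           # 247
print(required_subjects(events, 0.60))  # 412
print(required_subjects(events, 0.40))  # 618
```

The number of events depends only on the hazard ratio, alpha, and power. A lower-than-expected death rate leaves that target unchanged but means more subjects or longer follow-up are needed to reach it, which is the source of the sponsor's mixed feelings.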

As the trial sponsor, we may want to employ enrichment strategies to select a population that is more likely to die so that the death events accrue quickly. As mentioned in FDA’s guidance “Enrichment Strategies for Clinical Trials to Support Approval of Human Drugs and Biological Products”, this type of enrichment strategy is called prognostic enrichment. Here is what FDA’s guidance says:

IV. PROGNOSTIC ENRICHMENT STRATEGIES—IDENTIFYING HIGH-RISK PATIENTS

A wide variety of prognostic indicators have been used to identify patients with a greater likelihood of having the event (or a large change in a continuous measure) of interest in a trial. These indications include clinical and laboratory measures, medical history, and genomic or proteomic measures. Selecting such patients allows a treatment effect to be more readily discerned. For example, trials of prevention strategies (reducing the rate of death or other serious event) in cardiovascular (CV) disease are generally more successful if the patients enrolled have a high event rate, which will increase the power of a study to detect any given level of risk reduction. Similarly, identification of patients at high risk of a particular tumor, or at high risk of recurrence or metastatic disease can increase the power of a study to detect an effect of a cancer treatment. Prognostic enrichment strategies are also applicable, or potentially applicable, to the study of drugs intended to delay progression of a variety of diseases, such as Alzheimer’s disease, Parkinson’s disease, rheumatoid arthritis, multiple sclerosis, and other conditions, where patients with more rapid progression could be selected; it is possible, of course, that such patients might be less responsive to treatment (i.e., that rapid progression would be a negative predictor of response), and that would have to be considered.

For any given desired power in an event-based study, the appropriate sample size will depend on effect size and the event rate in the placebo group. Prognostic enrichment does not increase the relative risk reduction (e.g., percent of responders or percent improvement in a symptom), but will increase the absolute effect size, generally allowing for a smaller sample size. For example, reduction of mortality from 10% to 5% in a high-risk population is the same relative effect as a reduction from 1% to 0.5% in a lower risk population, but a smaller sample size would be needed to show a 5% vs. 0.5% change in absolute risk. It is common to choose patients at high risk for events for the initial outcome study of a drug and, if successful, move on to larger studies in lower risk patients.
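
The guidance's mortality example can be checked with a rough calculation. The sketch below uses the usual normal-approximation formula for comparing two proportions with a two-sided alpha of 0.05 and 80% power. The alpha, the power, and the function name are assumptions chosen for illustration; the guidance does not specify them.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_proportions(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate per-group sample size for comparing two proportions
    (two-sided test, normal approximation, unpooled variance)."""
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil(z_sum ** 2 * variance / (p_control - p_treated) ** 2)


# Same 50% relative risk reduction, very different absolute effect.
high_risk = n_per_group_proportions(0.10, 0.05)   # 10% -> 5% mortality
low_risk = n_per_group_proportions(0.01, 0.005)   # 1% -> 0.5% mortality
print(high_risk, low_risk)  # 432 4671
```

Under these assumptions, the high-risk population needs roughly one tenth as many patients per group as the lower-risk population for the same relative risk reduction.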

While this enrichment strategy is good for the trial sponsor and makes the clinical trial smaller, it also leaves a bad taste because we are betting that the selected study population will have a high death rate and that the patients will die sooner.
