When planning a clinical trial, an important step is to estimate the sample size (the number of patients needed to detect the treatment difference) for the study. In calculating the sample size, it is conventional to set the significance level at 0.05 and the statistical power at 80% or above. Sometimes, however, we need to design a clinical trial with an insufficient sample size. This occurs quite often in early phase clinical trials, in investigator-initiated trials (IITs), and in rare disease drug development because of constraints on resources, budget, and the number of patients available to participate in the study. We could design a study without a formal sample size calculation and simply state that the sample size of xxx is based on 'clinical considerations', even though we don't know exactly what 'clinical considerations' means.
If there are biomarkers or surrogate endpoints, and treatment effects on them are easier to detect than effects on the clinical endpoints, we could design a proof-of-concept study or early phase study using the biomarkers or surrogate endpoints. The sample size can then be formally calculated based on the treatment effect on the biomarkers or surrogate endpoints. For example, in solid tumor clinical trials, we could design a study with a smaller sample size based on the effect in shrinking the tumor size. In studies of inhaled antibiotics in non-CF bronchiectasis, the early phase study could use the bacterial density in sputum as the endpoint, so that a smaller sample size is required to demonstrate the effect before the late stage study, where a clinically meaningful endpoint such as exacerbations should be used.
We can run into the situation where there are no good or reliable biomarkers or surrogate endpoints and the clinical endpoint is the only one available. The endpoint for the early phase study and the late phase study is then the same. In order to design an early phase study with a smaller sample size, we will need to do one of the following:
- Increase the significance level (alpha level) to allow a greater type I error. Instead of testing the hypothesis at the conventional alpha = 0.05, we can test the hypothesis at alpha = 0.10 or 0.20. We would say that we are trying to detect a trend.
- Lower the statistical power to allow a greater type II error, that is, design an underpowered study.
While both approaches have been used in the literature, I prefer increasing the significance level to detect a trend. Intentionally designing an underpowered study raises an ethical concern.
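To get a feel for how much each option reduces the sample size, here is a rough sketch using the textbook normal-approximation formula for comparing two means with equal allocation, n per group = 2(z_alpha + z_beta)^2 / d^2. The standardized effect size d = 0.5 and the function name `n_per_group` are my own assumptions for illustration; they are not taken from any of the trials quoted in this post, and a real design would use validated sample size software.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80, two_sided=True):
    """Approximate per-group sample size for a two-sample comparison of means.

    Normal approximation, 1:1 allocation. effect_size is the standardized
    difference (mean difference divided by the common SD), an assumed input.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    d = 0.5  # hypothetical standardized effect size
    scenarios = [
        ("conventional: alpha=0.05, power=80%", dict(alpha=0.05, power=0.80)),
        ("detect a trend: alpha=0.10, power=80%", dict(alpha=0.10, power=0.80)),
        ("detect a trend: alpha=0.20, power=80%", dict(alpha=0.20, power=0.80)),
        ("underpowered: alpha=0.05, power=70%", dict(alpha=0.05, power=0.70)),
    ]
    for label, kwargs in scenarios:
        print(f"{label}: n per group = {n_per_group(d, **kwargs)}")
```

Under these assumptions, relaxing alpha from 0.05 to 0.10 saves about as many patients as dropping the power from 80% to 70%. A two-sided alpha of 0.20 and a one-sided alpha of 0.10 give the same sample size, since both use the same critical value.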
Here are some examples of clinical trials designed to detect a trend using alpha = 0.20 (or one-sided alpha = 0.10):
“…It was estimated that for the study to have 90% power to test the hypothesis at a one-sided 0.10 significance level, the per-protocol population would need to include 153 participants in each group. The failure rate was estimated with binomial proportion and 95% confidence intervals. One-sided 90% exact confidence intervals were used to estimate the difference in the failure rates between the two treatments, which is appropriate for a noninferiority study and which is consistent with the one-sided significance level of 0.10 that was used for the determination of the sample size.”
Bible (2012) A Multiinstitutional Phase 2 Trial of Pazopanib Monotherapy in Advanced Anaplastic Thyroid Cancer
“A three-outcome (promising, inconclusive, not promising), one-stage modified Simon optimal phase II clinical trial study design with an interim analysis was chosen so that there would be a 90% chance of detecting a tumor response rate of at least 20% when the true tumor response rate was at least 5% at a 0.10 significance level, deeming that a RECIST response rate of less than 20% would be of little clinical importance in ATC.”
Virgil (2013) Final analysis of a phase IB/randomized phase II study of gemcitabine (G) plus placebo (P) or vismodegib (V), a hedgehog (Hh) pathway inhibitor, in patients (pts) with metastatic pancreatic cancer (PC): A University of Chicago phase II consortium study.
“Assuming a mPFS of 3.5 months for GP and 5.7 months for GV (HR=0.61), a sample size of 106 subjects (53 per group) provided 85% power to detect this difference, using a one-sided test at the 0.10 significance level.”
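As a rough plausibility check on this kind of statement, the number of PFS events needed for a log-rank comparison can be approximated with Schoenfeld's formula, events = 4(z_alpha + z_beta)^2 / (ln HR)^2 for 1:1 randomization. This is only a sketch: the abstract does not say which method the investigators used, and the function name `required_events` is my own. Note that the formula gives the number of events, not the number of subjects.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80, two_sided=True):
    """Approximate number of events for a log-rank test (Schoenfeld formula).

    Assumes 1:1 allocation and proportional hazards. Converting events to
    subjects requires further assumptions about accrual and follow-up.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


if __name__ == "__main__":
    # Design parameters quoted above: HR = 0.61, 85% power, one-sided alpha = 0.10
    print(required_events(0.61, alpha=0.10, power=0.85, two_sided=False))
    # Same HR and power at the conventional two-sided alpha = 0.05, for comparison
    print(required_events(0.61, alpha=0.05, power=0.85))
```

If most of the 106 subjects are expected to progress or die during follow-up, the first number is in line with the quoted sample size, while the conventional alpha would require noticeably more events.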
“The primary null hypothesis was that CoQ10 reduces the mean ALSFRSr decline over 9 months by at least 20% compared to placebo—in short, that CoQ10 is “promising.” It was tested against the alternative that CoQ10 reduces the mean ALSFRSr decline by less than 20% over 9 months compared to placebo, at one-sided alpha = 0.10”
Here are some studies designed with insufficient power (less than 80%). Notice that these studies still have 70% power. I can't imagine people's reaction if we designed a study with 50% power.
Bashutski (2010) Teriparatide and Osseous Regeneration in the Oral Cavity
"A detailed calculation of sample size was difficult, since few studies have evaluated medications intended to augment local osseous repair in periodontal therapy. However, in one study of a selective cyclooxygenase-2 inhibitor in periodontal therapy, a sample of 22 patients per group was sufficient for the study to have 70% power to detect a 1-mm difference between the groups in the gain in clinical attachment level and reduction in probing depth, with a type I error rate of 5%."
“We estimated that a sample size of 600 would provide at least 70% power to detect a 33% reduction in the rate of the composite of the following serious adverse fetal or neonatal outcomes”
“With the sample of 99 patients, the study would have 70% power at a two-sided significance level of 0.05”