There is a scientific basis for this extraordinarily stringent significance level (0.00125). The following excerpt comes from one of the FDA’s NDA Statistical Reviews, for United Therapeutics Corporation's drug Uniprost™ (treprostinil sodium) for pulmonary arterial hypertension:
“...A more important issue is the overall Type I error rate for the proposed analysis in this submission. First, consider the traditional standard for approval at the FDA based on two confirmatory trials. Even if the efficacy of a treatment is shown convincingly in one study, the agency likes to see replication in a second study because we will then be in a better position to infer that the results generalize to the entire population of patients with the disease. The overall Type I error rate (or false positive rate) is the chance that both studies will have a p-value less than 0.05 and the results of both studies are in the same direction. If the treatment effects in the two studies are identically 0, then the chance that both p-values will be less than 0.05 and both treatment effects are in the same direction is 0.001251. For this reason, the Division of Cardio-Renal Drugs has often advised sponsors that one study with a p-value less than 0.00125 may be sufficient for approval…”
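The arithmetic behind the 0.00125 figure can be checked in a few lines. This is a minimal sketch, assuming the two trials are independent and each test statistic is continuous and symmetric under the null; the function name is hypothetical and is not taken from the FDA review.

```python
def same_direction_false_positive_rate(alpha_two_sided: float = 0.05,
                                       n_trials: int = 2) -> float:
    """Chance that every trial is significant in the same direction
    when the true treatment effect is exactly zero (assumes independent trials)."""
    one_sided = alpha_two_sided / 2  # 0.025 in each tail for a two-sided 0.05 test
    # Either all trials land in the upper tail or all land in the lower tail.
    return 2 * one_sided ** n_trials


overall_alpha = same_direction_false_positive_rate()
print(f"{overall_alpha:.5f}")  # 0.00125
```

With two trials the result is 2 × 0.025² = 0.00125, which is the threshold the Division quotes for a single study.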
Although uncommon, some drug development programs do rely on a single pivotal trial. One example is the PROTECT trial, in which the sample size and power for the primary composite endpoint were based on “90% power at two-sided 0.00125 significance level to detect a difference between a distribution of 33% failure, 35% unchanged and 32% success (placebo group) and 25% failure, 34% unchanged, and 42% success (rolofylline group), using the van Elteren extension of the Wilcoxon test.”
Designing a single pivotal trial with a significance level of 0.00125 may not be a better strategy than the conventional approach of two pivotal trials, each with a significance level of 0.05. A threshold of 0.00125 is one-fortieth of 0.05, that is, forty times more stringent. Such a small significance level typically requires a large sample size, and the trial may be difficult to bring to a successful result.
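To put rough numbers on the sample size cost, the sketch below uses the standard normal-approximation formula for comparing two means, n per arm = 2(z_{1-α/2} + z_{power})² / d². The standardized effect size of 0.3 and the 90% power are hypothetical values chosen only for illustration, and the function name is an assumption; a real trial would need a calculation matched to its own endpoint and test.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(alpha_two_sided: float, power: float, effect_size: float) -> int:
    """Per-arm sample size for a two-arm comparison of means
    (normal approximation, equal allocation, standardized effect size)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha_two_sided / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


EFFECT_SIZE = 0.3  # hypothetical standardized difference, for illustration only
POWER = 0.90       # hypothetical per-trial power

n_conventional = n_per_arm(0.05, POWER, EFFECT_SIZE)   # one of two trials
n_single = n_per_arm(0.00125, POWER, EFFECT_SIZE)      # single pivotal trial
joint_power = POWER ** 2  # both conventional trials must succeed (independence assumed)

print("per arm, each trial at 0.05:   ", n_conventional)
print("per arm, single trial at 0.00125:", n_single)
print("ratio single / one conventional:", round(n_single / n_conventional, 2))
print("joint power of two conventional trials:", round(joint_power, 2))
```

Under these assumptions the single trial needs a little under twice as many patients as one conventional trial, so it is a large study in its own right. The joint power line is a reminder that two trials each powered at 90% succeed together only about 81% of the time, which is part of the trade-off discussed by Shun et al. (2005).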
FDA’s perspectives on the clinical development of topical microbicides list the following for a single trial:
- No single site provides an unusually large fraction of participants
- No single investigator or site provides a disproportionately favorable effect
- Consistency across study subsets
- Statistically persuasive
Single multicenter trial: level of evidence (p-value, two-sided)
· p < 0.001: persuasive, robust (2 × 0.025^2 = 0.00125)
· 0.01 < p < 0.05: inadequate
· 0.001 < p < 0.01: acceptable, if:
- good internal consistency
- low drop-out rates
- other supportive data
In the end, the evidence of efficacy should not rely purely on p-values; other considerations also enter into the assessment. These are spelled out in FDA’s Guidance for Industry: Providing Clinical Evidence of Effectiveness for Human Drug and Biological Products:
The evidence of effectiveness could come from a single study with the following characteristics:
- Large multicenter study
- Consistency across study subsets
- Multiple studies in a single study
- Multiple endpoints involving different events
- Statistically very persuasive finding
Here "statistically very persuasive finding" means a very small p-value, although the guidance does not specify how small the p-value should be. The threshold may depend on negotiation with the relevant review division at FDA.
- Lloyd Fisher (1999) One Large, Well-Designed, Multicenter Study as an Alternative to the Usual FDA Paradigm. Drug Information Journal, Vol. 33, pp. 265–271
- Boguang Zhen (2007) Consideration of Operational α Level With Different Approval Strategies. Drug Information Journal, Vol. 41, pp. 23–29
- Guidance for Industry: Evidence-Based Review System for the Scientific Evaluation of Health Claims – Final, 2009
- Shun Z, Chi E, Durrleman S, Fisher L. (2005) Statistical consideration of the strategy for demonstrating clinical evidence of effectiveness—one larger versus two smaller pivotal studies. Stat Med. 24:1619–1637.