Saturday, January 23, 2010

Significance level of 0.00125

Recently, I found some materials related to a lawsuit against Nuvelo Pharmaceuticals, a company that was developing a clot-busting product for the indication of occluded central venous catheters. It is very interesting to see quite a few arguments about the study design and the significance of a p-value. Since Nuvelo planned to conduct only one pivotal trial (instead of two), it had agreed with FDA to use an extraordinarily stringent significance threshold (0.00125). This stringent p-value required for a single pivotal trial, less than 0.00125, was not met, and the development program was eventually terminated. The issue is that such a stringent p-value was not communicated to the investment community upfront, which became one of the grounds for the lawsuit.

There is a scientific basis for this extraordinarily stringent significance level (0.00125). Here is the context extracted from one of FDA's NDA Statistical Reviews, for United Therapeutics Corporation's drug UniprostTM (treprostinol sodium) for pulmonary arterial hypertension.

“...A more important issue is the overall Type I error rate for the proposed analysis in this submission. First, consider the traditional standard for approval at the FDA based on two confirmatory trials. Even if the efficacy of a treatment is shown convincingly in one study, the agency likes to see replication in a second study because we will then be in a better position to infer that the results generalize to the entire population of patients with the disease. The overall Type I error rate (or false positive rate) is the chance that both studies will have a p-value less than 0.05 and the results of both studies are in the same direction. If the treatment effects in the two studies are identically 0, then the chance that both p-values will be less than 0.05 and both treatment effects are in the same direction is 0.001251. For this reason, the Division of Cardio-Renal Drugs has often advised sponsors that one study with a p-value less than 0.00125 may be sufficient for approval…”
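The arithmetic in the quoted review is easy to verify. As a minimal sketch (the 0.05 threshold and the resulting 0.025 per direction come straight from the quote):

```python
# Under a truly null treatment effect, each trial has probability 0.025 of
# showing two-sided p < 0.05 in the favorable direction. For two independent
# trials to both cross p < 0.05 AND agree in direction (both favorable, or
# both unfavorable), there are two ways to agree:
p_one_direction = 0.05 / 2                    # = 0.025
both_trials_agree = 2 * p_one_direction ** 2  # = 2 * 0.025^2 = 0.00125
print(both_trials_agree)
```

This is exactly the 0.00125 figure the Division of Cardio-Renal Drugs cites as the single-trial equivalent of two trials at 0.05.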

While not very common, we do see quite a few drug development programs with a single pivotal trial. One such example is the trial called PROTECT, where the sample size and power for the primary composite endpoint were based on "90% power at two-sided 0.00125 significance level to detect a difference between a distribution of 33% failure, 35% unchanged and 32% success (placebo group) and 25% failure, 34% unchanged, and 42% success (rolofylline group), using the van Elteren extension of the Wilcoxon test".

Designing a single pivotal trial with a significance level of 0.00125 may not be a good strategy compared with the conventional two pivotal trials, each at a significance level of 0.05. A significance level of 0.00125 is forty times more stringent than a significance level of 0.05. Employing such a small significance level will typically require a large sample size and may make success difficult.
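To get a feel for the sample-size penalty, here is a rough sketch comparing the per-group sample size for a two-arm comparison of means at the two thresholds. The 90% power echoes the PROTECT description above, but the standardized effect size d = 0.3 is a made-up illustrative number, and the formula is the textbook normal approximation, not anything specific to these trials:

```python
from statistics import NormalDist
from math import ceil

def n_per_group(alpha, power=0.90, d=0.3):
    """Normal-approximation sample size per arm for a two-sample z-test."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

print(n_per_group(alpha=0.05))      # conventional two-trial threshold
print(n_per_group(alpha=0.00125))   # single pivotal trial threshold
```

Under these assumptions, the 0.00125 threshold roughly doubles the required sample size rather than multiplying it by forty: the cost grows with the square of the critical value, not linearly in alpha.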

FDA's perspectives on the clinical development of topical microbicides indicate the following for a single trial:

  • No single site provides an unusually large fraction of participants
  • No single investigator or site provides a disproportionately favorable effect
  • Consistency across study subsets
  • Statistically persuasive
    Single multi-center trial level of evidence (p-value, 2-sided):
    · p < 0.001: persuasive, robust (cf. 2*[0.025^2] = 0.00125)
    · 0.05 > p > 0.01: inadequate
    · 0.01 > p > 0.001: acceptable, if:
      - good internal consistency
      - low drop-out rates
      - other supportive data

In the end, the evidence of efficacy should not rely purely on p-values. There are other considerations in assessing the evidence of efficacy. This has been spelled out in FDA's Guidance for Industry: Providing Clinical Evidence of Effectiveness for Human Drug and Biological Products:
The evidence of effectiveness could come from one single study with the following:

  • Large multicenter study
  • Consistency across study subsets
  • Multiple studies in a single study
  • Multiple endpoints involving different events
  • Statistically very persuasive finding

Here "statistically very persuasive finding" means a very small p-value even though the guidance does not specifically specify how small the p-value should be. It may depend on the negotiation with the corresponding branches in FDA.

Additional reading:


keni said...

I've just stumbled on your blog whilst looking for information about Hochberg analysis and read this commentary on p-values. I find your blog rather useful; it often explains stats in a "humane" manner.

I have also been trying to find arguments/information about whether a lower p-value (e.g., 0.0001 vs 0.01) always means "more significance". If one settles for an alpha of 0.05, I have always thought that a lower p-value does not mean that the results are more significant. Could you please advise? Or point me to a good reference that discusses this point?

Thank you,

Zhen xiang

Anonymous said...

You should always talk about the p-value in relation to the sample size. With the same effect, you could have a lower p-value if the sample size is increased.
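A quick illustration of this point, using a hypothetical one-sample z-test with a fixed observed effect of 0.2 standard deviations (the effect size and sample sizes are made up for illustration):

```python
from statistics import NormalDist
from math import sqrt

def two_sided_p(effect_in_sd, n):
    """Two-sided p-value of a one-sample z-test for a standardized effect."""
    z = effect_in_sd * sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same observed effect, growing sample size: the p-value shrinks.
for n in (25, 100, 400):
    print(n, round(two_sided_p(0.2, n), 5))
```

The z-statistic here is 1.0, 2.0, and 4.0 respectively, so the p-value drops from about 0.32 to well below 0.001 with no change in the effect itself.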

There are various misinterpretations of hypothesis test results and p-values. The discussions in following articles may help to answer your question:

Gunst, R. F. (2002). Finding confidence in statistical significance. Quality Progress, 35(10), 107-108.

Hubbard, R. and M. J. Bayarri (2003). Confusion over measures of evidence (p's) versus errors (alpha's) in classical testing. The American Statistician, 57(3), 171-178.