Saturday, January 23, 2010

Significance level of 0.00125

Recently, I found some materials related to a lawsuit against Nuvelo Pharmaceuticals, a company that was developing a clot-busting product for the indication of occluded central venous catheters. It is very interesting to see quite a few arguments about the study design and the significance of a p-value. Since Nuvelo planned to conduct only one pivotal trial (instead of two), it had agreed with FDA to use an extraordinarily stringent significance threshold (0.00125). This stringent p-value required for a single pivotal trial, less than 0.00125, was not met, and the development program was eventually terminated. The issue is that such a stringent p-value was not communicated to the investment community upfront, which became one of the grounds for the lawsuit.

There is a scientific basis for this extraordinarily stringent significance level (0.00125). Here is the context extracted from one of FDA's NDA Statistical Reviews, for United Therapeutics Corporation's drug UniprostTM (treprostinol sodium) for pulmonary arterial hypertension.

“...A more important issue is the overall Type I error rate for the proposed analysis in this submission. First, consider the traditional standard for approval at the FDA based on two confirmatory trials. Even if the efficacy of a treatment is shown convincingly in one study, the agency likes to see replication in a second study because we will then be in a better position to infer that the results generalize to the entire population of patients with the disease. The overall Type I error rate (or false positive rate) is the chance that both studies will have a p-value less than 0.05 and the results of both studies are in the same direction. If the treatment effects in the two studies are identically 0, then the chance that both p-values will be less than 0.05 and both treatment effects are in the same direction is 0.001251. For this reason, the Division of Cardio-Renal Drugs has often advised sponsors that one study with a p-value less than 0.00125 may be sufficient for approval…”
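The arithmetic in the quoted review is easy to verify. As a minimal sketch (the 0.05 threshold and the resulting 0.025 per direction come straight from the quote):

```python
# Under a truly null treatment effect, each trial has probability 0.025 of
# showing two-sided p < 0.05 in the favorable direction. For two independent
# trials to both cross p < 0.05 AND agree in direction (both favorable, or
# both unfavorable), there are two ways to agree:
p_one_direction = 0.05 / 2                    # = 0.025
both_trials_agree = 2 * p_one_direction ** 2  # = 2 * 0.025^2 = 0.00125
print(both_trials_agree)
```

This is exactly the 0.00125 figure the Division of Cardio-Renal Drugs cites as the single-trial equivalent of two trials at 0.05.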

While not very common, we do see quite a few drug development programs with a single pivotal trial. One such example is the trial called PROTECT, where the sample size and power for the primary composite endpoint were based on "90% power at two-sided 0.00125 significance level to detect a difference between a distribution of 33% failure, 35% unchanged and 32% success (placebo group) and 25% failure, 34% unchanged, and 42% success (rolofylline group), using the van Elteren extension of the Wilcoxon test".

Designing a single pivotal trial with a significance level of 0.00125 may not be a good strategy compared with the conventional two pivotal trials, each at a significance level of 0.05. A significance level of 0.00125 is forty times more stringent than a significance level of 0.05. Employing such a small significance level will typically require a large sample size and may make success difficult.
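To get a feel for the sample-size penalty, here is a rough sketch comparing the per-group sample size for a two-arm comparison of means at the two thresholds. The 90% power echoes the PROTECT description above, but the standardized effect size d = 0.3 is a made-up illustrative number, and the formula is the textbook normal approximation, not anything specific to these trials:

```python
from statistics import NormalDist
from math import ceil

def n_per_group(alpha, power=0.90, d=0.3):
    """Normal-approximation sample size per arm for a two-sample z-test."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

print(n_per_group(alpha=0.05))      # conventional two-trial threshold
print(n_per_group(alpha=0.00125))   # single pivotal trial threshold
```

Under these assumptions, the 0.00125 threshold roughly doubles the required sample size rather than multiplying it by forty: the cost grows with the square of the critical value, not linearly in alpha.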

FDA's perspectives on the clinical development of topical microbicides indicate the following for a single trial:

  • No single site provides an unusually large fraction of participants
  • No single investigator or site provides a disproportionately favorable effect
  • Consistency across study subsets
  • Statistically persuasive
    Single multi-center trial level of evidence (p-value, 2-sided):
    · p < 0.001: persuasive, robust (cf. 2*[0.025^2] = 0.00125)
    · 0.05 > p > 0.01: inadequate
    · 0.01 > p > 0.001: acceptable, if:
      - good internal consistency
      - low drop-out rates
      - other supportive data

In the end, the evidence of efficacy should not rely purely on p-values. There are other considerations in assessing the evidence of efficacy. This has been spelled out in FDA's Guidance for Industry: Providing Clinical Evidence of Effectiveness for Human Drug and Biological Products:
The evidence of effectiveness could come from one single study with the following:

  • Large multicenter study
  • Consistency across study subsets
  • Multiple studies in a single study
  • Multiple endpoints involving different events
  • Statistically very persuasive finding

Here "statistically very persuasive finding" means a very small p-value even though the guidance does not specifically specify how small the p-value should be. It may depend on the negotiation with the corresponding branches in FDA.

Additional reading:


keni said...

I've just stumbled on your blog whilst looking for information about Hochberg analysis and read this commentary on p-values. I find your blog rather useful; it often explains stats in a "humane" manner.

I have also been trying to find arguments/information about whether a lower p-value (e.g., 0.0001 vs 0.01) always means "more significance". If one settles for an alpha of 0.05, I have always thought that a lower p-value does not mean that the results are more significant. Could you please advise? Or point me to a good reference that discusses this point?

Thank you,

Zhen xiang

Anonymous said...

You should always talk about the p-value in relation to the sample size. With the same effect, you could have a lower p-value if the sample size is increased.
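A quick illustration of this point, using a hypothetical one-sample z-test with a fixed observed effect of 0.2 standard deviations (the effect size and sample sizes are made up for illustration):

```python
from statistics import NormalDist
from math import sqrt

def two_sided_p(effect_in_sd, n):
    """Two-sided p-value of a one-sample z-test for a standardized effect."""
    z = effect_in_sd * sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same observed effect, growing sample size: the p-value shrinks.
for n in (25, 100, 400):
    print(n, round(two_sided_p(0.2, n), 5))
```

The z-statistic here is 1.0, 2.0, and 4.0 respectively, so the p-value drops from about 0.32 to well below 0.001 with no change in the effect itself.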

There are various misinterpretations of hypothesis test results and p-values. The discussions in following articles may help to answer your question:

Gunst, R. F. (2002). Finding confidence in statistical significance. Quality Progress, 35(10), 107-108.

Hubbard, R. and M. J. Bayarri (2003). Confusion over measures of evidence (p's) versus errors (alpha's) in classical testing. The American Statistician, 57(3), 171-178.