One of the common issues in the statistics field is the adjustment for multiplicity, i.e., the adjustment of the alpha level when multiple tests are performed. Multiplicity can arise in many different situations in clinical trials; some of them are listed below:
- Multiple arms
- Co-primary endpoints
- Multiple statistical approaches for the same endpoint
- Interim analysis
- More than one dose vs. placebo
- Meta-analysis
- Subgroup analysis
There are tons of articles about multiplicity, but little guidance from the regulatory bodies. When multiplicity issues arise, the common understanding is that an adjustment needs to be made; however, there is no guidance on which approach should be used. The adjustment approach could be very conservative (e.g., Bonferroni) or less conservative (e.g., Hochberg). One could evaluate the various approaches and determine which adjustment approach is best suited to the situation in the study; the Bonferroni baseline is sketched below.
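To make the comparison concrete, here is a minimal Python sketch of the most conservative option, the Bonferroni adjustment; the function name and the p-values are made up purely for illustration.

```python
# A minimal sketch of the Bonferroni adjustment: with m tests, each raw
# p-value is multiplied by m (capped at 1) and compared against alpha.
# The p-values below are hypothetical, for illustration only.

def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, reject decisions) under Bonferroni."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject

raw_p = [0.012, 0.030, 0.045]          # hypothetical p-values from 3 tests
adj, rej = bonferroni(raw_p)
print([round(x, 3) for x in adj])      # [0.036, 0.09, 0.135]
print(rej)                             # [True, False, False] at alpha = 0.05
```

Only the first test survives the adjustment here, which is why Bonferroni is often described as very conservative.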
While we are still waiting for FDA's guidance on the multiplicity issue (hopefully it will come out in 2009), EMEA has issued a PtC (Points to Consider) document on multiplicity. The document provides guidance on when an adjustment for multiplicity should be implemented.
Among the many articles related to multiplicity, I find the following ones suited to my taste, with practical discussions:
- Proschan and Waclawiw (2000) Practical Guidelines for Multiplicity Adjustment in Clinical Trials. Controlled Clinical Trials
- Capizzi and Zhang (1996) Testing the Hypothesis that Matters for Multiple Primary Endpoints. Drug Information Journal
- Koch and Gansky (1996) Statistical Considerations for Multiplicity in Confirmatory Protocols. Drug Information Journal
- Wright (1992) Adjusted P-values for Simultaneous Inference. Biometrics
It is always useful to refer to the statistical review documents for previous NDAs/BLAs to see what kinds of approaches have been used in the drug approval process. The three approaches below seem to stand out:
- Hochberg procedure
- Bonferroni-Holm procedure
- Hierarchical order for testing null hypotheses
While not exactly the same three procedures, a CDRH guidance on "Clinical Investigations of Devices Indicated for the Treatment of Urinary Incontinence" states: “The primary statistical challenge in supporting the indication for use or device performance in the labeling is in making multiple assessments of the secondary endpoint data without increasing the type 1 error rate above an acceptable level (typically 5%). There are many valid multiplicity adjustment strategies available for use to maintain the type 1 error rate at or below the specified level, three of which are listed below:
· Bonferroni procedure;
· Hierarchical closed test procedure; and
· Holm’s step-down procedure. "
The Hochberg procedure is based on Hochberg's 1988 paper. It has been used in several NDA/BLA submissions. For example, the Tysabri BLA states:
"Hochberg procedure for multiple comparisons was used for the evaluation of the primary endpoints. For 2 endpoints, the Hochberg procedure results in the following rule: if the maximum of the 2 p-values is less than 0.05, then both hypotheses are rejected and the statistical significance is claimed for both endpoints. Otherwise, the minimum of the 2 p-values needs to be less than 0.025 for claiming the statistical significance".
The Bonferroni-Holm procedure is based on Holm's 1979 paper (Holm, S (1979): "A simple sequentially rejective multiple test procedure", Scandinavian Journal of Statistics, 6:65–70). It is a modification of the original Bonferroni method and may also be called the Holm-Bonferroni approach or the Bonferroni-Holm correction. This approach was employed in the Flomax NDA (020579) and in the BLA for HFM-582 (STN 125057).
Both Holm's procedure and Hochberg's procedure are modifications of the Bonferroni procedure: Holm's is a 'step-down' procedure and Hochberg's is a 'step-up' procedure. An article by Huang and Hsu titled "Hochberg's step-up method: cutting corners off Holm's step-down method" (Biometrika 2007, 94(4):965-975) provides a good comparison of the two procedures; a small sketch contrasting them follows.
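Here is a companion sketch of Holm's step-down procedure, assuming the standard formulation (start from the smallest p-value, stop at the first failure); the p-values are made up and chosen so that Hochberg rejects where Holm does not, which is the "cutting corners" point.

```python
# A minimal sketch of Holm's step-down procedure, to be read next to the
# Hochberg sketch above.  Holm starts with the smallest p-value and stops at
# the first failure; Hochberg starts with the largest p-value and stops at the
# first success, so it rejects at least as many hypotheses.

def holm(p_values, alpha=0.05):
    """Return a list of reject/accept decisions (True = reject H0)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices, ascending p
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha / (m - rank + 1):
            reject[idx] = True
        else:
            break        # stop: this and all larger p-values are not rejected
    return reject

p = [0.030, 0.045]       # hypothetical co-primary endpoints
print(holm(p))           # [False, False]: 0.030 > 0.05/2, so Holm stops immediately
# hochberg(p) from the sketch above gives [True, True], since 0.045 < 0.05
```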
Benjamini and Hochberg also proposed a procedure that controls the FDR (false discovery rate) instead of controlling the overall alpha level. Their original paper, "Controlling the false discovery rate: a practical and powerful approach to multiple testing", appeared in the Journal of the Royal Statistical Society. It is interesting that the FDR and the Benjamini-Hochberg procedure have been used quite often in the gene identification/microarray area. A nice comparison of the Bonferroni-Holm approach and the Benjamini-Hochberg approach is from this website. Another good summary is the slides from the 2004 FDA/Industry Statistics Workshop.
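For comparison with the alpha-controlling procedures above, here is a minimal sketch of the Benjamini-Hochberg FDR procedure; the p-values are made up, loosely mimicking a small screening setting.

```python
# A minimal sketch of the Benjamini-Hochberg FDR procedure: sort the p-values
# ascending and find the largest rank i with p_(i) <= (i/m) * q, where q is the
# target false discovery rate; reject hypotheses 1..i.  P-values are made up.

def benjamini_hochberg(p_values, q=0.05):
    """Return reject/accept decisions controlling the FDR at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank              # remember the largest passing rank
    for r in range(cutoff_rank):
        reject[order[r]] = True
    return reject

p = [0.001, 0.008, 0.020, 0.041, 0.300]
print(benjamini_hochberg(p))   # [True, True, True, False, False]
```

With the same made-up p-values, a Bonferroni-type cutoff of 0.05/5 = 0.01 would reject only the first two hypotheses, which illustrates why the FDR approach is popular when many hypotheses are screened.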
Hierarchical order for testing null hypotheses was cited in EMEA's guidance as:
"Two or more primary variables ranked according to clinical relevance. No formal adjustment is necessary. However, no confirmatory claims can be based on variables that have a rank lower than or equal to that variable whose null hypothesis was the first that could not be rejected."
This approach can be explained as a situation where a primary endpoint and several secondary endpoints are defined: the highest-ranked hypothesis corresponds to the primary endpoint, and the lower-ranked hypotheses correspond to the secondary endpoints.
In one of my old studies, we ordered the hypotheses for the comparisons as something like the following:
"A closed test procedure with the following sort order will be used for the pairwise comparisons. The second hypothesis will be tested only if the first hypothesis has been rejected, thus maintaining the overall significance level at 5%.
1. The contrast between drug 400 mg and placebo (two-sided, alpha = 0.05) (H01: mu of 400 mg = mu of placebo)
2. The contrast between drug 400 mg and a comparator (two-sided, alpha = 0.05) (H02: mu of 400 mg = mu of the comparator)"
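To show how such a fixed-sequence (hierarchical) rule plays out mechanically, here is a minimal Python sketch; the labels and p-values are hypothetical, not from the actual study.

```python
# A minimal sketch of the hierarchical (fixed-sequence) strategy: test the
# pre-ranked hypotheses at the full alpha and stop at the first one that
# cannot be rejected; everything ranked lower gets no confirmatory claim.

def fixed_sequence(ordered_tests, alpha=0.05):
    """ordered_tests: list of (label, p_value) in the pre-specified rank order."""
    results = []
    testing_allowed = True
    for label, p in ordered_tests:
        if testing_allowed and p < alpha:
            results.append((label, "rejected"))
        else:
            results.append((label, "not rejected (no confirmatory claim)"))
            testing_allowed = False   # all lower-ranked hypotheses are blocked
    return results

tests = [("400 mg vs placebo", 0.012),       # hypothetical p-values
         ("400 mg vs comparator", 0.038)]
for label, outcome in fixed_sequence(tests):
    print(label, "->", outcome)
# 400 mg vs placebo -> rejected
# 400 mg vs comparator -> rejected
# Had the first p-value been, say, 0.08, neither hypothesis could have been
# rejected, regardless of how small the second p-value was.
```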