Sunday, May 03, 2009

Adjustment for multiplicity

One of the recurring issues in statistics is adjustment for multiplicity, that is, adjustment of the alpha level when multiple tests are performed. Multiplicity can arise in many different situations in clinical trials; some of them are listed below:
  • Multiple arms
  • Co-primary endpoints
  • Multiple statistical approaches for the same endpoint
  • Interim analysis
  • More than one dose vs. placebo
  • Meta-analysis
  • Subgroup analysis
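
To see why an adjustment is needed at all, here is a minimal sketch of how the chance of at least one false positive grows with the number of tests. It assumes the tests are independent and that every null hypothesis is true, which is a simplification; the function names are my own and only for illustration.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test significance level under the Bonferroni adjustment."""
    return alpha / m


if __name__ == "__main__":
    for m in (1, 2, 3, 5, 10):
        unadjusted = familywise_error_rate(0.05, m)
        adjusted = familywise_error_rate(bonferroni_level(0.05, m), m)
        print(f"m={m:2d}  unadjusted={unadjusted:.4f}  Bonferroni={adjusted:.4f}")
```

Under these assumptions, two unadjusted tests at 0.05 already give a 9.75% chance of a false positive, while testing each at 0.05/m keeps the overall rate at or below 5%.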

There are tons of articles about multiplicity, but little guidance from the regulatory bodies. When multiplicity issues arise, the common understanding is that an adjustment needs to be made. However, there is no guidance on which approach should be used. The adjustment could be very conservative (e.g., Bonferroni) or less conservative (e.g., Hochberg). One could evaluate the various approaches and determine which adjustment approach is best suited to the study at hand.

While we are still waiting for FDA's guidance on multiplicity (hopefully it will come out in 2009), EMEA has issued a PtC (Points to Consider) document on multiplicity. The document provides guidance on when an adjustment for multiplicity should be implemented.

Among the many articles on multiplicity, I find that the following suit my taste and offer practical discussion.

  • Proschan and Waclawiw (2000) Practical Guidelines for Multiplicity Adjustment in Clinical Trials. Controlled Clinical Trials
  • Capizzi and Zhang (1996) Testing the Hypothesis that Matters for Multiple Primary Endpoints. Drug Information Journal
  • Koch and Gansky (1996) Statistical Considerations for Multiplicity in Confirmatory Protocols. Drug Information Journal
  • Wright (1992) Adjusted P-values for Simultaneous Inference. Biometrics

It is always useful to refer to the statistical review documents for previous NDAs/BLAs to see which approaches have been used in the drug approval process. The three approaches below seem to stand out; a similar set is also mentioned in the CDRH guidance quoted after the list.

  • Hochberg procedure
  • Bonferroni-Holm procedure
  • Hierarchical order for testing null hypotheses

While not exactly the same, the CDRH guidance "Clinical Investigations of Devices Indicated for the Treatment of Urinary Incontinence" states: “The primary statistical challenge in supporting the indication for use or device performance in the labeling is in making multiple assessments of the secondary endpoint data without increasing the type 1 error rate above an acceptable level (typically 5%). There are many valid multiplicity adjustment strategies available for use to maintain the type 1 error rate at or below the specified level, three of which are listed below:
· Bonferroni procedure;
· Hierarchical closed test procedure; and
· Holm’s step-down procedure.”

The Hochberg procedure is based on Hochberg's 1988 paper. It has been used in several NDA/BLA submissions. For example, the Tysabri BLA states:

"Hochberg procedure for multiple comparisons was used for the evaluation of the primary endpoints. For 2 endpoints, the Hochberg procedure results in the following rule: if the maximum of the 2 p-values is less than 0.05, then both hypotheses are rejected and claim the statistical significance for both endpoints. Otherwise, if the minimum of the 2 p-values needs to be less than 0.025 for claiming the statistical significance".

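The two-endpoint rule quoted above is a special case of the general step-up procedure. Below is a rough sketch for any number of p-values; the function name is hypothetical, and I use "less than or equal to" at each threshold, which is the usual textbook form (the quoted text says "less than").

```python
def hochberg_reject(p_values, alpha=0.05):
    """Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans in the same order as p_values,
    True where the null hypothesis is rejected.
    """
    m = len(p_values)
    # Indices sorted from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th largest p-value (k = rank + 1) is compared with alpha / k.
        if p_values[idx] <= alpha / (rank + 1):
            # Reject this hypothesis and every one with a smaller p-value.
            for j in order[rank:]:
                reject[j] = True
            break
    return reject


if __name__ == "__main__":
    print(hochberg_reject([0.04, 0.03]))  # both p-values at or below 0.05
    print(hochberg_reject([0.06, 0.02]))  # only the smaller one, at or below 0.025
```

With two endpoints this reduces to the rule in the quote: if the larger p-value clears 0.05, both hypotheses are rejected; otherwise the smaller one has to clear 0.025.
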
The Bonferroni-Holm procedure is based on Holm's 1979 paper (Holm, S. (1979): "A simple sequentially rejective multiple test procedure", Scandinavian Journal of Statistics, 6:65–70). It is a modification of the original Bonferroni method. This method may also be called the Holm-Bonferroni approach or the Bonferroni-Holm correction. This approach was employed in the Flomax NDA (020579) and the BLA for HFM-582 (STN 125057).

Both Holm's procedure and Hochberg's procedure are modifications of the Bonferroni procedure. Holm's procedure is called a 'step-down procedure' and Hochberg's procedure is called a 'step-up procedure'. An article by Huang and Hsu titled "Hochberg's step-up method: cutting corners off Holm's step-down method" (Biometrika 2007 94(4):965-975) provides a good comparison of these two procedures.
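
For comparison, here is a minimal sketch of Holm's step-down procedure. The function name is my own; the thresholds alpha/m, alpha/(m-1), ..., alpha are the same as Hochberg's, but the p-values are examined from the smallest upward and testing stops at the first failure.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure (illustrative sketch).

    Returns a list of booleans in the same order as p_values,
    True where the null hypothesis is rejected.
    """
    m = len(p_values)
    # Indices sorted from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value is compared with alpha / m, the next with
        # alpha / (m - 1), and so on, up to alpha for the largest.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first hypothesis that cannot be rejected
    return reject


if __name__ == "__main__":
    print(holm_reject([0.04, 0.03]))
    print(holm_reject([0.04, 0.02]))
```

The pair of p-values (0.04, 0.03) shows the difference between the two: Holm rejects nothing because 0.03 is above 0.025, whereas the Hochberg rule rejects both because the larger p-value is below 0.05.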

Benjamini and Hochberg also proposed a procedure that controls the FDR (false discovery rate) instead of controlling the overall alpha level. Their original paper, "Controlling the false discovery rate: a practical and powerful approach to multiple testing", appeared in the Journal of the Royal Statistical Society, Series B (1995). Interestingly, the FDR and the Benjamini-Hochberg procedure have been used quite often in the gene identification/microarray area. A nice comparison of the Bonferroni-Holm approach and the Benjamini-Hochberg approach is from this website. Another good summary is the slides from the 2004 FDA/Industry statistics workshop.
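
A rough sketch of the Benjamini-Hochberg step-up rule follows. The function name and the parameter name q (the target false discovery rate) are my own choices, and the sketch assumes the setting of the original paper (independent test statistics).

```python
def benjamini_hochberg_reject(p_values, q=0.05):
    """Benjamini-Hochberg FDR procedure (illustrative sketch).

    Returns a list of booleans in the same order as p_values,
    True where the null hypothesis is rejected.
    """
    m = len(p_values)
    # Indices sorted from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at or below k * q / m.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            largest_k = rank
    # Reject the hypotheses with the largest_k smallest p-values.
    reject = [False] * m
    for idx in order[:largest_k]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    print(benjamini_hochberg_reject([0.001, 0.008, 0.039, 0.041, 0.9]))
    print(benjamini_hochberg_reject([0.001, 0.008, 0.039, 0.041, 0.042]))
```

In the second example all five hypotheses are rejected, because the largest p-value (0.042) is below its own threshold of 0.05. Holm's procedure at alpha = 0.05 would reject only the two smallest, which illustrates how much less strict FDR control is than control of the overall alpha level.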

The hierarchical order for testing null hypotheses is described in EMEA's guidance as:

"Two or more primary variables ranked according to clinical relevance. No formal adjustment is necessary. However, no confirmatory claims can be based on variables that have a rank lower than or equal to that variable whose null hypothesis was the first that could not be rejected."

This approach can be understood through a situation where a primary endpoint and several secondary endpoints are defined. The highest-ranked hypothesis plays the role of the primary endpoint, and the lower-ranked hypotheses play the role of the secondary endpoints.

In one of my old studies, we specified the comparisons as something like the following:

"A closed test procedure with the following sort order will be used for the pairwise comparisons. The second hypothesis will be tested only if the first hypothesis has been rejected, thus maintaining the overall significance level at 5%.
1. The contrast between drug 400 mg and placebo (two-sided, alpha = 0.05) (H01 : mu of 400 mg = mu of placebo)
2. The contrast between drug 400 mg and a comparator (two-sided, alpha = 0.05) (H02 : mu of 400 mg = mu of the comparator)"
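
The gatekeeping logic of such a pre-specified testing order can be sketched as follows. The function name is hypothetical; the p-values must be supplied in the order fixed in the protocol, and each is tested at the full alpha.

```python
def fixed_sequence_reject(ordered_p_values, alpha=0.05):
    """Hierarchical (fixed-sequence) testing (illustrative sketch).

    ordered_p_values must follow the pre-specified ranking of the
    hypotheses. Each hypothesis is tested at the full alpha, but only
    while every higher-ranked hypothesis has been rejected.
    """
    reject = []
    gate_open = True
    for p in ordered_p_values:
        # Once one hypothesis fails, no lower-ranked claim can be made.
        gate_open = gate_open and p <= alpha
        reject.append(gate_open)
    return reject


if __name__ == "__main__":
    # H01 (400 mg vs. placebo) first, then H02 (400 mg vs. comparator).
    print(fixed_sequence_reject([0.01, 0.03]))
    print(fixed_sequence_reject([0.20, 0.001]))
```

In the second call the comparison against the comparator cannot be claimed even though its p-value is very small, because the higher-ranked comparison against placebo was not rejected.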


Anonymous said...


How does one adjust for multiplicity when doing exact binomial tests on multiple endpoints? Let me explain the whole story:

A variable has two possible responses: D (detected) and ND (not detected). There are two treatment conditions, say B (before) and A (after). There are three variables, say Y1, Y2, and Y3.

I want to test whether the difference between the percentage detected (%D) under the two treatment conditions (before and after) is no more than 5% (0.05), at the 5% level of significance and with at least 80% power. For decision making, I am looking at the lower bound of the one-sided 95% confidence interval.

The issue here is the multiplicity arising from three simultaneous variables. I need the sample size that gives 80% power to detect a significant difference.

The interesting question is how we should plan the study and obtain a sample size with at least 80% power to detect a 5% difference in % detected between the two treatments across the three variables.

If anyone finds this confusing, please let me know and I will try to explain a bit more. We want to control the FWER (family-wise error rate) at no more than 5%.


Anonymous said...

A better place for posting this kind of question may be "".