Sunday, July 01, 2012

Multiplicity Adjustments: Gatekeeping, fixed-sequence, and fallback procedures

Approaches to the multiplicity issue have evolved over the last several years, and many new procedures have been proposed, mainly to handle problems encountered in clinical trials and drug development.
EMA/CHMP recently released a "Concept paper on the need for a guideline on multiplicity issues in clinical trials" for public comment. The introduction of this concept paper mentions some of the new procedures:
“The guideline is not to give advice on technical questions related to a new methodology. However, the increasing complexity of hypothesis frameworks and methods used may result in new issues and pose questions on general principles that haven’t been considered before. These include consistency problems, the construction of simultaneous confidence intervals and the usefulness of newly developed methods e.g. gatekeeping and fallback procedures as well as graphical solutions in the regulatory context.”
It is necessary to distinguish among three of the newer procedures for multiplicity adjustment:
  • Gatekeeping procedure
  • Fixed sequence procedure
  • Fallback procedure
All three procedures are designed mainly to handle multiple endpoints (both primary and secondary endpoints).

The gatekeeping procedure is used when there are multiple endpoints and these endpoints are grouped into families. For example, a clinical trial will typically have one or more primary endpoints (the family of primary endpoints) and multiple secondary endpoints (the family of secondary endpoints). If there are many secondary endpoints, they can be further divided into several secondary families. With a gatekeeping procedure, the families are tested sequentially, and the tests for a subsequent family are performed only if the tests for the previous family are significant. In other words, the families of hypotheses examined earlier serve as gatekeepers. While the term ‘gatekeeping procedure’ may not be used, this approach has been implemented in many clinical trials, especially in the regulatory setting. It is very typical that the secondary endpoints are tested only if the primary endpoint is statistically significant. In this way, the primary efficacy endpoints are tested at the full alpha=0.05 level, which is not compromised by the consideration of the secondary endpoints.
  • The website maintained by Alex Dmitrienko et al contains a lot of useful information about the gatekeeping procedures. 
  • The slide presentation “Branching tests in clinical trials with multiple objectives” is helpful in understanding the gatekeeping procedure.
  • The gatekeeping strategy was used in NDA 22-554 for Xifaxan (rifaximin), presented at a GI Drugs Advisory Committee Meeting, where the secondary endpoints were grouped as “Key Secondary Endpoints” and “Other Secondary Endpoints”. Key secondary endpoints are those designated as most clinically important, with a pre-specified order for their analysis. P-values and confidence intervals for all other analyses are presented with NO adjustment for multiplicity. Nominal p-values and confidence intervals are consequently exploratory and cannot be used as a basis for efficacy claims in the product label if approved.
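The family-by-family testing described above can be sketched in Python. This is only a minimal illustration under stated assumptions, not the method of any particular trial: the function name `serial_gatekeeping`, the family and endpoint names, and the p-values are all hypothetical. Two choices are assumptions that the text does not specify: each family is tested with a Bonferroni split of alpha within the family, and the gate to the next family opens only when every hypothesis in the current family is rejected.

```python
def serial_gatekeeping(families, alpha=0.05):
    """Test families of endpoints in a pre-specified order.

    families: list of (family_name, {endpoint_name: p_value}) tuples.
    Assumption: Bonferroni within each family; a family is a gatekeeper
    that must be rejected in full before the next family is tested.
    """
    results = {}
    gate_open = True
    for name, p_values in families:
        if not gate_open:
            # An earlier family failed, so this family is never tested.
            results[name] = {ep: "not tested" for ep in p_values}
            continue
        threshold = alpha / len(p_values)  # Bonferroni split (assumption)
        decisions = {
            ep: "rejected" if p <= threshold else "not rejected"
            for ep, p in p_values.items()
        }
        results[name] = decisions
        gate_open = all(d == "rejected" for d in decisions.values())
    return results


# Hypothetical p-values, for illustration only.
example = [
    ("primary", {"P1": 0.012}),
    ("key secondary", {"S1": 0.020, "S2": 0.030}),
    ("other secondary", {"O1": 0.001}),
]
for family, decisions in serial_gatekeeping(example).items():
    print(family, decisions)
```

In this hypothetical run the primary family passes, one key secondary endpoint fails its Bonferroni threshold of 0.025, and the last family is therefore not tested even though its nominal p-value is small.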
The fixed-sequence procedure is a stepwise multiple testing procedure constructed from a pre-specified sequence of hypotheses. When there are multiple endpoints, they can be ordered according to their importance. Each test is performed at the full 0.05 level, following the pre-specified order. Once one hypothesis fails to reach significance, none of the subsequent tests are performed. The advantage and disadvantage of this procedure are clear: power is maximized as long as the earlier hypotheses are rejected, because every test uses the full alpha, but a later endpoint has no chance of being declared significant once an earlier hypothesis is not rejected. Another drawback is that ordering the hypotheses by clinical importance is subjective in nature.

  • The fixed-sequence procedure can be used under the umbrella of a gatekeeping procedure for one specific family. In the Xifaxan NDA example above, a gatekeeping procedure was used overall across the primary and secondary endpoints, while the fixed-sequence procedure was used to test the key secondary endpoints.
  • Although it is not named explicitly, the fixed-sequence procedure is described in the EMA’s “Points to consider on multiplicity issues in clinical trials”, issued in 2002. That document states that, in the case of “two or more primary variables ranked according to clinical relevance, no formal adjustment is necessary. However, no confirmatory claims can be based on variables that have a rank lower than or equal to that variable whose null hypothesis was the first that could not be rejected.”
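The stopping rule of the fixed-sequence procedure can be sketched in a few lines of Python. The function name `fixed_sequence` and the p-values below are hypothetical and serve only as an illustration; the sketch assumes a hypothesis is rejected when its p-value is at or below alpha.

```python
def fixed_sequence(p_values, alpha=0.05):
    """Test hypotheses in their pre-specified order, each at full alpha.

    Testing stops at the first hypothesis that is not rejected; every
    later hypothesis is reported as "not tested".
    """
    decisions = []
    stopped = False
    for p in p_values:
        if stopped:
            decisions.append("not tested")
        elif p <= alpha:
            decisions.append("significant")
        else:
            decisions.append("not significant")
            stopped = True  # the sequence ends here
    return decisions


# Hypothetical p-values, listed in the pre-specified order of importance.
print(fixed_sequence([0.012, 0.034, 0.210, 0.003]))
# -> ['significant', 'significant', 'not significant', 'not tested']
```

The fourth endpoint has a small p-value but is never tested, which shows how much the outcome depends on the chosen ordering.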

The fallback procedure is conceptually similar to a fixed-sequence test, in which hypotheses are tested in an a priori order at the full alpha level. The difference is that, in the fallback procedure, the full alpha of 0.05 is split among the endpoints in a pre-specified order (based on clinical relevance), and hypotheses later in the order can still be tested (at different alpha levels) even if an earlier hypothesis is not rejected. When a hypothesis is rejected, the alpha level at which it was tested is carried forward and added to the alpha assigned to the next hypothesis; when it is not rejected, the next hypothesis is tested at its own assigned alpha only. To show how the fallback procedure differs from the fixed-sequence procedure, we can use an example from the paper “The fallback procedure for evaluating a single family of hypotheses” by Wiens and Dmitrienko. There are five endpoints with actual p-values of 0.010, 0.060, 0.0002, 0.0004, and 0.0268. With the fixed-sequence procedure, endpoints #3, #4, and #5 will never be tested because endpoint #2 is not significant. With the fallback procedure, however, endpoints #3, #4, and #5 can still be tested (just at different alpha levels).

| In order | With fixed-sequence procedure | With fallback procedure* |
|---|---|---|
| Endpoint #1 | 0.010 compared with alpha=0.05 | 0.010 compared with alpha=0.04 |
| Endpoint #2 | 0.060 compared with alpha=0.05 | 0.060 compared with alpha=0.04 + 0.005; result is not significant |
| Endpoint #3 | Not tested because endpoint #2 is not significant | 0.0002 compared with alpha=0.002; result is significant |
| Endpoint #4 | Not tested | 0.0004 compared with alpha=0.002 + 0.002; result is significant |
| Endpoint #5 | Not tested | 0.0268 compared with alpha=0.002 + 0.002 + 0.001; result is not significant |

* The five endpoints are given weights for their importance, and alpha levels are assigned as 0.04, 0.005, 0.002, 0.002, and 0.001 (corresponding to 0.80, 0.10, 0.04, 0.04, and 0.02 of the total alpha of 0.05).
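The fallback calculation for this example can be checked with a short Python sketch. The p-values and assigned alpha levels are those given above; the function name `fallback` and the convention of rejecting when the p-value is at or below the test level are assumptions made for illustration.

```python
def fallback(p_values, assigned_alphas):
    """Fallback procedure for hypotheses in a pre-specified order.

    Each hypothesis is tested at its own assigned alpha plus any alpha
    carried forward from an immediately preceding rejection. After a
    non-rejection nothing is carried forward.
    Returns a list of (test_level, rejected) tuples.
    """
    carried = 0.0
    results = []
    for p, assigned in zip(p_values, assigned_alphas):
        level = assigned + carried
        rejected = p <= level
        results.append((level, rejected))
        carried = level if rejected else 0.0
    return results


p_values = [0.010, 0.060, 0.0002, 0.0004, 0.0268]
assigned_alphas = [0.04, 0.005, 0.002, 0.002, 0.001]  # sums to 0.05

for i, (level, rejected) in enumerate(fallback(p_values, assigned_alphas), 1):
    outcome = "significant" if rejected else "not significant"
    print(f"Endpoint #{i}: p={p_values[i - 1]} vs alpha={level:.3f} -> {outcome}")
```

Endpoints #1, #3, and #4 come out significant and endpoints #2 and #5 do not, which matches the fallback column of the table.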

Additional References: