Sunday, December 11, 2016

Commonly Used Procedures for Multiplicity Adjustment: Fixed Sequence Procedure, Holm Step-down Procedure, Hochberg Step-up Procedure

In clinical trials, a multiple testing (multiplicity) issue arises when more than one hypothesis test is built into the same study and we want to claim trial success if any one of the tests is significant. For example, in an osteoporosis/breast cancer trial, there may be two endpoints:
  • Endpoint 1: Incidence of vertebral fractures
  • Endpoint 2: Incidence of breast cancer

We would like to claim success if at least one endpoint is significant. Similarly, in a trial with a low dose group, a high dose group, and a placebo control, we may want to claim success if either the low dose versus placebo comparison or the high dose versus placebo comparison is statistically significant. In both of these situations, adjustment for multiplicity must be employed.

On the other hand, not all studies with more than one hypothesis test need an adjustment for multiplicity. Taking Alzheimer’s disease trials as an example, FDA guidance requires two endpoints:
  • Endpoint 1: Cognition endpoint (ADAS-Cog)
  • Endpoint 2: Clinical global scale (CIBIC plus)

and requires that both endpoints be significant in order to claim success. In this case, both hypotheses are tested at a significance level of 0.05, and no adjustment for multiplicity is needed.

In late phase clinical trials, if a multiplicity issue exists, the adjustment for multiplicity must be built into the statistical analysis plan to avoid inflating the family-wise type 1 error rate above its nominal level (usually 0.05 or 5%).
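To see why the "at least one significant" situation needs an adjustment while the "all must be significant" situation does not, the following minimal sketch computes the family-wise error rate under the simplifying assumption that the tests are independent and all null hypotheses are true. The function names `fwer_any` and `fwer_all` are illustrative only; correlated endpoints would give different numbers.

```python
def fwer_any(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent true-null tests,
    each run at level alpha (success is claimed if ANY test is significant)."""
    return 1.0 - (1.0 - alpha) ** m


def fwer_all(alpha: float, m: int) -> float:
    """Chance that all m independent true-null tests are falsely significant
    (success is claimed only if EVERY test is significant)."""
    return alpha ** m


if __name__ == "__main__":
    for m in (1, 2, 3, 5):
        unadjusted = fwer_any(0.05, m)       # each test at 0.05
        bonferroni = fwer_any(0.05 / m, m)   # each test at 0.05 / m
        print(f"m={m}: unadjusted={unadjusted:.4f}, Bonferroni={bonferroni:.4f}")
    print(f"both of two endpoints required: {fwer_all(0.05, 2):.4f}")
```

With two independent tests each run at 0.05, the chance of at least one false positive is 1 - 0.95^2 = 0.0975, nearly double the nominal level, whereas requiring both tests to be significant gives 0.05^2 = 0.0025 under the same assumptions.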

Many different approaches have been proposed for handling the multiplicity issue. In a recent article by Wang et al (2015) “Overview of multiple testing methodology and recent development in clinical trials”, the following procedures were reviewed.


Multiple testing procedures for non-hierarchical hypotheses
  • Non-parametric or semi-parametric procedures
      • Bonferroni procedure
      • Simes procedure
      • Holm step-down procedure
      • Hochberg step-up procedure
      • Hommel procedure
  • Parametric procedures
      • Dunnett procedure

Multiple testing procedures on hierarchical hypotheses
  • Simple procedures for hierarchical hypotheses
      • Fixed-sequence procedure
      • Fallback procedure
  • Gatekeeping procedures
      • Serial gatekeeping procedures
      • Parallel gatekeeping procedure
      • Other extensions of gatekeeping procedures
  • Graphical approaches


In a presentation by Bretz and Xun, “Introduction to multiplicity in clinical trials”, at the IMPACT meeting, the multiple testing procedures for non-hierarchical hypotheses were organized by whether the test is single-step or stepwise and by whether or not correlations are taken into account.
  

 
They also made the following remarks:
  • Single-step methods are less powerful than stepwise methods and not often used in practice
  • Accounting for correlations leads to more powerful procedures, but correlations are not always known
  • Simes-based methods are more powerful than Bonferroni-based methods, but control the FWER only under certain dependence structures
  • In practice, we select the procedure that is not only powerful from a statistical perspective, but also appropriate from a clinical perspective
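To make the single-step versus stepwise distinction concrete, here is a minimal Python sketch of the Bonferroni (single-step), Holm (step-down, Bonferroni-based), and Hochberg (step-up, Simes-based) procedures for any number of p-values. The function names are illustrative, and the comparisons use strict "<" as a simplifying assumption; many references use "less than or equal to" instead.

```python
from typing import List, Sequence


def bonferroni(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Single-step: every p-value is compared with the same threshold alpha / m."""
    m = len(pvals)
    return [p < alpha / m for p in pvals]


def holm(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Step-down: start at the smallest p-value and stop at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] < alpha / (m - rank):  # alpha/m, alpha/(m-1), ..., alpha
            reject[i] = True
        else:
            break  # this and all larger p-values are not rejected
    return reject


def hochberg(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Step-up: start at the largest p-value and stop at the first success."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # largest p first
    reject = [False] * m
    for step, i in enumerate(order):
        if pvals[i] < alpha / (step + 1):  # alpha, alpha/2, ..., alpha/m
            for j in order[step:]:  # reject this and all smaller p-values
                reject[j] = True
            break
    return reject


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03]
    print("Bonferroni:", bonferroni(pvals))
    print("Holm:      ", holm(pvals))
    print("Hochberg:  ", hochberg(pvals))
```

For the p-values 0.01, 0.04, and 0.03, Bonferroni and Holm reject only the first hypothesis, while Hochberg rejects all three because the largest p-value is already below 0.05. Hochberg's extra power comes with the caveat in the remarks above: it controls the FWER only under certain dependence structures.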

For a specific clinical trial with a multiplicity issue, the choice of the procedure for multiplicity adjustment depends on the study design, on whether the multiple hypothesis tests can be ordered by clinical importance, and sometimes on whether there is prior evidence that one hypothesis test is more likely to be significant. For example, for a dose-response study, the Dunnett procedure or the step-down Dunnett procedure may be preferred. If a clinical trial has multiple sources of multiplicity (for example, multiple endpoints combined with different types of tests, such as superiority and non-inferiority), then a gatekeeping procedure may be preferred.


In industry clinical trials, some procedures are more commonly used than others because they are more powerful, that is, more likely to declare statistical significance. Typically, the clinical trial sponsor (a pharmaceutical or biotech company) would like to choose a more powerful procedure (such as the Hochberg procedure), while the regulator (such as FDA) would prefer a more conservative procedure (such as the Bonferroni or Holm procedure).

We are still waiting for FDA to issue its formal guidance on multiplicity issues. In the meantime, some procedures for handling the multiplicity issue are mentioned in therapeutic-area-specific guidance documents or in presentations by FDA statisticians. For example, CDRH’s guidance “Clinical Investigations of Devices Indicated for the Treatment of Urinary Incontinence” includes the following paragraph on the multiplicity issue that arises when performing statistical tests for multiple secondary endpoints:

The primary statistical challenge in supporting the indication for use or device performance in the labeling is in making multiple assessments of the secondary endpoint data without increasing the type 1 error above an acceptable level (typically 5%). There are many valid multiplicity adjustment strategies available for use to maintain the type 1 error rate at or below the specified level, three of which are listed below:
  • Bonferroni procedure
  • Hierarchical closed test procedure
  • Holm’s step-down procedure
Because each of these multiplicity adjustment strategies involves balancing different potential advantages and disadvantages, we recommend you prospectively state the strategy that you intend to use. We recommend your protocol prospectively state a statistical hypothesis for each secondary endpoint related to the indication for use or device performance.


EMA has a guideline, “Points to consider on multiplicity issues in clinical trials”. The document was issued in 2002, and it might be time for a revision. It focuses mainly on when adjustment for multiplicity is needed and when it is not; it does not mention the procedures that could be used for the adjustment.

A recent paper by Sakamaki et al (2016), “Current practice on multiplicity adjustment and sample size calculation in multi-arm clinical trials: an industry survey in Japan”, revealed that the fixed sequence procedure, the gatekeeping procedure, and the Hochberg procedure are the most commonly used, and that the Holm procedure is rarely used.
 



Assume there are two hypothesis tests, with the left column showing the p-values for these two tests. Whether statistical significance can be claimed depends on which procedure is used for multiplicity adjustment. In this specific case, the Hochberg step-up procedure is more powerful than the Bonferroni and Holm procedures.




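Decision rules for two hypotheses with p-values p1 and p2 (overall significance level 0.05):
  • Without any adjustment for multiplicity: compare p1 with 0.05 and compare p2 with 0.05.
  • Bonferroni correction: test whether p1 < 0.025 or p2 < 0.025.
  • Fixed sequence hierarchical: if p1 < 0.05, compare p2 with 0.05; if p1 > 0.05, p2 will not be tested.
  • Holm step-down procedure: if min(p1, p2) < 0.025, then test whether max(p1, p2) < 0.05.
  • Hochberg step-up procedure: if max(p1, p2) < 0.05, claim both are successful; if max(p1, p2) > 0.05, then test whether min(p1, p2) < 0.025.

Results (each cell lists the hypotheses declared significant; x = none):

p-values               | No adjustment | Bonferroni | Fixed sequence | Holm   | Hochberg
p1 = 0.04, p2 = 0.03   | p1, p2        | x          | p1, p2         | x      | p1, p2
p1 > 0.05, p2 = 0.03   | p2            | x          | x              | x      | x
p1 > 0.05, p2 = 0.02   | p2            | p2         | x              | p2     | p2
p1 = 0.04, p2 > 0.05   | p1            | x          | p1 only        | x      | x
p1 = 0.02, p2 = 0.02   | p1, p2        | p1, p2     | p1, p2         | p1, p2 | p1, p2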
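The two-hypothesis comparison above can also be checked with a short script. This is only a sketch under stated assumptions: the function names are illustrative, the overall level is 0.05, comparisons are strict "<" as in the rules above, and 0.06 is an arbitrary stand-in for any p-value described only as "greater than 0.05".

```python
ALPHA = 0.05  # overall significance level used throughout the comparison


def unadjusted(p1, p2):
    """Each test at the full level; no multiplicity control."""
    return (p1 < ALPHA, p2 < ALPHA)


def bonferroni2(p1, p2):
    """Each test at half the level."""
    return (p1 < ALPHA / 2, p2 < ALPHA / 2)


def fixed_sequence(p1, p2):
    """Test p1 first at the full level; p2 is tested only if p1 is significant."""
    first = p1 < ALPHA
    return (first, first and p2 < ALPHA)


def holm2(p1, p2):
    """Step-down: the smaller p-value must pass 0.025 before the larger is tested."""
    lo_ok = min(p1, p2) < ALPHA / 2
    hi_ok = lo_ok and max(p1, p2) < ALPHA
    return tuple(hi_ok or (lo_ok and p == min(p1, p2)) for p in (p1, p2))


def hochberg2(p1, p2):
    """Step-up: if the larger p-value passes 0.05, both hypotheses are rejected."""
    if max(p1, p2) < ALPHA:
        return (True, True)
    lo_ok = min(p1, p2) < ALPHA / 2
    return tuple(lo_ok and p == min(p1, p2) for p in (p1, p2))


def label(result):
    """Turn (reject_H1, reject_H2) into the hypotheses declared significant."""
    names = [name for name, rejected in zip(("p1", "p2"), result) if rejected]
    return ", ".join(names) if names else "x"


PROCEDURES = [unadjusted, bonferroni2, fixed_sequence, holm2, hochberg2]
# 0.06 is an assumed placeholder for "p > 0.05".
SCENARIOS = [(0.04, 0.03), (0.06, 0.03), (0.06, 0.02), (0.04, 0.06), (0.02, 0.02)]

if __name__ == "__main__":
    for p1, p2 in SCENARIOS:
        cells = " | ".join(label(proc(p1, p2)) for proc in PROCEDURES)
        print(f"p1={p1}, p2={p2}: {cells}")
```

In the first scenario (p1 = 0.04, p2 = 0.03), Bonferroni and Holm reject nothing while Hochberg and the fixed sequence procedure reject both hypotheses. The fixed sequence result depends entirely on the pre-specified order: when p1 is above 0.05, nothing can be claimed no matter how small p2 is.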

  