Tuesday, November 28, 2017

Bonferroni method, alpha level partition, and gatekeeper hierarchical test strategy in Bronchiectasis clinical trials

At an FDA advisory committee meeting on November 16, 2017, we saw first-hand applications of several approaches to multiplicity adjustment: the single-step Bonferroni method, a single-step arbitrary partition of the alpha level, and the gatekeeping (hierarchical) test procedure, which was discussed in one of my previous posts.

During this meeting of the Antimicrobial Drugs Advisory Committee (AMDAC), the committee considered new drug application (NDA) 209367 for ciprofloxacin dry powder for inhalation (DPI), sponsored by Bayer HealthCare Pharmaceuticals, Inc. The drug is being proposed for the reduction of exacerbations in non-cystic fibrosis bronchiectasis (NCFB) adult patients (≥18 years of age) with respiratory bacterial pathogens.

The clinical program to evaluate the safety and efficacy of ciprofloxacin DPI consisted of 2 nearly identical phase 3, randomized, multicenter, placebo-controlled trials known as RESPIRE 1 and RESPIRE 2. See table 1 below for the design information.

For both RESPIRE 1 and RESPIRE 2, the primary efficacy endpoint is time to first exacerbation. Within each study, there are three treatment arms (two ciprofloxacin DPI regimens and pooled placebo) and two hypothesis tests. To maintain blinding, the placebo arm is further divided into a placebo matching the 28 days on/off regimen and a placebo matching the 14 days on/off regimen; for analysis purposes, the two placebo groups are pooled. The hypotheses and their allocated alpha levels are listed below. For RESPIRE 1, the alpha level of 0.025 for each hypothesis test is based on the Bonferroni method for multiplicity adjustment. For RESPIRE 2, the alpha levels of 0.001 and 0.049 are based on an arbitrary partition, which is acceptable as long as the total alpha is 0.05.

RESPIRE 1 Study (Bonferroni method for multiplicity adjustment):
Hypothesis 1: ciprofloxacin DPI for 28 days on/off treatment regimen versus pooled placebo (alpha=0.025)
Hypothesis 2: ciprofloxacin DPI for 14 days on/off treatment regimen versus pooled placebo (alpha=0.025)
RESPIRE 2 Study (arbitrary partition of alpha level for multiplicity adjustment):
Hypothesis 1: ciprofloxacin DPI for 28 days on/off treatment regimen versus pooled placebo (alpha=0.001)
Hypothesis 2: ciprofloxacin DPI for 14 days on/off treatment regimen versus pooled placebo (alpha=0.049)
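As a rough illustration of the two allocation schemes listed above, the sketch below splits an overall alpha of 0.05 across the two hypotheses in each study and compares p-values against the allocated levels. The function names and the p-values are hypothetical and are not taken from the RESPIRE results.

```python
import math

def bonferroni_alphas(total_alpha, n_tests):
    """Split the overall alpha equally across n_tests hypotheses."""
    return [total_alpha / n_tests] * n_tests

def check_partition(alphas, total_alpha=0.05):
    """Return True if the allocated alphas sum to no more than total_alpha."""
    # Small tolerance guards against floating-point rounding in the sum.
    return math.fsum(alphas) <= total_alpha + 1e-12

def reject(p_values, alphas):
    """Compare each p-value with its own allocated alpha."""
    return [p <= a for p, a in zip(p_values, alphas)]

respire1_alphas = bonferroni_alphas(0.05, 2)  # equal split: 0.025 each
respire2_alphas = [0.001, 0.049]              # unequal partition of 0.05

# Hypothetical p-values for (28-day regimen, 14-day regimen), illustration only.
example_p = [0.030, 0.020]

print(check_partition(respire1_alphas), reject(example_p, respire1_alphas))
print(check_partition(respire2_alphas), reject(example_p, respire2_alphas))
```

Under either scheme the allocated levels sum to 0.05, so the study-wise type I error rate stays at or below 0.05 no matter how unevenly the alpha is split.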
The study results indicate some efficacy, but the results are not consistent across all four hypothesis tests. For details about the study results, please see FDA's advisory committee briefing book, and the study results for RESPIRE 1 and for RESPIRE 2 on clinicaltrials.gov.

The study also included a long list of the secondary efficacy endpoints. To control the overall type I error rate associated with testing primary and secondary endpoints in two treatment regimens (Cipro 14 and Cipro 28) against placebo, separate hierarchical testing sequences of primary, key secondary and other secondary endpoints were pre-specified for each regimen with statistical testing at α=0.025 for each Cipro arm in RESPIRE 1 and α=0.001 for Cipro 28 and α=0.049 for Cipro 14 in RESPIRE 2. If the primary endpoint was significant for a Cipro regimen then the next endpoint in the sequence (i.e., key secondary endpoint) was tested within that Cipro regimen. Statistical testing would only continue to the next endpoint in the hierarchy if the preceding endpoint in the hierarchy showed significance. Endpoints which could not be statistically tested were considered to be exploratory. The hierarchical testing strategy is shown in Figure 2.
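The hierarchical (fixed-sequence) procedure described above can be sketched as follows. This is a minimal illustration, not the sponsor's analysis code: the function name, the endpoint labels, and the p-values are all hypothetical.

```python
def fixed_sequence_test(endpoints, alpha):
    """Test endpoints in their pre-specified order, each at the same alpha.

    endpoints: list of (name, p_value) tuples in hierarchical order.
    Returns a dict mapping each endpoint name to 'significant',
    'not significant', or 'exploratory' (never formally tested).
    """
    results = {}
    gate_open = True
    for name, p in endpoints:
        if not gate_open:
            # An earlier endpoint failed, so this one cannot be formally tested.
            results[name] = "exploratory"
        elif p <= alpha:
            results[name] = "significant"
        else:
            results[name] = "not significant"
            gate_open = False
    return results

# Hypothetical p-values for one regimen; endpoint names are placeholders.
sequence = [("primary", 0.010), ("key secondary", 0.200), ("other secondary", 0.001)]
print(fixed_sequence_test(sequence, alpha=0.025))
```

In this made-up example the last endpoint has a very small p-value but is still only exploratory, because the key secondary endpoint before it failed and closed the gate.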

Unfortunately, the hierarchical strategy did not work well: the majority of the secondary endpoints could not be tested because the results for the primary efficacy endpoint were not statistically significant. As mentioned in FDA's briefing book:
Under the pre-specified hierarchical strategy, confirmatory testing of the first secondary endpoint (frequency of exacerbations) against Pooled Placebo, and all subsequent endpoints, could not be performed for Cipro 28 (both trials) and for Cipro 14 (RESPIRE 2) because the respective findings for the primary endpoint of TFE were not significant. In RESPIRE 1, confirmatory testing of Cipro 14 could only be performed up to the first secondary endpoint (FOE) which failed to show significance. With the exception of a statistically significant finding observed for one comparison (i.e., Cipro 14 day vs. Pooled Placebo for the primary endpoint in RESPIRE 1), all other comparisons were considered to be exploratory or not statistically significant. As indicated in Figure 2 there was the potential for up to 32 comparisons to show statistical significance (8 endpoints in each of two Cipro arms across two trials).
The FDA advisory committee was not convinced by the evidence for the efficacy of ciprofloxacin DPI. Here is the voting result. FDA is unlikely to approve a product with such a voting result, even though there is currently no approved drug for treating non-cystic fibrosis bronchiectasis.

Had a different study design and a different method for multiplicity adjustment been used, the situation might have been very different. The evidence for the experimental drug might have been more obvious if a simpler study design had been used; at least this is the case for the 14 days on/off regimen versus placebo.

We are now closely watching the fate of Aradigm's NDA for ciprofloxacin in treating non-CF bronchiectasis. Aradigm's pivotal studies (Orbit 3 and Orbit 4) are simpler in study design, with one of the two studies positive. One thing is for sure: there will not be such complicated issues in dealing with multiplicity adjustment.


Saturday, November 25, 2017

Co-primary endpoints and multiple primary endpoints

In the recent FDA guidance 'Multiple Endpoints in Clinical Trials' and the EMA guidance 'Guideline on multiplicity issues in clinical trials', the terms 'co-primary endpoints' and 'multiple primary endpoints' are clarified.

Historically, the term 'co-primary endpoints' was used with different meanings in different clinical trial protocols, statistical analysis plans, and journal articles. In many cases, the term 'co-primary endpoints' was inappropriately used for what were really 'multiple primary endpoints'.

The term 'co-primary endpoints' should only be used when there is more than one primary endpoint and the study is declared a success only if all of the primary endpoints are statistically significant in favor of the experimental treatment. When co-primary endpoints are used, each primary endpoint is tested at a significance level of 0.05. No multiplicity adjustment is needed, because requiring every endpoint to be significant does not inflate the type I error rate.

In contrast, the term 'multiple primary endpoints' should be used if there is more than one primary endpoint and the study is declared a success if any one of the primary endpoints is statistically significant in favor of the experimental treatment. In this case, each primary endpoint is tested at a significance level determined by the method for multiplicity adjustment or simply by a partition of the alpha level.
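The difference between the two success rules can be sketched in a few lines. The function names and the p-values below are hypothetical and serve only to contrast the "all endpoints" rule with the "at least one endpoint" rule.

```python
def coprimary_success(p_values, alpha=0.05):
    """Co-primary endpoints: every endpoint must be significant at the full alpha."""
    return all(p <= alpha for p in p_values)

def multiple_primary_success(p_values, alphas):
    """Multiple primary endpoints: at least one endpoint must be significant
    at its own adjusted (or partitioned) alpha."""
    return any(p <= a for p, a in zip(p_values, alphas))

p = [0.030, 0.200]  # hypothetical p-values for two primary endpoints

print(coprimary_success(p))                          # False: second endpoint fails
print(multiple_primary_success(p, [0.025, 0.025]))   # False: Bonferroni split
print(multiple_primary_success(p, [0.04, 0.01]))     # True: 0.030 <= 0.04
```

The last two lines show that, with multiple primary endpoints, the conclusion can depend on how the alpha is divided, which is why the division has to be fixed in advance.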

Here is what EMA guidance 'guideline on multiplicity issues in clinical trials' says:
If more than one primary endpoint is used to define study success, this success could be defined by a  positive outcome in all endpoints or it may be considered sufficient, if one out of a number of endpoints has a positive outcome. Whereas in the first definition the primary endpoints are designated  as co-primary endpoints, the latter case is different and would require appropriate adjustment for multiplicity. More generally, in case of more than two primary endpoints, adjustment is needed if not all endpoints need to be significant to define study success, and the inability to exclude deteriorations in other primary endpoints would have to be considered in the overall benefit/risk assessment.
In FDA's guidance 'multiple endpoints in clinical trials', the term 'co-primary endpoints' was extensively discussed and the examples of co-primary endpoints were provided. In section C of the guidance, it says:
For some disorders, there are two or more different features that are so critically important to the disease under study that a drug will not be considered effective without demonstration of a treatment effect on all of these disease features. The term used in this guidance to describe this circumstance of multiple primary endpoints is co-primary endpoints. Multiple primary endpoints become co-primary endpoints when it is necessary to demonstrate an effect on each of the endpoints to conclude that a drug is effective.
The guidance provides the following examples of co-primary endpoints, where both co-primary endpoints need to be statistically significant in order to declare the trial a success:

  • A recent approach to studying treatments is to consider a drug effective for migraines only if pain and an individually-specified most bothersome second feature are both shown to be improved by the drug treatment. 
  • Drugs for Alzheimer’s disease have generally been expected to show an effect on both the defining feature of the disease, decreased cognitive function, and on some measure of the clinical impact of that effect. Because there is no single endpoint able to provide convincing evidence of both, co-primary endpoints are used. One primary endpoint is the effect on a measure of cognition in Alzheimer’s disease (e.g., the Alzheimer’s Disease Assessment Scale-Cognitive Component), and the second is the effect on a clinically interpretable measure of function, such as a clinician’s global assessment or an Activities of Daily Living Assessment.
In an article by Kantarjian et al “Decitabine improves patient outcomes in myelodysplastic syndromes: results of a phase III randomized study”, the term ‘coprimary endpoints’ was incorrectly used for ‘multiple primary endpoints’, even though the multiplicity adjustment method (Bonferroni correction) was appropriately applied.

The coprimary endpoints in the current study were ORR and time to AML transformation or death. Response was assessed according to the International Working group (IWG) criteria……Two analyses, one interim and one final, were planned using the stopping rules of O’Brien and Fleming. The overall type 1 error rate was maintained at a maximum of 5% by applying a Bonferroni correction for the coprimary endpoints at the final analysis. A maximum P value of .024 was required to establish statistical significance using a 2-sided analysis for either of the coprimary endpoints (ORR or time to AML or Death).

In an article by McLaughlin et al "Bosentan added to sildenafil therapy in patients with pulmonary arterial hypertension", the term 'co-primary endpoints' was used for a situation where 'multiple primary endpoints' should have been used. Note that the original protocol used a study design with two primary endpoints and a partition of the alpha level (0.04 for time to morbidity/mortality and 0.01 for change in 6MWD) as the approach for multiplicity adjustment.
The initial assumptions for the primary end-point were an annual rate of 21% on placebo with a risk reduced by 36% (hazard ratio (HR) 0.64) with bosentan and a negligible annual attrition rate. In addition, it was planned to conduct a single final analysis at 0.04 (two-sided), taking into account the existence of a co-primary end-point (change in 6MWD at 16 weeks) planned to be tested at 0.01 (two-sided). Over the course of the study, a number of amendments were introduced based on the evolution of knowledge in the field of PAH, as well as the rate of enrolment and blinded evaluation of the overall event rate. On implementation of an amendment in 2007, the 6MWD end-point was changed from a co-primary end-point to a secondary endpoint and the Type I error associated with the single remaining primary end-point was increased to 0.05 (two-sided).

Friday, November 03, 2017

SAD and MAD: Single Ascending Dose and Multiple Ascending Dose first-in-human studies

Acronyms are everywhere in clinical trials. Previously I mentioned that in the 21st Century Cures Act, the acronym RAT was used for the Regenerative Advanced Therapy designation; the term ‘RAT’ was criticized and was later changed to RMAT (Regenerative Medicine Advanced Therapy) in FDA’s implementation.

Now we have a pair of names SAD and MAD commonly used in early phase clinical trials. It does not mean anybody will be sad or mad. A sponsor should be happy (not SAD or MAD) when its development program can progress into the clinical trial stage.
SAD stands for single ascending dose and MAD stands for multiple ascending dose. SAD and MAD studies are typically the first-in-human (FIH) studies. They seek to gain information on safety and tolerability and on general pharmacokinetic (PK) and pharmacodynamic (PD) characteristics, and to identify the maximum tolerated dose (MTD). SAD/MAD studies can also be used to test cardiac safety and evaluate QT/QTc prolongation.

Many dose escalation studies are in effect SAD or MAD studies even though the SAD/MAD terms are not used. For example, the popular 3+3 design is one type of SAD/MAD study that focuses on safety and tolerability.

SAD/MAD studies are usually conducted in healthy volunteers in a clinical research unit (CRU) or phase I unit. But they can be conducted in patients when it is unethical to test the experimental drug (for example, oncology drugs and plasma-derived drugs) in healthy volunteers. SAD and MAD studies can be combined into one study within the same study protocol or conducted as two separate studies.

For SAD studies, the starting dose is based on the pre-clinical and animal studies. For MAD studies, the starting dose is usually based on results from the SAD study.

From the PK assessment standpoint, both SAD and MAD studies are conducted on a cohort basis, and subjects within each cohort receive the same dose level. In SAD studies, each subject receives a single dose, and serial PK samples are taken to evaluate the PK profile after a single dose. In MAD studies, each subject receives multiple doses; after steady state is achieved, serial PK samples are taken to evaluate the PK profile at steady state. With the PK results from SAD/MAD studies, dose linearity and dose proportionality can be evaluated.
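The post does not say how dose proportionality would be evaluated. One commonly used option is the power model, in which an exposure measure such as AUC is assumed to follow AUC = a × dose^b, so that a slope b close to 1 on the log-log scale suggests dose proportionality. The sketch below estimates that slope by ordinary least squares; the function name and the data are hypothetical.

```python
import math

def power_model_slope(doses, exposures):
    """Least-squares slope of log(exposure) regressed on log(dose)."""
    x = [math.log(d) for d in doses]
    y = [math.log(e) for e in exposures]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical cohort-level mean AUC values, for illustration only.
doses = [10, 20, 40, 80]
auc = [52, 101, 210, 395]
print(round(power_model_slope(doses, auc), 2))
```

In practice a confidence interval for the slope, rather than the point estimate alone, would usually be compared against pre-defined limits.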

From the safety assessment standpoint, in both SAD and MAD studies, the first cohort of subjects receives the lowest dose (the starting dose). Subjects are usually confined in a clinical research unit (CRU) with close safety monitoring. After each cohort, safety and tolerability are assessed to determine whether to proceed to the next cohort at a higher dose. The safety evaluation after each cohort is usually performed by an internal team within the sponsor, but it can certainly be performed by an independent committee such as a data and safety monitoring board (DSMB). With the safety data, the maximum tolerated dose (MTD) may be identified.

In SAD/MAD studies, within each cohort, placebo control can be added. Depending on whether there is a concurrent placebo control group, the SAD/MAD studies could have the following types.
  • SAD without placebo control
  • SAD with placebo control
  • MAD without placebo control
  • MAD with placebo control

When a placebo group is added to a SAD/MAD study, it is very common to use an n:1 randomization ratio within each cohort to avoid having too many subjects in the placebo group for the final analysis. For the final analysis, subjects in the placebo group across all cohorts are pooled together.
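To make the arithmetic concrete, the sketch below works out the per-cohort split and the pooled placebo count for an assumed design. The cohort size, the number of cohorts, and the 3:1 ratio are hypothetical choices, not values from any particular study.

```python
def cohort_allocation(cohort_size, ratio_active):
    """Split one cohort into (active, placebo) under ratio_active:1 randomization."""
    block = ratio_active + 1
    if cohort_size % block != 0:
        raise ValueError("cohort size must be a multiple of ratio_active + 1")
    n_placebo = cohort_size // block
    return cohort_size - n_placebo, n_placebo

# Hypothetical design: 5 dose cohorts of 8 subjects each, randomized 3:1.
n_cohorts, cohort_size, ratio = 5, 8, 3
active, placebo = cohort_allocation(cohort_size, ratio)

print(active, placebo)      # 6 2
print(placebo * n_cohorts)  # 10 pooled placebo subjects across all cohorts
```

With only 2 placebo subjects per cohort, the pooled placebo group of 10 is larger than any single active dose group of 6, which is the reason for pooling.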

Here are a couple of examples of SAD/MAD study designs. They are extracted from a presentation slide I made almost 20 years ago, but they are still relevant:

Further Reading/References: