Sunday, May 13, 2018

Grading the Severity of AEs and its Impact on AE Reporting


For all adverse events (AEs), including serious adverse events (SAEs), in clinical trials, severity (or intensity) should be assessed and recorded. AE severity used to be called AE intensity; nowadays, 'severity' is more commonly used. The assessment of severity is based on the investigator’s clinical judgement, so there is a lot of subjective judgement in AE severity assessment and reporting.

There seem to be three different grading scales for assessing and recording severity:

Mild, Moderate, and Severe
This is commonly used in non-oncology studies. The definitions of mild, moderate, and severe may differ from one study protocol to another. The severity (intensity) of each AE, including SAEs, recorded in the CRF should be assigned to one of the following categories:
  • Mild: An event that is easily tolerated by the subject, causing minimal discomfort and not interfering with everyday activities.
  • Moderate: An event that is sufficiently discomforting to interfere with normal everyday activities.
  • Severe: An event that prevents normal everyday activities.

 or 
  • Mild: awareness of sign or symptom, but easily tolerated
  • Moderate: discomfort sufficient to cause interference with normal activities
  • Severe: incapacitating, with inability to perform normal activities

CTCAE 
In oncology clinical trials, AE severity is usually graded according to NCI’s AE severity grading scale, the Common Terminology Criteria for Adverse Events (CTCAE). CTCAE can also be used to grade AEs in non-oncology studies, but it is generally not appropriate for studies in healthy volunteers.
  • Grade 1 Mild; asymptomatic or mild symptoms; clinical or diagnostic observations only; no intervention indicated
  • Grade 2 Moderate; minimal, local or noninvasive intervention indicated; limiting age-appropriate instrumental ADL
  • Grade 3 Severe or medically significant but not immediately life-threatening; hospitalization or prolongation of hospitalization indicated; disabling; limiting self care ADL
  • Grade 4 Life-threatening consequences; urgent intervention indicated.
  • Grade 5 Death related to AE.

Vaccine Trials
In FDA’s guidance on vaccine trials, “Toxicity Grading Scale for Healthy Adult and Adolescent Volunteers Enrolled in Preventive Vaccine Clinical Trials”, AE severity based on clinical abnormalities and laboratory abnormalities is graded as
  • Mild (grade 1)
  • Moderate (Grade 2)
  • Severe (Grade 3)
  • Potentially Life Threatening (Grade 4)

In statistical summaries, grade 1 is counted as ‘mild’, grade 2 as ‘moderate’, and grade 3 or higher as ‘severe’.
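
The collapsing rule above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the accepted grade range of 1 to 5 is an assumption (the vaccine scale stops at grade 4, while CTCAE goes to grade 5).

```python
def summary_severity(grade):
    """Collapse a toxicity grade into the category used in summary tables.

    Assumption: grades are integers 1-5; anything else is rejected.
    """
    if grade not in (1, 2, 3, 4, 5):
        raise ValueError(f"unexpected grade: {grade!r}")
    if grade == 1:
        return "mild"
    if grade == 2:
        return "moderate"
    return "severe"  # grade 3 or higher


# Grades 3 and 4 both fall into the 'severe' category.
print([summary_severity(g) for g in (1, 2, 3, 4)])
```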

During the course of an adverse event, the severity may change, which may affect how we report the adverse event.

In one of the previous posts, ‘SAE Reconciliation and Determining / recording the SAE Onset Date’, we discussed that an AE with a change in seriousness might need to be split into two events for recording: one non-serious AE with the onset date of the first sign/symptom and one serious AE with the onset date on which the event met one of the SAE criteria. A similar issue arises when we record an AE whose severity changes.

The most common instruction for AE recording is that when there is severity change, a new AE should be recorded. Here are some example instructions:
Start Date
Record the date the adverse event started. The date should be recorded to the level of granularity known (e.g., year, year and month, complete date) and in the specified format. If a previously recorded AE worsens, a new record should be created with a new start date. There should be no AE start date prior to the date of the informed consent. Any AE that started prior to the informed consent date belongs instead in the medical history. If an item recorded on the medical history worsens during the study, the date of the worsening is entered as an AE with the start date as the date the condition worsened.
End Date
Record the date the adverse event stopped or worsened.  The date should be recorded to the level of granularity known (e.g., year, year and month, complete date) and in the specified format.  If an AE worsens, record an end date and create a new AE record with a new start date and severity. 
If the AE increases in severity per the DAIDS Grading Table, a new AE Log CRF should be completed to document this change in severity.
An example from eCRF Completion Guidelines for adverse events: Enter a new event if action taken, seriousness, causality, severity (intensity), etc. changes over the course of an adverse event. A timestamp for any changes in events can be seen in the data via event start/stop dates.

However, recording adverse events this way may split a single event into multiple adverse events and may result in over-reporting of the number of adverse events.

Suppose a subject experienced a headache that started with mild intensity, progressed to moderate, and then went back to mild. Should this headache be reported as three separate adverse events (two with mild severity and one with moderate severity), or should it be reported as a single event with moderate severity?

This question was submitted to FDA, and the FDA response (see the link below) suggested that this should be reported as one event (with the maximum severity).


The second question and answer explicitly stated:

Question 2:


[Redacted] is the sponsor of the study. We have been advised by our data coordinating center to record an AE that changes in severity as two AEs instead of 1 AE - starting a new AE each time the severity changes. This convention is different than that of our previous coordinating center and has caused us great concern.

Answer 2:

We have concerns that an approach to adverse event reporting as you described below (i.e., a change in severity of an adverse event necessitates a new adverse event report) may inaccurately reflect the adverse event profile for the product. Therefore, we strongly recommend that you contact the FDA review division regulating this clinical investigation for additional input on the most scientifically and medically sound approach to the adverse event reporting specifically for this trial.


I recently submitted this same question to FDA’s OC GCP Questions and Answers and got the following response:

Question: 
We constantly run into the issue how to record the adverse event in the database in the situation there is a severity change or seriousness change during the course of the adverse event.
 A subject in clinical trial reported a mild headache. Two days later, the headache became moderate in severity. Then headache became mild in severity again.
 In this case, shall we record this as one headache event with moderate severity or record as three headache events (a new event is record whenever there is a severity change)?
 Similarly, a subject in clinical trial reported a non-serious adverse event. Several days later, subject needs to be hospitalized for this adverse event – now the event meets the seriousness criteria.
 In a situation of a non-serious adverse event becoming serious, shall we record it as a single AE with seriousness or shall we record as two separate AEs (one non-serious AE and one serious AE)?

OC-GCP Response:
Given your brief description that the subject's headache is ongoing, it would seem that this adverse event would best be reported as a single event with variable severity. However, the clinical judgment of the principal investigator (or, if the principal investigator is not a clinician, then a physician consultant to the research) would be helpful in clarifying the symptoms and hence the reporting of the adverse event(s). There are several cogent clinical scenarios the understanding of which would require more information than you have supplied. For example, the subject's symptomatology could represent an unremitting headache of several days duration or episodic headaches of finite duration with varying intensities or a symptom of another event altogether such as a change in blood pressure, etc. The same would apply for the hospitalization event.
 To best sort out the adverse event(s) itself and therefore the appropriate reporting, I would recommend a clinical assessment of the headache. In addition, the protocol may have detailed how adverse events should be reported. As well, the sponsor (I'm not sure of [Redacted] status in this trial, i.e., is/is not the sponsor) may have specifications for adverse event reporting that could guide you. If you still feel uncertain, I would strongly recommend contacting the FDA review division regulating this trial.
 Lastly, if it becomes apparent that this same "fact pattern" recurs, it may be advisable for the sponsor to clearly articulate standards for adverse event reporting such that there can be consistency in reporting of headaches.
From the statistical analysis standpoint, whether an event is recorded as one event with the maximum severity or as multiple events with various severities has no impact on our calculation of the incidence of AEs (the number of subjects with at least one event). However, it has a great impact on the calculation of the number of AEs.
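
To make the difference concrete, here is a minimal Python sketch of the headache example recorded under both conventions. The record layout, subject ID, and function name are hypothetical and only serve the illustration.

```python
SEVERITY_ORDER = {"mild": 1, "moderate": 2, "severe": 3}


def summarize(records):
    """Return (subjects with >= 1 AE, number of AE records, worst severity per subject)."""
    worst = {}
    for rec in records:
        current = worst.get(rec["subject"])
        if current is None or SEVERITY_ORDER[rec["severity"]] > SEVERITY_ORDER[current]:
            worst[rec["subject"]] = rec["severity"]
    return len(worst), len(records), worst


# Convention A: a new record each time the severity changes (hypothetical data).
split_records = [
    {"subject": "001", "term": "Headache", "severity": "mild"},
    {"subject": "001", "term": "Headache", "severity": "moderate"},
    {"subject": "001", "term": "Headache", "severity": "mild"},
]
# Convention B: one record carrying the maximum severity.
single_record = [
    {"subject": "001", "term": "Headache", "severity": "moderate"},
]

print(summarize(split_records))   # incidence 1, event count 3
print(summarize(single_record))   # incidence 1, event count 1
```

Under both conventions the subject counts once toward incidence and the worst severity is moderate; only the event count differs.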

It is commonly understood that if a condition recorded in the medical history worsens during the study or after the initiation of the study drug, a new AE should be recorded, with the date of the worsening entered as the new AE onset date.

Sunday, April 08, 2018

Clinical Trials Using Historical Control in Rare Disease Drug Development


While the randomized, controlled clinical trial has become and remains the gold standard in drug development, we also see increased use of non-randomized, single-arm studies in which the effectiveness of the test drug is compared with a historical control.

When a pivotal study is a single arm and has no concurrent control, the results from the study will need to be compared to the historical control or a common standard that has been accepted by the medical community or regulatory agencies. This seems to be more common in the oncology area and in rare disease areas.

I do not have specific statistics, but my impression is that historical control seems to be more accepted by the US FDA in its approvals in oncology and rare disease areas. Here are three examples of recent drug approvals based on historical control:

Venetoclax in Relapsed / Refractory Chronic Lymphocytic Leukemia (CLL)

The pivotal study is a single-arm study without concurrent control and the primary efficacy endpoint is objective response rate (ORR). The result is compared with the historical / standard rate of 40%.
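
One common way to compare a single-arm response rate with a fixed historical rate is a one-sided exact binomial test. The Python sketch below illustrates that idea only; it is not taken from the venetoclax submission, and the sample size and responder count are hypothetical.

```python
from math import comb


def exact_binomial_p_value(responders, n, p0):
    """One-sided exact p-value: P(X >= responders) for X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(responders, n + 1)
    )


# Hypothetical single-arm result: 30 responders out of 50 subjects,
# tested against the historical / standard ORR of 40%.
p_value = exact_binomial_p_value(responders=30, n=50, p0=0.40)
print(f"observed ORR = {30 / 50:.0%}, one-sided exact p = {p_value:.4f}")
```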

Eteplirsen in Duchenne Muscular Dystrophy (DMD)

The original study was a randomized, double-blind, placebo-controlled study with three arms (eteplirsen 30 mg/kg weekly, eteplirsen 50 mg/kg weekly, and placebo), with 4 subjects in each arm for a total of 12 subjects. All subjects, including placebo subjects, were rolled over to an open-label extension study for long-term assessment.

The results from the double-blind portion of the study did not provide strong evidence of efficacy. The sponsor conducted a post-hoc comparison with a historical control.

FDA was not convinced of eteplirsen’s efficacy and convened an advisory committee. In the end, eteplirsen was approved as the first treatment for Duchenne Muscular Dystrophy amid a lot of controversy. The comparison with historical control (while hotly debated) was a big part of the evidence contributing to the approval.

Brineura for Batten Disease

The entire clinical program included: 
  • A natural history study with 69 subjects (42 evaluable)
  • A Phase 1/2 FIM single-arm study with 24 subjects (23 completed)
  • A long-term follow-up study with 23 subjects

The natural history study was based on registry data and provided the basis for the historical control group.
The comparability between the natural history study and the Phase 1/2 study was extensively debated during the review process.

FDA finally approved Brineura for treating Batten disease in 2017.

The use of historical control is not new. It is discussed in the ICH guideline E10 “CHOICE OF CONTROL GROUP AND RELATED ISSUES IN CLINICAL TRIALS”.
Historical control is mentioned again in FDA’s guidance for industry “Rare Diseases: Common Issues in Drug Development”. FDA encourages natural history studies as a way to establish the historical control.

During the FDA advisory committee meeting, Dr Temple gave a presentation about "Historically Controlled Trials": see FDA's presentation slides (Dr Temple's presentation started from page 20).

In FDA's statistical review of eteplirsen in DMD, the following comments were made on the use of historical control: 
Historical Control Cohort
The comparison of eteplirsen with historical controls was not part of an adequate and well-controlled study. The applicant obtained historical data after observations were made for the eteplirsen patients. Historical data were obtained from 2 DMD patient registries (Italian DMD Registry and the Leuven Neuromuscular Reference Center – NMRC) for comparison to eteplirsen-treated patients.
……
According to the ICH E10 guidance on Control Group and Related Issues in Clinical Trials, the major and well-recognized limitation of externally controlled (including historical control) trials is inability to control bias. The test group and control group can be dissimilar with respect to a wide range of observable and unobservable factors that could affect outcome. It may be possible to match the historical control group to the test group in observed factors but there is no assurance for any unobserved factors. “The lack of randomization and blinding, and the resultant problems with lack of assurance of comparability of test group and control group, make the possibility of substantial bias inherent in this design and impossible to quantitate.”
 Because of the serious concern about the inability to control bias, the use of the external control design is restricted only to unusual circumstances.
  • ICH E10 states that “an externally controlled trial should generally be considered only when prior belief in the superiority of the test therapy to all available alternatives is so strong that alternative designs appear unacceptable…” However, such prior belief does not exist for eteplirsen.
  • ICH E10 states that “use of external controls should be limited to cases in which the endpoints are objective…” however, performance on the 6-minute walk test can be influenced by motivation. Patients may not achieve maximal 6MWT due to concerns of falling or injury, or patients could try harder with encouragement and with the expectation that the drug might be effective.
  • Pocock’s criteria for acceptability of a historical control group require that “the methods of treatment evaluation must be the same,” and “the previous study must have been performed in the same organization with largely the same clinical investigators.” This is especially important when assessing endpoints such as 6MWT, in contrast to hard endpoints such as mortality. For this NDA, these requirements are not met.
Moreover, the historical control group was identified post-hoc in this NDA, leading to potential selection bias that cannot be quantified. If a historical control is to be utilized, selection of the control group and matching on selection criteria should be prospectively planned without knowing the outcome of the drug group and control group.
 Based on ICH E10, “a consequence of the recognized inability to control bias is that the potential persuasiveness of findings from externally controlled trials depends on obtaining much more extreme levels of statistical significance and much larger estimated differences between treatments than would be considered necessary in concurrently controlled trials.” The success criteria for this historical control study were not discussed or pre-specified in the protocol.
 Given all these concerns, including issues of comparability of eteplirsen-treated patients and historical control cohort patients, the fact that 6MWT is not a “hard” efficacy endpoint, the potential of selection bias due to the post-hoc identification of the control cohort by the applicant, and all the known pitfalls with the use of historical controls, the comparison of the eteplirsen with historical control is not statistically interpretable. 


However, even though the statistical review of the use of historical control was very negative, eteplirsen was still approved as the first drug for treating DMD, and presumably the results from the comparison with historical control played a pivotal role in the decision.

In general, the use of historical control can be accepted in some situations, especially in rare disease areas where no approved drug is available. Whenever a historical control is used, the following factors need to be considered:
  • Whether the proposed historical control cohort is specified a priori or post hoc. Collecting historical control information from natural history studies is encouraged.
  • Whether the patient population of the historical control is comparable
  • Whether the outcome measurement is comparable
  • Whether the outcome measurement is a hard endpoint (such as death) or a soft endpoint (such as 6MWD)
  • Whether the endpoint measure is easily affected by other factors
  • If the endpoint is a soft endpoint (such as objective response rate, ORR), whether any approach is implemented to avoid bias (such as using a central reader)

Saturday, April 07, 2018

Generating graph / figure in publication quality

As I was recently preparing a poster for the ATS International Conference, I was told that the plots I provided were not of high quality. When placed on the poster, they became blurry. I realized that the issue was with the DPI.

DPI (dots per inch) describes the resolution of a digital image or a hard copy print as the number of dots per inch. High DPI = High Resolution.

A journal may set a minimum resolution for graphs and figures. For example, PLOS ONE requires a resolution in the range of 300-600 DPI. Science magazine has the following requirement:
Resolution. For manuscripts in the revision stage, adequate figure resolution is essential to a high-quality print and online rendering of your paper. Raster line art should have a minimum resolution of 600 dots per inch (dpi) and, preferably, should have a resolution of 1200 dpi. Grayscale and color artwork should have a minimum resolution of 400 dpi, and a higher resolution if possible.
Wiley published an editorial discussing the challenges authors may face in providing high-resolution figures. See the editorial: How to meet dots per inch requirements for images

I used the SAS procedure SGPLOT to create the plots. The default DPI is 100, which is too low for a publication or poster. Fortunately, there are easy ways to change the DPI of the output plots. Below are some programs for doing so:

* default DPI=100 (low resolution);
ods listing gpath='c:\Temp\';
ods graphics on;
proc sgplot data=sashelp.stocks (where=(date >= "01jan2000"d
                                 and date <= "01jan2001"d
                                 and stock = "IBM"));
   title "Stock Volume vs. Close";
   vbar date / response=volume;
   vline date / response=close y2axis;
run;
ods graphics off;
ods listing close;


*set DPI=400;
ods listing gpath="c:\Temp\" dpi=400;
ods graphics on;
proc sgplot data=sashelp.stocks (where=(date >= "01jan2000"d
                                 and date <= "01jan2001"d
                                 and stock = "IBM"));
   title "Stock Volume vs. Close";
   vbar date / response=volume;
   vline date / response=close y2axis;
run;
ods graphics off;
ods listing close; 

*set DPI =400 and also use style=journal;
ods listing gpath="c:\Temp\" style=journal dpi=400;
ods graphics on;
proc sgplot data=sashelp.stocks (where=(date >= "01jan2000"d
                                 and date <= "01jan2001"d
                                 and stock = "IBM"));
   title "Stock Volume vs. Close";
   vbar date / response=volume;
   vline date / response=close y2axis;
run;
ods graphics off;

ods listing close;

When a high DPI is chosen, the file size increases. For the same plot, the file sizes of the three plots above (in PNG format) are 26, 160, and 158 KB.
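
The arithmetic behind both the blurry poster and the larger files is simple: pixels = inches × DPI. The Python sketch below illustrates it. The 6.4 in × 4.8 in plot size and the 12 in printed width are assumptions chosen for the example, not values taken from the programs above.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel size of an image rendered at a given physical size and DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def effective_dpi(pixels_wide, printed_width_in):
    """Resolution actually achieved when an image is stretched to a printed width."""
    return pixels_wide / printed_width_in


# Assumed plot size: 6.4 in x 4.8 in.
for dpi in (100, 400, 600):
    w, h = pixel_dimensions(6.4, 4.8, dpi)
    print(f"{dpi} DPI -> {w} x {h} pixels ({w * h:,} pixels in total)")

# Stretching the 100-DPI image to 12 inches on a poster (assumed width)
# leaves far fewer than 300 dots per inch, hence the blurriness.
print(f"effective DPI on poster: {effective_dpi(640, 12):.1f}")
```

The total pixel count grows with the square of the DPI, which is why the output files get larger, although PNG compression keeps the file size from growing at the same rate.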

The following programs serve the same purpose but use the ods pdf statement.

ods pdf file="c:\Temp\test.pdf";
proc sgplot data=sashelp.stocks (where=(date >= "01jan2000"d
                                 and date <= "01jan2001"d
                                 and stock = "IBM"));
   title "Stock Volume vs. Close";
   vbar date / response=volume;
   vline date / response=close y2axis;
run;
ods pdf close;


ods pdf file="c:\temp\test.pdf" dpi=600;
proc sgplot data=sashelp.stocks (where=(date >= "01jan2000"d
                                 and date <= "01jan2001"d
                                 and stock = "IBM"));
   title "Stock Volume vs. Close";
   vbar date / response=volume;
   vline date / response=close y2axis;
run;
ods pdf close;


ods pdf file="c:\Temp\test.pdf" style=journal dpi=600;
proc sgplot data=sashelp.stocks (where=(date >= "01jan2000"d
                                 and date <= "01jan2001"d
                                 and stock = "IBM"));
   title "Stock Volume vs. Close";
   vbar date / response=volume;
   vline date / response=close y2axis;
run;
ods pdf close;

The same issue with graph quality arises in R. Graphics device functions such as png() have a res= argument to set the desired DPI. Please see the blog post by Daniel Hocking, "High Resolution Figures in R".

Monday, March 05, 2018

Handling Randomization Errors in Clinical Trials with Stratified Randomization

Stratified randomization is very common in randomized, controlled clinical trials. The use of stratified randomization has been discussed in previous posts.
While stratified randomization has its benefits, more stratification factors are not necessarily better. The more stratification factors we have, the more easily the randomization error of using a wrong stratum can occur.

It has become common to use an interactive response technology (IRT) system, such as an interactive voice response (IVR) or interactive web response (IWR) system, to implement the randomization and treatment assignments. The IRT system usually goes through extensive quality control (QC) and user acceptance testing (UAT) before implementation, so randomization errors can be minimized. Compared with a manual randomization process, the randomization error rate is lower in studies that use an IRT system.

However, the use of an IRT system requires the investigational site staff (pharmacist, investigator, or study coordinator) to enter the stratification information at the time of randomization. If the site staff enter incorrect stratification information into the IRT system, the treatment assignment will be pulled from the wrong stratum. The randomization error due to choosing a wrong stratum is probably the most common randomization error we see in clinical trials with stratified randomization. The more stratification factors we have, the more likely it is that an incorrect stratum is chosen.

In addition to the number of stratification factors, ambiguous description / definition of the randomization stratum and lack of clarity about source of measurement (for example, the local lab or central lab results for a lab related stratification factor) can all contribute to choosing an incorrect stratum for randomization. 

For example, in a clinical trial in the neurology area, the sponsor planned to stratify patients by their use of cholinesterase inhibitors, corticosteroids, and immunosuppressants/immunomodulators. The following stratification factor was constructed:
  • Regimen includes only cholinesterase inhibitors
  • Regimen includes corticosteroid (CS) as the only immunosuppressant/immunomodulator, alone or in combination with other MG medications (e.g., a subject on prednisone plus a cholinesterase inhibitor would be in this stratum)
Without appropriate training, it is likely that the site staff will choose a wrong category for the randomization.

It is also common that the stratification factor is based on one of the laboratory measures. The original laboratory measure is a continuous result and it is then categorized for the stratification purpose. In this case, the protocol must be clear whether or not the stratification will be based on the lab results from the local lab or central lab because the results from local versus central labs can be different. 

When a wrong stratification stratum is chosen for the randomization (the randomization error occurs), the natural reaction is trying to fix it. However, with the IRT system, it is not easy to go back to the system to fix the randomization error. Actually it is strongly encouraged not to try to fix the issue. 

"...the safest option is to accept the randomisation errors that do occur and leave the initial randomisation records unchanged. This approach is consistent with the ITT principle, since it enables participants to be analysed as randomised, and avoids further problems that can arise when attempts are made to correct randomisation errors. A potential disadvantage of accepting randomisation errors is that imbalance could be introduced between the randomised groups in the number of participants or their baseline characteristics. However, any imbalance due to randomisation errors is expected to be minimal unless errors are common. Imbalance can be monitored by an independent data monitoring committee during the trial and investigated by the trial statistician at the analysis stage."
It is true that randomization errors can skew the analyses, especially when they occur frequently. In a paper by Ke et al., "On Errors in Stratified Randomization", the impact of randomization errors on treatment balance and on the properties of analysis approaches was evaluated.

If there are a lot of randomization errors, the study quality and integrity will be questioned. From the statistical analysis standpoint, a strict intention-to-treat analysis may not be appropriate. With a significant number of randomization errors involving incorrect treatment assignment, we may need to analyze the data 'as treated' instead of 'as randomized'. With a significant number of randomization errors due to incorrect selection of the randomization stratum, we may need to take the stratum information from the clinical database (assuming it is correctly recorded) instead of from the information entered in the IRT system.
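
A simple way to find out how many subjects were randomized under the wrong stratum is to compare the stratum entered in the IRT system with the stratum derived from the clinical database. The Python sketch below shows the idea; the subject IDs, stratum labels, and function name are hypothetical.

```python
def find_stratum_mismatches(irt_strata, db_strata):
    """List (subject, IRT stratum, database stratum) where the two sources disagree.

    Subjects missing from the clinical database are reported with None.
    """
    mismatches = []
    for subject, irt_value in sorted(irt_strata.items()):
        db_value = db_strata.get(subject)
        if db_value != irt_value:
            mismatches.append((subject, irt_value, db_value))
    return mismatches


# Hypothetical data: stratum entered at randomization vs. derived from the database.
irt = {"1001": "ChEI only", "1002": "CS", "1003": "ChEI only"}
database = {"1001": "ChEI only", "1002": "CS", "1003": "CS"}

errors = find_stratum_mismatches(irt, database)
print(errors)
print(f"mis-stratification rate: {len(errors) / len(irt):.1%}")
```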

When randomization errors are identified during a study, the root cause of the error should be investigated. Additional training may be needed to prevent the further occurrence of the randomization error. 


Thursday, February 22, 2018

New FDA Guidance Documents for Drug Development in Neurological Conditions Aiming to Ease the Drug Approval Paths


This month, FDA issued five guidance documents for drug development in five different neurological conditions/diseases (Alzheimer’s disease, DMD, ALS, migraine, and pediatric epilepsy). These newly issued guidance documents are intended to ease the drug approval requirements or offer clarity on the drug development pathway.

We think that this is a general trend at FDA, and we expect that similar guidance documents will be issued for other conditions/diseases, aiming to ease the requirements for drug development and eventually to speed up the drug development process and make innovative drugs available to patients.
“Today I’m pleased to issue five guidance documents that benefited from the streamlined approach of this pilot as part of a broader, programmatic focus on advancing treatments for neurological disorders that aren’t adequately addressed by available therapies. These guidance documents provide details on how researchers can best approach drug development for certain neurological conditions – Duchenne muscular dystrophy (DMD) and closely related conditions, migraine, epilepsy, AD and ALS. These guidance documents provide our current thinking and sound regulatory and scientific advice for product developers so that safe and effective treatments can ultimately be made available to patients. These documents are each a culmination of thoughtful scientific collaboration within the agency and incorporate important input from patients, researchers and advocates. We hope that providing up-to-date, clear information about our scientific expectations, such as clinical trial design and ways to measure effectiveness, will save companies time and resources and ultimately, bring effective new medicines to patients more efficiently.”
Below is a table to summarize the key points from these five guidance documents:
Indication
Guidance Title
Key Points
Alzheimer's disease
  • No longer requiring co-primary efficacy endpoints to show the benefit in both cognitive and functional (or global) measures
  • Staging AD into four different stages and accepting different endpoints for different stages
  • Allowing biomarker effects to be the primary endpoint in patients with Alzheimer pathology but no current symptoms

Duchenne muscular dystrophy (DMD) and related conditions
  • Emphasizing the difficulties in designing trials of drugs for these conditions
  • Leaving the choice of efficacy endpoints to individual study sponsors to discuss with FDA staff on a case-by-case basis
  • What the DMD guidance does not do is open a path for approval based solely on biomarker effects such as dystrophin levels in muscle, although effects on objective measures such as respiratory and cardiac muscle function can be used to support approval.

Amyotrophic lateral sclerosis (ALS)
  • Offering more clarity
  • Efficacy must be demonstrated at "clinically meaningful" levels for symptoms, function, or survival -- period

Migraine
  • Sponsors would no longer be required to conduct trials addressing four different classes of symptoms: pain, nausea, photophobia, and phonophobia.
  • Trials will only need two primary endpoints: pain reduction and effects on individual patients' "most bothersome symptom."

Pediatric epilepsy

  • For drugs intended for children age 4 and older with partial onset seizures, the FDA will no longer require that efficacy trials be conducted in children. The agency will now consider efficacy data from adult patients to be sufficient for pediatric approval.



Monday, February 12, 2018

Weighted Bonferroni Method (or partition of alpha) in Clinical Trials with Multiple Endpoints


In a previous post, the terms ‘multiple endpoints’ and ‘co-primary endpoints’ were discussed. If a study has two co-primary efficacy endpoints, the study is claimed to be successful only if both endpoints are statistically significant at alpha=0.05 (no adjustment for multiplicity is necessary). If a study has multiple (say, two) primary efficacy endpoints, the study is claimed to be successful if either endpoint is statistically significant. However, in the latter situation, an adjustment for multiplicity is necessary to maintain the overall alpha at 0.05. In other words, the significance level for the hypothesis test of each individual endpoint is less than 0.05.

The simplest and most straightforward approach is to apply the Bonferroni correction. The Bonferroni correction compensates for the increase in the number of hypothesis tests: each individual hypothesis is tested at a significance level of alpha/m, where alpha is the desired overall alpha level (usually 0.05) and m is the number of hypotheses. If there are two hypothesis tests (m=2), each individual hypothesis will be tested at alpha=0.025.

In the FDA guidance 'Multiple Endpoints in Clinical Trials', the Bonferroni method is described as follows:
The Bonferroni method is a single-step procedure that is commonly used, perhaps because of its simplicity and broad applicability. It is a conservative test and a finding that survives a Bonferroni adjustment is a credible trial outcome. The drug is considered to have shown effects for each endpoint that succeeds on this test. The Holm and Hochberg methods are more powerful than the Bonferroni method for primary endpoints and are therefore preferable in many cases. However, for reasons detailed in sections IV.C.2-3, sponsors may still wish to use the Bonferroni method for primary endpoints in order to maximize power for secondary endpoints or because the assumptions of the Hochberg method are not justified. The most common form of the Bonferroni method divides the available total alpha (typically 0.05) equally among the chosen endpoints. The method then concludes that a treatment effect is significant at the alpha level for each one of the m endpoints for which the endpoint’s p-value is less than α /m. Thus, with two endpoints, the critical alpha for each endpoint is 0.025, with four endpoints it is 0.0125, and so on. Therefore, if a trial with four endpoints produces two-sided p values of 0.012, 0.026, 0.016, and 0.055 for its four primary endpoints, the Bonferroni method would compare each of these p-values to the divided alpha of 0.0125. The method would conclude that there was a significant treatment effect at level 0.05 for only the first endpoint, because only the first endpoint has a p-value of less than 0.0125 (0.012). If two of the p-values were below 0.0125, then the drug would be considered to have demonstrated effectiveness on both of the specific health effects evaluated by the two endpoints. The Bonferroni method tends to be conservative for the study overall Type I error rate if the endpoints are positively correlated, especially when there are a large number of positively correlated endpoints. 
Consider a case in which all of three endpoints give nominal p-values between 0.025 and 0.05, i.e., all ‘significant’ at the 0.05 level but none significant under the Bonferroni method. Such an outcome seems intuitively to show effectiveness on all three endpoints, but each would fail the Bonferroni test. When there are more than two endpoints with, for example, correlation of 0.6 to 0.8 between them, the true family-wise Type I error rate may decrease from 0.05 to approximately 0.04 to 0.03, respectively, with negative impact on the Type II error rate. Because it is difficult to know the true correlation structure among different endpoints (not simply the observed correlations within the dataset of the particular study), it is generally not possible to statistically adjust (relax) the Type I error rate for such correlations. When a multiple-arm study design is used (e.g., with several dose-level groups), there are methods that take into account the correlation arising from comparing each treatment group to a common control group.
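The two cases described in the guidance can be checked with a few lines of code. This is only a minimal sketch: the function name is hypothetical, the first set of p-values is the four-endpoint example from the guidance, and the second set is made up to fall between 0.025 and 0.05.

```python
def bonferroni_significant(p_values, overall_alpha=0.05):
    """Flag each endpoint whose p-value is below overall_alpha / m."""
    threshold = overall_alpha / len(p_values)
    return [p < threshold for p in p_values]


# Four-endpoint example from the guidance: threshold is 0.05 / 4 = 0.0125
guidance_example = [0.012, 0.026, 0.016, 0.055]
print(bonferroni_significant(guidance_example))

# Hypothetical three-endpoint case, all nominal p-values between 0.025 and 0.05:
# threshold is 0.05 / 3, so none of the endpoints passes
three_endpoints = [0.030, 0.040, 0.045]
print(bonferroni_significant(three_endpoints))
```

Only the first endpoint is flagged in the four-endpoint example, and none is flagged in the three-endpoint case, even though every p-value there is below 0.05.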
The guidance also discussed the weighted Bonferroni approach:
The Bonferroni test can also be performed with different weights assigned to endpoints, with the sum of the relative weights equal to 1.0 (e.g., 0.4, 0.1, 0.3, and 0.2, for four endpoints). These weights are prespecified in the design of the trial, taking into consideration the clinical importance of the endpoints, the likelihood of success, or other factors. There are two ways to perform the weighted Bonferroni test:  
  • The unequally weighted Bonferroni method is often applied by dividing the overall alpha (e.g., 0.05) into unequal portions, prospectively assigning a specific amount of alpha to each endpoint by multiplying the overall alpha by the assigned weight factor. The sum of the endpoint-specific alphas will always be the overall alpha, and each endpoint’s calculated p-value is compared to the assigned endpoint-specific alpha.
  • An alternative approach is to adjust the raw calculated p-value for each endpoint by the fractional weight assigned to it (i.e., divide each raw p-value by the endpoint’s weight factor), and then compare the adjusted p-values to the overall alpha of 0.05.
These two approaches are equivalent.
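The equivalence of the two approaches can be seen in a small sketch. The weights are the ones given as an example in the guidance (0.4, 0.1, 0.3, and 0.2); the p-values and the function names are hypothetical and used only for illustration.

```python
import math


def check_weights(weights):
    """Weights must be positive and sum to 1.0."""
    if any(w <= 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be positive and sum to 1.0")


def weighted_by_alpha(p_values, weights, overall_alpha=0.05):
    """Approach 1: compare each raw p-value with its endpoint-specific alpha."""
    check_weights(weights)
    return [p < w * overall_alpha for p, w in zip(p_values, weights)]


def weighted_by_pvalue(p_values, weights, overall_alpha=0.05):
    """Approach 2: divide each raw p-value by its weight, then compare with the overall alpha."""
    check_weights(weights)
    return [p / w < overall_alpha for p, w in zip(p_values, weights)]


weights = [0.4, 0.1, 0.3, 0.2]           # example weights from the guidance
p_values = [0.012, 0.026, 0.016, 0.055]  # hypothetical raw p-values

print(weighted_by_alpha(p_values, weights))   # endpoint alphas: 0.02, 0.005, 0.015, 0.01
print(weighted_by_pvalue(p_values, weights))  # adjusted p-values: 0.03, 0.26, ~0.053, 0.275
```

Both calls flag only the first endpoint, since p < w x alpha and p / w < alpha are the same inequality rearranged.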

The guidance mentioned that the reasons for using the weighted Bonferroni test include:
  • Clinical importance of the endpoints
  • The likelihood of success
  • Other factors
Other factors could include:
  • With two primary efficacy endpoints, the expectation for regulatory approval is greater for one endpoint than for the other
  • Sample size calculation indicates that the sample size sufficient for primary efficacy endpoint #1 is larger than needed for primary efficacy endpoint #2
With the weighted Bonferroni correction, the weights are subjective and essentially arbitrarily selected, which results in a partition into unequal significance levels (alphas) for the different endpoints.

There are a lot of applications of Bonferroni and weighted Bonferroni in practice. Here are some examples: 
In the publication Antonia 2017 "Durvalumab after Chemoradiotherapy in Stage III Non–Small-Cell Lung Cancer", two coprimary end points were used in the study:
The study was to be considered positive if either of the two coprimary end points, progression free or overall survival, was significantly longer with durvalumab than with placebo. Approximately 702 patients were needed for 2:1 randomization to obtain 458 progression-free survival events for the primary analysis of progressionfree survival and 491 overall survival events for the primary analysis of overall survival. It was estimated that the study would have a 95% or greater power to detect a hazard ratio for disease progression or death of 0.67 and a 85% or greater power to detect a hazard ratio for death of 0.73, on the basis of a log-rank test with a two-sided significance level of 2.5% for each coprimary end point.
However, in the original study protocol, the weighted Bonferroni method was used and unequal alpha levels were assigned to OS and PFS.  
The two co-primary endpoints of this study are OS and PFS. The control for type-I error, a significance level of 4.5% will be used for analysis of OS and a significance level of 0.5% will be used for analysis of PFS. The study will be considered positive (a success) if either the PFS analysis results and/or the OS analysis results are statistically significant.
Here, a weight of 0.9 (resulting in an alpha 0.9 x 0.05 = 0.045) was given to OS and a weight of 0.1 (resulting in an alpha 0.1 x 0.05 = 0.005) was given to PFS.

In the COMPASS-2 study (Bosentan added to sildenafil therapy in patients with pulmonary arterial hypertension), the original protocol contained two primary efficacy endpoints, and the weighted Bonferroni method (even though it was not explicitly mentioned in the publication) was used for multiplicity adjustment. A weight of 0.8 (resulting in an alpha of 0.8 x 0.05 = 0.04) was given to time to first mortality/morbidity event and a weight of 0.2 (resulting in an alpha of 0.2 x 0.05 = 0.01) was given to the change from baseline to Week 16 in 6MWD.
The initial assumptions for the primary end-point were an annual rate of 21% on placebo with a risk reduced by 36% (hazard ratio (HR) 0.64) with bosentan and a negligible annual attrition rate. In addition, it was planned to conduct a single final analysis at 0.04 (two-sided), taking into account the existence of a co-primary end-point (change in 6MWD at 16 weeks) planned to be tested at 0.01 (two-sided). Over the course of the study, a number of amendments were introduced based on the evolution of knowledge in the field of PAHs, as well as the rate of enrolment and blinded evaluation of the overall event rate. On implementation of an amendment in 2007, the 6MWD end-point was change from a co-primary end-point to a secondary endpoint and the Type I error associated with the single remaining primary end-point was increased to 0.05 (two-sided).
According to the FDA’s briefing book on "Ciprofloxacin Dry Powder for Inhalation (DPI), Meeting of the Antimicrobial Drugs Advisory Committee (AMDAC)", the sponsor (Bayer) conducted two pivotal studies: RESPIRE 1 and RESPIRE 2. Each study contained two hypotheses. Interestingly, for multiplicity adjustment, the Bonferroni method was used for the RESPIRE 1 study and the weighted Bonferroni method for the RESPIRE 2 study. We can only guess why weights of 0.02 and 0.98 (resulting in a partition of alpha into 0.001 and 0.049) were chosen in the RESPIRE 2 study.
RESPIRE 1 Study:
  • Hypothesis 1: ciprofloxacin DPI for 28 days on/off treatment regimen versus pooled placebo (alpha=0.025)
  • Hypothesis 2: ciprofloxacin DPI for 14 days on/off treatment regimen versus pooled placebo (alpha=0.025)
RESPIRE 2 Study:
  • Hypothesis 1: ciprofloxacin DPI for 28 days on/off treatment regimen versus pooled placebo (alpha=0.001)
  • Hypothesis 2: ciprofloxacin DPI for 14 days on/off treatment regimen versus pooled placebo (alpha=0.049)
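The alpha partitions in the examples above can be converted back into the implied weights with a short sketch. The alphas are the ones quoted in this post; the function names and the labels are hypothetical.

```python
import math


def partition_alpha(weights, overall_alpha=0.05):
    """Split the overall alpha into endpoint-specific alphas according to the weights."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1.0")
    return [w * overall_alpha for w in weights]


def implied_weights(alphas):
    """Recover the weights from endpoint-specific alphas (each alpha / total alpha)."""
    total = sum(alphas)
    return [a / total for a in alphas]


# Endpoint-specific alphas as quoted in the examples above
examples = {
    "Durvalumab protocol (OS, PFS)": [0.045, 0.005],
    "COMPASS-2 protocol (morbidity/mortality, 6MWD)": [0.04, 0.01],
    "RESPIRE 1 (28 days, 14 days)": [0.025, 0.025],
    "RESPIRE 2 (28 days, 14 days)": [0.001, 0.049],
}

for name, alphas in examples.items():
    weights = [round(w, 2) for w in implied_weights(alphas)]
    print(f"{name}: weights={weights}, total alpha={sum(alphas):.3f}")
```

In every case the endpoint-specific alphas add up to the overall alpha of 0.05, and RESPIRE 1 is simply the special case with equal weights of 0.5.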

Thursday, February 01, 2018

Handling site level protocol deviations

In a previous post, the CDISC data structure for protocol deviations was discussed. The protocol deviation data set (DV domain) is an event data set (just like the adverse event data set). The tabulation data set should contain one record per protocol deviation per subject. In other words, each protocol deviation is always tied to an individual subject. In the DV data set, each protocol deviation record should carry the unique subject identifier (usubjid).

There are situations where the protocol deviations are on the site level, not the subject level. For example, many study protocols have specific requirements for handling the study drugs (or IP - investigational products). The study drug must be stored within the required temperature range. A temperature excursion occurs when a time- and temperature-sensitive pharmaceutical product is exposed to temperatures outside the range prescribed for storage. The temperature excursion may reduce the efficacy of the study drug or cause safety concerns. If there are multiple subjects enrolled at the problematic site, the protocol deviation associated with the temperature excursion will have an impact on all subjects at this site - this is called a site level protocol deviation.

There is no specific discussion about documenting and handling site level protocol deviations in the ICH and CDISC guidelines.

According to CDISC SDTM, protocol deviations should be captured in the DV domain. According to the current SDTM standard, all tabulation data sets including DV are designed for subject data (with the only exception of Trial Design info).

Because site level deviations are not associated with any specific subject, they cannot be directly included in the DV data set. There may be two ways to handle the site level protocol deviations:

  • Document the site level protocol deviations separately from the subject level protocol deviations. Then document them in Clinical Study Report (CSR) and in Study Data Reviewer's Guide (SDRG) if applicable.
  • If any site level deviation has an impact on all or multiple subjects enrolled at that site, the specific deviation can be repeated for each affected subject.
It is advisable to pre-specify the instructions for handling the site level protocol deviations so that the site level protocol deviations are recorded appropriately.
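The second option (repeating a site level deviation for each affected subject) could look roughly like the sketch below. It is only an illustration under stated assumptions: the input layout, the function name, and the sample values are hypothetical, and only the variable names (USUBJID, SITEID, DVTERM, DVSTDTC) follow SDTM naming. A real DV data set would also need the other required variables such as STUDYID, DOMAIN, and DVSEQ.

```python
def expand_site_deviation(site_deviation, subjects, affected_ids=None):
    """Repeat one site level deviation as one DV-like record per affected subject.

    If affected_ids is None, every subject at the site is treated as affected.
    """
    records = []
    for subject in subjects:
        if subject["SITEID"] != site_deviation["SITEID"]:
            continue  # subject enrolled at a different site
        if affected_ids is not None and subject["USUBJID"] not in affected_ids:
            continue  # subject at the site but not affected by this deviation
        records.append({
            "USUBJID": subject["USUBJID"],
            "DVTERM": site_deviation["DVTERM"],
            "DVSTDTC": site_deviation["DVSTDTC"],
        })
    return records


# Hypothetical subject list and site level deviation
subjects = [
    {"USUBJID": "ABC-101-001", "SITEID": "101"},
    {"USUBJID": "ABC-101-002", "SITEID": "101"},
    {"USUBJID": "ABC-102-001", "SITEID": "102"},
]
excursion = {
    "SITEID": "101",
    "DVTERM": "Study drug temperature excursion at site",
    "DVSTDTC": "2018-01-15",
}

for record in expand_site_deviation(excursion, subjects):
    print(record)
```

Passing a set of subject IDs as affected_ids restricts the records to a subset of the site, which is the case where a deviation affects only some of the subjects enrolled there.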


Identifying and recording protocol deviations, including site level protocol deviations, should be an ongoing process during the conduct of the clinical trial. If we wait until the end of the study, we may have difficulty determining whether a specific site level deviation had an impact on all subjects or only some of the subjects at that site.