Saturday, December 23, 2017

Composite Endpoint and Competing Risk Model

A competing risk is an event whose occurrence precludes the occurrence of the primary event of interest. For example, when the primary outcome is death due to cardiovascular causes, death due to non-cardiovascular causes is a competing risk, because subjects who die of non-cardiovascular causes (e.g., cancer) are no longer at risk of death due to a cardiovascular cause. However, when the primary outcome is all-cause mortality, there are no competing risks, because no event precludes death from any cause. In event-driven clinical trials, if a study subject drops out of the study before the event of interest occurs, the dropout precludes observation of the event of interest, so dropout can also be viewed as a competing risk.

Competing risks also arise in clinical trials with a composite endpoint, that is, an endpoint with a composite outcome. A composite outcome consists of two or more component outcomes. Patients who have experienced any one of the events specified by the components are considered to have experienced the composite outcome. The main advantages of using a composite outcome are:
  • it increases statistical efficiency because of higher event rates, which reduces the sample size requirement, costs, and time;
  • it helps investigators avoid an arbitrary choice between several important outcomes that refer to the same disease process;
  • it is a means of assessing the effectiveness of a patient-reported outcome that addresses more than one aspect of the patient’s health status.
Composite endpoints are common in clinical trials, especially when the primary interest is to reduce adverse outcomes that individually do not occur frequently enough. If a study used each individual component as the endpoint, the required sample size would be too large.

MACE (major adverse cardiac events) is a composite endpoint frequently used in clinical trials assessing treatment effects on cardiac health. Here MACE is defined as any event of all-cause mortality, myocardial infarction (MI), or stroke. If a patient dies during the study, a subsequent MI or stroke cannot be observed. If an MI or stroke occurs and the subject is discontinued from the study at that point, a subsequent death will not be observed. In other words, one component is a competing risk for another.
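To make the time-to-first-event construction concrete, here is a minimal Python sketch that derives the time to first MACE from component event times. Several details are illustrative assumptions, not taken from any trial protocol:
  • the `Patient` record and its field names;
  • times measured in days from randomization;
  • the rule that same-day ties are attributed to death first, then MI, then stroke.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Patient:
    # Hypothetical record: days from randomization to each component event,
    # None if the event was not observed.
    death: Optional[float]
    mi: Optional[float]
    stroke: Optional[float]
    last_contact: float  # day of last follow-up, used as the censoring time


def first_mace(p: Patient) -> Tuple[float, str]:
    """Return (time, component) for the first MACE, or (last_contact, 'censored')."""
    # Dict insertion order breaks same-day ties: death, then MI, then stroke (assumption).
    components = {"death": p.death, "MI": p.mi, "stroke": p.stroke}
    observed = {name: t for name, t in components.items() if t is not None}
    if not observed:
        return p.last_contact, "censored"
    name = min(observed, key=observed.get)
    return observed[name], name


# An MI on day 120 defines the composite event; the later death does not change it.
print(first_mace(Patient(death=400, mi=120, stroke=None, last_contact=400)))
```

Only the earliest component enters a time-to-first-event analysis. Any later event, such as the death on day 400 here, is not used.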
In clinical trials in pulmonary arterial hypertension (PAH), a composite endpoint is used to evaluate the treatment effect in reducing mortality and morbidity events. The EMA “Guideline on the Clinical Investigations of Medicinal Products for the Treatment of Pulmonary Arterial Hypertension” suggests time to clinical worsening as the primary efficacy endpoint. Clinical worsening is defined as a composite endpoint consisting of:
1. All-cause death.
2. Time to non-planned PAH-related hospitalization.
3. Time to PAH-related deterioration identified by at least one of the following parameters:
  • increase in WHO FC;
  • deterioration in exercise testing;
  • signs or symptoms of right-sided heart failure.

In another PAH trial (“…Arterial Hypertension”), the primary end point in a time-to-event analysis was a composite of death or a complication related to pulmonary arterial hypertension, whichever occurred first, up to the end of the treatment period. The composite endpoint included the following events:
  • death (all-cause mortality)
  • hospitalization for worsening of PAH based on criteria defined in the study protocol
  • worsening of PAH resulting in need for lung transplantation or balloon atrial septostomy, or initiation of parenteral (subcutaneous or intravenous) prostanoid therapy or chronic oxygen therapy due to worsening of PAH
  • disease progression (patients in modified NYHA/WHO functional class II or III at Baseline) confirmed by a decrease in 6MWD from Baseline (≥ 15%, confirmed by 2 tests on different days within 2 weeks) and worsening of NYHA/WHO functional class
  • disease progression (patients in modified NYHA/WHO functional class III or IV at Baseline) confirmed by a decrease in 6MWD from Baseline (≥ 15%, confirmed by 2 tests on different days within 2 weeks) and need for additional PAH-specific therapy.

There is a competing risk issue here. For example, lung transplantation and death compete with each other. If a patient undergoes lung transplantation, the disease course changes, and the chance of death and of the other events is altered.

A common approach to avoid the competing risk issue is to analyze the time to first event (any one of the components defined in the composite endpoint) as the primary efficacy endpoint. This approach is often criticized because the components are not equal in importance or severity: death should be given much more weight than non-fatal events.

FDA seems comfortable with the time-to-first-event approach in two settings:
  • the composite endpoint setting, as evidenced by the approval of selexipag;
  • the recurrent event setting, as evidenced by FDA advisory committee meeting discussions.

In a panel discussion at the 2017 regulatory-industry workshop on Better Characterization of Disease Burden by Using Recurrent Event Endpoints, Drs. Bob Temple and Norman Stockbridge both commented on this. They said FDA is fine with the time-to-first-event analysis as long as further analyses are performed to evaluate the treatment effect on each individual component.

A competing risk model may be used in the statistical analysis of clinical trial data, either as the primary method or as a sensitivity analysis. Schaapveld et al. (2015), “Second Cancer Risk Up to 40 Years after Treatment for Hodgkin’s Lymphoma”, used a competing risk model to analyze the cumulative incidence of second cancers:
The cumulative incidence of second cancers was estimated with death treated as a competing risk, and trends over time were evaluated in competing-risk models, with adjustment for the effects of sex, age, and smoking status when appropriate.
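Here is what “cumulative incidence with death treated as a competing risk” means computationally. The minimal Python sketch below computes the nonparametric (Aalen–Johansen) cumulative incidence estimator. For comparison, it also computes the naive 1 − Kaplan–Meier estimate, which treats deaths as censored. The status coding (0 = censored, 1 = event of interest, 2 = competing event) is an assumption for illustration. This is not the code used by Schaapveld et al.

```python
from itertools import groupby
from typing import List, Tuple

Record = Tuple[float, int]  # (time, status): 0 = censored, 1 = event of interest, 2 = competing event


def cumulative_incidence(data: List[Record], cause: int = 1) -> List[Tuple[float, float]]:
    """Aalen-Johansen cumulative incidence of `cause` at each distinct time."""
    n_at_risk = len(data)
    surv = 1.0  # all-cause event-free survival just before the current time
    cif = 0.0
    curve = []
    for t, grp in groupby(sorted(data), key=lambda r: r[0]):
        statuses = [s for _, s in grp]
        d_cause = statuses.count(cause)
        d_any = sum(1 for s in statuses if s != 0)
        cif += surv * d_cause / n_at_risk  # cause-specific hazard times S(t-)
        surv *= 1 - d_any / n_at_risk
        n_at_risk -= len(statuses)  # events and censorings leave the risk set
        curve.append((t, cif))
    return curve


def one_minus_km(data: List[Record], cause: int = 1) -> List[Tuple[float, float]]:
    """Naive 1 - Kaplan-Meier that censors competing events (biased upward)."""
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    for t, grp in groupby(sorted(data), key=lambda r: r[0]):
        statuses = [s for _, s in grp]
        surv *= 1 - statuses.count(cause) / n_at_risk
        n_at_risk -= len(statuses)
        curve.append((t, 1 - surv))
    return curve


example = [(1, 1), (2, 2), (3, 1), (4, 2)]  # two second cancers, two deaths, no censoring
print(cumulative_incidence(example)[-1], one_minus_km(example)[-1])
```

The two estimates differ on this toy example. Without censoring, the Aalen–Johansen estimate equals the observed proportion with the event of interest (2 of 4, i.e. 0.5). The 1 − KM estimate reaches 0.625, because it treats subjects who died as if they could still develop a second cancer.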

A competing risk model is more often used as a sensitivity analysis. For example, the SPRINT study (“A Randomized Trial of Intensive versus Standard Blood-Pressure Control”) used the Fine–Gray model for the competing risk of death as a sensitivity analysis.

There has been considerable discussion of the competing risk model in clinical trials.

When there is a competing risk issue, Gray’s method or the Fine and Gray method can be used. These methods are based on the papers below:
  • Gray, R. J. (1988), “A Class of K-Sample Tests for Comparing the Cumulative Incidence of a Competing Risk,” Annals of Statistics, 16, 1141–1154.
  • Fine, J. P. and Gray, R. J. (1999), “A Proportional Hazards Model for the Subdistribution of a Competing Risk,” Journal of the American Statistical Association, 94, 496–509.
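The key idea behind the Fine and Gray (1999) subdistribution hazard is that subjects who have already had the competing event stay in the risk set. The minimal Python sketch below illustrates only that idea, and only for data without censoring. The actual Fine–Gray model handles censoring with inverse-probability-of-censoring weights and includes covariates, which this sketch does not. The function name and the status coding (1 = event of interest, 2 = competing event, 0 = censored) are assumptions for illustration.

```python
from itertools import groupby
from typing import List, Tuple


def subdistribution_cif(data: List[Tuple[float, int]], cause: int = 1) -> List[Tuple[float, float]]:
    """1 - prod(1 - d_k / R_k) over subdistribution risk sets R_k (no censoring)."""
    if any(status == 0 for _, status in data):
        raise ValueError("sketch assumes no censoring; Fine-Gray uses IPCW weights otherwise")
    risk_set = len(data)  # everyone who has not yet had the event of interest
    surv_sub = 1.0
    curve = []
    for t, grp in groupby(sorted(data), key=lambda r: r[0]):
        d_cause = [s for _, s in grp].count(cause)
        surv_sub *= 1 - d_cause / risk_set
        # Only events of interest leave the risk set; subjects with a competing
        # event remain in it (unlike the cause-specific hazard risk set).
        risk_set -= d_cause
        curve.append((t, 1 - surv_sub))
    return curve


example = [(1, 1), (2, 2), (3, 1), (4, 2)]
print(subdistribution_cif(example)[-1])
```

Without censoring, this product-limit over subdistribution risk sets reproduces the empirical cumulative incidence of the event of interest (here 2 of 4 subjects, i.e. 0.5). That link is why a proportional hazards model on the subdistribution hazard translates directly into effects on the cumulative incidence.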

SAS macros exist for Gray’s method. More recently, both methods have been built into SAS procedures, so competing risk analyses can be performed handily in SAS:
  • PROC LIFETEST estimates cumulative incidence functions and performs Gray’s test.
  • PROC PHREG fits the Fine and Gray model.

Several SAS papers discuss competing risk model analysis.

Sunday, December 10, 2017

Recurrent Events versus Composite Events: Statistical Analysis Methods for Recurrent Events

Recurrent events are repeated occurrences of the same type of event.
A composite endpoint is a combination of various clinical events that might happen, such as heart attack, stroke, or death, where any one of those events counts toward the composite endpoint.

While composite endpoints are sometimes discussed within the scope of recurrent event endpoints, the two terms are distinct, and the methods for statistical analysis also differ:
  • Examples: recurrent event endpoints include relapses in multiple sclerosis, exacerbations in pulmonary diseases such as chronic obstructive pulmonary disease, and bleeding episodes in hemophilia. Composite endpoints include MACE in cardiovascular trials, where MACE (major adverse cardiac event) includes death, MI, and stroke. They also include clinical worsening events in pulmonary arterial hypertension, where clinical worsening includes all-cause death, PAH-related hospitalization, PAH-related deterioration of disease,…
  • Type of event: recurrent events are the same type of event; a composite endpoint combines different types of events.
  • Contribution to the event count: for recurrent events, each event contributes equally to the total number of events. A composite endpoint is usually criticized because its components may contribute differently to the total count (death is a much more severe event than the others).
  • Study design: a recurrent event study usually has a fixed duration, with events collected over a fixed period. A composite endpoint study is usually event-driven, so different subjects may be followed for different durations.
  • Event frequency: recurrent event endpoints are usually used for relatively frequent events. Composite endpoints are usually used for infrequent or rare events, which are combined to increase power and minimize the sample size.
  • Analysis: recurrent events can be analyzed as frequency of events, annualized rate of events, time to first event, duration of events, or event-free duration. A composite endpoint can be analyzed as time to the first event, time to event for each component, or frequency of events.
  • Competing risk: less of an issue for recurrent events; an issue for composite endpoints.

A composite endpoint is usually analyzed as time to first event (whichever component occurs first) using the log-rank test or a Cox proportional hazards model. Recurrent events, by contrast, may be analyzed in different ways. Below are some examples of how recurrent event endpoints have been analyzed in published trials.

Emicizumab Prophylaxis in Hemophilia A with Inhibitors

The primary end point was the difference in the rate of treated bleeding events (hereafter referred to as the bleeding rate) over a period of at least 24 weeks between participants receiving emicizumab prophylaxis (group A) and those receiving no prophylaxis (group B) after the last randomly assigned participant had completed 24 weeks in the trial or had discontinued participation, whichever occurred first.
For all bleeding-related end points, comparisons of the bleeding rate in group A versus group B and the intraindividual comparisons were performed with the use of a negative binomial-regression model to determine the bleeding rate per day, which was converted to an annualized bleeding rate.
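The negative binomial analysis described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the trial's actual analysis. The sketch makes the following simplifications:
  • it fits an NB2 model with one log-rate per group;
  • it uses the log of each participant's days on study as an offset;
  • it shares one dispersion parameter across groups, found by a simple golden-section search;
  • it includes no stratification covariates and computes no confidence intervals;
  • it annualizes with an assumed 365.25 days per year.

The function names and the toy counts are hypothetical and are not trial data.

```python
import math
from typing import Dict, List, Tuple


def nb2_loglik(y: List[int], mu: List[float], alpha: float) -> float:
    """NB2 log-likelihood, Var(Y) = mu + alpha * mu**2."""
    k = 1.0 / alpha
    return sum(
        math.lgamma(yi + k) - math.lgamma(k) - math.lgamma(yi + 1)
        + k * math.log(k / (k + mi)) + yi * math.log(mi / (k + mi))
        for yi, mi in zip(y, mu)
    )


def fit_log_rate(y: List[int], days: List[float], alpha: float) -> float:
    """Newton-Raphson for log(rate per day) in one group, with log(days) as offset."""
    beta = math.log(max(sum(y), 0.5) / sum(days))  # start at the crude rate
    for _ in range(100):
        mu = [math.exp(beta) * t for t in days]
        score = sum((yi - mi) / (1 + alpha * mi) for yi, mi in zip(y, mu))
        info = sum(mi * (1 + alpha * yi) / (1 + alpha * mi) ** 2 for yi, mi in zip(y, mu))
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta


def annualized_rates(groups: Dict[str, Tuple[List[int], List[float]]],
                     days_per_year: float = 365.25):
    """Per-group NB2 rates with a shared dispersion profiled by golden-section search."""
    def profile(log_alpha: float) -> float:
        alpha = math.exp(log_alpha)
        total = 0.0
        for y, days in groups.values():
            beta = fit_log_rate(y, days, alpha)
            total += nb2_loglik(y, [math.exp(beta) * t for t in days], alpha)
        return total

    lo, hi, g = -10.0, 3.0, (math.sqrt(5) - 1) / 2
    for _ in range(60):
        x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
        if profile(x1) < profile(x2):
            lo = x1  # the maximum lies to the right of x1
        else:
            hi = x2
    alpha = math.exp((lo + hi) / 2)
    rates = {name: math.exp(fit_log_rate(y, d, alpha)) * days_per_year
             for name, (y, d) in groups.items()}
    return alpha, rates


# Toy counts over 168 days (24 weeks) per participant; not trial data.
alpha, rates = annualized_rates({"A": ([0, 1, 0, 2, 0, 0], [168.0] * 6),
                                 "B": ([5, 12, 3, 0, 20, 8], [168.0] * 6)})
print(alpha, rates, rates["A"] / rates["B"])
```

With equal follow-up for everyone, the fitted rate in each group is simply the mean count per day. The negative binomial model matters when exposure differs between participants and when computing standard errors: there, the dispersion `alpha` widens the confidence intervals relative to a Poisson model.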
The primary efficacy end point was the annual rate of sickle cell–related pain crises, which was calculated as follows: total number of crises × 365 ÷ (end date − date of randomization + 1), with the end date defined as the date of the last dose plus 14 days. Annualized rates were used for the comparisons because they take into account the duration that a participant was in the trial. The crisis rate for every patient was annualized to 12 months. The annual crisis rate was imputed for patients who did not complete the trial. The difference in the annual crisis rate between the high-dose crizanlizumab group and the placebo group was analyzed with the use of the stratified Wilcoxon rank-sum test, with the use of categorized history of crises in the previous year (2 to 4 or 5 to 10 crises) and concomitant hydroxyurea use (yes or no) as strata. A hierarchical testing procedure was used (alpha level of 0.05 for high-dose crizanlizumab vs. placebo, and if significant, low-dose crizanlizumab vs. placebo).
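The annualized rate formula quoted above translates directly into code. Below is a minimal Python sketch with a hypothetical function name and hypothetical dates. It does not implement the imputation used for participants who did not complete the trial.

```python
from datetime import date, timedelta


def annual_crisis_rate(n_crises: int, randomized: date, last_dose: date) -> float:
    """Total crises x 365 / (end date - randomization date + 1),
    with the end date defined as the date of the last dose plus 14 days."""
    end = last_dose + timedelta(days=14)
    days_observed = (end - randomized).days + 1  # inclusive day count
    return n_crises * 365 / days_observed


# Hypothetical participant: 3 crises, randomized 1 Jan 2014, last dose 17 Dec 2014.
print(annual_crisis_rate(3, date(2014, 1, 1), date(2014, 12, 17)))  # 3.0
```

In this example the end date is 31 December 2014, so the participant contributes exactly 365 days and the annual rate equals the crisis count.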

A painful crisis was defined as a visit to a medical facility that lasted more than four hours for acute sickling-related pain (hereinafter referred to as a medical contact), which was treated with a parenterally administered narcotic (except for a few facilities in which only orally administered narcotics were used); the definition is similar to that used in a previous study. Annual rates were computed by dividing the number of crises by the number of years elapsed (e.g., 6 crises in 1.9 years = 3.16 crises per year). To test the effect of treatment on the crisis rate, the patients were ranked according to the number of crises they had had per year for observed periods of up to two years. Death was considered the worst outcome, followed by a stroke (defined as a documented new neurologic deficit lasting more than 24 hours, confirmed by a neurologist) or the institution of long-term transfusion therapy (more than four months); outcomes for all other patients were ranked according to the individual crisis rate. These ranks were used to compare the two treatment groups (Van der Waerden’s test). A rank statistic was planned for the primary analysis because it was expected to have more power to detect differences and to be less influenced by extreme values than a t-test of the means.
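The ranking scheme described above, followed by a Van der Waerden (normal scores) test, can be sketched in Python as follows. This is a minimal illustration with these assumptions, any of which the published analysis may have handled differently:
  • the outcome encoding and names are hypothetical;
  • patients within the death category, or within the stroke/transfusion category, are treated as tied;
  • ties get average ranks;
  • the p-value uses the usual chi-square approximation for the Van der Waerden statistic.

```python
import math
from statistics import NormalDist
from typing import List, Tuple

Outcome = Tuple[str, float]  # (kind, crises per year); kind: "rate", "stroke_or_transfusion", "death"
LEVEL = {"rate": 0, "stroke_or_transfusion": 1, "death": 2}  # higher level = worse outcome


def average_ranks(keys: List[tuple]) -> List[float]:
    """1-based ranks in ascending order, with tied keys given their average rank."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    ranks = [0.0] * len(keys)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and keys[order[j + 1]] == keys[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def van_der_waerden(group_a: List[Outcome], group_b: List[Outcome]) -> Tuple[float, float]:
    """Two-sample Van der Waerden test on the ranked composite outcome."""
    outcomes = group_a + group_b
    # Death ranks worst, then stroke/long-term transfusion, then by crisis rate.
    keys = [(LEVEL[kind], rate if kind == "rate" else 0.0) for kind, rate in outcomes]
    n = len(outcomes)
    scores = [NormalDist().inv_cdf(r / (n + 1)) for r in average_ranks(keys)]
    s2 = sum(a * a for a in scores) / (n - 1)
    stat = 0.0
    for block in (scores[: len(group_a)], scores[len(group_a):]):
        stat += len(block) * (sum(block) / len(block)) ** 2
    stat /= s2
    return stat, math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df


low = [("rate", 0.5), ("rate", 1.0), ("rate", 1.5), ("rate", 2.0)]
high = [("rate", 6 / 1.9), ("rate", 5.0), ("stroke_or_transfusion", 0.0), ("death", 0.0)]
print(van_der_waerden(low, high))
```

Ranking first and then converting ranks to normal scores is what makes the test robust to extreme crisis rates, as the quoted rationale notes.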

The primary efficacy endpoint was mean change from baseline in frequency of headache days for the 28-day period ending with week 24. A headache day was defined as a calendar day (00:00 to 23:59) when the patient reported four or more continuous hours of a headache, per the patient diary. Subsequent to study initiation, but prior to study completion and treatment unmasking, the protocol and statistical analysis plan for PREEMPT 2 was amended to change the primary and secondary endpoints, making frequency of headache days the PREEMPT 2 primary endpoint. This change was made based on several factors: availability of PREEMPT 1 data, guidance provided in newly issued International Headache Society clinical trial guidelines for evaluating headache prophylaxis in CM (34) and the earlier expressed preference of the US Food and Drug Administration (FDA), all of which supported using headache day frequency as a primary outcome measure for CM. For each primary and secondary variable, prespecified comparisons between treatment groups were done by analysis of covariance of the change from baseline, with the same variable’s baseline value as a covariate, with main effects of treatment group and medication overuse strata. The baseline covariate adjustment was prespecified as the primary analysis; sensitivity analyses (e.g., rank-sum test on changes from baseline without a baseline covariate) were also performed.
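The prespecified analysis of covariance amounts to a linear model for change from baseline. The model has main effects for treatment, the baseline value, and the medication-overuse stratum. Below is a minimal, dependency-free Python sketch. The variable names and toy data are hypothetical, and the sketch returns only the adjusted treatment difference, without standard errors or p-values.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ancova_treatment_effect(change: List[float], treated: List[int],
                            baseline: List[float], overuse: List[int]) -> float:
    """OLS of change from baseline on treatment, baseline value and overuse stratum;
    returns the covariate-adjusted treatment difference."""
    X = [[1.0, float(t), float(b), float(o)] for t, b, o in zip(treated, baseline, overuse)]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, change)) for i in range(p)]
    return solve(xtx, xty)[1]  # coefficient of the treatment indicator


# Toy headache-day data generated with a true adjusted treatment effect of -3 days.
baseline = [20, 15, 22, 18, 25, 16, 19, 21]
treated = [0, 0, 0, 1, 1, 1, 0, 1]
overuse = [0, 1, 0, 1, 0, 1, 1, 0]
change = [2 - 3 * t - 0.5 * b + o for t, b, o in zip(treated, baseline, overuse)]
print(ancova_treatment_effect(change, treated, baseline, overuse))
```

Adjusting for the baseline value removes baseline imbalance from the treatment comparison. The rank-sum sensitivity analysis mentioned above drops this adjustment.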

The primary outcome was the time to the first acute exacerbation of COPD, with acute exacerbation of COPD defined as “a complex of respiratory symptoms (increased or new onset) of more than one of the following: cough, sputum, wheezing, dyspnea, or chest tightness with a duration of at least 3 days requiring treatment with antibiotics or systemic steroids.” The primary analysis was based on a log-rank test of the difference between the two treatment groups in the time to the first exacerbation, with no adjustments for baseline covariates. A Cox proportional-hazards model was used to adjust for differences in prespecified, prerandomization factors that might predict the risk of acute exacerbations of COPD.
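A two-sample log-rank test for time to first exacerbation can be written in a few lines of Python. This minimal sketch is unstratified with no covariate adjustment, in line with the unadjusted primary analysis described above; the Cox adjustment is not shown. The function name and the event coding (1 = exacerbation, 0 = censored) are assumptions.

```python
import math
from itertools import groupby
from typing import List, Tuple


def log_rank(times_a: List[float], events_a: List[int],
             times_b: List[float], events_b: List[int]) -> Tuple[float, float]:
    """Unstratified two-sample log-rank test; events: 1 = exacerbation, 0 = censored."""
    data = sorted([(t, e, 0) for t, e in zip(times_a, events_a)]
                  + [(t, e, 1) for t, e in zip(times_b, events_b)])
    n_a, n_b = len(times_a), len(times_b)
    o_minus_e = var = 0.0
    for _, grp in groupby(data, key=lambda r: r[0]):
        grp = list(grp)
        n = n_a + n_b
        d = sum(e for _, e, _ in grp)
        d_a = sum(e for _, e, g in grp if g == 0)
        if d:
            o_minus_e += d_a - d * n_a / n  # observed minus expected in group A
            if n > 1:
                var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)  # hypergeometric variance
        n_a -= sum(1 for _, _, g in grp if g == 0)  # events and censorings leave the risk set
        n_b -= sum(1 for _, _, g in grp if g == 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


# Toy data: days to first exacerbation (event = 1) or censoring (event = 0).
print(log_rank([30, 60, 90, 200], [1, 1, 1, 0], [120, 150, 300, 365], [1, 0, 1, 0]))
```

Only each participant's first exacerbation is used. This is exactly why the time-to-first-event approach sidesteps dependence between repeated exacerbations within a patient.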

The primary outcome was the effect of simvastatin on the exacerbation rate, which was defined as the number of exacerbations per person-year.

COPD exacerbation rates in the two study groups were compared with the use of a rate ratio. The independence of individual exacerbations was ensured by considering participants to have had two separate exacerbations if the onset dates were at least 14 days apart. Exacerbation rates in each group and the between-group differences were analyzed with the use of negative binomial regression modeling and time-weighted intention-to-treat analyses with adjustments of confidence intervals for between-participant variation (overdispersion).
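The event definition and the crude exacerbation rate can be sketched as follows. This is a minimal Python illustration, not the trial's analysis, with these assumptions:
  • a new onset is compared against the most recently counted exacerbation, since the quoted text does not say which earlier onset is the reference;
  • a year is taken as 365.25 days;
  • only a pooled rate is computed; the negative binomial modeling with overdispersion is not shown.

The function names are hypothetical.

```python
from datetime import date
from typing import List, Tuple


def count_exacerbations(onsets: List[date], min_gap_days: int = 14) -> int:
    """Count separate exacerbations: an onset counts as new only if it is at least
    `min_gap_days` after the most recently counted onset (assumed reading of the rule)."""
    count, last = 0, None
    for onset in sorted(onsets):
        if last is None or (onset - last).days >= min_gap_days:
            count += 1
            last = onset
    return count


def rate_per_person_year(patients: List[Tuple[List[date], float]]) -> float:
    """Pooled exacerbations per person-year: total counted events / total follow-up years."""
    events = sum(count_exacerbations(onsets) for onsets, _ in patients)
    years = sum(days for _, days in patients) / 365.25  # assumed year length
    return events / years


onsets = [date(2011, 1, 1), date(2011, 1, 10), date(2011, 3, 1)]
print(count_exacerbations(onsets))  # 2: the 10 January onset is within 14 days
```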

FDA recommended time to first exacerbation, rather than the frequency of exacerbations, as the primary efficacy endpoint. Time to first exacerbation would be analyzed using the log-rank test or a Cox proportional hazards model.
FDA agrees that the frequency of exacerbations may be a clinically relevant endpoint. However, there are several statistical issues and challenges in providing a reliable and unbiased estimate of the treatment effect using this endpoint:
  • Dependencies of exacerbation on previous exacerbations within patients
  • Influential cases that can potentially drive the results
  • Distinguishing between early vs. late exacerbations as a function of time
  • Distinguishing between first vs. subsequent exacerbations within patients
  • Investigator biases in assessing the number of events (e.g., events occurring close together)