In clinical trials, efficacy endpoints are often measured as continuous variables. Hypothesis tests are used to determine whether there is a statistically significant difference between one group and another. This group-level comparison is what statisticians typically want. For treating physicians, however, a treatment effect at the group level may not translate into an effect for an individual patient. As we move toward personalized medicine, the individual response may become more important than the group response alone.
It is interesting that individual response and individual assessment (also called within-patient analysis or intra-subject change) were widely discussed at this year's FDA/Industry Statistics Workshop.
For patient-reported outcomes (PROs), a statistically significant group change does not necessarily imply a meaningful difference for individual patients. To provide a meaningful interpretation of intervention and treatment effects on a patient-reported outcome, there should be a responder definition that classifies each individual subject as a responder or a non-responder. The FDA guidance stated "Regardless of whether the primary endpoint for the clinical trial is based on individual responses to treatment or the group response, it is usually useful to display individual responses, often using an a priori responder definition (i.e., the individual patient PRO score change over a predetermined time period that should be interpreted as a treatment benefit). The responder definition is determined empirically and may vary by target population or other clinical trial design characteristics. Therefore, we will evaluate an instrument’s responder definition in the context of each specific clinical trial." The challenging issue is how to determine the cutpoint, or benchmark, that defines a responder. Several approaches have been proposed in the literature. We have implemented various approaches to determine the responder definition (or clinically meaningful difference) in a neurological disease. In that article, two anchors are used: one based on the physician's assessment and one based on the patient's global assessment (question #2 in the SF-36 instrument). It is interesting that statistical approaches are employed to find the clinically meaningful difference.
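As a rough illustration of one anchor-based approach, the sketch below takes the cutpoint to be the mean change score among patients whose anchor rating falls in a "minimally improved" category. This is only one of several approaches in the literature and is not necessarily the one used in the article mentioned above. The function name, the anchor labels, and the data are all hypothetical.

```python
from statistics import mean

def anchor_based_cutpoint(records, anchor_level):
    """Mean change score among patients with the given anchor rating.

    records: list of (change_score, anchor_rating) pairs (hypothetical layout).
    anchor_level: the anchor category taken to represent a minimal benefit.
    """
    changes = [change for change, rating in records if rating == anchor_level]
    if not changes:
        raise ValueError("no patients with the requested anchor rating")
    return mean(changes)

# Made-up data: (change in PRO score, patient's global rating of change)
records = [
    (2.0, "about the same"),
    (6.0, "a little better"),
    (8.0, "a little better"),
    (15.0, "much better"),
    (7.0, "a little better"),
    (1.0, "about the same"),
]

cutpoint = anchor_based_cutpoint(records, "a little better")
print(cutpoint)  # 7.0
```

In practice the choice of anchor category, and whether to use the mean, the median, or a ROC-based threshold, would need to be justified for the specific instrument and population.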
Once the cutpoint (the clinically meaningful difference) is decided, the continuous variable is dichotomized into responder and non-responder. The analysis then shifts from parametric methods (t-test, ANOVA, ANCOVA, ...) to categorical data analysis methods (chi-square test, logistic regression, generalized linear models, ...). Statisticians will argue that by doing so we lose a lot of efficiency in statistical testing, because dichotomization discards information about how far each patient's change lies from the cutpoint. A paper by Snapinn and Jiang titled "Responder analyses and the assessment of a clinically relevant treatment effect" makes exactly this argument.
The recently published EMEA "Guideline on missing data in confirmatory clinical trials" mentions that responder analysis can be useful for handling missing data. It states:
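The dichotomization step and the resulting categorical comparison can be sketched as follows. This is a minimal illustration using only the Python standard library: the change scores and the cutpoint of 8 are made up, and the Pearson chi-square test is computed by hand without a continuity correction. It is a sketch under these assumptions, not a recommended analysis.

```python
from math import erfc, sqrt

def dichotomize(changes, cutpoint):
    """Label each patient a responder (True) if change >= cutpoint."""
    return [change >= cutpoint for change in changes]

def chi_square_2x2(resp_a, resp_b):
    """Pearson chi-square test (1 df, no continuity correction)
    comparing responder proportions between two groups."""
    a = sum(resp_a)            # responders, group A
    b = len(resp_a) - a        # non-responders, group A
    c = sum(resp_b)            # responders, group B
    d = len(resp_b) - c        # non-responders, group B
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the 2x2 table is empty")
    stat = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2))
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Hypothetical change-from-baseline scores
treatment = [12, 8, 15, 3, 10, 7, 11, 2]
placebo = [4, 9, 1, 6, 3, 10, 2, 5]
CUTPOINT = 8  # assumed clinically meaningful difference

resp_trt = dichotomize(treatment, CUTPOINT)
resp_pbo = dichotomize(placebo, CUTPOINT)
stat, p_value = chi_square_2x2(resp_trt, resp_pbo)

print(sum(resp_trt), "of", len(resp_trt), "responders on treatment")
print(sum(resp_pbo), "of", len(resp_pbo), "responders on placebo")
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
```

Note that a patient who improves by 8 and one who improves by 15 are treated identically after dichotomization, which is the source of the efficiency loss discussed above.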
"In some circumstances, the primary analysis of a continuous variable is supported by a responder analysis. In other circumstances, the responder analysis is designated as primary. How missing data are going to be categorised in such analyses should be pre-specified and justified. If a patient prematurely withdraws from the study it would be normal to consider this patient as a treatment failure. However, the best way of categorisation will depend on the trial objective (e.g. superiority compared to non-inferiority).
In a situation where responder analysis is not foreseen as the primary analysis, but where the proportion of missing data may be so substantial that no imputation or modelling strategies can be considered reliable, a responder analysis (patients with missing data due to patient withdrawal treated as failures) may represent the most meaningful way of investigating whether there is sufficient evidence of the existence of a treatment effect."
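The convention described in the quoted guideline, counting patients with missing data due to withdrawal as treatment failures, can be sketched as below. The function names, the use of `None` to mark a missing change score, and the data are assumptions made for illustration only.

```python
def responders_missing_as_failure(changes, cutpoint):
    """Classify each patient as responder (True) or non-responder (False).

    A change of None marks a patient who withdrew before the endpoint
    could be measured; such patients are counted as treatment failures.
    """
    return [change is not None and change >= cutpoint for change in changes]

def responder_rate(flags):
    """Proportion of responders among all patients in the list."""
    return sum(flags) / len(flags)

# Hypothetical change scores; None = withdrew from the study
changes = [12, None, 9, 3, None, 10]
CUTPOINT = 8  # assumed responder threshold

flags = responders_missing_as_failure(changes, CUTPOINT)
rate_all = responder_rate(flags)

# For contrast: responder rate among completers only
completers = [c for c in changes if c is not None]
rate_completers = responder_rate([c >= CUTPOINT for c in completers])

print(rate_all)         # 0.5
print(rate_completers)  # 0.75
```

The gap between the two rates shows how strongly the handling of withdrawals can affect the result, which is why the guideline asks for the categorisation to be pre-specified and justified.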
Within-patient analyses were brought up again in the context of assessing benefit:risk. Currently, benefit:risk assessment relies on separate marginal analyses: efficacy (benefit) and safety (risk) are analyzed separately. The aggregation of benefit and risk relies on the judgment of medical reviewers, not statisticians, and the aggregate benefit:risk analyses are typically qualitative rather than quantitative, with significant subjectivity. With within-patient analyses, each patient is assessed for both benefit and risk before the group comparison for treatment effect is performed. One of these approaches is the so-called Q-TWiST (quality-adjusted time without symptoms of disease or toxicity of treatment), where toxicity or safety information is incorporated into the efficacy assessment for each patient before any group comparison. The paper by Sherrill et al. is one such example.
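To make the idea concrete, the sketch below computes a per-patient Q-TWiST value using the usual weighted sum Q-TWiST = u_TOX × TOX + TWiST + u_REL × REL, where TOX is time with treatment toxicity, TWiST is time without symptoms or toxicity, and REL is time after relapse or progression. The utility weights of 0.5 and the durations are hypothetical, and this is not meant to reproduce the analysis in the paper cited above.

```python
def q_twist(tox, twist, rel, u_tox=0.5, u_rel=0.5):
    """Quality-adjusted time for one patient.

    tox:   time spent with treatment toxicity
    twist: time without symptoms of disease or toxicity
    rel:   time after relapse/progression
    u_tox, u_rel: utility weights in [0, 1] (0.5 is an arbitrary default)
    """
    if not (0.0 <= u_tox <= 1.0 and 0.0 <= u_rel <= 1.0):
        raise ValueError("utility weights must lie between 0 and 1")
    return u_tox * tox + twist + u_rel * rel

# Hypothetical patient: 2 months of toxicity, 10 months symptom-free,
# 4 months after relapse
value = q_twist(tox=2.0, twist=10.0, rel=4.0)
print(value)  # 13.0
```

Because the utility weights are subjective, results are often reported over a range of weights rather than for a single choice.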