Thursday, February 22, 2024

Advancing Psychedelic Clinical Study Design - Virtual Public Meeting Organized by Reagan-Udall Foundation

Over the past decade, psychedelic compounds like psilocybin and ecstasy have emerged as potentially life-changing treatments for mental illnesses, including major depressive disorder and posttraumatic stress disorder. These psychedelic products may be synthetic compounds or extracts from natural products (such as magic mushroom). They usually have hallucinations effects and belongs to the substance control products. 

Academics and pharmaceutical/biotech companies are now interested in developing the psychedelic products for therapeutic uses in treating diseases like major depression, PTSD, Pakington's disease,...

In order to obtain the regulatory approval, various phases of clinical trials need to be conducted. The clinical trial designs for psychedelic drugs are more complicated than typical drugs because of its known side effects and the nature of the substance-controlled products. 

Reagan-Udall Foundation recently organized a virtual public meeting to discuss "advancing psychedelic clinical study design". During this public meeting, attendees discussed the experience of scientists working with psychedelics in FDA-authorized clinical studies and drug development, considerations for psychedelics in clinical trial designs, and perspectives and current research in psychedelic clinical trials. FDA presenters provided an overview of its newly issued guidance "Psychedelic Drugs: Considerations for Clinical Investigations."

Video recording of this virtual public meeting is available on Reagan-Udall Foundation website: 

Monday, January 15, 2024

Terminal events as intercurrent events in clinical trials

ICH E9 "Addendum on Estimands and Sensitivity Analysis in Clinical Trials to the Guideline on Statistical Principles for Clinical Trials" contained discussions about intercurrent events and strategies for handling intercurrent events. Intercurrent events were defined as: 

Events occurring after treatment initiation that affect either the interpretation or the existence of the measurements associated with the clinical question of interest. It is necessary to address intercurrent events when describing the clinical question of interest in order to precisely define the treatment effect that is to be estimated.

The terminal events are one kind of intercurrent event. ICH E9 Addendum did not provide the formal definition for 'terminal events', but gave examples of the terminal events: 

Examples of intercurrent events that would affect the existence of the measurements include terminal events such as death and leg amputation (when assessing symptoms of diabetic foot ulcers), when these events are not part of the variable itself.

In a paper by Siegel et al "The role of occlusion: potential extension of the ICH E9 (R1) Addendum on Estimands and Sensitivity Analysis for Time-to-Event oncology studies", the terminal events were described as the following: 

The estimands guidance also introduces the concept of a terminal event. Terminal events prevent the possibility of subsequent measurement. "For terminal events such as death, the variable cannot be measured after the intercurrent event, but neither should these data generally be regarded as missing." There are two examples given in the guidance, death and leg amputation. These examples clarify that terminal events physically prevent subsequent measurement, for any estimand in any study. 

Terminality is an objective property of an event which renders further observation physically impossible. If an event is terminal, it is impossible to devise a study that can look beyond it. Indeed there is no meaningful clinical question regarding the treatment effect that manifests after a terminal event. 

Terminal events can be defined as events that make the outcome measures impossible and the events are not part of the outcome such as death and ankle amputation in a trial assessing ankle function). Sometimes, the outcome measure after the terminal events may still be possible, but the measures after the terminal events are not meaningful. For example, in clinical trials of pulmonary diseases with spirometry measure as the primary outcome, lung transplantation will be a terminal event. After the lung transplantation, the spirometry measure can still be performed, but the spirometry measure is a reflection of the transplanted lungs, not the intended measure of the clinical trial endpoint. 

Terminal events should be separated as fatal (death, mortality) and non-fatal terminal events (may be called 'terminal events excluding mortality'). While they are all considered intercurrent events, the strategies for handling the fatal and non-fatal terminal events need to be different. 

Strategies for Handling the Fatal Terminal Events

Treatment policy strategy can not be used for handling fatal terminal events (death events). ICH E9 Addendum mentioned the following: 

In general, the treatment policy strategy cannot be implemented for intercurrent events that are terminal events, since values for the variable after the intercurrent event do not exist. For example, an estimand based on this strategy cannot be constructed with respect to a variable that cannot be measured due to death.

Composite strategies (or composite variable strategies) are particularly useful for handling fatal terminal events (deaths). The occurrence of the fatal terminal intercurrent event is informative about the effect of the treatment and so it is incorporated in the endpoint. In practice, the outcomes after the fatal terminal intercurrent event can not be observed, but need to be assumed to have the worst values. 

With the composite strategy, the terminal intercurrent events will be assigned a failed value. A failed value may be:

    • Worse possible measure (for example, 0 for 6MWD and 0 for FEV1 or FVC measures)
    • Worst observed value across all subjects at the endpoint visit
    • Trimmed means (trimmed means and quantiles were mentioned in ICH E9 addendum training materials)
    • The worst change (from baseline) of all subjects plus a random error. The error can be randomly drawn from a normal distribution with a mean of 0 and a variance equal to the residual variance estimated from the mixed model for all observed values of change from baseline
In FDA's guidance, "Amyotrophic Lateral Sclerosis: Developing Drugs for Treatment Guidance for Industry", deaths were integrated into the functional measure by the ALS Functional Rating Scale-Revised (ALSFRS-R). The guidance said:
Functional endpoints can be confounded by loss of data because of patient deaths. To address this, FDA recommends sponsors use an analysis method that combines survival and function into a single overall measure, such as the joint rank test.
In pivotal clinical trials in ALS, the joint rank test is almost the default method for analyzing the primary efficacy endpoint of the ALSFRS-R. The Joint Rank statistic ranks study participants in each treatment group, first by survival and then by ALSFRS-R score. The Joint Rank can increase power relative to analysis of either ALSFRS-R or survival analysis alone in some circumstances, for example when mortality rates are high 

Joint Rank test was described and used in the NEJM paper by Miller et al "Trial of Antisense Oligonucleotide Tofersen for SOD1 ALS".

Strategies for Handling the Non-Fatal Terminal Events

It is acceptable to use hypothetical strategy to handle the non-fatal terminal intercurrent events. "Hypothetical strategies: A scenario is envisaged in which the intercurrent event would not occur: the value of the variable to reflect the clinical question of interest is the value which the variable would have taken in the hypothetical scenario defined." 

The value of the variable to reflect the clinical question of interest is the value which the variable would have taken in the hypothetical scenario defined. The value to be considered would have been the one collected if patients had not had the non-fatal terminal event. Outcomes after the non-fatal terminal events do not need to be measured. If the outcomes after the non-fatal terminal events are measured (for example, the spirometry measure after lung transplantation), the measures can be disregarded and not used in the analyses. The outcomes after the non-fatal terminal events cannot be observed, can be left as missing values, and usually need to be implicitly or explicitly predicted/imputed.

Friday, January 12, 2024

Post-Marketing Requirement (PMR) versus Post-Marketing Commitment (PMC)

Following the NDA or BLA approval, the sponsors may be required to conduct post-marketing studies. The phrase post marketing requirements and commitments refers to studies and clinical trials that sponsors conduct after approval to gather additional information about a product's safety, efficacy, or optimal use. Some of the studies and clinical trials may be required; others may be studies or clinical trials a sponsor has committed to conduct.

Post-approval studies can be classified by FDA as a postmarketing requirement (PMR) or a postmarketing commitment (PMC). 

A PMR is a study or clinical trial that an applicant (or sponsor) is required by statute or regulation to conduct postapproval. A PMC is a study or clinical trial that an applicant (or sponsor) agrees in writing to conduct postapproval, but that is not required by statute or regulation. PMRs and PMCs can be issued upon approval of a drug or postapproval, if warranted.

As a result, failure to conduct a PMR would be a violation of the Federal Food, Drug, and Cosmetic Act (FDCA) and/or implementing regulations, subject to enforcement action.  Potential enforcement actions can include an FDA Warning Letter, charges under section 505(o)(1) of the FDCA, misbranding charges under section 502(z), or civil monetary penalties.  In contrast, failure to conduct a PMC would not be a violation of the FDCA or regulations, and therefore not subject to enforcement action.

The table below compares the features of the PMR versus PMC:

FeaturePost-Marketing Requirements (PMR)Post-Marketing Commitments (PMC)
DefinitionRegulatory obligations imposed by authoritiesVoluntary commitments made by the sponsor
PurposeGather additional data on safety, efficacy, etc.Obtain more information post-approval
EnforcementMandatory; non-compliance may lead to penaltiesVoluntary, but sponsors are expected to fulfill
ImpositionImposed by health regulatory agenciesMade voluntarily by the sponsor during approval
Consequences of Non-complianceRegulatory actions, fines, or product withdrawalRegulatory actions; may impact marketing authorization
FlexibilityTypically less flexible; regulatory mandatesVoluntary, but commitment should be honored
OriginExternal (regulatory agency)Internal (sponsor during regulatory approval)
ExamplesPost-approval safety studies, surveillanceAdditional clinical trials, long-term safety studies

According to FDA's guidance "Guidance for Industry Postmarketing Studies and Clinical Trials — Implementation of Section 505(o)(3) of the Federal Food, Drug, and Cosmetic Act", PMR may be required in the following situations:

PMC may be required in the following situations: 
These PMCs were generally agreed upon by FDA and the applicant. Prior to the passage of FDAAA, FDA required postmarketing studies or clinical trials only in the situations described below:
• Subpart H and subpart E accelerated approvals for products approved under 505(b) of the Act or section 351 of the PHS Act, respectively, which require postmarketing studies to demonstrate clinical benefit (21 CFR 314.510 and 601.41, respectively);
Deferred pediatric studies, where studies are required under section 505B of the Act (21 CFR 314.55(b) and 601.27(b)); 6 and
• Subpart I and subpart H Animal Efficacy Rule approvals, where studies to demonstrate safety and efficacy in humans are required at the time of use (21 CFR 314.610(b)(1) and 601.91(b)(1), respectively). 7
Is the confirmatory trial after the accelerated approval PMR or PMC?  

As a condition of accelerated approval, the applicant should conduct confirmatory trials to verify the clinical benefit of the drug or demonstrate an effect on irreversible morbidity or mortality. If these trials completed post-approval verify the clinical benefit of an indication granted accelerated approval, the indication is granted traditional approval.

Per regulatory guidance, the confirmatory trial after the accelerated approval can be PMC, not PMR. However, without the results from the confirmatory trial to verify the clinical benefit, the approval will remain as 'conditional' and benefits will remain on the surrogate endpoint or biomarkers. A traditional approval (or full approval) can be obtained after the confirmatory trial verifies the benefit of clinical benefit. A drug with full approval will have advantages in the marketing and reimbursement positions. 

Searching the FDA's Postmarket Requirements and Commitments Database, almost all confirmatory trials after accelerated approval are PMRs, not PMCs. 

Is Post-Approval Pregnancy Study PMR or PMC?

TheFDAlawblog.com had a blog article "Why are Post-Approval Pregnancy Studies Post-Marketing Requirements Rather Than Post-Marketing Commitments?". It said the following: 
"Notably, of the 99 postmarketing pregnancy studies in the 10-year period, all but one were PMRs. The only example of a pregnancy PMC is for Paxlovid, for treatment of COVID-19, which is a distinguishable example because the sponsor committed to this study while the drug was still under an Emergency Use Authorization (EUA), not an NDA."

In general, the post-approval pregnancy's studies are PMR, not PMC.

What are examples of the PMR versus PMC?

Large pharmaceutical companies posted their PMRs and PMCs only for the purpose of transparency. For example, here are the lists of PMRs and PMCs for Amgen and Janssen. These PMRs and PMCs provide great examples what kind of studies they are. 

Monday, January 01, 2024

Exposure adjusted event rate (EAER) and exposure adjusted incidence rate (EAIR)

In clinical trials, analyses of the adverse events (AEs) including serious adverse events, mortality, and adverse event of special interests (AESI) are the center piece of the safety analyses. The default approach is to calculate the incidence of AEs (the number of subjects with one specific type of AE divided by the number of patients in the specific group). The presentation of the AE profiles is usually based on the incidence of AEs. For example, in an example table below, the incidence of AEs was presented for each preferred term by treatment group. The numerator is the number of subjects with one specific AE (preferred term) and the denominator is the total number of subjects in each treatment groups (109 for Drug-X group and 110 for Placebo group). 


Comparing the incidence of AEs between treatment groups is valid and fair if the length of exposure is balanced between two treatment groups. If the treatment exposure in one group is significantly longer than another group, the incidence of AEs may give a biased comparison. 

Ii the average duration of exposure differs significantly between treatment groups within a trial or between trials included in an analysis due to differential drop-out rates or study design, such incidence of AEs will need some adjustment to make the comparison meaningful. In the situation that the treatment exposures between treatment groups are not balanced, the exposure adjusted event rate (EAER) or exposure adjusted incidence rate (EAIR) may be calculated where the denominator for the calculation is the exposure duration expressed in person-time such as person-year or patient-year.

In FDA CDER's MAPP "Good Review Practice: Clinical Review Template", it mentioned the following: 
"An analysis of the overall rate of serious events and the rate of specific serious events for each treatment group in critical subgroups (e.g., demographic, disease severity, excretory function, concomitant therapy) and by dose. The median duration of exposure should be examined across treatment groups. If there is a substantial difference in exposure across treatment groups, incidence rates should be calculated using person-time exposure in the denominator, rather than number of subjects in the denominator,...by treatment group"

"Analyses should be corrected for differences in drug exposure using person-time in the denominator to calculate mortality rates. If person-time exposure is not included in the submission (ideally, it should be requested at the pre-NDA/pre-BLA meeting), it should be requested as soon as the need is recognized..."

When the incidence rate of AEs (or exposure adjusted incidence rate) needs to be calculated, the denominator is for sure the person-time, but the numerator for the calculation can be confusing. Should the numerator be the number of events or the number of subjects with at least one event? 

If the event can occur only once for each subject, the number of events and the number of subjects with at least one event will be the same (each subject can die only once!). Therefore, the exposure adjusted mortality rate will be same no matter which numerator is used. 

If the same subject experience multiple occurrences of the same AE during the study period, the incidence of AEs will be different depending on which numerator is used in calculation. In a previous post, Incidence Rate (IR) – How could this be wrongly calculated?, I stated that the number of events should be used as the numerator for incidence rate calculation and criticized the use of the number of subjects with events as the numerator. In most situations, it is better and more accurate to count more than one event to the numerator if the subject experience multiple occurrences of the same event during the study period. 

In FDA guidance "Safety, Efficacy, and PharmacokineticStudies to Support Marketing ofImmune Globulin Intravenous(Human) as Replacement Therapy forPrimary Humoral Immunodeficiency", the serious bacterial infection is the primary interest. The serious bacterial infection can occur more than one in a specific patient. The guidance said the following: 
"We recommend that you provide in the BLA descriptive statistics for the number of serious infection episodes per person-year during the period of study observation. Additional information important to our review includes a frequency table giving the number of subjects with 0, 1, 2… serious infections, a description of each serious infection, and summary statistics for the length of observation of each subject."
"Based on our examination of historical data, we believe that a statistical demonstration of a serious infection rate per person-year less than 1.0 is adequate to provide substantial evidence of efficacy. You may test the null hypothesis that the serious infection rate is greater than or equal to 1.0 per person-year at the 0.01 level of significance or, equivalently, the upper one-sided 99% confidence limit would be less than 1.0."
MACE (Major Adverse Cardiac Event) is the most frequently used endpoint in cardiovascular outcome studies. Incidence rate for MACE should be calculated with number of events as the numerator and the person-year as the denominator. This can be exemplified in FDA's Briefing Document for Cardiovascular and Renal Drugs Advisory Committee Meeting for GSK's drug Daprodustat for for the treatment of anemia due to chronic kidney disease (CKD). The same patient can have more than one MACE event (from the non-fatal events to fatal event). 


However, in practice, the number of subjects has often been used as the numerator in incidence rate calculation. It turned out both the number of events and the number of subjects can be used in the calculation depending on how the denominator of exposure (person-time) is calculated.  

PSUSE published a document "Analysis and Displays Associated with Safety Topics of Interest: Focus on Phase II to IV Clinical Trials". In the document, two different terms were proposed: exposure adjusted event rate (EAER) and exposure adjusted incidence rate (EAIR). EAER is calculated using the number of events as the numerator and EAIR is calculated using the number of subjects as the numerator.
Exposure-adjusted event rate (EAER): The number of events (if a patient has more than one occurrence of the same event, all occurrences are counted) divided by the total time exposed. This is sometimes referred to as person-time absolute rate. Total time exposed is calculated as the sum of each patient’s time in the interval, whether or not the patient experienced the event. The time unit used can be changed (e.g. if the original units are events per person-year, this can easily be converted to events per 100 person-years by multiplying by 100). The exposure time should be based on the same time interval in which any events that occur would be counted.
Exposure-adjusted incidence rate (EAIR): The number of patients with an event divided by the total time at risk for the event. Total time at risk will be calculated as the sum of time from the first dose (or randomisation) to first event for patients who experienced the event and the time during the entire assessment interval for patients who do not experience the event. This is sometimes referred to as the incidence rate or person-time incidence rate . As noted above, we believe the addition of “exposure-adjusted” or “person-time” is beneficial for clarity.
Notice that for EAIR calculation, the exposure duration (the person-time) is from the first dose to the first event, not the entire exposure duration (from the first dose to the last dose). 

In practice, EAER and EAIR were not specifically distinguished, the numerator and the denominator may be mismatched with the intended use. The numerator can be either the number of events or the number of subjects as long as the exposure or person-time is correctly calculated. 

For recurrent events or composite events where the same patient can have more than one event, if the exposure (denominator, person-time) is calculated for the entire duration, the numerator should be the number of events so that all events (not the first event) are counted in the calculation. If the exposure (denominator, person-time) is calculated from the first dose to the occurrence of the first event (or the entire exposure duration if the subject does not have any event), the numerator should then be the number of subjects. For example, in a paper "Exposure-adjusted incidence rates (EAIRs) of adverse events (AEs) from the phase 3 TROPiCS-02 study of sacituzumab govitecan (SG) vs treatment of physician’s choice (TPC) in HR+/HER2- metastatic breast cancer (MBC)", how the denominator of exposure time is calculation was explicitly spelled out. 
Time-at-risk EAIR considers patient’ exposure of a specific AE in quantifying the risk of AE, defined as the number of pts who experienced at least 1 specific AE, divided by the total exposure time (patient-year of exposure [PYE]) in each arm. For patients who experienced specific AEs, exposure time was calculated from first dose date up to the first AE onset, and for patients who did not, from first dose up to data cutoff (if still on study treatment) or up to last dose (if discontinued study treatment).
In some cases, both EAER and EAIR may be calculate, for example, in FDA's clinical review for NDA 211675Orig1s000, both EAER and EAIR were calculated. In a statistical analysis plan by 
UCB, the details about the calculations of both EAER and ERIR were provided. 

Whether or not the EAER or EAIR is calculated may or may not change the conclusion of the safety assessment, however, it is worth noting the differences in the numerator and denominator used in the calculations. 

In summary, the AE analyses can be adjusted for the exposure duration. If the numerator is the number of events, the denominator should be the total exposure duration or person time (from the first dose to the last dose) and the calculated parameter is "Exposure adjusted event rate (EAER)". If the numerator is the number of subjects, the denominator should be the exposure duration or person time where the exposure duration is calculated from the first dose to the time of the first event for subjects with event(s) and from the first dose to the last dose for subjects who do not have any event. The calculated parameter is "exposure adjusted incidence rate (EAIR). 

Friday, December 15, 2023

Randomized start design (RSD), delayed start design, randomized withdrawal design to assess disease modification effect

In the latest Global CardioVascular Clinical Trialists (CVCT) Workshop", one of the topics was "How to assess the disease modification in pulmonary arterial hypertension". the academic and industry representatives discussed the definition of disease modification and if the various individual drugs met the criteria as disease modifiers. 

Disease modification requires that the intervention have an impact of the underlying pathology and pathophysiology of the disease. For regulatory purposes, a disease modifying effect is when an intervention delays the underlying pathological processes and is accompanied by improvement in clinical signs and symptoms of the disease. The opposite of the disease modifying effect is symptomatic improvement which is defined as "may improve symptoms but does not affect the long-term survival or outcome in the disease, for example, use of diuretics in PAH". 

For a drug to be defined as a disease modification therapy (DMT), disease modifier, or disease-modifying agent (DMA), the following criteria need to be met: 
  • a drug targeting the underlying pathophysiology
  • distinction should be made from the "symptomatic treatment" (do not affect underlying pathophysiology)
  • Can achieve the goal of remission (partial or complete)
  • endures sustained clinical benefit (referred to as DMA). 
The last criterion "endures sustained clinical benefit" is difficult to meet. The traditional randomized, controlled, parallel design will not be sufficient. Clinical trial designs like delayed start design and randomized withdrawal design are needed to assess the disease modification effect. 

Randomized Start Design or Delayed Start Design

This design was discussed in previous posts: 
Randomized withdrawal design: 

The randomized withdrawal design was extensively discussed in FDA's guidance "Enrichment Strategies for Clinical Trials to Support Determination of Effectiveness of Human Drugs and Biological Products". The following paragraphs are extracted from the guidance. 


The randomized withdrawal design was originally proposed as an approach to enrich the study and reduce the sample size. The randomized withdrawal design (if it is feasible to implement) can be used to evaluate the long-term disease-modifying effect. 

In practice, the randomized withdrawal design can be implemented after the RCT - the responders from both the experimental drug group and the placebo group are re-randomized to receive the experimental drug or placebo. This is exactly what we did in the ICE study ("Intravenous immune globulin (10% caprylate-chromatography purified) for the treatment of chronic inflammatory demyelinating polyradiculoneuropathy (ICE study): a randomised placebo-controlled trial") where the study contained a RCT portion to assess the treatment response and a re-randomized withdrawal portion to assess the relapse after the study drug withdrawal. 

The delayed start design and randomized withdrawal design have been mentioned frequently as a way to assess the disease modification effect. 

In FDA's guidance "Early Alzheimer’s Disease: Developing Drugs for Treatment", the randomized-start or randomized-withdrawal trial design was suggested:

In the FDA’s webinar on “Draft Guidance For Industry On Alzheimer’s Disease: Developing Drugs For The Treatment Of Early-Stage Disease”, the FDA presenters discussed the randomized start design or withdrawal design: 

“… If there is a significant effective treatment that couldn't serve as the basis of approval, we do not believe that that argument in and of itself does not demonstrate. This is where biomarkers come in. We learned in the trial results, the effect on up Alzheimer disease biomarker, it is still very and clear. Where the biomarker has been altered but there is no clinical effect. The clinical outcome was the opposite of what you want to see. The bottom line is that that understanding and how it relates to the clinical outcome still needs a bit of work. We would not be willing to accept the effect on the biomarker, as a basis for a circuit -- Sarah get approval. For that to be the case, it would be a fundamental itself in the disease process. In addition to biomarkers there are other ways to show disease modification, a randomized start design, or withdrawal design. These are based on clinical endpoints. These are difficult studies to design, and conduct, and interpret. We are open to use these approaches to show modification, let me show you what I mean by randomized start design. One would be on .8, the other will be on placebo, the patients on group to will be switched over to active treatment, patients in group 2 will be caught up to group 1, they will have a systematic effect of treatment. Patients that were switched to never really caught up to the first group, can argue for an effect on the disease, this is challenging to do, but we are open to the approach. It is a devastating condition, and an epidemic make -- particularly in late stages, the field is moving to conduct trials in early stages of the illness. As I pointed out they will pose regulatory challenges. We hope that's where our guidance will come in and suggest pathways forward. Thank you. I will have Russell Katz, come up and talk for the rest of the webinar.”

In one of the EMA presentations “The scientific and regulatory approaches to facilitating disease-modifying drug development and registration in a global environment”, the delayed start design (or randomized start) and randomized withdrawal design were mentioned.

 

In 2011, there was an FDA advisory committee meeting to discuss Teva's Parkinson's drug ((rasagiline mesylate)) for disease modification indication. Even though the disease modification claim was voted down), the delayed start design was confirmed to be adequate to evaluate the disease modification effect. Arterial Hypertension

Thursday, December 14, 2023

Defining 'disease modification effect', 'disease modification therapy (DMT)' or 'disease modifier'

The concept of "disease modification," "disease modification therapy (DMT)," and "disease modifier" has been a focal point in the realm of drug development for chronic diseases. The distinction reaches a heightened significance when a drug under development qualifies as a true disease modifier. Disease modification is a default for acute diseases (for example, the acute infections) and for gene therapies, transplants, some surgeries. The focus of our discussion about the disease modification is mainly for chronic diseases. 

Disease modification entails interventions or treatments designed not solely to alleviate symptoms but also to actively influence the trajectory of the disease, effectively impeding or halting its progression.

It's important to note that a unified definition for disease modification does not exist. The nuances of the term, as well as what qualifies as a disease modifier, can vary across different diseases. The understanding and criteria for disease modification may differ, reflecting the intricacies inherent to each specific medical condition.

In a review paper by Vollenhoven et al "Conceptual framework for defining disease modification in systemic lupus erythematosus: a call for formal criteria", authors put together a table summarizing various definitions for disease modification in different disease areas: 


As we are doing the clinical trials, the spectrum of treatment response can be listed as the following: 
Harm -> No Response -> Modest Response -> Strong Response -> Disease Modifying -> Cure. For most chronic diseases, the ultimate outcome of a 'cure' may not be achievable. A therapy with a disease modification effect will be desirable. 

There was a proposal to classify the disease modification into five different levels: 
Level 1: Slowing decline
Level 2: Arrest decline
Level 3: Disease improvement
Level 4: Remission
Level 5: Cure

In a review article for Alzheimer's disease, "Trial Designs Likely to Meet Valid Long-Term Alzheimer's Disease Progression Effects: Learning from the Past, Preparing for the Future", the changes in the level of functioning across time were depicted as the following. 'Slowing progression' would be considered as 'disease modification'. 


Alzheimer’s Disease: Towards a Personalized Polypharmacology Patient-Centered Approach", the following was said about the disease modification therapy in Alzheimer's disease:
A disease-modifying treatment (DMT) is defined as an intervention that produces an enduring change in the clinical progression of AD by interfering with the underlying pathophysiological mechanisms of the disease process that lead to neuronal death. Consequently, a true DMT cannot be established conclusively based on clinical outcome data alone, such a clinical effect must be accompanied by strong supportive evidence from a biomarker program.
In 2011, there was an FDA advisory committee meeting to discuss Teva's Parkingson's drug for disease modification indication. According to the FDA briefing book, to demonstrate the disease modification effect, three hypothesis tests are needed to analyze the data from the study with a delayed start design (even though the disease modification claim was voted down): 
The study was to be analyzed according to three hypotheses, in the following order: 
  • Hypothesis 1-the contrast between the slope of drug and placebo response at Week 36 (using data from weeks 12-36; Linear Mixed Model with random intercept and slope) 
  • Hypothesis 2-the contrast of scores between baseline and Week 72 (Repeated Measures) 
  • Hypothesis 3-a non-inferiority analysis of the slopes of the ES and DS patients from weeks 48-72 (Linear Mixed Model with random intercept and slope) 
The first hypothesis was designed to determine that a difference between treatments emerged in Phase 1, the second hypothesis was designed to determine that there was a difference between ES and DS patients at the end of the study, and the third hypothesis was to determine that an “absolute” difference between the ES and DS patients persisted during Phase 2 (that is, even though a difference between groups at the end of the study might have existed [what was 4 tested by Hypothesis 2], it was important to show that the two groups were not approaching each other). 
To delve into the realm of disease modification therapy research, it is imperative to establish a standardized definition for disease modification, particularly tailored to the nuances of a specific disease area. This foundational step serves as a compass guiding subsequent investigations. Following the definition, the identification of endpoints to measure the disease modification effect becomes paramount. Given the nuanced nature of disease modification effects, the conventional clinical trial designs may prove insufficient. Hence, a specialized approach involving clinical trial designs (such as delayed start design and randomized withdrawal design) with multiple hypothesis tests becomes a requisite. Such a methodological shift is essential to comprehensively capture and validate the nuanced impacts of disease modification therapies.

Sunday, December 10, 2023

Significant level versus p-value

Sometimes, the significant level and p-value are getting mixed up and confusing to some non-statisticians. It is not surprising to receive a question or request for statistician to design a study to obtain a p-value of 0.05 or 0.01. While the significant level and p-value are closed related, ,they are used in different stage of the trial - significant level is used in the study design stage and p-value is used in the analysis stage.


A significant level is usually set at 0.05 at the study design stage. After the study, data is analyzed and p-value is calculated. The p-value is then compared to the pre-specified significant level to determine if the study results is statistically significant. 

If the significant level is set at 0.01 at the study design stage, which is temped for avoiding doing two pivotal studies, it will set the unnecessary high bar for declaring the successful trial in the analysis stage. 

"The significance level," "alpha" (α), and "Type I error rate" are essentially referring to the same concept in the context of hypothesis testing. These terms are often used interchangeably and are closely related. Here's a brief explanation of each:

Significance Level (Alpha, α): The significance level is a pre-defined threshold (usually denoted as α) set by the researcher before conducting a statistical test. It represents the maximum acceptable probability of making a Type I error. Common choices for alpha include 0.05 (5%), 0.01 (1%), and others. It determines the level of stringency for the test, where a smaller alpha indicates a more stringent test.

Significant level is just one of the parameters in calculating the sample size during the study design stage. Other parameters include the effect size (assumed treatment difference), the standard deviation, statistical power (type 2 error), and alpha adjustment due to multiplicity issue, interim analyses,...

Alpha (α): Alpha is the symbol used to represent the significance level in statistical notation. When you see α, it's referring to the predetermined threshold for statistical significance.

Type I Error Rate: The Type I error rate is the probability of making a Type I error, which occurs when you reject the null hypothesis when it is actually true. The significance level (alpha) directly relates to the Type I error rate because the significance level sets the limit for how often you are willing to accept such an error. The Type I error rate is typically equivalent to the significance level (alpha), assuming the test is properly conducted.

P-value: The p-value is calculated as part of the statistical analysis after the data has been collected. It measures the strength of the evidence against the null hypothesis based on the collected data. A smaller p-value indicates stronger evidence against the null hypothesis, and a larger p-value suggests weaker evidence.

The p-value measures the strength of evidence against a null hypothesis. The p-value is the probability under the assumption of no effect or no difference (null hypothesis) of obtaining a result equal to or more extreme than what was actually observed. The 'p' stands for probability and measures how likely it is what any observed value between 0 and 1. Values close to 0 indicate that the observed difference is unlikely to be due to chance, whereas a p value close to 1 suggests that it is highly likely that the difference observed is due to chance. If the p-value is low, it suggests evidence against the null hypothesis, and then alternative hypothesis (assumption of the effect or difference) will be accepted. 

The p-value indicates how incompatible the data are with a specified statistical model constructed under a set of assumptions, together with a null hypothesis. The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis. When we get a p-value that is greater than the pre-specified significant level, we fail to reject the null hypothesis - it means that there is insufficient evidence to reject. 

STAT national biotech reporter Damian Garde explains what p-value is:

Even though hypothesis testing and p-value have been criticized (see a previous post "Retire Statistical Significance and p-value?"), the p-value is still the primary indicator by the sponsor, regulator, medical community, and pretty much everybody to judge if a clinical trial is successful or not. 
,
Regulatory approval of a medicinal product depends on more than just a p-value. The approval depends on the totality of the evidence, the magnitude of the treatment difference, clinical significance or clinical meaningfulness, the confidence interval of the estimate, the safety profile, whether the benefit outweighs the risk.

We have seen the cases that the drug is approved even though the p-value was not statistically significant (i.e., did not reach the pre-specified significant level). See the previous post "Drugs Approved by FDA Despite Failed Trials or Minimal/Insufficient Data". We also see the cases that the drug was not approved even though the p-value was statistically significant. See the article "FDA blocks Alnylam's bid to expand Onpattro label" even though the study results were statistically significant and published in the NEJM "Patisiran Treatment in Patients with Transthyretin Cardiac Amyloidosis".

In the end, we can't retire the p-value. We relied on the p-value to measure how strong the evidence is. However, we should not be the slave of the p-value.