Sunday, November 16, 2014

VALOR Trial - A Successful and Failed Phase III Study with Adaptive Sample Size Re-estimation Based on the Promising Zone

Motivated by the search for innovative clinical trial methodologies that increase the chance of trial success and minimize cost, various adaptive design methods have been proposed. Initially, adaptive designs were used mostly in early-phase (phase I or II) clinical trials; for phase III confirmatory trials, traditional fixed designs still dominate. Many publications about using adaptive designs in late-stage trials are based on retrospective assessment or simulation: had the original studies been done with an adaptive design, how much cost could have been saved, or could a failed trial have been rescued? After many years of education and advocacy, adaptive designs with innovative methods have actually been implemented in phase III studies, and some of the trial results are starting to surface. One example is a trial called VALOR – a phase III, placebo-controlled, randomized, double-blind study in relapsed/refractory Acute Myeloid Leukemia (AML). The study adopted one of the key adaptive design features: Sample Size Re-estimation (SSR).

The rationale behind Sample Size Re-estimation is that the assumptions needed to design a confirmatory trial are either not entirely available or available only with a high degree of uncertainty. This uncertainty can lead to incorrect or inaccurate sample size estimates at the design stage. With Sample Size Re-estimation, an interim analysis is performed during the study to re-check these assumptions, and the decision about the next step is made based on the interim findings.

In the VALOR study, the Sample Size Re-estimation was based on a Promising Zone approach. SSR based on the Promising Zone was proposed by Mehta and Pocock and described in their paper “Adaptive increase in sample size when interim results are promising: A practical guide with examples”. The general idea is to start a phase III trial under the best-case or optimistic assumptions. The optimistic assumptions require a smaller sample size to start with, and consequently less up-front commitment of resources and finances. During the study, an interim analysis is performed to check the assumptions against reality and to choose among the following options:

  • Stop early if there is overwhelming evidence of efficacy
  • Stop early for futility if conditional power is low
  • Increase the sample size if results are promising

This can be illustrated in the diagram below. Notice that with this method, the sample size can only be adjusted upward, never decreased. The sample size increase is one-time, preferably by a pre-specified fixed number.
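To make the zone logic concrete, here is a minimal sketch of the conditional power calculation that drives the interim decision, assuming a normally distributed test statistic; the futility and promising cutoffs below (0.20 and 0.80) are illustrative placeholders, not the boundaries actually used in VALOR.

```python
import numpy as np
from scipy.stats import norm

def conditional_power(z1, t1, alpha=0.025):
    """Conditional power under the current trend for a one-sided z-test.

    z1: interim z-statistic; t1: information fraction (n1 / n_planned).
    """
    theta = z1 / np.sqrt(t1)             # drift implied by the interim trend
    b1 = z1 * np.sqrt(t1)                # Brownian-motion value at time t1
    z_alpha = norm.ppf(1 - alpha)        # final critical value
    return 1 - norm.cdf((z_alpha - b1 - theta * (1 - t1)) / np.sqrt(1 - t1))

def classify_zone(cp, futility=0.20, promising=0.80):
    """Map conditional power to an interim decision (illustrative cutoffs)."""
    if cp < futility:
        return "unfavorable zone: consider stopping for futility"
    if cp < promising:
        return "promising zone: increase the sample size"
    return "favorable zone: continue as planned"

cp = conditional_power(z1=1.2, t1=0.5)   # a middling interim result at 50% information
print(f"conditional power = {cp:.2f} -> {classify_zone(cp)}")
```

With these made-up inputs the conditional power is about 0.36, which falls in the illustrative promising zone, triggering the one-time sample size increase.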


Since the VALOR study was initiated in December 2010, this SSR method with the Promising Zone approach has been widely followed in the statistical community and has been the topic of many adaptive design discussions. See the presentation by Zoran Antonijevic "Harvard Catalyst Adaptive Clinical Trials Case Study - The VALOR Trial for AML". There is also a YouTube video titled "The Phase 3 VALOR Trial: Adaptive Sample Size Re-estimation".

Cytel Inc. has built the SSR with Promising Zone approach into their EAST software for study design. They advocate that adaptive sample size re-estimation in EAST reduces risk and enhances clinical trial success. With the Promising Zone SSR method, an adaptive design can:
  • DE-RISK INVESTMENT – Avoid expensive up-front commitments of sample size
  • ENHANCE SUCCESS – Boost power when initial assumptions fail
  • PROMISING ZONE™ – Increase sample size conditional on interim data
  • ALPHA CONTROL – Guarantee strong type I error control required by regulators

Had the VALOR study achieved statistical significance on the primary efficacy endpoint, it would have been a wonderful story to tell about how the Promising Zone SSR method had de-risked the investment and enhanced success.

Unfortunately, after all of these extra efforts (adaptive design, DSMB, interim analysis, sample size re-estimation), the study failed to reach statistical significance for the primary efficacy endpoint. The p-value just missed the magical threshold of 0.05. Here is the announcement from the VALOR study sponsor – Sunesis Pharmaceuticals:
Sunesis Announces Results From Pivotal Phase 3 VALOR Trial of Vosaroxin and Cytarabine in Patients With First Relapsed or Refractory Acute Myeloid Leukemia
“Sunesis Pharmaceuticals, Inc. (Nasdaq:SNSS) today announced results from the pivotal Phase 3 VALOR trial, a randomized, double-blind, placebo-controlled trial of vosaroxin and cytarabine in patients with first relapsed or refractory acute myeloid leukemia (AML). At more than 100 leading international sites, the trial enrolled 711 patients, who were stratified for age, geography and disease status. The trial did not meet its primary endpoint of demonstrating a statistically significant improvement in overall survival, with a median overall survival of 7.5 months for vosaroxin and cytarabine compared to 6.1 months for placebo and cytarabine (HR=0.865, p=0.06).”
Additional details about the trial design are coming to the surface. See the screenshot from the Sunesis presentation:


The study was planned based on the most optimistic assumption (HR=0.71), and the sample size re-estimation was based on what was then the most conservative assumption (HR=0.80). Unfortunately, the actual result of HR=0.865 was beyond even the most conservative assumption of HR=0.80. It would be interesting to know what exactly the HR was at the interim analysis.
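As a back-of-the-envelope illustration (my own calculation, not the sponsor's) of how sensitive the design is to the assumed HR, the standard Schoenfeld approximation gives the number of deaths needed for a 1:1 log-rank test; moving from HR=0.71 to the observed HR=0.865 multiplies the required events several-fold.

```python
from math import ceil, log
from scipy.stats import norm

def required_events(hr, alpha=0.05, power=0.90):
    """Schoenfeld approximation: events needed for a two-sided, 1:1 log-rank test."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return ceil(4 * (z_a + z_b) ** 2 / log(hr) ** 2)

for hr in (0.71, 0.80, 0.865):
    print(f"HR = {hr}: ~{required_events(hr)} events")
# HR = 0.71 needs roughly 360 events; HR = 0.865 would need roughly 2000.
```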

I would guess that Sunesis and Cytel are now analyzing the data to search for clues as to why the study did not meet the primary endpoint. It is very possible that the study conduct and patient population differed before and after the interim analysis. While the study team was strictly blinded to the details of the interim analysis results, the decision on whether or not to increase the sample size had to be announced, and this announcement could have had an impact on the patient characteristics or the conduct of the study. Here is a discussion of the announcement of the sample size increase after the interim analysis at that time. The announcement clearly had some impact on financial analysts, and potentially also on the study team and investigators.

           Sunesis Pharmaceuticals to Implement One-Time Sample Size Increase to Phase 3
           VALOR Trial in AML
When last September the Data and Safety Monitoring Board (DSMB) recommended expanding the sample size of the study based on interim data that suggested a "promising" outcome, vosaroxin garnered even more investor attention.

Valor Trial Design And Alpha Spend

At the analyst meeting in October 2012, Sunesis provided an update on the adaptive design of the study that allows for a potential one-time sample size increase of the patient population. Based on its review, the DSMB recommended the Valor study increase the sample size to 675 patients for a 90% statistical power to detect a 30% overall survival difference (5 months versus 6.5 months) with an HR of 0.77. The DSMB concluded that the interim data indicated a "promising" outcome - ruling out futility and an "unfavorable" scenario, but falling short of a "favorable" result.
Based on the nuances of statistical analysis, ruling out both favorable and unfavorable scenarios for a promising outcome strongly suggests that vosaroxin was closer to non-inferiority and in need of a larger sample size in order to show a statistically significant treatment difference. It was a smart idea by management to utilize the first interim analysis of Valor as a proxy for a randomized Phase 2 study whereby it could better estimate the sample size needed to demonstrate a clinical effect. Powering the study has thus been the main factor in influencing its "promising" outcome.

The VALOR study was a well-conducted study. From the standpoint of study implementation, including the sample size re-estimation, it was a success. However, it failed to reach statistical significance for the primary efficacy endpoint.

In the end, statistics is about uncertainty. While sample size re-estimation can reduce the uncertainty to some degree, it cannot eliminate it. We will never be able to design a study that guarantees success.

Saturday, November 01, 2014

Standard of Care (SOC) as Control Group in Clinical Trials

For randomized, controlled clinical trials, the selection of the control group is one of the key issues in study design. This is why ICH has a specific guideline (E10), “CHOICE OF CONTROL GROUP AND RELATED ISSUES IN CLINICAL TRIALS”. The choice of control group determines whether the trial is a superiority or non-inferiority study and whether it is double-blind, single-blind, or open-label, and it drives the sample size.

It has become pretty common for the Standard of Care (SOC) to be chosen as the control group. We often run into the situation where, for a specific disease (indication), there is no regulatory-approved (existing) therapy and it is not ethical to conduct a placebo-controlled study; the comparison of the experimental therapy versus the Standard of Care seems to be the only choice.

What is the Definition of the SOC?

There is no standard definition of SOC in regulatory guidelines. According to Webster’s New World Medical Dictionary, SOC is defined as “the level at which the average, prudent provider in a given situation would manage the patient’s care under the same or similar circumstances.”

From the National Cancer Institute: “standard of care” is defined as “treatment that experts agree is appropriate, accepted, and widely used. Also called best practice, standard medical care, and standard therapy.”

There are more definitions, but they are all similar.
“A standard of care is a formal diagnostic and treatment process a doctor will follow for a patient with a certain set of symptoms or a specific illness. That standard will follow guidelines and protocols that experts would agree with as most appropriate, also called "best practice."
In legal terms, a standard of care is used as the benchmark against a doctor's actual work. For example, in a malpractice lawsuit, the doctor's lawyers would want to prove that the doctor's actions were aligned with the standard of care. The plaintiff's lawyers would want to show how a doctor violated the accepted standard of care and was therefore negligent.”
Standards of care are developed in a number of ways: sometimes they simply develop over time, and in other cases they are the result of clinical findings. In the modern era, the SOC is typically based on evidence-based medicine: the results of clinical trials, meta-analyses when there are multiple clinical trials, and Cochrane systematic reviews of the evidence. The SOC may be issued as recommendations and treatment guidelines by professional societies; there are in fact many treatment guidelines from different professional societies and different countries. Two examples:

  • National Comprehensive Cancer Network guidelines

  • Evidence-based guideline: Intravenous immunoglobulin in the treatment of neuromuscular disorders

Does a Standard of Care therapy have to be approved by a regulatory authority (such as FDA)?

Not necessarily. As a matter of fact, some SOCs may not be regulated by FDA at all. For example, surgery and plasma exchange are techniques and procedures that may not fall under FDA regulation.

In FDA’s guidance “Expedited Programs for Serious Conditions – Drugs and Biologics”, SOC was discussed as part of the discussion of ‘available therapy’. The guidance states:

“For purposes of this guidance, FDA generally considers available therapy (and the terms existing treatment and existing therapy) as a therapy that:
  • Is approved or licensed in the United States for the same indication being considered for the new drug and
  • Is relevant to current U.S. standard of care (SOC) for the indication
 FDA’s available therapy determination generally focuses on treatment options that reflect the current SOC for the specific indication (including the disease stage) for which a product is being developed. In evaluating the current SOC, FDA considers recommendations by authoritative scientific bodies (e.g., National Comprehensive Cancer Network, American Academy of Neurology) based on clinical evidence and other reliable information that reflects current clinical practice. When a drug development program targets a subset of a broader disease population (e.g., a subset identified by a genetic mutation), the SOC for the broader population, if there is one, generally is considered available therapy for the subset, unless there is evidence that the SOC is less effective in the subset.
 Over the course of new drug development, it is foreseeable that the SOC for a given condition may evolve (e.g., because of approval of a new therapy or new information about available therapies). FDA will determine what constitutes available therapy at the time of the relevant regulatory decision for each expedited program a sponsor intends to use (e.g., generally early in development for fast track and breakthrough therapy designations, at time of biologics license application (BLA) or new drug application (NDA) submissions for priority review designation, during BLA or NDA review for accelerated approval). FDA encourages sponsors to discuss available therapy considerations with the Agency during interactions with FDA.
 As appropriate, FDA may consult with special Government employees or other experts when making an available therapy determination.”

The newly issued FDA Guidance on Available Therapy echoes a similar opinion:
“available therapy (and the terms existing treatments and existing therapy) should be interpreted as therapy that is specified in the approved labeling of regulated products, with only rare exceptions.
FDA recognizes that there are cases where a safe and effective therapy for a disease or condition exists but it is not approved for that particular use by FDA. However, for purposes of the regulations and policy statements described in Section III, which are intended to permit prompt FDA approval of medically important therapies, only in exceptional cases will a treatment that is not FDA-regulated (e.g., surgery) or that is not labeled for use but is supported by compelling literature evidence (e.g., certain established oncologic treatments) be considered available therapy.”
FDA’s guidance Non-Inferiority Clinical Trials answered the question of whether the active comparator for a non-inferiority study can be a product whose labeling does not include the indication being studied. Such an active comparator could be an SOC.

“Can a drug product be used as the active comparator in a study designed to show non-inferiority if its labeling does not have the indication for the disease being studied, and could published reports in the literature be used to support a treatment effect of the active control?
 The active control does not have to be labeled for the indication being studied in the NI study, as long as there are adequate data to support the chosen NI margin. FDA does, in some cases, rely on published literature and has done so in carrying out the meta-analyses of the active control used to define NI margins. An FDA guidance for industry on Providing Clinical Evidence of Effectiveness for Human Drug and Biological Products describes the approach to considering the use of literature in providing evidence of effectiveness, and similar considerations would apply here. Among these considerations are the quality of the publications (the level of detail provided), the difficulty of assessing the endpoints used, changes in practice between the present and the time of the studies, whether FDA has reviewed some or all of the studies, and whether FDA and the sponsor have access to the original data. As noted above, the endpoint for the NI study could be different (e.g., death, heart attack, and stroke) from the primary endpoint (cardiovascular death) in the studies if the alternative endpoint is well assessed”
How Standard are the Standards of Care?

It depends on the specific disease area and the available treatment. A standard of care in one country or one hospital may not be the standard in another. Further, one doctor's standard can vary from another's. In many cases, even when the same therapy is considered the standard of care, the usage of the therapy may differ considerably. For example, tPA is considered a standard of care in the US to treat 'leg attack' (peripheral arterial occlusion). However, different medical centers and different doctors may give tPA therapy differently – the differences are reflected in the total tPA dose, bolus versus continuous infusion, infusion rate, and total length of tPA treatment.


The heterogeneity of the standard of care presents great challenges in conducting clinical trials that use the standard of care as the control group. This issue was extensively discussed in FDA’s guidance on Chronic Cutaneous Ulcer and Burn Wounds — Developing Products for Treatment. For a multi-national clinical trial with the standard of care as the control group, the challenges are even greater, and the trial may not be entirely feasible because of the difficulty of defining the SOC for a specific disease treatment. Here are the paragraphs from FDA’s guidance concerning the use of the Standard of Care as the control group:
“Standard care refers to generally accepted wound care procedures, other than the investigational product, that will be used in the clinical trial. Good standard care procedures in a wound-treatment product trial are a prerequisite for assessing safety and efficacy of a product. Since varying standard care procedures can confound the outcome of a clinical trial, it is generally advisable that all participating centers agree to use the same procedures and these procedures are described within the clinical protocol. If it is not practical to apply uniform standard care procedures across study centers, randomization stratified by study center should be considered. It is also important that the sample size within study centers and wound care records be adequate to assess the effect of wound care variation.
A number of standard procedures for ulcer and burn care are widely accepted. Several professional groups have initiated development of care guidelines for ulcers and burns. The Agency does not require adherence to any specific guidelines, the basic principle being that standard care regimens in wound-treatment product trials should optimize conditions for healing and be prospectively defined in the protocol. The rationale for the standard care chosen should be included in the protocol, and the study plan should be of sufficient detail for consistent and uniform application across study centers. Case report forms (CRFs) should be designed such that, at each visit, investigators describe the type of ulcer or burn care actually delivered (e.g., extent of debridement, use of concomitant medications). For outpatients, the CRF should also capture compliance with standard care measures, including wound dressing, off-loading, and appropriate supportive factors, such as dietary intake.
The value of study site consistency in standard care regimens within a trial cannot be over-emphasized because of the profound effects these procedures have on clinical outcome for burns and chronic wounds. Consistency in standard care regimens is important for minimizing variability and allowing assessment of treatment effect. It may be reasonable to evaluate a single standard care regimen in early trials to minimize this variability. If comparison of an investigational product to more than one commonly used standard care option is desired, the overall development plan should include specific assessment of the effect of these standard care options on the experimental treatment. These common options should be identified and addressed prospectively in clinical trial design including being clearly described in the clinical protocol and compliance captured via the CRFs; criteria for data poolability should be defined prospectively. Every attempt should be made to minimize deviations from the procedures described in the protocol and subject compliance recorded in CRFs. If more than one standard care regimen is used in the same clinical trial, then randomized treatment allocation within strata defined by these options in standard care is important.”

To minimize the heterogeneity of the standard of care, cluster randomization may also be employed. As stated in FDA’s guidance “Antibacterial Therapies for Patients With Unmet Medical Need for the Treatment of Serious Bacterial Diseases”, with cluster randomization, “Patients enrolled at sites randomized to the standard-of-care arm would be treated no differently than is usual practice at that site, while patients enrolled at sites randomized to the investigational drug arm would be treated with the investigational drug.”

When a clinical trial uses the standard of care as the control group, should the study be designed as a superiority or a non-inferiority study?

It depends on whether the experimental treatment is a standalone therapy (given without the standard of care) or an add-on therapy (given on top of the standard of care).

If the experimental treatment is an add-on therapy given on top of the existing standard of care, the trial must be designed as a superiority study to demonstrate that the add-on therapy is superior to the existing standard of care alone.

If the experimental treatment is a standalone therapy that can be given without the standard of care, the trial can be designed as either a non-inferiority or a superiority study, depending on the effect size of the experimental therapy.

In FDA’s guidance “Non-Inferiority Clinical Trials”, the ‘Add-on study’ was suggested as an alternative to the non-inferiority study design. In the guidance, ‘treatments that are already available’ can include the standards of care. The combination therapy of the novel treatment plus the existing treatment must be shown to be superior to the existing treatment alone (the standard of care) or to the existing treatment plus placebo.

“Add-on study
In many cases, for a pharmacologically novel treatment, the most interesting question is not whether it is effective alone but whether the new drug can add to the effectiveness of treatments that are already available. The most pertinent study would therefore be a comparison of the new agent and placebo, each added to established therapy. Thus, new treatments for heart failure have added new agents (e.g., ACE inhibitors, beta blockers, and spironolactone) to diuretics and digoxin. As each new agent became established, it became part of the background therapy to which any new agent and placebo would be added. This approach is also typical in oncology, in the treatment of seizure disorders, and, in many cases, in the treatment of AIDS. “

An example of such an add-on superiority design against standard of care can be found in a published trial protocol:

“In this multicenter, randomized, controlled superiority trial, 542 patients scheduled for elective, high-risk abdominal surgery will be included. Patients are allocated to standard care (control group) or early goal-directed therapy (intervention group) using a randomization procedure stratified by center and type of surgery. In the control group, standard perioperative hemodynamic monitoring is applied. In the intervention group, early goal-directed therapy is added to standard care, based on continuous monitoring of cardiac output with arterial waveform analysis.”

Saturday, October 18, 2014

The fixed margin method or the two confidence interval method for obtaining the non-inferiority margin

For non-inferiority clinical trials, the key issue is to pre-specify the non-inferiority margin, which has to be based on historical supporting data from studies that compared the active control group with placebo. If there are multiple historical studies comparing the active control with placebo, a meta-analysis needs to be performed, from which the point estimate and the 95% confidence interval are obtained.

As indicated in FDA’s guidance "Non-Inferiority Clinical Trials", there are essentially two approaches to derive the non-inferiority margin:

“Having established a reasonable assumption for the control agent’s effect in the NI study,  there are essentially two different approaches to analysis of the NI study, one called the fixed  margin method (or the two confidence interval method) and the other called the synthesis method. Both approaches are discussed in later sections of section IV and use the same data  from the historical studies and NI study, but in different ways.”

The guidance further explained the fixed margin method as:
 “in the fixed margin method, the margin M1 is based upon estimates of the effect of the active comparator in previously conducted studies, making any needed adjustments for changes in trial circumstances. The NI margin is then pre-specified and it is usually chosen as a margin smaller than M1 (i.e., M2), because it is usually felt that for an important endpoint a reasonable fraction of the effect of the control should be preserved. The NI study is successful if the results of the NI study rule out inferiority of the test drug to the control by the NI margin or more. It is referred to as a fixed margin analysis because the past studies comparing the drug with placebo are used to derive a single fixed value for M1, even though this value is based on results of placebo-controlled trials (one or multiple trials versus placebo) that have a point estimate and confidence interval for the comparison with placebo. The value typically chosen is the lower bound of the 95% CI (although this is potentially flexible) of a placebo-controlled trial or meta-analysis of trials. This value becomes the margin M1, after any adjustments needed for concerns about constancy. The fixed margin M1, or M2 if that is chosen as the NI margin, is then used as the value to be excluded for C-T in the NI study by ensuring that the upper bound of the 95% CI for C-T is < M1 (or M2). This 95% lower bound is, in one sense, a conservative estimate of the effect size shown in the historical experience. It is recognized, however, that although we use it as a “fixed” value, it is in fact a random variable, which cannot invariably be assumed to represent the active control effect in the NI study.”

Suppose we are planning a non-inferiority study to compare a new experimental thrombolytic agent with an existing thrombolytic agent, with the historical evidence coming from the Cochrane meta-analysis “Thrombolysis for acute ischaemic stroke”:

“Thrombolytic therapy, mostly administered up to six hours after ischaemic stroke, significantly reduced the proportion of patients who were dead or dependent (modified Rankin 3 to 6) at three to six months after stroke (odds ratio (OR) 0.81, 95% confidence interval (CI) 0.72 to 0.90).”


OR (Existing Thrombolytic Agent vs Placebo) = 0.90 (0.90 is the upper bound of the 95% confidence interval)

1 - 0.90 = 0.10 is the treatment effect of the Existing Thrombolytic Agent, i.e., the reduction in the proportion of patients with an unfavorable outcome

If we plan to do a trial comparing the new thrombolytic agent with the Existing Thrombolytic Agent and we would like to preserve 50% of the treatment effect of the Existing Thrombolytic Agent, the non-inferiority margin would be calculated as:


(New Thrombolytic Agent / Placebo) / (Existing Thrombolytic Agent / Placebo) = (0.90 + 0.10/2) / 0.90 ≈ 1.06

The non-inferiority margin would be 1.06.

From the non-inferiority trial comparing the New Thrombolytic Agent with the Existing Thrombolytic Agent, we will calculate the 95% confidence interval for the odds ratio (New Thrombolytic Agent / Existing Thrombolytic Agent). We then compare the upper bound of this 95% confidence interval with the non-inferiority margin of 1.06 calculated above. Non-inferiority can be declared if the upper bound of this 95% confidence interval is below the non-inferiority margin of 1.06 – this is why the fixed margin method is also called the two confidence interval method. Two confidence intervals are involved: the first 95% confidence interval is from the comparison of the active control with placebo in the historical data; the second 95% confidence interval is from the comparison of the new experimental treatment with the active control in the new non-inferiority trial.
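The arithmetic of the fixed margin method fits in a few lines of code. This is a sketch following the worked example above on the plain odds-ratio scale (the same arithmetic as the example; in practice such calculations are often done on the log scale), with the preserved fraction as a parameter so the effect of relaxing it (see comment 3 below) is easy to see.

```python
def ni_margin(upper_ac_vs_placebo, preserve=0.50):
    """Fixed-margin NI margin, mirroring the worked example above.

    upper_ac_vs_placebo: upper bound of the historical 95% CI for the
    odds ratio of active control vs placebo (0.90 in the example).
    preserve: fraction of the active control's effect to be preserved.
    """
    effect = 1 - upper_ac_vs_placebo                # 1 - 0.90 = 0.10
    allowed_loss = effect * (1 - preserve)          # portion of the effect we may lose
    return (upper_ac_vs_placebo + allowed_loss) / upper_ac_vs_placebo

def declare_non_inferiority(upper_new_vs_ac, margin):
    """Two-CI check: upper bound of the trial's 95% CI must fall below the margin."""
    return upper_new_vs_ac < margin

margin = ni_margin(0.90, preserve=0.50)
print(f"margin = {margin:.3f}")                     # ~1.056, i.e. the 1.06 above
print(declare_non_inferiority(1.03, margin))        # True: non-inferiority declared
print(f"margin at 30% preservation = {ni_margin(0.90, preserve=0.30):.3f}")  # ~1.078
```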

Several comments on the fixed margin method:

1. Depending on whether the outcome is good or bad, either the lower bound or the upper bound of the 95% confidence interval of the active control versus placebo should be used when deriving the non-inferiority margin.

2. Depending on whether the statistic is a numeric difference (difference between two means) or a ratio (for example, an odds ratio, risk ratio, or hazard ratio), the treatment effect M1 is based on the 95% CI of the difference (distance from 0) or of the ratio (distance from 1).


3. While it is typical to choose an M2 (non-inferiority margin) that preserves at least 50% of the treatment effect of the active control in comparison with placebo, the 50% figure may be adjusted depending on the disease. In the thrombolytic treatment of ischemic stroke, it may be acceptable to preserve only 30-40% of the treatment effect of the active control. In other words, we may be willing to accept the loss of a large percentage of the active control's treatment effect (over the historical placebo) in order to have a reasonable non-inferiority margin and a feasible sample size for the clinical trial.

4. In some therapeutic areas (for example, antibacterial and orphan disease areas), there are no historical data to support a statistical justification of the non-inferiority margin, and no data are available to calculate the first 95% confidence interval in deriving the margin.


Saturday, September 13, 2014

N of 1 Clinical Trial Design and its Use in Rare Disease Studies

At the beginning of this year (February), I attended a workshop titled “Clinical Trial Design for Alpha-1 Deficiency: A Model for Rare Diseases”. During the meeting, the N of 1 design was mentioned as one of the study methods to address the challenges of clinical trials in rare disease areas.

This was echoed in FDA’s “Public Workshop – Complex Issues in Developing Drug and Biological Products for Rare Diseases”. Session 2: “Complex Issues for Trial Design: Study Design, Conduct and Analysis” had some extensive discussions about the N of 1 trial design and its potential use in rare disease clinical trials.

In a presentation by Dr. Temple of FDA titled “The Regulatory Pathway for Rare Diseases Lessons Learned from Examples of Clinical Study Designs for Small Populations”, the N of 1 study design was mentioned along with other methods such as randomized withdrawal, enrichment, and crossover designs.

According to Wikipedia, “an N of 1 trial is a clinical trial in which a single patient is the entire trial, a single case study. A trial in which random allocation can be used to determine the order in which an experimental and a control intervention are given to a patient is an N of 1 randomized controlled trial. The order of experimental and control interventions can also be fixed by the researcher. “

While N of 1 designs are not commonly used in clinical trials, the concept of focusing on the single patient is actually pretty common in the clinical trial setting. There are some similarities between the aggregated N of 1 design and the typical crossover design, especially the higher-order crossover design. For safety assessment in clinical trials, challenge – dechallenge – rechallenge (CDR) is often used to assess whether an event is indeed caused by the drug. CDR can be considered a simple N of 1 design.
“Challenge–dechallenge–rechallenge (CDR) is a medical testing protocol in which a medicine or drug is administered, withdrawn, then re-administered, while being monitored for adverse events at each stage. The protocol is used when statistical testing is inappropriate due to an idiosyncratic reaction by a specific individual, or a lack of sufficient test subjects and unit of analysis is the individual. During the withdraw phase, the medication is allowed to wash out of the system in order to determine what effect the medication is having on an individual.
 CDR is one means of establishing the validity and benefits of medication in treating specific conditions as well as any adverse drug reactions. The Food and Drug Administration of the United States lists positive dechallenge reactions (an adverse event which disappears on withdrawal of the medication) as well as negative (an adverse event which continues after withdrawal), as well as positive rechallenge (symptoms re-occurring on re-administration) and negative rechallenge (failure of a symptom to re-occur after re-administration). It is one of the standard means of assessing adverse drug reactions in France.”
While an N of 1 trial is an experiment on a single patient, aggregated single-patient (N-of-1) trials involve multiple patients, making quantitative analyses more feasible. See the examples below for the use of aggregated N of 1 trials.
N of 1 clinical trials can involve some complicated statistical analyses. See the discussions below:
The N of 1 clinical trial design is rarely discussed at statistical conferences, perhaps because of the perception that not much statistics is involved in the analysis of N of 1 study data. However, an N of 1 study can be a very effective method of demonstrating efficacy if the characteristics of the indication and drug fit.
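As a simple illustration of how aggregated N of 1 data might be analyzed (a minimal sketch with made-up numbers; real analyses often use mixed-effects models), each patient's repeated on/off cycles are first summarized within patient and then combined across patients:

```python
import numpy as np
from scipy import stats

# Hypothetical data: each row is one patient, each column one treatment cycle;
# entries are within-cycle differences (on-treatment minus off-treatment outcome).
diffs = np.array([
    [4.1, 3.6, 4.4],   # patient 1: three on/off cycles
    [2.8, 3.9, 3.1],   # patient 2
    [5.0, 4.2, 4.7],   # patient 3
])

# Step 1: each patient's own N of 1 trial, summarized as a mean difference.
per_patient = diffs.mean(axis=1)

# Step 2: aggregate across patients; a mean of zero would mean no treatment effect.
t_stat, p_value = stats.ttest_1samp(per_patient, popmean=0)
print(f"per-patient effects: {per_patient.round(2)}, t = {t_stat:.2f}, p = {p_value:.4f}")
```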
One of the key questions is that the N of 1 study design is only applicable in certain situations – it depends on the disease characteristics, the treatment (a short washout period), and the endpoint (quick measurements). Some discussion of the situations where the N of 1 study design may be used can be found in the transcripts of the FDA Public Workshop on Complex Issues in Rare Disease Drug Development, Complex Issues for Trial Design: Study Design, Conduct and Analysis:
“Ellis Unger: We have no ... No comments right now, so let me put a question to the group. I  presented us a slide on the N of 1 study, which we almost never see. Just to remind you, the N of 1 study is a scenario where a patient doesn't contribute and end, but of course the treatment contributes an end, and of course the treatment can be capped in a certain number above ... weeks.
 Unless someone has the amount of interest, in which case, you give up on that course, that aborts that course to treatment and then they re-randomize. Are there therapies, disease states people around the table can think off that would be ... where this design could be applicable, because we don't see these studies. Dr. Walton?
 Marc Walton: I'll just mention that by firing away in all the clinical trials I've reviewed, the most powerful piece of evidence about the effectiveness of a drug came from a N of 1 type of study where it was a study with Pulmozyme cystic fibrosis where patients were treated Pulmozyme that are pulmonary function tested, then the Pulmozyme was discontinued and then tested again, and then several cycles, and I think it was maybe five cycles and you saw such remarkably reproducible effects that it was utterly convincing that the drug was effective for that.
The utility comes about though when you have, as you have said, a disorder that has enough stability and drugs that have a short enough washout period, that you are able to have that repeatedly look as if it was a new exposure to the patient. In disorders where we have that and treatments that are expected to have that sort of reversible effect, this N of 1 becomes a truly powerful piece of information, as well worth considering when those circumstances present themselves.
 Ellis Unger: Typically, a company will come in and say, "You randomize to our treatment or placebo. We're going to count the number of exacerbations or pain episodes or whatever over the course of the study." This is basically saying once you have one of these events, we're going to re-randomize you. Just again, so anybody around the room ... Okay, Dr. Summar.
 Marshall Summar: Yeah, it seems like from the intermediary metabolism, the effects where you have frequent attacks of hyperammonemia, acidosis, things like that, that actually might be a fairly ideal group washout for most of the treatment is pretty fast. That seems like a group where that might actually play out pretty well. I have to think about that but it seems to make some sense.
 Ellis Unger: Dr. Kakkis?
Nicole Hamblett: Thanks. I think the N of 1s, studies are incredibly intriguing and I think one thing I need to wrap my head around is the consigning by commons in medications, for instance, if you're measuring exacerbations during on and off periods in their treatment for that event could alter what's going to happen during the next events. I think that's a little bit difficult to the chronic study, but I guess I also wonder what are the parameters for being able to use an N of 1 study or N of 1 studies for your pivotal trial, as well as difficult enough to conduct confirmatory study. How would we define that for these types of newer or more customized study designs?
 Ellis Unger: Well, I think the N of 1 study, again, you have to have a treatment where there's an offset that's reasonably rapid and you're not expecting the effect on the disease to be ... the effect is not lasting. It's not like Dr. Hyde was mentioning in a gene therapy, as that would be the extreme opposite where you couldn't do this, but if you have something where there's an offset in a reasonable amount of time and patients are subjected to repeated events, I think that's the key.
 If it's progression and it happens slowly with time, you're not going to be able to do an N of 1 study, but if you have some episodic issue and you have a drug with a reasonable offset, I think it will lends itself to this and we're talking about a dozen patients to do a study, the whole deal, and that could be your phase 3 study. I mean that the example I showed was just about a dozen patients. You don't need a lot of patients. “

It is clear that the N of 1 study design is not appropriate for a study whose efficacy endpoint is measured over a very long period of time. The N of 1 study design may be applicable for short-term endpoints (biomarkers, metabolites, …). However, over the last 10-20 years, regulatory agencies have been moving toward long-term endpoints. For an enzyme replacement therapy, a drug showing an increase in enzyme level would have been considered sufficient for approval 20 years ago; nowadays, an endpoint measuring long-term clinical benefit may be required. Similarly, for a thrombolytic agent, it is not sufficient to show thrombolysis in the short term; the long-term benefit of the thrombolytic agent is required. This trend of requiring long-term measures in efficacy endpoints makes the N of 1 study design unlikely to be used in licensure studies.

Saturday, September 06, 2014

Full Analysis Set and Intention-to-Treat Population in Non-randomized Clinical Trials?

The intention-to-treat principle has become a routine term in the statistical analysis of randomized, controlled clinical trials. If a publication describes a randomized, controlled clinical trial, it is almost universal that the intention-to-treat principle will be mentioned, even though the actual analysis may not exactly follow the intention-to-treat principle in some studies.

Strictly speaking, the intention-to-treat principle indicates that the intention-to-treat population includes all randomized patients in the groups to which they were randomly assigned, regardless of their adherence to the entry criteria, regardless of the treatment they actually received, and regardless of subsequent withdrawal from treatment or deviation from the protocol. See one of my early articles and the presentation on ITT versus mITT.

According to ICH E9 “STATISTICAL PRINCIPLES FOR CLINICAL TRIALS”, the Full Analysis Set (FAS) is meant to be as close as possible to the Intention-to-Treat (ITT) population. It states:

“The intention-to-treat (see Glossary) principle implies that the primary analysis should include all randomised subjects. Compliance with this principle would necessitate complete follow-up of all randomised subjects for study outcomes. In practice this ideal may be difficult to achieve, for reasons to be described. In this document the term 'full analysis set' is used to describe the analysis set which is as complete as possible and as close as possible to the intention-to-treat ideal of including all randomised subjects.”

Here both the FAS and the ITT population are tied to randomization. However, in the real world there are also many non-randomized trials: for example, a clinical study without a concurrent control, an early-phase dose escalation study, or a long-term safety follow-up study where all subjects receive the experimental medication. In these situations, since there is no randomization, it is inappropriate to define an ITT population, even though the general principle remains the same, i.e., to retain as many subjects as possible to avoid bias. The issue is: without randomization, what is the trigger point for defining the analysis population? It appears that the trigger point could be the administration of the first dose of study medication. Instead of allocating subjects to the ITT population 'once randomized', the subject is included 'once dosed'. This seems to be the case in the following example: according to CSL’s RIASTAP summary basis of approval, an ITT population was defined for a study without a concurrent control and without randomization. It implied that the ITT population includes all subjects who received the study medication, which is essentially the same as the safety population.

For non-randomized studies, it may be better to use the term Full Analysis Set instead of ITT population. It seems logical to define the full analysis set to include any subject who receives any amount of the study medication. If this definition is used, with the trigger point being the first dose of study medication, the full analysis set and the safety population will most likely be identical. It is not uncommon to define two populations that are identical but use them for different analyses: the safety population for safety analyses and the full analysis set for efficacy analyses.

Another term we can use in non-randomized studies is the Evaluable Population, usually defined as all subjects who receive any amount of the study medication and have at least one post-baseline efficacy measurement. The evaluable population in non-randomized trials is similar to the modified ITT (mITT) population in randomized trials, where some randomized subjects are excluded from the analysis with justifiable rationales.
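A minimal sketch of how these definitions might be operationalized in a non-randomized study, using hypothetical subject-level flags (the field names are my own, for illustration only):

```python
# Hypothetical subject-level flags for a non-randomized study.
subjects = [
    {"id": "001", "dosed": True,  "post_baseline_efficacy": True},
    {"id": "002", "dosed": True,  "post_baseline_efficacy": False},  # dosed, no efficacy data
    {"id": "003", "dosed": False, "post_baseline_efficacy": False},  # never dosed
]

# The trigger point is the first dose, so the FAS and safety populations coincide.
safety    = [s["id"] for s in subjects if s["dosed"]]
fas       = [s["id"] for s in subjects if s["dosed"]]
evaluable = [s["id"] for s in subjects
             if s["dosed"] and s["post_baseline_efficacy"]]

print(safety, fas, evaluable)   # ['001', '002'] ['001', '002'] ['001']
```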

While ICH E9 did not use the term ‘modified intention-to-treat’, the following paragraphs are intended to provide guidance and examples for when subjects can be excluded from the full analysis set or the intention-to-treat population:

“There are a limited number of circumstances that might lead to excluding randomised subjects from the full analysis set including the failure to satisfy major entry criteria (eligibility violations), the failure to take at least one dose of trial medication and the lack of any data post randomisation. Such exclusions should always be justified. Subjects who fail to satisfy an entry criterion may be excluded from the analysis without the possibility of introducing bias only under the following circumstances:
(i) the entry criterion was measured prior to randomisation;
(ii) the detection of the relevant eligibility violations can be made completely objectively;
(iii) all subjects receive equal scrutiny for eligibility violations; (This may be difficult to ensure in an open-label study, or even in a double-blind study if the data are unblinded prior to this scrutiny, emphasising the importance of the blind review.)
(iv) all detected violations of the particular entry criterion are excluded.
In some situations, it may be reasonable to eliminate from the set of all randomised subjects any subject who took no trial medication. The intention-to-treat principle would be preserved despite the exclusion of these patients provided, for example, that the decision of whether or not to begin treatment could not be influenced by knowledge of the assigned treatment. In other situations it may be necessary to eliminate from the set of all randomised subjects any subject without data post randomisation. No analysis is complete unless the potential biases arising from these specific exclusions, or any others, are addressed.
Because of the unpredictability of some problems, it may sometimes be preferable to defer detailed consideration of the manner of dealing with irregularities until the blind review of the data at the end of the trial, and, if so, this should be stated in the protocol.”
In summary, while the general principle is the same, different terms may be preferred depending on whether a study is randomized or non-randomized:

Randomized studies      Non-randomized studies
ITT                     Full Analysis Set
Safety                  Safety
mITT                    Evaluable




Friday, August 29, 2014

Subgroup Analysis in Clinical Trials - Revisited

I previously wrote an article about subgroup analysis in clinical trials, and I would like to revisit the topic. Subgroup analysis has been one of the regular discussion topics at statistical conferences recently. The pitfalls of subgroup analyses are well understood in the statistical community. However, subgroup analyses in the regulatory setting – for product approval, in multi-regional clinical trials, and in confirmatory trials – are quite complicated.

EMA is again ahead of FDA in issuing regulatory guidelines on this topic. Following an expert workshop on subgroup analysis, EMA issued a draft guideline titled “Guideline on the investigation of subgroups in confirmatory clinical trials”. In addition to general considerations, it provides guidance on issues to be addressed during the study planning stage and during the assessment stage.

In practice, subgroup analysis is almost always conducted. For a study with negative results, the purpose is usually to see whether there is a subgroup in which statistically significant results can be found. For a study with positive results, the purpose is usually to see whether the result is robust across different subgroups. Subgroup analysis is not performed only in industry-sponsored trials; it may be performed even more often in academic clinical studies for publication purposes.

Sometimes it is not so easy to explain the caveats of subgroup analysis (especially unplanned subgroup analysis) to non-statisticians, since the explanation requires a good understanding of multiplicity adjustments and statistical power. I recently saw some presentation slides in which the pitfalls of subgroup analysis were well explained, summarized in the table below. Either scenario can make the subgroup analysis results unreliable.


Dr George (2004), “Subgroup analyses in clinical trials”:

When H0 is true: increased probability of type I error – too many “differences”
  • Because the probability of each “statistically significant difference” not being real is 5%
  • So lots of 5% all add together
  • Some of the apparent effects (somewhere) will not be real
  • We have no way of knowing which ones are and which ones aren’t

When H1 is true: decreased power (increased type II error) in individual subgroups – not enough “differences”
  • The more data we have, the higher the probability of detecting a real effect (“power”)
  • But subgroup analyses “cut the data”
  • Trials are expensive and we usually fix the size of the trial to give high “power” to detect important differences overall (primary efficacy endpoint)
  • When we start splitting the data (only look at men, or only look at women, or only look at the renally impaired, or only look at the elderly, etc.), the sample size is smaller … the power is much reduced
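The "too many differences" problem is easy to quantify. Assuming, for illustration, that the subgroup tests are independent (real subgroups overlap, but the direction of the effect is the same), the chance of at least one spurious "significant" finding grows quickly with the number of subgroups examined:

```python
# Familywise type I error when the null hypothesis is true in every subgroup
# and each subgroup is tested at the two-sided 0.05 level (independence assumed).
alpha = 0.05
for k in (1, 5, 10, 20):
    fwe = 1 - (1 - alpha) ** k
    print(f"{k:2d} subgroup tests: P(at least one false positive) = {fwe:.2f}")
# 1 -> 0.05, 5 -> 0.23, 10 -> 0.40, 20 -> 0.64
```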

In clinical trials for licensure, regulatory agencies such as FDA may require subgroup analyses (planned or unplanned) to see whether the results are consistent across subgroups or whether there are different risk-benefit profiles across subgroups. The reviewers may also perform their own subgroup analyses; however, they are aware of the pitfalls of these analyses. The recently approved Zontivity is a great example of this exact issue. See the Pink Sheet article "FDA Changed Course On Zontivity Because Of Skepticism Of Subgroups At High Levels". Initially, FDA reviewers performed subgroup analyses and identified that subjects weighing less than 60 kg had a different risk-benefit profile compared to subjects weighing more than 60 kg. An advisory committee meeting was organized to discuss whether the approved indication should be limited to a specific subgroup. Eventually, however, FDA changed course and did not impose a label restriction for the subgroup, commenting that “The point is that one has to be careful not to over-interpret these subgroup findings.”

Friday, August 15, 2014

SAE Reconciliation and Determining/recording the SAE Onset Date


Traditionally, clinical operations and drug safety / pharmacovigilance departments have elected to independently collect somewhat different sets of safety data from clinical trials. For serious adverse events (SAEs), the drug safety / pharmacovigilance department collects the information through the SAE form, and the information is maintained in a safety database. In the clinical operations or data management department, adverse events (AEs), including SAEs, are collected on case report forms (CRFs), or eCRFs in an EDC study. For SAEs, the information in the safety database and the clinical database comes from the same source (the investigational sites). During the study or at the end of the study, the key fields regarding the SAEs in the two independently maintained databases need to be reconciled, and the key data fields must match in both databases.
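The mechanical part of the reconciliation can be illustrated with a toy example (a sketch only; the field names and data are hypothetical, and real reconciliation also involves the logical-match rules discussed below):

```python
import pandas as pd

# Hypothetical extracts of key SAE fields from the two databases.
safety_db = pd.DataFrame({"subject": ["001", "002"],
                          "event":   ["PNEUMONIA", "SEPSIS"],
                          "onset":   ["2014-03-02", "2014-05-10"]})
clinical_db = pd.DataFrame({"subject": ["001", "002"],
                            "event":   ["PNEUMONIA", "SEPSIS"],
                            "onset":   ["2014-03-02", "2014-05-12"]})

# Merge on the identifying fields, then flag any onset dates that do not
# match exactly; flagged rows would go back to the sites as queries.
merged = safety_db.merge(clinical_db, on=["subject", "event"],
                         suffixes=("_safety", "_clinical"))
mismatches = merged[merged["onset_safety"] != merged["onset_clinical"]]
print(mismatches)   # subject 002: the onset dates disagree between databases
```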

A poster by Chamberlain et al, “Safety Data Reconciliation for Serious Adverse Events (SAE)”, nicely describes the SAE reconciliation process. They stated that for these fields to be reconciled, “some will require a one to one match with no exception, while some may be deemed as acceptable discrepancies based on logical match.”

They also gave examples of fields that require an exact match or a logical match, shown in their Table 1.



Among these fields, the onset date is the one that usually causes problems, due to different interpretations of the regulatory guidelines by the clinical operations and drug safety/pharmacovigilance departments. The onset date of an SAE could be reported as the first date when signs and symptoms appear, or as the date when the event meets one of the following seriousness criteria (as defined in ICH E2A):

* results in death,
* is life-threatening,
* requires inpatient hospitalisation or prolongation of existing hospitalisation,
* results in persistent or significant disability/incapacity, or
* is a congenital anomaly/birth defect.

Klepper and Edwards did a survey and published the results in their paper “Individual Case Safety Reports – How to Determine the Onset Date of an Adverse Reaction”. The results indicated considerable variability in determining the onset date of a suspected adverse reaction. They recommend that a single criterion for onset time – i.e., the beginning of signs or symptoms of the event, or the date of diagnosis – be chosen as the standard.

However, many companies and organizations (such as NIH and NCI) indicate in their SAE form completion guidelines that the event start date should be the date when the event satisfied one of the seriousness criteria (for example, if the criterion “required hospitalization” was met, the date of admission to the hospital would be the event start date). If the event started prior to becoming serious (i.e., was less severe), it should be recorded on the AE page as a non-serious AE with a different severity.

In the NIDCR Serious Adverse Event Form Completion Instructions, the SAE onset date is to record the date that the event became serious.

In the SAE Recording and Reporting Guidelines for Multiple Study Products by the Division of Microbiology and Infectious Diseases, NIH, the onset date of an SAE is instructed to be the date the investigator considers the event to meet one of the seriousness categories.

The Adverse Event Reporting and Safety Monitoring section of the HIV Prevention Trials Network guidance indicates that:

“If an AE increases in severity or frequency (worsens) after it has been reported on an Adverse Experience Log case report form, it must be reported as a new AE, at the increased severity or frequency, on a new AE Log. In this case, the status outcome of the first AE will be documented as “severity/frequency increased.” The status of the second AE will be documented as “continuing”. The outcome date of the first AE and the onset date of the new (worsened) AE should be the date upon which the severity or frequency increased.”

In the Serious Adverse Event Form Instructions for Completion by the National Cancer Institute Division of Cancer Prevention, the event onset date is to be entered as the date the outcome of the event fulfilled one of the seriousness criteria.

The Good Clinical Practice Q&A: Focus on Safety Reporting, in the Journal of Clinical Research Best Practice, contains the following example for reporting the SAE onset date:

“What would an SAE’s onset date be if a patient on study develops symptoms of congestive heart failure (CHF) on Monday and is admitted to the hospital the following Friday?

If known, the complete onset date (month-day-year) of the first signs and/or symptoms of the most recent CHF episode should be recorded. In this case, it would be Monday. If the onset date of the first signs and/or symptoms is unknown, the date of hospitalization or diagnosis should be recorded.”

If the SAE onset date is recorded as the date when one of the seriousness criteria is met (this seems to be the more popular approach in practice), it essentially requires splitting the event. If an event starts as non-serious and later meets one of the seriousness criteria, the same event will be recorded as two events: a non-serious event whose onset date is the date of the first signs and symptoms, and a serious adverse event whose onset date is the date when one of the seriousness criteria is met. This approach therefore results in a later onset date and a shorter SAE duration, but it double-counts what is arguably the same event.


If the SAE onset date is recorded as the date when the first sign or symptom appears, it results in an earlier onset date and a longer SAE duration. Since SAE reporting to regulatory authorities / IRBs is based on the SAE onset date, this approach may be more stringent in meeting the SAE reporting requirements.
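Using the Monday/Friday CHF example quoted above, a small illustration (my own sketch, with made-up dates) of how the two conventions change the recorded onset date and duration:

```python
from datetime import date

first_symptom = date(2014, 3, 3)   # signs/symptoms begin (a Monday)
hospitalized  = date(2014, 3, 7)   # hospitalization: seriousness criterion met (Friday)
resolved      = date(2014, 3, 21)  # event resolves

# Convention A: SAE onset = date the seriousness criterion was met.
# The episode is split: a non-serious AE (Mar 3 - Mar 7) plus an SAE (Mar 7 - Mar 21).
print("A: SAE duration =", (resolved - hospitalized).days, "days (event split in two)")

# Convention B: SAE onset = first signs/symptoms; one record for the whole episode.
print("B: SAE duration =", (resolved - first_symptom).days, "days (single record)")
```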