Monday, March 15, 2021

Drug-related AEs, AE Causality, AE relationship, and SUSAR

In pre-marketing clinical trials or post-marketing drug uses, adverse events (AEs) can always occur. When AEs are reported, assessments need to be made to judge if the reported AEs are caused by the study drug or study treatment. There are different terms to describe this assessment: drug-related, causality, relationship to the study drug, attributable to the study drug,...

In clinical trials, the causality assessment is made by the investigators. In post-marketing experience (surveillance or spontaneous reporting), all reported AEs are considered as related to the drug - so-called 'adverse drug reaction'. 

In an earlier post, I discussed "Causality Assessment, Causality Categories for Reporting Adverse Events or Adverse Reactions" and summarized various ways to categorize AE causality:

  • ICH E2B: Not Related, Unlikely Related, Possibly Related, Related
  • CDISC: Not Related, Unlikely Related, Possibly Related, Related
  • WHO: Certain, Probable/Likely, Possible, Unlikely, Conditional/Unclassified
  • Other options: None, Unlikely, Possible, Probable, Not assessable

In a recently published CDISC CDASH eCRF form for AEs, the causality is simply assessed as Yes or No.


For statistical analyses of the AE data, drug-related AEs will usually be summarized and analyzed. Drug-related AEs are usually defined as AEs with a causality assessment of at least possibly related to the investigational study drug (including the placebo). For example, using the ICH E2B or CDISC categories, AEs with a causality assessment of 'possibly related' or 'related' will be considered 'drug-related'. If the CDISC CDASH eCRF is followed, the drug-related AEs will be those with a 'Yes' answer to the 'Relationship to Study Treatment' question. Some of my European colleagues once argued that the 'unlikely' category should also be considered 'drug-related', but I never found any regulatory guidance to support this.
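The rule above can be sketched in a few lines of Python. This is only an illustration: the category labels, the dictionary keys, and the function name are assumptions made for the example, and the cut-off ("possibly related" or stronger) is the convention described in this post rather than a regulatory requirement.

```python
# Hypothetical set of causality values that count as drug-related.
# "YES" covers the two-level CDASH-style Yes/No assessment.
DRUG_RELATED_CATEGORIES = {"POSSIBLY RELATED", "RELATED", "YES"}


def is_drug_related(causality: str) -> bool:
    """Return True if the investigator's causality assessment counts as drug-related."""
    return causality.strip().upper() in DRUG_RELATED_CATEGORIES


# Made-up AE records for illustration only.
aes = [
    {"term": "Headache", "causality": "Not Related"},
    {"term": "Nausea", "causality": "Possibly Related"},
    {"term": "Rash", "causality": "Related"},
    {"term": "Dizziness", "causality": "Unlikely Related"},
]

related_terms = [ae["term"] for ae in aes if is_drug_related(ae["causality"])]
print(related_terms)
```

Moving 'unlikely related' into or out of the set is a one-line change, which is exactly why the definition should be fixed in the statistical analysis plan.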

It is difficult to compare drug-related AEs across studies conducted by different sponsors because the number of causality categories may differ from study to study. The number of categories can range from 2 (yes/no) to 5.

In some studies, additional causality assessments may be performed to judge if the AEs are related to the background therapies (especially in studies with add-on design) or related to the medical device (such as inhalation device and infusion pumps). The same AE could be assessed to be related to the study treatment, background therapy, and/or the medical device. 

For the past year, we have seen plenty of headlines about AE causality assessment in action in Covid-19 vaccine studies and in gene therapy studies. AE causality can be assessed on an individual case (the unexpected events in AZ's and J&J's Covid-19 vaccine trials) or on an aggregate level (the blood clots reported with AZ's Covid-19 vaccine).

Causality assessment based on the individual case:

"After a thorough evaluation of a serious medical event experienced by one study participant, no clear cause has been identified. There are many possible factors that could have caused the event. Based on the information gathered to date and the input of independent experts, the Company has found no evidence that the vaccine candidate caused the event."

Causality assessment on an aggregate level: 


"A careful review of all available safety data of more than 17 million people vaccinated in the European Union (EU) and UK with COVID-19 Vaccine AstraZeneca has shown no evidence of an increased risk of pulmonary embolism, deep vein thrombosis (DVT) or thrombocytopenia, in any defined age group, gender, batch or in any particular country.

So far across the EU and UK, there have been 15 events of DVT and 22 events of pulmonary embolism reported among those given the vaccine, based on the number of cases the Company has received as of 8 March. This is much lower than would be expected to occur naturally in a general population of this size and is similar across other licensed COVID-19 vaccines. The monthly safety report will be made public on the European Medicines Agency website in the following week, in line with exceptional transparency measures for COVID-19."

One type of AE requires special attention and needs to be reported to the regulatory agencies and local IRBs/ECs expeditiously. It is called a Suspected Unexpected Serious Adverse Reaction (SUSAR). According to the FDA guidance "Safety Reporting Requirements for INDs and BA/BE Studies", SUSARs are those AEs meeting all of the following criteria:

  • Serious (S)
  • Unexpected (U)
  • Suspected Adverse Reactions (SAR)

Fatal or life-threatening SUSARs should be reported to FDA no later than 7 calendar days after the sponsor first receives the information; other SUSARs should be reported to FDA no later than 15 calendar days.
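As a rough illustration of the three criteria and the two reporting clocks, here is a minimal Python sketch. The function name and arguments are hypothetical, and counting the receipt date as day 0 is an assumption; a real pharmacovigilance system would follow the sponsor's SOPs and the applicable regulation.

```python
from datetime import date, timedelta
from typing import Optional


def susar_report_due(serious: bool, unexpected: bool, suspected: bool,
                     fatal_or_life_threatening: bool,
                     received: date) -> Optional[date]:
    """Return the expedited-report due date, or None if the event is not a SUSAR.

    Assumption: calendar days are counted from the receipt date (day 0).
    """
    # All three criteria (S, U, SAR) must be met for the event to be a SUSAR.
    if not (serious and unexpected and suspected):
        return None
    days = 7 if fatal_or_life_threatening else 15
    return received + timedelta(days=days)


print(susar_report_due(True, True, True, True, date(2021, 3, 1)))   # fatal or life-threatening
print(susar_report_due(True, True, True, False, date(2021, 3, 1)))  # other SUSAR
print(susar_report_due(True, False, True, False, date(2021, 3, 1))) # expected event, not a SUSAR
```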

We saw the news that Bluebird Bio temporarily suspended their gene therapy clinical trials due to a reported SUSAR of acute myeloid leukemia (AML).


Bluebird bio then completed its assessment and concluded that its gene therapy for sickle cell disease was not linked to the cancer (the AML event):

"The company released a statement yesterday (March 10) claiming an investigation has found “it is very unlikely” the AML is related to the therapy and the firm is seeking approval from the US Food and Drug Administration (FDA) to resume the trials.

“VAMP4 has no known association with the development of AML nor with processes such as cellular proliferation or genome stability,” Bluebird’s Chief Scientific Officer Philip Gregory says in the press release. Furthermore, the patient’s cells had mutations in other genes, which are related to leukemia."
Some additional comments on AE causality assessment:
  • AEs that occurred prior to the first dose of study treatment (i.e., non-treatment-emergent AEs) should always have the causality 'unrelated' to the study treatment. An edit check should be in place to prevent investigators from entering a non-TEAE as drug-related.
  • In blinded studies, if a SUSAR event is reported, the individual patient's treatment assignment should be unblinded so that the sponsor can assess the causality and report the SUSAR event appropriately. 
  • While drug-related AEs are typically summarized, analyzed, and included in the clinical study report, FDA reviewers will focus their review on all AEs and all SAEs regardless of causality. Drug-related AEs are usually not included in the drug label. Instead, the drug label will list the most frequent AEs (whether or not they are drug-related).
  • Sometimes, causality assessment by the investigator may be subjective and arbitrary to some degree. Important events (such as SUSAR and AESI (AE of special interest)) may be further reviewed by the sponsor, data monitoring committee, clinical event adjudication committee, and regulatory agencies. For example, some oncology drugs may induce pneumonitis/interstitial lung disease and these events of pneumonitis/interstitial lung disease can be reviewed and adjudicated by a committee. 
  • Statistical summary and analysis of AE causality are always based on the assessment by the investigator that is recorded in the clinical database. Causality assessment by the sponsor and other parties is not part of the clinical database and will be analyzed separately.
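The edit check mentioned in the first comment above could look something like the following Python sketch. The field names (`start_date`, `causality`) and the 'NOT RELATED' label are assumptions for the example; it also compares dates only, so an AE starting on the day of first dose would need time-of-day information to be classified properly.

```python
from datetime import date


def flag_non_teae_related(aes, first_dose_date):
    """Return AEs that started before the first dose but are not recorded as 'Not Related'."""
    issues = []
    for ae in aes:
        started_before_dose = ae["start_date"] < first_dose_date
        recorded_unrelated = ae["causality"].strip().upper() == "NOT RELATED"
        if started_before_dose and not recorded_unrelated:
            issues.append(ae)  # candidate for a data query to the site
    return issues


# Made-up records for illustration only.
aes = [
    {"term": "Headache", "start_date": date(2021, 1, 3), "causality": "Possibly Related"},
    {"term": "Fatigue", "start_date": date(2021, 1, 4), "causality": "Not Related"},
    {"term": "Nausea", "start_date": date(2021, 1, 12), "causality": "Related"},
]

queries = flag_non_teae_related(aes, first_dose_date=date(2021, 1, 10))
print([ae["term"] for ae in queries])
```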

Wednesday, March 10, 2021

Intention-to-Treat Principle versus Treatment Policy Estimand: Different Names, but Same Meaning?

ICH E9 "Statistical Principles for Clinical Trials" was finalized in February 1998. The E9 guideline established the intention-to-treat principle for the design and analysis of clinical trials. Under the intention-to-treat principle, we are required to include all study participants (the full analysis set) in the analyses. Here are the definitions for 'full analysis set' and 'intention-to-treat principle' from ICH E9.



In the 1990s, it took a while for people to understand and accept the concept of the intention-to-treat principle. We also saw the intention-to-treat principle misused, over-used, or undercut by the use of 'practical intention-to-treat' and 'modified intention-to-treat'. I gave a presentation (in 2004) about the misuse/overuse of intention-to-treat and modified intention-to-treat. What I said then is still applicable today.

The strict definition of intention-to-treat can be traced back to the book chapter by Fisher LD et al., "Intention to treat in clinical trials", in Statistical Issues in Drug Research and Development, edited by Peace KE (1990). Intention-to-treat was defined as:

Includes all randomized patients in the groups to which they were randomly assigned, regardless of their adherence with the entry criteria, regardless of the treatment they actually received, and regardless of subsequent withdrawal from treatment or deviation from the protocol

The intention-to-treat principle includes all randomized subjects in the analyses and ignores what happens to the subjects after randomization (whether or not the subject discontinued the study drug, took prohibited or rescue therapies, crossed over to the alternate treatment,...), which is obviously not the best option for estimating the treatment effect in some situations. This led to the development of the addendum to ICH E9, "ICH E9 (R1) Estimands and Sensitivity Analysis in Clinical Trials". ICH E9 (R1) explained the issues with the intention-to-treat principle and introduced the new concepts of estimands (including the treatment policy estimand) and intercurrent events.

This addendum clarifies and extends ICH E9 in respect of the following topics. Firstly, ICH E9 introduced the Intention-To-Treat (ITT) principle in connection with the effect of a treatment policy in a randomised controlled trial, whereby subjects are followed, assessed and analysed irrespective of their compliance to the planned course of treatment, indicating that preservation of randomisation provides a secure foundation for statistical tests. Multiple consequences arising from the ITT principle can be distinguished. Firstly, that the trial analysis should include all subjects relevant for the research question. Secondly, that subjects should be included in the analysis as randomised. Taken directly from the definition of the ITT principle (see ICH E9 Glossary), a third consequence is that subjects should be followed-up and assessed regardless of adherence to the planned course of treatment and that those assessments should be used in the analysis. It remains undisputed that randomisation is a cornerstone of controlled clinical trials and that analysis should aim at exploiting the advantages of randomisation to the greatest extent possible. However, the question remains whether estimating an effect in accordance with the ITT principle always represents the treatment effect of greatest relevance to regulatory and clinical decision making. The framework outlined in this addendum gives a basis for describing different treatment effects and some points to consider for the design and analysis of trials to give estimates of these treatment effects that are reliable for decision making. Secondly, issues considered generally under data handling and “missing data” (see Glossary) are re-visited. Two important distinctions are made. 

With the intention-to-treat principle, subjects who discontinued the study drug prematurely should continue to be followed up, and the data after dose discontinuation should continue to be collected. However, in practice for many studies, data collection was stopped for subjects who discontinued the study drug, or the data after discontinuation were collected but not used in the analyses. To some extent, the intention-to-treat principle was not fully followed. That is why the FDA issued its guidance "Data Retention When Subjects Withdraw from FDA-Regulated Clinical Trials" to encourage data collection after subjects withdraw from the study. As discussed in the guidance:

The validity of a clinical study would also be compromised by the exclusion of data collected during the study. There is long-standing concern with the removal of data, particularly when removal is non-random, a situation called “informative censoring.” FDA has long advised “intent-to-treat” analyses (analyzing data related to all subjects the investigator intended to treat), and a variety of approaches for interpretation and imputation of missing data have been developed to maintain study validity. Complete removal of data, possibly in a non-random or informative way, raises great concerns about the validity of the study. 

The addendum to ICH E9 introduced the concepts of estimands and intercurrent events. Events that occurred after randomization were previously ignored even though the analyses were performed under the intention-to-treat principle. With the addendum, those events are called 'intercurrent events'. Here is the official definition of intercurrent events:

Intercurrent Events:
Events occurring after treatment initiation that affect either the interpretation or the existence of the measurements associated with the clinical question of interest. It is necessary to address intercurrent events when describing the clinical question of interest in order to precisely define the treatment effect that is to be estimated.

Estimands can be classified based on the strategies for handling the intercurrent events. One way to handle the intercurrent events is the 'treatment policy' strategy, which gives us the treatment policy estimand. The treatment policy estimand under the addendum is almost identical to the intention-to-treat principle under the original ICH E9.

Treatment policy strategy
The occurrence of the intercurrent event is considered irrelevant in defining the treatment effect of interest: the value for the variable of interest is used regardless of whether or not the intercurrent event occurs. For example, when specifying how to address use of additional medication as an intercurrent event, the values of the variable of interest are used whether or not the patient takes additional medication.
If applied in relation to whether or not a patient continues treatment, and whether or not a patient experiences changes in other treatments (e.g. background or concomitant treatments), the intercurrent event is considered to be part of the treatments being compared. In that case, this reflects the comparison described in the ICH E9 Glossary (under ITT Principle) as the effect of a treatment policy.

The intention-to-treat principle and the treatment policy estimand are two different names with essentially the same meaning. If we have to differentiate them, we can say that the intention-to-treat principle is more focused on which subjects should be included in the analyses, while the treatment policy estimand is more focused on which data points should be included in the analyses. If a randomized subject has an intercurrent event (for example, discontinues the study treatment), the subject is still included in the intention-to-treat population for analysis, but will the measures after the subject's discontinuation of the study treatment be included in the analyses? With the treatment policy estimand, the measures after the subject's discontinuation of the study treatment need to be included in the analyses.
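The subject-level versus data-point-level distinction can be made concrete with a small Python sketch. Everything here is illustrative: the records are made up, the field names are assumptions, and the "on_treatment_only" option is just a contrast case for the example, not a full implementation of any estimand strategy.

```python
# Made-up longitudinal records; subject 002 discontinues study drug after week 4.
records = [
    {"subject": "001", "week": 4, "value": 5.1, "on_treatment": True},
    {"subject": "001", "week": 8, "value": 4.8, "on_treatment": True},
    {"subject": "002", "week": 4, "value": 5.5, "on_treatment": True},
    {"subject": "002", "week": 8, "value": 6.0, "on_treatment": False},
]


def analysis_records(records, strategy):
    """Select the data points used in the analysis under a given (hypothetical) strategy label."""
    if strategy == "treatment_policy":
        # Use every observed value, whether or not the intercurrent event occurred.
        return list(records)
    if strategy == "on_treatment_only":
        # Contrast case: drop values collected after treatment discontinuation.
        return [r for r in records if r["on_treatment"]]
    raise ValueError(f"unknown strategy: {strategy}")


for strategy in ("treatment_policy", "on_treatment_only"):
    selected = analysis_records(records, strategy)
    subjects = sorted({r["subject"] for r in selected})
    print(strategy, "subjects:", subjects, "data points:", len(selected))
```

Both selections keep the same randomized subjects, so both could be described as "intention-to-treat" at the population level; only the treatment policy selection keeps the post-discontinuation measurement.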

Here is a thread on resident360.nejm.com discussing the difference between the intention-to-treat principle and the treatment policy estimand.



We have started to see that the ICH E9 addendum and the concept of estimands are gradually being adopted, especially in EU countries. Adoption of the ICH E9 addendum in the US is much slower than in EU countries. The terms 'estimand' and 'intercurrent event' are still regarded as words invented by statisticians. It will take a while for non-statisticians to understand the concepts and to accept these new terms. A presentation, "Regulator’s experience with estimands", by Andreas Brandt from EMA summarized the challenges for the adoption and implementation of the ICH E9 addendum. We can anticipate difficulties ahead for non-statisticians and clinicians in accepting the concepts of estimands and intercurrent events. This is reflected in a paper by Min & Bain, "Estimands in diabetes clinical trials":

During 2019 several type 2 diabetes trials results using the term estimand were published. This word will be unfamiliar to many clinicians (and to spellcheck) but given that regulatory bodies have endorsed its use, this word is likely to become a staple of medical jargon in the future.

The ICH E9 addendum described five different strategies for handling intercurrent events: the treatment policy strategy, hypothetical strategy, composite variable strategy, while-on-treatment strategy, and principal stratum strategy. However, in practice, the treatment policy estimand is used in the vast majority of the studies where the estimand concept is mentioned. There are a few studies using the principal stratum strategy. The other three strategies (hypothetical, composite variable, and while-on-treatment) are rarely used in practice, perhaps because they are relatively new, because their regulatory acceptance is uncertain, and because there is no available method to estimate the treatment difference for some estimands.

If the vast majority of estimand applications use the treatment policy strategy, which is almost identical to the traditional intention-to-treat principle, one may question whether it was worth revamping the entire ICH E9 to come up with an addendum for the estimand and intercurrent event concepts.

Monday, February 22, 2021

Randomization Using Envelopes In Randomized, Controlled, and Blinded Clinical Trials

I read an article by Clark et al., “Envelope use and reporting in randomized controlled trials: A guide for researchers”. The article reminds me of the old times when envelopes were a popular tool for randomization and blinding (treatment concealment). In the 1990s and 2000s, for randomized, blinded clinical trials, the concealed envelope was the only way for the investigator to do emergency unblinding (or code breaking) and sometimes the way to administer the randomization in single-blinded studies.

In Berende et al (2016, NEJM) “Randomized Trial of Longer-Term Therapy for Symptoms Attributed to Lyme Disease”, the study protocol described the following procedure for "unblinding of randomization" where sealed envelopes were used.  

I used to serve as the unblinded statistician who prepared the randomization schedule (including the randomization envelopes) for clinical trials. The following procedures needed to be followed:

  • Based on the study protocol, develop the randomization specifications describing randomization ratio, stratification factors, block size, the number of randomization codes, recipients of the randomization schedule, or code-break envelopes
  • Generate the dummy randomization schedule for the study team to review and approve
  • Replace the random seed to generate the final randomization schedule (a list of all randomized assignments)
  • Prepare the randomization envelopes (randomization number, stratification factors outside the envelope, and treatment assignment inside the envelope)
  • QC the randomization envelopes (to make sure that the inside/outside information matches the randomization schedule)
  • Shipping and tracking
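The schedule-generation steps above (a dummy schedule for review, then a final schedule with a different seed) can be sketched with a simple permuted-block generator in Python. This is a minimal illustration, not a validated randomization system: the function name, the randomization-number format, and the seeds are all made up, and a stratified design would typically call it once per stratum.

```python
import random


def permuted_block_schedule(n_blocks, block, seed):
    """Generate a permuted-block randomization list.

    block: treatment labels making up one block, e.g. ["A", "A", "B"] for 2:1.
    seed:  random seed; only the final seed would be kept confidential.
    """
    rng = random.Random(seed)  # independent generator so the schedule is reproducible
    assignments = []
    for _ in range(n_blocks):
        shuffled = list(block)
        rng.shuffle(shuffled)
        assignments.extend(shuffled)
    # Hypothetical randomization-number format: R0001, R0002, ...
    return [(f"R{i + 1:04d}", arm) for i, arm in enumerate(assignments)]


# Dummy schedule for the study team to review the format and block structure.
dummy = permuted_block_schedule(n_blocks=3, block=["A", "A", "B"], seed=12345)
# Final schedule: same specifications, different (restricted-access) seed.
final = permuted_block_schedule(n_blocks=3, block=["A", "A", "B"], seed=67890)

for rand_no, arm in dummy:
    print(rand_no, arm)
```

Each envelope would then carry the randomization number (and stratum) on the outside and the corresponding treatment assignment on the inside, which is what the QC step checks against the final list.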

For double-blinded studies, both the investigator and the patient are blinded to the treatment assignment. The randomization schedule will usually be sent to a third party (for example, the pharmacist) who is unblinded to the treatment assignment and can prepare the study drug for dispensing or administration. The third-party (for example, the pharmacist) must not be involved in other aspects of the clinical trial conduct. The concealed envelopes can be sent to the investigators for emergency unblinding. If there is a medical emergency requiring the unblinding of an individual subject, the investigator can open the code break envelope to reveal the treatment assignment for the specific subject.

For single-blinded studies, the investigator is unblinded to the treatment assignment and the patient is blinded to the treatment assignment. The randomization schedule and/or the randomization envelopes can be sent to the investigators.

Nowadays, randomization through envelopes is largely obsolete. The randomization procedures are integrated into the overall CTM (clinical trial material) management process through IRT (interactive response technologies). In the last 20 years, the randomization process has shifted from randomization envelopes -> IVRS (interactive voice response system) -> IWRS (interactive web response system) -> IRT.

With IRT, the randomization schedule will be sent to the IRT vendor and uploaded into the IRT system. The study team members can be assigned different levels of access to the IRT system depending on their roles in the study. The investigators and pharmacovigilance personnel can be granted an emergency access code to gain access to the treatment assignment in IRT when necessary.

However, in some situations, randomization envelopes may still be the best way to implement the randomization.

In a study by Chetter et al., “A Prospective, Randomized, Multicenter Clinical Trial on the Safety and Efficacy of a Ready-to-Use Fibrin Sealant as an Adjunct to Hemostasis during Vascular Surgery”, the randomization occurred in the operating room and only after the target bleeding site (TBS) was identified during the surgical procedure. It would not be ideal for the surgeon (the investigator) to log into the IRT system to obtain the treatment assignment. The better approach would be for the surgeon or the surgeon’s assistant to open the randomization envelope in the operating room to obtain the treatment assignment. The randomization procedure was described as follows in the paper:

Randomization

In the Primary Study, patients were randomized 2:1 to treatment with FS Grifols or MC after the identification of the TBS during the procedure. Treatment group assignments were generated by the randomization function of the statistics software and communicated using sealed opaque envelopes. Due to the obvious differences between the 2 treatments, blinding of investigators was not possible following randomization.


Monday, February 01, 2021

BLQs (below limit of quantification) and LLOQ (Lower Limit of Quantification): how to handle them in analyses?

In data analyses of the clinical trial, one type of data is the laboratory data containing the results measured by the central laboratory or specialty laboratory on the specimen (blood sample, plasma or serum sample, urine sample, bronchoalveolar lavage,...) collected from clinical trial participants. The laboratory results are usually reported as quantitative measures in numeric format. However, sometimes, we will see the results reported as '<xxx' or 'BLQ'.

The laboratory measures rely on the assay and the assay has its limit and can only accurately measure the level or concentration to a certain degree - the limit is called the Lower Limit of Quantification (LLOQ) or the Limit of Quantification (LOQ) or the Limit of Detection (LOD). 

FDA's guidance (2018) "Bioanalytical Method Validation" defines the quantification range, LLOQ, and ULOQ as follows:

The quantification range is the range of concentrations, including the ULOQ and the LLOQ that can be reliably and reproducibly quantified with accuracy and precision with a concentration-response relationship.

Lower limit of quantification (LLOQ): The LLOQ is the lowest amount of an analyte that can be quantitatively determined with acceptable precision and accuracy.

Upper limit of quantification (ULOQ): The ULOQ is the highest amount of an analyte in a sample that can be quantitatively determined with precision and accuracy.

According to the article by Vashist and Luong, "Bioanalytical Requirements and Regulatory Guidelines for Immunoassays", the LLOQ and LOQ are different. In practice, however, the LLOQ and LOQ are often used interchangeably.

The LOQ is the lowest analyte concentration that can be quantitatively detected with a stated accuracy and precision [24]. However, the determination of LOQ depends on the predefined acceptance criteria and performance requirements set by the IA developers. Although such criteria and performances are not internationally adopted, it is of importance to consider the clinical utility of the IA to define such performance requirements.

The LLOQ is the lowest calibration standard on the calibration curve where the detection response for the analyte should be at least five times over the blank. The detection response should be discrete, identifiable, and reproducible. The precision of the determined concentration should be within 20% of the CV while its accuracy should be within 20% of the nominal concentration.

In FDA's guidance "Studies to Evaluate the Metabolism and Residue Kinetics of Veterinary Drugs in Food-Producing Animals: Validation of Analytical Methods Used in Residue Depletion Studies", the LOD and LOQ are differentiated a little bit.

3.4. Limit of Detection
The limit of detection (LOD) is the smallest measured concentration of an analyte from which it is possible to deduce the presence of the analyte in the test sample with acceptable certainty. There are several scientifically valid ways to determine LOD and any of these could be used as long as a scientific justification is provided for their use. 
3.5. Limit of Quantitation
The LOQ is the smallest measured content of an analyte above which the determination can be made with the specified degree of accuracy and precision. As with the LOD, there are several scientifically valid ways to determine LOQ and any of these could be used as long as scientific justification is provided. 

If the level or concentration is below the range that the assay can detect, it will be reported as the BLQ (Below the Limit of Quantification), BQL (Below Quantification Level), BLOQ (Below the Limit Of Quantification), or <xxx where xxx is the LLOQ. The results are seldom reported as 0 or missing since the result is only undetectable using the corresponding assay. It is usually agreed that the BLQ values are not missing values - they are measured, but not measurable. 

In clinical laboratory data with the purpose of safety assessment, the BLQ or <xxx is reported in the character variable. When converting the character variable to the numerical variable, the BLQ or <xxx will be automatically treated as missing unless we do something. The following four approaches may be seen in handling the BLQ values (with an example assuming LLOQ 0.01 ng/mL). 

Reported Value | Converted Value | Explanation
--- | --- | ---
< 0.01 ng/mL | missing | The specific measure will be set to missing and will not be included in summary and analysis.
< 0.01 ng/mL | 0 | The specific measure will be set to 0 in summary and analysis.
< 0.01 ng/mL | 0.005 ng/mL | Half of the LLOQ; commonly used in clinical pharmacology studies (bioavailability and bioequivalence studies).
< 0.01 ng/mL | 0.01 ng/mL | Ignore the less-than sign '<' and take the LLOQ as the value for summary and analysis. This approach can also handle values beyond the ULOQ (upper limit of quantification), for example '>1000 ng/mL', by removing the greater-than sign '>'.
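The four conversions in the table can be sketched as a small Python function. The method labels ("missing", "zero", "half_lloq", "lloq") and the assumption that the reported string looks like '< 0.01 ng/mL' are choices made for this example only; values above the ULOQ ('>1000 ng/mL') are deliberately not handled here.

```python
def convert_blq(reported, method):
    """Convert a reported lab result such as '< 0.01 ng/mL' to a numeric value.

    method: one of 'missing', 'zero', 'half_lloq', 'lloq' (hypothetical labels).
    Returns a float, or None when the value is treated as missing.
    """
    text = reported.strip()
    if not text.startswith("<"):
        # Quantifiable result: keep the leading number, drop the unit.
        return float(text.split()[0])
    lloq = float(text[1:].split()[0])  # number after the '<' sign
    if method == "missing":
        return None
    if method == "zero":
        return 0.0
    if method == "half_lloq":
        return lloq / 2
    if method == "lloq":
        return lloq
    raise ValueError(f"unknown method: {method}")


for method in ("missing", "zero", "half_lloq", "lloq"):
    print(method, convert_blq("< 0.01 ng/mL", method))
```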

In clinical pharmacology studies (bioavailability and bioequivalence studies), serial pharmacokinetic (PK) samples will be drawn and analyzed to get a PK profile for a specific compound or formulation. The serial samples will include a pre-dose sample (the sample drawn before dosing) and samples at multiple time points after dosing. It is entirely possible to have results reported as BLQ, especially for the pre-dose sample and the late time points. BLQ values are also possible for samples in the middle of the PK profile (i.e., between two samples with non-BLQ values). The rules for handling these BLQs differ depending on whether the sample is at pre-dose, in the middle of the profile, or at the end of the PK profile (with an example assuming LLOQ 0.01 ng/mL):

Timepoint | Reported Value | Converted Value | Explanation
--- | --- | --- | ---
Pre-dose sample for a compound with no endogenous level | < 0.01 ng/mL | 0 | The BLQ(s) occurring before the first quantifiable concentration will be set to zero.
Pre-dose sample for a compound with endogenous level, or pre-dose at steady state | < 0.01 ng/mL | 0.005 ng/mL | The endogenous pre-dose level will be set to half of the LLOQ. In a multiple-dose situation, the pre-dose sample (trough or Cmin) is set to half of the LLOQ.
Middle of the PK profile, or between two non-BLQ time points | < 0.01 ng/mL | missing | The BLQ values between two reported concentrations will be set to missing in the analysis; essentially, the linear interpolation rule will be used in the AUC calculation.
The last time point(s) of the PK profile | < 0.01 ng/mL | 0 or 0.005 ng/mL | It is common to set the last BLQ(s) to 0 to be consistent with the rule for pre-dose BLQ handling. According to FDA's "Bioequivalence Guidance", "For a single dose bioequivalence study, AUC should be calculated from time 0 (predose) to the last sampling time associated with quantifiable drug concentration AUC(0-LOQ)." In some situations, the BLQ values after the last non-BLQ measure can also be set to half of the LLOQ.
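The position-dependent rules in this table can be sketched as follows. This is one possible reading of the rules, not a standard implementation: the "BLQ" marker string, the `endogenous` and `trailing` arguments, and the handling of a profile with no quantifiable value at all (treated like leading BLQs) are assumptions for the example.

```python
BLQ = "BLQ"  # hypothetical marker for a below-LLOQ result


def impute_pk_blq(concs, lloq, endogenous=False, trailing="zero"):
    """Impute BLQ values in one subject's time-ordered concentration profile.

    concs:      list of floats or the BLQ marker, in sampling-time order.
    endogenous: if True, leading BLQs are set to LLOQ/2 instead of 0.
    trailing:   'zero' or 'half' for BLQs after the last quantifiable value.
    Returns a list in which None means "set to missing".
    """
    quantifiable = [i for i, c in enumerate(concs) if c != BLQ]
    # Assumption: with no quantifiable value, every BLQ is treated as leading.
    first = quantifiable[0] if quantifiable else len(concs)
    last = quantifiable[-1] if quantifiable else -1

    imputed = []
    for i, c in enumerate(concs):
        if c != BLQ:
            imputed.append(c)                                   # quantifiable: keep as is
        elif i < first:
            imputed.append(lloq / 2 if endogenous else 0.0)     # before first quantifiable
        elif i > last:
            imputed.append(0.0 if trailing == "zero" else lloq / 2)  # after last quantifiable
        else:
            imputed.append(None)                                # embedded BLQ: missing
    return imputed


profile = [BLQ, 0.5, BLQ, 0.2, BLQ]
print(impute_pk_blq(profile, lloq=0.01))
print(impute_pk_blq(profile, lloq=0.01, endogenous=True, trailing="half"))
```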

There are some discussions that these single imputation methods will generate biased estimates. In a presentation by Helen Barnett et al., "Non-compartmental methods for Below Limit of Quantification (BLOQ) responses", they concluded:

It is clear that the method of kernel density imputation is the best performing out of all the methods considered and is hence is the preferred method for dealing with BLOQ responses in NCA. 

In a recent paper by Barnett et al. (2021, Statistics in Biopharmaceutical Research), "Methods for Non-Compartmental Pharmacokinetic Analysis With Observations Below the Limit of Quantification", eight different methods were discussed for handling the BLQs (or BLOQs). The authors conclude that the kernel-based method performs best in most situations.

  • Method 1: replace BLOQ values with 0
  • Method 2: replace BLOQ values with LOQ/2
  • Method 3: regression on order statistics (ROS) imputation
  • Method 4: maximum likelihood per timepoint (summary)
  • Method 5: maximum likelihood per timepoint (imputation)
  • Method 6: Full Likelihood
  • Method 7: Kernel Density Imputation
  • Method 8: Discarding BLOQ Values
For a specific study, the rules for handling the BLQs may differ depending on the time point in the PK profile, the measured compound (with or without endogenous concentrations), single dose versus multiple doses, and the study design (parallel, crossover). No matter what the rules are, they need to be specified in the statistical analysis plan (SAP) or PK analysis plan (PKAP), preferably before study unblinding if it is a pivotal study and the PK analysis results are the basis for regulatory approval.

Here are two examples with descriptions of the BLQ handling rules. In a phase I study by Shire, the BLQ handling rules are specified as the following: 


In a phase I study by Emergent Product Development, the BLQ rules are described as the following:



Sunday, January 17, 2021

Arithmetic mean, geometric mean, harmonic mean, least square mean, and trimmed mean

In statistics, a central tendency is a central or typical value for a data distribution. The mean (or average) is commonly used to measure the central tendency. However, depending on the data distribution or the specific situation, different types of mean may be used: arithmetic mean, geometric mean, least squares mean, harmonic mean, and trimmed mean.

The most common mean is the arithmetic mean. If we simply say 'mean', it refers to the arithmetic mean by default.

Arithmetic Mean is calculated as the sum of all measurements (all observations) divided by the number of observations in the data set.


Geometric Mean is the nth root of the product of the n data values. This measure is valid only for data that are measured absolutely on a strictly positive scale. The geometric mean is often used for data that follow a log-normal distribution (for example, pharmacokinetic drug concentration data, antibody titer data, ...). 

In practice, geometric mean is usually calculated with the following three steps:
  • log-transform the original data
  • calculate the arithmetic mean of the log-transformed data
  • back transform the calculated value to the original scale
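The three steps above can be written out directly. This is a minimal sketch; the function name `geometric_mean` and the example titers are made up for illustration.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean via log-transform, arithmetic mean, back-transform."""
    if len(values) == 0 or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive data")
    logs = [math.log(v) for v in values]   # step 1: log-transform
    mean_log = sum(logs) / len(logs)       # step 2: arithmetic mean of the logs
    return math.exp(mean_log)              # step 3: back-transform


titers = [10, 100, 1000]                   # hypothetical antibody titers
print(geometric_mean(titers))              # about 100
print(sum(titers) / len(titers))           # arithmetic mean, 370.0
```

For skewed data like these, the geometric mean sits much closer to the bulk of the values than the arithmetic mean does.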
Harmonic Mean is the reciprocal of the arithmetic mean of the reciprocals of the data values. This measure too is valid only for data that are measured absolutely on a strictly positive scale.

The harmonic mean is calculated with the following steps:

  • Add the reciprocals of the numbers in the set. To find a reciprocal, flip the fraction so that the numerator becomes the denominator and the denominator becomes the numerator. For example, the reciprocal of 6/1 is 1/6.
  • Divide the answer by the number of items in the set.
  • Take the reciprocal of the result.
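The same steps in Python, as a minimal sketch (the function name `harmonic_mean` and the example values are chosen for illustration):

```python
from typing import Sequence


def harmonic_mean(values: Sequence[float]) -> float:
    """Harmonic mean: reciprocal of the mean of the reciprocals."""
    if len(values) == 0 or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs strictly positive data")
    recip_sum = sum(1.0 / v for v in values)  # add the reciprocals
    mean_recip = recip_sum / len(values)      # divide by the number of items
    return 1.0 / mean_recip                   # take the reciprocal of the result


print(harmonic_mean([2, 3, 6]))  # 1/2 + 1/3 + 1/6 = 1, so the result is about 3
```

Python's standard library also provides `statistics.harmonic_mean`, which can be used instead of a hand-written function.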
The harmonic mean is not often used in day-to-day statistics but appears quite often in statistical formulas. For example, for a two-group t-statistic with unequal sample sizes in the two groups, the t value can be calculated using the following formula, in which the harmonic mean measures the average sample size.
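One standard way to write such a formula is shown here as a sketch. The notation is an assumption of this example and is not taken from the original figure: group sizes $n_1$ and $n_2$, group means $\bar{x}_1$ and $\bar{x}_2$, and pooled standard deviation $s_p$.

```latex
n_h = \frac{2}{\dfrac{1}{n_1} + \dfrac{1}{n_2}} = \frac{2\, n_1 n_2}{n_1 + n_2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{2 / n_h}}
```

Because $2/n_h = 1/n_1 + 1/n_2$, this is the usual pooled two-sample t-statistic, with the harmonic mean $n_h$ playing the role of the common per-group sample size.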


Least Squares Mean is a mean estimated from a linear model. Least squares means are adjusted for other terms in the model (like covariates), and are less sensitive to missing data. Theoretically, they are better estimates of the true population mean.

In a previous post "Least squares means (marginal means) vs. means", the calculation of least squares mean is compared with the arithmetic mean.
In analyses of clinical trial data, the least-squares mean is used more frequently than the arithmetic mean because it is calculated from the analysis model (for example, analysis of variance, analysis of covariance,...). When the data are log-transformed before analysis, the difference between two least-squares means, back-transformed to the original scale, is the ratio of geometric least-squares means (or geometric least-squares mean ratio). This ratio, along with its 90% confidence interval, is the common approach for assessing bioequivalence. 
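As a small illustration of how a geometric least-squares mean ratio is usually obtained, the sketch below back-transforms a difference of least-squares means estimated on the log scale. The function name, the inputs, and the numbers are hypothetical; `t_crit` is assumed to be the 0.95 quantile of the t distribution with the model's error degrees of freedom (which gives a two-sided 90% interval), taken from the model output rather than computed here.

```python
import math
from typing import Tuple


def geometric_mean_ratio(diff_log: float, se_log: float, t_crit: float) -> Tuple[float, float, float]:
    """Back-transform a log-scale LS mean difference into a ratio and its CI."""
    ratio = math.exp(diff_log)
    lower = math.exp(diff_log - t_crit * se_log)
    upper = math.exp(diff_log + t_crit * se_log)
    return ratio, lower, upper


# Hypothetical model output on the natural-log scale
ratio, lower, upper = geometric_mean_ratio(diff_log=0.05, se_log=0.04, t_crit=1.70)
print(f"ratio = {ratio:.3f}, 90% CI = ({lower:.3f}, {upper:.3f})")
```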

Trimmed Mean may also be called truncated mean and is the arithmetic mean of data values after a certain number or proportion of the highest and/or lowest data values have been discarded. The data values to be discarded can be one-sided or two-sided. 

The key to the trimmed mean calculation is to determine the percentage of data to be discarded and whether the trimming is one-sided or two-sided. The percentage of data to be discarded may be tied to the percentage of missing data. 
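A minimal sketch of a one-sided or two-sided trimmed mean follows. The function name `trimmed_mean` and its parameters are assumptions of this example, and the number of values trimmed on each side is rounded down.

```python
from typing import Sequence


def trimmed_mean(values: Sequence[float], lower_frac: float = 0.0, upper_frac: float = 0.0) -> float:
    """Arithmetic mean after discarding a fraction of the lowest and/or highest values."""
    if len(values) == 0:
        raise ValueError("no data")
    if lower_frac < 0 or upper_frac < 0 or lower_frac + upper_frac >= 1:
        raise ValueError("trimming fractions must be >= 0 and sum to less than 1")
    ordered = sorted(values)
    n = len(ordered)
    k_low = int(n * lower_frac)    # number of lowest values to drop (rounded down)
    k_high = int(n * upper_frac)   # number of highest values to drop (rounded down)
    kept = ordered[k_low:n - k_high]
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 100]
print(trimmed_mean(data, lower_frac=0.2, upper_frac=0.2))  # two-sided: mean of [2, 3, 4] = 3.0
print(trimmed_mean(data, upper_frac=0.2))                  # one-sided: mean of [1, 2, 3, 4] = 2.5
```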

Trimmed mean can be calculated and then used to fill in the missing data - a single imputation method for handling the missing data. Trimmed mean as a single imputation method for missing data has its limitations, but it is still used in analyses of clinical trials - usually for sensitivity analyses.

In the ICH E9-R1 "Addendum on Estimands and Sensitivity Analysis in Clinical Trials" training material, the trimmed mean is mentioned as one approach to handling intercurrent events under the composite strategy. 
 


Monday, January 11, 2021

Single Imputation Methods for Missing Data: LOCF, BOCF, LRCF (Last Rank Carried Forward), and NOCB (Next Observation Carried Backward)

Missing data are always an issue when analyzing data from clinical trials. Missing data handling has moved toward model-based approaches (such as multiple imputation and mixed model repeated measures (MMRM)). The single imputation methods, while heavily criticized and cast out, remain practical approaches for handling missing data, especially for sensitivity analyses.

Single imputation methods replace a missing data point with a single value, and analyses are conducted as if all the data were observed. The single value used to fill in the missing observation usually comes from the observed values of the same subject: Last Observation Carried Forward (LOCF), Baseline Observation Carried Forward (BOCF), and Next Observation Carried Backward (NOCB, the focus of this post). The single value can also be derived from other sources: Last Rank Carried Forward (LRCF), best or worst case imputation (assigning the worst possible value of the outcome to dropouts for a negative reason (treatment failure) and the best possible value to dropouts for a positive reason (cures)), mean value imputation, trimmed mean, ... Single imputation approaches also include regression imputation, which imputes the predictions from a regression of the missing variables on the observed variables, and hot deck imputation, which matches the case with missing values to a case with observed values that is similar with respect to the observed variables and then imputes the observed values of that respondent.

In this post, we discuss the single imputation methods LOCF, BOCF, LRCF, and NOCB (the focus of this post). 

Last Observation Carried Forward (LOCF): A single imputation technique that imputes the last measured outcome value for participants who either drop out of a clinical trial or for whom the final outcome measurement is missing. LOCF is usually used in a longitudinal study design where the outcome is measured repeatedly at pre-specified intervals. LOCF usually requires at least one post-baseline measure. LOCF is a widely used single imputation method.

Baseline Observation Carried Forward (BOCF): A single imputation technique that imputes the baseline outcome value for participants who either drop out of a clinical trial or for whom the final outcome measurement is missing. BOCF is usually used in a study design with perhaps only one post-baseline measure (i.e., the outcome is only measured at the baseline and at the end of the study).

Last Rank Carried Forward (LRCF): The LRCF method carries forward the rank of the last observed value at the corresponding visit to the last visit and is the non-parametric version of LOCF. However, unlike LOCF, which is based on observations from the same subject, the ranks in the LRCF method come from all subjects with non-missing observations at a specific visit. Because the number of missing values differs from the early visits to the later visits, repeated ranking, carrying forward, and re-ranking are needed. Here are some good references for LRCF:

LRCF is thought to have the following features:

In a paper by Jing et al, the LRCF was used for missing data imputation: 

"...The last rank carried forward or last observation carried forward was assigned to patients who withdrew prematurely from the study or study drug for other reasons or who did not perform the 6-minute walk test for any reason not mentioned above (eg, missed visit), provided that the patient performed at least 1 postbaseline 6-minute walk test."
Next Observation Carried Backward (NOCB): NOCB is a similar approach to LOCF but works in the opposite direction by taking the first observation after the missing value and carrying it backward. NOCB may also be called Next Value Carried Backward (NVCB) or Last Observation Carried Backward (LOCB).
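The three within-subject methods (LOCF, BOCF, and NOCB) can be compared on one subject's visit sequence. This is a minimal sketch: the function names are made up for this example, `None` marks a missing visit, and the first element is assumed to be the baseline value. Rules that a real analysis would need (for example, requiring at least one post-baseline value for LOCF) are not enforced.

```python
from typing import List, Optional

Series = List[Optional[float]]


def locf(visits: Series) -> Series:
    """Fill each missing visit with the most recent earlier observed value."""
    out: Series = []
    last = None
    for v in visits:
        if v is not None:
            last = v
        out.append(last)
    return out


def nocb(visits: Series) -> Series:
    """Fill each missing visit with the next later observed value."""
    return locf(visits[::-1])[::-1]


def bocf(visits: Series) -> Series:
    """Fill each missing visit with the baseline (first) value."""
    baseline = visits[0]
    return [baseline if v is None else v for v in visits]


subject = [10, 12, None, 15, None]   # baseline, then four post-baseline visits
print(locf(subject))   # [10, 12, 12, 15, 15]
print(nocb(subject))   # [10, 12, 15, 15, None]
print(bocf(subject))   # [10, 12, 10, 15, 10]
```

The last visit stays missing under NOCB because there is no later observation to carry backward.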

NOCB may be useful in handling missing data arising from an external control group, Real-World Data (RWD), or electronic health records (EHRs), where the outcome data collection is usually not structured and does not follow a pre-specified visit schedule. 

I can foresee that NOCB may also be an approach for handling the missing data due to the COVID-19 pandemic. Due to the pandemic, subjects may not be able to come to the clinic for the outcome measure at the end of the study. The outcome measure may be performed at a later time, beyond the visit window allowance. Instead of having a missing observation for the end-of-study visit, the NOCB approach can be applied to carry the next available outcome measure backward. 

The NOCB approach, while not popular, can be found in some publications and regulatory approval documents. Here are some examples: 


In an article by Wyles et al (2015, NEJM) Daclatasvir plus Sofosbuvir for HCV in Patients Coinfected with HIV-1, "Missing response data at post-treatment week 12 were inferred from the next available HCV RNA measurement with the use of a next-value-carried-backward approach."

In BLA 761052 of Brineura (cerliponase alfa) Injection Indication(s) for Late-Infantile Neuronal Ceroid Lipofuscinosis Type 2 (CLN2)- Batten Disease, the NOCB was used to handle the missing data for comparison to the data from a natural history study. 

Because intervals between clinical visits vary a lot in Study 901, the agency recommended performing analyses using both the last available Motor score and next observation carried backward (NOCB) for the intermediate data points although the former one is determined as the primary. 

In the FDA Briefing Document for the Endocrinologic and Metabolic Drugs Advisory Committee Meeting for NDA 210645, Waylivra (volanesorsen) injection for the treatment of familial chylomicronemia syndrome, NOCB was used as one of the sensitivity analyses:

Similar planned (prespecified) analyses using different variables, such as slightly different endpoint definitions (e.g. worst maximum pain intensity versus average maximum pain intensity), or imputation methods for missing data (next observation carried backward versus imputation of zero for missing values) did not demonstrate treatment differences.

 Missing values were pre-specified to be imputed using Next Observation Carried Back (NOCB); i.e., if a patient did not complete the questionnaire for several weeks, the next value entered was assumed to have occurred during all intervening (missing) weeks.

 Missing data for any post-baseline visit will be imputed by using Next Observation Carried Back (NOCB) if there is a subsequent score available. Missing data after the last available score of each patient will not be imputed.

In NDA 212157 of Celecoxib Oral Solution for the treatment of acute migraine, NOCB was used for a sensitivity analysis:

Headache Pain Freedom at 2 hours - Sensitivity Analysis

To analyze the missing data for the primary endpoint, Dr. Ling performed an analysis analyzing patients who took rescue medications as nonresponders and then also imputing missing data at the 2-hour time point using the next available time point of information (Next Observation Carried Backward (NOCB)) or a worst-case type of imputation (latter not shown in table).

Single imputation methods are generally not recommended for the primary analysis because of the following disadvantages (issues): 

  • Single imputation usually does not provide an unbiased estimate
  • Inferences (tests and confidence intervals) based on the filled-in data can be distorted by bias if the assumptions underlying the imputation method are invalid
  • Statistical precision is overstated because the imputed values are assumed to be true.
  • Single imputation methods risk biasing the standard error downwards by ignoring the uncertainty of imputed values. Therefore, the confidence intervals for the treatment effect calculated using single imputation methods may be too narrow and give an artificial impression of precision that does not really exist.  
  • Single imputation methods such as LOCF, NOCB, and BOCF do not reflect MAR (missing at random) data mechanisms.
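The point about overstated precision can be seen in a small numerical sketch. Mean value imputation is used here only because it makes the effect easy to see, and the data are made up. Filling in the missing values and then treating them as observed shrinks the standard error even though no new information has been added.

```python
import math
import statistics
from typing import List


def standard_error(values: List[float]) -> float:
    """Standard error of the mean, treating every value as truly observed."""
    return statistics.stdev(values) / math.sqrt(len(values))


observed = [4.0, 6.0, 8.0, 10.0, 12.0]   # hypothetical observed outcomes
n_missing = 5                            # hypothetical number of missing outcomes

# Single (mean) imputation: every missing value is replaced by the observed mean
filled = observed + [statistics.mean(observed)] * n_missing

print(standard_error(observed))  # based on the 5 observed values only
print(standard_error(filled))    # smaller, because imputed values are treated as real data
```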


Monday, January 04, 2021

Synthetic Control Arm (SCA), External Control, Historical Control

Lately, the term 'synthetic control' or 'synthetic control arm' (SCA for short) has become popular. It is mainly driven by the desire to design more efficient clinical trials that depart from the traditional gold-standard RCT (randomized controlled trial) with a concurrent control group. 

In a previous post, I compared historical control versus external control in clinical trials. The subtle difference is mainly in the time element. Historical control is one type of external control, but the reverse is not true. External control can be historical control or contemporaneous control. For example, in a clinical trial to assess the efficacy and safety of the donor lung preserved using ex-vivo lung perfusion (EVLP) technique, the EVLP lung transplantation cohort was compared to a contemporaneous (not concurrent) control cohort that was formed through the matched control from the traditional lung transplantation patients.   

Then what is 'synthetic control' or 'synthetic control arm'?

Synthetic control arm is the use of synthetic data as a control arm in clinical trials. According to an article "Synthetic data in the civil service" in the latest issue of SIGNIFICANCE, synthetic data is defined as "artificially generated data that are modelled on real data, with the same structure and properties as the original data, except that they do not contain any real or specific information about individuals. The goal of synthetic data generation is to create a realistic copy of the real data set, carefully maintaining the nuances of the original data, but without compromising important pieces of personal information."

Synthetic control arm is a control arm generated through existing data resources representing normal patient statistics. Synthetic control arm can serve as a comparator for a single-arm clinical trial or augment the smaller concurrent control group (for example with active:control ratio of 3:1 or 4:1) in RCTs. 

In a presentation at the Harvard Medical School Executive Education Webinar Series, "Synthetic Control Arms in Clinical Trials and Regulatory Applications", Mr. Chatterjee defined the 'synthetic control arm' as the following:

In a paper by Thorlund et al, "Synthetic and External Controls in Clinical Trials – A Primer for Researchers", the authors stated that synthetic control arms are external control arms and that the two terms can be used interchangeably:
External control arms are also called “synthetic” control arms as they are not part of the original concurrent patient sample that would have been randomized into the experimental or the control treatment arms as in a traditional RCT. External controls can take many forms. For example, external control arms can be established using aggregated or pooled data from placebo/control arms in completed RCTs or using RWD (Real World Data) and pharmacoepidemiological methods. Pooled data from historical RCTs can serve as external controls depending on the availability of selected “must have” data, similarity of patients, recency and relevancy of experimental treatments that were tested, availability and similarity of relevant endpoints (eg, operational definitions and assessments), and similarity of other important study procedures that were conducted in these historical trials. It is important to note that using control data from historical RCTs still results in a nonrandomized comparison but has the advantage of standardized data collection in a trial setting and patients who enroll in clinical trials may have more similar characteristics than those who do not.

However, I think that there are subtle differences between these two terms. With 'synthetic' control arms, the term 'synthetic' implies that there is some selection, manipulation, derivation, matching, pooling, or borrowing from the source data. Just as meta-analysis is also called research synthesis and requires statistical approaches to combine the results from multiple scientific studies, the 'synthetic' control also requires statistical approaches to process the data from multiple sources to form a control group that replaces the concurrent control in traditional RCTs. 

The source data for constructing synthetic control can be the data from previous RCT clinical trials, real-world data, registry data, data from natural history studies, electronic health records, ... The source data must be the subject-level data, not the summary or aggregate data. 

ICH E10 "CHOICE OF CONTROL GROUP AND RELATED ISSUES IN CLINICAL TRIALS" included "External Control (including Historical Control)" as one of the options for the control group in clinical trials. The external control here is not the same as synthetic control. 

1.3.5 External Control (Including Historical Control)
An externally controlled trial compares a group of subjects receiving the test treatment with a group of patients external to the study, rather than to an internal control group consisting of patients from the same population assigned to a different treatment. The external control can be a group of patients treated at an earlier time (historical control) or a group treated during the same time period but in another setting. The external control may be defined (a specific group of patients) or nondefined (a comparator group based on general medical knowledge of outcome). Use of this latter comparator is particularly treacherous (such trials are usually considered uncontrolled) because general impressions are so often inaccurate. So-called baseline controlled studies, in which subjects' status on therapy is compared with status before therapy (e.g., blood pressure, tumor size), have no internal control and are thus uncontrolled or externally controlled.  

How to Create a Synthetic Control Arm? 

The first step of creating a synthetic control arm is to harmonize the source data. The data from different sources or from different clinical trials should be standardized so that they can be used for the synthesis process. 

Various statistical approaches can be used to create a synthetic control arm. In an audiobook on synthetic control arms by Cytel, propensity scoring and Bayesian Dynamic Borrowing methods were discussed. 

The synthetic control arm can be considered as an approach of 'borrowing control' - i.e., some controls are borrowed from historical data. There are numerous options for borrowing controls: 

  • Pooling: adds historical controls to randomized controls 
  • Performance criterion: uses historical data to define performance criterion for current, treated-only trial to beat 
  • Test then pool: test if controls sufficiently similar for pooling 
  • Power priors: historical control discounted when added to randomized controls
  • Hierarchical modeling: variation between current vs. historical data is modeled in Bayesian fashion 
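To make the idea of discounting concrete, here is a minimal sketch of a power prior for a binary control outcome. The assumptions are all mine and are not taken from the sources cited in this post: a fixed discount weight `a0` between 0 and 1, a conjugate Beta(1, 1) initial prior, and made-up counts. With `a0 = 0` the historical controls are ignored, and with `a0 = 1` they are fully pooled with the randomized controls.

```python
from typing import Tuple


def power_prior_posterior(y_cur: int, n_cur: int, y_hist: int, n_hist: int,
                          a0: float, prior_a: float = 1.0, prior_b: float = 1.0) -> Tuple[float, float]:
    """Beta posterior parameters for the control response rate.

    Historical responders and non-responders are down-weighted by a0 before
    being combined with the current randomized controls.
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must be between 0 and 1")
    post_a = prior_a + a0 * y_hist + y_cur
    post_b = prior_b + a0 * (n_hist - y_hist) + (n_cur - y_cur)
    return post_a, post_b


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 10/40 responders in current controls, 30/100 in historical controls
for a0 in (0.0, 0.5, 1.0):
    a, b = power_prior_posterior(10, 40, 30, 100, a0)
    print(a0, (a, b), round(beta_mean(a, b), 3))
```

In practice the weight is often estimated from the data or tuned through simulation rather than fixed in advance; that step is outside this sketch.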

In the article by Thorlund et al, the pros and cons of different methods for generating synthetic control arms were discussed. 


In Mr. Chatterjee's presentation, "Synthetic Control Arms in Clinical Trials and Regulatory Applications", there is a diagram to describe the process for creating a synthetic control arm. 


Even though synthetic control arms, the use of real-world data, and single-arm clinical trials are very appealing, challenges lie ahead and regulatory acceptance is uncertain. There may be limited use in special cases (such as ultra-rare diseases, pediatric clinical trials) and for post-marketing activities (such as label expansion, label modification, post-marketing studies), but synthetic control arms are not yet ready for prime time as a replacement for the concurrent control in traditional RCTs. 

An article at Statnews.com, "Synthetic control arms can save time and money in clinical trials", notes the limitations:

Even with the FDA making the use of real-world data a strategic priority, synthetic control arms can’t be used across the board to replace control arms. Synthetic control arms require that the disease is predictable (think idiopathic pulmonary fibrosis) and that its standard of care is well-defined and stable. That certainly isn’t the case for every disease.

It’s also important to consider that even when information is available from real-world data sources, it may be difficult to extract or of low quality. Routinely captured health care data, such as electronic health records, are typically siloed, fragmented, and unstructured. They are also often incomplete and difficult to access. New tools and methodologies are needed to consolidate, organize, and structure real-world data to generate research-grade evidence and ensure that confounding variables are accounted for in analyses. Analytic techniques such as natural language processing and machine learning will be needed to extract relevant information from structured and unstructured data.

The same view is also expressed in a Pink Sheet article "External Control Arms: Better Than Single-Arm Studies But No Replacement For Randomization".

Synthetic control group derived from historical clinical trial data could augment smaller randomized trials and yield better information than single-arm studies, but this approach should not be viewed as a substitute for randomized trials where feasible

ADDITIONAL REFERENCES: