Monday, January 04, 2021

Synthetic Control Arm (SCA), External Control, Historical Control

Lately, the term 'synthetic control' or 'synthetic control arm' (SCA for short) has become popular. The interest is driven mainly by the desire to design more efficient clinical trials that depart from the traditional gold standard: the randomized controlled trial (RCT) with a concurrent control group.

In a previous post, I compared historical control versus external control in clinical trials. The subtle difference is mainly in the time element. Historical control is one type of external control, but the reverse is not true: an external control can be either a historical control or a contemporaneous control. For example, in a clinical trial to assess the efficacy and safety of donor lungs preserved using the ex-vivo lung perfusion (EVLP) technique, the EVLP lung transplantation cohort was compared to a contemporaneous (not concurrent) control cohort formed by matching controls from patients who received traditional lung transplantation.

Then what is 'synthetic control' or 'synthetic control arm'?

A synthetic control arm is the use of synthetic data as a control arm in clinical trials. According to an article "Synthetic data in the civil service" in the latest issue of SIGNIFICANCE, synthetic data is defined as "artificially generated data that are modelled on real data, with the same structure and properties as the original data, except that they do not contain any real or specific information about individuals. The goal of synthetic data generation is to create a realistic copy of the real data set, carefully maintaining the nuances of the original data, but without compromising important pieces of personal information."

A synthetic control arm is a control arm generated from existing data resources representing normal patient statistics. It can serve as the comparator for a single-arm clinical trial or augment a smaller concurrent control group (for example, with an active:control ratio of 3:1 or 4:1) in an RCT.

In a presentation in the Harvard Medical School Executive Education Webinar Series, "Synthetic Control Arms in Clinical Trials and Regulatory Applications", Mr. Chatterjee defined the 'synthetic control arm' as follows: [slide image not reproduced]

In the paper "Synthetic and External Controls in Clinical Trials – A Primer for Researchers", Thorlund et al. state that synthetic control arms are external control arms, so that the two terms can be used interchangeably:
External control arms are also called “synthetic” control arms as they are not part of the original concurrent patient sample that would have been randomized into the experimental or the control treatment arms as in a traditional RCT. External controls can take many forms. For example, external control arms can be established using aggregated or pooled data from placebo/control arms in completed RCTs or using RWD (Real World Data) and pharmacoepidemiological methods. Pooled data from historical RCTs can serve as external controls depending on the availability of selected “must have” data, similarity of patients, recency and relevancy of experimental treatments that were tested, availability and similarity of relevant endpoints (eg, operational definitions and assessments), and similarity of other important study procedures that were conducted in these historical trials. It is important to note that using control data from historical RCTs still results in a nonrandomized comparison but has the advantage of standardized data collection in a trial setting and patients who enroll in clinical trials may have more similar characteristics than those who do not.

However, I think that there are subtle differences between these two terms. The word 'synthetic' implies some selection, manipulation, derivation, matching, pooling, or borrowing from the source data. Just as meta-analysis is also called research synthesis and requires statistical approaches to combine the results from multiple scientific studies, a 'synthetic' control requires statistical approaches to process data from multiple sources into a control group that replaces the concurrent control of a traditional RCT.

The source data for constructing a synthetic control can be data from previous RCTs, real-world data, registry data, data from natural history studies, electronic health records, and so on. The source data must be subject-level data, not summary or aggregate data.

ICH E10 "CHOICE OF CONTROL GROUP AND RELATED ISSUES IN CLINICAL TRIALS" includes "External Control (including Historical Control)" as one of the options for the control group in clinical trials. The external control described there is not the same as a synthetic control:

1.3.5 External Control (Including Historical Control)
An externally controlled trial compares a group of subjects receiving the test treatment with a group of patients external to the study, rather than to an internal control group consisting of patients from the same population assigned to a different treatment. The external control can be a group of patients treated at an earlier time (historical control) or a group treated during the same time period but in another setting. The external control may be defined (a specific group of patients) or nondefined (a comparator group based on general medical knowledge of outcome). Use of this latter comparator is particularly treacherous (such trials are usually considered uncontrolled) because general impressions are so often inaccurate. So-called baseline controlled studies, in which subjects' status on therapy is compared with status before therapy (e.g., blood pressure, tumor size), have no internal control and are thus uncontrolled or externally controlled.  

How to Create a Synthetic Control Arm? 

The first step in creating a synthetic control arm is to harmonize the source data. Data from different sources or from different clinical trials should be standardized (same variable names, units, and definitions) so that they can be used in the synthesis process.
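To make the harmonization step concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two sources (`trial_a`, `registry_b`), their column names, and the unit conversions are invented, and a real project would more likely map to an established standard such as CDISC rather than to an ad hoc schema.

```python
# Hypothetical mapping from each source's raw column names to a common schema.
COLUMN_MAP = {
    "trial_a": {"SUBJID": "subject_id", "AGE": "age_years", "FEV1_L": "fev1_litres"},
    "registry_b": {"patient": "subject_id", "age_months": "age_years", "fev1_ml": "fev1_litres"},
}

# Hypothetical unit conversions: divide the raw value by this number
# to express it in the unit of the harmonized column.
UNIT_DIVISOR = {
    ("registry_b", "age_years"): 12,       # months -> years
    ("registry_b", "fev1_litres"): 1000,   # millilitres -> litres
}


def harmonize(source, records):
    """Rename columns to the common schema and convert numeric units."""
    mapping = COLUMN_MAP[source]
    harmonized = []
    for rec in records:
        row = {"source": source}  # keep provenance for later modelling
        for raw_name, std_name in mapping.items():
            value = rec[raw_name]
            if isinstance(value, (int, float)):
                value = value / UNIT_DIVISOR.get((source, std_name), 1)
            row[std_name] = value
        harmonized.append(row)
    return harmonized


# Invented subject-level records from the two sources.
trial_a = [{"SUBJID": "A-001", "AGE": 54, "FEV1_L": 2.1}]
registry_b = [{"patient": "B-17", "age_months": 660, "fev1_ml": 1900}]

pooled = harmonize("trial_a", trial_a) + harmonize("registry_b", registry_b)
for row in pooled:
    print(row)
```

Keeping a `source` column in the pooled data matters later, because most borrowing methods need to know which records are historical and which are concurrent.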

Various statistical approaches can be used to create a synthetic control arm. In an audiobook on synthetic control arms by Cytel, propensity scoring and Bayesian Dynamic Borrowing methods were discussed. 
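As a rough illustration of the propensity scoring idea (this is not the procedure from the Cytel audiobook), the sketch below fits a logistic model for the probability of being in the single-arm trial rather than in the external data, then performs 1:1 greedy nearest-neighbour matching within a caliper. The data are simulated, the two covariates are assumed to be standardized, and the function names, learning rate, and caliper are all arbitrary choices made for this example.

```python
import math
import random


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression by batch gradient descent; returns [intercept, coefs...]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1 / (1 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def propensity(w, x):
    """Estimated probability that a subject with covariates x is a trial subject."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1 / (1 + math.exp(-z))


def match_controls(trial, external, caliper=0.1):
    """1:1 greedy nearest-neighbour matching on the propensity score, without replacement."""
    X = trial + external
    y = [1] * len(trial) + [0] * len(external)
    w = fit_logistic(X, y)
    available = {i: propensity(w, x) for i, x in enumerate(external)}
    pairs = []
    for t_idx, x in enumerate(trial):
        if not available:
            break
        ps_t = propensity(w, x)
        c_idx = min(available, key=lambda i: abs(available[i] - ps_t))
        if abs(available[c_idx] - ps_t) <= caliper:
            pairs.append((t_idx, c_idx))
            del available[c_idx]  # each external subject is used at most once
    return pairs


# Simulated, standardized covariates (e.g. age and a severity score); purely illustrative.
random.seed(0)
trial = [[random.gauss(0.3, 1), random.gauss(0.2, 1)] for _ in range(20)]
external = [[random.gauss(0.0, 1), random.gauss(0.0, 1)] for _ in range(80)]

matches = match_controls(trial, external)
print(f"{len(matches)} of {len(trial)} trial subjects matched to an external control")
```

The matched external subjects would then form the synthetic control arm. In practice one would also check covariate balance after matching and consider alternatives such as weighting or stratification on the propensity score.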

The synthetic control arm can be considered an approach to 'borrowing controls', i.e., some controls are borrowed from historical data. There are numerous options for borrowing controls:

  • Pooling: adds the historical controls to the randomized controls
  • Performance criterion: uses historical data to define a performance criterion that the current, treated-only trial has to beat
  • Test then pool: tests whether the historical and randomized controls are sufficiently similar, and pools them only if they are
  • Power priors: the historical controls are discounted when added to the randomized controls
  • Hierarchical modeling: the variation between current and historical data is modeled in a Bayesian fashion
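The power prior option from the list above can be sketched for the simplest case, a binary endpoint with a conjugate Beta prior. The historical likelihood is raised to a power a0 between 0 and 1, so a0 = 0 ignores the historical controls and a0 = 1 is equivalent to full pooling. The counts below are invented, and the fixed value of a0 is an assumption; choosing it (or estimating it) is the hard part in a real trial.

```python
def power_prior_posterior(y_c, n_c, y_h, n_h, a0, a=1.0, b=1.0):
    """Posterior Beta(alpha, beta) for the control response rate.

    y_c, n_c : responders and sample size in the concurrent (randomized) control
    y_h, n_h : responders and sample size in the historical control
    a0       : discount applied to the historical data, 0 <= a0 <= 1
    a, b     : parameters of the initial Beta prior (Beta(1, 1) is uniform)
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie between 0 and 1")
    alpha = a + a0 * y_h + y_c
    beta = b + a0 * (n_h - y_h) + (n_c - y_c)
    return alpha, beta


# Invented numbers: 6/20 responders in the small concurrent control,
# 45/100 responders in the historical control.
for a0 in (0.0, 0.5, 1.0):
    alpha, beta = power_prior_posterior(6, 20, 45, 100, a0)
    mean = alpha / (alpha + beta)
    print(f"a0={a0}: Beta({alpha}, {beta}), posterior mean {mean:.3f}, "
          f"effective historical n = {a0 * 100:.0f}")
```

With a0 = 0.5 the 100 historical subjects count as roughly 50, which shows how the discount limits the influence of historical data that may not be fully comparable to the current trial.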

The article by Thorlund et al. discusses the pros and cons of the different methods for generating synthetic control arms.


Mr. Chatterjee's presentation, "Synthetic Control Arms in Clinical Trials and Regulatory Applications", includes a diagram that describes the process for creating a synthetic control arm.


Even though synthetic control arms, the use of real-world data, and single-arm clinical trials are very appealing, challenges lie ahead and regulatory acceptance is uncertain. These approaches may see limited use in special cases (such as ultra-rare diseases and pediatric clinical trials) and in post-marketing activities (such as label expansion, label modification, and post-marketing studies), but they are not ready for prime time as a replacement for the concurrent control in traditional RCTs.

An article at Statnews.com, "Synthetic control arms can save time and money in clinical trials", notes these limitations:

Even with the FDA making the use of real-world data a strategic priority, synthetic control arms can’t be used across the board to replace control arms. Synthetic control arms require that the disease is predictable (think idiopathic pulmonary fibrosis) and that its standard of care is well-defined and stable. That certainly isn’t the case for every disease.

It’s also important to consider that even when information is available from real-world data sources, it may be difficult to extract or of low quality. Routinely captured health care data, such as electronic health records, are typically siloed, fragmented, and unstructured. They are also often incomplete and difficult to access. New tools and methodologies are needed to consolidate, organize, and structure real-world data to generate research-grade evidence and ensure that confounding variables are accounted for in analyses. Analytic techniques such as natural language processing and machine learning will be needed to extract relevant information from structured and unstructured data.

The same view is expressed in a Pink Sheet article, "External Control Arms: Better Than Single-Arm Studies But No Replacement For Randomization":

Synthetic control group derived from historical clinical trial data could augment smaller randomized trials and yield better information than single-arm studies, but this approach should not be viewed as a substitute for randomized trials where feasible

