Sunday, November 29, 2020

Handling of Missing Data: Comparison of MMRM (mixed model repeated measures) versus MI (multiple imputation)

Longitudinal designs have become among the most commonly adopted in clinical trials. Because the outcome is measured at multiple visits, some subjects will usually lack outcome measurements at some visits (for example, after they drop out of the study or are lost to follow-up); this is where the missing data issue arises. If the outcome measure is a continuous variable, missing data can be handled implicitly with a mixed model for repeated measures (MMRM) or explicitly through multiple imputation (MI).

In their standard forms, both MMRM and MI rest on the missing at random (MAR) assumption, and both are model-based approaches suggested by EMA's Guideline on Missing Data in Confirmatory Clinical Trials and by the US National Research Council report The Prevention and Treatment of Missing Data in Clinical Trials. The US FDA has not issued its own guidance on handling missing data in clinical trials, but it generally follows the recommendations of the National Research Council.
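For readers who want the MAR assumption in symbols, one standard way to state it is sketched below. The notation is an assumption of this sketch and is not taken from either guideline: R is the vector of missingness indicators, Y_obs and Y_mis are the observed and missing parts of a subject's outcome vector, and X denotes fully observed covariates.

```latex
\Pr(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, X) = \Pr(R \mid Y_{\mathrm{obs}}, X)
```

In words: given what was observed, the chance that a value is missing does not depend on the unobserved value itself.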

Between MMRM and MI, which should be the primary method for handling missing data? For a long time, MMRM appears to have been the preferred method in the US for handling missing data and analyzing longitudinal data with continuous outcome measures. MI methods are generally used as sensitivity analyses to check the robustness of the primary analysis to departures from the MAR assumption. This can be seen in the article by Dr. Siddiqui of the FDA, "MMRM versus MI in Dealing with Missing Data - a Comparison Based on 25 NDA data sets", and in many NDA / BLA reviews (listed below).

FDA Statistical Review for NDA 210655 in the indication of Schizophrenia:
"The primary analysis was conducted on the change from baseline in the total PANSS score at Day 57 (primary time point) based on the ITT population. A mixed-effects model for repeated measures (MMRM) was used with treatment, visit, interaction of treatment and visit as fixed effects and the baseline total PANSS score as a covariate. Data from Days 15, 29, 43, and 57 were used. The unstructured covariance matrix was be used to model the within-subject variance-covariance errors."

"In addition to the model-based missing data approach of the MMRM model, the primary efficacy analysis was also analyzed using a pattern mixture model (PMM) and a multiple imputation approach as sensitivity analyses. "

FDA BLA 761037 Kevzara (sarilumab) in Treatment of rheumatoid arthritis
"The continuous HAQ-DI change from baseline at Week 16 was analyzed with a mixed model for repeated measures (MMRM). The repeated-measures analysis was based on the restricted maximum likelihood method assuming an unstructured covariance structure to model the within-subject errors. The model, including treatment, region, prior biologic use, visit (all visits from week 2 to week 16), and treatment-by-visit interaction as fixed effects and baseline as a covariate, was used to test the difference between each active treatment group versus placebo in the change from baseline in HAQ-DI at Week 16. The data collected after treatment discontinuation or rescue were set to missing. Therefore, the MMRM analysis assumed a missing-at-random (MAR) mechanism for missing data due to dropout and post-rescue data."

FDA NDA 203313/203314 S-2/S-3 Tresiba; Ryzodeg 70/30 for Glycemic Control in Patients with Diabetes
The applicant used a mixed effect model for repeated measure (MMRM) to assess the efficacy of IDegAsp compared with IDet. The MMRM model included treatment, sex, region, age group and visits as factors and baseline as covariate, and interactions between visits and all factors and covariate. An unstructured covariance matrix was utilized for model fitting.

Multiple imputation was performed as a sensitivity analysis.

sNDA for Merck's Dulera in the treatment of asthma (2019)

"Missing Data Handling and Sensitivity Analyses The primary analysis incorporated a control-based multiple imputation of missing data. Missing data for subjects who discontinued treatment early were estimated using the MF group; that is, the change from baseline AM post-dose ppFEV1 in patients who discontinued treatment and missed study visits was assumed to be similar to the change from baseline in patients who continued study visits through Week 12 in the MF treatment group. The dataset was first multiply imputed to have monotone missing patterns, then for each visit, a regression method was used to impute for missing data on both study drug arm and the control arm based on trend from the control arm. After applying the control-based multiple imputation, the cLDA analysis was performed. MF/F 100/10 mcg BID was considered superior to MF 100 mcg BID with a p-value less than 0.05. "


EMA seems to hold a different opinion on handling missing data with MMRM or MI. On several occasions, we have heard that EMA prefers the MI approach, especially reference-based multiple imputation, and that it is moving toward making reference-based multiple imputation the new standard approach to missing data.


Here is a table summarizing some comparisons between the MMRM and MI in handling the missing data. 

 

Missing data mechanism
MMRM: MAR (missing at random)
MI: MAR (missing at random)

Missing data imputation
MMRM: Individual missing values are not imputed, but the missing data are implicitly imputed.
MI: Individual missing values are explicitly imputed.

Number of steps for calculations
MMRM: One step.
MI: At least three steps:
1. Imputation model to create multiple data sets with the missing values filled in
2. Analysis model to analyze each imputed data set
3. Rubin's rules to combine the results for inference

Analysis model
MMRM: Mixed model with a maximum likelihood-based method.
MI: Analysis of covariance (ANCOVA) or a mixed model using a maximum likelihood-based method.

Data points used in analyses
MMRM: Uses all observed data points from all visits.
MI: Usually, with ANCOVA, only the data points for the corresponding visit (with imputed values) are used.

SAS procedure(s)
MMRM: PROC MIXED
MI: Imputation model: PROC MI. Analysis model: PROC MIXED, PROC GLM, PROC GENMOD, ... Rubin's rules: PROC MIANALYZE.

Results
MMRM and MI: The two approaches will be approximately equivalent, provided the variables used in the imputation model are the same as those included in the analysis model, and conditionals are accommodated by a single joint model. In such settings, MI essentially provides an approximation to the observed likelihood analysis. If an infinite number of imputations could be performed, then the two approaches would be equivalent. In practice, the level of equivalence will depend on the number of imputations because of the Monte Carlo (simulation) sampling variability of the imputation process, and thus will be stronger for a larger number of imputations.

Auxiliary variables
MMRM: Cannot be used.
MI: Auxiliary variables can be used in the imputation model to improve the accuracy of the missing data prediction.

Information observed post-randomization
MMRM: Cannot be included in the MMRM model.
MI: Can be included in the imputation model to improve the accuracy of the missing data prediction, but cannot be included in the analysis model (the MI approach allows the covariates in the imputation model to differ from those in the analysis model).

Justification of MAR assumption
MMRM: Not available through the MMRM model.
MI: Can be assessed through the tipping point approach or delta-based imputation.

Handling MNAR (missing not at random)
MMRM: Not directly available through MMRM.
MI: Can be performed through PMM (pattern-mixture model), reference-based, or control-based multiple imputation.

For studies with only one post-baseline measure
MMRM: Not appropriate.
MI: Appropriate to use MI to impute the missing data and then run an analysis of covariance model as the analysis model.

For outcome measures that are not continuous variables
MMRM: As with MMRM, there are statistical approaches that handle missing data without employing explicit imputation. As mentioned in the EMA guideline: “For categorical responses and count data, the so-called marginal (e.g. generalized estimating equations (GEE)) and random-effects (e.g. generalized linear mixed models (GLMM)) approaches are in use. Likelihood-based methods (MMRM and GLMM) and some extended GEE (i.e. weighted GEE) models are applicable under MCAR and MAR assumptions.”
MI: The MI approach can be easily applied to outcome measures that are categorical responses or count data with missing data. The analysis model may need to be PROC LOGISTIC, PROC GLIMMIX, PROC NLMIXED, or PROC GENMOD.

Preferred by regulatory agencies
MMRM: US FDA, but with multiple imputation approaches as sensitivity analyses (for example, reference-based MI, PMM, tipping point).
MI: EMA
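The combining step of MI (commonly known as Rubin's rules, which PROC MIANALYZE carries out in SAS) is simple enough to sketch directly. The Python function below is a minimal illustration for a single scalar estimate, such as a treatment difference; the function name `pool_rubin` and its interface are assumptions of this sketch, not part of any SAS procedure.

```python
from math import inf
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m point estimates and their squared standard errors.

    Returns (pooled_estimate, total_variance, degrees_of_freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance
    if b == 0:
        df = inf                 # no between-imputation variability
    else:
        r = (1 + 1 / m) * b / w  # relative increase in variance
        df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, t, df
```

The pooled estimate divided by the square root of the total variance is then referred to a t distribution with the returned degrees of freedom. The `(1 + 1/m)` factor is where the number of imputations enters: as m grows, the Monte Carlo contribution to the total variance shrinks.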



2 comments:

Anonymous said...

Very informative. Thanks for sharing!

Joakim Englund said...

Thank you Dr Deng for a very informative blog post (as always)! It very well mimics my own understanding of the topic as well.

However, I do wonder if it is reasonable to (as FDA might prefer) first do MMRM as the primary analysis and then MI with MMRM as the analysis model (now with imputed values) as an extra sensitivity analysis. The table you provide seems to indicate that this is an option (if I read it correctly). But would it be more logical/informative to compare a primary MMRM analysis with an MI-ANCOVA (for a specific timepoint)? What is your opinion on this?