Saturday, August 01, 2015

Missing Data Mechanisms/Assumptions and the Corresponding Imputation Methods

Missing data is one of the classical issues in clinical trials and biostatistics. Since the National Research Council's report on missing data was issued in 2010, the paradigm has shifted toward preventing missing data. Even though prevention has been given great emphasis, missing data remain inevitable in nearly every clinical trial. When analyzing a clinical trial with missing data, it is common to perform various sensitivity analyses to see how robust the study result is to the way the missing data are handled. How the missing data should be handled depends on the assumptions made about the missing data mechanism.

Missing Data Assumptions and the Corresponding Imputation Methods 

Assumptions and their definitions:

No assumption: the method is applied without assuming any missing data mechanism.

Missing Completely at Random (MCAR): The missingness is independent of both the observed and the unobserved data. The probability of missingness is the same for all units.

Missing at Random (MAR), the ignorability assumption: Conditional on the observed data, the missingness is independent of the unobserved measurements. The probability that a variable is missing depends only on available information.

Missing Not at Random (MNAR): Neither MCAR nor MAR. The missingness depends on unobserved predictors or on the missing value itself. Missingness is no longer at random if it depends on information that has not been recorded and this information also predicts the missing values.
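To make the three mechanisms concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: the function name `observed_means`, the normal data, and the logistic form of the missingness probability are not from any particular trial or package. It generates an always-observed covariate x and an outcome y, deletes y under each mechanism, and reports the mean of the y values that remain.

```python
import math
import random
import statistics as st


def logistic(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def observed_means(n: int = 20000, seed: int = 1) -> dict:
    """Mean of y among non-missing cases under each (simulated) mechanism."""
    rng = random.Random(seed)
    all_y = []
    kept = {"MCAR": [], "MAR": [], "MNAR": []}
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)      # covariate, always observed
        y = x + rng.gauss(0.0, 1.0)  # outcome, may go missing
        all_y.append(y)
        # MCAR: every unit has the same 50% chance of being missing.
        if rng.random() >= 0.5:
            kept["MCAR"].append(y)
        # MAR: the chance of being missing depends only on the observed x.
        if rng.random() >= logistic(2.0 * x):
            kept["MAR"].append(y)
        # MNAR: the chance of being missing depends on the value of y itself.
        if rng.random() >= logistic(2.0 * y):
            kept["MNAR"].append(y)
    result = {"full": st.fmean(all_y)}
    result.update({name: st.fmean(vals) for name, vals in kept.items()})
    return result
```

In this setup the observed-case mean stays close to the full-data mean only under MCAR. Under MAR it is biased as well, but because the missingness is driven by the observed x, an analysis that conditions on x can still recover it; under MNAR that route is closed.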

Methods that rely on no assumption:

LOCF (last observation carried forward)

BOCF (baseline observation carried forward)

WOCF (worst observation carried forward)

Imputation based on logical rules
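The carried-forward rules are easy to state in code. The following is a minimal sketch for one subject's visit sequence, with `None` marking a missed visit. The function names are hypothetical, and the WOCF version assumes one common convention (carry forward the worst value observed so far, with "worst" meaning highest unless told otherwise); protocols may define it differently.

```python
from typing import List, Optional

Visits = List[Optional[float]]


def locf(visits: Visits) -> Visits:
    """Replace each missing visit with the most recent observed value."""
    out, last = [], None
    for v in visits:
        if v is not None:
            last = v
        out.append(last)  # stays None if nothing has been observed yet
    return out


def bocf(visits: Visits) -> Visits:
    """Replace each missing visit with the baseline (first) value."""
    baseline = visits[0]
    return [baseline if v is None else v for v in visits]


def wocf(visits: Visits, higher_is_worse: bool = True) -> Visits:
    """Replace each missing visit with the worst value observed so far."""
    pick = max if higher_is_worse else min
    out, worst = [], None
    for v in visits:
        if v is not None:
            worst = v if worst is None else pick(worst, v)
            out.append(v)
        else:
            out.append(worst)
    return out
```

For a subject with visits `[10.0, 12.0, None, 9.0, None]`, LOCF fills the gaps with 12.0 and 9.0, BOCF fills both with 10.0, and this WOCF variant fills both with 12.0.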
Methods under Missing Completely at Random (MCAR):

CC (complete-case analysis), also called listwise deletion

Pairwise deletion

Available-case analysis

Single-value imputation, for example mean replacement, regression prediction (conditional mean imputation), or regression prediction plus error (stochastic regression imputation)

Note: under MCAR, throwing out cases with missing data does not bias your inferences. However, there are many drawbacks, chiefly the loss of sample size and statistical power.
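The single-value approaches listed here differ only in what gets plugged in for a missing outcome. The sketch below is illustrative, not a recommended implementation: the helper names are made up, and it assumes one always-observed covariate x and a simple linear regression of y on x fitted to the complete cases.

```python
import math
import random
import statistics as st
from typing import Dict, List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = st.fmean(xs), st.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def single_value_imputations(
    xs: List[float], ys: List[Optional[float]], seed: int = 0
) -> Dict[str, List[float]]:
    """Return the complete cases plus three singly imputed versions of y."""
    rng = random.Random(seed)
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    ox = [x for x, _ in obs]
    oy = [y for _, y in obs]
    mean_y = st.fmean(oy)
    a, b = fit_line(ox, oy)
    # Residual standard deviation from the complete-case regression.
    sse = sum((y - (a + b * x)) ** 2 for x, y in obs)
    resid_sd = math.sqrt(sse / (len(obs) - 2))
    pairs = list(zip(xs, ys))
    return {
        "complete_case": oy,
        # Mean replacement: every gap gets the observed mean.
        "mean": [mean_y if y is None else y for _, y in pairs],
        # Conditional mean imputation: the regression prediction.
        "regression": [a + b * x if y is None else y for x, y in pairs],
        # Stochastic regression imputation: prediction plus random error.
        "stochastic": [
            a + b * x + rng.gauss(0.0, resid_sd) if y is None else y
            for x, y in pairs
        ],
    }
```

One drawback shows up immediately if you compare variances: mean replacement adds values that sit exactly at the mean, so the variance of the filled-in outcome is smaller than the variance among the complete cases.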
Methods under Missing at Random (MAR):

Maximum likelihood using the EM algorithm; FIML (full information maximum likelihood)

MMRM (mixed model for repeated measures), fitted by REML (restricted maximum likelihood)

Multiple imputation

Two assumptions: the joint distribution of the data is multivariate normal, and the missing data mechanism is ignorable.

Note: under MAR, it is acceptable to exclude the missing cases, as long as the regression controls for all the variables that affect the probability of missingness.
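For readers who want to see the mechanics of multiple imputation, here is a deliberately simplified sketch. It assumes a single always-observed covariate x, a linear imputation model, and a bootstrap of the complete cases as a rough stand-in for drawing the model parameters from their posterior; the function names `mi_mean` and `rubin_pool` are hypothetical. The pooling step follows Rubin's rules: the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import math
import random
import statistics as st
from typing import List, Optional, Tuple


def rubin_pool(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Combine m estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = st.fmean(estimates)        # pooled point estimate
    within = st.fmean(variances)       # average within-imputation variance
    between = st.variance(estimates)   # between-imputation variance (m >= 2)
    return q_bar, within + (1.0 + 1.0 / m) * between


def fit_line(pairs: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Least-squares intercept, slope, and residual standard deviation."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = st.fmean(xs), st.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in pairs)
    return intercept, slope, math.sqrt(sse / (len(pairs) - 2))


def mi_mean(
    xs: List[float], ys: List[Optional[float]], m: int = 20, seed: int = 0
) -> Tuple[float, float]:
    """Multiply impute missing y from x, then pool the estimated mean of y."""
    rng = random.Random(seed)
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    estimates, variances = [], []
    for _ in range(m):
        # Assumption: resampling the complete cases approximates
        # the uncertainty in the imputation model's parameters.
        boot = [rng.choice(obs) for _ in obs]
        a, b, sd = fit_line(boot)
        completed = [
            a + b * x + rng.gauss(0.0, sd) if y is None else y
            for x, y in zip(xs, ys)
        ]
        estimates.append(st.fmean(completed))
        variances.append(st.variance(completed) / len(completed))
    return rubin_pool(estimates, variances)
```

In practice a validated implementation should be used rather than a hand-rolled one like this; the sketch is only meant to show the impute, analyze, and pool steps in order.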
Methods under Missing Not at Random (MNAR):

PMM (pattern-mixture modeling), including:
Jump to Reference
Last Mean Carried Forward
Copy Differences in Reference
Copy Reference

Tipping point approach

Selection model (Heckman)
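The tipping point idea can be illustrated with a toy delta-adjustment loop. This is a sketch under strong simplifying assumptions, all of them mine: higher scores are better, only the treatment arm has dropouts, the dropouts are first filled in with the observed treatment-arm mean, and significance is judged by a simple two-sample z statistic. A real analysis would normally apply the shift to multiply imputed data. The function names are hypothetical.

```python
import math
import statistics as st
from typing import List, Optional


def z_stat(a: List[float], b: List[float]) -> float:
    """Two-sample z statistic for mean(a) - mean(b)."""
    se = math.sqrt(st.variance(a) / len(a) + st.variance(b) / len(b))
    return (st.fmean(a) - st.fmean(b)) / se


def tipping_point(
    treat_obs: List[float],
    n_treat_missing: int,
    control: List[float],
    deltas: List[float],
    z_crit: float = 1.96,
) -> Optional[float]:
    """Smallest penalty delta at which the treatment effect loses significance.

    deltas should be sorted in increasing order; returns None if the result
    stays significant for every delta tried.
    """
    base = st.fmean(treat_obs)  # starting value for the imputed dropouts
    for delta in deltas:
        # Make the dropouts progressively worse than the starting value.
        imputed = [base - delta] * n_treat_missing
        if z_stat(treat_obs + imputed, control) < z_crit:
            return delta
    return None
```

The returned delta is then judged clinically: if the dropouts would have to do implausibly worse than the completers before the conclusion flips, the result is considered robust to the missing data.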

Web resources that discuss missing data and how to handle it are available. Some of the recent materials are listed below. For people who use SAS, PROC MI and PROC MIANALYZE are handy for performing multiple imputation and pattern-mixture modeling.
