Matching and Weighting Multiply Imputed Datasets


One of the main challenges in matching procedures is missing data in the covariates. Matching compares the covariate values of units in the control and treated subgroups, either directly or through predictions from a logistic regression model. When a unit has missing values in the covariates included in the model, no valid comparison or prediction can be made for that unit. Several solutions have been proposed to tackle this issue, including complete-case analysis, but these approaches have their own flaws and limitations. As a result, multiply imputing the missing data is gaining popularity as an alternative.

The mice and Amelia packages are established tools for imputing missing data in R. In combination with these packages, the MatchThem package streamlines the matching and weighting processes for multiply imputed datasets, making it easier to apply matching and weighting methods soundly in practical applications.


The MatchThem package can be installed from the Comprehensive R Archive Network (CRAN) repository:

install.packages("MatchThem")

The latest version of the package can be installed from GitHub:

devtools::install_github(repo = "FarhadPishgar/MatchThem")

Suggested Workflow

Multiply imputing missing data and then matching or weighting the imputed datasets may appear complex at first. To simplify the process, a suggested workflow of five steps has been designed. For more detailed information, please refer to the package’s cheat sheet or vignette.

  1. Multiply Imputing Missing Data in the Dataset: The mice and Amelia packages are recommended for multiply imputing the missing data in the dataset.
  2. Matching or Weighting the Multiply Imputed Datasets: The matchthem() function selects matched units from the control and treated subgroups of each imputed dataset, and the weightthem() function estimates weights for the units of each imputed dataset. Both functions are provided by the MatchThem package.
  3. Assessing Balance on Matched or Weighted Datasets: The cobalt package provides tools for assessing the balance of all covariates in the multiply imputed datasets after matching or weighting.
  4. Analyzing the Matched or Weighted Datasets: The with() function from the MatchThem package fits the outcome model in each matched or weighted dataset to estimate the causal effect.
  5. Pooling the Causal Effect Estimates: The pool() function from the MatchThem package combines the causal effect estimates obtained from each dataset into a single overall estimate.
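
The five steps above can be sketched in R as follows. This is a minimal illustration rather than a recommended analysis: it assumes the osteoarthritis example dataset that ships with MatchThem (treatment OSP, outcome KOA, covariates AGE, SEX, BMI, RAC, and SMK), default imputation methods in mice, nearest-neighbor matching within each imputed dataset, and an outcome model fitted with svyglm() from the survey package. None of these choices are prescribed by the workflow itself.

```r
library(mice)
library(MatchThem)
library(cobalt)
library(survey)

# Example data with missing values (assumed: the dataset bundled with MatchThem)
data("osteoarthritis", package = "MatchThem")

# Step 1: multiply impute the missing data (5 imputed datasets, default methods)
imputed.datasets <- mice(osteoarthritis, m = 5, seed = 100, printFlag = FALSE)

# Step 2: match treated and control units within each imputed dataset
matched.datasets <- matchthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                              datasets = imputed.datasets,
                              approach = "within",
                              method = "nearest")

# Step 3: assess covariate balance across the matched datasets
bal.tab(matched.datasets, stats = c("m", "ks"), abs = TRUE)

# Step 4: fit the outcome model in each matched dataset
matched.models <- with(matched.datasets,
                       svyglm(KOA ~ OSP, family = quasibinomial()),
                       cluster = TRUE)

# Step 5: pool the estimates across the matched datasets
matched.results <- pool(matched.models)
summary(matched.results, conf.int = TRUE)
```

For weighting, weightthem() takes the place of matchthem() in step 2 with the same formula and datasets arguments, and steps 3 to 5 proceed in the same way on the weighted datasets.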


The logo for this package, a trip to the Arctic, was designed by Max Josino. You can explore more of his work on his website and Dribbble profile. We sincerely thank Max Josino for his kind contribution. This package relies on the functionality provided by the mice, MatchIt, and WeightIt packages.