Sampling weights (also known as survey weights) frequently appear
when using large, representative datasets. They are required to ensure
any estimated quantities generalize to a target population defined by
the weights. Evidence suggests that sampling weights need to be
incorporated into a propensity score matching analysis to obtain valid
and unbiased estimates of the treatment effect in the sampling weighted
population (DuGoff,
Schuler, and Stuart 2014; Austin, Jembere,
and Chiu 2016; Lenis et al.
2019). In this guide, we demonstrate how to use sampling
weights with `MatchIt`

for propensity score estimation,
balance assessment, and effect estimation. Fortunately, doing so is not
complicated, but some care must be taken to ensure sampling weights are
incorporated correctly. It is assumed one has read the other vignettes
explaining matching (`vignette("matching-methods")`

), balance
assessment (`vignette("assessing-balance")`

), and effect
estimation (`vignette("estimating-effects")`

.

We will use the same simulated toy dataset used in
`vignette("estimating-effects")`

except with the addition of
a sampling weights variable, `SW`

, which is used to
generalize the sample to a specific target population with a
distribution of covariates different from that of the sample. Code to
generate the covariates, treatment, and outcome is at the bottom of
`vignette("estimating-effects")`

and code to generate the
sampling weights is at the end of this document. We will consider the
effect of binary treatment `A`

on continuous outcome
`Y_C`

, adjusting for confounders
`X1`

-`X9`

.

```
## A X1 X2 X3 X4 X5 X6 X7 X8 X9 Y_C SW
## 1 0 0.1725 -1.4283 -0.4103 -2.36059 1 -1.1199 0.6398 -0.4840 -0.59385 -3.5907 1.675
## 2 0 -1.0959 0.8463 0.2456 -0.12333 1 -2.2687 -1.4491 -0.5514 -0.31439 -1.5481 1.411
## 3 0 0.1768 0.7905 -0.8436 0.82366 1 -0.2221 0.2971 -0.6966 -0.69516 6.0714 2.332
## 4 0 -0.4595 0.1726 1.9542 -0.62661 1 -0.4019 -0.8294 -0.5384 0.20729 2.4906 1.644
## 5 1 0.3563 -1.8121 0.8135 -0.67189 1 -0.8297 1.7297 -0.6439 -0.02648 -0.6687 2.722
## 6 0 -2.4313 -1.7984 -1.2940 0.04609 1 -1.2419 -1.1252 -1.8659 -0.56513 -9.8504 14.773
```

When using sampling weights with propensity score matching, one has the option of including the sampling weights in the model used to estimate the propensity scores. Although evidence is mixed on whether this is required (Austin, Jembere, and Chiu 2016; Lenis et al. 2019), it can be a good idea. The choice should depend on whether including the sampling weights improves the quality of the matches. Specifications including and excluding sampling weights should be tried to determine which is preferred.

To supply sampling weights to the propensity score-estimating
function in `matchit()`

, the sampling weights variable should
be supplied to the `s.weights`

argument. It can be supplied
either as a numerical vector containing the sampling weights, or a
string or one-sided formula with the name of the sampling weights
variable in the supplied dataset. Below we demonstrate including
sampling weights into propensity scores estimated using logistic
regression for optimal full matching for the average treatment effect in
the population (ATE) (note that all methods and steps apply the same way
to all forms of matching and all estimands).

```
mF_s <- matchit(A ~ X1 + X2 + X3 + X4 + X5 +
X6 + X7 + X8 + X9, data = d,
method = "full", distance = "glm",
estimand = "ATE", s.weights = ~SW)
mF_s
```

```
## A matchit object
## - method: Optimal full matching
## - distance: Propensity score
## - estimated with logistic regression
## - sampling weights included in estimation
## - number of obs.: 2000 (original), 2000 (matched)
## - sampling weights: present
## - target estimand: ATE
## - covariates: X1, X2, X3, X4, X5, X6, X7, X8, X9
```

Notice that the description of the matching specification when the
`matchit`

object is printed includes lines indicating that
the sampling weights were included in the estimation of the propensity
score and that they are present in the `matchit`

object. It
is stored in the `s.weights`

component of the
`matchit`

object. Note that at this stage, the matching
weights (stored in the `weights`

component of the
`matchit`

object) do not incorporate the sampling weights;
they are calculated simply as a result of the matching.

Now let’s perform full matching on a propensity score that does not
include the sampling weights in its estimation. Here we use the same
specification as was used in
`vignette("estimating-effects")`

.

```
mF <- matchit(A ~ X1 + X2 + X3 + X4 + X5 +
X6 + X7 + X8 + X9, data = d,
method = "full", distance = "glm",
estimand = "ATE")
mF
```

```
## A matchit object
## - method: Optimal full matching
## - distance: Propensity score
## - estimated with logistic regression
## - number of obs.: 2000 (original), 2000 (matched)
## - target estimand: ATE
## - covariates: X1, X2, X3, X4, X5, X6, X7, X8, X9
```

Notice that there is no mention of sampling weights in the
description of the matching specification. However, to properly assess
balance and estimate effects, we need the sampling weights to be
included in the `matchit`

object, even if they were not used
at all in the matching. To do so, we use the function
`add_s.weights()`

, which adds sampling weights to the
supplied `matchit`

objects.

```
## A matchit object
## - method: Optimal full matching
## - distance: Propensity score
## - estimated with logistic regression
## - sampling weights not included in estimation
## - number of obs.: 2000 (original), 2000 (matched)
## - sampling weights: present
## - target estimand: ATE
## - covariates: X1, X2, X3, X4, X5, X6, X7, X8, X9
```

Now when we print the `matchit`

object, we can see lines
have been added identifying that sampling weights are present but they
were not used in the estimation of the propensity score used in the
matching.

Note that not all methods can involve sampling weights in the
estimation. Only methods that use the propensity score will be affected
by sampling weights; coarsened exact matching or Mahalanobis distance
optimal pair matching, for example, ignore the sampling weights, and
some propensity score estimation methods, like `randomForest`

and `bart`

(as presently implemented), cannot incorporate
sampling weights. Sampling weights should still be supplied to
`matchit()`

even when using these methods to avoid having to
use `add_s.weights()`

and remembering which methods do or do
not involve sampling weights.

Now we need to decide which matching specification is the best to use
for effect estimation. We do this by selecting the one that yields the
best balance without sacrificing remaining effective sample size.
Because the sampling weights are incorporated into the
`matchit`

object, the balance assessment tools in
`plot.matchit()`

and `summary.matchit()`

incorporate them into their output.

We’ll use `summary()`

to examine balance on the two
matching specifications. With sampling weights included, the balance
statistics for the unmatched data are weighted by the sampling weights.
The balance statistics for the matched data are weighted by the product
of the sampling weights and the matching weights. It is the product of
these weights that will be used in estimating the treatment effect.
Below we use `summary()`

to display balance for the two
matching specifications. No additional arguments to
`summary()`

are required for it to use the sampling weights;
as long as they are in the `matchit`

object (either due to
being supplied with the `s.weights`

argument in the call to
`matchit()`

or to being added afterward by
`add_s.weights()`

), they will be correctly incorporated into
the balance statistics.

```
##
## Call:
## matchit(formula = A ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 +
## X9, data = d, method = "full", distance = "glm", estimand = "ATE",
## s.weights = ~SW)
##
## Summary of Balance for All Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
## distance 0.470 0.225 1.200 1.533 0.281 0.467
## X1 0.185 -0.120 0.287 1.309 0.105 0.171
## X2 -0.565 -0.202 -0.368 0.854 0.108 0.194
## X3 -0.059 -0.057 -0.003 0.739 0.041 0.083
## X4 0.852 0.150 0.680 1.016 0.180 0.293
## X5 0.567 0.713 -0.307 . 0.146 0.146
## X6 0.169 -0.008 0.171 1.014 0.048 0.103
## X7 0.378 -0.089 0.474 1.205 0.130 0.210
## X8 -0.376 -0.142 -0.216 1.190 0.071 0.129
## X9 0.088 -0.001 0.090 0.945 0.027 0.066
##
## Summary of Balance for Matched Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
## distance 0.317 0.292 0.125 1.026 0.035 0.088 0.007
## X1 -0.025 -0.030 0.004 1.298 0.034 0.084 0.956
## X2 -0.387 -0.278 -0.111 1.099 0.032 0.109 1.081
## X3 -0.083 -0.018 -0.070 0.639 0.047 0.105 1.124
## X4 0.386 0.335 0.049 1.198 0.023 0.081 0.904
## X5 0.666 0.668 -0.004 . 0.002 0.002 0.853
## X6 0.021 0.056 -0.035 1.093 0.028 0.086 1.165
## X7 0.152 0.028 0.126 1.169 0.027 0.075 1.066
## X8 -0.217 -0.223 0.005 1.355 0.022 0.059 0.996
## X9 0.001 0.038 -0.037 1.142 0.023 0.085 1.139
##
## Sample Sizes:
## Control Treated
## All (ESS) 987.1 177.1
## All 1559. 441.
## Matched (ESS) 513.1 126.5
## Matched 1559. 441.
## Unmatched 0. 0.
## Discarded 0. 0.
```

```
##
## Call:
## matchit(formula = A ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 +
## X9, data = d, method = "full", distance = "glm", estimand = "ATE")
##
## Summary of Balance for Matched Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
## distance 0.282 0.276 0.034 0.953 0.024 0.077 0.008
## X1 -0.219 0.025 -0.229 1.252 0.055 0.119 0.944
## X2 -0.360 -0.256 -0.106 0.987 0.043 0.122 1.110
## X3 -0.044 -0.067 0.025 0.654 0.049 0.086 1.111
## X4 0.541 0.284 0.248 1.113 0.068 0.157 0.989
## X5 0.676 0.672 0.008 . 0.004 0.004 0.817
## X6 0.021 -0.014 0.034 1.056 0.025 0.075 1.125
## X7 0.029 0.032 -0.003 1.031 0.016 0.054 0.985
## X8 -0.324 -0.211 -0.104 1.276 0.024 0.085 0.979
## X9 -0.024 0.043 -0.068 0.860 0.037 0.119 1.083
##
## Sample Sizes:
## Control Treated
## All (ESS) 987.1 177.14
## All 1559. 441.
## Matched (ESS) 583.8 96.07
## Matched 1559. 441.
## Unmatched 0. 0.
## Discarded 0. 0.
```

The results of the two matching specifications are similar. Balance
appears to be slightly better when using the sampling weight-estimated
propensity scores than when using the unweighted propensity scores.
However, the effective sample size for the control group is larger when
using the unweighted propensity scores. Neither propensity score
specification achieves excellent balance, and more fiddling with the
matching specification (e.g., by changing the method of estimating
propensity scores, the type of matching, or the options used with the
matching) might yield a better matched set. For the purposes of this
analysis, we will move forward with the matching that used the sampling
weight-estimated propensity scores (`mF_s`

) because of its
superior balance. Some of the remaining imbalance may be eliminated by
adjusting for the covariates in the outcome model.

Note that had we not added sampling weights to `mF`

, the
matching specification that did not include the sampling weights, our
balance assessment would be inaccurate because the balance statistics
would not include the sampling weights. In this case, in fact, assessing
balance on `mF`

without incorporated the sampling weights
would have yielded radically different results and a different
conclusion. It is critical to incorporate sampling weights into the
`matchit`

object using `add_s.weights()`

even if
they are not included in the propensity score estimation.

Estimating the treatment effect after matching is straightforward
when using sampling weights. Effects are estimated in the same way as
when sampling weights are excluded, except that the matching weights
must be multiplied by the sampling weights to yield accurate,
generalizable estimates. `match.data()`

and
`get_matches()`

do this automatically, so the weights
produced by these functions already are a product of the matching
weights and the sampling weights. Note this will only be true if
sampling weights are incorporated into the `matchit`

object.

Below we estimate the effect of `A`

on `Y_C`

in
the matched and sampling weighted sample, adjusting for the covariates
to improve precision and decrease bias.

```
md_F_s <- match.data(mF_s)
fit <- lm(Y_C ~ A * (X1 + X2 + X3 + X4 + X5 +
X6 + X7 + X8 + X9), data = md_F_s,
weights = weights)
library("marginaleffects")
avg_comparisons(fit,
variables = "A",
vcov = ~subclass,
newdata = subset(md_F_s, A == 1),
wts = "weights")
```

```
##
## Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
## A 1 - 0 1.69 0.34 4.97 <0.001 20.5 1.03 2.36
##
## Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
```

Note that `match.data()`

and `get_weights()`

have the option `include.s.weights`

, which, when set to
`FALSE`

, makes it so the returned weights do not incorporate
the sampling weights and are simply the matching weights. Because one
might to forget to multiply the two sets of weights together, it is
easier to just use the default of `include.s.weights = TRUE`

and ignore the sampling weights in the rest of the analysis (because
they are already included in the returned weights).
`avg_comparisons()`

also works more smoothly when the weights
supplied to `weights`

is a single variable rather than the
product of two.

```
#Generatng data similar to Austin (2009) for demonstrating
#treatment effect estimation with sampling weights
gen_X <- function(n) {
X <- matrix(rnorm(9 * n), nrow = n, ncol = 9)
X[,5] <- as.numeric(X[,5] < .5)
X
}
#~20% treated
gen_A <- function(X) {
LP_A <- - 1.2 + log(2)*X[,1] - log(1.5)*X[,2] + log(2)*X[,4] - log(2.4)*X[,5] +
log(2)*X[,7] - log(1.5)*X[,8]
P_A <- plogis(LP_A)
rbinom(nrow(X), 1, P_A)
}
# Continuous outcome
gen_Y_C <- function(A, X) {
2*A + 2*X[,1] + 2*X[,2] + 2*X[,3] + 1*X[,4] + 2*X[,5] + 1*X[,6] + rnorm(length(A), 0, 5)
}
#Conditional:
# MD: 2
#Marginal:
# MD: 2
gen_SW <- function(X) {
e <- rbinom(nrow(X), 1, .3)
1/plogis(log(1.4)*X[,2] + log(.7)*X[,4] + log(.9)*X[,6] + log(1.5)*X[,8] + log(.9)*e +
-log(.5)*e*X[,2] + log(.6)*e*X[,4])
}
set.seed(19599)
n <- 2000
X <- gen_X(n)
A <- gen_A(X)
SW <- gen_SW(X)
Y_C <- gen_Y_C(A, X)
d <- data.frame(A, X, Y_C, SW)
```

Austin, Peter C., Nathaniel Jembere, and Maria Chiu. 2016.
“Propensity Score Matching and Complex Surveys.”
*Statistical Methods in Medical Research* 27 (4): 1240–57. https://doi.org/10.1177/0962280216658920.

DuGoff, Eva H., Megan Schuler, and Elizabeth A. Stuart. 2014.
“Generalizing Observational Study Results: Applying Propensity
Score Methods to Complex Surveys.” *Health Services
Research* 49 (1): 284–303. https://doi.org/10.1111/1475-6773.12090.

Lenis, David, Trang Quynh Nguyen, Nianbo Dong, and Elizabeth A. Stuart.
2019. “It’s All about Balance: Propensity Score
Matching in the Context of Complex Survey Data.”
*Biostatistics* 20 (1): 147–63. https://doi.org/10.1093/biostatistics/kxx063.