`tidyhte` provides tidy semantics for estimation of heterogeneous treatment effects through the use of Kennedy’s (n.d.) doubly-robust learner.
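For readers unfamiliar with the doubly-robust learner, the following is a minimal base-R sketch of the general idea, not `tidyhte`'s internal implementation. It assumes simulated data, simple `glm`/`lm` nuisance models, and two-fold cross-fitting; all variable names (`x`, `a`, `y`, `pseudo`) are illustrative. The pseudo-outcome combines an inverse-propensity-weighted residual with the difference of the fitted outcome regressions, and is then regressed on the moderator.

```r
set.seed(1)

# Simulated data (illustrative only): the true CATE is 1 + x.
n <- 2000
x <- rnorm(n)
a <- rbinom(n, 1, plogis(0.5 * x))   # treatment depends on x
y <- x + a * (1 + x) + rnorm(n)
df <- data.frame(x = x, a = a, y = y)

# Two-fold cross-fitting: nuisance models are fit on one fold and
# evaluated on the other.
fold <- sample(rep(1:2, length.out = n))
df$pseudo <- NA_real_

for (k in 1:2) {
  train <- df[fold != k, ]
  test  <- df[fold == k, ]

  ps_fit <- glm(a ~ x, family = binomial, data = train)  # propensity score
  mu_fit <- lm(y ~ a * x, data = train)                  # outcome regression

  pi_hat <- predict(ps_fit, newdata = test, type = "response")
  mu1    <- predict(mu_fit, newdata = transform(test, a = 1))
  mu0    <- predict(mu_fit, newdata = transform(test, a = 0))
  mu_a   <- ifelse(test$a == 1, mu1, mu0)

  # Doubly-robust pseudo-outcome
  df$pseudo[fold == k] <-
    (test$a - pi_hat) / (pi_hat * (1 - pi_hat)) * (test$y - mu_a) + mu1 - mu0
}

# Second stage: regress the pseudo-outcome on the moderator.
cate_fit <- lm(pseudo ~ x, data = df)
coef(cate_fit)
```

In this simulation both nuisance models are correctly specified, so the second-stage coefficients should land near the true intercept and slope of 1.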

`tidyhte` follows a “recipe” design. This should make it easy to scale an analysis of HTE from the common single-outcome / single-moderator case to many outcomes and many moderators: a single configuration can be reused to perform the same analysis across many outcomes and a wide array of moderators. The package is also written to be fairly easy to extend to different models and to add additional diagnostics and ways to output information from a set of HTE estimates.

The best place to start learning how to use `tidyhte` is the vignettes, which run through example analyses from start to finish: `vignette("experimental_analysis")` and `vignette("observational_analysis")`. There is also a writeup summarizing the method and implementation in `vignette("methodological_details")`.

You will be able to install the released version of tidyhte from CRAN with:

`install.packages("tidyhte")`

The CRAN release does not yet exist. In the meantime, install the development version from GitHub with:

```
# install.packages("devtools")
devtools::install_github("ddimmery/tidyhte")
```

To set up a simple configuration, it’s straightforward to use the Recipe API:

```
library(tidyhte)
library(dplyr)
basic_config() %>%
    add_propensity_score_model("SL.glmnet") %>%
    add_outcome_model("SL.glmnet") %>%
    add_moderator("Stratified", x1, x2) %>%
    add_moderator("KernelSmooth", x3) %>%
    add_vimp(sample_splitting = FALSE) -> hte_cfg
```

The `basic_config` includes a number of defaults: it starts off the SuperLearner ensembles for both treatment and outcome with linear models (`"SL.glm"`). With the configuration in hand, the analysis runs as a single pipeline:

```
data %>%
    attach_config(hte_cfg) %>%
    make_splits(userid, .num_splits = 12) %>%
    produce_plugin_estimates(
        outcome_variable,
        treatment_variable,
        covariate1, covariate2, covariate3, covariate4, covariate5, covariate6
    ) %>%
    construct_pseudo_outcomes(outcome_variable, treatment_variable) -> data

data %>%
    estimate_QoI(covariate1, covariate2) -> results
```

Estimating CATEs for a moderator not included previously just requires rerunning the final line:

```
data %>%
    estimate_QoI(covariate3) -> results
```

Replicating this on a new outcome would be as simple as running the following, with no reconfiguration necessary.

```
data %>%
    attach_config(hte_cfg) %>%
    produce_plugin_estimates(
        second_outcome_variable,
        treatment_variable,
        covariate1, covariate2, covariate3, covariate4, covariate5, covariate6
    ) %>%
    construct_pseudo_outcomes(second_outcome_variable, treatment_variable) %>%
    estimate_QoI(covariate1, covariate2) -> results
```

This makes it easy to chain together analyses across many outcomes:

```
library("foreach")

data %>%
    attach_config(hte_cfg) %>%
    make_splits(userid, .num_splits = 12) -> data

foreach(outcome = list_of_outcomes, .combine = "bind_rows") %do% {
    data %>%
        produce_plugin_estimates(
            outcome,
            treatment_variable,
            covariate1, covariate2, covariate3, covariate4, covariate5, covariate6
        ) %>%
        construct_pseudo_outcomes(outcome, treatment_variable) %>%
        estimate_QoI(covariate1, covariate2) %>%
        mutate(outcome = rlang::as_string(outcome))
}
```

The function `estimate_QoI` returns results in a tibble format, which makes it easy to manipulate or plot results.
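As a rough illustration of working with tibble-shaped results, the sketch below builds a small hypothetical stand-in for the output of `estimate_QoI` and summarizes it with `dplyr`. The column names (`estimand`, `term`, `level`, `estimate`, `std_error`) and all values are assumptions made for this example; check the actual output of your own run before relying on them.

```r
library(dplyr)

# Hypothetical stand-in for estimate_QoI() output. The column names and
# the numbers are assumptions for illustration, not real results.
results <- tibble(
  estimand  = "MCATE",
  term      = "covariate1",
  level     = c("a", "b", "c"),
  estimate  = c(0.10, 0.25, -0.05),
  std_error = c(0.04, 0.05, 0.06)
)

# Add normal-approximation 95% intervals and flag the levels whose
# interval excludes zero, sorted from largest to smallest estimate.
cate_summary <- results %>%
  filter(estimand == "MCATE") %>%
  mutate(
    ci_lower    = estimate - qnorm(0.975) * std_error,
    ci_upper    = estimate + qnorm(0.975) * std_error,
    significant = ci_lower > 0 | ci_upper < 0
  ) %>%
  arrange(desc(estimate))

cate_summary
```

The same tibble could be passed straight to a plotting library, for example with `level` on the x-axis and the interval bounds as error bars.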

There are two main ways to get help:

If you have a problem, feel free to open an issue on GitHub. Please try to provide a minimal reproducible example. If that isn’t possible, explain as clearly and simply as you can why that is, along with all of the relevant debugging steps you’ve already taken.

Support for the package will also be provided in the Experimentation Community Discord.

You are welcome to come in and get support for your usage in the `tidyhte` channel. Keep in mind that everyone is volunteering their time to help, so try to come prepared with the debugging steps you’ve already taken.