policy_data

library(polle)

This vignette is a guide to policy_data(). As the name suggests, the function creates a policy_data object with a specific data structure making it easy to use in combination with policy_def(), policy_learn(), and policy_eval(). The vignette is also a guide to some of the associated S3 functions which transform or access parts of the data, see ?policy_data and methods(class="policy_data").

We will start by looking at a simple single-stage example, then consider a fixed two-stage example with varying action sets and data in wide format, and finally we will look at an example with a stochastic number of stages and data in long format.

Single-stage: wide data

Consider a simple single-stage problem with covariates/state variables \((Z, L, B)\), binary action variable \(A\), and utility outcome \(U\). We use sim_single_stage() to simulate data:

(d <- sim_single_stage(n = 5e2, seed=1)) |> head()
#>            Z          L B A          U
#> 1  1.2879704 -1.4795962 0 1 -0.9337648
#> 2  1.6184181  1.2966436 0 1  6.7506026
#> 3  1.2710352 -1.0431352 0 1 -0.3377580
#> 4 -0.2157605  0.1198224 1 0  1.4993427
#> 5 -1.0671588 -1.3663727 0 1 -9.1718727
#> 6 -1.4469746 -0.4018530 0 0 -2.6692961

We tell policy_data() which variables define the action, the state covariates, and the utility:

pd <- policy_data(d, action="A", covariates=list("Z", "B", "L"), utility="U")
pd
#> Policy data with n = 500 observations and maximal K = 1 stages.
#> 
#>      action
#> stage   0   1   n
#>     1 278 222 500
#> 
#> Baseline covariates:
#> State covariates: Z, B, L
#> Average utility: -0.98

In the single-stage case the history \(H\) is just \((B, Z, L)\). We access the history and actions using get_history():

get_history(pd)$H |> head()
#> Key: <id, stage>
#>       id stage          Z     B          L
#>    <int> <int>      <num> <num>      <num>
#> 1:     1     1  1.2879704     0 -1.4795962
#> 2:     2     1  1.6184181     0  1.2966436
#> 3:     3     1  1.2710352     0 -1.0431352
#> 4:     4     1 -0.2157605     1  0.1198224
#> 5:     5     1 -1.0671588     0 -1.3663727
#> 6:     6     1 -1.4469746     0 -0.4018530
get_history(pd)$A |> head()
#> Key: <id, stage>
#>       id stage      A
#>    <int> <int> <char>
#> 1:     1     1      1
#> 2:     2     1      1
#> 3:     3     1      1
#> 4:     4     1      0
#> 5:     5     1      1
#> 6:     6     1      0

Similarly, we access the utility outcomes \(U\):

get_utility(pd) |> head()
#> Key: <id>
#>       id          U
#>    <int>      <num>
#> 1:     1 -0.9337648
#> 2:     2  6.7506026
#> 3:     3 -0.3377580
#> 4:     4  1.4993427
#> 5:     5 -9.1718727
#> 6:     6 -2.6692961

Two-stage: wide data

Consider a two-stage problem with observations \(O = (B, BB, L_{1}, C_{1}, U_{1}, A_1, L_2, C_{2}, U_{2}, A_2, U_{3})\). Following the general notation introduced in Section 3.1 of (Nordland and Holst 2023), \((B,BB)\) are the baseline covariates, \(S_k = (L_{k}, C_{k})\) are the state covariates at stage \(k\), \(A_{k}\) is the action at stage \(k\), and \(U_k\) is the reward at stage \(k\). The utility is the sum of the rewards \(U=U_{1}+U_{2}+U_{3}\).

We use sim_two_stage_multi_actions() to simulate data:

d <- sim_two_stage_multi_actions(n=2e3, seed = 1)
colnames(d)
#>  [1] "B"   "BB"  "L_1" "C_1" "A_1" "L_2" "C_2" "A_2" "L_3" "U_1" "U_2" "U_3"

Note that the data is in wide format. The data is transformed using policy_data() with instructions on which variables define the actions, baseline covariates, state covariates, and the rewards:

pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  baseline = c("B", "BB"),
                  covariates = list(L = c("L_1", "L_2"),
                                    C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))
pd
#> Policy data with n = 2000 observations and maximal K = 2 stages.
#> 
#>      action
#> stage default   no  yes    n
#>     1       0 1017  983 2000
#>     2     769  826  405 2000
#> 
#> Baseline covariates: B, BB
#> State covariates: L, C
#> Average utility: 0.39

The length of the character vector action determines the number of stages K (in this case 2). If the number of stages is 2 or more, the covariates argument must be a named list. Each element must be a character vector with length equal to the number of stages. If a covariate is not available at a given stage we insert an NA value, e.g., L = c(NA, "L_2").

Finally, the utility argument must be a single character string (the utility is observed after stage K) or a character vector of length K+1 with the names of the rewards.
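To illustrate the NA convention for the covariates argument, the following sketch pretends that L is only recorded at stage 2. This is purely an assumption for illustration (the simulated data does contain L_1), and the object name pd_na is hypothetical:

```r
library(polle)

d <- sim_two_stage_multi_actions(n = 2e3, seed = 1)

# Assumption for illustration: treat L as unavailable at stage 1
# by passing NA in place of the stage 1 variable name.
pd_na <- policy_data(d,
                     action = c("A_1", "A_2"),
                     baseline = c("B", "BB"),
                     covariates = list(L = c(NA, "L_2"),
                                       C = c("C_1", "C_2")),
                     utility = c("U_1", "U_2", "U_3"))

# The number of stages is still given by the length of 'action'.
get_K(pd_na)

# Inspect the state/Markov history; we would expect L to be
# missing in the stage 1 rows.
get_history(pd_na, full_history = FALSE)$H |> head()
```

Note that every character vector in the covariates list still has length K = 2; the NA only marks the stage at which the variable is missing.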

In this example, the observed action sets vary for each stage. get_action_set() returns the global action set and get_stage_action_sets() returns the action set for each stage:

get_action_set(pd)
#> [1] "default" "no"      "yes"
get_stage_action_sets(pd)
#> $stage_1
#> [1] "no"  "yes"
#> 
#> $stage_2
#> [1] "default" "no"      "yes"

The full histories \(H_1 = (B, BB, L_{1}, C_{1})\) and \(H_2=(B, BB, L_{1}, C_{1}, A_{1}, L_{2}, C_{2})\) are available using get_history() and full_history = TRUE:

get_history(pd, stage = 1, full_history = TRUE)$H |> head()
#> Key: <id, stage>
#>       id stage        L_1        C_1          B     BB
#>    <int> <num>      <num>      <num>      <num> <char>
#> 1:     1     1  0.9696772  1.7112790 -0.6264538 group2
#> 2:     2     1 -2.1994065 -2.6431237  0.1836433 group1
#> 3:     3     1  1.9480938  2.0619342 -0.8356286 group2
#> 4:     4     1  0.1798532  1.0066957  1.5952808 group2
#> 5:     5     1  0.4150568  0.1538534  0.3295078 group2
#> 6:     6     1  0.6468405 -0.0982121 -0.8204684 group3
get_history(pd, stage = 2, full_history = TRUE)$H |> head()
#> Key: <id, stage>
#>       id stage    A_1        L_1        L_2        C_1        C_2          B
#>    <int> <num> <char>      <num>      <num>      <num>      <num>      <num>
#> 1:     1     2    yes  0.9696772 -0.7393434  1.7112790  2.4243702 -0.6264538
#> 2:     2     2     no -2.1994065  0.4828756 -2.6431237 -2.6647281  0.1836433
#> 3:     3     2     no  1.9480938  0.4803055  2.0619342  2.4747615 -0.8356286
#> 4:     4     2    yes  0.1798532 -0.3574497  1.0066957  2.0571959  1.5952808
#> 5:     5     2     no  0.4150568  2.0473541  0.1538534 -0.9649004  0.3295078
#> 6:     6     2    yes  0.6468405 -2.3701135 -0.0982121  1.0989523 -0.8204684
#>        BB
#>    <char>
#> 1: group2
#> 2: group1
#> 3: group2
#> 4: group2
#> 5: group2
#> 6: group3

Similarly, we access the associated actions at each stage via list element A:

get_history(pd, stage = 1, full_history = TRUE)$A |> head()
#> Key: <id, stage>
#>       id stage    A_1
#>    <int> <num> <char>
#> 1:     1     1    yes
#> 2:     2     1     no
#> 3:     3     1     no
#> 4:     4     1    yes
#> 5:     5     1     no
#> 6:     6     1    yes
get_history(pd, stage = 2, full_history = TRUE)$A |> head()
#> Key: <id, stage>
#>       id stage     A_2
#>    <int> <num>  <char>
#> 1:     1     2      no
#> 2:     2     2      no
#> 3:     3     2 default
#> 4:     4     2     yes
#> 5:     5     2     yes
#> 6:     6     2      no

Alternatively, the state/Markov type history and actions are available using full_history = FALSE:

get_history(pd, full_history = FALSE)$H |> head()
#> Key: <id, stage>
#>       id stage          L         C          B     BB
#>    <int> <int>      <num>     <num>      <num> <char>
#> 1:     1     1  0.9696772  1.711279 -0.6264538 group2
#> 2:     1     2 -0.7393434  2.424370 -0.6264538 group2
#> 3:     2     1 -2.1994065 -2.643124  0.1836433 group1
#> 4:     2     2  0.4828756 -2.664728  0.1836433 group1
#> 5:     3     1  1.9480938  2.061934 -0.8356286 group2
#> 6:     3     2  0.4803055  2.474761 -0.8356286 group2
get_history(pd, full_history = FALSE)$A |> head()
#> Key: <id, stage>
#>       id stage       A
#>    <int> <int>  <char>
#> 1:     1     1     yes
#> 2:     1     2      no
#> 3:     2     1      no
#> 4:     2     2      no
#> 5:     3     1      no
#> 6:     3     2 default

Note that policy_data() renames the action variables to A_1, A_2, … in the full history case and to A in the state/Markov history case.
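The relation between the wide input and the state/Markov history can be sketched in base R on a toy data set. The data values and the helper stage_block() are hypothetical, and the code only mimics the resulting layout; it is not how polle performs the transformation internally:

```r
# Toy wide data (hypothetical values): two subjects, two stages.
wide <- data.frame(
  B   = c(0.5, -1.0),
  L_1 = c(1.0, 2.0),
  L_2 = c(3.0, 4.0),
  A_1 = c("yes", "no"),
  A_2 = c("no", "default")
)
wide$id <- seq_len(nrow(wide))

# Build one block of rows per stage, renaming L_k to L and A_k to A.
stage_block <- function(k) {
  data.frame(
    id    = wide$id,
    stage = k,
    L     = wide[[paste0("L_", k)]],
    B     = wide$B,                    # baseline covariates repeat at every stage
    A     = wide[[paste0("A_", k)]]
  )
}

# Stack the stage blocks and sort by (id, stage), as in the keyed output above.
long <- do.call(rbind, lapply(1:2, stage_block))
long <- long[order(long$id, long$stage), ]
rownames(long) <- NULL
long
```

Each subject contributes one row per stage, which is why the stage-specific suffixes are dropped in this format.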

As in the single-stage case we access the utility, i.e. the sum of the rewards, using get_utility():

get_utility(pd) |> head()
#> Key: <id>
#>       id         U
#>    <int>     <num>
#> 1:     1  1.110369
#> 2:     2 -1.788041
#> 3:     3  2.836251
#> 4:     4  3.173743
#> 5:     5  1.891312
#> 6:     6 -1.120837
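As a sanity check of the statement that the utility is the sum of the rewards, the stored utility can be compared with a manual row sum of the reward columns. This is a minimal sketch that assumes the ids are assigned in the row order of the wide data; the name u_manual is hypothetical:

```r
library(polle)

d <- sim_two_stage_multi_actions(n = 2e3, seed = 1)
pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  baseline = c("B", "BB"),
                  covariates = list(L = c("L_1", "L_2"),
                                    C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))

# Sum the three reward columns by hand, row by row.
u_manual <- d$U_1 + d$U_2 + d$U_3

# Compare with the utility stored in the policy_data object.
# Assumption: id i corresponds to row i of the wide data.
all.equal(get_utility(pd)$U, u_manual)
```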

Multi-stage: long data

In this example we illustrate how polle handles decision processes with a stochastic number of stages, see Section 3.5 in (Nordland and Holst 2023). The data is simulated using sim_multi_stage(). Detailed information on the simulation is available in ?sim_multi_stage. We simulate data from 2000 iid subjects:

d <- sim_multi_stage(2e3, seed = 1)

As described, the stage data is in long format:

d$stage_data[, -(9:10)] |> head()
#>       id stage event        t      A          X     X_lead         U
#>    <num> <num> <num>    <num> <char>      <num>      <num>     <num>
#> 1:     1     1     0 0.000000      1  1.3297993  0.0000000 0.0000000
#> 2:     1     2     0 1.686561      1 -0.7926711  1.3297993 0.3567621
#> 3:     1     3     0 3.071768      0  3.5246509 -0.7926711 2.1778778
#> 4:     1     4     1 3.071768   <NA>         NA         NA 0.0000000
#> 5:     2     1     0 0.000000      1  0.7635935  0.0000000 0.0000000
#> 6:     2     2     0 1.297336      1 -0.5441694  0.7635935 0.5337427

The id variable is important for identifying which rows belong to each subject. The baseline data uses the same id variable:

d$baseline_data |> head()
#>       id     B
#>    <num> <int>
#> 1:     1     0
#> 2:     2     0
#> 3:     3     1
#> 4:     4     1
#> 5:     5     1
#> 6:     6     0

The data is transformed using policy_data() with type = "long". The names of the id, stage, event, action, and utility variables must be specified. The event variable, inspired by the event variable in survival::Surv(), is 0 whenever an action occurs and 1 for a terminal event.

pd <- policy_data(data = d$stage_data,
                  baseline_data = d$baseline_data,
                  type = "long",
                  id = "id",
                  stage = "stage",
                  event = "event",
                  action = "A",
                  utility = "U")
pd
#> Policy data with n = 2000 observations and maximal K = 4 stages.
#> 
#>      action
#> stage    0    1    n
#>     1  113 1887 2000
#>     2  844 1039 1883
#>     3  956   74 1030
#>     4   72    0   72
#> 
#> Baseline covariates: B
#> State covariates: t, X, X_lead
#> Average utility: 2.46
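The event coding described above can be illustrated with a small hand-made long data set in base R. The data frame toy and its values are hypothetical; the sketch only shows how the coding determines the number of decision stages and the utility per subject, and it does not call polle:

```r
# Toy long data (hypothetical values): subject 1 has two decision
# stages, subject 2 has one. The last row of each subject is the
# terminal event (event = 1), where no action is taken.
toy <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  stage = c(1, 2, 3, 1, 2),
  event = c(0, 0, 1, 0, 1),
  A     = c("1", "0", NA, "1", NA),
  U     = c(0, 0.4, 0, 0, 1.2)
)

# Number of decision stages per subject: rows with event == 0.
n_stages <- tapply(toy$event == 0, toy$id, sum)
n_stages

# Utility per subject: sum of the rewards over all rows,
# including the terminal row.
utility <- tapply(toy$U, toy$id, sum)
utility
```

Because the number of event = 0 rows may differ between subjects, the number of stages is stochastic, which matches the decreasing stage counts n in the printed summary above.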

In some cases we are only interested in analyzing a subset of the decision stages. partial() trims the maximum number of decision stages:

pd3 <- partial(pd, K = 3)
pd3
#> Policy data with n = 2000 observations and maximal K = 3 stages.
#> 
#>      action
#> stage    0    1    n
#>     1  113 1887 2000
#>     2  844 1039 1883
#>     3  956   74 1030
#> 
#> Baseline covariates: B
#> State covariates: t, X, X_lead
#> Average utility: 2.46

SessionInfo

sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: aarch64-apple-darwin22.6.0 (64-bit)
#> Running under: macOS Sonoma 14.4.1
#> 
#> Matrix products: default
#> BLAS:   /Users/oano/.asdf/installs/R/4.3.2/lib/R/lib/libRblas.dylib 
#> LAPACK: /Users/oano/.asdf/installs/R/4.3.2/lib/R/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Copenhagen
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] splines   stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] polle_1.4           SuperLearner_2.0-29 gam_1.22-3         
#> [4] foreach_1.5.2       nnls_1.5           
#> 
#> loaded via a namespace (and not attached):
#>  [1] progressr_0.14.0    cli_3.6.2           knitr_1.45         
#>  [4] rlang_1.1.3         xfun_0.41           jsonlite_1.8.8     
#>  [7] data.table_1.15.4   listenv_0.9.1       future.apply_1.11.2
#> [10] lava_1.8.0          htmltools_0.5.7     sass_0.4.7         
#> [13] rmarkdown_2.25      grid_4.3.2          evaluate_0.23      
#> [16] jquerylib_0.1.4     fastmap_1.1.1       yaml_2.3.7         
#> [19] compiler_4.3.2      codetools_0.2-19    future_1.33.2      
#> [22] lattice_0.21-9      digest_0.6.35       R6_2.5.1           
#> [25] parallelly_1.37.1   parallel_4.3.2      Matrix_1.6-1.1     
#> [28] bslib_0.5.1         tools_4.3.2         iterators_1.0.14   
#> [31] globals_0.16.3      survival_3.5-7      cachem_1.0.8

References

Nordland, Andreas, and Klaus K. Holst. 2023. “Policy Learning with the Polle Package.” https://doi.org/10.48550/arXiv.2212.02335.