Working with the socialrisk Package

Wyatt P. Bensken

2023-02-15

Introduction

The goal of socialrisk is to create an efficient way to identify social risk from administrative health care data using ICD-10 diagnosis codes.

Load Sample Data

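We’ve created a sample dataset of ICD-10 administrative data, i10_wide, which we can load in. Each row is one visit, with up to five diagnosis codes stored in the columns dx1 through dx5.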

library(socialrisk)

i10_wide
#>    patient_id    sex date_of_serv    dx1    dx2    dx3    dx4    dx5 visit_type
#> 1        1001   male   2020-02-14   E876   Z560  Z6372   Z654   E440         ip
#> 2        1001   male   2021-05-15   J189   Z644   A408    I10   G309         ip
#> 3        1001   male   2021-01-10   I119   Z628    I10   <NA>   <NA>         ot
#> 4        1001   male   2021-04-02   G309   K731   Z591   <NA>   <NA>         ot
#> 5        1001   male   2021-05-06   E039    I10   J189   <NA>   <NA>         ot
#> 6        1001   male   2021-06-04   J189   Z604   F329   <NA>   <NA>         ot
#> 7        1001   male   2021-10-01  E0800   G309    I10   <NA>   <NA>         ot
#> 8        1001   male   2021-11-05  I6011    I10   F329   R930   <NA>         ot
#> 9        1001   male   2022-02-01   M546   G309    I10  I6011   <NA>         ot
#> 10       1001   male   2022-03-15  E0800    I10   J189   F329   <NA>         ot
#> 11       1002 female   2020-01-09   G459   Z598   E840   <NA>   <NA>         ip
#> 12       1002 female   2020-03-23   E840   Z591   <NA>   <NA>   <NA>         ot
#> 13       1002 female   2020-09-07   E119   Z558   <NA>   <NA>   <NA>         ot
#> 14       1002 female   2020-12-05   E840   E119   <NA>   <NA>   <NA>         ot
#> 15       1002 female   2022-03-25   F419   E119   G459   <NA>   <NA>         ot
#> 16       1003   male   2020-02-15  F3010  F1910    I10 G40909   R296         ip
#> 17       1003   male   2020-03-31  F3010   Z562   E109   <NA>   <NA>         ot
#> 18       1003   male   2020-12-31   K762   R569   Z576   <NA>   <NA>         ot
#> 19       1003   male   2021-12-22   E109   R569  F1910  F4310   <NA>         ot
#> 20       1003   male   2021-12-25 G40909  F1910   R569   <NA>   <NA>         ot
#> 21       1003   male   2022-08-28   K762   Z564   <NA>   <NA>   <NA>         ot
#> 22       1003   male   2022-09-05   E109   K762  F4310   <NA>   <NA>         ot
#> 23       1004 female   2021-01-09 C50111  F1020   F330   <NA>   <NA>         ot
#> 24       1004 female   2021-04-15 C50111   F330   <NA>   <NA>   <NA>         ot
#> 25       1004 female   2021-06-08   F329 C50111  F1020   <NA>   <NA>         ot
#> 26       1005 female   2020-01-27  K4000   G839  R1030   R251 G43909         ip
#> 27       1005 female   2020-11-13 G43909  K4000   G839   <NA>   <NA>         ot
#> 28       1005 female   2021-12-07    J22   G839 G43909   <NA>   <NA>         ot
#> 29       1005 female   2021-12-26  B2790    J22   G839   <NA>   <NA>         ot
#>    hcpcs icd_version
#> 1  E2201          10
#> 2  E2201          10
#> 3  E2201          10
#> 4  E2201          10
#> 5  E2201          10
#> 6  E2201          10
#> 7  E2201          10
#> 8  E2201          10
#> 9  E2201          10
#> 10 E2201          10
#> 11 E0159          10
#> 12 E0159          10
#> 13 E0159          10
#> 14 E0159          10
#> 15 E0159          10
#> 16 E1353          10
#> 17 E1353          10
#> 18 E1353          10
#> 19 E1353          10
#> 20 E1353          10
#> 21 E1353          10
#> 22 E1353          10
#> 23 A7047          10
#> 24 A7047          10
#> 25 A7047          10
#> 26 K0669          10
#> 27 K0669          10
#> 28  <NA>          10
#> 29  <NA>          10

Preparing the Data

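We use the built-in clean_data() function, specifying the dataset, the patient ID, the current data format (wide or long), and the prefix of the diagnosis variables. The result is a long-format table with one row per patient and diagnosis code; the first 10 rows are shown below.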

data <- clean_data(dat = i10_wide,
                   id = patient_id,
                   style = "wide",
                   prefix_dx = "dx")
#> # A tibble: 10 × 2
#>    patient_id dx   
#>    <fct>      <chr>
#>  1 1001       E876 
#>  2 1001       Z560 
#>  3 1001       Z6372
#>  4 1001       Z654 
#>  5 1001       E440 
#>  6 1001       J189 
#>  7 1001       Z644 
#>  8 1001       A408 
#>  9 1001       I10  
#> 10 1001       G309
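
To make the wide-to-long step concrete, here is a minimal base R sketch of the kind of reshaping shown above. It is an illustration only, not the implementation of clean_data(); the toy data frame and the variable names (wide, dx_cols, long) are assumptions made for this example, with codes copied from two visits in the sample data.

```r
# Toy wide data: one row per visit, diagnoses in columns dx1..dx3
# (values copied from two visits in the sample data; object names are hypothetical).
wide <- data.frame(
  patient_id = c(1001, 1002),
  dx1 = c("I119", "E840"),
  dx2 = c("Z628", "Z591"),
  dx3 = c("I10", NA),
  stringsAsFactors = FALSE
)

# Find the diagnosis columns by their shared prefix
dx_cols <- grep("^dx", names(wide), value = TRUE)

# Stack the diagnosis columns visit by visit, repeating the patient id for each code
long <- data.frame(
  patient_id = rep(wide$patient_id, each = length(dx_cols)),
  dx = as.vector(t(as.matrix(wide[dx_cols]))),
  stringsAsFactors = FALSE
)

# Drop the empty diagnosis slots
long <- long[!is.na(long$dx), ]
rownames(long) <- NULL

long
```

The result has one row per patient and diagnosis code, which is the shape the social risk step below expects.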

Social Risk

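Now we can run the socialrisk() function on the cleaned data, choosing a taxonomy through the taxonomy argument. Each call returns one row per patient with an overall indicator (any_social_risk), a count of domains present (number_domains), and one 0/1 column per domain.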

Centers for Medicare and Medicaid Services (CMS)

cms <- socialrisk(dat = data, id = patient_id, dx = dx, taxonomy = "cms")
#> # A tibble: 5 × 12
#>   patient_id any_social_risk number_domains z55_education z56_employment
#>   <fct>                <dbl>          <dbl>         <dbl>          <dbl>
#> 1 1001                     1              7             0              1
#> 2 1002                     1              2             1              0
#> 3 1003                     1              2             0              1
#> 4 1004                     0              0             0              0
#> 5 1005                     0              0             0              0
#> # … with 7 more variables: z57_occupation <dbl>, z59_housing <dbl>,
#> #   z60_social <dbl>, z62_upbringing <dbl>, z63_family <dbl>,
#> #   z64_psychosocial <dbl>, z65_psychosocial_other <dbl>
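
The column names above suggest that each CMS domain corresponds to a three-character ICD-10 Z-code category (Z55, Z56, and so on). The base R sketch below illustrates that idea under this assumption. It is not the package's implementation: the lookup covers only four of the domains, and the object names (long, domains, indicators) are hypothetical. The codes are a subset of those recorded for patients 1002 and 1003 in the sample data.

```r
# Long-format diagnoses (a subset of the sample data for two patients)
long <- data.frame(
  patient_id = c(1002, 1002, 1002, 1002, 1003, 1003, 1003),
  dx = c("G459", "Z598", "Z591", "Z558", "Z562", "Z576", "E109"),
  stringsAsFactors = FALSE
)

# Assumed lookup: three-character ICD-10 category -> domain column name
domains <- c(
  Z55 = "z55_education",
  Z56 = "z56_employment",
  Z57 = "z57_occupation",
  Z59 = "z59_housing"
)

# Map each code to a domain via its first three characters (NA if no match)
long$domain <- unname(domains[substr(long$dx, 1, 3)])
flagged <- long[!is.na(long$domain), ]

# Cross-tabulate patients by domain, then collapse counts to 0/1 indicators
counts <- table(
  patient_id = flagged$patient_id,
  domain = factor(flagged$domain, levels = unname(domains))
)
indicators <- (counts > 0) * 1

# Number of distinct domains flagged per patient
number_domains <- rowSums(indicators)

indicators
number_domains
```

Note that patient 1002 has two Z59 codes but housing is counted once, since the indicator records presence of a domain rather than the number of codes.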

Missouri Hospital Association

mha <- socialrisk(dat = data, id = patient_id, dx = dx, taxonomy = "mha")
#> # A tibble: 5 × 8
#>   patient_id any_social_risk number_domains employment family housing
#>   <fct>                <dbl>          <dbl>      <dbl>  <dbl>   <dbl>
#> 1 1001                     1              5          1      1       1
#> 2 1002                     1              2          0      0       1
#> 3 1003                     1              1          1      0       0
#> 4 1004                     0              0          0      0       0
#> 5 1005                     0              0          0      0       0
#> # … with 2 more variables: psychosocial <dbl>, ses <dbl>

Social Interventions Research and Evaluation Network (SIREN), UCSF

siren <- socialrisk(dat = data, id = patient_id, dx = dx, taxonomy = "siren")
#> Note: The SIREN Compendium assigns multiple domains to each code, resulting in non-mutally exclusive groups.
#> # A tibble: 5 × 19
#>   patient_id any_social_risk number_domains access education employment finances
#>   <fct>                <dbl>          <dbl>  <dbl>     <dbl>      <dbl>    <dbl>
#> 1 1001                     1              5      0         0          1        0
#> 2 1002                     1              6      1         1          0        1
#> 3 1003                     1              1      0         0          1        0
#> 4 1004                     0              0      0         0          0        0
#> 5 1005                     0              0      0         0          0        0
#> # … with 12 more variables: food <dbl>, housing <dbl>, immigration <dbl>,
#> #   incarceration <dbl>, language <dbl>, race_eth <dbl>, safety <dbl>,
#> #   soc_connect <dbl>, stress <dbl>, transportation <dbl>, utilities <dbl>,
#> #   veteran <dbl>
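
Because the taxonomies group codes differently, the same patient can have a different number_domains under each one. A small sketch of a side-by-side comparison follows. To keep it self-contained, the values are typed in from the three outputs above rather than taken from the cms, mha, and siren objects; the helper data frames (cms_n, mha_n, siren_n) are hypothetical names used only here.

```r
# number_domains per patient, copied from the three outputs above
ids <- c("1001", "1002", "1003", "1004", "1005")
cms_n   <- data.frame(patient_id = ids, cms   = c(7, 2, 2, 0, 0))
mha_n   <- data.frame(patient_id = ids, mha   = c(5, 2, 1, 0, 0))
siren_n <- data.frame(patient_id = ids, siren = c(5, 6, 1, 0, 0))

# Join the three tables on patient_id, one taxonomy per column
comparison <- Reduce(
  function(x, y) merge(x, y, by = "patient_id"),
  list(cms_n, mha_n, siren_n)
)

comparison
```

With the real results, the same join could presumably be applied to the patient_id and number_domains columns of each returned tibble. The higher SIREN count for patient 1002 is consistent with the note printed above: SIREN can assign one code to several domains.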