Working with the multimorbidity Package

Wyatt P. Bensken

2023-02-15

Introduction

The multimorbidity package is a simple, transparent one-stop shop for anyone working with claims or other administrative health care data who needs comorbidity, frailty, or multimorbidity measures. The package first cleans and organizes the data into a uniform, standard format, which the various algorithms can then use directly.

Load Sample Data

We’ve created two sample datasets.

Claims Data

This dataset contains hypothetical claims, coded in both ICD-9 and ICD-10, for 5 hypothetical patients.

claims <- i9_i10_comb
head(claims, 10)
#>    patient_id  sex date_of_serv visit_type   dx1   dx2   dx3   dx4   dx5 hcpcs
#> 1        1001 male   2012-02-14         ip  2768  4019  3310 29620  2630 E2201
#> 2        1001 male   2013-05-15         ip   486  2768 99591  4019  3310 E2201
#> 3        1001 male   2013-01-10         ot 40290 29620  4019  <NA>  <NA> E2201
#> 4        1001 male   2013-04-02         ot  3310 57149  4019  <NA>  <NA> E2201
#> 5        1001 male   2013-05-06         ot  2449  4019   486  <NA>  <NA> E2201
#> 6        1001 male   2013-06-04         ot   486  4019 29620  <NA>  <NA> E2201
#> 7        1001 male   2013-10-01         ot 24920  3310  4019  <NA>  <NA> E2201
#> 8        1001 male   2013-11-05         ot   430  4019 29620  7930  <NA> E2201
#> 9        1001 male   2014-02-01         ot  7241  3310  4019   430  <NA> E2201
#> 10       1001 male   2014-03-15         ot 24920  4011   486 29620 39891 E2201
#>    icd_version
#> 1            9
#> 2            9
#> 3            9
#> 4            9
#> 5            9
#> 6            9
#> 7            9
#> 8            9
#> 9            9
#> 10           9

ID Data

This is our one-row-per-patient dataset. It is needed only if we intend to use comorbidity_window, the function that limits claims to a time window.

#>   patient_id date_of_interest10 date_of_interest9
#> 1       1001         2021-06-04        2013-06-04
#> 2       1002         2021-03-11        2013-03-11
#> 3       1003         2021-08-02        2013-08-02
#> 4       1004         2021-01-20        2013-01-20
#> 5       1005         2021-02-14        2013-02-14

Preparing the Data

The first step is to “prepare” our data for the subsequent algorithms. The end goal is a dataset with 1 column for the patient ID, 1 column for the diagnosis code, and 1 column noting whether that code is ICD-9 (9) or ICD-10 (10). Depending on the specification, other variables may also be of interest, including the visit type (inpatient or outpatient) and the date.

The arguments used here are, in order: the name of our data (dat); the ID variable (id); whether the data are wide or long (style; "long" means the data are already in the final format); the prefix of the diagnosis columns (prefix_dx; for dx1, dx2, dx3 this is "dx"); whether the data include a HCPCS/CPT column (hcpcs) and, if so, the prefix of that column (prefix_hcpcs); the variable that records whether a code is ICD-9 or ICD-10 (version_var); the variable that gives the type of visit, inpatient or outpatient (type_name); and the column that holds the date (date).

prepared_data <- prepare_data(dat = claims,
                              id = patient_id,
                              style = "wide",
                              prefix_dx = "dx",
                              hcpcs = "yes",
                              prefix_hcpcs = "hcpcs", 
                              version_var = icd_version,
                              type_name = visit_type,
                              date = date_of_serv)
#> # A tibble: 10 × 5
#>    patient_id claim_date dx    version type 
#>    <fct>      <date>     <chr>   <dbl> <fct>
#>  1 1001       2012-02-14 2768        9 ip   
#>  2 1001       2012-02-14 4019        9 ip   
#>  3 1001       2012-02-14 3310        9 ip   
#>  4 1001       2012-02-14 29620       9 ip   
#>  5 1001       2012-02-14 2630        9 ip   
#>  6 1001       2013-05-15 486         9 ip   
#>  7 1001       2013-05-15 2768        9 ip   
#>  8 1001       2013-05-15 99591       9 ip   
#>  9 1001       2013-05-15 4019        9 ip   
#> 10 1001       2013-05-15 3310        9 ip
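To make the target layout concrete, the following is a minimal base R sketch of a wide-to-long reshape on a toy subset of the sample rows (trimmed to four diagnosis columns). It is not the package's implementation; the object names wide, long, and dx_cols are ours and purely illustrative.

```r
# Toy wide claims table based on two of the sample rows shown above.
wide <- data.frame(
  patient_id   = c(1001, 1001),
  date_of_serv = as.Date(c("2012-02-14", "2013-01-10")),
  visit_type   = c("ip", "ot"),
  dx1          = c("2768", "40290"),
  dx2          = c("4019", "29620"),
  dx3          = c("3310", "4019"),
  dx4          = c("29620", NA),
  icd_version  = c(9, 9),
  stringsAsFactors = FALSE
)

# Columns whose names start with the diagnosis prefix.
dx_cols <- grep("^dx", names(wide), value = TRUE)

# Stack each diagnosis column under the shared claim-level fields.
long <- do.call(rbind, lapply(dx_cols, function(col) {
  data.frame(
    patient_id = wide$patient_id,
    claim_date = wide$date_of_serv,
    dx         = wide[[col]],
    version    = wide$icd_version,
    type       = wide$visit_type,
    stringsAsFactors = FALSE
  )
}))

# Drop empty diagnosis slots and order by patient and claim date.
long <- long[!is.na(long$dx), ]
long <- long[order(long$patient_id, long$claim_date), ]
rownames(long) <- NULL
print(long)
```

Each claim contributes one row per non-missing diagnosis, which is the one-code-per-row shape the index functions expect.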

Setting Comorbidity Window

Often we are interested in limiting our claims to a specific window, such as the year before diagnosis. To accommodate this, the package includes a function that merges the two datasets and limits the claims to that window.

In the example below we supply: the name of our ID dataset (id_dat), the name of our claims data (dat), the ID variable shared by both (id), the variable in the ID dataset that holds our “date of interest” (id_date), the variable in the claims data that holds the claim date (claims_date), and the time window we are interested in (in this example, pre only, via time_pre). There is a complementary argument for the post period (time_post), which defaults to infinity. This example therefore keeps the claims that occur within the 60 days before our date of interest, as well as all claims after it. A common variation is to keep only the claims that occurred before diagnosis. In that case we could omit the time_pre argument and set time_post = 0.

Note: in this example we ignore date_of_interest10, but it could be used instead, since the sample data include both ICD-9 and ICD-10 claims and dates.

limit_data <- comorbidity_window(id_dat = id, dat = prepared_data, id = patient_id, 
                                 id_date = date_of_interest9, claims_date = claim_date,
                                 time_pre = 60)
#> # A tibble: 10 × 7
#>    patient_id claim_date dx    version type  date_of_interest10 date_of_interes…
#>    <fct>      <date>     <chr>   <dbl> <fct> <date>             <date>          
#>  1 1001       2013-05-15 486         9 ip    2021-06-04         2013-06-04      
#>  2 1001       2013-05-15 2768        9 ip    2021-06-04         2013-06-04      
#>  3 1001       2013-05-15 99591       9 ip    2021-06-04         2013-06-04      
#>  4 1001       2013-05-15 4019        9 ip    2021-06-04         2013-06-04      
#>  5 1001       2013-05-15 3310        9 ip    2021-06-04         2013-06-04      
#>  6 1001       2013-05-06 2449        9 ot    2021-06-04         2013-06-04      
#>  7 1001       2013-05-06 4019        9 ot    2021-06-04         2013-06-04      
#>  8 1001       2013-05-06 486         9 ot    2021-06-04         2013-06-04      
#>  9 1001       2013-06-04 486         9 ot    2021-06-04         2013-06-04      
#> 10 1001       2013-06-04 4019        9 ot    2021-06-04         2013-06-04
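The windowing logic can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: window_sketch, id_toy, and claims_toy are hypothetical names, and treating the window boundaries as inclusive is our assumption.

```r
# Hypothetical stand-ins for the ID data and prepared claims (patient 1001 only).
id_toy <- data.frame(
  patient_id = 1001,
  date_of_interest9 = as.Date("2013-06-04")
)
claims_toy <- data.frame(
  patient_id = 1001,
  claim_date = as.Date(c("2012-02-14", "2013-04-02", "2013-05-06",
                         "2013-05-15", "2013-06-04", "2013-10-01")),
  dx = c("2768", "3310", "2449", "486", "486", "24920"),
  stringsAsFactors = FALSE
)

# window_sketch is a hypothetical helper, not the package function.
window_sketch <- function(id_dat, dat, time_pre = Inf, time_post = Inf) {
  merged <- merge(dat, id_dat, by = "patient_id")
  # Days from the date of interest to the claim (negative means before).
  offset <- as.numeric(merged$claim_date - merged$date_of_interest9)
  # Assumption: both boundaries are inclusive.
  merged[offset >= -time_pre & offset <= time_post, ]
}

# 60 days before the date of interest, plus everything after it.
kept <- window_sketch(id_toy, claims_toy, time_pre = 60)
print(sort(kept$claim_date))

# Only claims on or before the date of interest.
pre_only <- window_sketch(id_toy, claims_toy, time_post = 0)
print(sort(pre_only$claim_date))
```

In the first call the claim from 2013-04-02 falls 63 days before the date of interest and is dropped, while the claim from 2013-10-01 is kept because the post window is unbounded.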

Running Indices

The real advantage of this package is that, with our data now in a standard format, we can apply a multitude of comorbidity indices using near-identical calls. More information about these indices can be found in the package documentation; the code below simply demonstrates how to run them.

The arguments are similar across functions and include: the dataset name (dat), the patient ID variable (id), the diagnosis variable (dx), the version (version; 9 = ICD-9 only, 10 = ICD-10 only, 19 = both), the variable that gives the version of each diagnosis code, 9 or 10 (version_var), and whether to require two outpatient visits before an individual is positively coded with a comorbidity (outpatient_two). Although not frequently used, this requirement may limit the influence of rule-out diagnoses, and the package was built with this flexibility in mind.
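To illustrate the idea behind the two-outpatient-visit requirement, here is a small base R sketch. The exact rule the package applies may differ: we assume that a single inpatient claim is sufficient and that outpatient claims must fall on at least two distinct dates. The data, the htn indicator, and code_condition are all hypothetical.

```r
# Toy long-format claims already mapped to a condition flag.
# 'htn' is a hypothetical indicator: 1 if the claim's code maps to hypertension.
flags <- data.frame(
  id         = c("A", "A", "B", "C", "D"),
  claim_date = as.Date(c("2013-05-06", "2013-06-04", "2013-05-06",
                         "2013-05-15", "2013-05-06")),
  type       = c("ot", "ot", "ot", "ip", "ot"),
  htn        = c(1, 1, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Assumed rule: positive if any inpatient claim carries the condition,
# or if outpatient claims on at least two distinct dates carry it.
code_condition <- function(d) {
  hits   <- d[d$htn == 1, ]
  any_ip <- any(hits$type == "ip")
  n_ot   <- length(unique(hits$claim_date[hits$type %in% c("ot", "OT")]))
  as.numeric(any_ip || n_ot >= 2)
}

# One 0/1 value per patient.
result <- vapply(split(flags, flags$id), code_condition, numeric(1))
print(result)
```

Under these assumptions, patient B, who has a single outpatient claim, is not coded, which is how the requirement can screen out one-off rule-out diagnoses.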

Elixhauser Comorbidity Index

elix_data <- elixhauser(dat = limit_data, id = patient_id, dx = dx, version = 19, version_var = version, outpatient_two = "yes")
#> Message: Specifying that your data uses both ICD-9 and ICD-10 will result in only the Elixhauser comorbidities 
#>  which are compatible with ICD-9, as the changes and additions which are seen in 
#>  ICD-10 have, to date, not been back-mapped to ICD-9.
#> Message: You have specified that for a comorbidity to be positvely coded, an individual must have two outpatient claims with it. Please make sure the levels of your variable denoting outpatient type are either 'ot' or 'OT'
#> # A tibble: 5 × 34
#>   id      chf valve pulmcirc perivasc elix_htn_uc elix_htn_c  para neuro
#>   <fct> <dbl> <dbl>    <dbl>    <dbl>       <dbl>      <dbl> <dbl> <dbl>
#> 1 1001      0     0        0        0           1          0     0     1
#> 2 1002      0     0        0        0           0          0     0     0
#> 3 1003      0     0        0        0           1          0     0     1
#> 4 1004      0     0        0        0           0          0     0     0
#> 5 1005      0     0        0        0           0          0     1     0
#> # … with 25 more variables: chrnlung <dbl>, dm <dbl>, dmcx <dbl>,
#> #   hypothy <dbl>, renlfail <dbl>, liver <dbl>, ulcer <dbl>, aids <dbl>,
#> #   lymph <dbl>, mets <dbl>, tumor <dbl>, arth <dbl>, coag <dbl>, obese <dbl>,
#> #   wghtloss <dbl>, lytes <dbl>, bldloss <dbl>, anemdef <dbl>, alcohol <dbl>,
#> #   drug <dbl>, psych <dbl>, depress <dbl>, htn_c <dbl>, elix_death <dbl>,
#> #   elix_readmit <dbl>

Charlson Comorbidity Index

charlson_data <- charlson(dat = limit_data, id = patient_id, dx = dx, version = 19, version_var = version, outpatient_two = "yes")
#> Message: You have specified that for a comorbidity to be positvely coded, an individual must have two outpatient claims with it. Please make sure the levels of your variable denoting outpatient type are either 'ot' or 'OT'
#> # A tibble: 5 × 19
#>   id    charlson_myocar charlson_chf charlson_periph_vasc charlson_cerebro
#>   <fct>           <dbl>        <dbl>                <dbl>            <dbl>
#> 1 1001                0            0                    0                1
#> 2 1002                0            0                    0                1
#> 3 1003                0            0                    0                0
#> 4 1004                0            0                    0                0
#> 5 1005                0            0                    0                0
#> # … with 14 more variables: charlson_dementia <dbl>,
#> #   charlson_chronic_pulm <dbl>, charlson_rheum <dbl>,
#> #   charlson_peptic_ulcer <dbl>, charlson_mild_liv <dbl>,
#> #   charlson_diab_uc <dbl>, charlson_diab_c <dbl>, charlson_hemi_para <dbl>,
#> #   charlson_renal <dbl>, charlson_malig <dbl>, charlson_mod_sev_liv <dbl>,
#> #   charlson_met_solid <dbl>, charlson_hiv <dbl>, charlson_score <dbl>

Claims Frailty Index

cfi_data <- cfi(dat = limit_data, id = patient_id, dx = dx, version = 19, version_var = version)
#> # A tibble: 5 × 2
#>   id    frailty_index
#>   <fct>         <dbl>
#> 1 1001          0.365
#> 2 1002          0.279
#> 3 1003          0.313
#> 4 1004          0.272
#> 5 1005          0.337

Multimorbidity Weighted Index

mwi_data <- mwi(dat = limit_data, id = patient_id, dx = dx, version = 9, version_var = version)
#> # A tibble: 5 × 2
#>   id       mwi
#>   <fct>  <dbl>
#> 1 1001  21.0  
#> 2 1002   3.91 
#> 3 1003   3.51 
#> 4 1004   3.55 
#> 5 1005   0.614

Nicholson and Fortin Conditions

nf_data <- nicholsonfortin(dat = limit_data, id = patient_id, dx = dx, version = 19, version_var = version, outpatient_two = "yes")
#> Message: You have specified that for a comorbidity to be positvely coded, an individual must have two outpatient claims with it. Please make sure the levels of your variable denoting outpatient type must be either 'ot' or 'OT'
#> # A tibble: 5 × 21
#>   id      htn obesity diabetes  clrd hyperlipid cancer   cvd heartfail
#>   <fct> <dbl>   <dbl>    <dbl> <dbl>      <dbl>  <dbl> <dbl>     <dbl>
#> 1 1001      1       0        0     1          0      0     0         0
#> 2 1002      0       0        1     0          0      0     0         0
#> 3 1003      1       0        1     0          0      0     0         0
#> 4 1004      0       0        0     0          0      1     0         0
#> 5 1005      0       0        0     0          0      1     1         0
#> # … with 12 more variables: anxietydepress <dbl>, arthritis <dbl>,
#> #   stroketia <dbl>, thyroid <dbl>, ckd <dbl>, osteo <dbl>, dementia <dbl>,
#> #   musculo <dbl>, stomach <dbl>, colon <dbl>, liver <dbl>, urinary <dbl>
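Because each index function returns one row per patient with an id column, the results can be combined into a single patient-level table. The sketch below uses base R merge on toy data frames whose values are copied from the cfi and mwi outputs above; cfi_toy, mwi_toy, and combined are illustrative names, not package objects.

```r
# Values copied from the cfi and mwi outputs shown above.
cfi_toy <- data.frame(
  id = c("1001", "1002", "1003", "1004", "1005"),
  frailty_index = c(0.365, 0.279, 0.313, 0.272, 0.337),
  stringsAsFactors = FALSE
)
mwi_toy <- data.frame(
  id = c("1001", "1002", "1003", "1004", "1005"),
  mwi = c(21.0, 3.91, 3.51, 3.55, 0.614),
  stringsAsFactors = FALSE
)

# Join any number of one-row-per-patient results on the shared id column.
combined <- Reduce(function(x, y) merge(x, y, by = "id"),
                   list(cfi_toy, mwi_toy))
print(combined)
```

The same pattern extends to the other outputs by adding them to the list passed to Reduce.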