library(DrugUtilisation)
library(CodelistGenerator)
library(CDMConnector)
library(dplyr)
con <- DBI::dbConnect(duckdb::duckdb(), ":memory:")
connectionDetails <- list(
  con = con,
  writeSchema = "main",
  cdmPrefix = NULL,
  writePrefix = NULL
)
cdm <- mockDrugUtilisation(
  connectionDetails = connectionDetails,
  numberIndividual = 100
)
To generate a cohort, we need a concept list, which can be obtained in several ways.
To read it from a JSON file, either readConceptList() or codesFromConceptSet() can be used.
#get concept from a JSON file using readConceptList() from this package or codesFromConceptSet() from CodelistGenerator
conceptSet_json_1 <- readConceptList(here::here("inst/Concept"), cdm)
conceptSet_json_2 <- codesFromConceptSet(here::here("inst/Concept"), cdm)
conceptSet_json_1
#> $asthma
#> [1] 317009
conceptSet_json_2
#> $asthma
#> [1] 317009
Or a list can be created manually with the target codes:
#get concept using code directly
conceptSet_code <- list(asthma = 317009)
conceptSet_code
#> $asthma
#> [1] 317009
If there is a particular ingredient of interest, its codes can also be obtained with getDrugIngredientCodes() from CodelistGenerator.
#get concept by ingredient
conceptSet_ingredient <- getDrugIngredientCodes(cdm, name = "simvastatin")
conceptSet_ingredient
#> $simvastatin
#> [1] 1539403 1539462 1539463
ATC codes can also be obtained using getATCCodes() from CodelistGenerator.
#get concept from ATC codes
conceptSet_ATC <- getATCCodes(cdm,
  level = "ATC 1st",
  name = "ALIMENTARY TRACT AND METABOLISM"
)
conceptSet_ATC
#> $alimentary_tract_and_metabolism
#> [1] 35897399
Now that we have the concept sets, we can proceed to generate cohorts. There are two functions in this package to generate a cohort: generateConceptCohortSet() and generateDrugUtilisationCohortSet().
First, let’s use generateConceptCohortSet() to get the asthma cohort from conceptSet_code. Using conceptSet_json_1 or conceptSet_json_2 instead gives the same output, as they contain the same concept code.
cdm1 <- generateConceptCohortSet(cdm,
  conceptSet = conceptSet_code,
  name = "asthma_1",
  end = "observation_period_end_date",
  requiredObservation = c(10, 10),
  overwrite = TRUE
)
cdm1$asthma_1
#> # Source: table<main.asthma_1> [?? x 4]
#> # Database: DuckDB v0.9.1 [martics@Windows 10 x64:R 4.2.3/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 12 2019-03-06 2022-03-30
#> 2 1 36 2013-05-27 2022-12-19
#> 3 1 16 2019-04-02 2020-08-17
#> 4 1 92 2012-12-12 2013-05-15
#> 5 1 58 2005-03-29 2012-07-14
#> 6 1 34 2008-12-23 2016-12-13
#> 7 1 71 2017-09-02 2019-10-18
#> 8 1 1 1996-12-22 2001-09-25
#> 9 1 97 1990-06-26 1993-01-20
#> 10 1 57 2021-02-19 2021-03-10
#> # ℹ more rows
The cohort counts can be obtained with cohortCount() from CDMConnector:
cohortCount(cdm1$asthma_1)
#> # A tibble: 1 × 3
#> cohort_definition_id number_records number_subjects
#> <int> <dbl> <dbl>
#> 1 1 45 45
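The two columns of cohortCount() count different things: number_records counts cohort entries, while number_subjects counts distinct people, so they differ whenever someone enters the cohort more than once. The base R sketch below illustrates this on a hypothetical toy table; count_cohort() is an illustrative helper, not a CDMConnector function.

```r
# Hypothetical toy cohort table; subject 36 contributes two records
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 1L, 1L),
  subject_id = c(12L, 36L, 36L, 92L),
  cohort_start_date = as.Date(c("2019-03-06", "2013-05-27", "2016-01-01", "2012-12-12")),
  cohort_end_date = as.Date(c("2019-06-30", "2014-01-01", "2016-03-01", "2013-05-15"))
)

# Illustrative helper: per cohort_definition_id, number_records counts rows
# and number_subjects counts distinct subject_id values
count_cohort <- function(cohort) {
  ids <- sort(unique(cohort$cohort_definition_id))
  data.frame(
    cohort_definition_id = ids,
    number_records = vapply(ids, function(i) {
      sum(cohort$cohort_definition_id == i)
    }, numeric(1)),
    number_subjects = vapply(ids, function(i) {
      length(unique(cohort$subject_id[cohort$cohort_definition_id == i]))
    }, numeric(1))
  )
}

count_cohort(cohort)
```

Here the sketch reports 4 records but 3 subjects. The same distinction explains why the simvastatin cohort generated later has 53 records but 48 subjects.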
The cohort attrition can be obtained with cohortAttrition() from CDMConnector:
cohortAttrition(cdm1$asthma_1)
#> # A tibble: 1 × 7
#> cohort_definition_id number_records number_subjects reason_id reason
#> <int> <dbl> <dbl> <dbl> <chr>
#> 1 1 45 45 1 Qualifying init…
#> # ℹ 2 more variables: excluded_records <dbl>, excluded_subjects <dbl>
The end parameter sets how the cohort end date is defined. Here it is changed to the event end date to show the difference from the observation period end date used previously. Note that the cohort_end_date values now differ:
cdm1 <- generateConceptCohortSet(cdm,
  conceptSet = conceptSet_code,
  name = "asthma_2",
  end = "event_end_date",
  requiredObservation = c(10, 10),
  overwrite = TRUE
)
cdm1$asthma_2
#> # Source: table<main.asthma_2> [?? x 4]
#> # Database: DuckDB v0.9.1 [martics@Windows 10 x64:R 4.2.3/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 1 1996-12-22 1998-03-13
#> 2 1 71 2017-09-02 2017-10-31
#> 3 1 57 2021-02-19 2021-02-27
#> 4 1 10 2021-08-16 2021-12-14
#> 5 1 18 2001-01-11 2004-06-05
#> 6 1 49 2017-10-22 2018-03-31
#> 7 1 93 2015-05-31 2016-04-07
#> 8 1 97 1990-06-26 1990-10-19
#> 9 1 2 2018-10-31 2018-11-15
#> 10 1 59 2020-11-09 2021-01-27
#> # ℹ more rows
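The effect of end can be pictured as choosing which date column becomes cohort_end_date. The base R sketch below reuses the dates printed for subjects 71 and 57 in asthma_1 and asthma_2, treating the asthma_2 end dates as the event end dates; the toy column names and the helper cohort_from_events() are illustrative assumptions, not the package's internals.

```r
# Toy rows built from the printed output (subjects 71 and 57);
# column names are hypothetical
events <- data.frame(
  subject_id = c(71L, 57L),
  event_start_date = as.Date(c("2017-09-02", "2021-02-19")),
  event_end_date = as.Date(c("2017-10-31", "2021-02-27")),
  observation_period_end_date = as.Date(c("2019-10-18", "2021-03-10"))
)

# Hypothetical helper: the cohort end date is read from the column chosen by `end`
cohort_from_events <- function(events,
                               end = c("observation_period_end_date", "event_end_date")) {
  end <- match.arg(end)
  data.frame(
    subject_id = events$subject_id,
    cohort_start_date = events$event_start_date,
    cohort_end_date = events[[end]]
  )
}

cohort_from_events(events, end = "observation_period_end_date")
cohort_from_events(events, end = "event_end_date")
```

Both calls keep the same cohort_start_date; only the end date changes, which matches the difference between the asthma_1 and asthma_2 rows for these two subjects.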
The requiredObservation parameter is a numeric vector of length 2 that defines the number of days of observation required before and after the index date for an event to be included in the cohort. Let’s check how reducing the required observation from c(10, 10) to c(1, 1) changes the cohort compared with asthma_1: the number of records increases from 45 to 48.
cdm1 <- generateConceptCohortSet(cdm,
  conceptSet = conceptSet_code,
  name = "asthma_3",
  end = "observation_period_end_date",
  requiredObservation = c(1, 1),
  overwrite = TRUE
)
cdm1$asthma_3
#> # Source: table<main.asthma_3> [?? x 4]
#> # Database: DuckDB v0.9.1 [martics@Windows 10 x64:R 4.2.3/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 10 2021-08-16 2022-03-30
#> 2 1 18 2001-01-11 2004-09-26
#> 3 1 49 2017-10-22 2018-09-30
#> 4 1 93 2015-05-31 2016-08-13
#> 5 1 1 1996-12-22 2001-09-25
#> 6 1 71 2017-09-02 2019-10-18
#> 7 1 57 2021-02-19 2021-03-10
#> 8 1 97 1990-06-26 1993-01-20
#> 9 1 58 2005-03-29 2012-07-14
#> 10 1 34 2008-12-23 2016-12-13
#> # ℹ more rows
cohortCount(cdm1$asthma_3)
#> # A tibble: 1 × 3
#> cohort_definition_id number_records number_subjects
#> <int> <dbl> <dbl>
#> 1 1 48 48
cohortAttrition(cdm1$asthma_3)
#> # A tibble: 1 × 7
#> cohort_definition_id number_records number_subjects reason_id reason
#> <int> <dbl> <dbl> <dbl> <chr>
#> 1 1 48 48 1 Qualifying init…
#> # ℹ 2 more variables: excluded_records <dbl>, excluded_subjects <dbl>
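A minimal base R sketch of the requiredObservation filter, assuming it compares the days between the observation period start and the event, and between the event and the observation period end, against the two values. The toy data and filter_required_observation() are hypothetical, and the package may count the boundary day differently.

```r
# Hypothetical toy events inside a single observation period
events <- data.frame(
  subject_id = 1:3,
  event_start_date = as.Date(c("2020-01-05", "2020-01-20", "2020-03-25")),
  observation_period_start_date = as.Date(rep("2020-01-01", 3)),
  observation_period_end_date = as.Date(rep("2020-03-31", 3))
)

# Keep events with at least requiredObservation[1] days of observation before
# the event and requiredObservation[2] days after it (assumed counting rule)
filter_required_observation <- function(events, requiredObservation = c(10, 10)) {
  prior <- as.numeric(events$event_start_date - events$observation_period_start_date)
  post <- as.numeric(events$observation_period_end_date - events$event_start_date)
  events[prior >= requiredObservation[1] & post >= requiredObservation[2], ]
}

filter_required_observation(events, c(10, 10))
filter_required_observation(events, c(1, 1))
```

With c(10, 10) only subject 2 remains (subject 1 has 4 days of prior observation and subject 3 has 6 days of follow-up); with c(1, 1) all three events qualify, which mirrors how asthma_3 gains records over asthma_1.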
Now let’s use DrugUtilisation::generateDrugUtilisationCohortSet() to get the drug cohort for the ingredient simvastatin. This function has many more options; we first use the default settings:
cdm2 <- generateDrugUtilisationCohortSet(cdm,
  name = "dus_alleras",
  conceptSet = conceptSet_ingredient
)
cdm2$dus_alleras
#> # Source: table<main.dus_alleras> [?? x 4]
#> # Database: DuckDB v0.9.1 [martics@Windows 10 x64:R 4.2.3/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <dbl> <date> <date>
#> 1 1 57 2020-09-14 2020-11-26
#> 2 1 89 2018-08-26 2018-11-02
#> 3 1 29 2020-10-11 2021-02-07
#> 4 1 27 2019-02-26 2020-06-10
#> 5 1 57 2020-06-19 2020-08-21
#> 6 1 77 1966-12-10 1967-02-03
#> 7 1 80 2022-08-31 2022-09-07
#> 8 1 84 2010-08-15 2017-05-13
#> 9 1 90 2001-06-18 2005-07-05
#> 10 1 70 2019-12-08 2021-01-24
#> # ℹ more rows
cohortCount(cdm2$dus_alleras)
#> # A tibble: 1 × 3
#> cohort_definition_id number_records number_subjects
#> <int> <dbl> <dbl>
#> 1 1 53 48
cohortAttrition(cdm2$dus_alleras) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 7 × 4
#> number_records reason excluded_records excluded_subjects
#> <dbl> <chr> <dbl> <dbl>
#> 1 58 Qualifying initial records 0 0
#> 2 58 Duration imputation; affect… 0 0
#> 3 53 Join eras 5 0
#> 4 53 prior use wahout of 0 days 0 0
#> 5 53 at least 0 prior observation 0 0
#> 6 53 cohort_start_date >= NA 0 0
#> 7 53 cohort_end_date <= NA 0 0
The parameter durationRange specifies the range within which the duration of each exposure must fall, where duration = end date - start date + 1. It should be a numeric vector of length two with no NAs, and the first value must be smaller than or equal to the second; the default is c(1, Inf). Durations outside durationRange are imputed according to imputeDuration, which can be set to “none”, “median”, “mean”, “mode” or an integer (count). In this example, widening durationRange to c(0, Inf) gives the same attrition as the default cohort.
cdm3 <- generateDrugUtilisationCohortSet(cdm,
  name = "dus_step2_0_inf",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf) # default as c(1, Inf)
)
cohortAttrition(cdm3$dus_step2_0_inf) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 7 × 4
#> number_records reason excluded_records excluded_subjects
#> <dbl> <chr> <dbl> <dbl>
#> 1 58 Qualifying initial records 0 0
#> 2 58 Duration imputation; affect… 0 0
#> 3 53 Join eras 5 0
#> 4 53 prior use wahout of 0 days 0 0
#> 5 53 at least 0 prior observation 0 0
#> 6 53 cohort_start_date >= NA 0 0
#> 7 53 cohort_end_date <= NA 0 0
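The duration formula and the range check can be sketched in base R. The toy dates and the helpers outside_range() and impute_median() are hypothetical; in particular, it is an assumption that a “median” imputation uses the median of the in-range durations, and the “mean”, “mode” and integer options are not sketched.

```r
# Hypothetical toy exposures; the second one ends the day before it starts
exposures <- data.frame(
  start = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  end = as.Date(c("2020-01-30", "2020-01-31", "2020-03-10"))
)

# duration = end date - start date + 1
duration <- as.numeric(exposures$end - exposures$start) + 1

# TRUE where a duration falls outside durationRange = c(lower, upper)
outside_range <- function(duration, durationRange = c(1, Inf)) {
  duration < durationRange[1] | duration > durationRange[2]
}

# Replace out-of-range durations with the median of the in-range ones
# (assumed behaviour, for illustration only)
impute_median <- function(duration, durationRange = c(1, Inf)) {
  out <- outside_range(duration, durationRange)
  duration[out] <- median(duration[!out])
  duration
}

duration
outside_range(duration)
outside_range(duration, c(0, Inf))
impute_median(duration)
```

The durations are 30, 0 and 10. The zero-day exposure is flagged by the default c(1, Inf) but accepted by c(0, Inf), and median imputation would replace it with 20, the median of 30 and 10.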
The gapEra parameter defines the maximum number of days between two consecutive drug exposures for them to be considered part of the same era. Now let’s change it from its default of 0 to 30. The attrition of the dus_step3_alleras cohort shows that joining eras at STEP 3 now leaves fewer records than in the dus_step2_0_inf cohort (51 instead of 53), because exposures separated by gaps within the 30-day limit are joined.
cdm4 <- generateDrugUtilisationCohortSet(cdm,
  name = "dus_step3_alleras",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 30 # default as 0
)
cohortAttrition(cdm4$dus_step3_alleras) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 7 × 4
#> number_records reason excluded_records excluded_subjects
#> <dbl> <chr> <dbl> <dbl>
#> 1 58 Qualifying initial records 0 0
#> 2 58 Duration imputation; affect… 0 0
#> 3 51 Join eras 7 0
#> 4 51 prior use wahout of 0 days 0 0
#> 5 51 at least 0 prior observation 0 0
#> 6 51 cohort_start_date >= NA 0 0
#> 7 51 cohort_end_date <= NA 0 0
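Era joining with gapEra can be sketched as a single pass over exposures sorted by subject and start date. This sketch assumes the gap is the number of days with no exposure between the end of the current era and the start of the next exposure; join_eras() and the toy data are hypothetical, not the package’s implementation.

```r
# Hypothetical toy exposures for one subject
exposures <- data.frame(
  subject_id = c(1L, 1L, 1L),
  start = as.Date(c("2020-01-01", "2020-02-15", "2020-06-01")),
  end = as.Date(c("2020-01-31", "2020-03-15", "2020-06-30"))
)

# Merge an exposure into the current era when it belongs to the same subject
# and the number of unexposed days in between is at most gapEra (assumed rule)
join_eras <- function(exposures, gapEra = 0) {
  exposures <- exposures[order(exposures$subject_id, exposures$start), ]
  era_subject <- integer(0)
  era_start <- as.Date(character(0))
  era_end <- as.Date(character(0))
  for (i in seq_len(nrow(exposures))) {
    s <- exposures$subject_id[i]
    st <- exposures$start[i]
    en <- exposures$end[i]
    n <- length(era_end)
    if (n > 0 && era_subject[n] == s &&
        as.numeric(st - era_end[n]) - 1 <= gapEra) {
      era_end[n] <- max(era_end[n], en)  # extend the current era
    } else {
      era_subject <- c(era_subject, s)   # open a new era
      era_start <- c(era_start, st)
      era_end <- c(era_end, en)
    }
  }
  data.frame(subject_id = era_subject, start = era_start, end = era_end)
}

join_eras(exposures, gapEra = 0)
join_eras(exposures, gapEra = 30)
```

With gapEra = 0 the three exposures stay as three eras. With gapEra = 30 the first two (14 unexposed days apart) are joined into one era ending on 2020-03-15, while the third (77 days later) starts a new era.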
The priorUseWashout parameter specifies the number of prior days without exposure (often termed a ‘washout’) that are required. By default no washout is required; the attrition of the default dus_alleras cohort reports a washout of 0 days. In this example, STEP 4 of the dus_alleras_step4 cohort excludes no records, so it keeps the same 51 records as the dus_step3_alleras cohort. This is consistent with gapEra = 30 having already joined exposures that were close together.
cdm5 <- generateDrugUtilisationCohortSet(cdm,
  name = "dus_alleras_step4",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 30,
  priorUseWashout = 30
)
cohortAttrition(cdm5$dus_alleras_step4) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 7 × 4
#> number_records reason excluded_records excluded_subjects
#> <dbl> <chr> <dbl> <dbl>
#> 1 58 Qualifying initial records 0 0
#> 2 58 Duration imputation; affect… 0 0
#> 3 51 Join eras 7 0
#> 4 51 prior use wahout of 30 days 0 0
#> 5 51 at least 0 prior observation 0 0
#> 6 51 cohort_start_date >= NA 0 0
#> 7 51 cohort_end_date <= NA 0 0
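The washout step can be sketched as keeping an era only when the same subject had no earlier era ending within priorUseWashout days before it starts. apply_washout() and the toy eras are hypothetical, and counting the days strictly between the previous era’s end and the new start is an assumption.

```r
# Hypothetical toy eras: subject 1 has two eras 29 days apart, subject 2 has one
eras <- data.frame(
  subject_id = c(1L, 1L, 2L),
  start = as.Date(c("2020-01-01", "2020-03-01", "2020-05-01")),
  end = as.Date(c("2020-01-31", "2020-03-31", "2020-05-31"))
)

# Keep an era only if at least priorUseWashout unexposed days precede it
apply_washout <- function(eras, priorUseWashout = 0) {
  eras <- eras[order(eras$subject_id, eras$start), ]
  # end of the same subject's previous era; -Inf for a subject's first era
  prev_end <- ave(as.numeric(eras$end), eras$subject_id,
                  FUN = function(e) c(-Inf, head(e, -1)))
  gap <- as.numeric(eras$start) - prev_end - 1
  eras[gap >= priorUseWashout, ]
}

apply_washout(eras, 0)
apply_washout(eras, 30)
apply_washout(eras, Inf)
```

Subject 1’s second era starts 29 days after the first ends, so it is kept with priorUseWashout = 0 and dropped with 30. Setting priorUseWashout = Inf keeps only each subject’s earliest era, which is why Inf appears in the first-ever era settings at the end of this vignette.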
The parameter priorObservation defines the minimum number of days of prior observation required for a drug era to be taken into account. If set to NULL, drug eras are not required to fall within the observation_period. In this example, STEP 5 excludes 3 records (from 2 subjects), so the dus_alleras_step5 cohort has 48 records compared with 51 in the dus_alleras_step4 cohort.
cdm6 <- generateDrugUtilisationCohortSet(cdm,
  name = "dus_alleras_step5",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 30,
  priorUseWashout = 30,
  priorObservation = 30
)
cohortAttrition(cdm6$dus_alleras_step5) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 7 × 4
#> number_records reason excluded_records excluded_subjects
#> <dbl> <chr> <dbl> <dbl>
#> 1 58 Qualifying initial records 0 0
#> 2 58 Duration imputation; affect… 0 0
#> 3 51 Join eras 7 0
#> 4 51 prior use wahout of 30 days 0 0
#> 5 48 at least 30 prior observati… 3 2
#> 6 48 cohort_start_date >= NA 0 0
#> 7 48 cohort_end_date <= NA 0 0
The cohortDateRange parameter defines the allowed range for cohort_start_date and cohort_end_date. In the following example, STEP 6 and STEP 7 remove most records because of these constraints: 14 records are excluded for starting before the range and a further 30 for ending after it, leaving 4.
cdm7 <- generateDrugUtilisationCohortSet(cdm,
  name = "dus_alleras_step67",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 30,
  priorUseWashout = 30,
  priorObservation = 30,
  cohortDateRange = as.Date(c("2010-01-01", "2011-01-01")),
  limit = "All"
)
cohortAttrition(cdm7$dus_alleras_step67) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 7 × 4
#> number_records reason excluded_records excluded_subjects
#> <dbl> <chr> <dbl> <dbl>
#> 1 58 Qualifying initial records 0 0
#> 2 58 Duration imputation; affect… 0 0
#> 3 51 Join eras 7 0
#> 4 51 prior use wahout of 30 days 0 0
#> 5 48 at least 30 prior observati… 3 2
#> 6 34 cohort_start_date >= 2010-0… 14 14
#> 7 4 cohort_end_date <= 2011-01-… 30 28
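The two date-range steps act as successive filters, first on cohort_start_date and then on cohort_end_date, with NA meaning no bound (as the default attrition rows “cohort_start_date >= NA” suggest). A base R sketch using a hypothetical helper filter_date_range() and toy data:

```r
# Hypothetical toy cohort: three records for three subjects
cohort <- data.frame(
  subject_id = 1:3,
  cohort_start_date = as.Date(c("2009-12-01", "2010-03-01", "2010-10-01")),
  cohort_end_date = as.Date(c("2010-02-01", "2010-05-01", "2011-03-01"))
)

# Keep records starting on or after cohortDateRange[1] and ending on or before
# cohortDateRange[2]; an NA bound is treated as "no restriction"
filter_date_range <- function(cohort, cohortDateRange = as.Date(c(NA, NA))) {
  keep <- rep(TRUE, nrow(cohort))
  if (!is.na(cohortDateRange[1])) {
    keep <- keep & cohort$cohort_start_date >= cohortDateRange[1]
  }
  if (!is.na(cohortDateRange[2])) {
    keep <- keep & cohort$cohort_end_date <= cohortDateRange[2]
  }
  cohort[keep, ]
}

filter_date_range(cohort, as.Date(c("2010-01-01", "2011-01-01")))
```

The first toy record starts too early and the third ends too late, so only subject 2 remains; with the default as.Date(c(NA, NA)) all three are kept.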
Now change the limit parameter from All to First and compare the attrition of the dus_step8_firstera cohort with that of the dus_alleras_step67 cohort. With First, only the first record per subject that fulfills all the previous criteria is kept. In this example STEP 8 excludes no records, because none of the 4 remaining subjects has more than one qualifying era.
cdm8 <- generateDrugUtilisationCohortSet(cdm,
  name = "dus_step8_firstera",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 30,
  priorUseWashout = 30,
  priorObservation = 30,
  cohortDateRange = as.Date(c("2010-01-01", "2011-01-01")),
  limit = "First"
)
cohortAttrition(cdm8$dus_step8_firstera) %>% select(number_records, reason, excluded_records, excluded_subjects)
#> # A tibble: 8 × 4
#> number_records reason excluded_records excluded_subjects
#> <dbl> <chr> <dbl> <dbl>
#> 1 58 Qualifying initial records 0 0
#> 2 58 Duration imputation; affect… 0 0
#> 3 51 Join eras 7 0
#> 4 51 prior use wahout of 30 days 0 0
#> 5 48 at least 30 prior observati… 3 2
#> 6 34 cohort_start_date >= 2010-0… 14 14
#> 7 4 cohort_end_date <= 2011-01-… 30 28
#> 8 4 Limit to first era 0 0
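limit = "First" keeps one record per subject, the earliest remaining one. A base R sketch of that last step, using a hypothetical helper limit_first() and toy data:

```r
# Hypothetical toy cohort: subject 1 has two records, listed out of date order
cohort <- data.frame(
  subject_id = c(1L, 1L, 2L),
  cohort_start_date = as.Date(c("2010-05-01", "2010-02-01", "2010-03-01")),
  cohort_end_date = as.Date(c("2010-06-01", "2010-03-01", "2010-04-01"))
)

# Sort by subject and start date, then keep the first row of each subject
limit_first <- function(cohort) {
  cohort <- cohort[order(cohort$subject_id, cohort$cohort_start_date), ]
  cohort[!duplicated(cohort$subject_id), ]
}

limit_first(cohort)
```

Subject 1’s record starting on 2010-02-01 is kept even though it is listed second. When every subject already has a single record, as after STEP 7 above, nothing is removed.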
The parameter limit only allows All and First. First keeps the first era that meets the criteria set by the parameters applied before limit. If the goal is instead to get the first-ever era, this function can do that too; the following settings give the first-ever drug era:
cdm8 <- generateDrugUtilisationCohortSet(cdm,
  name = "dus_step8_firstever",
  conceptSet = conceptSet_ingredient,
  imputeDuration = "none",
  durationRange = c(0, Inf),
  gapEra = 0,
  priorUseWashout = Inf,
  priorObservation = 0,
  cohortDateRange = as.Date(c(NA, NA)),
  limit = "First"
)
DBI::dbDisconnect(con, shutdown = TRUE)
Constructing concept sets and generating cohorts are the initial steps in conducting a drug utilisation study. For guidance on obtaining further information from these cohorts, such as their characteristics, please refer to the other vignettes.