Example to estimate incubation period

Flavio Finger

2023-01-13

Description

This package contains two functions for computing the incubation period distribution from outbreak data. The inputs for each patient are given as a data.frame or linelist object and must contain the date of symptom onset and one or more possible dates of exposure.

The function empirical_incubation_dist() computes the discrete probability distribution of the incubation period, giving equal weight to each patient. Thus, with N patients, each of the n possible exposure dates of a given patient receives an overall weight of 1/(n*N). The function returns a data frame with two columns: incubation_period, listing the incubation periods in steps of one day, and relative_frequency, giving the relative frequency of each.
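The weighting described above can be reproduced by hand in base R. The following is a minimal sketch, not the package's implementation: the toy onset and exposure days, and the names onset, exposures and manual_dist, are made up for illustration.

```r
# Toy data (hypothetical): onset day and candidate exposure days per patient,
# expressed as integer day numbers for simplicity.
onset <- c(10, 12, 15)
exposures <- list(c(7, 8), c(9), c(10, 11, 12, 13))

N <- length(onset)

# Each of the n candidate incubation periods of a patient gets weight 1 / (n * N).
per_patient <- lapply(seq_len(N), function(i) {
  periods <- onset[i] - exposures[[i]]
  data.frame(
    incubation_period = periods,
    weight = rep(1 / (length(periods) * N), length(periods))
  )
})
weights <- do.call(rbind, per_patient)

# Sum the weights for each distinct incubation period.
manual_dist <- aggregate(weight ~ incubation_period, data = weights, FUN = sum)
names(manual_dist)[2] <- "relative_frequency"
print(manual_dist)
```

Because every patient contributes a total weight of 1/N, the relative frequencies sum to one regardless of how many exposure dates each patient has.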

The function fit_gamma_incubation_dist() takes the same inputs, but samples directly from the empirical distribution and fits a discrete gamma distribution to the sample by means of fit_disc_gamma().
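The idea of sampling from an empirical distribution and then fitting a discrete gamma can be sketched in base R. This is only an illustration under stated assumptions, not what fit_disc_gamma() does internally: the empirical distribution emp is hypothetical, the discretisation P(X = k) = F(k + 1) - F(k) is assumed, and the fit uses optim() rather than the package's own routine.

```r
set.seed(1)

# Hypothetical empirical distribution (not the data used in this example).
emp <- data.frame(
  incubation_period = 0:6,
  relative_frequency = c(0, 0.05, 0.10, 0.25, 0.30, 0.20, 0.10)
)

# Step 1: draw incubation periods, weighted by their relative frequency.
draws <- sample(emp$incubation_period, size = 1000, replace = TRUE,
                prob = emp$relative_frequency)

# Step 2: negative log-likelihood of a discretised gamma, where
# P(X = k) = F(k + 1) - F(k) (an assumed discretisation for this sketch).
nll <- function(par) {
  shape <- exp(par[1])  # optimise on the log scale to keep parameters positive
  scale <- exp(par[2])
  p <- pgamma(draws + 1, shape = shape, scale = scale) -
    pgamma(draws, shape = shape, scale = scale)
  -sum(log(pmax(p, .Machine$double.xmin)))
}

# Step 3: maximise the likelihood, starting from method-of-moments values.
m <- mean(draws)
v <- var(draws)
opt <- optim(log(c(m^2 / v, v / m)), nll)

sketch_fit <- list(
  shape = exp(opt$par[1]),
  scale = exp(opt$par[2]),
  converged = opt$convergence == 0
)
print(sketch_fit)
```

Sampling first and fitting afterwards means the fitted parameters vary slightly between runs unless the random seed is fixed.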

Example

Load the required packages:

library(magrittr)
library(epitrix)
library(distcrete)
library(ggplot2)

Make a linelist object containing toy data with several possible exposure dates for each case:

ll <- sim_linelist(15)

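# Candidate incubation periods (in days) and their probabilities
# under a discretised gamma distribution
x <- 0:15
y <- distcrete("gamma", 1, shape = 12, rate = 3, w = 0)$d(x)

# For one onset date, draw between 1 and 5 distinct possible exposure dates
mkexposures <- function(i) {
  i - sample(x, size = sample.int(5, size = 1), replace = FALSE, prob = y)
}

# Store the possible exposure dates of each case in a list column
exposures <- sapply(ll$date_of_onset, mkexposures)
ll$dates_exposure <- exposures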

print(ll)
#>    id date_of_onset date_of_report gender  outcome
#> 1   1    2020-01-23     2020-02-01   male recovery
#> 2   2    2020-02-14     2020-02-18   male    death
#> 3   3    2020-01-25     2020-01-29 female recovery
#> 4   4    2020-01-16     2020-01-30   male recovery
#> 5   5    2020-01-22     2020-01-28   male    death
#> 6   6    2020-01-26     2020-01-31   male recovery
#> 7   7    2020-02-09     2020-02-16 female recovery
#> 8   8    2020-02-17     2020-02-24 female recovery
#> 9   9    2020-01-14     2020-01-20   male recovery
#> 10 10    2020-02-22     2020-03-12   male recovery
#> 11 11    2020-02-26     2020-03-04   male recovery
#> 12 12    2020-01-06     2020-01-10   male recovery
#> 13 13    2020-02-23     2020-02-29 female recovery
#> 14 14    2020-01-08     2020-01-16 female recovery
#> 15 15    2020-01-21     2020-01-26   male recovery
#>                       dates_exposure
#> 1                       18281, 18280
#> 2                       18303, 18305
#> 3                              18282
#> 4  18274, 18273, 18275, 18272, 18271
#> 5                              18279
#> 6                              18281
#> 7                       18297, 18298
#> 8         18306, 18304, 18305, 18307
#> 9                       18270, 18272
#> 10 18308, 18311, 18310, 18313, 18312
#> 11 18315, 18316, 18314, 18317, 18313
#> 12        18264, 18263, 18265, 18262
#> 13        18313, 18312, 18310, 18309
#> 14               18264, 18265, 18266
#> 15        18279, 18277, 18280, 18278

Empirical distribution:

incubation_period_dist <- empirical_incubation_dist(ll, date_of_onset, dates_exposure)
print(incubation_period_dist)
#> # A tibble: 7 × 2
#>   incubation_period relative_frequency
#>               <dbl>              <dbl>
#> 1                 0              0    
#> 2                 1              0.06 
#> 3                 2              0.107
#> 4                 3              0.262
#> 5                 4              0.312
#> 6                 5              0.149
#> 7                 6              0.11

ggplot(incubation_period_dist, aes(incubation_period, relative_frequency)) +
  geom_col()

Fit discrete gamma:

fit <- fit_gamma_incubation_dist(ll, date_of_onset, dates_exposure)
print(fit)
#> $mu
#> [1] 4.229868
#> 
#> $cv
#> [1] 0.32265
#> 
#> $sd
#> [1] 1.364767
#> 
#> $ll
#> [1] -1729.577
#> 
#> $converged
#> [1] TRUE
#> 
#> $distribution
#> A discrete distribution
#>   name: gamma
#>   parameters:
#>     shape: 9.60586714704713
#>     scale: 0.440342153837883

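# Evaluate the fitted discrete gamma at 0 to 10 days
x <- 0:10
y <- fit$distribution$d(x)

# Overlay the fitted distribution (red) on the empirical distribution (bars)
ggplot(data.frame(x = x, y = y), aes(x, y)) +
  geom_col(data = incubation_period_dist, aes(incubation_period, relative_frequency)) +
  geom_point(stat = "identity", col = "red", size = 3) +
  geom_line(stat = "identity", col = "red")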

Note that if the possible exposure dates are consecutive for all patients, then empirical_incubation_dist() and fit_gamma_incubation_dist() can take date ranges as inputs instead of lists of individual exposure dates (see the help pages for details).
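To see why a date range carries the same information as a list of consecutive exposure dates, the sketch below expands hypothetical exposure windows into such lists using base R only. The data frame windows and its column names exposure_start and exposure_end are assumptions made for this illustration; the actual arguments for passing ranges to the package functions are described in their help pages.

```r
# Hypothetical exposure windows: first and last possible exposure date per patient
windows <- data.frame(
  date_of_onset  = as.Date(c("2020-01-23", "2020-02-14")),
  exposure_start = as.Date(c("2020-01-18", "2020-02-10")),
  exposure_end   = as.Date(c("2020-01-20", "2020-02-11"))
)

# Expand each window into the full set of consecutive candidate dates
windows$dates_exposure <- lapply(seq_len(nrow(windows)), function(i) {
  seq(windows$exposure_start[i], windows$exposure_end[i], by = "day")
})

# Number of candidate exposure dates per patient
print(lengths(windows$dates_exposure))
```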