custom-imprinting-types

Generate probabilities of imprinting to custom subsets of influenza A viruses

By default, get_imprinting_probabilities() calculates subtype-specific probabilities of imprinting to influenza A H1N1, H2N2, or H3N2. Researchers may want to calculate other kinds of imprinting probabilities, for example imprinting to specific influenza isolates, clades, or glycosylation states, or to a multivalent vaccine.

To calculate imprinting to custom groups of influenza A viruses, use the annual_frequencies option in get_imprinting_probabilities().

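The annual_frequencies input must be a named list whose names match the countries input. Each element of the list must be a data frame or tibble. In the example that follows, each table has a year column plus one column of circulation frequencies per imprinting category, and the frequencies in each row sum to 1.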
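
Before passing a custom list to get_imprinting_probabilities(), it can help to check its shape. The following is a minimal sketch in base R, not part of the imprinting package: the helper name check_annual_frequencies() is hypothetical, and the rules it enforces (a year column, every other column treated as an imprinting category, rows summing to 1) are assumptions drawn from the example in this vignette.

```r
# Hypothetical helper (not part of the imprinting package).
# Assumes each table has a `year` column and that all other columns
# are imprinting categories whose frequencies sum to 1 in every row.
check_annual_frequencies <- function(annual_frequencies, countries, tol = 1e-8) {
  # The list must be named, and the names must match the requested countries
  stopifnot(is.list(annual_frequencies),
            setequal(names(annual_frequencies), countries))
  for (country in countries) {
    freq <- annual_frequencies[[country]]
    stopifnot(is.data.frame(freq), "year" %in% names(freq))
    # Sum the category columns (everything except `year`) within each row
    category_cols <- setdiff(names(freq), "year")
    category_sums <- rowSums(freq[, category_cols, drop = FALSE])
    if (any(abs(category_sums - 1) > tol)) {
      stop("Frequencies do not sum to 1 in every year for ", country)
    }
  }
  invisible(TRUE)
}

# Toy input with made-up numbers
toy_list <- list("Country A" = data.frame(year   = 2000:2002,
                                          type_1 = c(1, 0.4, 0.25),
                                          type_2 = c(0, 0.6, 0.75)))
check_annual_frequencies(toy_list, countries = "Country A")
```

The check stops with an error at the first country whose table fails, which makes a malformed input easier to locate than an error raised later inside the probability calculation.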

As an example, let’s imagine we want to calculate subtype-specific imprinting probabilities, with some probability of imprinting by vaccination, in the United States and Germany. Note that the pediatric influenza vaccination rates used in this example are PURELY HYPOTHETICAL, and not based on data or actual vaccine policies in these countries.

Let’s start by making a data frame of circulation frequencies for the United States.

library(imprinting)
library(dplyr) # provides %>%, select(), and mutate()
## Start with subtype-specific fractions for H1N1, H2N2, H3N2
US_frequencies <- get_country_cocirculation_data(country = 'United States', max_year = 2022) %>%
  select(1:4)
head(US_frequencies)
#> # A tibble: 6 × 4
#>    year `A/H1N1` `A/H2N2` `A/H3N2`
#>   <dbl>    <dbl>    <dbl>    <dbl>
#> 1  1918        1        0        0
#> 2  1919        1        0        0
#> 3  1920        1        0        0
#> 4  1921        1        0        0
#> 5  1922        1        0        0
#> 6  1923        1        0        0

Now, add in a vaccination column. Not all countries vaccinate healthy infants against influenza, and infant influenza vaccination has only been widely practiced for the past few decades, even in countries where coverage is now high. Hypothetically, let’s assume that 50% of US infants were vaccinated against influenza starting in 1995, increasing steadily to 75% coverage in 2020. (Again, this is purely hypothetical, and not based on data.)

## Add a vaccination column: 0 for 1918-1994, a linear ramp for 1995-2020, then constant
US_frequencies <- US_frequencies %>%
  mutate(vaccination = c(rep(0, 77), seq(.5, .75, length.out = 26), .75, .75),
         ## Assume only non-vaccinated children have primary infections: multiply the
         ## subtype-specific circulation fractions by one minus the year's vaccination probability.
         `A/H1N1` = `A/H1N1`*(1-vaccination),
         `A/H2N2` = `A/H2N2`*(1-vaccination),
         `A/H3N2` = `A/H3N2`*(1-vaccination))
tail(US_frequencies, n = 30)
#> # A tibble: 30 × 5
#>     year `A/H1N1` `A/H2N2` `A/H3N2` vaccination
#>    <dbl>    <dbl>    <dbl>    <dbl>       <dbl>
#>  1  1993  0.105          0    0.895        0   
#>  2  1994  0.00522        0    0.995        0   
#>  3  1995  0.0108         0    0.489        0.5 
#>  4  1996  0.288          0    0.202        0.51
#>  5  1997  0              0    0.48         0.52
#>  6  1998  0.00444        0    0.466        0.53
#>  7  1999  0.00372        0    0.456        0.54
#>  8  2000  0.130          0    0.320        0.55
#>  9  2001  0.335          0    0.105        0.56
#> 10  2002  0.0155         0    0.415        0.57
#> # … with 20 more rows
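
The vaccination vector above is built by position (77 zeros, then 26 ramp values), so it relies on the table having exactly one row per year starting in 1918. As a sketch of an alternative, the ramp can be keyed on the year itself using approx() from base R. The helper name coverage_ramp() and its arguments are hypothetical and not part of the imprinting package; the 0.5 and 0.75 endpoints simply mirror the values in the code above.

```r
# Hypothetical helper: hypothetical vaccination coverage as a function of year.
# Coverage is 0 before start_year, rises linearly to end_coverage at end_year,
# and stays at end_coverage afterwards.
coverage_ramp <- function(years, start_year, end_year, start_coverage, end_coverage) {
  approx(x = c(start_year, end_year),
         y = c(start_coverage, end_coverage),
         xout = years,
         yleft = 0,              # value for years before start_year
         yright = end_coverage   # value for years after end_year
  )$y
}

years <- 1918:2022
vaccination_by_year <- coverage_ramp(years,
                                     start_year = 1995, end_year = 2020,
                                     start_coverage = 0.5, end_coverage = 0.75)

# Agrees with the position-based vector when every year from 1918 is present
all.equal(vaccination_by_year,
          c(rep(0, 77), seq(.5, .75, length.out = 26), .75, .75))
```

Because the result depends only on the values in years, it could be computed from the table's own year column, which keeps the coverage aligned with the correct rows even if a year is missing or duplicated.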

Assume Germany adopted its infant vaccination policy 10 years later, in 2005. Generate a Germany-specific table of frequencies:

Germany_frequencies <- get_country_cocirculation_data(country = 'Germany',
                                                      max_year = 2022) %>%
  select(1:4) %>%
  mutate(vaccination = c(rep(0, 87), seq(.05, .75, length.out = 16), .75, .75),
         ## Assume only non-vaccinated children have primary infections: multiply the
         ## subtype-specific circulation fractions by one minus the year's vaccination probability.
         `A/H1N1` = `A/H1N1`*(1-vaccination),
         `A/H2N2` = `A/H2N2`*(1-vaccination),
         `A/H3N2` = `A/H3N2`*(1-vaccination))
tail(Germany_frequencies, 20)
#> # A tibble: 20 × 5
#>     year `A/H1N1` `A/H2N2` `A/H3N2` vaccination
#>    <dbl>    <dbl>    <dbl>    <dbl>       <dbl>
#>  1  2002 0               0   1           0     
#>  2  2003 0.000727        0   0.999       0     
#>  3  2004 0               0   0.95        0.05  
#>  4  2005 0.190           0   0.713       0.0967
#>  5  2006 0.145           0   0.712       0.143 
#>  6  2007 0.0820          0   0.728       0.19  
#>  7  2008 0.0537          0   0.710       0.237 
#>  8  2009 0.447           0   0.269       0.283 
#>  9  2010 0.644           0   0.0264      0.33  
#> 10  2011 0.612           0   0.0111      0.377 
#> 11  2012 0.0352          0   0.541       0.423 
#> 12  2013 0.280           0   0.250       0.47  
#> 13  2014 0.149           0   0.335       0.517 
#> 14  2015 0.0977          0   0.339       0.563 
#> 15  2016 0.296           0   0.0945      0.61  
#> 16  2017 0.0128          0   0.331       0.657 
#> 17  2018 0.255           0   0.0415      0.703 
#> 18  2019 0.126           0   0.124       0.75  
#> 19  2020 0.121           0   0.129       0.75  
#> 20  2022 0.0110          0   0.239       0.75
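
Both country tables apply the same rescaling: each subtype fraction is multiplied by one minus the vaccination probability, so if the subtype fractions sum to 1, the rescaled fractions sum to one minus the vaccination probability, and adding the vaccination column brings each row back to 1. The sketch below shows this in base R with made-up toy numbers. The helper name add_vaccination() is hypothetical and not part of the imprinting package, and it assumes every column other than year is a subtype fraction.

```r
# Hypothetical helper: rescale subtype fractions and append a vaccination column.
# Assumes `freq` has a `year` column and that all other columns are subtype
# fractions that sum to 1 within each row.
add_vaccination <- function(freq, coverage) {
  stopifnot(length(coverage) == nrow(freq),
            all(coverage >= 0 & coverage <= 1))
  subtype_cols <- setdiff(names(freq), "year")
  for (col in subtype_cols) {
    # Only non-vaccinated children are assumed to have a primary infection
    freq[[col]] <- freq[[col]] * (1 - coverage)
  }
  freq$vaccination <- coverage
  freq
}

# Toy table with made-up fractions and made-up coverage
toy <- data.frame(year     = 2000:2002,
                  `A/H1N1` = c(0.2, 0.6, 0),
                  `A/H3N2` = c(0.8, 0.4, 1),
                  check.names = FALSE)
toy_vax <- add_vaccination(toy, coverage = c(0, 0.25, 0.5))

# Each row of the non-year columns should still sum to 1
rowSums(toy_vax[, -1])
```

Wrapping the rescaling in one function avoids repeating the three multiplication lines for every country, and the input checks catch a coverage vector of the wrong length before it is recycled silently.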

Input the custom frequencies into get_imprinting_probabilities()

## Check that all frequencies sum to 1
rowSums(US_frequencies[,2:5])
#>   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>  [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>  [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
rowSums(Germany_frequencies[,2:5])
#>   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>  [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>  [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

## Wrap the country-specific frequencies into a named list
input_list <- list("United States" = US_frequencies,
                   "Germany" = Germany_frequencies)

## Calculate probabilities
get_imprinting_probabilities(observation_years = 2022, 
                             countries = c("United States", "Germany"), 
                             annual_frequencies = input_list, 
                             df_format = "wide")
#> # A tibble: 210 × 8
#>     year country       birth_year `A/H1N1` `A/H2N2` `A/H3N2` vaccination naive
#>    <dbl> <chr>              <dbl>    <dbl>    <dbl>    <dbl>       <dbl> <dbl>
#>  1  2022 United States       1918        1        0        0           0     0
#>  2  2022 United States       1919        1        0        0           0     0
#>  3  2022 United States       1920        1        0        0           0     0
#>  4  2022 United States       1921        1        0        0           0     0
#>  5  2022 United States       1922        1        0        0           0     0
#>  6  2022 United States       1923        1        0        0           0     0
#>  7  2022 United States       1924        1        0        0           0     0
#>  8  2022 United States       1925        1        0        0           0     0
#>  9  2022 United States       1926        1        0        0           0     0
#> 10  2022 United States       1927        1        0        0           0     0
#> # … with 200 more rows