Introduction to the R package covid19br

Introduction

This vignette shows how to use the R package covid19br to download and explore data on the COVID-19 pandemic in Brazil and around the world. The package downloads datasets from two repositories: the official Brazilian repository (https://covid.saude.gov.br) and the Johns Hopkins University repository (https://github.com/CSSEGISandData/COVID-19).

The Johns Hopkins University repository provides data on the pandemic at the global level (daily counts of confirmed cases, deaths, and recovered patients by country and territory) and has been widely used as a reliable source of data on the COVID-19 pandemic. The official Brazilian repository, in turn, provides data on Brazil at the city, state, region, and national levels.

We hope that this package may be helpful to other researchers and scientists to understand and fight this terrible pandemic that has been plaguing the world.

Getting started with R package covid19br

We start by loading the package and downloading the national-level COVID-19 data set from the official Brazilian repository (https://covid.saude.gov.br):

library(covid19br)
library(tidyverse)

# downloading the data (at national level):
brazil <- downloadCovid19("brazil")

# looking at the downloaded data:
glimpse(brazil)
#> Rows: 600
#> Columns: 9
#> $ date         <date> 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28, 2020-02-…
#> $ epi_week     <int> 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11…
#> $ newCases     <int> 0, 1, 0, 0, 1, 0, 0, 0, 1, 4, 6, 6, 6, 0, 9, 18, 25, 21, …
#> $ accumCases   <int> 0, 1, 1, 1, 2, 2, 2, 2, 3, 7, 13, 19, 25, 25, 34, 52, 77,…
#> $ newDeaths    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ accumDeaths  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ newRecovered <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ newFollowup  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ pop          <dbl> 210147125, 210147125, 210147125, 210147125, 210147125, 21…

# plotting the cumulative number of deaths:
ggplot(brazil, aes(x = date, y = accumDeaths)) +
  geom_point() +
  geom_path()
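
The columns newCases and accumCases (and likewise newDeaths and accumDeaths) appear to be related by a running sum. The minimal sketch below checks that relationship on the first 17 values shown in the glimpse() output above, rather than on a fresh download. The vectors new_cases and accum_cases are illustrative names, not objects created by the package.

```r
# first 17 values of newCases and accumCases, copied from the output above
new_cases   <- c(0, 1, 0, 0, 1, 0, 0, 0, 1, 4, 6, 6, 6, 0, 9, 18, 25)
accum_cases <- c(0, 1, 1, 1, 2, 2, 2, 2, 3, 7, 13, 19, 25, 25, 34, 52, 77)

# the cumulative series is the running sum of the daily counts
all(cumsum(new_cases) == accum_cases)

# conversely, the daily counts are the first differences of the cumulative series
recovered_new <- diff(c(0, accum_cases))
all(recovered_new == new_cases)
```

Whether the same identity holds for every row of the full data set is an assumption; it may not hold if the repository revises past counts.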

Next, we will show how to draw a plot of the daily count of new deaths along with its 7-day moving average. Here, we use the function pracma::movavg() to compute the moving average.

library(pracma)

# computing the moving average:
brazil <- brazil %>%
  mutate(
    ma_newDeaths = movavg(newDeaths, n = 7, type = "s")
  )

# looking at the transformed data:
glimpse(brazil)
#> Rows: 600
#> Columns: 10
#> $ date         <date> 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28, 2020-02-…
#> $ epi_week     <int> 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11…
#> $ newCases     <int> 0, 1, 0, 0, 1, 0, 0, 0, 1, 4, 6, 6, 6, 0, 9, 18, 25, 21, …
#> $ accumCases   <int> 0, 1, 1, 1, 2, 2, 2, 2, 3, 7, 13, 19, 25, 25, 34, 52, 77,…
#> $ newDeaths    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ accumDeaths  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ newRecovered <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ newFollowup  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ pop          <dbl> 210147125, 210147125, 210147125, 210147125, 210147125, 21…
#> $ ma_newDeaths <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.…
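
To make explicit what a 7-day simple moving average computes, here is a base R sketch. The helper trailing_mean() is hypothetical (it is not part of pracma or covid19br); it averages each value with up to six preceding values, using a shorter window at the start of the series. It is intended to mirror movavg(..., type = "s"), but that equivalence is an assumption that should be checked against the pracma documentation.

```r
# hypothetical helper: trailing simple moving average over a window of n values
trailing_mean <- function(x, n = 7) {
  vapply(seq_along(x), function(k) {
    start <- max(1, k - n + 1)  # shorter window at the start of the series
    mean(x[start:k])
  }, numeric(1))
}

# toy series of daily counts (made-up values)
daily <- c(1, 2, 3, 4, 5, 6, 7, 8)
trailing_mean(daily, n = 7)
#> [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 5.0
```

The last value, 5, is the mean of the seven most recent counts (2 through 8); the first count has already dropped out of the window.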

After computing the moving average, it is convenient to reorganize the data into the so-called tidy (long) format, with one row per date and series type. This can be done easily with the function tidyr::pivot_longer():

deaths <- brazil %>%
  select(date, newDeaths, ma_newDeaths) %>%
  pivot_longer(
    cols = c("newDeaths", "ma_newDeaths"),
    values_to = "deaths", names_to = "type"
  ) %>%
  mutate(
    type = recode(type,
      ma_newDeaths = "moving average",
      newDeaths = "count"
    )
  )

# looking at the (tidy) data:
glimpse(deaths)
#> Rows: 1,200
#> Columns: 3
#> $ date   <date> 2020-02-25, 2020-02-25, 2020-02-26, 2020-02-26, 2020-02-27, 20…
#> $ type   <chr> "count", "moving average", "count", "moving average", "count", …
#> $ deaths <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …

# drawing the desired plot:
ggplot(deaths, aes(x = date, y = deaths, color = type)) +
  geom_point() +
  geom_path() +
  theme(legend.position = "bottom")

When dealing with epidemiological data we are often interested in computing quantities such as incidence, mortality and lethality rates. The function covid19br::add_epi_rates() can be used to add those rates to the downloaded data, as shown below:


# downloading the data (region level):
regions <- downloadCovid19("regions") 

# adding the rates to the downloaded data:
regions <- regions %>%
  add_epi_rates()

# looking at the data:
glimpse(regions)
#> Rows: 3,000
#> Columns: 13
#> $ region       <chr> "Midwest", "Midwest", "Midwest", "Midwest", "Midwest", "M…
#> $ date         <date> 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28, 2020-02-…
#> $ epi_week     <int> 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11…
#> $ newCases     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 3, 4, …
#> $ accumCases   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 5, 9, …
#> $ newDeaths    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ accumDeaths  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ newRecovered <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ newFollowup  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ pop          <dbl> 16297074, 16297074, 16297074, 16297074, 16297074, 1629707…
#> $ incidence    <dbl> 0.000000000, 0.000000000, 0.000000000, 0.000000000, 0.000…
#> $ lethality    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ mortality    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
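
The values reported in this vignette suggest how the three rates are defined. The sketch below reproduces them with dplyr on a toy data frame, assuming that incidence and mortality are expressed per 100,000 inhabitants and that lethality is the percentage of confirmed cases that resulted in death. These definitions are inferred from the reported values, not taken from the package source, and the objects toy and rates are purely illustrative.

```r
library(dplyr)

# toy data: one location with made-up counts and population
toy <- tibble(
  accumCases  = 50000,
  accumDeaths = 1000,
  pop         = 1e6
)

rates <- toy %>%
  mutate(
    incidence = 1e5 * accumCases / pop,         # cases per 100,000 inhabitants
    mortality = 1e5 * accumDeaths / pop,        # deaths per 100,000 inhabitants
    lethality = 100 * accumDeaths / accumCases  # deaths as a percentage of cases
  )

rates
```

As a check against the capitals table at the end of this vignette, Rio de Janeiro's 34647 deaths out of 489805 cases give 100 * 34647 / 489805 ≈ 7.07, which matches the lethality reported there.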

The function plotly::ggplotly() can be used to draw an interactive plot as follows:

library(plotly)

p <- ggplot(regions, aes(x = date, y = mortality, color = region)) +
  geom_point() +
  geom_path()

ggplotly(p)

In our last example, we will build a table summarizing the pandemic in the 27 Brazilian capitals on the most recent date available in the data (2021-10-16 at the time of writing).

library(kableExtra)

cities <- downloadCovid19("cities")

capitals <- cities %>%
  filter(capital == TRUE, date == max(date)) %>%
  add_epi_rates() %>%
  select(region, state, city, newCases, newDeaths, accumCases, accumDeaths, incidence, mortality, lethality) %>%
  arrange(desc(lethality), desc(mortality), desc(incidence))

# printing the table:
capitals %>%
  kable(
    caption = "Summary of the COVID-19 pandemic in the 27 capitals of Brazilian states."
  ) %>%
  # full_width is an argument of kable_styling(), not of kable()
  kable_styling(full_width = FALSE)

Summary of the COVID-19 pandemic in the 27 capitals of Brazilian states.

| region | state | city | newCases | newDeaths | accumCases | accumDeaths | incidence | mortality | lethality |
|:----------|:--|:---------------|----:|---:|-------:|------:|----------:|---------:|-----:|
| Southeast | RJ | Rio de Janeiro | 634 | 24 | 489805 | 34647 | 7289.955 | 515.6645 | 7.07 |
| Northeast | MA | São Luís | 19 | 0 | 46968 | 2574 | 4262.518 | 233.5999 | 5.48 |
| North | PA | Belém | 25 | 1 | 106497 | 5139 | 7134.306 | 344.2651 | 4.83 |
| North | AM | Manaus | 8 | 0 | 204520 | 9477 | 9369.776 | 434.1745 | 4.63 |
| Southeast | SP | São Paulo | 354 | 25 | 967101 | 38578 | 7893.399 | 314.8705 | 3.99 |
| Northeast | CE | Fortaleza | 29 | 17 | 257905 | 9757 | 9661.744 | 365.5208 | 3.78 |
| South | PR | Curitiba | 24 | 8 | 218337 | 7669 | 11294.627 | 396.7193 | 3.51 |
| Northeast | PE | Recife | 127 | 3 | 157030 | 5493 | 9541.680 | 333.7735 | 3.50 |
| Northeast | BA | Salvador | 31 | 5 | 236632 | 8032 | 8238.280 | 279.6320 | 3.39 |
| Midwest | GO | Goiânia | 57 | 3 | 201827 | 6718 | 13312.134 | 443.1068 | 3.33 |
| South | RS | Porto Alegre | -59 | 4 | 170888 | 5660 | 11517.141 | 381.4605 | 3.31 |
| Midwest | MT | Cuiabá | 0 | 0 | 112614 | 3519 | 18384.548 | 574.4865 | 3.12 |
| Northeast | AL | Maceió | 18 | 2 | 91357 | 2733 | 8965.816 | 268.2178 | 2.99 |
| Midwest | MS | Campo Grande | 7 | 3 | 138651 | 4095 | 15474.753 | 457.0404 | 2.95 |
| North | RO | Porto Velho | 36 | 0 | 86965 | 2512 | 16422.620 | 474.3704 | 2.89 |
| North | AC | Rio Branco | 3 | 0 | 38165 | 1088 | 9369.806 | 267.1125 | 2.85 |
| Northeast | PB | João Pessoa | 41 | 3 | 106629 | 2925 | 13180.102 | 361.5508 | 2.74 |
| Northeast | RN | Natal | 80 | 1 | 100776 | 2701 | 11398.427 | 305.5008 | 2.68 |
| North | AP | Macapá | 5 | 0 | 60841 | 1493 | 12087.768 | 296.6262 | 2.45 |
| Northeast | PI | Teresina | 37 | 2 | 106807 | 2565 | 12349.843 | 296.5849 | 2.40 |
| Southeast | MG | Belo Horizonte | 177 | 9 | 286234 | 6804 | 11394.348 | 270.8523 | 2.38 |
| Midwest | DF | Brasília | 616 | 11 | 510159 | 10692 | 16919.193 | 354.5953 | 2.10 |
| Southeast | ES | Vitória | 64 | 1 | 65337 | 1278 | 18044.060 | 352.9441 | 1.96 |
| Northeast | SE | Aracaju | 18 | 0 | 127997 | 2421 | 19481.654 | 368.4859 | 1.89 |
| North | RR | Boa Vista | 5 | 0 | 96858 | 1533 | 24262.236 | 384.0055 | 1.58 |
| South | SC | Florianópolis | 52 | 0 | 82456 | 1075 | 16459.170 | 214.5824 | 1.30 |
| North | TO | Palmas | 77 | 0 | 52775 | 640 | 17643.008 | 213.9559 | 1.21 |