Getting Data from DESTATIS via R

The R package wiesbaden provides functions to directly retrieve data from databases maintained by the Federal Statistical Office of Germany (DESTATIS) in Wiesbaden.

Access to the following databases is implemented:

To access any of the databases using this package, you need to register on the respective website to get a personal login name and password. The registration is free.

To authenticate, supply a vector with your user name, password, and database shortcut (“regio”, “de”, “nrw”, “bm”) as an argument for the genesis parameter whenever you call a retrieve_* function:

c(user="your-username", password="your-password", db="database-shortname")

Alternatively, you can use save_credentials() to store the credentials on your computer. This function relies on the keyring package. For more details about how credentials are stored by this package, see the keyring package documentation.

Use the function test_login() to check if your login/password combination allows you to access the respective database (and if the server is functioning properly).

library(wiesbaden)

# Assuming credentials are stored via save_credentials()
test_login(genesis=c(db='regio'))
#> [1] "Sie wurden erfolgreich an- und abgemeldet."

# ... or supply password/username 
test_login(genesis=c(db='regio', user="your-username", password="your-password"))
#> [1] "Sie wurden erfolgreich an- und abgemeldet."

The available data are organized by themes (“Themen”) and subthemes. To get a list of all available themes go to the respective database website (links above) and click on “Themen”. Each theme typically comes with multiple subthemes.

Suppose we want to download the federal election results on the county level from regionalstatistik.de. This data is available in the theme “Wahlen” which has the code 14. The federal election results are available in subtheme 141.

Using retrieve_datalist(), download a data.frame of all available data cubes in theme 141:

d <- retrieve_datalist(tableseries="141*", genesis=c(db='regio')) 

Note, we are assuming that credentials are stored via save_credentials().

Use grepl (or str_detect() from the stringr package) to filter cubes with a description that contains the word “Kreise” (county):

subset(d, grepl("Kreise", description)) 
#>    tablename
#> 1 14111KJ001
#> 2 14111KJ002
#>                                                                                      description
#> 1 Wahlberechtigte, Wahlbeteiligung, Gültige Zweitstimmen, Kreise und kreisfreie Städte, Stichtag
#> 2                         Gültige Zweitstimmen, Kreise und kreisfreie Städte, Parteien, Stichtag

Having identified the correct data cube, call retrieve_data() to download the data:

data <- retrieve_data(tablename="14111KJ002", genesis=c(db='regio')) 
head(data)
#>   id14111 KREISE     PART04       STAG WAHL09_val WAHL09_qual WAHL09_lock
#> 1       D  01001        AFD 22.09.2013       1855           e          NA
#> 2       D  01001        AFD 24.09.2017       3702           e          NA
#> 3       D  01001 B90-GRUENE 16.10.1994       4651           e          NA
#> 4       D  01001 B90-GRUENE 27.09.1998       3815           e          NA
#> 5       D  01001 B90-GRUENE 22.09.2002       5556           e          NA
#> 6       D  01001 B90-GRUENE 18.09.2005       5028           e          NA
#>   WAHL09_err
#> 1          0
#> 2          0
#> 3          0
#> 4          0
#> 5          0
#> 6          0

The data are organized in long format: For each combination of KREIS (county), PART04 (political party) and STAG (election date) there is a vote count (WAHL09_value). Please see help file for the information on the additional variables (*_qual, *_lock, *_err).

To get the metadata for each variable, call retrieve_metadata():

retrieve_metadata(tablename="14111KJ002", genesis=c(db='regio'))
#>     name                  description   unit
#> 1 WAHL09         Gültige Zweitstimmen Anzahl
#> 2   STAG                     Stichtag       
#> 3 PART04                     Parteien       
#> 4 KREISE Kreise und kreisfreie Städte

To get the value labels for the variable PART04, call retrieve_valuelabel():

retrieve_valuelabel("PART04", genesis=c(db='regio'))
#>       PART04       description
#> 1        AFD               AfD
#> 2 B90-GRUENE             GRÜNE
#> 3        CDU           CDU/CSU
#> 4   DIELINKE         DIE LINKE
#> 5        FDP               FDP
#> 6   SONSTIGE Sonstige Parteien
#> 7        SPD               SPD

This function also works with the other variables (e.g., KREIS).