Introduction to rzeit2

Jan Dix

2018-07-11

This is an introductory vignette to the rzeit2 package. The package connects to the ZEIT Online Content API and gives access to articles and corresponding metadata from the DIE ZEIT archive, dating back to 1946, and from ZEIT Online. DIE ZEIT is the most widely read German weekly newspaper. The API is free for non-commercial use cases; for the terms of use, see http://developer.zeit.de/licence (German only at the moment).

In short, the package allows you to:

- search the DIE ZEIT archive and ZEIT Online for articles matching a query,
- retrieve the matching articles together with their metadata, and
- monitor your account and daily API limits.

The package is publicly available on GitHub. In this vignette, I demonstrate the basic features of the package. In a second vignette, I will dig deeper into the matter and show how the package can be used to download articles and calculate sentiment scores for them.

Setup

Currently, the package is only available on GitHub. Using the devtools package, you can easily install it:

devtools::install_github("jandix/rzeit2")
library(rzeit2)

To be able to work with the API, we have to fetch an API key first. There is no sophisticated authentication process involved; just go to the developer page and sign up by providing your name and a valid email address.

With set_api_key(), the package provides a convenient function that stores the key in the R environment. You only have to do this once; the next time R is launched, the key is automatically available and fetched by the package’s functions:

set_api_key(api_key = "set_your_api_key_here",
            path = tempdir())
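
As a quick check, you can read the key back from the environment. Note that the name of the environment variable used below, ZEIT_ONLINE_KEY, is an assumption and may differ in your setup:

# Read the stored key back from the environment; the variable name
# ZEIT_ONLINE_KEY is an assumption and may differ
Sys.getenv("ZEIT_ONLINE_KEY")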

Next, we can start tapping the API. get_content() is the core function of the package. Again, because the API key is stored in the environment, we do not have to pass it explicitly (but we still could, using the api argument). As an example, we collect articles that include “Merkel” in the article body, headline, or byline:

articles_merkel <- get_content("Merkel",
                               limit = 100,
                               begin_date = "20180101",
                               end_date = "20180520")

Note that, for ease of exposition, I limited the number of collected results to 100 here using the limit argument; the maximum limit per call is 1000. Further, I restricted the search to articles published between January 1 and May 20, 2018, a period of about five months.

The returned object is of class list and provides the articles found as well as the total number of hits for the given period.
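
A quick way to inspect the result is to look at its top-level structure. The element name used below, found, mirrors the Content API’s JSON response and is an assumption here:

# Show the top-level structure of the returned list
str(articles_merkel, max.level = 1)

# Total number of hits; the element name `found` mirrors the
# API's JSON response and is an assumption
articles_merkel$found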

If the number of hits exceeds the limit, you should use get_content_all(). The function uses get_content() under the hood, so the query works the same way. The timeout argument defines how long the function waits between API calls:

articles_merkel <- get_content_all("Merkel",
                                   timeout = 1,
                                   begin_date = "20150101",
                                   end_date = "20180520")
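
Because get_content_all() pages through the hits in chunks, large queries translate into many API calls, and the timeout adds up. A rough estimate, assuming the maximum of 1000 results per call and a hypothetical total of 5000 hits:

# Hypothetical example: 5000 hits, fetched in chunks of up to
# 1000 results per call
n_calls <- ceiling(5000 / 1000)  # 5 API calls
(n_calls - 1) * 1                # ~4 seconds of waiting with timeout = 1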

Lastly, you can check your API limit using get_client(). The function returns information about your account and your daily limits.

get_client()