Open access copies of scholarly publications are sometimes hard to find. Some are published in open access journals. Others are made freely available as preprints before publication, and still others are deposited in institutional repositories, digital archives maintained by universities and research institutions. This document introduces roadoi, an R client that makes it easy to search for these open access copies by interfacing with the oaDOI.org service, where DOIs are matched with full-text links in open access journals and archives.

About oaDOI.org

oaDOI.org, developed and maintained by the Impactstory team, is a non-profit service that finds open access copies of scholarly literature simply by looking up a DOI (Digital Object Identifier). It returns not only open access full-text links, but also helpful metadata about the open access status of a publication, such as licensing or provenance information.

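oaDOI draws on several data sources to find open access full texts. The evidence values shown in the reporting section of this document point to some of them, including the Directory of Open Access Journals (DOAJ), Crossref license metadata and BASE.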

Basic usage

The package provides one main function for querying oaDOI.org, oadoi_fetch(). It takes a character vector of DOIs and returns a tibble.

library(roadoi)
roadoi::oadoi_fetch(dois = c("10.1186/s12864-016-2566-9", "10.1016/j.cognition.2014.07.007"))
## # A tibble: 2 × 16
##                                       `_best_open_url` `_closed_base_ids`
##                                                  <chr>             <list>
## 1             http://doi.org/10.1186/s12864-016-2566-9         <list [0]>
## 2 http://hdl.handle.net/11858/00-001M-0000-0024-2A9E-8          <chr [1]>
## # ... with 14 more variables: `_closed_urls` <list>,
## #   `_open_base_ids` <list>, `_open_urls` <list>, doi <chr>,
## #   doi_resolver <chr>, evidence <chr>, free_fulltext_url <chr>,
## #   is_boai_license <lgl>, is_free_to_read <lgl>,
## #   is_subscription_journal <lgl>, license <chr>, oa_color <chr>,
## #   url <chr>, year <int>
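
The result is a regular tibble, so the usual dplyr verbs apply. The sketch below does not call the API. It rebuilds two of the columns printed above by hand, assuming that rows come back in the order in which the DOIs were supplied, and flags which DOIs have an open copy. The names best_open_url and has_open_copy are made up for this illustration.

```r
library(dplyr)

# Hand-built stand-in for the oadoi_fetch() result shown above.
# Only two of the 16 columns are reproduced; the row order is an assumption.
oa_result <- tibble(
  doi = c("10.1186/s12864-016-2566-9", "10.1016/j.cognition.2014.07.007"),
  `_best_open_url` = c(
    "http://doi.org/10.1186/s12864-016-2566-9",
    "http://hdl.handle.net/11858/00-001M-0000-0024-2A9E-8"
  )
)

# Rename the back-ticked column to something easier to type and
# flag records for which an open access link was found.
oa_summary <- oa_result %>%
  rename(best_open_url = `_best_open_url`) %>%
  mutate(has_open_copy = !is.na(best_open_url))

oa_summary
```

With a real result, the same two steps can be applied directly to the object returned by oadoi_fetch().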

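The oaDOI.org API specification defines each of the returned variables. Their names and types appear in the tibble header printed above, for example doi <chr>, evidence <chr>, is_free_to_read <lgl>, license <chr> and oa_color <chr>.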

Providing your email address when using this client is highly appreciated by oaDOI.org. It not only helps the maintainer of oaDOI.org, the non-profit Impactstory, to inform you when something breaks, but also to demonstrate API usage to its funders. Simply use the email parameter for this purpose:

roadoi::oadoi_fetch("10.1186/s12864-016-2566-9", email = "name@example.com")
## # A tibble: 1 × 16
##                           `_best_open_url` `_closed_base_ids`
##                                      <chr>             <list>
## 1 http://doi.org/10.1186/s12864-016-2566-9         <list [0]>
## # ... with 14 more variables: `_closed_urls` <list>,
## #   `_open_base_ids` <list>, `_open_urls` <list>, doi <chr>,
## #   doi_resolver <chr>, evidence <chr>, free_fulltext_url <chr>,
## #   is_boai_license <lgl>, is_free_to_read <lgl>,
## #   is_subscription_journal <lgl>, license <chr>, oa_color <chr>,
## #   url <chr>, year <int>

To monitor your API calls and estimate the time until completion, use the .progress parameter, inherited from plyr, to display a progress bar.

roadoi::oadoi_fetch(dois = c("10.1186/s12864-016-2566-9", "10.1016/j.cognition.2014.07.007"), .progress = "text")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================================| 100%
## # A tibble: 2 × 16
##                                       `_best_open_url` `_closed_base_ids`
##                                                  <chr>             <list>
## 1             http://doi.org/10.1186/s12864-016-2566-9         <list [0]>
## 2 http://hdl.handle.net/11858/00-001M-0000-0024-2A9E-8          <chr [1]>
## # ... with 14 more variables: `_closed_urls` <list>,
## #   `_open_base_ids` <list>, `_open_urls` <list>, doi <chr>,
## #   doi_resolver <chr>, evidence <chr>, free_fulltext_url <chr>,
## #   is_boai_license <lgl>, is_free_to_read <lgl>,
## #   is_subscription_journal <lgl>, license <chr>, oa_color <chr>,
## #   url <chr>, year <int>
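
Because the .progress argument comes from plyr, the same progress bar can be tried out offline with plyr itself. The sketch below is not part of roadoi. It loops over a toy vector with plyr::llply() and a placeholder function (slow_square, hypothetical) to show what .progress = "text" does.

```r
# A placeholder for a slow operation such as an API request (hypothetical).
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# plyr draws a text progress bar while it iterates over the input.
squares <- plyr::llply(1:3, slow_square, .progress = "text")
unlist(squares)
```

Setting .progress = "none" suppresses the bar, which is usually preferable in non-interactive scripts.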

Use Case: Studying compliance with open access policies

An increasing number of universities, research organisations and funders have launched open access policies in recent years. Using roadoi together with other R packages makes it easy to examine, in a reproducible and transparent manner, how and to what extent researchers comply with these policies. In particular, the rcrossref package, maintained by rOpenSci, provides many helpful functions for this task.

Gathering DOIs representing scholarly publications

DOIs have become essential for referencing scholarly publications, and thus many digital libraries and institutional databases keep track of these persistent identifiers. For the sake of this vignette, instead of starting with a pre-defined set of publications originating from these sources, we simply generate a random sample of 100 DOIs registered with Crossref by using the rcrossref package.

library(dplyr)
library(rcrossref)
# get a random sample of DOIs and metadata describing these works
random_dois <- rcrossref::cr_r(sample = 100) %>%
  rcrossref::cr_works() %>%
  .$data
random_dois
## # A tibble: 100 × 35
##       alternative.id
##                <chr>
## 1  S0197458005002277
## 2  S1618866716000145
## 3                   
## 4                   
## 5                   
## 6  S0376738806004145
## 7                   
## 8                   
## 9                   
## 10                  
## # ... with 90 more rows, and 34 more variables: container.title <chr>,
## #   created <chr>, deposited <chr>, DOI <chr>, funder <list>,
## #   indexed <chr>, ISBN <chr>, ISSN <chr>, issue <chr>, issued <chr>,
## #   license_date <chr>, license_URL <chr>, license_delay.in.days <chr>,
## #   license_content.version <chr>, link <list>, member <chr>, page <chr>,
## #   prefix <chr>, publisher <chr>, reference.count <chr>, score <chr>,
## #   source <chr>, subject <chr>, title <chr>, type <chr>, URL <chr>,
## #   volume <chr>, assertion <list>, author <list>,
## #   `clinical-trial-number` <list>, update.policy <chr>, subtitle <chr>,
## #   archive <chr>, abstract <chr>
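
If you start from a pre-defined publication list instead of a random sample, DOIs exported from institutional databases often come with resolver prefixes, stray whitespace or duplicates. The following sketch uses only base R and a hypothetical helper, clean_dois(), to show one way of tidying such a vector before passing it to oadoi_fetch(). The input strings are made-up variants of the two DOIs used earlier.

```r
# Hypothetical helper: normalise DOI strings before querying oaDOI.org.
clean_dois <- function(x) {
  x <- trimws(x)
  # drop resolver prefixes such as "https://doi.org/" or "doi:"
  x <- sub("^(https?://(dx\\.)?doi\\.org/|doi:[[:space:]]*)", "", x,
           ignore.case = TRUE)
  # DOIs are case-insensitive, so lower-casing makes duplicates comparable
  x <- tolower(x)
  # remove missing or empty entries and duplicates
  unique(x[!is.na(x) & nzchar(x)])
}

raw_dois <- c(
  "https://doi.org/10.1186/s12864-016-2566-9",
  " 10.1186/S12864-016-2566-9 ",
  "doi:10.1016/j.cognition.2014.07.007",
  ""
)
clean_dois(raw_dois)
```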

Let’s see when these randomly sampled publications were published:

random_dois %>%
  # convert to years
  mutate(issued = lubridate::parse_date_time(issued, c('y', 'ymd', 'ym'))) %>%
  mutate(issued = lubridate::year(issued)) %>%
  group_by(issued) %>%
  summarize(pubs = n()) %>%
  arrange(desc(pubs))
## # A tibble: 39 × 2
##    issued  pubs
##     <dbl> <int>
## 1      NA    15
## 2    2015     7
## 3    2006     6
## 4    2008     5
## 5    2014     5
## 6    2004     4
## 7    2007     4
## 8    2013     4
## 9    2016     4
## 10   1982     3
## # ... with 29 more rows

and what types of publication they are:

random_dois %>%
  group_by(type) %>%
  summarize(pubs = n()) %>%
  arrange(desc(pubs))
## # A tibble: 6 × 2
##                  type  pubs
##                 <chr> <int>
## 1     journal-article    71
## 2        book-chapter    15
## 3           component     6
## 4 proceedings-article     5
## 5             dataset     2
## 6            standard     1

Calling oaDOI.org

Now let’s call oaDOI.org with the sampled DOIs:

oa_df <- roadoi::oadoi_fetch(dois = random_dois$DOI)

and merge the resulting information about open access full-text links with our Crossref metadata:

my_df <- dplyr::left_join(oa_df, random_dois, by = c("doi" = "DOI"))
my_df
## # A tibble: 100 × 50
##                                    `_best_open_url` `_closed_base_ids`
##                                               <chr>             <list>
## 1                                              <NA>         <list [0]>
## 2                                              <NA>         <list [0]>
## 3                                              <NA>         <list [0]>
## 4                                              <NA>         <list [0]>
## 5                                              <NA>         <list [0]>
## 6                                              <NA>         <list [0]>
## 7                                              <NA>         <list [0]>
## 8                                              <NA>         <list [0]>
## 9  http://doi.org/10.1371/journal.pone.0173290.t002         <list [0]>
## 10                                             <NA>         <list [0]>
## # ... with 90 more rows, and 48 more variables: `_closed_urls` <list>,
## #   `_open_base_ids` <list>, `_open_urls` <list>, doi <chr>,
## #   doi_resolver <chr>, evidence <chr>, free_fulltext_url <chr>,
## #   is_boai_license <lgl>, is_free_to_read <lgl>,
## #   is_subscription_journal <lgl>, license <chr>, oa_color <chr>,
## #   url <chr>, year <int>, alternative.id <chr>, container.title <chr>,
## #   created <chr>, deposited <chr>, funder <list>, indexed <chr>,
## #   ISBN <chr>, ISSN <chr>, issue <chr>, issued <chr>, license_date <chr>,
## #   license_URL <chr>, license_delay.in.days <chr>,
## #   license_content.version <chr>, link <list>, member <chr>, page <chr>,
## #   prefix <chr>, publisher <chr>, reference.count <chr>, score <chr>,
## #   source <chr>, subject <chr>, title <chr>, type <chr>, URL <chr>,
## #   volume <chr>, assertion <list>, author <list>,
## #   `clinical-trial-number` <list>, update.policy <chr>, subtitle <chr>,
## #   archive <chr>, abstract <chr>
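
One thing to check with such a join: DOIs are case-insensitive, but left_join() compares keys as exact strings, so a DOI that differs only in letter case between the two sources would silently fail to match. The sketch below uses two small made-up tibbles (not real API output) to show the effect and one possible remedy, lower-casing both keys first. Whether this is needed for your data depends on how each service returns its DOIs.

```r
library(dplyr)

# Made-up stand-ins for the two data sources; only the key columns matter.
oa_toy <- tibble(doi = c("10.1000/abc123", "10.1000/xyz789"),
                 oa_color = c("gold", NA))
cr_toy <- tibble(DOI = c("10.1000/ABC123", "10.1000/xyz789"),
                 type = c("journal-article", "book-chapter"))

# Exact string matching misses the record whose DOI differs only in case.
naive <- left_join(oa_toy, cr_toy, by = c("doi" = "DOI"))
sum(is.na(naive$type))

# Lower-casing both keys before joining recovers the match.
normalised <- left_join(
  mutate(oa_toy, doi = tolower(doi)),
  mutate(cr_toy, DOI = tolower(DOI)),
  by = c("doi" = "DOI")
)
sum(is.na(normalised$type))
```

Counting the missing values in a column that comes from the right-hand table, as done here with type, is a quick way to spot unmatched records after any left join.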

Reporting

After gathering the data, reporting with R is very straightforward. You can even generate dynamic reports using R Markdown and related packages, thus making your study reproducible and transparent for others.

The following example shows how many full-text links were found and which sources were used, presented as a formatted Markdown table with the knitr package:

my_df %>%
  group_by(evidence) %>%
  summarise(Articles = n()) %>%
  mutate(Proportion = Articles / sum(Articles)) %>%
  arrange(desc(Articles)) %>%
  knitr::kable()
| evidence                                              | Articles | Proportion |
|:------------------------------------------------------|---------:|-----------:|
| closed                                                |       81 |       0.81 |
| oa journal (via journal title in doaj)                |        6 |       0.06 |
| oa journal (via publisher name)                       |        6 |       0.06 |
| oa repository (via BASE title and first author match) |        3 |       0.03 |
| oa repository (via BASE doi match)                    |        2 |       0.02 |
| hybrid journal (via crossref license)                 |        1 |       0.01 |
| oa repository (via BASE title match)                  |        1 |       0.01 |
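
The table also yields the overall share of records with at least one open access link. As a sketch, the counts printed above are re-entered by hand here so that the example runs without calling the API. With the real data, mean(my_df$evidence != "closed") should give the same figure.

```r
library(dplyr)

# Counts copied from the table above.
evidence_counts <- tibble(
  evidence = c(
    "closed",
    "oa journal (via journal title in doaj)",
    "oa journal (via publisher name)",
    "oa repository (via BASE title and first author match)",
    "oa repository (via BASE doi match)",
    "hybrid journal (via crossref license)",
    "oa repository (via BASE title match)"
  ),
  Articles = c(81, 6, 6, 3, 2, 1, 1)
)

# Share of records for which oaDOI.org reported any open access evidence.
oa_share <- evidence_counts %>%
  summarise(oa_share = sum(Articles[evidence != "closed"]) / sum(Articles)) %>%
  pull(oa_share)
oa_share
```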

How many of them are provided as green or gold open access?

my_df %>%
  group_by(oa_color) %>%
  summarise(Articles = n()) %>%
  mutate(Proportion = Articles / sum(Articles)) %>%
  arrange(desc(Articles)) %>%
  knitr::kable()
| oa_color | Articles | Proportion |
|:---------|---------:|-----------:|
| NA       |       81 |       0.81 |
| gold     |       13 |       0.13 |
| green    |        6 |       0.06 |
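
The NA row covers the records without an open access colour, and its count matches the 81 records whose evidence is closed. For reporting, it can help to give that group an explicit label. Here is a minimal sketch with dplyr::coalesce(), again on hand-entered counts from the table above. The label "closed" is a choice made for this example, not a value that oaDOI.org returns in oa_color.

```r
library(dplyr)

# Counts copied from the table above; NA marks records without an OA colour.
color_counts <- tibble(
  oa_color = c(NA, "gold", "green"),
  Articles = c(81, 13, 6)
)

# Replace the missing colour with an explicit label for the report.
color_labelled <- color_counts %>%
  mutate(oa_color = coalesce(oa_color, "closed"))
color_labelled
```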

Let’s take a closer look and assess how green and gold open access are distributed over publication types:

my_df %>%
  filter(evidence != "closed") %>% 
  count(oa_color, type, sort = TRUE) %>% 
  knitr::kable()
| oa_color | type            |  n |
|:---------|:----------------|---:|
| gold     | journal-article |  7 |
| gold     | component       |  6 |
| green    | journal-article |  6 |
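
A cross-tabulation makes the same counts easier to compare. The sketch below re-enters the three rows above by hand and reshapes them with base R's xtabs(). Combinations that do not occur, such as green components, show up as zero.

```r
# Counts copied from the table above.
oa_by_type <- data.frame(
  oa_color = c("gold", "gold", "green"),
  type = c("journal-article", "component", "journal-article"),
  n = c(7, 6, 6)
)

# Rows are publication types, columns are OA colours.
oa_crosstab <- xtabs(n ~ type + oa_color, data = oa_by_type)
oa_crosstab
```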