Introduction to crandep

2021-06-10

This vignette provides an introduction to the functions facilitating the analysis of the dependencies of CRAN packages, specifically get_dep(), df_to_graph() and topo_sort_kahn().

library(crandep)
library(dplyr)
library(igraph)

One or multiple types of dependencies

To obtain information about the various kinds of dependencies of a package, we can use the function get_dep(), which takes the package name and the type of dependencies as its first and second arguments, respectively. Currently, the second argument accepts a character vector of one or more of the following words: Depends, Imports, LinkingTo, Suggests, Enhances, Reverse_depends, Reverse_imports, Reverse_linking_to, Reverse_suggests, and Reverse_enhances. Any variation in letter case is accepted, and the underscore "_" may be replaced by a space.

get_dep("dplyr", "Imports")
#>     from         to    type reverse
#> 1  dplyr   ellipsis imports   FALSE
#> 2  dplyr   generics imports   FALSE
#> 3  dplyr       glue imports   FALSE
#> 4  dplyr  lifecycle imports   FALSE
#> 5  dplyr   magrittr imports   FALSE
#> 6  dplyr    methods imports   FALSE
#> 7  dplyr         R6 imports   FALSE
#> 8  dplyr      rlang imports   FALSE
#> 9  dplyr     tibble imports   FALSE
#> 10 dplyr tidyselect imports   FALSE
#> 11 dplyr      utils imports   FALSE
#> 12 dplyr      vctrs imports   FALSE
#> 13 dplyr     pillar imports   FALSE
get_dep("MASS", c("depends", "suggests"))
#>   from        to     type reverse
#> 1 MASS grDevices  depends   FALSE
#> 2 MASS  graphics  depends   FALSE
#> 3 MASS     stats  depends   FALSE
#> 4 MASS     utils  depends   FALSE
#> 5 MASS   lattice suggests   FALSE
#> 6 MASS      nlme suggests   FALSE
#> 7 MASS      nnet suggests   FALSE
#> 8 MASS  survival suggests   FALSE

For more information on different types of dependencies, see the official guidelines and https://r-pkgs.org/description.html.

In the output, the column type holds the type of the dependency converted to lower case. In addition, LinkingTo is reported as linking to for consistency.

get_dep("xts", "LinkingTo")
#>   from  to       type reverse
#> 1  xts zoo linking to   FALSE
get_dep("xts", "linking to")
#>   from  to       type reverse
#> 1  xts zoo linking to   FALSE

For the reverse dependencies, the substring "reverse_" will not be shown in type; instead the reverse column will be TRUE. This can be illustrated by the following:

get_dep("abc", c("depends", "reverse_depends"))
#>   from       to    type reverse
#> 1  abc abc.data depends   FALSE
#> 2  abc     nnet depends   FALSE
#> 3  abc quantreg depends   FALSE
#> 4  abc     MASS depends   FALSE
#> 5  abc   locfit depends   FALSE
#> 6  abc abctools depends    TRUE
#> 7  abc  EasyABC depends    TRUE
get_dep("xts", c("linking to", "reverse linking to"))
#>   from      to       type reverse
#> 1  xts     zoo linking to   FALSE
#> 2  xts RcppXts linking to    TRUE
#> 3  xts     TTR linking to    TRUE
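The labelling convention seen in the outputs above can be sketched in base R. The helper normalise_type() below is hypothetical and is not part of 'crandep'; it only mimics how the accepted spellings appear to map to the type and reverse columns, and the package's internals may differ.

```r
# Hypothetical helper (not a crandep function): maps the accepted spellings
# of a dependency type to the lower-case `type` label and the `reverse` flag.
normalise_type <- function(x) {
  y <- tolower(gsub("_", " ", x, fixed = TRUE)) # "Reverse_depends" -> "reverse depends"
  reverse <- startsWith(y, "reverse ")           # TRUE for the reverse types
  type <- sub("^reverse ", "", y)                # drop the "reverse " prefix
  type[type == "linkingto"] <- "linking to"      # align LinkingTo with "linking to"
  data.frame(input = x, type = type, reverse = reverse, stringsAsFactors = FALSE)
}

res <- normalise_type(c("LinkingTo", "Reverse_depends", "reverse linking to", "Imports"))
res
```

Keeping the direction in a separate logical column means that forward and reverse rows of the same kind share one type label.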

Theoretically, for each forward dependency

#>   from to type reverse
#> 1    A  B    c   FALSE

there should be an equivalent reverse dependency

#>   from to type reverse
#> 1    B  A    c    TRUE

Aligning the type in the forward and reverse dependencies enables this to be checked easily.
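As a minimal sketch of such a check, the following uses two small made-up data frames (not actual CRAN output) in the same four-column layout. A forward row is considered matched if a reverse row exists with from and to swapped and the same type.

```r
# Toy data, invented for illustration only
fwd <- data.frame(from = c("A", "A"), to = c("B", "C"),
                  type = c("imports", "depends"), reverse = FALSE,
                  stringsAsFactors = FALSE)
bwd <- data.frame(from = c("B", "C"), to = c("A", "A"),
                  type = c("imports", "imports"), reverse = TRUE,
                  stringsAsFactors = FALSE)

# Build comparable keys: the reverse rows are keyed with from/to swapped
key_fwd <- paste(fwd$from, fwd$to, fwd$type, sep = "|")
key_bwd <- paste(bwd$to, bwd$from, bwd$type, sep = "|")

# TRUE where the forward dependency has an equivalent reverse dependency
fwd$matched <- key_fwd %in% key_bwd
fwd
```

In this toy example the second forward row is unmatched, because the reverse row for C has type imports rather than depends.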

To obtain all types of dependencies, we can use "all" in the second argument, instead of typing a character vector of all 10 words:

df0.abc <- get_dep("abc", "all")
df0.abc
#>    from         to     type reverse
#> 1   abc   abc.data  depends   FALSE
#> 2   abc       nnet  depends   FALSE
#> 3   abc   quantreg  depends   FALSE
#> 4   abc       MASS  depends   FALSE
#> 5   abc     locfit  depends   FALSE
#> 10  abc   abctools  depends    TRUE
#> 11  abc    EasyABC  depends    TRUE
#> 12  abc ecolottery  imports    TRUE
#> 13  abc       ouxy  imports    TRUE
#> 14  abc      poems  imports    TRUE
#> 16  abc      coala suggests    TRUE
df0.rstan <- get_dep("rstan", "all") # too many rows to display
dplyr::count(df0.rstan, type, reverse) # hence the summary using count()
#>         type reverse  n
#> 1    depends   FALSE  2
#> 2    depends    TRUE 24
#> 3   enhances    TRUE  1
#> 4    imports   FALSE 10
#> 5    imports    TRUE 86
#> 6 linking to   FALSE  5
#> 7 linking to    TRUE 72
#> 8   suggests   FALSE 12
#> 9   suggests    TRUE 20

As of 2021-06-10, no package has all 10 types of dependencies, and 6 packages have 9 types of dependencies: bigmemory, miceadds, quanteda, rstan, sf, xts.
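The number of types a single package has can be counted as the number of distinct combinations of type and reverse. The sketch below retypes the rows of df0.abc shown above by hand, so that it runs without querying CRAN; the object name df_abc is chosen here for illustration.

```r
# Rows copied by hand from the df0.abc output above
df_abc <- data.frame(
  from = "abc",
  to = c("abc.data", "nnet", "quantreg", "MASS", "locfit",
         "abctools", "EasyABC", "ecolottery", "ouxy", "poems", "coala"),
  type = c(rep("depends", 7), rep("imports", 3), "suggests"),
  reverse = c(rep(FALSE, 5), rep(TRUE, 6)),
  stringsAsFactors = FALSE
)

# Each distinct (type, reverse) pair is one of the 10 possible types
n_types <- nrow(unique(df_abc[, c("type", "reverse")]))
n_types
```

For these rows the count is 4: depends, reverse depends, reverse imports and reverse suggests.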

Building and visualising a dependency network

To build a dependency network, we have to obtain the dependencies for multiple packages. For illustration, we choose the core packages of the tidyverse, and find out what each package Imports. We put all the dependencies into one data frame, in which the package in the from column imports the package in the to column. This is essentially the edge list of the dependency network.

df0.imports <- rbind(
    get_dep("ggplot2", "Imports"),
    get_dep("dplyr", "Imports"),
    get_dep("tidyr", "Imports"),
    get_dep("readr", "Imports"),
    get_dep("purrr", "Imports"),
    get_dep("tibble", "Imports"),
    get_dep("stringr", "Imports"),
    get_dep("forcats", "Imports")
)
head(df0.imports)
#>      from        to    type reverse
#> 1 ggplot2    digest imports   FALSE
#> 2 ggplot2      glue imports   FALSE
#> 3 ggplot2 grDevices imports   FALSE
#> 4 ggplot2      grid imports   FALSE
#> 5 ggplot2    gtable imports   FALSE
#> 6 ggplot2   isoband imports   FALSE
tail(df0.imports)
#>       from       to    type reverse
#> 61 stringr magrittr imports   FALSE
#> 62 stringr  stringi imports   FALSE
#> 63 forcats ellipsis imports   FALSE
#> 64 forcats magrittr imports   FALSE
#> 65 forcats    rlang imports   FALSE
#> 66 forcats   tibble imports   FALSE
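The repeated calls inside rbind() can also be written as a loop over a vector of package names. The sketch below is one way to do this; collect_deps() and stub_fetch() are hypothetical names introduced here, and the stub stands in for get_dep() so that the example runs offline on made-up packages.

```r
# Hypothetical helper: apply a fetching function to each package and stack the results
collect_deps <- function(pkgs, type, fetch) {
  do.call(rbind, lapply(pkgs, fetch, type))
}

# Offline stand-in for get_dep(), returning the same four columns for toy packages
stub_fetch <- function(name, type) {
  toy <- list(pkgA = c("pkgB", "pkgC"), pkgB = "pkgC", pkgC = character(0))
  to <- toy[[name]]
  data.frame(from = rep(name, length(to)), to = to,
             type = rep(tolower(type), length(to)),
             reverse = rep(FALSE, length(to)),
             stringsAsFactors = FALSE)
}

edges_toy <- collect_deps(c("pkgA", "pkgB", "pkgC"), "Imports", stub_fetch)
edges_toy
```

With 'crandep' loaded and network access available, passing get_dep as the fetch argument, together with the vector of tidyverse package names, should yield the same kind of edge list as df0.imports.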

With the help of the ‘igraph’ package, we can use this data frame to build a graph object that represents the dependency network.

g0.imports <- igraph::graph_from_data_frame(df0.imports)
set.seed(1457L)
old.par <- par(mar = rep(0.0, 4))
plot(g0.imports, vertex.label.cex = 1.5)
par(old.par)

The nature of a dependency network makes it a directed acyclic graph (DAG). We can use the ‘igraph’ function is_dag() to check.

igraph::is_dag(g0.imports)
#> [1] TRUE

Note that, by their nature, only networks of Imports or Depends are guaranteed to be acyclic. A network of, for example, Suggests can contain cycles, as two packages may suggest each other.
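To illustrate the difference, the sketch below builds two tiny graphs from made-up edge lists (not CRAN data): a chain, as one might see for Imports, and a pair of packages that suggest each other.

```r
# Toy edge lists with invented package names
imports_toy  <- data.frame(from = c("pkgA", "pkgB"), to = c("pkgB", "pkgC"))
suggests_toy <- data.frame(from = c("pkgA", "pkgB"), to = c("pkgB", "pkgA"))

dag_imports  <- igraph::is_dag(igraph::graph_from_data_frame(imports_toy))
dag_suggests <- igraph::is_dag(igraph::graph_from_data_frame(suggests_toy))

dag_imports  # the chain pkgA -> pkgB -> pkgC has no cycle
dag_suggests # pkgA -> pkgB -> pkgA is a cycle
```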

Boundary and giant component

It is possible to set a boundary on the nodes to which the edges are directed, using the function df_to_graph(). Its second argument takes a data frame that lists such nodes in a column called name.

df0.nodes <- data.frame(
    name = c("ggplot2", "dplyr", "tidyr", "readr", "purrr", "tibble", "stringr", "forcats"),
    stringsAsFactors = FALSE
)
g0.core <- df_to_graph(df0.imports, df0.nodes)
set.seed(259L)
old.par <- par(mar = rep(0.0, 4))
plot(g0.core, vertex.label.cex = 1.5)
par(old.par)
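The effect of the boundary on the edge list can be sketched in base R with made-up data: only edges pointing to a node in the list are kept. This is an illustration of the idea and not the implementation of df_to_graph(), which may apply further steps before returning the graph.

```r
# Toy edge list and node list (invented names, not CRAN data)
edges <- data.frame(
  from = c("pkgA", "pkgA", "pkgB", "pkgC"),
  to   = c("pkgB", "extX", "pkgC", "extY"),
  stringsAsFactors = FALSE
)
nodes <- data.frame(name = c("pkgA", "pkgB", "pkgC"), stringsAsFactors = FALSE)

# Keep only the edges directed to a node inside the boundary
bounded <- edges[edges$to %in% nodes$name, , drop = FALSE]
bounded
```

The edges to extX and extY are dropped, leaving pkgA to pkgB and pkgB to pkgC.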

Topological ordering of nodes

Since networks built from Imports or Depends are DAGs, we can obtain a topological ordering of their nodes using, for example, Kahn’s (1962) sorting algorithm.

topo_sort_kahn(g0.core)
#>        id id_num
#> 1 forcats      1
#> 2 ggplot2      2
#> 3   readr      3
#> 4   tidyr      4
#> 5   dplyr      5
#> 6   purrr      6
#> 7  tibble      7

In the topological ordering, represented by the column id_num, a low (high) number means being at the front (back) of the ordering. If package A Imports package B, i.e. there is a directed edge from A to B, then A comes topologically before B. As the package ‘tibble’ does not import any other package in this network but is imported by most of them, it naturally goes to the back of the ordering. The ordering of a DAG may not be unique, and other admissible orderings can be obtained by setting random = TRUE in the function:

set.seed(387L); topo_sort_kahn(g0.core, random = TRUE)
#>        id id_num
#> 1 ggplot2      1
#> 2   readr      2
#> 3 forcats      3
#> 4   tidyr      4
#> 5   purrr      5
#> 6   dplyr      6
#> 7  tibble      7
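For readers unfamiliar with Kahn's algorithm, the following is a minimal base R sketch working on an edge list. The function kahn_sort() is written here for illustration and is not the implementation of topo_sort_kahn(); it always takes the first available node, so it returns one particular admissible ordering.

```r
# Illustrative sketch of Kahn's algorithm on a data frame with columns from and to
kahn_sort <- function(edges) {
  nodes <- unique(c(edges$from, edges$to))
  # In-degree of each node: the number of edges pointing to it
  indeg <- setNames(integer(length(nodes)), nodes)
  tab <- table(edges$to)
  indeg[names(tab)] <- as.integer(tab)
  # Start with the nodes that no edge points to
  queue <- names(indeg)[indeg == 0L]
  ordering <- character(0)
  while (length(queue) > 0L) {
    node <- queue[1L]
    queue <- queue[-1L]
    ordering <- c(ordering, node)
    # Remove the outgoing edges of the node; release children whose in-degree reaches 0
    for (child in edges$to[edges$from == node]) {
      indeg[child] <- indeg[child] - 1L
      if (indeg[child] == 0L) queue <- c(queue, child)
    }
  }
  if (length(ordering) < length(nodes)) {
    stop("the graph contains a cycle, so no topological ordering exists")
  }
  data.frame(id = ordering, id_num = seq_along(ordering), stringsAsFactors = FALSE)
}

# Toy DAG with invented node names: A -> B, A -> C, B -> D, C -> D
toy_edges <- data.frame(from = c("A", "A", "B", "C"),
                        to   = c("B", "C", "D", "D"),
                        stringsAsFactors = FALSE)
toy_order <- kahn_sort(toy_edges)
toy_order
```

Picking a random element of the queue instead of the first one is one way to obtain other admissible orderings.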

We can also apply the topological sorting to the larger dependency network.

df0.topo <- topo_sort_kahn(g0.imports)
head(df0.topo)
#>        id id_num
#> 1 forcats      1
#> 2 ggplot2      2
#> 3   readr      3
#> 4 stringr      4
#> 5   tidyr      5
#> 6  digest      6
tail(df0.topo)
#>           id id_num
#> 32   methods     32
#> 33    pillar     33
#> 34 pkgconfig     34
#> 35     rlang     35
#> 36     utils     36
#> 37     vctrs     37
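Whether an ordering is admissible can be verified against the edge list: every from node must have a smaller id_num than the corresponding to node. The sketch below uses a hypothetical helper respects_edges() and made-up data in the same layout as the output of topo_sort_kahn().

```r
# Hypothetical helper: TRUE if every edge goes from an earlier to a later node
respects_edges <- function(edges, ordering) {
  pos_from <- ordering$id_num[match(edges$from, ordering$id)]
  pos_to   <- ordering$id_num[match(edges$to, ordering$id)]
  all(pos_from < pos_to)
}

# Toy edge list and two candidate orderings (invented names)
toy_edges <- data.frame(from = c("A", "A", "B"), to = c("B", "C", "C"),
                        stringsAsFactors = FALSE)
good <- data.frame(id = c("A", "B", "C"), id_num = 1:3, stringsAsFactors = FALSE)
bad  <- data.frame(id = c("B", "A", "C"), id_num = 1:3, stringsAsFactors = FALSE)

ok_good <- respects_edges(toy_edges, good)
ok_bad  <- respects_edges(toy_edges, bad)
ok_good
ok_bad
```

The second ordering fails because it places B before A although there is an edge from A to B.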

Going forward

In another vignette of this package, we show how to obtain the dependency network of all CRAN packages using other functions in the package. The number of reverse dependencies can then be modelled.