2. Parallelize Computation of Indices

Note: This vignette presents performance tests comparing the non-parallel and parallel versions of fundiversity functions. To avoid depending on other packages, this vignette is pre-computed.

Within fundiversity the computation of most indices can be parallelized using the future package. The indices that currently support parallelization are: FRic, FDis, FDiv, and FEve. The goal of this vignette is to explain how to toggle and use parallelization in fundiversity.

The future package provides a simple and general framework for asynchronous computation that adapts to the resources available to the user. The first vignette of future gives a general overview of all its features. The main idea is that the user writes the code once and it then runs seamlessly either sequentially, in parallel on a single computer, on a cluster, or distributed over several computers. fundiversity can thus run on any of these backends, depending on the user’s choice.
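
To make the "write once, run on any backend" idea concrete, here is a minimal sketch that uses the future package directly, outside of fundiversity. The function slow_square() is a hypothetical stand-in for any expensive computation; only the plan() call changes between the two runs.

```r
library("future")

# Hypothetical stand-in for an expensive computation
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# Sequential backend: the future is evaluated in the current R session
plan(sequential)
f_seq <- future(slow_square(4))
res_sequential <- value(f_seq)  # block until the result is available

# Parallel backend: the same code is evaluated in a background R session
plan(multisession, workers = 2)
f_par <- future(slow_square(4))
res_multisession <- value(f_par)

# Go back to the default sequential backend
plan(sequential)

identical(res_sequential, res_multisession)
#> [1] TRUE
```

The code that creates and collects the future is the same in both cases, which is the property fundiversity relies on when it lets the user pick the backend.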

library("fundiversity")

data("traits_birds", package = "fundiversity")
data("site_sp_birds", package = "fundiversity")

Running code in parallel

By default, fundiversity code runs sequentially on a single core. To trigger parallelization, the user needs to set a plan with future::plan() using a parallel backend such as future::multisession, which splits the execution across multiple R sessions.

# Sequential execution
fric1 <- fd_fric(traits_birds)

# Parallel execution
future::plan(future::multisession)  # Plan definition
fric2 <- fd_fric(traits_birds)  # The call itself is unchanged

identical(fric1, fric2)
#> [1] TRUE

With the future::multisession backend, you can specify the number of workers over which the computation is parallelized through the workers argument of the future::plan() call:

future::plan(future::multisession, workers = 2)  # Only 2 cores are used
fric3 <- fd_fric(traits_birds)

identical(fric3, fric2)
#> [1] TRUE

To learn more about the different backends available and the related arguments needed, please refer to the documentation of future::plan() and the overview vignette of future.
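
Before choosing a value for workers, it can help to check how many cores are available and how many workers the current plan actually uses. The following is a minimal sketch using parallelly::availableCores() and future::nbrOfWorkers(); capping the plan at 2 workers is only an illustrative choice.

```r
# Number of cores that R is allowed to use on this machine
n_cores <- parallelly::availableCores()

# Use at most 2 workers, and never more than the available cores
n_workers <- min(2L, n_cores)

future::plan(future::multisession, workers = n_workers)
future::nbrOfWorkers()  # number of workers in the current plan

# Go back to sequential execution when done;
# this also shuts down the background R sessions
future::plan(future::sequential)
future::nbrOfWorkers()
#> [1] 1
```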

Performance comparison

We can now benchmark both versions to measure the effect of parallelization:

future::plan(future::sequential)
non_parallel_bench <- microbenchmark::microbenchmark(
  non_parallel = {
    fd_fric(traits_birds)
  },
  times = 20
)

future::plan(future::multisession)
parallel_bench <- microbenchmark::microbenchmark(
  parallel = {
    fd_fric(traits_birds)
  },
  times = 20
)

rbind(non_parallel_bench, parallel_bench)
#> Unit: milliseconds
#>          expr       min         lq       mean     median         uq       max neval cld
#>  non_parallel  8.756378   8.892243   9.841818   9.072241   9.218554   23.9519    20  a 
#>      parallel 56.374332 167.680385 218.073077 172.888927 185.670312 1247.8534    20   b

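The non-parallelized code runs faster than the parallelized one! This is because fundiversity parallelizes the computation across sites. When no site-species matrix is provided, as here, all species are treated as a single site, so there is nothing to split, while the overhead of dispatching work to other R sessions remains. Parallelization should therefore be used when you have many sites for which you want to compute the same indices.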
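
To illustrate what parallelizing across sites means, here is a simplified sketch. It is not fundiversity's actual code: it assumes a toy site-by-species matrix and a hypothetical per-site function, site_richness(), and distributes the sites over workers with future.apply::future_lapply().

```r
library("future")
library("future.apply")

# Toy site-by-species presence matrix: 4 sites (rows), 3 species (columns)
toy_site_sp <- matrix(
  c(1, 0, 1,
    0, 1, 1,
    1, 1, 1,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("site", 1:4), paste0("sp", 1:3))
)

# Hypothetical per-site computation: number of species present in one site
site_richness <- function(site_name, site_sp) {
  sum(site_sp[site_name, ] > 0)
}

plan(multisession, workers = 2)

# Each site is processed independently, so sites can be spread across workers
richness <- future_lapply(
  rownames(toy_site_sp), site_richness, site_sp = toy_site_sp
)
richness <- setNames(unlist(richness), rownames(toy_site_sp))

plan(sequential)  # reset to the default backend

richness
#> site1 site2 site3 site4 
#>     2     2     3     1 
```

With a single site, there is only one unit of work to hand out, so adding workers cannot reduce the computation time.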

# Function to make a bigger site-sp dataset
make_more_sites <- function(n) {
  site_sp <- do.call(rbind, replicate(n, site_sp_birds, simplify = FALSE))
  rownames(site_sp) <- paste0("s", seq_len(nrow(site_sp)))

  site_sp
}
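
To see what the stacking in make_more_sites() does, here is the same pattern applied to a small, made-up matrix (toy_site_sp is an assumption for illustration, not part of fundiversity):

```r
# Toy site-by-species matrix: 2 sites (rows), 3 species (columns)
toy_site_sp <- matrix(
  c(1, 0, 1,
    0, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("a", "b"), c("sp1", "sp2", "sp3"))
)

# Stack 3 copies of the matrix on top of each other
stacked <- do.call(rbind, replicate(3, toy_site_sp, simplify = FALSE))

# Give every row a unique site name, as make_more_sites() does
rownames(stacked) <- paste0("s", seq_len(nrow(stacked)))

dim(stacked)
#> [1] 6 3
rownames(stacked)
#> [1] "s1" "s2" "s3" "s4" "s5" "s6"
```

The number of species (columns) is unchanged; only the number of sites (rows) is multiplied by n.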

For example, with a dataset 5000 times bigger:

bigger_site <- make_more_sites(5000)

microbenchmark::microbenchmark(
  seq = {
    future::plan(future::sequential)
    fd_fric(traits_birds, bigger_site)
  },
  multisession = {
    future::plan(future::multisession, workers = 4)
    fd_fric(traits_birds, bigger_site)
  },
  multicore = {
    future::plan(future::multicore, workers = 4)
    fd_fric(traits_birds, bigger_site)
  },
  times = 20
)
#> Warning in supportsMulticoreAndRStudio(...): [ONE-TIME WARNING] Forked processing ('multicore') is not supported when running R from RStudio
#> because it is considered unstable. For more details, how to control forked processing or not, and how to silence this warning in future R
#> sessions, see ?parallelly::supportsMulticore
#> Unit: seconds
#>          expr      min       lq     mean   median       uq      max neval cld
#>           seq 15.58688 15.67587 15.97552 15.97047 16.24568 16.54392    20  a 
#>  multisession 21.17851 21.75313 22.02965 21.88691 22.26971 23.50062    20   b
#>     multicore 15.53872 15.75567 16.06103 16.01595 16.35790 16.98102    20  a
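
The warning in the output above indicates that forked processing was not available in this RStudio session. In that situation, future is documented to fall back to sequential processing, which would be consistent with the near-identical seq and multicore timings. The following is a minimal sketch for checking this before choosing a backend, using parallelly::supportsMulticore(); falling back to multisession is an illustrative choice, not a recommendation from fundiversity.

```r
# TRUE if forked processing ("multicore") is supported in this session,
# FALSE for example on Windows or, by default, inside RStudio
can_fork <- parallelly::supportsMulticore()

if (can_fork) {
  future::plan(future::multicore, workers = 2)
} else {
  future::plan(future::multisession, workers = 2)
}

# ... run the computation here ...

future::plan(future::sequential)  # reset to the default backend
```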

Session info of the machine on which the benchmark was run and the time it took to run

#>  seconds needed to generate this document: 1095.27 sec elapsed
#> ─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       Ubuntu 20.04.5 LTS
#>  system   x86_64, linux-gnu
#>  ui       RStudio
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Etc/UTC
#>  date     2022-11-15
#>  rstudio  2022.02.0+443 Prairie Trillium (server)
#>  pandoc   2.17.1.1 @ /usr/lib/rstudio-server/bin/quarto/bin/ (via rmarkdown)
#> 
#> ─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  !  package        * version    date (UTC) lib source
#>  P  abind            1.4-5      2016-07-21 [3] CRAN (R 4.2.0)
#>     assertthat       0.2.1      2019-03-21 [3] CRAN (R 4.1.3)
#>     cachem           1.0.6      2021-08-19 [3] CRAN (R 4.1.3)
#>  VP cli              3.4.0      2022-09-23 [?] CRAN (R 4.2.1) (on disk 3.4.1)
#>     codetools        0.2-18     2020-11-04 [5] CRAN (R 4.0.3)
#>     colorspace       2.0-3      2022-02-21 [1] CRAN (R 4.2.0)
#>     crayon           1.5.1      2022-03-26 [1] CRAN (R 4.2.0)
#>     DBI              1.1.2      2021-12-20 [3] CRAN (R 4.1.3)
#>  P  digest           0.6.29     2021-12-01 [3] CRAN (R 4.2.0)
#>     dplyr          * 1.0.10     2022-09-01 [1] CRAN (R 4.2.1)
#>     ellipsis         0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
#>     evaluate         0.18       2022-11-07 [1] CRAN (R 4.2.1)
#>     fansi            1.0.3      2022-03-24 [1] CRAN (R 4.2.0)
#>  P  fastmap          1.1.0      2021-01-25 [3] CRAN (R 4.2.1)
#>     fundiversity   * 0.2.1.9000 2022-04-12 [3] Github (bisaloo/fundiversity@87652ba)
#>  VP future           1.26.1     2022-09-02 [3] CRAN (R 4.2.1) (on disk 1.28.0)
#>  VP future.apply     1.9.0      2022-11-05 [3] CRAN (R 4.2.1) (on disk 1.10.0)
#>     generics         0.1.2      2022-01-31 [1] CRAN (R 4.2.0)
#>  P  geometry         0.4.6      2022-04-18 [3] CRAN (R 4.2.0)
#>     ggplot2        * 3.3.6      2022-05-03 [1] CRAN (R 4.2.0)
#>  VP globals          0.15.0     2022-08-28 [3] CRAN (R 4.2.1) (on disk 0.16.1)
#>     glue             1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>     gtable           0.3.0      2019-03-25 [1] CRAN (R 4.2.0)
#>     htmltools        0.5.3      2022-07-18 [1] CRAN (R 4.2.1)
#>     knitr            1.40       2022-08-24 [1] CRAN (R 4.2.1)
#>     lattice          0.20-45    2021-09-22 [3] CRAN (R 4.1.3)
#>     lifecycle        1.0.3      2022-10-07 [1] CRAN (R 4.2.1)
#>  P  listenv          0.8.0      2019-12-05 [3] CRAN (R 4.2.1)
#>  P  magic            1.6-0      2022-02-09 [3] CRAN (R 4.2.0)
#>     magrittr         2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>     MASS             7.3-58.1   2022-08-03 [3] CRAN (R 4.2.1)
#>     Matrix           1.4-1      2022-03-23 [3] CRAN (R 4.1.3)
#>     memoise          2.0.1      2021-11-26 [3] CRAN (R 4.1.3)
#>     microbenchmark   1.4.9      2021-11-09 [3] CRAN (R 4.1.3)
#>     multcomp         1.4-19     2022-04-26 [1] CRAN (R 4.2.0)
#>     munsell          0.5.0      2018-06-12 [1] CRAN (R 4.2.0)
#>     mvtnorm          1.1-3      2021-10-08 [1] CRAN (R 4.2.0)
#>  VP parallelly       1.31.1     2022-07-21 [3] CRAN (R 4.2.1) (on disk 1.32.1)
#>     pillar           1.7.0      2022-02-01 [1] CRAN (R 4.2.0)
#>     pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>     R6               2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  P  Rcpp             1.0.8.3    2022-03-17 [3] CRAN (R 4.2.0)
#>     rlang            1.0.6      2022-09-24 [1] CRAN (R 4.2.1)
#>     rmarkdown        2.13       2022-03-10 [3] CRAN (R 4.1.3)
#>     rstudioapi       0.14       2022-08-22 [1] CRAN (R 4.2.1)
#>     sandwich         3.0-2      2022-06-15 [1] CRAN (R 4.2.0)
#>     scales           1.2.0      2022-04-13 [1] CRAN (R 4.2.0)
#>     sessioninfo      1.2.2      2021-12-06 [3] CRAN (R 4.1.3)
#>     stringi          1.7.6      2021-11-29 [1] CRAN (R 4.2.0)
#>     stringr          1.4.0      2019-02-10 [1] CRAN (R 4.2.0)
#>     survival         3.3-1      2022-03-03 [3] CRAN (R 4.1.3)
#>     TH.data          1.1-1      2022-04-26 [1] CRAN (R 4.2.0)
#>     tibble           3.1.7      2022-05-03 [1] CRAN (R 4.2.0)
#>     tictoc           1.0.1      2021-04-19 [3] CRAN (R 4.1.3)
#>     tidyselect       1.2.0      2022-10-10 [1] CRAN (R 4.2.1)
#>     utf8             1.2.2      2021-07-24 [1] CRAN (R 4.2.0)
#>     vctrs            0.5.0      2022-10-22 [1] CRAN (R 4.2.1)
#>     withr            2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>     xfun             0.34       2022-10-18 [1] CRAN (R 4.2.1)
#>     yaml             2.3.6      2022-10-18 [1] CRAN (R 4.2.1)
#>     zoo              1.8-10     2022-04-15 [1] CRAN (R 4.2.0)
#> 
#>  [1] /home/ke76dimu/R-library/4.2
#>  [2] /usr/local/lib/R/site-library
#>  [3] /data/library/4.1
#>  [4] /usr/lib/R/site-library
#>  [5] /usr/lib/R/library
#> 
#>  V ── Loaded and on-disk version mismatch.
#>  P ── Loaded and on-disk path mismatch.
#> 
#> ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────