As part of a reproducible workflow, caching of function calls, code chunks, and other elements of a project is a critical component. The objective of a reproducible workflow is that an entire workflow, from raw data to publication, decision support, report writing, presentation building, etc., can be built and reproduced anywhere, on any computer or operating system, with any starting conditions, on demand. The reproducible::Cache function is built to work with any R function.
Cache uses DBI as a backend, with key functions dbReadTable, dbRemoveTable, dbSendQuery, dbSendStatement, dbCreateTable and dbAppendTable. These can all be accessed via Cache, showCache, clearCache, and keepCache. It is optimized for speed of transactions, using fastdigest::fastdigest on R memory objects and digest::digest on files. The main function is superficially similar to archivist::cache, which uses digest::digest in all cases to determine whether the arguments are identical in subsequent iterations. Cache also handles many cases in which standard caching with digest::digest does not work reliably between systems. For these, the function .robustDigest is introduced to make caching transferable between systems. This is relevant for file paths, environments, parallel clusters, functions (which are contained within an environment), and many others (e.g., see ?.robustDigest for methods). Cache also adds important elements like automated tagging and the option to retrieve disk-cached values via stashed objects in memory using memoise::memoise. This means that running Cache 1, 2, and 3 times on the same function will get progressively faster. This can be extremely useful for web apps built with, say, shiny.
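To see why hashing a file path does not transfer between systems while hashing the file's contents does, here is a small base-R sketch. It is not the .robustDigest implementation: tools::md5sum stands in for digest::digest, and the directory names, file name, and the hashString helper are hypothetical, chosen only for illustration.

```r
# Two copies of the same data in different locations, standing in for
# the same input file on two different machines (hypothetical paths)
dirA <- file.path(tempdir(), "machineA")
dirB <- file.path(tempdir(), "machineB")
dir.create(dirA, showWarnings = FALSE)
dir.create(dirB, showWarnings = FALSE)
fileA <- file.path(dirA, "input.csv")
fileB <- file.path(dirB, "input.csv")
writeLines(c("x,y", "1,2"), fileA)
writeLines(c("x,y", "1,2"), fileB)

# Hypothetical helper: hash a character string by writing it to a temp file
hashString <- function(s) {
  f <- tempfile()
  on.exit(unlink(f))
  writeLines(s, f)
  unname(tools::md5sum(f))
}

# Hashing the path strings: the hashes differ because the directories differ
pathHashesMatch <- identical(hashString(fileA), hashString(fileB))

# Hashing the file contents: the hashes agree wherever the file lives
contentHashesMatch <- identical(unname(tools::md5sum(fileA)),
                                unname(tools::md5sum(fileB)))

pathHashesMatch     # FALSE
contentHashesMatch  # TRUE
```

A cache keyed on the path string would miss on the second machine even though the input is unchanged; a cache keyed on content would hit.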
Any function can be cached using: Cache(FUN = functionName, ...).
This will be a slight change to a function call, such as: projectRaster(raster, crs = crs(newRaster)) to Cache(projectRaster, raster, crs = crs(newRaster)).
This is particularly useful for expensive operations.
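The idea of wrapping a call so that its result is looked up first in memory, then on disk, and only computed as a last resort can be sketched in a few lines of base R. This is a simplified illustration, not how reproducible::Cache is implemented: the names simpleCache, hashObject, and memStore are hypothetical, the key is an MD5 hash from tools::md5sum rather than fastdigest or digest, and there is no tagging or database backend.

```r
# In-memory tier: an environment keyed by hash (hypothetical name)
memStore <- new.env()

# Hypothetical helper: hash any R object via its serialized bytes
hashObject <- function(x) {
  f <- tempfile()
  on.exit(unlink(f))
  writeBin(serialize(x, NULL), f)
  unname(tools::md5sum(f))
}

# Minimal two-tier cache: memory first, then disk, then compute
simpleCache <- function(FUN, ..., cacheDir) {
  dir.create(cacheDir, recursive = TRUE, showWarnings = FALSE)
  # The key depends on the function's source text and its arguments
  key <- hashObject(list(deparse(FUN), list(...)))
  if (exists(key, envir = memStore, inherits = FALSE)) {
    return(get(key, envir = memStore))   # fastest: in-memory copy
  }
  path <- file.path(cacheDir, paste0(key, ".rds"))
  if (file.exists(path)) {
    out <- readRDS(path)                 # slower: read from disk
  } else {
    out <- FUN(...)                      # slowest: compute, then store on disk
    saveRDS(out, path)
  }
  assign(key, out, envir = memStore)     # stash in memory for next time
  out
}

demoDir <- file.path(tempdir(), "simpleCacheDemo")
a <- simpleCache(rnorm, 5, cacheDir = demoDir)  # computed and stored
b <- simpleCache(rnorm, 5, cacheDir = demoDir)  # recovered, so same numbers
identical(a, b)  # TRUE
```

Because rnorm would otherwise return new random numbers on each call, identical results on the second call show that the stored value was recovered rather than recomputed.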
library(raster)
## Loading required package: sp
library(reproducible)
tmpDir <- file.path(tempdir(), "reproducible_examples", "Cache")
checkPath(tmpDir, create = TRUE)
## [1] "C:/Users/EMCINT~1.L-V/AppData/Local/Temp/RtmpI5BCs4/reproducible_examples/Cache"
ras <- raster(extent(0, 1000, 0, 1000), vals = 1:1e6, res = 1)
crs(ras) <- "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +ellps=WGS84"
newCRS <- "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
# No Cache
system.time(map1 <- projectRaster(ras, crs = newCRS))
## user system elapsed
## 1.86 0.23 2.11
# Try with memoise for this example -- for many simple cases, memoising will not be faster
opts <- options("reproducible.useMemoise" = TRUE)
# With Cache -- a little slower the first time because saving to disk
system.time(map1 <- Cache(projectRaster, ras, crs = newCRS, cacheRepo = tmpDir,
                          notOlderThan = Sys.time()))
## user system elapsed
## 2.70 0.55 3.41
# vastly faster the second time
system.time(map2 <- Cache(projectRaster, ras, crs = newCRS, cacheRepo = tmpDir))
## ...(Object to retrieve (8e07ebc7bbe758ad.rds) is large: 6.5 Mb)
## loading cached result from previous projectRaster call, adding to memoised copy
## user system elapsed
## 0.28 0.06 0.37
# may be faster the third time because of memoise; but this example is too simple to show
system.time(map3 <- Cache(projectRaster, ras, crs = newCRS, cacheRepo = tmpDir))
## ...(Object to retrieve (8e07ebc7bbe758ad.rds) is large: 6.5 Mb)
## loading memoised result from previous projectRaster call.
## user system elapsed
## 0.26 0.05 0.33
## [1] TRUE
## [1] TRUE
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:raster':
##
## extract
try(clearCache(tmpDir, ask = FALSE), silent = TRUE) # just to make sure it is clear
ranNumsA <- Cache(rnorm, 10, 16, cacheRepo = tmpDir)
# All same
ranNumsB <- Cache(rnorm, 10, 16, cacheRepo = tmpDir) # recovers cached copy
## loading cached result from previous rnorm call.
## loading cached result from previous 'rnorm' pipe sequence call.
## loading cached result from previous rnorm call.
ranNumsA <- Cache(rnorm, 4, cacheRepo = tmpDir, userTags = "objectName:a")
ranNumsB <- Cache(runif, 4, cacheRepo = tmpDir, userTags = "objectName:b")
# access it again, from Cache
Sys.sleep(1)
ranNumsA <- Cache(rnorm, 4, cacheRepo = tmpDir, userTags = "objectName:a")
## loading cached result from previous rnorm call.
wholeCache <- showCache(tmpDir)
## Cache size:
## Total (including Rasters): 504 bytes
## Selected objects (not including Rasters): 504 bytes
# keep only items accessed "recently" (i.e., only objectName:a)
onlyRecentlyAccessed <- showCache(tmpDir, userTags = max(wholeCache[tagKey == "accessed"]$tagValue))
## Cache size:
## Total (including Rasters): 252 bytes
## Selected objects (not including Rasters): 252 bytes
# inverse join with 2 data.tables ... using: a[!b]
# i.e., return all of wholeCache that was not recently accessed
# Note: the two different ways to access -- old way with "artifact" will be deprecated
toRemove <- unique(wholeCache[!onlyRecentlyAccessed, on = "cacheId"], by = "cacheId")$cacheId
clearCache(tmpDir, toRemove, ask = FALSE) # remove ones not recently accessed
## Cache size:
## Total (including Rasters): 252 bytes
## Selected objects (not including Rasters): 252 bytes
## Cache size:
## Total (including Rasters): 252 bytes
## Selected objects (not including Rasters): 252 bytes
## cacheId tagKey tagValue createdDate
## 1: f7bee22047b8d0c1 objectName a 2020-02-19 17:02:23
## 2: f7bee22047b8d0c1 function rnorm 2020-02-19 17:02:23
## 3: f7bee22047b8d0c1 class numeric 2020-02-19 17:02:23
## 4: f7bee22047b8d0c1 object.size 1008 2020-02-19 17:02:23
## 5: f7bee22047b8d0c1 accessed 2020-02-19 17:02:23 2020-02-19 17:02:23
## 6: f7bee22047b8d0c1 inCloud FALSE 2020-02-19 17:02:23
## 7: f7bee22047b8d0c1 otherFunctions vweave_rmarkdown 2020-02-19 17:02:23
## 8: f7bee22047b8d0c1 otherFunctions process_file 2020-02-19 17:02:23
## 9: f7bee22047b8d0c1 otherFunctions process_group 2020-02-19 17:02:23
## 10: f7bee22047b8d0c1 otherFunctions process_group.block 2020-02-19 17:02:23
## 11: f7bee22047b8d0c1 otherFunctions call_block 2020-02-19 17:02:23
## 12: f7bee22047b8d0c1 otherFunctions block_exec 2020-02-19 17:02:23
## 13: f7bee22047b8d0c1 otherFunctions in_dir 2020-02-19 17:02:23
## 14: f7bee22047b8d0c1 otherFunctions timing_fn 2020-02-19 17:02:23
## 15: f7bee22047b8d0c1 otherFunctions handle 2020-02-19 17:02:23
## 16: f7bee22047b8d0c1 otherFunctions withVisible 2020-02-19 17:02:23
## 17: f7bee22047b8d0c1 preDigest n:7eef4eae85fd9229 2020-02-19 17:02:23
## 18: f7bee22047b8d0c1 preDigest .FUN:4f604aa46882b368 2020-02-19 17:02:23
## 19: f7bee22047b8d0c1 file.size 175 2020-02-19 17:02:23
## 20: f7bee22047b8d0c1 accessed 2020-02-19 17:02:24 2020-02-19 17:02:24
## loading cached result from previous rnorm call.
ranNumsB <- Cache(runif, 4, cacheRepo = tmpDir, userTags = "objectName:b")
# keep only those cached items from the last 24 hours
oneDay <- 60 * 60 * 24
keepCache(tmpDir, after = Sys.time() - oneDay, ask = FALSE)
## Cache size:
## Total (including Rasters): 504 bytes
## Selected objects (not including Rasters): 504 bytes
## cacheId tagKey tagValue createdDate
## 1: f7bee22047b8d0c1 objectName a 2020-02-19 17:02:23
## 2: f7bee22047b8d0c1 function rnorm 2020-02-19 17:02:23
## 3: f7bee22047b8d0c1 class numeric 2020-02-19 17:02:23
## 4: f7bee22047b8d0c1 object.size 1008 2020-02-19 17:02:23
## 5: f7bee22047b8d0c1 accessed 2020-02-19 17:02:23 2020-02-19 17:02:23
## 6: f7bee22047b8d0c1 inCloud FALSE 2020-02-19 17:02:23
## 7: f7bee22047b8d0c1 otherFunctions vweave_rmarkdown 2020-02-19 17:02:23
## 8: f7bee22047b8d0c1 otherFunctions process_file 2020-02-19 17:02:23
## 9: f7bee22047b8d0c1 otherFunctions process_group 2020-02-19 17:02:23
## 10: f7bee22047b8d0c1 otherFunctions process_group.block 2020-02-19 17:02:23
## 11: f7bee22047b8d0c1 otherFunctions call_block 2020-02-19 17:02:23
## 12: f7bee22047b8d0c1 otherFunctions block_exec 2020-02-19 17:02:23
## 13: f7bee22047b8d0c1 otherFunctions in_dir 2020-02-19 17:02:23
## 14: f7bee22047b8d0c1 otherFunctions timing_fn 2020-02-19 17:02:23
## 15: f7bee22047b8d0c1 otherFunctions handle 2020-02-19 17:02:23
## 16: f7bee22047b8d0c1 otherFunctions withVisible 2020-02-19 17:02:23
## 17: f7bee22047b8d0c1 preDigest n:7eef4eae85fd9229 2020-02-19 17:02:23
## 18: f7bee22047b8d0c1 preDigest .FUN:4f604aa46882b368 2020-02-19 17:02:23
## 19: f7bee22047b8d0c1 file.size 175 2020-02-19 17:02:23
## 20: f7bee22047b8d0c1 accessed 2020-02-19 17:02:24 2020-02-19 17:02:24
## 21: f7bee22047b8d0c1 accessed 2020-02-19 17:02:25 2020-02-19 17:02:25
## 22: 3aef38d1fc02aee5 objectName b 2020-02-19 17:02:25
## 23: 3aef38d1fc02aee5 function runif 2020-02-19 17:02:25
## 24: 3aef38d1fc02aee5 class numeric 2020-02-19 17:02:25
## 25: 3aef38d1fc02aee5 object.size 1008 2020-02-19 17:02:25
## 26: 3aef38d1fc02aee5 accessed 2020-02-19 17:02:25 2020-02-19 17:02:25
## 27: 3aef38d1fc02aee5 inCloud FALSE 2020-02-19 17:02:25
## 28: 3aef38d1fc02aee5 otherFunctions vweave_rmarkdown 2020-02-19 17:02:25
## 29: 3aef38d1fc02aee5 otherFunctions process_file 2020-02-19 17:02:25
## 30: 3aef38d1fc02aee5 otherFunctions process_group 2020-02-19 17:02:25
## 31: 3aef38d1fc02aee5 otherFunctions process_group.block 2020-02-19 17:02:25
## 32: 3aef38d1fc02aee5 otherFunctions call_block 2020-02-19 17:02:25
## 33: 3aef38d1fc02aee5 otherFunctions block_exec 2020-02-19 17:02:25
## 34: 3aef38d1fc02aee5 otherFunctions in_dir 2020-02-19 17:02:25
## 35: 3aef38d1fc02aee5 otherFunctions timing_fn 2020-02-19 17:02:25
## 36: 3aef38d1fc02aee5 otherFunctions handle 2020-02-19 17:02:25
## 37: 3aef38d1fc02aee5 otherFunctions withVisible 2020-02-19 17:02:25
## 38: 3aef38d1fc02aee5 preDigest n:7eef4eae85fd9229 2020-02-19 17:02:25
## 39: 3aef38d1fc02aee5 preDigest .FUN:881ec847b7161f3c 2020-02-19 17:02:25
## 40: 3aef38d1fc02aee5 file.size 171 2020-02-19 17:02:25
## cacheId tagKey tagValue createdDate
# Keep all Cache items created with an rnorm() call
keepCache(tmpDir, userTags = "rnorm", ask = FALSE)
## Cache size:
## Total (including Rasters): 252 bytes
## Selected objects (not including Rasters): 252 bytes
## cacheId tagKey tagValue createdDate
## 1: f7bee22047b8d0c1 objectName a 2020-02-19 17:02:23
## 2: f7bee22047b8d0c1 function rnorm 2020-02-19 17:02:23
## 3: f7bee22047b8d0c1 class numeric 2020-02-19 17:02:23
## 4: f7bee22047b8d0c1 object.size 1008 2020-02-19 17:02:23
## 5: f7bee22047b8d0c1 accessed 2020-02-19 17:02:23 2020-02-19 17:02:23
## 6: f7bee22047b8d0c1 inCloud FALSE 2020-02-19 17:02:23
## 7: f7bee22047b8d0c1 otherFunctions vweave_rmarkdown 2020-02-19 17:02:23
## 8: f7bee22047b8d0c1 otherFunctions process_file 2020-02-19 17:02:23
## 9: f7bee22047b8d0c1 otherFunctions process_group 2020-02-19 17:02:23
## 10: f7bee22047b8d0c1 otherFunctions process_group.block 2020-02-19 17:02:23
## 11: f7bee22047b8d0c1 otherFunctions call_block 2020-02-19 17:02:23
## 12: f7bee22047b8d0c1 otherFunctions block_exec 2020-02-19 17:02:23
## 13: f7bee22047b8d0c1 otherFunctions in_dir 2020-02-19 17:02:23
## 14: f7bee22047b8d0c1 otherFunctions timing_fn 2020-02-19 17:02:23
## 15: f7bee22047b8d0c1 otherFunctions handle 2020-02-19 17:02:23
## 16: f7bee22047b8d0c1 otherFunctions withVisible 2020-02-19 17:02:23
## 17: f7bee22047b8d0c1 preDigest n:7eef4eae85fd9229 2020-02-19 17:02:23
## 18: f7bee22047b8d0c1 preDigest .FUN:4f604aa46882b368 2020-02-19 17:02:23
## 19: f7bee22047b8d0c1 file.size 175 2020-02-19 17:02:23
## 20: f7bee22047b8d0c1 accessed 2020-02-19 17:02:24 2020-02-19 17:02:24
## 21: f7bee22047b8d0c1 accessed 2020-02-19 17:02:25 2020-02-19 17:02:25
## cacheId tagKey tagValue createdDate
# Remove all Cache items that happened within a rnorm() call
clearCache(tmpDir, userTags = "rnorm", ask = FALSE)
## Cache size:
## Total (including Rasters): 252 bytes
## Selected objects (not including Rasters): 252 bytes
## Cache size:
## Total (including Rasters): 0 bytes
## Selected objects (not including Rasters): 0 bytes
## Empty data.table (0 rows and 4 cols): cacheId,tagKey,tagValue,createdDate
# Also, can set a time before caching happens and remove based on this
# --> a useful, simple way to control Cache
ranNumsA <- Cache(rnorm, 4, cacheRepo = tmpDir, userTags = "objectName:a")
startTime <- Sys.time()
Sys.sleep(1)
ranNumsB <- Cache(rnorm, 5, cacheRepo = tmpDir, userTags = "objectName:b")
keepCache(tmpDir, after = startTime, ask = FALSE) # keep only those newer than startTime
## Cache size:
## Total (including Rasters): 256 bytes
## Selected objects (not including Rasters): 256 bytes
## cacheId tagKey tagValue createdDate
## 1: 142b4176fe87d51d objectName b 2020-02-19 17:02:26
## 2: 142b4176fe87d51d function rnorm 2020-02-19 17:02:26
## 3: 142b4176fe87d51d class numeric 2020-02-19 17:02:26
## 4: 142b4176fe87d51d object.size 1024 2020-02-19 17:02:26
## 5: 142b4176fe87d51d accessed 2020-02-19 17:02:26 2020-02-19 17:02:26
## 6: 142b4176fe87d51d inCloud FALSE 2020-02-19 17:02:26
## 7: 142b4176fe87d51d otherFunctions vweave_rmarkdown 2020-02-19 17:02:26
## 8: 142b4176fe87d51d otherFunctions process_file 2020-02-19 17:02:26
## 9: 142b4176fe87d51d otherFunctions process_group 2020-02-19 17:02:26
## 10: 142b4176fe87d51d otherFunctions process_group.block 2020-02-19 17:02:26
## 11: 142b4176fe87d51d otherFunctions call_block 2020-02-19 17:02:26
## 12: 142b4176fe87d51d otherFunctions block_exec 2020-02-19 17:02:26
## 13: 142b4176fe87d51d otherFunctions in_dir 2020-02-19 17:02:26
## 14: 142b4176fe87d51d otherFunctions timing_fn 2020-02-19 17:02:26
## 15: 142b4176fe87d51d otherFunctions handle 2020-02-19 17:02:26
## 16: 142b4176fe87d51d otherFunctions withVisible 2020-02-19 17:02:26
## 17: 142b4176fe87d51d preDigest n:a4f076b3db622faf 2020-02-19 17:02:26
## 18: 142b4176fe87d51d preDigest .FUN:4f604aa46882b368 2020-02-19 17:02:26
## 19: 142b4176fe87d51d file.size 185 2020-02-19 17:02:26
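The time-based pruning used above (keep only entries newer than a reference time) amounts to filtering a cache index on a timestamp column. The base-R sketch below shows only that idea. The index layout, the keepAfter helper, and the ids are hypothetical. Which timestamp keepCache actually compares, and whether its boundary is inclusive, are not taken from the package and are assumptions here.

```r
# Hypothetical, simplified cache index: one row per cached entry
cacheIndex <- data.frame(
  cacheId   = c("aaa111", "bbb222", "ccc333"),
  timestamp = as.POSIXct(c("2020-02-19 17:02:23",
                           "2020-02-19 17:02:26",
                           "2020-02-19 17:02:27"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Keep entries stamped at or after the reference time (boundary is an assumption);
# all other entries are the ones a real cache would remove
keepAfter <- function(index, after) {
  index[index$timestamp >= after, , drop = FALSE]
}

startTime <- as.POSIXct("2020-02-19 17:02:25", tz = "UTC")
kept <- keepAfter(cacheIndex, startTime)
kept$cacheId  # "bbb222" "ccc333"
```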
# default userTags is "and" matching; for "or" matching use |
ranNumsA <- Cache(runif, 4, cacheRepo = tmpDir, userTags = "objectName:a")
ranNumsB <- Cache(rnorm, 4, cacheRepo = tmpDir, userTags = "objectName:b")
# show all objects (runif and rnorm in this case)
showCache(tmpDir)
## Cache size:
## Total (including Rasters): 504 bytes
## Selected objects (not including Rasters): 504 bytes
## cacheId tagKey tagValue createdDate
## 1: 3aef38d1fc02aee5 objectName a 2020-02-19 17:02:27
## 2: 3aef38d1fc02aee5 function runif 2020-02-19 17:02:27
## 3: 3aef38d1fc02aee5 class numeric 2020-02-19 17:02:27
## 4: 3aef38d1fc02aee5 object.size 1008 2020-02-19 17:02:27
## 5: 3aef38d1fc02aee5 accessed 2020-02-19 17:02:27 2020-02-19 17:02:27
## 6: 3aef38d1fc02aee5 inCloud FALSE 2020-02-19 17:02:27
## 7: 3aef38d1fc02aee5 otherFunctions vweave_rmarkdown 2020-02-19 17:02:27
## 8: 3aef38d1fc02aee5 otherFunctions process_file 2020-02-19 17:02:27
## 9: 3aef38d1fc02aee5 otherFunctions process_group 2020-02-19 17:02:27
## 10: 3aef38d1fc02aee5 otherFunctions process_group.block 2020-02-19 17:02:27
## 11: 3aef38d1fc02aee5 otherFunctions call_block 2020-02-19 17:02:27
## 12: 3aef38d1fc02aee5 otherFunctions block_exec 2020-02-19 17:02:27
## 13: 3aef38d1fc02aee5 otherFunctions in_dir 2020-02-19 17:02:27
## 14: 3aef38d1fc02aee5 otherFunctions timing_fn 2020-02-19 17:02:27
## 15: 3aef38d1fc02aee5 otherFunctions handle 2020-02-19 17:02:27
## 16: 3aef38d1fc02aee5 otherFunctions withVisible 2020-02-19 17:02:27
## 17: 3aef38d1fc02aee5 preDigest n:7eef4eae85fd9229 2020-02-19 17:02:27
## 18: 3aef38d1fc02aee5 preDigest .FUN:881ec847b7161f3c 2020-02-19 17:02:27
## 19: 3aef38d1fc02aee5 file.size 169 2020-02-19 17:02:27
## 20: f7bee22047b8d0c1 objectName b 2020-02-19 17:02:27
## 21: f7bee22047b8d0c1 function rnorm 2020-02-19 17:02:27
## 22: f7bee22047b8d0c1 class numeric 2020-02-19 17:02:27
## 23: f7bee22047b8d0c1 object.size 1008 2020-02-19 17:02:27
## 24: f7bee22047b8d0c1 accessed 2020-02-19 17:02:27 2020-02-19 17:02:27
## 25: f7bee22047b8d0c1 inCloud FALSE 2020-02-19 17:02:27
## 26: f7bee22047b8d0c1 otherFunctions vweave_rmarkdown 2020-02-19 17:02:27
## 27: f7bee22047b8d0c1 otherFunctions process_file 2020-02-19 17:02:27
## 28: f7bee22047b8d0c1 otherFunctions process_group 2020-02-19 17:02:27
## 29: f7bee22047b8d0c1 otherFunctions process_group.block 2020-02-19 17:02:27
## 30: f7bee22047b8d0c1 otherFunctions call_block 2020-02-19 17:02:27
## 31: f7bee22047b8d0c1 otherFunctions block_exec 2020-02-19 17:02:27
## 32: f7bee22047b8d0c1 otherFunctions in_dir 2020-02-19 17:02:27
## 33: f7bee22047b8d0c1 otherFunctions timing_fn 2020-02-19 17:02:27
## 34: f7bee22047b8d0c1 otherFunctions handle 2020-02-19 17:02:27
## 35: f7bee22047b8d0c1 otherFunctions withVisible 2020-02-19 17:02:27
## 36: f7bee22047b8d0c1 preDigest n:7eef4eae85fd9229 2020-02-19 17:02:27
## 37: f7bee22047b8d0c1 preDigest .FUN:4f604aa46882b368 2020-02-19 17:02:27
## 38: f7bee22047b8d0c1 file.size 175 2020-02-19 17:02:27
## cacheId tagKey tagValue createdDate
# show objects that are both runif and rnorm
# (i.e., none in this case, because each object is one or the other, not both)
showCache(tmpDir, userTags = c("runif", "rnorm")) ## empty
## Cache size:
## Total (including Rasters): 0 bytes
## Selected objects (not including Rasters): 0 bytes
## Empty data.table (0 rows and 4 cols): cacheId,tagKey,tagValue,createdDate
# show objects that are either runif or rnorm ("or" search)
showCache(tmpDir, userTags = "runif|rnorm")
## Cache size:
## Total (including Rasters): 504 bytes
## Selected objects (not including Rasters): 504 bytes
## cacheId tagKey tagValue createdDate
## 1: 3aef38d1fc02aee5 objectName a 2020-02-19 17:02:27
## 2: 3aef38d1fc02aee5 function runif 2020-02-19 17:02:27
## 3: 3aef38d1fc02aee5 class numeric 2020-02-19 17:02:27
## 4: 3aef38d1fc02aee5 object.size 1008 2020-02-19 17:02:27
## 5: 3aef38d1fc02aee5 accessed 2020-02-19 17:02:27 2020-02-19 17:02:27
## 6: 3aef38d1fc02aee5 inCloud FALSE 2020-02-19 17:02:27
## 7: 3aef38d1fc02aee5 otherFunctions vweave_rmarkdown 2020-02-19 17:02:27
## 8: 3aef38d1fc02aee5 otherFunctions process_file 2020-02-19 17:02:27
## 9: 3aef38d1fc02aee5 otherFunctions process_group 2020-02-19 17:02:27
## 10: 3aef38d1fc02aee5 otherFunctions process_group.block 2020-02-19 17:02:27
## 11: 3aef38d1fc02aee5 otherFunctions call_block 2020-02-19 17:02:27
## 12: 3aef38d1fc02aee5 otherFunctions block_exec 2020-02-19 17:02:27
## 13: 3aef38d1fc02aee5 otherFunctions in_dir 2020-02-19 17:02:27
## 14: 3aef38d1fc02aee5 otherFunctions timing_fn 2020-02-19 17:02:27
## 15: 3aef38d1fc02aee5 otherFunctions handle 2020-02-19 17:02:27
## 16: 3aef38d1fc02aee5 otherFunctions withVisible 2020-02-19 17:02:27
## 17: 3aef38d1fc02aee5 preDigest n:7eef4eae85fd9229 2020-02-19 17:02:27
## 18: 3aef38d1fc02aee5 preDigest .FUN:881ec847b7161f3c 2020-02-19 17:02:27
## 19: 3aef38d1fc02aee5 file.size 169 2020-02-19 17:02:27
## 20: f7bee22047b8d0c1 objectName b 2020-02-19 17:02:27
## 21: f7bee22047b8d0c1 function rnorm 2020-02-19 17:02:27
## 22: f7bee22047b8d0c1 class numeric 2020-02-19 17:02:27
## 23: f7bee22047b8d0c1 object.size 1008 2020-02-19 17:02:27
## 24: f7bee22047b8d0c1 accessed 2020-02-19 17:02:27 2020-02-19 17:02:27
## 25: f7bee22047b8d0c1 inCloud FALSE 2020-02-19 17:02:27
## 26: f7bee22047b8d0c1 otherFunctions vweave_rmarkdown 2020-02-19 17:02:27
## 27: f7bee22047b8d0c1 otherFunctions process_file 2020-02-19 17:02:27
## 28: f7bee22047b8d0c1 otherFunctions process_group 2020-02-19 17:02:27
## 29: f7bee22047b8d0c1 otherFunctions process_group.block 2020-02-19 17:02:27
## 30: f7bee22047b8d0c1 otherFunctions call_block 2020-02-19 17:02:27
## 31: f7bee22047b8d0c1 otherFunctions block_exec 2020-02-19 17:02:27
## 32: f7bee22047b8d0c1 otherFunctions in_dir 2020-02-19 17:02:27
## 33: f7bee22047b8d0c1 otherFunctions timing_fn 2020-02-19 17:02:27
## 34: f7bee22047b8d0c1 otherFunctions handle 2020-02-19 17:02:27
## 35: f7bee22047b8d0c1 otherFunctions withVisible 2020-02-19 17:02:27
## 36: f7bee22047b8d0c1 preDigest n:7eef4eae85fd9229 2020-02-19 17:02:27
## 37: f7bee22047b8d0c1 preDigest .FUN:4f604aa46882b368 2020-02-19 17:02:27
## 38: f7bee22047b8d0c1 file.size 175 2020-02-19 17:02:27
## cacheId tagKey tagValue createdDate
# keep only objects that are either runif or rnorm ("or" search)
keepCache(tmpDir, userTags = "runif|rnorm", ask = FALSE)
## Cache size:
## Total (including Rasters): 504 bytes
## Selected objects (not including Rasters): 504 bytes
## cacheId tagKey tagValue createdDate
## 1: 3aef38d1fc02aee5 objectName a 2020-02-19 17:02:27
## 2: 3aef38d1fc02aee5 function runif 2020-02-19 17:02:27
## 3: 3aef38d1fc02aee5 class numeric 2020-02-19 17:02:27
## 4: 3aef38d1fc02aee5 object.size 1008 2020-02-19 17:02:27
## 5: 3aef38d1fc02aee5 accessed 2020-02-19 17:02:27 2020-02-19 17:02:27
## 6: 3aef38d1fc02aee5 inCloud FALSE 2020-02-19 17:02:27
## 7: 3aef38d1fc02aee5 otherFunctions vweave_rmarkdown 2020-02-19 17:02:27
## 8: 3aef38d1fc02aee5 otherFunctions process_file 2020-02-19 17:02:27
## 9: 3aef38d1fc02aee5 otherFunctions process_group 2020-02-19 17:02:27
## 10: 3aef38d1fc02aee5 otherFunctions process_group.block 2020-02-19 17:02:27
## 11: 3aef38d1fc02aee5 otherFunctions call_block 2020-02-19 17:02:27
## 12: 3aef38d1fc02aee5 otherFunctions block_exec 2020-02-19 17:02:27
## 13: 3aef38d1fc02aee5 otherFunctions in_dir 2020-02-19 17:02:27
## 14: 3aef38d1fc02aee5 otherFunctions timing_fn 2020-02-19 17:02:27
## 15: 3aef38d1fc02aee5 otherFunctions handle 2020-02-19 17:02:27
## 16: 3aef38d1fc02aee5 otherFunctions withVisible 2020-02-19 17:02:27
## 17: 3aef38d1fc02aee5 preDigest n:7eef4eae85fd9229 2020-02-19 17:02:27
## 18: 3aef38d1fc02aee5 preDigest .FUN:881ec847b7161f3c 2020-02-19 17:02:27
## 19: 3aef38d1fc02aee5 file.size 169 2020-02-19 17:02:27
## 20: f7bee22047b8d0c1 objectName b 2020-02-19 17:02:27
## 21: f7bee22047b8d0c1 function rnorm 2020-02-19 17:02:27
## 22: f7bee22047b8d0c1 class numeric 2020-02-19 17:02:27
## 23: f7bee22047b8d0c1 object.size 1008 2020-02-19 17:02:27
## 24: f7bee22047b8d0c1 accessed 2020-02-19 17:02:27 2020-02-19 17:02:27
## 25: f7bee22047b8d0c1 inCloud FALSE 2020-02-19 17:02:27
## 26: f7bee22047b8d0c1 otherFunctions vweave_rmarkdown 2020-02-19 17:02:27
## 27: f7bee22047b8d0c1 otherFunctions process_file 2020-02-19 17:02:27
## 28: f7bee22047b8d0c1 otherFunctions process_group 2020-02-19 17:02:27
## 29: f7bee22047b8d0c1 otherFunctions process_group.block 2020-02-19 17:02:27
## 30: f7bee22047b8d0c1 otherFunctions call_block 2020-02-19 17:02:27
## 31: f7bee22047b8d0c1 otherFunctions block_exec 2020-02-19 17:02:27
## 32: f7bee22047b8d0c1 otherFunctions in_dir 2020-02-19 17:02:27
## 33: f7bee22047b8d0c1 otherFunctions timing_fn 2020-02-19 17:02:27
## 34: f7bee22047b8d0c1 otherFunctions handle 2020-02-19 17:02:27
## 35: f7bee22047b8d0c1 otherFunctions withVisible 2020-02-19 17:02:27
## 36: f7bee22047b8d0c1 preDigest n:7eef4eae85fd9229 2020-02-19 17:02:27
## 37: f7bee22047b8d0c1 preDigest .FUN:4f604aa46882b368 2020-02-19 17:02:27
## 38: f7bee22047b8d0c1 file.size 175 2020-02-19 17:02:27
## cacheId tagKey tagValue createdDate
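The difference between "and" matching (a vector of tags) and "or" matching (a single pattern containing |) can be pictured with a small base-R sketch. This is an illustration of the matching logic only, not the package's code: the tag table, the ids, and the matchTags helper are hypothetical, and matching against tag values alone is a simplification.

```r
# Hypothetical tag table in long form, loosely mirroring the showCache() output
tags <- data.frame(
  cacheId  = c("id1", "id1", "id2", "id2"),
  tagKey   = c("objectName", "function", "objectName", "function"),
  tagValue = c("a", "runif", "b", "rnorm"),
  stringsAsFactors = FALSE
)

# Return the cacheIds for which EVERY pattern matches at least one tag value.
# Several patterns therefore behave as "and"; a single pattern such as
# "runif|rnorm" is a regular-expression alternation and behaves as "or".
matchTags <- function(tags, patterns) {
  ids <- unique(tags$cacheId)
  keep <- vapply(ids, function(id) {
    vals <- tags$tagValue[tags$cacheId == id]
    all(vapply(patterns, function(p) any(grepl(p, vals)), logical(1)))
  }, logical(1))
  ids[keep]
}

andIds <- matchTags(tags, c("runif", "rnorm"))  # no entry carries both tags
orIds  <- matchTags(tags, "runif|rnorm")        # either tag is enough

andIds  # character(0)
orIds   # "id1" "id2"
```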
ras <- raster(extent(0, 5, 0, 5), res = 1,
              vals = sample(1:5, replace = TRUE, size = 25),
              crs = "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +ellps=WGS84")
# A slow operation, such as a GIS operation
notCached <- suppressWarnings(
  # project raster generates warnings when run non-interactively
  # (no cacheRepo here: this call does not go through Cache)
  projectRaster(ras, crs = crs(ras), res = 5)
)
cached <- suppressWarnings(
  # project raster generates warnings when run non-interactively
  # using quote works also
  Cache(projectRaster, ras, crs = crs(ras), res = 5, cacheRepo = tmpDir)
)
# second time is much faster
reRun <- suppressWarnings(
  # project raster generates warnings when run non-interactively
  Cache(projectRaster, ras, crs = crs(ras), res = 5, cacheRepo = tmpDir)
)
## loading cached result from previous projectRaster call.
## [1] TRUE
Nested caching occurs when a cached function is called inside an outer function that is itself cached. This is a critical element of working within a reproducible workflow. During development it is not enough to cache flat code chunks, as there will be many levels of “slow” functions. Ideally, at any point in a development cycle, it should be possible to get to any line of code, starting from the very first steps and running through everything up to that point, in less than 1 second. If the workflow can be kept this fast, it can be rerun from the start at any time, which gives confidence that it will work at any point.
##########################
## Nested Caching
# Make 2 functions
inner <- function(mean) {
  d <- 1
  Cache(rnorm, n = 3, mean = mean)
}
outer <- function(n) {
  Cache(inner, 0.1, cacheRepo = tmpdir2)
}
# make 2 different cache paths
tmpdir1 <- file.path(tempdir(), "first")
tmpdir2 <- file.path(tempdir(), "second")
# Run the Cache ... notOlderThan propagates to all 3 Cache calls,
# but cacheRepo is tmpdir1 in top level Cache and all nested
# Cache calls, unless individually overridden ... here inner
# uses tmpdir2 repository
Cache(outer, n = 2, cacheRepo = tmpdir1, notOlderThan = Sys.time())
## [1] 0.6824337 1.2243838 -0.7529345
## attr(,".Cache")
## attr(,".Cache")$newCache
## [1] TRUE
##
## attr(,"tags")
## [1] "cacheId:a7af5367c13aba8f"
## attr(,"call")
## [1] ""
## Cache size:
## Total (including Rasters): 504 bytes
## Selected objects (not including Rasters): 504 bytes
## cacheId tagKey tagValue createdDate
## 1: 4ac2b7c0f42d1e46 function rnorm 2020-02-19 17:02:28
## 2: 4ac2b7c0f42d1e46 class numeric 2020-02-19 17:02:28
## 3: 4ac2b7c0f42d1e46 object.size 1008 2020-02-19 17:02:28
## 4: 4ac2b7c0f42d1e46 accessed 2020-02-19 17:02:28 2020-02-19 17:02:28
## 5: 4ac2b7c0f42d1e46 inCloud FALSE 2020-02-19 17:02:28
## 6: 4ac2b7c0f42d1e46 otherFunctions vweave_rmarkdown 2020-02-19 17:02:28
## 7: 4ac2b7c0f42d1e46 otherFunctions process_file 2020-02-19 17:02:28
## 8: 4ac2b7c0f42d1e46 otherFunctions process_group 2020-02-19 17:02:28
## 9: 4ac2b7c0f42d1e46 otherFunctions process_group.block 2020-02-19 17:02:28
## 10: 4ac2b7c0f42d1e46 otherFunctions call_block 2020-02-19 17:02:28
## 11: 4ac2b7c0f42d1e46 otherFunctions block_exec 2020-02-19 17:02:28
## 12: 4ac2b7c0f42d1e46 otherFunctions in_dir 2020-02-19 17:02:28
## 13: 4ac2b7c0f42d1e46 otherFunctions timing_fn 2020-02-19 17:02:28
## 14: 4ac2b7c0f42d1e46 otherFunctions handle 2020-02-19 17:02:28
## 15: 4ac2b7c0f42d1e46 otherFunctions withVisible 2020-02-19 17:02:28
## 16: 4ac2b7c0f42d1e46 preDigest n:7f12988bd88a0fb8 2020-02-19 17:02:28
## 17: 4ac2b7c0f42d1e46 preDigest mean:22413394efd9f6a3 2020-02-19 17:02:28
## 18: 4ac2b7c0f42d1e46 preDigest .FUN:4f604aa46882b368 2020-02-19 17:02:28
## 19: 4ac2b7c0f42d1e46 file.size 166 2020-02-19 17:02:28
## 20: a7af5367c13aba8f function outer 2020-02-19 17:02:29
## 21: a7af5367c13aba8f class numeric 2020-02-19 17:02:29
## 22: a7af5367c13aba8f object.size 1008 2020-02-19 17:02:29
## 23: a7af5367c13aba8f accessed 2020-02-19 17:02:29 2020-02-19 17:02:29
## 24: a7af5367c13aba8f inCloud FALSE 2020-02-19 17:02:29
## 25: a7af5367c13aba8f otherFunctions vweave_rmarkdown 2020-02-19 17:02:29
## 26: a7af5367c13aba8f otherFunctions process_file 2020-02-19 17:02:29
## 27: a7af5367c13aba8f otherFunctions process_group 2020-02-19 17:02:29
## 28: a7af5367c13aba8f otherFunctions process_group.block 2020-02-19 17:02:29
## 29: a7af5367c13aba8f otherFunctions call_block 2020-02-19 17:02:29
## 30: a7af5367c13aba8f otherFunctions block_exec 2020-02-19 17:02:29
## 31: a7af5367c13aba8f otherFunctions in_dir 2020-02-19 17:02:29
## 32: a7af5367c13aba8f otherFunctions timing_fn 2020-02-19 17:02:29
## 33: a7af5367c13aba8f otherFunctions handle 2020-02-19 17:02:29
## 34: a7af5367c13aba8f otherFunctions withVisible 2020-02-19 17:02:29
## 35: a7af5367c13aba8f preDigest n:82dc709f2b91918a 2020-02-19 17:02:29
## 36: a7af5367c13aba8f preDigest .FUN:892a6afc47a63a90 2020-02-19 17:02:29
## 37: a7af5367c13aba8f file.size 166 2020-02-19 17:02:29
## cacheId tagKey tagValue createdDate
## Cache size:
## Total (including Rasters): 252 bytes
## Selected objects (not including Rasters): 252 bytes
## cacheId tagKey tagValue createdDate
## 1: 33ceb4fb525fd08f function inner 2020-02-19 17:02:28
## 2: 33ceb4fb525fd08f class numeric 2020-02-19 17:02:28
## 3: 33ceb4fb525fd08f object.size 1008 2020-02-19 17:02:28
## 4: 33ceb4fb525fd08f accessed 2020-02-19 17:02:28 2020-02-19 17:02:28
## 5: 33ceb4fb525fd08f inCloud FALSE 2020-02-19 17:02:28
## 6: 33ceb4fb525fd08f otherFunctions vweave_rmarkdown 2020-02-19 17:02:28
## 7: 33ceb4fb525fd08f otherFunctions process_file 2020-02-19 17:02:28
## 8: 33ceb4fb525fd08f otherFunctions process_group 2020-02-19 17:02:28
## 9: 33ceb4fb525fd08f otherFunctions process_group.block 2020-02-19 17:02:28
## 10: 33ceb4fb525fd08f otherFunctions call_block 2020-02-19 17:02:28
## 11: 33ceb4fb525fd08f otherFunctions block_exec 2020-02-19 17:02:28
## 12: 33ceb4fb525fd08f otherFunctions in_dir 2020-02-19 17:02:28
## 13: 33ceb4fb525fd08f otherFunctions timing_fn 2020-02-19 17:02:28
## 14: 33ceb4fb525fd08f otherFunctions handle 2020-02-19 17:02:28
## 15: 33ceb4fb525fd08f otherFunctions withVisible 2020-02-19 17:02:28
## 16: 33ceb4fb525fd08f preDigest mean:22413394efd9f6a3 2020-02-19 17:02:28
## 17: 33ceb4fb525fd08f preDigest .FUN:87e2c30917a34d25 2020-02-19 17:02:28
## 18: 33ceb4fb525fd08f file.size 166 2020-02-19 17:02:28
# userTags get appended
# the outer tag propagates to every nested item; the inner tag is added only
# to items cached by the call that sets it
clearCache(tmpdir1, ask = FALSE)
outerTag <- "outerTag"
innerTag <- "innerTag"
inner <- function(mean) {
  d <- 1
  Cache(rnorm, n = 3, mean = mean, notOlderThan = Sys.time() - 1e5, userTags = innerTag)
}
outer <- function(n) {
  Cache(inner, 0.1)
}
aa <- Cache(outer, n = 2, cacheRepo = tmpdir1, userTags = outerTag)
showCache(tmpdir1) # rnorm function has outerTag and innerTag, inner and outer only have outerTag
## Cache size:
## Total (including Rasters): 756 bytes
## Selected objects (not including Rasters): 756 bytes
## cacheId tagKey tagValue createdDate
## 1: 4ac2b7c0f42d1e46 innerTag innerTag 2020-02-19 17:02:29
## 2: 4ac2b7c0f42d1e46 outerTag outerTag 2020-02-19 17:02:29
## 3: 4ac2b7c0f42d1e46 function rnorm 2020-02-19 17:02:29
## 4: 4ac2b7c0f42d1e46 class numeric 2020-02-19 17:02:29
## 5: 4ac2b7c0f42d1e46 object.size 1008 2020-02-19 17:02:29
## 6: 4ac2b7c0f42d1e46 accessed 2020-02-19 17:02:29 2020-02-19 17:02:29
## 7: 4ac2b7c0f42d1e46 inCloud FALSE 2020-02-19 17:02:29
## 8: 4ac2b7c0f42d1e46 otherFunctions vweave_rmarkdown 2020-02-19 17:02:29
## 9: 4ac2b7c0f42d1e46 otherFunctions process_file 2020-02-19 17:02:29
## 10: 4ac2b7c0f42d1e46 otherFunctions process_group 2020-02-19 17:02:29
## 11: 4ac2b7c0f42d1e46 otherFunctions process_group.block 2020-02-19 17:02:29
## 12: 4ac2b7c0f42d1e46 otherFunctions call_block 2020-02-19 17:02:29
## 13: 4ac2b7c0f42d1e46 otherFunctions block_exec 2020-02-19 17:02:29
## 14: 4ac2b7c0f42d1e46 otherFunctions in_dir 2020-02-19 17:02:29
## 15: 4ac2b7c0f42d1e46 otherFunctions timing_fn 2020-02-19 17:02:29
## 16: 4ac2b7c0f42d1e46 otherFunctions handle 2020-02-19 17:02:29
## 17: 4ac2b7c0f42d1e46 otherFunctions withVisible 2020-02-19 17:02:29
## 18: 4ac2b7c0f42d1e46 preDigest n:7f12988bd88a0fb8 2020-02-19 17:02:29
## 19: 4ac2b7c0f42d1e46 preDigest mean:22413394efd9f6a3 2020-02-19 17:02:29
## 20: 4ac2b7c0f42d1e46 preDigest .FUN:4f604aa46882b368 2020-02-19 17:02:29
## 21: 4ac2b7c0f42d1e46 file.size 166 2020-02-19 17:02:29
## 22: b06af03d5a73dc7d outerTag outerTag 2020-02-19 17:02:29
## 23: b06af03d5a73dc7d function inner 2020-02-19 17:02:29
## 24: b06af03d5a73dc7d class numeric 2020-02-19 17:02:29
## 25: b06af03d5a73dc7d object.size 1008 2020-02-19 17:02:29
## 26: b06af03d5a73dc7d accessed 2020-02-19 17:02:29 2020-02-19 17:02:29
## 27: b06af03d5a73dc7d inCloud FALSE 2020-02-19 17:02:29
## 28: b06af03d5a73dc7d otherFunctions vweave_rmarkdown 2020-02-19 17:02:29
## 29: b06af03d5a73dc7d otherFunctions process_file 2020-02-19 17:02:29
## 30: b06af03d5a73dc7d otherFunctions process_group 2020-02-19 17:02:29
## 31: b06af03d5a73dc7d otherFunctions process_group.block 2020-02-19 17:02:29
## 32: b06af03d5a73dc7d otherFunctions call_block 2020-02-19 17:02:29
## 33: b06af03d5a73dc7d otherFunctions block_exec 2020-02-19 17:02:29
## 34: b06af03d5a73dc7d otherFunctions in_dir 2020-02-19 17:02:29
## 35: b06af03d5a73dc7d otherFunctions timing_fn 2020-02-19 17:02:29
## 36: b06af03d5a73dc7d otherFunctions handle 2020-02-19 17:02:29
## 37: b06af03d5a73dc7d otherFunctions withVisible 2020-02-19 17:02:29
## 38: b06af03d5a73dc7d preDigest mean:22413394efd9f6a3 2020-02-19 17:02:29
## 39: b06af03d5a73dc7d preDigest .FUN:7ad10bc1ae528d8c 2020-02-19 17:02:29
## 40: b06af03d5a73dc7d file.size 166 2020-02-19 17:02:29
## 41: 88a34e1d033329e5 outerTag outerTag 2020-02-19 17:02:30
## 42: 88a34e1d033329e5 function outer 2020-02-19 17:02:30
## 43: 88a34e1d033329e5 class numeric 2020-02-19 17:02:30
## 44: 88a34e1d033329e5 object.size 1008 2020-02-19 17:02:30
## 45: 88a34e1d033329e5 accessed 2020-02-19 17:02:30 2020-02-19 17:02:30
## 46: 88a34e1d033329e5 inCloud FALSE 2020-02-19 17:02:30
## 47: 88a34e1d033329e5 otherFunctions vweave_rmarkdown 2020-02-19 17:02:30
## 48: 88a34e1d033329e5 otherFunctions process_file 2020-02-19 17:02:30
## 49: 88a34e1d033329e5 otherFunctions process_group 2020-02-19 17:02:30
## 50: 88a34e1d033329e5 otherFunctions process_group.block 2020-02-19 17:02:30
## 51: 88a34e1d033329e5 otherFunctions call_block 2020-02-19 17:02:30
## 52: 88a34e1d033329e5 otherFunctions block_exec 2020-02-19 17:02:30
## 53: 88a34e1d033329e5 otherFunctions in_dir 2020-02-19 17:02:30
## 54: 88a34e1d033329e5 otherFunctions timing_fn 2020-02-19 17:02:30
## 55: 88a34e1d033329e5 otherFunctions handle 2020-02-19 17:02:30
## 56: 88a34e1d033329e5 otherFunctions withVisible 2020-02-19 17:02:30
## 57: 88a34e1d033329e5 preDigest n:82dc709f2b91918a 2020-02-19 17:02:30
## 58: 88a34e1d033329e5 preDigest .FUN:5f06fb5fbffe9e3b 2020-02-19 17:02:30
## 59: 88a34e1d033329e5 file.size 166 2020-02-19 17:02:30
## cacheId tagKey tagValue createdDate
Sometimes it is not desirable to rerun a function even though the code has technically changed, because the change is irrelevant to the analysis (for example, editing a message sent to the user). The cacheId argument is for this. Once a piece of code has been run, its cacheId can be extracted manually (it is reported at the end of a Cache call) and placed in the code, passed in as, say, cacheId = "ad184ce64541972b50afd8e7b75f821b".
set.seed(1)
Cache(rnorm, 1, cacheRepo = tmpdir1)
## [1] -0.6264538
## attr(,".Cache")
## attr(,".Cache")$newCache
## [1] TRUE
##
## attr(,"tags")
## [1] "cacheId:7072c305d8c69df0"
## attr(,"call")
## [1] ""
# manually look at output attribute which shows cacheId: 7072c305d8c69df0
Cache(rnorm, 1, cacheRepo = tmpdir1, cacheId = "7072c305d8c69df0") # same value
## cacheId is same as calculated hash
## loading cached result from previous rnorm call.
## [1] -0.6264538
## attr(,".Cache")
## attr(,".Cache")$newCache
## [1] FALSE
##
## attr(,"tags")
## [1] "cacheId:7072c305d8c69df0"
## attr(,"call")
## [1] ""
# override even with different inputs:
Cache(rnorm, 2, cacheRepo = tmpdir1, cacheId = "7072c305d8c69df0")
## cacheId is not same as calculated hash. Manually searching for cacheId:7072c305d8c69df0
## loading cached result from previous rnorm call.
## [1] -0.6264538
## attr(,".Cache")
## attr(,".Cache")$newCache
## [1] FALSE
##
## attr(,"tags")
## [1] "cacheId:7072c305d8c69df0"
## attr(,"call")
## [1] ""
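Rather than copying the hash by eye, the cacheId can also be pulled out of the returned object. The following is a minimal sketch, assuming the "tags" attribute has the "cacheId:<hash>" form shown in the output above; `sketchRepo` is a hypothetical scratch folder, and the argument names follow the calls used in this document.

```r
library(reproducible)

# Hypothetical scratch cache folder for this sketch (not from the original text)
sketchRepo <- file.path(tempdir(), "cacheIdSketch")

set.seed(1)
first <- Cache(rnorm, 1, cacheRepo = sketchRepo)

# Assumption: the id is stored in the "tags" attribute as "cacheId:<hash>",
# as in the printed output above; strip the prefix to get the bare hash.
tags <- attr(first, "tags")
id <- sub("^cacheId:", "", grep("^cacheId:", tags, value = TRUE)[1])

# Reuse the id with different inputs: the stored result should be returned
# instead of a fresh draw of two numbers.
second <- Cache(rnorm, 2, cacheRepo = sketchRepo, cacheId = id)
```

Because the override ignores the inputs entirely, it is safest to reserve it for changes known to be cosmetic.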
Because the cache is simply a DBI data table (in an SQLite database by default), it can be inspected and worked with manually. In addition, there are several helpers in the reproducible package, including showCache, keepCache, and clearCache, that may be useful. One can also access cached items manually (rather than simply rerunning the same Cache function again).
# As of reproducible version 1.0, there is a new backend directly using DBI
mapHash <- unique(showCache(tmpDir, userTags = "projectRaster")$cacheId)
## Cache size:
## Total (including Rasters): 3.3 Kb
## Selected objects (not including Rasters): 3.3 Kb
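The helpers named above can be sketched together as follows. This is an illustrative example under stated assumptions, not the package's canonical usage: `helperRepo` and the tags "keepMe" and "dropMe" are hypothetical, and the `reproducible.ask` option is set only to avoid interactive confirmation prompts.

```r
library(reproducible)

# Assumption: turn off confirmation prompts so the sketch runs unattended
options(reproducible.ask = FALSE)

# Hypothetical scratch cache folder and tags for this sketch
helperRepo <- file.path(tempdir(), "helpersSketch")
a <- Cache(rnorm, 3, cacheRepo = helperRepo, userTags = "keepMe")
b <- Cache(runif, 3, cacheRepo = helperRepo, userTags = "dropMe")

# showCache returns the tag table (cacheId, tagKey, tagValue, createdDate)
allIds <- unique(showCache(helperRepo)$cacheId)

# keepCache removes every entry that does not match the tag
keepCache(helperRepo, userTags = "keepMe")
remainingIds <- unique(showCache(helperRepo)$cacheId)

# clearCache with no tag filter empties the cache folder's entries
clearCache(helperRepo)
```

Tagging calls with userTags at creation time is what makes this kind of selective cleanup practical later.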
In general, we feel that liberal use of Cache makes for a reusable and reproducible workflow. shiny apps can also take advantage of Cache. Indeed, much of the difficulty in managing data sets and saving them for future use can be handled by caching.
Cache(<functionName>, <other arguments>)
This allows fine-scale control of individual function calls.
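As a concrete instance of the pattern, the sketch below wraps a deliberately slow function. `slowMean` and `timingRepo` are hypothetical names introduced only for illustration, and the one-second pause stands in for an expensive computation.

```r
library(reproducible)

# Hypothetical scratch cache folder for this sketch
timingRepo <- file.path(tempdir(), "timingSketch")

# Hypothetical stand-in for an expensive operation
slowMean <- function(x) {
  Sys.sleep(1)
  mean(x)
}

# First call runs the function and stores the result
t1 <- system.time(r1 <- Cache(slowMean, 1:10, cacheRepo = timingRepo))

# Second call with identical inputs should load the stored result instead
t2 <- system.time(r2 <- Cache(slowMean, 1:10, cacheRepo = timingRepo))

# Compare elapsed times and confirm both calls agree on the value
print(c(first = t1[["elapsed"]], second = t2[["elapsed"]]))
print(all.equal(as.numeric(r1), as.numeric(r2)))
```

The returned objects carry extra attributes from Cache, so the comparison strips them with as.numeric before checking equality.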