R’s emphasis on avoiding side effects generally means that if you run the same R code more than once you can be relatively certain that you will get the same result each time. While this is generally true, there are some exceptions. If you evaluate:
x <- x + 5
on the command line, the result will depend on what the value of
x was in the workspace prior to evaluation. Since
workspaces are littered with objects from day-to-day R use, tests are
better run elsewhere to avoid conflicts with those objects.
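As a minimal sketch of this dependence, using base R only, we can evaluate the same expression in two environments that stand in for two different workspaces. The helper run_in and the environment names are made up for this illustration.

```r
# Hypothetical helper: evaluate `x <- x + 5` in a given environment
# and return the resulting value of `x` from that environment.
run_in <- function(env) {
  eval(quote(x <- x + 5), envir = env)
  get("x", envir = env)
}

# Two stand-ins for workspaces that differ only in the prior value of `x`.
env_a <- new.env()
env_b <- new.env()
assign("x", 1, envir = env_a)
assign("x", 100, envir = env_b)

res_a <- run_in(env_a)  # 6
res_b <- run_in(env_b)  # 105
```

The code is identical in both calls; only the pre-existing state differs.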
There are even more subtle factors that can affect test evaluation.
For example, if
x is an S3 object, the packages loaded on
the search path could affect the result of the command. Global options
could also affect the outcome.
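Global options are easy to demonstrate with base R alone. The sketch below formats the same number under two values of the digits option, and restores the previous setting afterwards.

```r
# Set a known value and keep the previous one so it can be restored.
old <- options(digits = 7)
seven <- format(pi)   # "3.141593"

options(digits = 3)
three <- format(pi)   # "3.14"

options(old)          # restore the previous setting
```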
Here is a non-exhaustive list of aspects of state that might affect test outcomes:
Ideally a unit testing framework would nullify these environmental
factors such that the only changes in test evaluation are caused by
changes in the code that is being tested.
unitizer provides functionality that sets session state to known “clean” values ahead of
the evaluation of each test. Currently
unitizer attempts to
manage the first six aspects of state listed above.
In order to comply with CRAN policies state management is turned off by default.
unitizer batch processes all the tests when it is first
run before it breaks into interactive mode. It does this to:
The batch-evaluate-and-review-later approach creates the need for a
mechanism to recreate state when we review the tests. Imagine trying to
figure out why a test failed when all the variables may have been
changed by subsequent tests. unitizer will always recreate the state
of the variables defined by the test scripts, and can optionally
recreate other aspects of state provided that is enabled.
You can turn on the “suggested” state management level to manage the
first four elements of state listed in the previous section. To do so,
use unitize(..., state='suggested') or
options(unitizer.state='suggested'). Be sure to read
?unitizerState before you enable this setting, as there are
cases when state management may not work.
In order to allow review of each test in its original evaluation environment, each test is evaluated in a separate environment. Each of these environments has for parent the environment of the previous test. This means that a test has access to all the objects created/used by earlier tests, but not objects created/used by subsequent tests. When a later test “modifies” an existing object, the existing object is not really modified; rather, the test creates a new object of the same name in the child environment which masks the object in the earlier test. This is functionally equivalent to overwriting the object as far as the later test is concerned.
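The chain of environments can be imitated with base R. This is only a sketch of the idea, not unitizer's implementation; the environment and object names are hypothetical.

```r
# Each "test" gets its own environment whose parent is the environment
# of the previous test.
env1 <- new.env()                # environment for test 1
env2 <- new.env(parent = env1)   # environment for test 2
env3 <- new.env(parent = env2)   # environment for test 3

evalq(x <- 1, env1)                  # test 1 creates `x`
evalq(x <- x + 1, env2)              # test 2 "modifies" `x`: a new `x` masks the old one
evalq(from_test3 <- x * 10, env3)    # test 3 sees the masking value

x_in_1 <- get("x", envir = env1)            # 1: the original is untouched
x_in_2 <- get("x", envir = env2)            # 2
val_in_3 <- get("from_test3", envir = env3) # 20

# Objects created by later tests are not visible from earlier environments.
visible_earlier <- exists("from_test3", envir = env1)  # FALSE
```

Because the original x survives in env1, the first test can still be reviewed with the value it actually saw.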
For the most part this environment trickery should be transparent to
the user. An exception is the masking of
ls and traceback with versions that account for the special nature
of the unitizer REPL. Another is that you cannot remove an
object created in an earlier test with
rm (well, it is
possible, but the how isn’t documented and you are advised not to
attempt it). Here is a more complex exception:
a <- function() b()
NULL                  # Prevent `a` and `b` being part of the same test
b <- function() TRUE
a()
In this case, when we evaluate
a() we must step back two
environments to find
a, but that’s okay. The problem is
that once inside
a, we must now evaluate
b(), but b is defined in a child environment, not a parent
environment, so R’s object lookup fails. If we remove the NULL this would
work, but only because neither the
a nor the b assignments are tests, so both
a and b would be assigned to the environment of the
a() call (see details on tests vignette).
If you are getting weird “object not found” errors when you run your tests, but the same code does not generate those errors when run directly in the command line, this illusion could be failing you. In those situations, make sure that you assign all the variables necessary right ahead of the test so they will all get stored in the same environment.
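The failure can be reproduced outside of unitizer with plain environments. The sketch below assumes the same parent/child layout as described above; env1, env2 and env_same are hypothetical names.

```r
env1 <- new.env()                # holds `a`
env2 <- new.env(parent = env1)   # holds `b`, and is where a() is evaluated

evalq(a <- function() b(), env1)
evalq(b <- function() TRUE, env2)

# `a` is found by walking up from env2 to env1, but the body of `a` is
# evaluated with env1 as its enclosure. `b` lives in the child env2,
# which is never searched, so the call fails.
res_split <- tryCatch(
  evalq(a(), env2),
  error = function(e) conditionMessage(e)
)

# Defining both functions in the same environment avoids the problem.
env_same <- new.env()
evalq({
  a <- function() b()
  b <- function() TRUE
}, env_same)
res_same <- evalq(a(), env_same)   # TRUE
```

Here res_split holds the error message from the failed lookup, while res_same is the expected result.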
In the “suggested” state tracking mode
unitize will run
tests in an environment that has the same parent as
.GlobalEnv:

             .GlobalEnv
                        \
                         +--> package:x --> ... --> Base
                        /
TestEnv --> UnitizerEnv
This means that objects in the global environment / workspace will not affect your tests.
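A rough base R illustration of this isolation follows. It only mimics the parent relationship; it is not how unitizer sets up its environments, and the object name is made up.

```r
# Put an object in the global environment (hypothetical name).
assign("unitizer_demo_obj", 42, envir = globalenv())

# An environment that shares its parent with .GlobalEnv.
test_env <- new.env(parent = parent.env(globalenv()))

seen_from_test_env <- exists("unitizer_demo_obj", envir = test_env)    # FALSE
seen_from_global   <- exists("unitizer_demo_obj", envir = globalenv()) # TRUE
still_sees_base    <- exists("sum", envir = test_env)                  # TRUE

rm("unitizer_demo_obj", envir = globalenv())  # clean up
```

Attached packages remain reachable from test_env, but the workspace does not.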
Unfortunately implementing this structure is not trivial because we
need to ensure
UnitizerEnv stays pointed at the parent of
.GlobalEnv even as tests modify the search path.
To achieve this unitizer traces the functions that modify the search
path, such as base::library and
base::detach, when state tracking is
enabled and only when
unitizer is running. Any time any of those functions is called,
unitizer updates the parent of
UnitizerEnv to
be the second environment on the search path (i.e. the parent of
.GlobalEnv). So, for example, if a test calls
library(z), the new search path would look like so:
             .GlobalEnv
                        \
                         +--> package:z --> package:x --> ... --> Base
                        /
TestEnv --> UnitizerEnv
Clearly overriding fundamental functions such as
detach is not
good form. We recognize this, and try to do the overriding in as
lightweight a manner as possible by tracing them only to record the
search path while
unitizer is evaluating. This should be
completely transparent to the user. The untracing is registered to run
when unitize exits, so the functions should get
untraced even if
unitize fails.
Aside from the issues raised above, this method is not completely
robust. Any tests that turn tracing off, or that trace or
untrace any of the traced functions, will interfere with
unitizer. If you must do any of the above
you should consider specifying a parent environment for your tests
through the state parameter to
unitize.
Some functions that expect to find
.GlobalEnv on the
search path may not work as expected. For example,
setClass uses
topenv by default to find an environment to define
classes in. When
setClass is called at the top level, this
normally results in the class being defined in
.GlobalEnv, but if
.GlobalEnv is not available
setClass
will attempt to define the class in the first environment on the search
path, which will likely be a locked namespace. You can work around this
by specifying an environment in calls to
setClass.
Sometimes it is convenient to use the namespace of a package as the
parent environment. This allows you to write tests that use internal
package functions without having to resort to
:::. You can
set the parent evaluation environment with the
state parameter to unitize.
If you do use this feature keep in mind that your tests will be directly exposed to the global environment as well since R looks through the search path starting at the global environment after looking in the package namespace and imports (your package code is always exposed to this).
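To illustrate both points with base R, the sketch below uses the stats namespace as a stand-in for "your package". The lookup of an unexported function is done dynamically rather than by naming a specific internal, and the object names are hypothetical.

```r
ns <- asNamespace("stats")
test_env <- new.env(parent = ns)

# Names defined in the namespace but not exported from it.
internal <- setdiff(ls(ns, all.names = TRUE), getNamespaceExports(ns))

# Keep one that is a function and cannot be found from the search path.
is_hidden_fun <- function(nm) {
  is.function(get(nm, envir = ns)) && !exists(nm, envir = globalenv())
}
hidden <- Filter(is_hidden_fun, internal)[1]

found_from_test_env <- exists(hidden, envir = test_env)    # TRUE
found_from_global   <- exists(hidden, envir = globalenv()) # FALSE

# The global environment is still reachable from the namespace chain.
assign("demo_global_obj", 1, envir = globalenv())
global_visible <- exists("demo_global_obj", envir = test_env)  # TRUE
rm("demo_global_obj", envir = globalenv())
```

The last check shows why tests run this way remain exposed to workspace objects.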
For the most part R is a copy-on-modify language, which allows us to
employ the trickery described above. There are however “reference”
objects that are not copied when they are modified. Notable examples
include environments and reference classes.
Since our trickery requires us to keep copies of each object in
different environments as they are modified, it does not work with
reference objects since they are not automatically duplicated.
The main consequence of this is that when you are reviewing a test that involves a reference object, the value of that reference object during review will be the value after the last reference modification, which may have been made after the test you are reviewing. The tests will still work as they should, passing if you did not introduce regressions, and failing otherwise. However, if you review a failed test you may have a hard time making sense of what happened since the objects you review may not have the values they had when the test was actually run.
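The difference between the two kinds of objects can be seen with a pair of chained environments standing in for two consecutive tests. This is a base R sketch with hypothetical names, not unitizer code.

```r
env1 <- new.env()                # first test
env2 <- new.env(parent = env1)   # second test

# A regular vector: the "modification" in env2 creates a masking copy.
evalq(v <- c(1, 2, 3), env1)
evalq(v[1] <- 99, env2)
v_at_test1 <- get("v", envir = env1)[1]   # 1: the original is preserved

# An environment: both tests end up looking at the same object.
evalq(counter <- new.env(), env1)
evalq(counter$n <- 1, env1)
evalq(counter$n <- 2, env2)               # modifies the shared object
n_at_test1 <- get("counter", envir = env1)$n   # 2, not 1
```

Reviewing the first test would therefore show n as 2 even though it was 1 when that test ran.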
When we review
unitizer tests, it is possible to end up
in a situation where we wish to update our store by keeping a mix of the
new tests as well as some of the old ones. This leads to some
complications because in order to faithfully reproduce the environments
associated with both the reference and the new tests we would
potentially have to store the entire set of environments produced by the
test script for both the new and reference tests. Even worse, if we
run unitizer again, we run the risk of having to store
yet another set of environments (the old reference environments, what
were new environments but became reference ones on this additional run,
and the new environments created by this third run). The problem
continues to grow as each incremental run of the
unitizer script potentially creates the need to store yet
another set of environments.
As a work-around to this problem
unitizer only keeps the
environment associated with the actual reference tests you chose to keep
(e.g. when you type
N at the
unitizer prompt
when reviewing a failed test).
unitizer then grafts that
test and its environment to the environment chain from the newly
evaluated tests (note that for all tests that pass, we keep the new
version of the tests, not the reference one). This means that in future
unitizer runs where you examine this same reference test,
the other “reference” objects available for inspection may not be from
the same evaluation that produced the test. The
ls command
will highlight which objects are from the same evaluation vs which ones
are not (see the discussion on
ls).
This is not an ideal outcome, but the compromise was necessary to
avoid the possibility of ever-increasing
unitizer stores.
For more details see
One other way tests can change behavior unexpectedly is if the
packages / objects attached to the search path change. A simple example
is a test script that relies on package “X”, and the user attached that
package at some point during interactive use, but forgot to add the
library call to the test script itself. During
testing, the scripts will work fine, but at some future date if the test
scripts are run again they are likely to fail due to the dependency on
the package that is not explicitly loaded in the test scripts.
In the “suggested” state tracking mode
unitizer runs on
a “trimmed” search path that contains only the packages loaded in a
freshly loaded R session (see
?unitizerState). You will need to explicitly load packages
that your tests depend on in your test file (e.g. by using
library).
unitize will restore the search
path to its original state once you complete review.
unitizer also relies on tracing the functions that modify the search
path to
implement this feature, so the caveats described above apply equally here.
unitizer does not modify the search path
itself other than by using
those functions.
When search path tracking is enabled,
unitizer tracks
the versions of the packages on the search path. If tests fail and
package versions on the search path have changed since the reference
test was stored, you will be alerted.
When unitizer manipulates the search path it restores
the original one by re-attaching
any previously detached objects or packages. This generally works fine,
but detaching and re-attaching packages is not and cannot be the same as
loading a package or attaching an environment for the first time. For
example, S3 method registration is not undone when detaching a package,
or even unloading its namespace. See discussion in
One known problem is the use of tools
which place a pretend package environment on the search path.
Such packages cannot be re-loaded with
library so the
re-attach process will fail (see #252).
Another issue is attached environments that contain references to
themselves, as the
tools:rstudio environment attached by
RStudio does. It contains functions that have for
environment the
tools:rstudio environment. The problem is
that once that environment is detached from the search path, those
functions no longer have access to the search path. Re-attaching the
environment to the search path does not solve the problem because
attach attaches a copy of the environment, not the
environment itself. This new environment will contain the same objects
as the original environment, but all the functions therein will have for
environment the original detached environment, not the copy that is
attached to the search path.
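The copy semantics of attach can be checked directly in base R. In this sketch, the environment, the function f, and the search path entry name demo:self_ref are all hypothetical.

```r
# An environment containing a function whose enclosure is that same environment.
self_env <- new.env()
evalq(f <- function() NULL, self_env)

# Attach it under a made-up name and retrieve what is actually on the search path.
attach(self_env, name = "demo:self_ref")
attached <- as.environment("demo:self_ref")

# The search path holds a copy, not the original environment...
is_same_env <- identical(attached, self_env)                      # FALSE
# ...but the copied function still has the original as its enclosure.
fun_env_is_original <- identical(environment(attached$f), self_env)  # TRUE

detach("demo:self_ref", character.only = TRUE)  # clean up
```

This is why functions in such an environment lose access to the search path once the original is detached and a copy is re-attached.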
For the specific
tools:rstudio problem we work around
the issue by keeping it on the search path even when search path tracking is
enabled (you can override this by changing
search.path.keep, or, if you have environments on your
search path with similar properties, add their names to
search.path.keep). Other options include re-attaching with
parent.env<- instead of
attach, but messing
with the search path in that way seems to be exactly what R core warns
about in ?parent.env:
The replacement function parent.env<- is extremely dangerous as it can be used to destructively change environments in ways that violate assumptions made by the internal C code. It may be removed in the near future.
unitizer can track and reset global options. Because
many packages set options when their namespaces are loaded,
implementation of this feature must be coordinated with careful
management of loaded namespaces. For example, we can reasonably easily
set options to be what you would expect in a freshly loaded vanilla R
session, but any namespaces that remain loaded would then need to be
unloaded as well, as otherwise they would be in a
compromised state with their options wiped out.
unitizer can manage search paths and namespaces, but
unfortunately some package namespaces cannot be unloaded, so options
management can be problematic when such packages are involved (one
such package is data.table). Because of this, options management
is not enabled in the “suggested” state management mode.
Note that, no matter what, tests are always run with certain options set. See
?unitizer.opts for more details.