To install the latest version of this package directly from GitHub, uncomment and run:

# library(devtools)
# use "quiet = FALSE" if you want to see the outputs of this command
# devtools::install_github("juancolonna/FastStepGraph", quiet = TRUE, force = TRUE)

# Then, load it:
library(FastStepGraph)

Simulate Gaussian Data with an Autoregressive (AR) Model:

set.seed(1234567)
phi <- 0.4 
p <- 50  # number of variables (dimension)
n <- 30 # number of samples

# Generate Data from a Gaussian distribution 
data <- FastStepGraph::SigmaAR(n, p, phi)
X <- scale(data$X) # standardizing variables
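
As a quick sanity check (plain base R, not part of the package), you can confirm the dimensions of X and that the columns are standardized:

dim(X)                          # expected: 30 x 50 (n samples by p variables)
round(colMeans(X)[1:5], 3)      # column means should be (numerically) zero
round(apply(X, 2, sd)[1:5], 3)  # column standard deviations should be one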

To fit the Omega matrix with the FastStepGraph() function, you need to know the optimal values of \(\mathbf{\alpha_f}\) and \(\mathbf{\alpha_b}\). If you don’t know these values, you can find them by cross-validation as follows:

t0 <- Sys.time() # INITIAL TIME
res <- FastStepGraph::cv.FastStepGraph(X, data_shuffle = TRUE)
difftime(Sys.time(), t0, units = "secs")
#> Time difference of 3.148527 secs
# print(res$alpha_f_opt)
# print(res$alpha_b_opt)

If your input variables are not standardized (i.e., they do not have zero mean and unit variance), we recommend setting data_scale = TRUE. Subsequently, compute the Omega matrix by calling the FastStepGraph() function with the optimal parameters \(\mathbf{\alpha_f}\) and \(\mathbf{\alpha_b}\) found by cross-validation to fit the final model:

t0 <- Sys.time() # INITIAL TIME
G <- FastStepGraph::FastStepGraph(X, alpha_f = res$alpha_f_opt, alpha_b = res$alpha_b_opt)
difftime(Sys.time(), t0, units = "secs")
#> Time difference of 0.00232029 secs
# print(G$Omega)
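
A common follow-up (not part of the package API) is to read the estimated graph off the support of G$Omega; the tolerance used below is only an illustrative choice:

adj <- abs(G$Omega) > 1e-8  # nonzero precision-matrix entries indicate edges (illustrative tolerance)
diag(adj) <- FALSE          # ignore the diagonal
sum(adj) / 2                # number of estimated edges in the undirected graph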

You can also perform both steps, the cross-validation to obtain the optimal parameters and the fitting of the final model, in a single call by setting return_model = TRUE, as follows:

t0 <- Sys.time() # INITIAL TIME
res <- FastStepGraph::cv.FastStepGraph(X, return_model=TRUE, data_shuffle = TRUE)
difftime(Sys.time(), t0, units = "secs")
#> Time difference of 3.015367 secs
# print(res$alpha_f_opt)
# print(res$alpha_b_opt)
# print(res$Omega)

The arguments n_folds = 5, alpha_f_min = 0.1, alpha_f_max = 0.9, n_alpha = 32 (the size of the grid search) and nei.max = 5 have default values and can be omitted. Note that cv.FastStepGraph(X) does not perform an exhaustive grid search over \(\mathbf{\alpha_f}\) and \(\mathbf{\alpha_b}\); it is a heuristic that tests only a few \(\mathbf{\alpha_b}\) values, starting from the rule \(\mathbf{\alpha_b}=\frac{\mathbf{\alpha_f}}{2}\). It is recommended to shuffle the rows of X before running cross-validation; this is the default behavior (data_shuffle = TRUE), and you can disable it by setting data_shuffle = FALSE.
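
For reference, here is an illustrative call with these arguments written out explicitly (the values shown are simply the defaults listed above):

res <- FastStepGraph::cv.FastStepGraph(X,
                                       n_folds = 5,          # number of cross-validation folds
                                       alpha_f_min = 0.1,    # lower bound of the alpha_f grid
                                       alpha_f_max = 0.9,    # upper bound of the alpha_f grid
                                       n_alpha = 32,         # size of the alpha_f grid
                                       nei.max = 5,          # maximum number of neighbors
                                       data_shuffle = TRUE)  # shuffle the rows of X before splitting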

To reduce running time, you can run cv.FastStepGraph(X, parallel = TRUE) in parallel. Before doing so, you need to install and register a parallel backend. On a Linux system, the doParallel package must be installed with install.packages("doParallel"); it also requires the foreach, iterators and parallel packages, so make sure these dependencies are satisfied. Then, call the method with the parameter parallel = TRUE, as follows:

t0 <- Sys.time() # INITIAL TIME
# use 'n_cores = NULL' to use the maximum number of cores on your machine minus one
res <- FastStepGraph::cv.FastStepGraph(X, return_model=TRUE, parallel = TRUE, n_cores = 2)
difftime(Sys.time(), t0, units = "secs")
# print(res$alpha_f_opt)
# print(res$alpha_b_opt)
# print(res$Omega)

Remember that you can set the n_cores parameter up to the number of cores available on your machine, but be careful, as this may overload your system. Setting it to 1 disables parallel processing, and setting it to a number greater than the number of available cores does not improve efficiency.
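
To choose a sensible value for n_cores, you can check how many cores your machine reports; detectCores() comes from base R's parallel package (the "minus one" is the conservative choice mentioned above):

parallel::detectCores()                   # number of logical cores reported by the system
# n_cores <- parallel::detectCores() - 1  # e.g., leave one core free for the operating system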


  1. Institute of Computing, Federal University of Amazonas, Brazil.

  2. Mathematics Department, National University of Río Cuarto, Argentina.