mvs
Methods for high-dimensional multi-view learning based on the
multi-view stacking (MVS) framework. Data have a multi-view structure
when features comprise different ‘views’ of the same observations. For
example, the different views may comprise omics, imaging or electronic
health records. Package mvs
provides functions to fit
stacked penalized logistic regression (StaPLR) models, which are a
special case of multi-view stacking (MVS). Additionally,
mvs
generalizes the StaPLR model to settings with a
Gaussian or Poisson outcome distribution, and to hierarchical multi-view
structures with more than two levels. For more information about the
StaPLR and MVS methods, see Van Loon, Fokkema, Szabo, & De Rooij
(2020) and Van Loon et al. (2022).
The current stable release can be installed directly from CRAN:
::install.packages("mvs") utils
The current development version can be installed from GitLab using
package devtools
:
::install_gitlab("wsvanloon/mvs@develop") devtools
mvs
The two main functions are StaPLR()
(alias
staplr
), which fits penalized and stacked penalized
regression models models with up to two levels, and MVS()
(alias mvs
), which fits multi-view stacking models with
>= 2 levels. Objects returned by either function have associated
coef
and predict
methods.
StaPLR
library("mvs")
Generate 1000 observations with four two-feature views with varying within- and between-view correlation:
set.seed(012)
<- 1000
n <- seq(0.1, 0.7, 0.1)
cors <- matrix(NA, nrow=n, ncol=length(cors)+1)
X 1] <- rnorm(n)
X[ , for (i in 1:length(cors)) {
+1] <- X[ , 1]*cors[i] + rnorm(n, 0, sqrt(1-cors[i]^2))
X[ , i
}<- c(1, 0, 0, 0, 0, 0, 0, 0)
beta <- X %*% beta
eta <- exp(eta)/(1+exp(eta))
p <- rbinom(n, 1, p) y
Fit StaPLR:
<- rep(1:(ncol(X)/2), each=2)
view_index set.seed(012)
<- StaPLR(X, y, view_index) fit
Extract coefficients at the view level:
<- coef(fit)
coefs $meta coefs
## 5 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -2.345398
## V1 4.693861
## V2 .
## V3 .
## V4 .
We see that the only the first view has been selected. The data was
generated so that only the first feature (from the first view) was a
true predictor, but it was also substantially correlated with features
from other views (see cor(X)
), most strongly with the
features from the fourth view.
Extract coefficients at the base level:
$base coefs
## [[1]]
## 3 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -0.05351035
## V1 0.86273113
## V2 0.09756006
##
## [[2]]
## 3 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -6.402186e-02
## V1 1.114585e-38
## V2 1.156060e-38
##
## [[3]]
## 3 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -0.06875322
## V1 0.26176566
## V2 0.35602028
##
## [[4]]
## 3 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -0.03101978
## V1 0.27605205
## V2 0.39234018
We see that the first feature has the strongest effect on the predicted outcome, with a base-level regression coefficient of 0.86. The features in views two, three and four all have zero effect, since the meta-level coefficients for these views are zero.
Compute predictions:
<- matrix(rnorm(16), nrow=2)
new_X predict(fit, new_X)
## lambda.min
## [1,] 0.8698197
## [2,] 0.1819153
By default, the predictions are made using the values of the penalty parameters which minimize the cross-validation error (lambda.min).
As StaPLR was developed in the context of binary classification
problems, the default outcome distribution is
family = "binomial"
. Other outcome distributions (e.g.,
Gaussian, Poisson) can be modeled by specifying, e.g.,
family = "gaussian"
or
family = "poisson"
.
A generalization of stacked penalized (logistic) regression to
three or more hierarchical levels is implemented in function
MVS
(alias mvs
).
Van Loon, W., De Vos, F., Fokkema, M., Szabo, B., Koini, M., Schmidt, R., & De Rooij, M. (2022). Analyzing hierarchical multi-view MRI data with StaPLR: An application to Alzheimer’s disease classification. Frontiers in Neuroscience, 16, 830630. https://doi.org/10.3389/fnins.2022.830630
Van Loon, W., Fokkema, M., Szabo, B., & De Rooij, M. (2020). Stacked penalized logistic regression for selecting views in multi-view learning. Information Fusion, 61, 113–123. https://doi.org/10.1016/j.inffus.2020.03.007