Main function example: model selection
Here we generate data from a factor model with 3 factors. We have 50 samples of 100-dimensional data. The true model has size 5: the first 5 covariates have non-zero coefficients and the remaining coefficients are zero. The factors, loadings, and errors are all generated from a standard normal distribution.
library(FarmSelect)
set.seed(100)
P = 100 #dimension
N = 50 #samples
K = 3 #nfactors
Q = 5 #model size
Lambda = matrix(rnorm(P*K, 0,1), P,K) #factor loadings
F = matrix(rnorm(N*K, 0,1), N,K) #latent factors
U = matrix(rnorm(P*N, 0,1), P,N) #idiosyncratic errors
X = Lambda%*%t(F)+U
X = t(X) #N by P design matrix
beta_1 = 3+3*runif(Q) #non-zero coefficients
beta = c(beta_1, rep(0,P-Q))
eps = rnorm(N)
Y = X%*%beta+eps #responses
output = farm.select(Y,X)
## Call:
## farm.select(Y = Y, X = X)
##
## Factor Adjusted Robust Model Selection
## loss function used: mcp
##
## p = 100, n = 50
## factors found: 3
## size of model selected:
## 5
names(output)
## [1] "beta.chosen" "coef.chosen" "nfactors" "X.res" "Y.res"
output$beta.chosen
## V1 V2 V3 V4 V5
## 1 2 3 4 5
output$coef.chosen
## V1 V2 V3 V4 V5
## 3.706534 3.498892 4.566415 3.725966 3.297019
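Since the data were simulated, a quick sanity check (not part of the package output) is to compare the chosen coefficients against the true values of beta used to generate Y.
cbind(true = beta[output$beta.chosen], estimated = output$coef.chosen) #side-by-side comparison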
The values X.res and Y.res are the covariates and the responses after adjusting for the latent factors. They are computed as \(Y.\text{res} = (I_n-P)Y\) and \(X.\text{res} = (I_n-P)X\), where \(P = \hat{F}(\hat{F}^T\hat{F})^{-1}\hat{F}^T\) is the projection onto the column space of the estimated factors \(\hat{F}\).
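To illustrate the adjustment, the sketch below reproduces it by hand, assuming the factors are estimated by the leading principal components of X; the estimator used internally by farm.select may differ, so this is only an approximation.
K.hat = output$nfactors
F.hat = svd(X)$u[, 1:K.hat] #estimated factors (N by K.hat)
P.hat = F.hat%*%solve(t(F.hat)%*%F.hat)%*%t(F.hat) #projection onto the factor space
Y.adj = (diag(N)-P.hat)%*%Y #analogue of Y.res
X.adj = (diag(N)-P.hat)%*%X #analogue of X.res
Note that the projection is unchanged by rescaling the columns of F.hat, so the particular normalization of the principal components does not affect the adjusted data.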
Now we use a different loss function for the model selection step.
output = farm.select(Y,X, loss = "lasso" )
## Call:
## farm.select(Y = Y, X = X, loss = "lasso")
##
## Factor Adjusted Robust Model Selection
## loss function used: lasso
##
## p = 100, n = 50
## factors found: 3
## size of model selected:
## 5
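As a quick check (not shown above), one can compare the covariates selected under the two loss functions by refitting with each loss and inspecting the difference in the chosen indices.
out.mcp = farm.select(Y,X) #default mcp loss
out.lasso = farm.select(Y,X, loss = "lasso")
setdiff(out.mcp$beta.chosen, out.lasso$beta.chosen) #covariates chosen by mcp but not by lasso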
We may also use robust estimates of all the parameters. This may take longer, depending on the dimensions of the problem.
output = farm.select(Y,X, robust = TRUE )
## Call:
## farm.select(Y = Y, X = X, robust = TRUE)
##
## Factor Adjusted Robust Model Selection
## loss function used: mcp
##
## p = 100, n = 50
## factors found: 3
## size of model selected:
## 5