--- title: "Simple and Bipartite Stochastic Block Models" subtitle: "An illustration on antagonistic tree/fungus network" author: "team großBM" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 4 bibliography: references.bib link-citations: yes vignette: > %\VignetteIndexEntry{Simple and Bipartite Stochastic Block Models} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Preliminaries This vignette illustrates the use of the `estimateSBM` function and the methods accompanying the R6 classes `SimpleSBMfit` and `BipartiteSBMfit`. ### Requirements The packages required for the analysis are **sbm** plus some others for data manipulation and representation: ```{r setup, message=FALSE, warning=FALSE} library(sbm) library(ggplot2) library(knitr) theme_set(theme_bw()) ``` ### Data set: antagonistic tree/fungus interaction network We consider the fungus-tree interaction network studied by @tree_fungus_network, available with the package **sbm**: ```{r import dataset} data("fungusTreeNetwork") str(fungusTreeNetwork, max.level = 1) ``` This data set provides information about $154$ fungi sampled on $51$ tree species. It is a list with the following entries: - `fungi_list`: list of the fungus species names - `tree_list` : list of the tree species names - `fungus_tree` : binary fungus-tree interactions - `tree_tree` : weighted tree-tree interactions (number of common fungal species two tree species host) - `covar_tree` : covariates associated to pairs of trees (namely genetic, taxonomic and geographic distances) We first consider the tree-tree interactions resulting into a Simple Network. Then we consider the bipartite network between trees and fungi. ### Mathematical Background See @blockmodels for details. ## Analysis of the tree/tree data ### Tree-tree binary interaction networks We first consider the binary network where an edge is drawn between two trees when they do share a least one common fungi: ```{r tree_tree_binary network} tree_tree_binary <- 1 * (fungusTreeNetwork$tree_tree != 0) ``` The simple function `plotMyMatrix` can be use to represent simple or bipartite SBM: ```{r tree_tree_binary network plot data} plotMyMatrix(tree_tree_binary, dimLabels =c('tree')) ``` We look for some latent structure of the network by adjusting a simple SBM with the function `estimateSimpleSBM`. ```{r simpleSBM} mySimpleSBM <- tree_tree_binary %>% estimateSimpleSBM("bernoulli", dimLabels = 'tree', estimOptions = list(verbosity = 1, plot = TRUE)) ``` Once fitted, the user can manipulate the fitted model by accessing the various fields and methods enjoyed by the class `simpleSBMfit`. Most important fields and methods are recalled to the user via the `show` method: ```{r simpleSBMfit} class(mySimpleSBM) mySimpleSBM ``` For instance, ```{r impleSBMfit fields} mySimpleSBM$nbBlocks mySimpleSBM$nbNodes mySimpleSBM$nbCovariates ``` The plot method is available as a S3 or R6 method. The default represents the network data reordered according to the memberships estimated in the SBM. ```{r simpleSBMfit plot1} plot(mySimpleSBM, type = "data", dimLabels = c('tree')) ``` One can also plot the expected network which, in case of the Bernoulli model, corresponds to the probability of connection between any pair of nodes in the network. ```{r simpleSBMfit plot2} plot(mySimpleSBM, type = "expected") ``` ```{r simpleSBMfit plotmeso} plot(mySimpleSBM, type = "meso") ``` ```{r simpleSBMfit coef} coef(mySimpleSBM, 'block') coef(mySimpleSBM, 'connectivity') ``` #### About model selection and choice of the number of blocks During the estimation, a certain range of models are explored corresponding to different number of blocks. By default, the best model in terms of Integrated Classification Likelihood is sent back. In fact, all the model are stored internally. The user can have a quick glance at them via the `$storedModels` field: ```{r simpleSBM storedModel} mySimpleSBM$storedModels %>% kable() ``` We can then see what models are competitive in terms of model selection by checking the ICL: ```{r simpleSBM ICL} mySimpleSBM$storedModels %>% ggplot() + aes(x = nbBlocks, y = ICL) + geom_line() + geom_point(alpha = 0.5) ``` The 4-block model could have been a good choice too, in place of the 5-block model. The user can update the current `simpleSBMfit` thanks to the the `setModel` method: ```{r simpleSBMfit changeModel} mySimpleSBM$setModel(4) mySimpleSBM$nbBlocks mySimpleSBM$plot(type = 'expected') mySimpleSBM$setModel(5) ``` ### Analysis of the weighted interaction network Instead of considering the binary network tree-tree we may consider the weighted network where the link between two trees is the number of fungi they share. We plot the matrix with function `plotMyMatrix`: ```{r tree_tree network plot data} tree_tree <- fungusTreeNetwork$tree_tree plotMyMatrix(tree_tree, dimLabels = c('tree')) ``` Here again, we look for some latent structure of the network by adjusting a simple SBM with the function `estimateSimpleSBM`, considering a Poisson distribution on the edges. ```{r simpleSBM Poisson} mySimpleSBMPoisson <- tree_tree %>% estimateSimpleSBM("poisson", dimLabels = 'tree', estimOptions = list(verbosity = 0, plot = FALSE)) ``` Once fitted, the user can manipulate the fitted model by accessing the various fields and methods enjoyed by the class `simpleSBMfit`. Most important fields and methods are recalled to the user via the `show` method: ```{r simpleSBMfitPoisson} class(mySimpleSBMPoisson) mySimpleSBMPoisson ``` For instance, ```{r impleSBMfitPoison fields} mySimpleSBMPoisson$nbBlocks mySimpleSBMPoisson$nbNodes mySimpleSBMPoisson$nbCovariates ``` We now plot the matrix reordered according to the memberships estimated in the SBM: ```{r simpleSBMfitPoisson plot1} plot(mySimpleSBMPoisson, type = "data", dimLabels =c('tree')) ``` One can also plot the expected network which, in case of the Poisson model, corresponds to the expectation of connection between any pair of nodes in the network. ```{r simpleSBMfitPoisson plot2} plot(mySimpleSBMPoisson, type = "expected", dimLabels = c('tree')) ``` ```{r simpleSBMfitPoisson plotmeso} plot(mySimpleSBMPoisson, type = "meso") ``` The same manipulations can be made on the models as before. ```{r simpleSBMfitPoisson coef} coef(mySimpleSBMPoisson, 'block') coef(mySimpleSBMPoisson, 'connectivity') ``` ### Introduction of covariates We have on each pair of trees 3 covariates, namely the genetic distance, the taxonomic distance and the geographic distance. Each covariate has to be introduced as a matrix: $X^k_{ij}$ corresponds to the value of the $k$-th covariate describing the couple $(i,j)$. ```{r covar SBM,echo=TRUE,eval= TRUE} mySimpleSBMCov<- tree_tree %>% estimateSimpleSBM( model = 'poisson', directed = FALSE, dimLabels = 'tree', covariates = fungusTreeNetwork$covar_tree, estimOptions = list(verbosity = 0, plot = FALSE, nbCores = 2) ) ``` - We select the best number of clusters (with respect to the ICL criteria) ```{r select SBM covar, echo=TRUE, eval = TRUE} mySimpleSBMCov$nbBlocks ``` - We can now extract the parameters of interest, namely ($\lambda$, $\pi$) and the clustering of the nodes. ```{r extract param SBM poisson covar, echo=TRUE, eval = TRUE} mySimpleSBMCov$connectParam mySimpleSBMCov$blockProp mySimpleSBMCov$memberships mySimpleSBMCov$covarParam ``` ```{r simpleSBMfitPoisson covar coef} coef(mySimpleSBMCov, 'covariates') ``` S3 methods are also available for fit and prediction (results hidden here) ```{r simpleSBMfitPoisson covar fitted, results='hide'} #fitted(mySimpleSBMCov) #predict(mySimpleSBMCov) #predict(mySimpleSBMCov, fungusTreeNetwork$covar_tree) ``` ## Analysis of the tree/fungi data We now analyze the bipartite tree/fungi interactions. The incidence matrix can be plotted with the function \cote{plotMyMatrix} ```{r plot incidence} plotMyMatrix(fungusTreeNetwork$fungus_tree, dimLabels = c(row = 'fungis', col= 'tree')) ``` ```{r tree_fungi_bipartite network} myBipartiteSBM <- fungusTreeNetwork$fungus_tree %>% estimateBipartiteSBM(model = 'bernoulli', dimLabels = c('fungis', 'tree'),estimOptions = list(verbosity = 0, plot = FALSE)) ``` ```{r bipartite.sbm fields} myBipartiteSBM$nbNodes myBipartiteSBM$nbBlocks myBipartiteSBM$connectParam coef(myBipartiteSBM, 'block') coef(myBipartiteSBM, 'connectivity') ``` We can now plot the reorganized matrix. ```{r plot bipartite} plot(myBipartiteSBM, dimLabels = c('fungis', 'tree')) ``` ```{r plot bipartite meso} plot(myBipartiteSBM, type = 'meso', dimLabels = list(row = 'fungis', col= 'tree'), plotOptions = list(edge.width = 1, vertex.size = c(1,2))) ``` ## References