This vignette presents the **M3JF** package, which implements a
framework named multi-modality matrix joint factorization (M3JF) for
integrative analysis of multiple modality data in
**R**. The package provides an implementation of the
proposed method, which is designed to handle high-dimensional
multi-modality data in bioinformatics. This is achieved by jointly
factorizing the data matrices into a shared sub-matrix and several
modality-specific sub-matrices. The group sparse constraint imposed on
the shared sub-matrix acts on samples in the same group and allows each
modality to exploit only a subset of the dimensions of the global latent
space.
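
A common way to impose group sparsity is a mixed L2,1 (group lasso) penalty, whose proximal step shrinks each group of coefficients as a whole and sets weak groups exactly to zero. The sketch below only illustrates that idea, under the assumption that each group is one column (one latent dimension) of a matrix. `group_soft_threshold` is a hypothetical helper, not an M3JF function, and the package's actual penalty, grouping and solver may differ.

```r
# Hypothetical helper, NOT an M3JF function: proximal operator of the
# column-wise L2,1 norm  t * sum_j ||V[, j]||_2  (group lasso with one
# group per column, i.e. per latent dimension).
group_soft_threshold <- function(V, t) {
  norms <- sqrt(colSums(V^2))                          # L2 norm of each column
  shrink <- pmax(0, 1 - t / pmax(norms, .Machine$double.eps))
  sweep(V, 2, shrink, `*`)                             # column j scaled by shrink[j]
}

V <- cbind(c(3, 4), c(0.3, 0.4))  # column norms are 5 and 0.5
group_soft_threshold(V, t = 1)    # column 1 becomes c(2.4, 3.2); column 2 becomes 0
```

Columns whose norm falls below the threshold vanish entirely, which is the sense in which a modality can end up using only a subset of the latent dimensions.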

The latest stable version of the package can be installed from any CRAN repository mirror:

```
#Install
install.packages('M3JF')
#Load
library(M3JF)
```

The latest development version is available from https://cran.r-project.org/package=M3JF and may be downloaded from there and installed manually:

`install.packages('/path/to/file/M3JF.tar.gz',repos=NULL,type="source")`

**Support**: Users interested in this package are
encouraged to email Xiaoyao Yin (xyyin@xmail.ncba.ac.cn) with enquiries, bug reports,
feature requests, suggestions or M3JF-related discussions.

The rest of this vignette walks through an example of how to use the package.

We generate simulated data with the R package *InterSIM*,
which generates three inter-related data sets with realistic inter- and
intra-relationships, based on the DNA methylation, mRNA expression and
protein expression data from the TCGA ovarian cancer study. Each data
modality consists of 500 samples, which are assigned to 4 groups of
100, 150, 135 and 115 samples. The data can be generated by
running:

```
library(InterSIM)
sim.data <- InterSIM(n.sample=500, cluster.sample.prop = c(0.20,0.30,0.27,0.23),
                     delta.methyl=5, delta.expr=5, delta.protein=5, p.DMP=0.2, p.DEG=NULL,
                     p.DEP=NULL, sigma.methyl=NULL, sigma.expr=NULL,
                     sigma.protein=NULL, cor.methyl.expr=NULL,
                     cor.expr.protein=NULL, do.plot=FALSE, sample.cluster=TRUE,
                     feature.cluster=TRUE)
sim.methyl <- sim.data$dat.methyl
sim.expr <- sim.data$dat.expr
sim.protein <- sim.data$dat.protein
data_list <- list(sim.methyl, sim.expr, sim.protein)
```
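
Because the factorization shares one embedding across modalities, every matrix in `data_list` should describe the same samples. The hypothetical helper below (not part of M3JF or InterSIM) assumes samples are stored as rows; it checks that the modalities agree in sample count and, when present, in row names, and returns a small summary of the dimensions.

```r
# Hypothetical helper, not an M3JF/InterSIM function.
# Assumption: each element of data_list is a samples x features matrix.
check_modalities <- function(data_list) {
  n <- vapply(data_list, nrow, integer(1))
  if (any(n != n[1])) stop("modalities have different numbers of samples")
  rn <- lapply(data_list, rownames)
  # If row names exist, they must be identical across modalities
  if (!is.null(rn[[1]]) && !all(vapply(rn, identical, logical(1), rn[[1]])))
    stop("sample (row) names differ between modalities")
  data.frame(modality = seq_along(data_list), samples = n,
             features = vapply(data_list, ncol, integer(1)))
}

# Usage with the simulated data:
# check_modalities(data_list)
```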

**Label assignment**: According to the data generation
process, the ground-truth labels of the simulated samples are
obtained as:

`truelabel <- sim.data$clustering.assignment$cluster.id`

These labels are used later to evaluate the clustering performance.
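
The group sizes quoted above follow from `n.sample` and `cluster.sample.prop`. A quick check is sketched below, assuming the cluster ids in `truelabel` are the integers 1 to 4 (an assumption about InterSIM's output); `compare_group_sizes` is a hypothetical helper.

```r
# Expected group sizes implied by n.sample = 500 and cluster.sample.prop
expected_sizes <- round(500 * c(0.20, 0.30, 0.27, 0.23))  # 100 150 135 115

# Hypothetical helper (not from M3JF or InterSIM); assumes labels are 1..K
compare_group_sizes <- function(labels, expected) {
  observed <- as.vector(table(factor(labels, levels = seq_along(expected))))
  data.frame(group = seq_along(expected), expected = expected,
             observed = observed)
}

# Usage with the simulated data:
# compare_group_sizes(truelabel, expected_sizes)
```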

Now we can cluster the samples with the proposed method and evaluate
its performance by computing the normalized mutual information (NMI)
between `truelabel` and the predicted labels with the function
*cal_NMI*.
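
To show what the score measures, here is a base-R sketch of NMI using the normalization I(X;Y) / sqrt(H(X) H(Y)). `nmi_sketch` is a hypothetical helper; this vignette does not state which normalization *cal_NMI* uses, so the two need not agree exactly.

```r
# Hypothetical NMI sketch: mutual information divided by sqrt(H(x) * H(y))
nmi_sketch <- function(x, y) {
  n <- length(x)
  stopifnot(n == length(y), n > 0)
  joint <- table(x, y) / n                  # joint distribution p(x, y)
  px <- rowSums(joint); py <- colSums(joint)
  nz <- joint > 0                           # skip empty cells (0 * log 0 = 0)
  mi <- sum(joint[nz] * log(joint[nz] / outer(px, py)[nz]))
  hx <- -sum(px[px > 0] * log(px[px > 0]))  # entropy of x
  hy <- -sum(py[py > 0] * log(py[py > 0]))  # entropy of y
  if (hx == 0 || hy == 0) return(if (hx == hy) 1 else 0)
  mi / sqrt(hx * hy)
}

# Relabelled but identical partitions give NMI = 1
nmi_sketch(c(1, 1, 2, 2, 3, 3), c(3, 3, 1, 1, 2, 2))
```

NMI is invariant to how the clusters are numbered, which is why the predicted cluster ids do not need to match the ids in `truelabel`.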

**Evaluating k**: Evaluate the most appropriate number of
clusters k by means of modality modularity with the function
*new_modularity*.

```
#Build similarity matrices for your data with SNFtool
library(SNFtool)
library(dplyr)
W <- lapply(data_list, function(x){
  dd <- x %>% as.matrix
  # Squared Euclidean distances turned into an affinity (similarity) matrix
  w <- dd %>% dist2(dd) %>% affinityMatrix(K = 10, sigma = 0.5)
  w
})
#Assign the interval of k according to your data
k_list = 2:10
#Evaluate the rotation cost of each candidate k
clu_eval <- RotationCostBestGivenGraph(W, k_list)
#The most appropriate k is the one with minimal rotation cost
best_k = k_list[which.min(clu_eval)]
```
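
The text above selects k via modality modularity (*new_modularity*), whose exact definition is not given in this vignette. For orientation, the standard Newman-Girvan modularity of a partition on a weighted graph can be computed as below; `modularity_sketch` is a hypothetical helper and is not claimed to be what *new_modularity* computes.

```r
# Newman-Girvan modularity of a partition of a weighted undirected graph:
# Q = 1/(2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j)
modularity_sketch <- function(A, labels) {
  stopifnot(nrow(A) == ncol(A), length(labels) == nrow(A))
  deg <- rowSums(A)                        # weighted degree k_i
  two_m <- sum(deg)                        # 2m = total edge weight, counted twice
  same <- outer(labels, labels, "==")      # delta(c_i, c_j)
  sum((A - outer(deg, deg) / two_m)[same]) / two_m
}

# Toy check: two disconnected triangles
tri <- matrix(1, 3, 3) - diag(3)
zero <- matrix(0, 3, 3)
A <- rbind(cbind(tri, zero), cbind(zero, tri))
modularity_sketch(A, c(1, 1, 1, 2, 2, 2))  # 0.5 for the two-triangle split
```

One possible use, assuming SNFtool's `spectralClustering(affinity, K)` provides the partition, is to average `modularity_sketch(w, spectralClustering(w, k))` over the affinity matrices in `W` for each candidate k and keep the k with the highest mean.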

**M3JF**: Jointly factorize the matrices into a shared
embedding matrix and several modality private basis matrices.

```
#Assign the parameters
lambda = 0.01
theta = 10^-6
k = best_k
res = M3JF(data_list,lambda,theta,k)
```
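
To illustrate the idea of one shared embedding matrix combined with modality-specific basis matrices, here is a toy alternating-least-squares sketch. It is NOT the M3JF algorithm: it omits the group sparse penalty and ignores `lambda` and `theta`, and `joint_als_sketch` is a hypothetical name. It assumes samples are rows of each matrix in `data_list`.

```r
# Toy sketch, NOT the M3JF algorithm: alternating least squares for
# X_v ~ W %*% H_v, with one embedding W (samples x k) shared by all
# modalities and one basis H_v (k x features_v) per modality.
# No group sparsity and no lambda/theta are used here.
joint_als_sketch <- function(data_list, k, n_iter = 50, seed = 1) {
  data_list <- lapply(data_list, as.matrix)
  set.seed(seed)                                   # reproducible random start
  W <- matrix(rnorm(nrow(data_list[[1]]) * k), ncol = k)
  fit_H <- function(W) {
    # Least-squares basis for each modality given the shared W
    lapply(data_list, function(X) solve(crossprod(W), crossprod(W, X)))
  }
  for (it in seq_len(n_iter)) {
    H <- fit_H(W)
    # Least-squares shared W given every H_v: W = (sum X_v H_v') (sum H_v H_v')^-1
    S_xh <- Reduce(`+`, Map(function(X, Hv) X %*% t(Hv), data_list, H))
    S_hh <- Reduce(`+`, lapply(H, function(Hv) Hv %*% t(Hv)))
    W <- t(solve(S_hh, t(S_xh)))
  }
  list(W = W, H = fit_H(W))
}

# Example usage, with k-means on the shared embedding as a stand-in
# for the clustering step:
# fit <- joint_als_sketch(data_list, k = best_k)
# kmeans(fit$W, centers = best_k, nstart = 10)$cluster
```

The shared `W` is updated from all modalities at once, so every modality contributes to the sample embedding, while each `H_v` only has to explain its own features.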

The cluster assignment of each sample is now available in `res$clusters`.

**Robustness test**: We evaluate our method by calculating
the normalized mutual information (NMI) and the adjusted Rand index (ARI)
between the true labels and our predicted labels. These scores can also
be used to compare the performance of our method with others. NMI lies in
the interval [0,1], while ARI is at most 1 and close to 0 for a random
labeling (it can be slightly negative). For both scores, larger values
indicate closer agreement with the ground truth.

```
library(SNFtool)
#Calculate the NMI of M3JF
M3JF_res = M3JF(data_list,lambda,theta,k)
M3JF_cluster = M3JF_res$clusters
M3JF_NMI = cal_NMI(truelabel,M3JF_cluster)
#Calculate the ARI of M3JF
library(mclust)
M3JF_ARI = adjustedRandIndex(truelabel,M3JF_cluster)
```
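
The ARI used above can be written in a few lines of base R, which makes it clear why it can be 0 (or below) even when some pairs of samples agree. `ari_sketch` is a hypothetical helper implementing the Hubert-Arabie formula; for the actual evaluation, mclust's *adjustedRandIndex* is expected to give the same value up to floating-point error.

```r
# Base-R sketch of the Hubert-Arabie adjusted Rand index
ari_sketch <- function(x, y) {
  stopifnot(length(x) == length(y))
  choose2 <- function(m) m * (m - 1) / 2       # number of pairs
  tab <- table(x, y)                           # contingency table
  sum_ij <- sum(choose2(tab))                  # pairs together in both
  sum_a <- sum(choose2(rowSums(tab)))          # pairs together in x
  sum_b <- sum(choose2(colSums(tab)))          # pairs together in y
  expected <- sum_a * sum_b / choose2(length(x))
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

ari_sketch(c(1, 1, 2, 2), c(1, 1, 1, 2))  # 0: agreement no better than chance
```

The index is undefined (NaN) when both labelings put all samples in a single cluster, because the denominator is then zero.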