Cluster any count data matrix with a fixed number of variables. Implements the branch & bound Classification-Variational Expectation-Maximisation algorithm of the paper cited below (to appear in Computational Statistics).

**MoMPCA** is available on CRAN, and the
development version is available on Github.

**MoMPCA** needs the following CRAN R packages, so check
that they are installed on your computer.

```
required_CRAN <- c("methods",
                   "topicmodels",
                   "tm",
                   "Matrix",
                   "slam",
                   "magrittr",
                   "dplyr",
                   "stats",
                   "doParallel",
                   "foreach",
                   "ggplot2",
                   "reshape2",
                   "tidytext")
# Install only the packages that are not already present
not_installed_CRAN <- setdiff(required_CRAN, rownames(installed.packages()))
if (length(not_installed_CRAN) > 0) install.packages(not_installed_CRAN)
```

- For the latest stable version, use the CRAN version

`install.packages("MoMPCA")`

- For the development version, install from GitHub

`remotes::install_github("nicolasJouvin/MoMPCA")`

The package comes with the BBCmsg data set and a
`simulate_BBC()` function, which reproduces the
simulation of the paper.

```
library(MoMPCA)
simu <- simulate_BBC(N = 400, L = 200, epsilon = 0, lambda = 1)
dtm <- simu$dtm.full
Ytruth <- simu$Ytruth # true clustering
```

The `dtm` is a `tm::DocumentTermMatrix()` object. The main fitting
function is `mmpca_clust()`, which allows for a parallel backend via
its argument `mc.cores`. There is a simple wrapper around
`mmpca_clust()`, called `mmpca_clust_modelselect()`, which performs
model selection of `(Q, K)` with an ICL criterion. Please be aware
that the greedy nature of the algorithm may induce quite intensive
computations.
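The examples in this README work on a simulated `DocumentTermMatrix`, while the package is meant for any count data matrix. A minimal sketch, assuming your data is a plain numeric matrix of counts with observations in rows and variables in columns, wraps it with tm's `as.DocumentTermMatrix()` so that it has the same class as `dtm`. The toy matrix and its names are hypothetical.

```r
library(tm)

# Hypothetical count matrix: 3 observations (rows) x 4 variables (columns)
counts <- matrix(c(2, 0, 1, 3,
                   0, 4, 0, 1,
                   1, 1, 2, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:3),
                                 c("w1", "w2", "w3", "w4")))

# Wrap the raw counts as a DocumentTermMatrix; weightTf keeps them unweighted
dtm_counts <- as.DocumentTermMatrix(counts, weighting = weightTf)
inspect(dtm_counts)
```

Using `weighting = weightTf` leaves the entries as raw counts, which is what a multinomial model expects; a TF-IDF weighting would turn them into non-integer scores.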

```
res <- mmpca_clust(simu$dtm.full, Q = 6, K = 4,
                   Yinit = 'random',
                   method = 'BBCVEM',
                   max.epochs = 7,
                   keep = 1,
                   verbose = 2,
                   nruns = 2,
                   mc.cores = 1)
```

The top words of the topic matrix `beta` can then be
plotted (if working with text):

```
ggtopics <- plot(res, type = 'topics', n_words = 5)
print(ggtopics)
```
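To read the top words as a table rather than a plot, the following base-R sketch may help. It assumes `beta` is a Q x V matrix of word probabilities whose column names are the vocabulary; how `beta` is stored in the fitted object is not described here, so the helper `top_words()` and the toy matrix are hypothetical and not part of **MoMPCA**.

```r
# Hypothetical helper (not a MoMPCA function): top n words of each topic,
# assuming `beta` is a Q x V matrix of word probabilities with words as colnames
top_words <- function(beta, n_words = 5) {
  stopifnot(is.matrix(beta), !is.null(colnames(beta)))
  n_words <- min(n_words, ncol(beta))
  # For each topic (row), keep the words with the largest probabilities
  out <- vapply(seq_len(nrow(beta)), function(q) {
    colnames(beta)[order(beta[q, ], decreasing = TRUE)[seq_len(n_words)]]
  }, character(n_words))
  # vapply returns n_words x Q (or a plain vector when n_words == 1)
  t(matrix(out, nrow = n_words))
}

# Toy 2-topic example over a hypothetical 5-word vocabulary
beta <- rbind(c(0.50, 0.30, 0.10, 0.05, 0.05),
              c(0.05, 0.05, 0.10, 0.30, 0.50))
colnames(beta) <- c("match", "goal", "vote", "market", "shares")
top_words(beta, n_words = 2)
```

Each row of the result lists the words of one topic by decreasing probability.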

The evolution of the bound throughout the epochs can be plotted as well:

```
ggbound <- plot(res, type = 'bound')
print(ggbound)
```
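
Model selection of `(Q, K)` over a grid of values, with the ICL criterion, is done with `mmpca_clust_modelselect()`:
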
```
res <- mmpca_clust_modelselect(simu$dtm.full, Qs = 5:7, Ks = 3:5,
                               Yinit = 'kmeans_lda',
                               init.beta = 'lda',
                               method = 'BBCVEM',
                               max.epochs = 7,
                               nruns = 3,
                               verbose = 1)
best_model <- res$models
```
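The call above explores `Qs = 5:7` and `Ks = 3:5`. Assuming the wrapper fits every `(Q, K)` pair on this grid and performs `nruns` runs for each (an assumption, not something stated here), a quick base-R count shows why model selection is far more expensive than a single `mmpca_clust()` call:

```r
# Grid passed to mmpca_clust_modelselect() in the call above
Qs <- 5:7
Ks <- 3:5
nruns <- 3

# Every (Q, K) pair, assuming the wrapper fits all of them
grid <- expand.grid(Q = Qs, K = Ks)
n_models <- nrow(grid)       # 9 candidate models
n_fits <- n_models * nruns   # 27 runs, if nruns applies to each pair
print(grid)
```

Under that assumption the cost grows multiplicatively with the sizes of `Qs`, `Ks` and `nruns`, so a narrow grid keeps the computation manageable.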

Please cite our work using the following reference:

- N. Jouvin, P. Latouche, C. Bouveyron, A. Livartowski, G. Bataillon, Greedy clustering of count data through a mixture of multinomial PCA (To appear in Computational Statistics)

```
@article{jouvin:hal-02278224,
TITLE = {{Greedy clustering of count data through a mixture of multinomial PCA}},
AUTHOR = {Jouvin, Nicolas and Latouche, Pierre and Bouveyron, Charles and Bataillon, Guillaume and Livartowski, Alain},
URL = {https://hal.archives-ouvertes.fr/hal-02278224},
NOTE = {31 pages, 10 figures},
JOURNAL = {{Computational Statistics}},
PUBLISHER = {{Springer Verlag}},
YEAR = {2020},
KEYWORDS = {Dimension reduction ; Topic modeling ; Count data ; Mixture models ; Clustering ; Variational inference},
HAL_ID = {hal-02278224},
HAL_VERSION = {v1},
}
```

and consider also citing the package:

```
citation('MoMPCA')
##
## To cite package 'MoMPCA' in publications use:
##
## Nicolas Jouvin (2020). MoMPCA: Inference and Clustering for Mixture
## of Multinomial Principal Component Analysis. R package version 1.0.0.
## https://CRAN.R-project.org/package=MoMPCA
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {MoMPCA: Inference and Clustering for Mixture of Multinomial Principal
## Component Analysis},
## author = {Nicolas Jouvin},
## year = {2020},
## note = {R package version 1.0.0},
## url = {https://CRAN.R-project.org/package=MoMPCA},
## }
##
## ATTENTION: This citation information has been auto-generated from the
## package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
```