leidenbase.Rmd

Brent Ewing

1/19/2022

Introduction

Leidenbase consists of R and C++ interfaces for the basic community detection functions of Vincent Traags’s Leidenalg package. The Leidenalg algorithm is described in the article

Traag, V.A., Waltman. L., Van Eck, N.-J. (2018). From Louvain to Leiden: guaranteeing well-connected communities. Scientific reports, 9(1), 5233. 10.1038/s41598-019-41695-z

Detailed information describing use of the leiden algorithm is available at

https://leidenalg.readthedocs.io/en/latest/

leidenalg source code is available at

https://github.com/vtraag/leidenalg

R Interface

The R interface consists of the leiden_find_partition function

leiden_find_partition( igraph, partition_type = c("CPMVertexPartition", "ModularityVertexPartition", "RBConfigurationVertexPartition", "RBERVertexPartition", "SignificanceVertexPartition", "SurpriseVertexPartition"), initial_membership = NULL, edge_weights = NULL, node_sizes = NULL, seed = NULL, resolution_parameter = 0.1, num_iter = 2, verbose = FALSE )

Arguments:

  igraph: R igraph graph.

  partition_type: String partition type name. Default is CPMVertexParition.

  initial_membership: Numeric vector of initial membership assignments of nodes.
                      These are 1-based indices. Default is one community per
                      node.

  edge_weights: Numeric vector of edge weights. Default is 1.0 for all edges.

  node_sizes: Numeric vector of node sizes. Default is 1 for all nodes.

  seed: Numeric random number generator seed. The seed value must be
        either NULL for random seed values or greater than 0 for a
        fixed seed value. Default is NULL.

  resolution_parameter: Numeric resolution parameter. The value must be
                        greater than 0.0. Default is 0.1. The
                        resolution_parameter is ignored
                        for the partition_types ModularityVertexPartition,   
                        SignificanceVertexPartition, and
                        SurpriseVertexPartition.

  num_iter: Numeric number of iterations. Default is 2.

  verbose: A logic flag to determine whether or not we should print run
           diagnostics.
           
Value:
   A named list consisting of a numeric vector where each value is the community
   to which the node belongs (1-based indices), a numeric quality value, a
   numeric modularity, a numeric significance, a numeric vector of edge weights
   within each community, a numeric vector of edge weights from each community,
   a numeric vector of edge weights to each community, and total edge weight.

In order to use leiden_find_partition, create or load an igraph graph and then call leiden_find_partition. For example,

library(igraph)
fpath <- system.file('testdata', 'igraph_n1500_edgelist.txt.gz', package='leidenbase')
zfp <- gzfile(fpath)
igraph <- read_graph(file = zfp, format='edgelist')
res <- leiden_find_partition(igraph, partition_type = 'CPMVertexPartition', seed = 123456, resolution_parameter = 0.5)

leiden_find_partition returns

str(res)
## List of 8
##  $ membership                  : int [1:1500] 22 41 2 30 39 12 40 31 30 4 ...
##  $ quality                     : num 11835
##  $ modularity                  : num 0.426
##  $ significance                : num 87495
##  $ edge_weight_within_community: num [1:115] 1489 1284 1205 1147 893 ...
##  $ edge_weight_from_community  : num [1:115] 1786 2021 2069 1576 1742 ...
##  $ edge_weight_to_community    : num [1:115] 1786 2021 2069 1576 1742 ...
##  $ total_edge_weight           : num 61636

The membership vector consists of the community value for each node. For example,

head(res[['membership']])
## [1] 22 41  2 30 39 12

The nodes that belong to community 1 are

which(res[['membership']]==1)
##  [1]   33   35   41  116  140  151  172  178  208  214  238  308  367  440  454
## [16]  496  745  814  834  836  849  854  860  869  870  927  937 1042 1062 1104
## [31] 1111 1139 1238 1248 1263 1277 1297 1323 1348 1377 1382 1407 1483

The first three communities are

# lim1 <- max(res$membership)
lim1 <- 3
for( i in seq(lim1)) {
  cat(paste0('Community ', i, ':'))
  for(j in which(res[['membership']]==i))
    cat(' ', j)
  cat('\n')
}
## Community 1:  33  35  41  116  140  151  172  178  208  214  238  308  367  440  454  496  745  814  834  836  849  854  860  869  870  927  937  1042  1062  1104  1111  1139  1238  1248  1263  1277  1297  1323  1348  1377  1382  1407  1483
## Community 2:  3  94  96  112  127  154  195  232  248  283  297  363  385  451  455  464  517  565  671  797  802  839  971  976  1002  1037  1051  1106  1117  1120  1123  1195  1249  1257  1281  1283  1355  1395  1397  1479
## Community 3:  168  179  220  320  327  351  417  419  423  509  594  618  623  645  733  778  801  887  929  959  998  1011  1016  1088  1147  1155  1168  1201  1206  1258  1269  1290  1295  1312  1347  1389  1427  1432  1480

C/C++ Interface

The C/C++ interface consists of the leidenFindPartition function

int leidenFindPartition( igraph_t *pigraph,
                           std::string const partitionType,
                           std::vector < size_t > const *pinitialMembership,
                           std::vector < double > const *pedgeWeights,
                           std::vector < size_t > const *pnodeSizes,
                           size_t seed,
                           double resolutionParameter,
                           std::int32_t numIter,
                           std::vector < size_t > *pmembership,
                           std::vector < double > *pweightInCommunity,
                           std::vector < double > *pweightFromCommunity,
                           std::vector < double > *pweightToCommunity,
                           double *pweightTotal,
                           double *pquality,
                           double *pmodularity,
                           double *psignificance,
                           int *pstatus )

Parameters:

  igraph: pointer to igraph_t graph
  partitionType         partition type used for optimization
  initialMembership     vector of initial membership assignments of nodes (NULL
                        for default of one community per node)
  edgeWeights           vector of weights of edges (NULL for default of 1.0)
  nodeSizes             vector of node sizes (NULL for default of 1)
  seed                  random number generator seed (0=random seed)
  resolutionParameter   resolution parameter
  numIter               number of iterations
  pmembership           vector of node membership assignments
  pweightInCommunity    vector of edge weights within community
  pweightFromCommunity  vector of edge weights from community
  pweightToCommunity    vector of edge weights to community
  pweightTotal          total edge weights
  pquality              partition quality
  pmodularity           partition modularity
  psignificance         partition significance
  pstatus               function status (0=success; -1=failure)

Notes for the C/C++ interface: