`mwcsr`

is an R package to solve maximum weight connected
subgraph (MWCS) problem and its variants. The package implements and
provides an interface to several solvers: both exact and heuristic.

`mwcsr`

can be installed from GitHub repository using
`devtools`

package:

Load `mwcsr`

, as well as `igraph`

package,
which contains functions for graph manipulations.

Let’s load an example instance of MWCS problem. The instance is a
simple `igraph`

object with `weight`

vertex
attribute.

```
## IGRAPH f86c56f UN-- 194 209 --
## + attr: name (v/c), label (v/c), weight (v/n), label (e/c)
## + edges from f86c56f (vertex names):
## [1] C00022_2--C00024_0 C00022_0--C00024_1 C00025_0--C00026_0
## [4] C00025_1--C00026_1 C00025_2--C00026_2 C00025_4--C00026_4
## [7] C00025_7--C00026_7 C00024_1--C00033_0 C00024_0--C00033_1
## [10] C00022_0--C00041_0 C00022_1--C00041_1 C00022_2--C00041_2
## [13] C00036_0--C00049_0 C00036_1--C00049_1 C00036_2--C00049_2
## [16] C00036_4--C00049_4 C00037_1--C00065_0 C00037_0--C00065_1
## [19] C00022_0--C00074_5 C00022_1--C00074_6 C00022_2--C00074_7
## [22] C00024_0--C00083_0 C00024_1--C00083_1 C00026_1--C00091_0
## + ... omitted several edges
```

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.7379 -0.7379 1.9294 5.9667 7.2931 38.1546
```

Now let us initialize a heuristic relax-and-cut MWCS solver (Alvarez-Miranda and Sinnl, 2017):

Now we can use this solver to solve the example instance:

```
## IGRAPH aafbb87 UN-- 162 164 --
## + attr: name (v/c), label (v/c), weight (v/n)
## + edges from aafbb87 (vertex names):
## [1] C00022_0--C00024_1 C00022_0--C00041_0 C00022_0--C00074_5
## [4] C00022_0--C00149_0 C00022_1--C00041_1 C00022_1--C00074_6
## [7] C00022_1--C00149_2 C00022_2--C00024_0 C00022_2--C00041_2
## [10] C00022_2--C00074_7 C00022_2--C00149_1 C00024_0--C00033_1
## [13] C00024_0--C00222_0 C00024_1--C00033_0 C00024_1--C00222_2
## [16] C00025_0--C00026_0 C00025_1--C00026_1 C00025_2--C00026_2
## [19] C00025_4--C00026_4 C00025_7--C00026_7 C00026_0--C00091_1
## [22] C00026_0--C00311_1 C00026_1--C00091_0 C00026_1--C00311_0
## + ... omitted several edges
```

`## [1] 1178.432`

Supported MWCS variants are:

- classic (simple) MWCS, where only vertices are weighted;
- generalized MWCS (GMWCS), where both vertices and edges are weighted;
- signal generalized MWCS (SGMWCS), where both vertices and edges are marked with weighted “signals”, and a weight of a subgraph is calculated as a sum of weights of its unique signals.

In `mwcsr`

, instances of all of the above problems are
represented by an `igraph`

object with certain specified
attributes. The validity and the type of the instance can be checked
using `get_instance_type`

function, for example:

```
## $type
## [1] "MWCS"
##
## $valid
## [1] TRUE
```

Simple maximum weight connected subgraph (MWCS) problem can be defined as follows. Let \(G = (V, E)\) be an undirected graph and \(\omega : V \rightarrow \mathbb{R}\) is a weight function defined on the vertices. Then MWCS problem consists in finding a connected subgraph \(\widetilde{G} = (\widetilde{V}, \widetilde{E})\) with a maximal total sum of vertex weights:

\[\Omega(\widetilde{G}) = \sum_{v \in \widetilde{V}} \omega(v) \rightarrow max.\]

An important property of MWCS is that solution can always be represented as a tree.

In `mwcsr`

an MWCS instance is defined as an
`igraph`

with `weight`

vertex attribute (and
without `weight`

edge attribute).

```
## IGRAPH f86c56f UN-- 194 209 --
## + attr: name (v/c), label (v/c), weight (v/n), label (e/c)
## + edges from f86c56f (vertex names):
## [1] C00022_2--C00024_0 C00022_0--C00024_1 C00025_0--C00026_0
## [4] C00025_1--C00026_1 C00025_2--C00026_2 C00025_4--C00026_4
## [7] C00025_7--C00026_7 C00024_1--C00033_0 C00024_0--C00033_1
## [10] C00022_0--C00041_0 C00022_1--C00041_1 C00022_2--C00041_2
## [13] C00036_0--C00049_0 C00036_1--C00049_1 C00036_2--C00049_2
## [16] C00036_4--C00049_4 C00037_1--C00065_0 C00037_0--C00065_1
## [19] C00022_0--C00074_5 C00022_1--C00074_6 C00022_2--C00074_7
## [22] C00024_0--C00083_0 C00024_1--C00083_1 C00026_1--C00091_0
## + ... omitted several edges
```

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.7379 -0.7379 1.9294 5.9667 7.2931 38.1546
```

Budget MWCS is a slight modification of the original problem. While the objective function remains the same, an additional constraint is introduced. In this problem nonnegative values called budget costs are assigned to vertices. The sum of budget costs of the vertices in the solution is then constrained to not exceed a predefined budget.

More formally, let \(c: V \rightarrow R^+\) be the function of the budget costs. \(B \in R^+\) is a budget of the problem. In this terms Budget MWCS is defined as a problem where the following conditions are satisfied:

\[ \Omega(\widetilde{G}) = \sum\limits_{v \in \widetilde{V}} \omega(v) \rightarrow max.\\ \text{subject to} \sum\limits_{v \in \widetilde{V}} c(v) \leq B \]

In `mwcsr`

a Budget MWCS instance is an
`igraph`

object with `weight`

vertex attribute as
in original MWCS problem and additional `budget_cost`

vertex
attribute. Value of the budget not the part of the instance itself but
rather a parameter to be passed to a function that solves an
instance.

```
budget_mwcs_example <- mwcs_example
set.seed(42)
V(budget_mwcs_example)$budget_cost <- runif(vcount(budget_mwcs_example))
get_instance_type(budget_mwcs_example)
```

```
## $type
## [1] "Budget MWCS"
##
## $valid
## [1] TRUE
```

Generalized MWCS (GMWCS) is similar to MWCS, but edges are also weighted. More formally, let \(G = (V, E)\) be an undirected graph and \(\omega : (V \cup E) \rightarrow \mathbb{R}\) is a weight function defined on the vertices and the edges. Then GMWCS problem consists in finding a connected subgraph \(\widetilde{G} = (\widetilde{V}, \widetilde{E})\) with a a maximal total sum of vertex and edge weights:

\[ \Omega(\widetilde{G}) = \sum_{v \in \widetilde{V}} \omega(v) + \sum_{e \in \widetilde{E}} \omega(e) \rightarrow max. \]

An important consequence of edge weights is that the optimal solution can contain cycles.

A GMWCS instance is defined as an `igraph`

with
`weight`

attribute defined for both vertices and edges.

```
## IGRAPH f86c56f UNW- 194 209 --
## + attr: name (v/c), label (v/c), weight (v/n), label (e/c), weight
## | (e/n)
## + edges from f86c56f (vertex names):
## [1] C00022_2--C00024_0 C00022_0--C00024_1 C00025_0--C00026_0 C00025_1--C00026_1
## [5] C00025_2--C00026_2 C00025_4--C00026_4 C00025_7--C00026_7 C00024_1--C00033_0
## [9] C00024_0--C00033_1 C00022_0--C00041_0 C00022_1--C00041_1 C00022_2--C00041_2
## [13] C00036_0--C00049_0 C00036_1--C00049_1 C00036_2--C00049_2 C00036_4--C00049_4
## [17] C00037_1--C00065_0 C00037_0--C00065_1 C00022_0--C00074_5 C00022_1--C00074_6
## [21] C00022_2--C00074_7 C00024_0--C00083_0 C00024_1--C00083_1 C00026_1--C00091_0
## [25] C00042_2--C00091_0 C00026_0--C00091_1 C00042_1--C00091_1
## + ... omitted several edges
```

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.7379 -0.7379 1.9294 5.9667 7.2931 38.1546
```

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.2715 -0.5762 -0.1060 0.5452 1.0458 8.5829
```

The signal generalized MWCS (SGMWCS) variant continues to generalize
MWCS problem and introduces a concept of *signals*. Instead of
specifying vertex and edge weights directly, the weights are defined for
a set of signals, and vertices and edges are marked with these signals.
The difference from GMWCS is that the signals can be repeated in the
graph, while the weight of the subgraph is defined as a sum of weights
of its unique signals.

Formally, let \(G = (V, E)\) be an undirected graph, \(S\) – a set of signlas with weights \(\omega: S \rightarrow \mathbb{R}\), and \(\sigma: (V \cup E) \rightarrow 2^S\) – markings of the vertices and edges with signals. An SGMWCS problem consists in finding a connected subgraph, with a maximal total sum of its unique signals:

\[ \Omega(\widetilde{G}) = \sum_{s \in \sigma(\widetilde{V} \cup \widetilde{E})} \omega(s) \rightarrow max, \] where \(\sigma(\widetilde{V} \cup \widetilde{E}) = \bigcup_{x \in (\widetilde{V} \cup \widetilde{E})} \sigma(x)\).

SGMWCS instances can arise when the data, from which the weights are inferred, map to the graph ambigously. For example, when m/z peak from mass-spectrometry data is assigned to multiple isomer metabolites, or when the same enzyme catalyze multiple reactions, and so on. Practically, it is usually assumed that the signals with negative weights are not repeated.

An SGMWCS is represented as an `igraph`

object, with
`signal`

attribute defined for both vertices and edges and a
`signals`

attribute defined for the graph, containing the
signal weights. Specification of multiple signals per node or edge is
not yet supported.

```
## IGRAPH f86c56f UN-- 194 209 --
## + attr: signals (g/n), name (v/c), label (v/c), signal (v/c), label
## | (e/c), signal (e/c)
## + edges from f86c56f (vertex names):
## [1] C00022_2--C00024_0 C00022_0--C00024_1 C00025_0--C00026_0 C00025_1--C00026_1
## [5] C00025_2--C00026_2 C00025_4--C00026_4 C00025_7--C00026_7 C00024_1--C00033_0
## [9] C00024_0--C00033_1 C00022_0--C00041_0 C00022_1--C00041_1 C00022_2--C00041_2
## [13] C00036_0--C00049_0 C00036_1--C00049_1 C00036_2--C00049_2 C00036_4--C00049_4
## [17] C00037_1--C00065_0 C00037_0--C00065_1 C00022_0--C00074_5 C00022_1--C00074_6
## [21] C00022_2--C00074_7 C00024_0--C00083_0 C00024_1--C00083_1 C00026_1--C00091_0
## [25] C00042_2--C00091_0 C00026_0--C00091_1 C00042_1--C00091_1
## + ... omitted several edges
```

`## chr [1:194] "s1" "s1" "s1" "s2" "s3" "s4" "s4" "s4" "s4" "s4" "s5" "s5" ...`

`## chr [1:209] "s103" "s104" "s105" "s106" "s107" "s108" "s109" "s110" "s110" ...`

```
## s1 s2 s3 s4 s5 s6
## 5.008879 -0.737898 -0.737898 20.112627 19.890279 2.069292
```

Sometimes, construction of SGMWCS instances can be simplified using
`normalize_sgmwcs_instance`

function.

Let consider an example graph obtained from gatom package.

```
## IGRAPH f86c56f UNW- 194 209 --
## + attr: name (v/c), metabolite (v/c), element (v/c), label (v/c), url
## | (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC (v/n), baseMean
## | (v/n), logPval (v/n), signal (v/c), signalRank (v/n), score (v/n),
## | weight (v/n), label (e/c), pval (e/n), origin (e/n), RefSeq (e/c),
## | gene (e/c), enzyme (e/c), reaction_name (e/c), reaction_equation
## | (e/c), url (e/c), reaction (e/c), rpair (e/c), log2FC (e/n), baseMean
## | (e/n), logPval (e/n), signal (e/c), signalRank (e/n), score (e/n),
## | weight (e/n)
## + edges from f86c56f (vertex names):
## [1] C00022_2--C00024_0 C00022_0--C00024_1 C00025_0--C00026_0 C00025_1--C00026_1
## + ... omitted several edges
```

In this graph, the `signals`

graph attributed is absent
with weights specified directly as vertex or edge attributes along with
`signal`

attributes, which is a very practical intermediate
representation. However, it is recognized as a GMWCS instance:

```
## $type
## [1] "GMWCS"
##
## $valid
## [1] TRUE
```

Let convert this representation into a valid SGMWCS instance using
`normalize_sgmwcs_instance`

function:

```
## $type
## [1] "SGMWCS"
##
## $valid
## [1] TRUE
```

And let call the same function with explicit arguments:

```
gatom_instance <- normalize_sgmwcs_instance(gatom_example,
nodes.weight.column = "weight",
edges.weight.column = "weight",
nodes.group.by = "signal",
edges.group.by = "signal",
group.only.positive = TRUE)
```

The function does the following:

- It extracts signal weights from the specified columns.
`NULL`

can be specified as a value of`nodes.group.by`

or`edges.group.by`

if there are no corresponding signals in the data, in which case zero signals will be created. - It splits input signals with negative weights into multiple unique
signals, unless
`group.only.positive`

is set to`FALSE`

.

Currently, four solvers are supported:

- heuristic relax-and-cut solver
`rmwcs_solver`

for MWCS and Budget MWCS; - heuristic relax-and-cut solver
`rnc_solver`

for MWCS/GMWCS/SGMWCS; - heuristic simulated annealing solver
`annealing_solver`

for MWCS/GMWCS/SGMWCS; - exact (if CPLEX library is available) or heuristic (without CPLEX)
solver
`virgo_solver`

for MWCS/GMWCS/SGMWCS. - exact SCIP-Jack (if SCIP is available) solver
`scipjack_solver`

for MWCS;

While selecting a particular solver depends on the particular class of instances, the general recommendations are:

- For MWCS use
`rmwcs_solver`

if small suboptimality can be tolerated. It is very fast and usually is able to find optimal or very close to optimal solution. To find exact solution`virgo_solver`

can be used if CPLEX library is available. - For Budget MWCS only
`rmwcs_solver`

is available. - For GMWCS and SGMWCS
`virgo_solver`

is the recommended solver if CPLEX library is available. Without CPLEX it is recommended to use`rnc_solver`

giving both acceptable primal solution and an upper bound on the objective function. Manual tuning of the`annealing_solver`

may give better solutions in some cases.

Relax-and-cut solver is a heuristic solver able to rapidly find high-quality solutions for MWCS problem (Alvarez-Miranda and Sinnl, 2017, https://doi.org/10.1016/j.cor.2017.05.015). The solver does not require any additional libraries.

Relax-and-cut solver can be constructed using
`rmwcs_solver`

function with the default arguments.

`## [1] 1178.432`

`## [1] FALSE`

`rmwcs_solver`

supports Budget MWCS instances and its
special case where all budget costs are set to one and called MWCS with
cardinality constraints. In `mwcsr`

such problems are
represented as Simple MWCS problems and maximum cardinality is passed to
`solve_mwcsp`

funciton as argument:

`## [1] 10`

`## [1] 134.9068`

To solve Budget MWCS passing budget limit is necessary as well:

`## [1] 4.625048`

`## [1] 163.6254`

Rnc solver is another relax-and-cut solver made for GMWCS/SGMWCS
problems inspired by rmwcs solver. The solver does not require any
libraries as well. Although with `rnc_solver`

it is possible
to solve MWCS problems, running `rmwcs_solver`

for this type
of problems of this type is preferable. No budget and cardinality
variants are available.

The solver can be constructed using `rnc_solver`

function.

`## [1] 1295.815`

`## [1] FALSE`

And for SGMWCS instance:

`## [1] 263.8406`

Another heuristic solver is a simulated annealing based solver. The
solver is rather generic, but can produce good enough solutions if
parameters are tuned well. As it is heuristic solver with no estimate on
upper bound of the objective function the solved to optimality flag is
always set to `FALSE`

.

This solver does support warm start allowing to run series of restarts of annealing solver with different temperature schedules.

The use of this solver may be beneficial in some cases, although there are no generic guidelines.

```
m <- NULL
for (i in 0:15) {
asolver <- annealing_solver(schedule = "boltzmann", initial_temperature = 8.0 / (2 ** i),
final_temperature = 1 / (2 ** i))
if (i != 0) {
m <- solve_mwcsp(asolver, gmwcs_example, warm_start = m)
} else {
m <- solve_mwcsp(asolver, gmwcs_example)
}
print(m$weight)
}
```

```
## [1] 955.6224
## [1] 1157.73
## [1] 1216.681
## [1] 1218.817
## [1] 1222.339
## [1] 1222.339
## [1] 1222.339
## [1] 1222.339
## [1] 1222.339
## [1] 1222.339
## [1] 1222.339
## [1] 1222.339
## [1] 1222.339
## [1] 1222.339
## [1] 1222.339
## [1] 1222.339
```

The `mwcsr`

package provides interface to exact Java-based
Virgo solver

(https://github.com/ctlab/virgo-solver) which can be used
to solve MWCS, GMWCS and SGMWCS instances. The solver requires Java
(11+) to be installed on your machine.

There are two modes of execution:

- Heuristic – which finds a solution based on minimal spanning tree heuristic and does not require any additional setup;
- Exact – which uses CPLEX library to solve the instances to provable optimality. CPLEX can be downloaded from the official web-site: https://www.ibm.com/products/ilog-cplex-optimization-studio. Free licence can be obtained for academic purposes. CPLEX version 12.7.1 or higher is required.

Heuristic solver can be constructed, by specifying
`cplex_dir=NULL`

. As it is a heuristic solver, the solved to
optimality flag is always set to `FALSE`

.

```
mst_solver <- virgo_solver(cplex_dir=NULL)
m <- solve_mwcsp(mst_solver, sgmwcs_example)
print(m$weight)
```

`## [1] 263.2752`

`## [1] FALSE`

Exact solver requires setting `cplex_dir`

argument with a
path to CPLEX installation. The `cplex_dir`

requires, that
`cplex.jar`

file and CPLEX dynamic library file (depending on
the operating system: `libcplex<version>.dll`

for
Windows, `libcplex<version>.so`

for Linux,
`libcplex<version>.jnilib`

for OS X) can be found there
with recursive search. Alternatively, `cplex_jar`

argument
pointing to `cplex.jar`

file and `cplex_bin`

argument pointing to the directory with CPLEX dynamic library files can
be specified. Additionally, it is convenient to put the path to CPLEX
into a `CPLEX_HOME`

environment variable, so that it does not
have to be changed from one system to another, when run.

`## Error in virgo_solver(cplex_dir = cplex_dir): Could not find `cplex.jar` in /opt/ibm/ILOG/CPLEX_Studio129/`

`## Error in eval(expr, envir, enclos): object 'exact_solver' not found`

As the CPLEX found the optimal solution, the corresponding flag is
set to `TRUE`

:

`## [1] 263.2752`

`## [1] FALSE`

Some additional information like the running time, instance files,
solver version is available as in the `stats`

field. Refer to
Virgo documentation for the description of the values.

```
## isOpt VPrep EPrep time nodes edges
## 1 0 136 148 803 59 58
## nodefile
## 1 /tmp/Rtmp8wdbQk/graph35b1ac13d84855/nodes.txt
## edgefile
## 1 /tmp/Rtmp8wdbQk/graph35b1ac13d84855/edges.txt
## sigfile version
## 1 /tmp/Rtmp8wdbQk/graph35b1ac13d84855/signals.txt 0.1.5
```

Computational resources availble for Virgo can be specified with the following parameters:

`memory`

– maximum amount of memory, more specifically Java heap size, specified via`-Xmx`

Java option, default:`2G`

.`threads`

– number of threads for simultaneous computation, default: the number of available cores.`timelimit`

– maximum time in seconds to solve the problem, if the solver is interrupted due to time limit, the best solution found so far is reported and`solved_to_optimality`

flag is set to`FALSE`

.

Another useful parameter is `penalty`

. The non-zero
penalty make the solver run an additional pass over the solution, with
each edge penalized with the specified value. As there can be multiple
solutions, having the same weight, especially in case of SGMWCS, this
procedure allows to locally minimize the solution size, while preserving
the weight.

`## Error in virgo_solver(cplex_dir = cplex_dir, penalty = 0.001): Could not find `cplex.jar` in /opt/ibm/ILOG/CPLEX_Studio129/`

`## Error in eval(expr, envir, enclos): object 'psolver' not found`

`## Error in eval(expr, envir, enclos): object 'min_m' not found`

`## Error in eval(expr, envir, enclos): object 'min_m' not found`

Now the solution, has `min_m$stats$nodes`

nodes instead of
`m$stats$nodes`

for the solution that was found before, while
having the same total weight of `min_m$weight`

.

You can also use R interface to SCIP-jack solver. To use it you need
to download SCIPOptSuite from the official web-site and build
`scipstp`

application. Beware, that `scipstp`

is
not provided in the pre-built SCIPOptSuite version, so you have to build
it manually from source.

Quick build instructions:

```
# in scipoptsuite source directory
cmake -Bbuild -H.
cmake --build build --target scipstp
# optionally copy scipstp file somewhere to $PATH
cp build/applications/scipstp /usr/local/bin/
```

For complete build and configuration instructions for this solver visit SCIP website.

After installing and ensuring that scipstp application is correctly
built, you can access `scipjack_solver`

class to solve MWCS
instances:

The optimization parameters are passed using config file. You can modify bundled file or create a new one to fine-tune the solver.

This part of the tutorial shows how `mwcsr`

solvers can be
combined with `BioNet`

package to find an active module in a
protein-protein interaction network. You need `BioNet`

and
`DLBCL`

packages from Bioconductor to be installed in order
to run following code examples.

```
BioNetInstalled <- FALSE
if (requireNamespace("BioNet") && requireNamespace("DLBCL")) {
BioNetInstalled <- TRUE
}
```

Let start with generating an example scored network, following BioNet tutorial:

```
if (BioNetInstalled) {
library("BioNet")
library("DLBCL")
data(dataLym)
data(interactome)
pvals <- cbind(t = dataLym$t.pval, s = dataLym$s.pval)
rownames(pvals) <- dataLym$label
pval <- aggrPvals(pvals, order = 2, plot = FALSE)
logFC <- dataLym$diff
names(logFC) <- dataLym$label
subnet <- subNetwork(dataLym$label, interactome)
subnet <- rmSelfLoops(subnet)
fb <- fitBumModel(pval, plot = FALSE)
scores <- scoreNodes(subnet, fb, fdr = 0.001)
}
```

Here we have network object `subnet`

of type
`graphNEL`

and a vector of node scores
`scores`

:

BioNet comes with a heuristic MWCS FastHeinz solver, that we can use to find the module following the BioNet tutorial:

```
if (BioNetInstalled) {
bionet_h <- runFastHeinz(subnet, scores)
plotModule(bionet_h, scores=scores, diff.expr=logFC)
sum(scores[nodes(bionet_h)])
}
```

We can construct an MWCS instance by converting `graphNEL`

object into `igraph`

and add node weights:

```
if (BioNetInstalled) {
bionet_example <- igraph.from.graphNEL(subnet, weight=FALSE) # ignoring edge weights of 1
V(bionet_example)$weight <- scores[V(bionet_example)]
get_instance_type(bionet_example)
}
```

Now the instance can be solved with the relax-and-cut solver:

```
if (BioNetInstalled) {
rmwcs <- rmwcs_solver()
bionet_m <- solve_mwcsp(rmwcs, bionet_example)
plotModule(bionet_m$graph, scores=scores, diff.expr=logFC)
}
```

Note that the weight increased, compared to FastHeinz solution:

Similarly, Virgo can be used to solve the instance to provable optimality, but in this case it produces the same results: