`ergm` Terms

This document seeks to be the most up-to-date API documentation for `ergm` terms. Note that it is not intended to be a tutorial as much as a description of what inputs and outputs different parts of the system expect.

The storage API defines two types of storage: *private storage*, which is attached to the `ModelTerm` structure and is specific to each `ergm` term, and *public storage*, which is attached to the `Model` and can be accessed by all terms.

A *statistic* is a familiar `ergm` term like “`edges`” or “`nodefactor`”: it adds at least one sufficient statistic to the model. Every statistic can have private storage, and it can read from public storage, but it cannot write to public storage.

An *auxiliary* is an `ergm` term but not an ERGM term in the mathematical sense: it adds no statistics to the model and exists only to initialize and maintain public storage to be used by statistics. It may not be specified on an `ergm` formula by the end-user, but only requested by a statistic.

An auxiliary can rely on another auxiliary’s public storage. Note that circular dependencies are not checked.

For the purposes of this overview, the following information is relevant, and is elaborated formally later:

- Each term has a pointer `void *mtp->storage` to private storage.
- Each term has a pointer to an array of pointers to public storage, `void **mtp->aux_storage`. (The pointer is to the same location for all terms of a model.)
- Each term has some subset of seven functions on the `C` side: initializers (`i_`), updaters (`u_`), change stats (`c_`), difference stats (`d_`), finalizers (`f_`), writers (`w_`), and “eXtended” functions (`x_`).
- An `InitErgmTerm.` function’s output list can have an additional element, `auxiliaries`, a one-sided formula.

- The user passes a formula to the procedure (`ergm`, `summary`, etc.); this formula may only have statistics.
- `ergm_model()` is called.
  - It iterates through the formula terms, calling `InitErgmTerm.<NAME>()` functions in turn (or `InitWtErgmTerm.<NAME>()` for valued ERGMs), adding their output to the model term list. Some terms include `auxiliaries` formulas in their list.
    - `call.ErgmTerm()`, when it finds that a term has requested auxiliaries, attaches an attribute `attr(., "aux.slots")` containing an integer vector for the term’s own and/or requested auxiliaries’ positions on the `aux_storage` vector.
  - It calls `ergm.auxstorage()` with the complete model.
    - It iterates through initialized model terms, looking at their `auxiliaries` element for a one-sided formula listing their requested auxiliaries.
    - It iterates through the auxiliary formulas, calling term initialization on them.
    - It iterates through the `auxiliaries` elements of auxiliary terms and initializes those, and so on.
    - It constructs a list of unique initialized auxiliary terms (that is, when two or more terms had requested an identical auxiliary, it is kept only once).
    - It inserts the initialized unique auxiliary terms in `model$terms` and the index (in the unique list) of the auxiliary requested by each statistic in the `aux.slots` of the requesting statistic.
    - For an auxiliary itself, the first element of `aux.slots` is set to the position of the auxiliary itself.
- The `ergm_state` is constructed from an edgelist (`state$el`), an empty network (`state$nw0`), a model (`state$model`), and (optionally) a proposal (`state$proposal`) and a statistics vector (`state$stats`).
- `update.ergm_state()` is called.
  - It iterates through the formula terms, calling `term$ext.encode()` (if defined) to construct a vector `state$ext.state`. `state$ext.flag` is set to reconciled.
- The `ergm_state` is passed to the `C` code.
- `Redgelist2Network()` initializes the network.
- `ModelInitialize()` is called:
  - `ModelInitialize()` initializes all terms (statistics and auxiliaries), also counting up the number of auxiliaries (distinguished by having no `c_`, `d_`, or `s_` functions). A term can export both a `c_` function and a `d_` function. In that case, it is responsible for deleting one of them when its `i_` function is called.
  - It counts up the number of auxiliaries and allocates an array of `void *`s, one for each auxiliary. The `mtp->aux_storage` pointer for each term is set to point to that (one) array.
  - It assigns `attr(model$terms[[i]], "aux.slots")` to `mtp->aux_slots`.
  - It assigns the `ergm_model` `SEXP` to `model->R`, in case it’s needed.
  - It assigns the `ergm_model$terms[[i]]` `SEXP` to `mtp->R`, in case it’s needed.
  - It calls `InitStats()`, which calls the initializer (`i_` function) of each term or, if not found, an updater (`u_` function) with invalid input (i.e., toggle \((0,0)\)), in case the term developer prefers a one-function implementation.
    - The terms are initialized in reverse order, so auxiliaries are initialized before the statistics are, and statistics and auxiliaries can count on their auxiliaries being initialized by the time they are initialized.

- For each iteration, a proposal is made.
- `ChangeStats()` is called.
  - It calls the `d_` functions, for those terms for which they are initialized.
  - It iterates through the proposed toggles.
    - It calls the `c_` functions.
    - It adds their output to the cumulative change.
    - It calls the `u_` functions with the toggle (if more to come).
    - It makes the toggle provisionally (if more to come).
  - It undoes the provisional toggles.
- If the proposal is accepted, the `u_` function is called, and the network is updated for each toggle.

- `ModelDestroy()` is called:
  - `DestroyStats()` is called, iterating through the terms.
    - The finalizer (`f_`) function is called, if defined.
    - If `mtp->storage` is not `NULL`, it is freed.
  - Other parts of the model are freed; in particular those of `mtp->aux_storage` if not `NULL`.

- `ErgmStateRSave()` is called:
  - `Network2Redgelist()` is called, returning a `SEXP` with the state.
  - For each term, a writer (`w_` function) is called if defined, returning a `SEXP` with the extended state.
  - These are stored in a vector `state$ext.state`. Element `state$ext.flag` is set to signal that a change was made on the `C` side.

- `NetworkDestroy()` is called.
- States and other outputs are passed back to `R`.
- `update.ergm_state()` is called.
  - It iterates through the formula terms, calling `term$ext.decode()` (if defined) to update `state$nw0` or other aspects of the network. `state$ext.flag` is set to reconciled.
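The `C`-side part of the walkthrough above can be condensed into a toy driver. The sketch below is not `ergm` code: `ToyNet`, `ToyTerm`, `toy_init_stats()`, and `toy_change_stats()` are hypothetical stand-ins, and the network is reduced to a dense adjacency matrix. It illustrates two points from the list: terms are initialized last to first, falling back to the updater with a \((0,0)\) toggle, and a multi-toggle proposal is evaluated by applying all but the last toggle provisionally and then undoing them. Calling the updaters again while undoing is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_N 4 /* vertices are numbered 1..TOY_N */

/* Hypothetical stand-in for the network: a dense directed adjacency matrix. */
typedef struct { int adj[TOY_N + 1][TOY_N + 1]; } ToyNet;

/* Hypothetical stand-in for ModelTerm, reduced to what this sketch needs. */
typedef struct ToyTerm {
  void (*i_func)(struct ToyTerm *mtp, ToyNet *nwp);
  void (*u_func)(int tail, int head, struct ToyTerm *mtp, ToyNet *nwp, int edgestate);
  void (*c_func)(int tail, int head, struct ToyTerm *mtp, ToyNet *nwp, int edgestate);
  double dstats;  /* a single statistic per term, for brevity */
  int init_order; /* records when the term was initialized */
} ToyTerm;

/* Initialize terms last to first, so that auxiliaries (stored after the
   statistics) are ready before the terms that requested them. */
void toy_init_stats(ToyTerm *terms, size_t nterms, ToyNet *nwp) {
  int order = 0;
  for (size_t k = nterms; k > 0; k--) {
    ToyTerm *mtp = &terms[k - 1];
    if (mtp->i_func) mtp->i_func(mtp, nwp);
    else if (mtp->u_func) mtp->u_func(0, 0, mtp, nwp, 0); /* (0,0) means "initialize" */
    mtp->init_order = order++;
  }
}

/* Let every updater see the toggle, then apply it to the network. */
static void toy_update_and_toggle(ToyTerm *terms, size_t nterms, ToyNet *nwp,
                                  int tail, int head) {
  int edgestate = nwp->adj[tail][head];
  for (size_t k = 0; k < nterms; k++)
    if (terms[k].u_func) terms[k].u_func(tail, head, &terms[k], nwp, edgestate);
  nwp->adj[tail][head] = !edgestate;
}

/* Accumulate each term's change over a multi-toggle proposal into change[]. */
void toy_change_stats(ToyTerm *terms, size_t nterms, ToyNet *nwp,
                      const int *tails, const int *heads, size_t ntoggles,
                      double *change) {
  assert(change != NULL);
  for (size_t k = 0; k < nterms; k++) change[k] = 0;
  for (size_t i = 0; i < ntoggles; i++) {
    int edgestate = nwp->adj[tails[i]][heads[i]];
    for (size_t k = 0; k < nterms; k++) {
      if (!terms[k].c_func) continue; /* auxiliaries have no c_ function */
      terms[k].dstats = 0;
      terms[k].c_func(tails[i], heads[i], &terms[k], nwp, edgestate);
      change[k] += terms[k].dstats;
    }
    if (i + 1 < ntoggles) /* more toggles to come: apply provisionally */
      toy_update_and_toggle(terms, nterms, nwp, tails[i], heads[i]);
  }
  /* Undo the provisional toggles (all but the last), newest first. */
  for (size_t i = ntoggles; i > 1; i--)
    toy_update_and_toggle(terms, nterms, nwp, tails[i - 2], heads[i - 2]);
}

/* Example statistic: the edge count changes by -1 or +1 per toggle. */
void toy_c_edges(int tail, int head, ToyTerm *mtp, ToyNet *nwp, int edgestate) {
  (void)tail; (void)head; (void)nwp;
  mtp->dstats = edgestate ? -1 : +1;
}
```

After `toy_change_stats()` returns, the network is back in its pre-proposal state, which is what allows the proposal to be rejected at no further cost.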

`R` side

The following is adapted from the header of `R/InitErgmTerm.R`.

`InitErgmTerm.*` and `InitWtErgmTerm.*` functions

The following are the minimal arguments of these functions:

`function(nw, arglist, ...)`

`nw`: a `network` object taken from the LHS of the model formula or the `basis=` argument (if given); valued networks are instrumented with `%ergmlhs% "response"` information.

`arglist`: a `list` of arguments passed to the term call on the formula. If arguments are passed with names, the list is named. Typically, the helper function `check.ErgmTerm()` is used to preprocess this list with the equivalent of R’s `match.call()`. Notably, thanks to R’s lazy evaluation, it is possible to obtain the argument expressions without evaluating them by calling `substitute(arglist)`. See `InitErgmTerm.:` and `InitErgmTerm.*` in `R/InitErgmTerm.interaction.R` for examples.

`env`: an environment, typically of the `formula` from which the term was extracted.

Any term options are passed as direct arguments (not `arglist`). See `options?ergm` for details.

Three return types are possible:

- `NULL`, to indicate that this term does not add to the model. (E.g., `nodefactor("a", levels=FALSE)`.)
- An `ergm_model` object, in which case its terms are “pasted” into the model. This is useful for some term operators.
- A named `list` defining the properties of the term. The following names have special meanings, but any other elements can be included and will be accessible on the `C` side.

In the following, let \(p = \dim(\eta) = \dim(g(y))\), the dimension of the statistic \(g(y)\) being computed and of the canonical parameter \(\eta\). If the model is curved, the model parameter vector \(\theta\) is first mapped onto \(\eta=\eta(\theta)\); otherwise, \(\eta\equiv\theta\). Let \(q=\dim(\theta)\).

`name` (required): a string containing the term’s `C`-side name: `ergm` will search for `name` prepended with `"c_"` for change statistics, `"d_"` for difference, `"s_"` for summary, etc. This is the only required element.

`coef.names`: a character vector of length \(p\) of names for the elements of the canonical statistic of the model (and canonical parameters); can be absent or zero-length for auxiliaries.

`inputs`: a vector of (double-precision) numeric inputs that will be made available to the `C`-side implementation as a `double` vector; optionally, `inputs` may have an attribute named `"ParamsBeforeCov"`, which is used primarily for backwards compatibility, but can also be used to more conveniently separate, e.g., metadata elements from vertex attribute elements.

`iinputs`: a vector of integer inputs that will be made available to the `C`-side implementation as an `int` vector; optionally, `iinputs` may have an attribute named `"ParamsBeforeCov"`, which is used primarily for backwards compatibility, but can also be used to more conveniently separate, e.g., metadata elements from vertex attribute elements.

`pkgname`: a string containing the name of the `R` package containing the `C` implementation; if not specified, the package in which the `Init*ErgmTerm.*` function was found is assumed.

`dependence`: a logical value indicating whether the addition of this term to the model induces dyadic dependence; if all terms have `dependence` set to `FALSE`, the model is inferred to be dyad-independent; if not specified, `TRUE` is assumed.

`emptynwstats`: a numeric vector of length \(p\) providing the value of the statistic if evaluated on an empty network; if not specified or `NULL`, assumed to be a vector of zeros. (See `InitErgmTerm.degree()` in `R/InitErgmTerm.R` for an example.)

`minpar` and `maxpar`: numeric vectors of length \(q\) giving the bounds on the valid values for the model’s parameters; if not specified, `-Inf` and `+Inf` vectors are assumed.

`offset`: a logical vector of length \(q\) that allows the term to mark some of its own statistics as having fixed parameters.

For curved terms, all of the following must be present except `cov`:

`params`: a list whose names correspond to element names of the curved parameter vector \(\theta\); the items in the list are there for historical reasons and are ignored.

`map`: a function taking at least two arguments, `x` (a numeric \(q\)-vector containing \(\theta\)) and `n` \(=p\) (the length of the output), and an optional `cov` parameter; it is to return a numeric vector of length `n` containing \(\eta(\theta)\).

`gradient`: a function taking the same arguments as `map` and returning the gradient of `map` (\(\eta'(\theta)\)) as a \(q\times p\) numeric matrix.

`cov`: an optional arbitrary data structure that, if present, is passed to `map` and `gradient`.
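To make the dimensions concrete, consider a hypothetical curved term (an illustration only, not a term shipped with `ergm`) with \(q=2\) parameters \(\theta=(\theta_1,\theta_2)\) and \(p\) canonical statistics whose coefficients decay exponentially. Under that assumption, `map` would return the \(p\)-vector on the left and `gradient` the \(2\times p\) matrix on the right, with row \(i\) holding the derivatives with respect to \(\theta_i\):

```latex
\eta_k(\theta) = \theta_1 e^{-\theta_2 k}, \quad k = 1, \dots, p,
\qquad
\eta'(\theta) =
\begin{pmatrix}
e^{-\theta_2} & e^{-2\theta_2} & \cdots & e^{-p\theta_2} \\
-\theta_1 e^{-\theta_2} & -2\theta_1 e^{-2\theta_2} & \cdots & -p\,\theta_1 e^{-p\theta_2}
\end{pmatrix}
```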

`C` side

`ModelTerm` and `WtModelTerm` data structures

The data structure contains information accessible to the `C`-side statistics. Here, `outlist` refers to the list returned by the `Init*ErgmTerm.*` function.

A number of its elements are for internal use and should generally not be considered a part of the API or accessed by the term (with some rare exceptions). These include: `c_func`, `d_func`, `i_func`, `u_func`, `f_func`, `s_func`, `w_func`, `x_func`, `z_func`, `statspos`, `statcache`, `emptynwstats`.

Furthermore, elements `aux_storage` and `aux_slots` should not be accessed directly but only with the helpers described in the auxiliary storage API below.

The following elements are a part of the API:

`double *attrib` (`INPUT_ATTRIB`, `DINPUT_ATTRIB`): contents of `outlist$inputs`, shifted by `ParamsBeforeCov` attribute if given. (Rarely used.)

`int *iattrib` (`IINPUT_ATTRIB`): contents of `outlist$iinputs`, shifted by `ParamsBeforeCov` attribute if given. (Rarely used.)

`int nstats` (`N_CHANGE_STATS`): value of \(p\), the length of the statistic vector.

`double *dstats` (`CHANGE_STAT`): a \(p\)-vector to be overwritten with statistic value.

`int ninputparams` (`N_INPUT_PARAMS`, `N_DINPUT_PARAMS`): length of `outlist$inputs`.

`double *inputparams`: contents of `outlist$inputs`.

`int niinputparams` (`N_IINPUT_PARAMS`): length of `outlist$iinputs`.

`int *iinputparams` (`IINPUT_PARAMS`): contents of `outlist$iinputs`.

`void *storage` (`STORAGE`): a pointer managed by the term to its private storage space.

`unsigned int n_aux` (`N_AUX`): number of auxiliaries associated with this term; typically the number requested, plus one if the term is itself an auxiliary.

`SEXP R`: the contents of `outlist` as an `R` expression.

`SEXP ext_state`: location of the extended state information for the term. See below.

`c_`, `d_`, and `s_` functions

`d_` functions are the original difference statistics. `c_` functions are new, while `s_` functions have been around for a long time, but never formally documented.

Change statistic (binary):
`void c_<NAME>(Vertex tail, Vertex head, ModelTerm *mtp, Network *nwp, Rboolean edgestate)`

Change statistic (valued):
`void c_<NAME>(Vertex tail, Vertex head, double weight, WtModelTerm *mtp, WtNetwork *nwp, double edgestate)`

Difference statistic (binary):
`void d_<NAME>(Vertex *tails, Vertex *heads, ModelTerm *mtp, Network *nwp)`

Difference statistic (valued):
`void d_<NAME>(Vertex *tails, Vertex *heads, double *weights, WtModelTerm *mtp, WtNetwork *nwp)`

Summary statistic (binary):
`void s_<NAME>(ModelTerm *mtp, Network *nwp)`

Summary statistic (valued):
`void s_<NAME>(WtModelTerm *mtp, WtNetwork *nwp)`

**Parameters**

**Note**: In undirected networks, it can be assumed that `tail` < `head` (and similarly with the multiple toggles). In bipartite networks, `tail`s are in the first partition and `head`s are in the second.

`Edge ntoggles`: Number of edges to be toggled or updated.

`Vertex tail`: Tail of (1) dyad to be toggled or updated.

`Vertex *tails`: An array of tails of the dyads to be toggled or updated.

`Vertex head`: Head of (1) dyad to be toggled or updated.

`Vertex *heads`: An array of heads of the dyads to be toggled or updated.

`double weight`: New weight for (1) dyad.

`double *weights`: An array of new weights for the dyads to be toggled or updated.

`ModelTerm *mtp`: A pointer to the `ModelTerm` data structure. See `inst/include/ergm_changestat.h` and `inst/include/ergm_wtchangestat.h` for details.

`Network *nwp`: A pointer to the `Network` of interest before any toggles are applied.

`Rboolean edgestate`: An indicator of whether edge `(tail,head)` is in the network `nwp` pre-toggle.

`double edgestate`: The weight of dyad `(tail,head)` in the network `nwp` pre-update.

**Storage**

All functions except for `s_` expect any storage they need to be initialized and up to date (consistent with `nwp`). In particular, if their statistic requested \(k\) auxiliary terms, the \(k\) (`mtp->n_aux`) elements of its `mtp->aux_slots` vector will be the indexes of `mtp->aux_storage` where they can find the respective objects.

It is worth noting that macros defined for `d_` functions that refer to a specific toggle, such as `TAIL`, `HEAD`, etc., might not be usable in a `c_` function, but this is made up for by the `c_` function’s reduced need for bookkeeping: `tail`, `head`, etc. can be used directly.

These functions overwrite `mtp->dstats` (often aliased as `CHANGE_STAT`) with the following:

- `c_` and `d_` functions: change of the value of the statistic they implement relative to `nwp` due to the toggles.
- `s_` functions: the value of the statistic they implement.
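The relationship between the two kinds of output can be shown with a self-contained sketch. The types `ToyNetwork` and `ToyModelTerm` and the term name `toy_mutual` are hypothetical simplifications (the real functions take `ModelTerm *`, `Network *`, and `Rboolean`), and the statistic, the number of mutual dyads in a directed network, is chosen only for illustration. The `c_`-style function writes the change caused by one toggle; the `s_`-style function writes the value itself.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_N 3 /* vertices are numbered 1..TOY_N */

typedef unsigned int Vertex; /* hypothetical stand-in for ergm's Vertex */
typedef struct { bool adj[TOY_N + 1][TOY_N + 1]; } ToyNetwork;
typedef struct { int nstats; double *dstats; } ToyModelTerm;

/* c_-style: overwrite dstats with the change in the number of mutual dyads
   that toggling (tail, head) would cause, given the pre-toggle edgestate. */
void c_toy_mutual(Vertex tail, Vertex head, ToyModelTerm *mtp,
                  ToyNetwork *nwp, bool edgestate) {
  double change = 0;
  if (nwp->adj[head][tail])      /* the reverse edge is present */
    change = edgestate ? -1 : +1; /* removing breaks, adding forms, a mutual dyad */
  mtp->dstats[0] = change;
}

/* s_-style: overwrite dstats with the statistic's value on the whole network. */
void s_toy_mutual(ToyModelTerm *mtp, ToyNetwork *nwp) {
  double count = 0;
  for (Vertex i = 1; i <= TOY_N; i++)
    for (Vertex j = i + 1; j <= TOY_N; j++)
      if (nwp->adj[i][j] && nwp->adj[j][i]) count++;
  mtp->dstats[0] = count;
}
```

A useful sanity check when writing a term is that the `s_` value after a toggle equals the `s_` value before it plus the `c_` output for that toggle.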

Every `ergm` term has private storage, found at `void *mtp->storage`, which allows it to store arbitrary information about the state of the network, as well as precalculated values of variables, preallocated memory it needs for its calculations, or any other use. It does so by specifying an updating function (and, optionally, an initialization and a finalization function). This updating function is called every time the network is about to change. The API for these functions is defined below.

Public storage is found at `void **mtp->aux_storage`. Each auxiliary term gets assigned a slot (i.e., `void *nwp->mtp->aux_storage[i]`) to manage; its slot number is the first element of its input vector, and terms requesting it are told which slot to look in in a similar fashion. An auxiliary term that requests other auxiliaries will have its own slot as the first input and the slots of auxiliaries it requests as subsequent inputs.

`R` side

A statistic that only references its private storage *or* is an auxiliary itself does not need to do anything special on the `R` side.

To request an auxiliary, a term’s `InitErgmTerm` call’s output list must include an `auxiliaries` element containing a one-sided `ergm`-style formula listing the auxiliary terms it wishes to use separated by the `+` operator.

`C` side: Modifying Storage

`i_` functions: Initializer/Constructor

This function is optional for using storage: if it’s not provided, the model code will call the `u_` function with an invalid toggle first, signaling for it to initialize.

Binary:
`void i_<NAME>(ModelTerm *mtp, Network *nwp)`

Valued:
`void i_<NAME>(WtModelTerm *mtp, WtNetwork *nwp)`

In general, an `i_` function expects to be called after `ModelInitialize()` and `NetworkInitialize()`, before any `c_` or `d_` functions. That is, the network must be populated with the ties of its initial state and have the `mtp->aux_storage` vector allocated.

**Private storage**

Expects a network populated with initial ties and an initialized model.

**Public storage**

The first element of `mtp->aux_slots` is the index of the element of `mtp->aux_storage` to be managed by this auxiliary. That is, `mtp->aux_storage[mtp->aux_slots[0]]` is a `void *` to point to the data to be made public.

The other data passed from the `InitErgmTerm.` are shifted over to make room for it.

**Private storage**

Allocates memory for the information to be stored and overwrites `mtp->storage` with a pointer to it, then updates the stored information to be consistent with `*nwp`.

**Public storage**

Allocates memory for the information to be stored and overwrites `mtp->aux_storage[mtp->aux_slots[0]]` with a pointer to it, then updates the stored information to be consistent with `*nwp`.

An auxiliary can also use its private storage as needed.

`u_` functions: Updater

Binary:
`void u_<NAME>(Vertex tail, Vertex head, ModelTerm *mtp, Network *nwp, Rboolean edgestate)`

Valued:
`void u_<NAME>(Vertex tail, Vertex head, double weight, WtModelTerm *mtp, WtNetwork *nwp, double edgestate)`

Expects an initialized network. If no `i_` function was provided, it is first called with a \((0,0)\) toggle as a signal to initialize; otherwise, it expects initialized storage. Any statistic or auxiliary can rely on its auxiliaries having been initialized before it.

If called with a toggle `(0,0)` and uninitialized storage, it initializes. This will never be done if an `i_` function is defined for the term.

Otherwise, it updates the state of its storage (`mtp->storage` and/or `mtp->aux_storage[mtp->aux_slots[0]]`) to match what the state of `*nwp` would be after the given dyad had been toggled.
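The one-function style described above can be sketched as follows. This is a minimal illustration with hypothetical simplified types (`ToyNetwork`, `ToyModelTerm`) and a hypothetical term name `toy_outdeg`, not the signatures from the `ergm` headers: the updater keeps a per-vertex out-degree table in private storage, builds it when called with the \((0,0)\) toggle, and otherwise adjusts it to what the network will look like after the toggle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_N 3 /* vertices are numbered 1..TOY_N */

typedef unsigned int Vertex; /* hypothetical stand-in for ergm's Vertex */
typedef struct { bool adj[TOY_N + 1][TOY_N + 1]; } ToyNetwork;
typedef struct { void *storage; } ToyModelTerm; /* private storage only */

/* u_-style updater that doubles as the initializer. */
void u_toy_outdeg(Vertex tail, Vertex head, ToyModelTerm *mtp,
                  ToyNetwork *nwp, bool edgestate) {
  if (tail == 0 && head == 0) { /* the invalid toggle signals "initialize" */
    if (mtp->storage == NULL) {
      int *deg = calloc(TOY_N + 1, sizeof *deg);
      assert(deg != NULL);
      for (Vertex i = 1; i <= TOY_N; i++)
        for (Vertex j = 1; j <= TOY_N; j++)
          if (nwp->adj[i][j]) deg[i]++;
      mtp->storage = deg;
    }
    return;
  }
  /* Called before the network changes: edgestate is the pre-toggle state. */
  int *deg = mtp->storage;
  deg[tail] += edgestate ? -1 : +1;
}

/* f_-style finalizer: release the storage and clear the pointer. */
void f_toy_outdeg(ToyModelTerm *mtp, ToyNetwork *nwp) {
  (void)nwp;
  free(mtp->storage);
  mtp->storage = NULL;
}
```

Because the updater runs before the toggle is applied, it must rely on `edgestate` rather than re-reading the dyad from the network after the fact.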

`f_` functions: Finalizer/Destructor

This function is optional for using storage: if it’s not provided, the model code will free any pointers to `mtp->aux_storage` and `mtp->storage` that are not `NULL`.

Binary:
`void f_<NAME>(ModelTerm *mtp, Network *nwp)`

Valued:
`void f_<NAME>(WtModelTerm *mtp, WtNetwork *nwp)`

Expects a network and a model.

Deallocates its storage (`mtp->storage` and/or `mtp->aux_storage[mtp->aux_slots[0]]`) and sets its pointers to `NULL`.

`C` side: Accessing Storage

`c_`, `d_`, and `s_` functions can read from, but not write to, their private storage. `c_` and `d_` functions can rely on initialization having been called before.

Auxiliaries must not implement `c_`, `d_`, or `s_` functions.

Terms requesting one or more auxiliaries will be passed the indices of the elements of `mtp->aux_storage` by inserting them at the start of `mtp->aux_slots`. That is, `mtp->aux_storage[mtp->aux_slots[0]]` is a `void *` pointing to the data made public by the first auxiliary term on the `auxiliaries` formula, `mtp->aux_storage[mtp->aux_slots[1]]` is the second, etc.
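The indexing convention can be illustrated with a small self-contained sketch. Everything here is a hypothetical simplification: `ToyModelTerm` keeps only the fields discussed above, the auxiliary `toy_edgecount_aux` publishes a single integer, and the statistic `toy_edges_squared` (the squared edge count) exists only to give the `c_`-style function something to read. In real terms the helper macros described in this document should be used instead of touching `aux_storage` directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ModelTerm with only the storage-related fields. */
typedef struct {
  void **aux_storage;      /* one array shared by every term of the model */
  unsigned int *aux_slots; /* this term's indices into aux_storage */
  unsigned int n_aux;
  double dstats;           /* a single statistic, for brevity */
} ToyModelTerm;

/* i_-style initializer of the auxiliary: it owns the slot named by its first
   aux_slots element and publishes the current edge count there. */
void i_toy_edgecount_aux(ToyModelTerm *mtp, int initial_edges) {
  int *count = malloc(sizeof *count);
  assert(count != NULL);
  *count = initial_edges;
  mtp->aux_storage[mtp->aux_slots[0]] = count;
}

/* c_-style function of a statistic that requested the auxiliary: it reads the
   published count (never writes it) to compute the change in edges^2. */
void c_toy_edges_squared(ToyModelTerm *mtp, int edgestate) {
  const int *count = mtp->aux_storage[mtp->aux_slots[0]];
  int e = *count;
  int e_new = edgestate ? e - 1 : e + 1;
  mtp->dstats = (double)(e_new * e_new - e * e);
}

/* f_-style finalizer of the auxiliary. */
void f_toy_edgecount_aux(ToyModelTerm *mtp) {
  free(mtp->aux_storage[mtp->aux_slots[0]]);
  mtp->aux_storage[mtp->aux_slots[0]] = NULL;
}
```

Both terms hold the same `aux_storage` pointer; only their `aux_slots` vectors differ, which is what lets several statistics share one auxiliary.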

`x_` functions: eXtensions

This interface is intended to be used by packages extending `ergm` to send arbitrary signals to statistics and auxiliaries. For example, for temporal ERGMs, it may be used to signal to the statistic that the clock is about to advance. It is the responsibility of the extension writer to ensure that everything behaves sensibly.

Binary:
`void x_<NAME>(unsigned int type, void *data, ModelTerm *mtp, Network *nwp)`

Valued:
`void x_<NAME>(unsigned int type, void *data, WtModelTerm *mtp, WtNetwork *nwp)`

**Parameters**

`type`: a magic constant identifying the type of signal being sent. Based on it, the function can ignore the signal, or determine how to interpret `data`.

`data`: arbitrary data to be sent to the function; it is up to the extension writer to determine how it is formatted and interpreted.

There are no restrictions on side-effects. It is up to the extension writer to ensure that everything works.

The following helper macros have been defined to date. They are defined in `ergm_storage.h` and exported.

`ALLOC_STORAGE(nmemb, stored_type, store_into)`: Allocate a vector of `nmemb` elements of type `stored_type`, save its pointer to private storage and also to a `stored_type *store_into` which is also declared. Should be used by the `i_` function, but may also be used by the `u_` function.

`GET_STORAGE(stored_type, store_into)`: Declare `stored_type *store_into` and assign the pointer to private storage to it. Can be used by all functions.

`ALLOC_AUX_STORAGE(nmemb, stored_type, store_into)`: Allocate a vector of `nmemb` elements of type `stored_type`, and save it to the auxiliary storage slot belonging to the calling auxiliary and into a `stored_type *store_into` which is also declared. Can be used by the `i_` function, but may also be used by the `u_` function.

`GET_AUX_STORAGE(stored_type, store_into)`: Declare `stored_type *store_into` and assign the pointer to the auxiliary storage (either for a statistic or for the auxiliary). Can be used by all functions.

`GET_AUX_STORAGE_NUM(stored_type, store_into, ind)`: Declare `stored_type *store_into` and assign the pointer to the `ind`th auxiliary. Can be used by clients of auxiliaries.

`ALLOC_AUX_SOCIOMATRIX(stored_type, store_into)`: Allocate an array of appropriate dimension with elements of type `stored_type`, save it to auxiliary storage, and into `stored_type **store_into`, so that `store_into[i][j]` returns the value associated with dyad \((i,j)\), with vertices indexed from 1. For bipartite and undirected networks, as little space as possible (resp. a rectangle or a triangle) is allocated.

Note that this macro assumes that the private and the public storage of the calling term are not used in any other way.

`FREE_AUX_SOCIOMATRIX`: Frees the sociomatrix allocated by `ALLOC_AUX_SOCIOMATRIX`.
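The idea of a sociomatrix addressed as `store_into[i][j]` with vertices indexed from 1 can be sketched in plain C. This is an illustration of the access pattern only, not the implementation behind `ALLOC_AUX_SOCIOMATRIX`: the function names are hypothetical, only the directed case is handled, and row and column 0 are left as unused padding rather than minimizing memory as the macro is described to do.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a zeroed (n+1) x (n+1) table so that m[i][j] is valid for
   1 <= i, j <= n. Row 0 and column 0 are padding that is never used. */
int **toy_alloc_sociomatrix(unsigned int n) {
  int **rows = malloc((n + 1) * sizeof *rows);
  int *cells = calloc((size_t)(n + 1) * (n + 1), sizeof *cells);
  if (rows == NULL || cells == NULL) {
    free(rows);
    free(cells);
    return NULL;
  }
  for (unsigned int i = 0; i <= n; i++)
    rows[i] = cells + (size_t)i * (n + 1); /* row i starts here */
  return rows;
}

/* Free both the cell block (which starts at row 0) and the row pointers. */
void toy_free_sociomatrix(int **m) {
  if (m == NULL) return;
  free(m[0]);
  free(m);
}
```

An auxiliary built this way would store the returned `int **` in its public slot during initialization and release it in its finalizer.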

MHproposals may also request auxiliary terms. An `InitErgmProposal.<NAME>()` or `InitWtErgmProposal.<NAME>()` with an `auxiliaries` formula will similarly receive the positions of its auxiliaries’ slots in the network. However, this appears to have a slight cost in speed and a potentially significant cost in memory, since the auxiliary may need to duplicate the information in the `MH_` function.

Functions prefixed with `Mi_`, `Mu_`, and `Mf_` serve as, respectively, the initializers, the updaters, and the finalizers of the MHproposal storage, though the old-style call with `MHp->ntoggles==0` is also supported. Macros in the `MHstorage.h` header file can be used to access storage the same way as for the statistics.

The function called to generate the proposal can have a prefix of either `MH_` (for backwards compatibility) or `Mp_` for consistency.

One important difference is that an `Mp_` function *is* permitted to write to its private storage. This may be useful if, say, a systematic sample is desired.
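A systematic sample of this kind can be sketched as follows. The types `ToyMHProposal` and `ToyCursor`, the proposal name `toy_systematic`, and the extra `n` argument are all hypothetical; the real `Mi_`, `Mp_`, and `Mf_` functions take the proposal and network structures defined in the `ergm` headers. The point of the sketch is only that the proposal function advances a cursor kept in its private storage on every call.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int Vertex; /* hypothetical stand-in for ergm's Vertex */

/* Hypothetical stand-in for the proposal structure. */
typedef struct {
  void *storage;                 /* private storage of the proposal */
  Vertex toggletail, togglehead; /* the single proposed toggle */
  unsigned int ntoggles;
} ToyMHProposal;

typedef struct { Vertex next_tail, next_head; } ToyCursor;

/* Mi_-style initializer: start the sweep at dyad (1,2). */
void Mi_toy_systematic(ToyMHProposal *MHp) {
  ToyCursor *cur = malloc(sizeof *cur);
  assert(cur != NULL);
  cur->next_tail = 1;
  cur->next_head = 2;
  MHp->storage = cur;
  MHp->ntoggles = 1;
}

/* Mp_-style proposal: propose the dyads of an undirected n-vertex network in
   the order (1,2), (1,3), ..., (n-1,n), then wrap around. Unlike a c_ or d_
   function, it writes to its private storage to remember its position. */
void Mp_toy_systematic(ToyMHProposal *MHp, Vertex n) {
  ToyCursor *cur = MHp->storage;
  MHp->toggletail = cur->next_tail;
  MHp->togglehead = cur->next_head;
  if (++cur->next_head > n) {                   /* row exhausted */
    cur->next_tail++;
    if (cur->next_tail >= n) cur->next_tail = 1; /* sweep finished: restart */
    cur->next_head = cur->next_tail + 1;
  }
}

/* Mf_-style finalizer. */
void Mf_toy_systematic(ToyMHProposal *MHp) {
  free(MHp->storage);
  MHp->storage = NULL;
}
```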