FVDDPpkg

Introduction

Before starting, it should be mentioned that the sole purpose of this vignette is to provide intuitive and easily replicable instructions on how to use the FVDDPpkg package on . For this reason, the underlying theory will not be developed except in the minimum necessary terms; for more information, please refer to the bibliography cited here.

First of all, import the package.

library(FVDDPpkg)

As shown in the work of Papaspiliopoulos, Ruggiero, and Spanò (2016) Fleming-Viot Dependent Dirichlet Processes (FVDDP), conditioned on observed data \(Y\), take the general form of finite mixtures of Dirichlet Processes: in fact \[X_t \ | \ Y \sim \sum_{\mathbf{m} \in M} w_\mathbf{m}\Pi_{\alpha + \sum_{j=1}^K m_j \delta_{y_j^\ast}}\] where

\(X_t\) is the state of the process, also denoted as latent signal in the related literature. Its dependence on the time \(t\) is crucial, since the aim of this package is to perform the computations required to estimate its state at different times.
\(\mathbf{y}^\ast = (y_1^\ast, \dots, y_K^\ast)\) is a \(K\)-dimensional vector containing all observed unique values.
\(M\) is a set of multiplicities; its elements are \(K\)-dimensional integer-valued vectors in the form \(\mathbf{m}= (m_1, \dots, m_K)\). Intuitively, they can be thought as their \(j\)-th entry counts the occurences of the \(j\)-th unique value, \(y_j^\ast\).
\(w_\mathbf{m}\) is the weight associated to the vector \(\mathbf{m}\in M\). By definition, weights are positive and sum up to \(1\).
\(\Pi_\alpha\) denotes the law of a Dirchet process, as introduced by Ferguson.
\(\alpha\) is the characterizing measure of a Dirichlet Process. In turn, it can be expressed as \(\alpha = \theta P_0\), where the real number \(\theta\) is the intensity, and the probability measure \(P_0\) gives the centering. For a wide review of the Dirichlet Process and its properties, refer to Ghosal and Vaart (2017).
\(\delta_y\) is a Dirac measure who puts mass on the point \(y\).

The derivation of this model, which stems from the study of population genetics, is done by exploiting the concept of duality for Markov processes (Papaspiliopoulos and Ruggiero (2014)), applying it to the results of Ethier and Griffiths (1993), among the others, on the seminal work of Fleming and Viot (1979).

Initialization

In order to understand how to recover the previous expression, start noting that, unconditional on observed data \[X_{t_0} \sim \Pi_{\alpha},\] where the time is arbitrarily set \(t=t_0\). This means that, while no data is included within the model, FVDDP can be fully characterized by \(\theta\) and \(P_0\). The creation of a process is carried out using the function initialize The user has to specify the positive real number theta and two functions, the first to sample from \(P_0\) (sampling.f) and the second to evaluate its p.d.f. or p.m.f. (density.f), depending whether it is atomic or not; this is specified by the last argument, atomic. The function returns an object of class fvddp.

FVDDP = initialize(theta = 1.28, sampling.f = function(x) rpois(x, 5),
                   density.f = function(x) dpois(x, 5), TRUE)

FVDDP
#> theta: 1.28 
#> 
#> P0.sample: function(x) rpois(x, 5)
#> 
#> P0.density/mass: function(x) dpois(x, 5)
#> 
#> is.atomic: TRUE 
#> 
#> Empty Process

In this chunk of code, for example, \(\theta = 1.28\) and \(P_0 \sim \mathrm{Po(5)}\). Note that when printing the process, it is explicitly stated that no data has been included within the model.

Update

Updating the process with data collected at time \(t_0\) stored in the vector \(Y_0\), the form of the latent signal becomes \[ X_{t_0} \ | \ Y_0 \sim \Pi_{\alpha + \sum_{j=1}^K m_j \delta_{y_j^\ast}}. \] In some sense, this already is the mixture expressed in the general formula, under the specification that the vector \(\mathbf{y}^\ast\) collects the unique values observed at time \(t_0\), \(\mathbf{m}\) stores the multiplicities of \(Y_0\) with respect to \(\mathbf{y}^\ast\), and \(M = \{ \mathbf{m}\}\): this implies that \(w_\mathbf{m}= 1\).

The update is performed by means of the update() command, whose arguments are an fvddp object and a numeric vector. The returned object will include the information provided by \(Y_0\). In particular:

\(\mathbf{y}^\ast\) is stored as an ordered vector in the attribute y.star.
\(M\) is stored as an integer-valued matrix M of size \(|M| \times K\), where \(|M|\) denotes the cardinality of \(M\) and \(K\) is the length of \(\mathbf{y}^\ast\). Each vector \(\mathbf{m}\) is stored as a row of M.
\(w\) is stored as a vector w of size \(|M|\). Its \(j\)-th element represents the weight associated to the \(j\)-th row of M.

FVDDP = update(fvddp = FVDDP, y.new = c(7, 4, 9, 7))

FVDDP
#> theta: 1.28 
#> 
#> P0.sample: function(x) rpois(x, 5)
#> 
#> P0.density/mass: function(x) dpois(x, 5)
#> 
#> is.atomic: TRUE 
#> 
#> Unique values (y.star): 
#> [1] 4 7 9
#> 
#> Multiplicities (M): 
#>      4 7 9
#> [1,] 1 2 1
#> 
#> Weights (w): 
#> [1] 1

As one could expect, updating the signal with the vector \((4,7,7,9)\) (the order of the input does not matter) leads to a degenerate mixture with \(\mathbf{y}^\ast = (4,7,9)\) and \(M = \{ (1,2,1)\}\).

Updating a non-empty process will have a different effect: suppose to know the law of \[X_{t_n} \ | \ Y_0, \dots, Y_{n-1} \sim \sum_{\mathbf{m} \in M} w_\mathbf{m}\Pi_{\alpha + \sum_{j=1}^K m_j \delta_{y_j^\ast}}\] where \(Y_j\) denotes a vector of values observed at time \(t_j\), then \[X_{t_n} | Y_0 \dots, Y_n \sim \sum_{\mathbf{m} \in (M + \mathbf{n})} w_\mathbf{m}^\phi \Pi_{\alpha + \sum_{j=1}^K m_j \delta_{y_j^\ast}}\] where \(\mathbf{n}= (n_1, \dots, n_K)\) is the vector of multiplicities of \(Y_n\) according to the unique values collected up to the same time, the new weights are such that \[w_{\mathbf{m}}^\phi \propto w_\mathbf{m}\mathrm{PU}(\mathbf{n}|\mathbf{m})\] where \(\mathrm{PU}(\mathbf{m}|\mathbf{n})\) denotes the probability of drawing a vector of multiplicities \(\mathbf{n}\) starting from \(\mathbf{m}\) via Polya urn sampling scheme under \(\theta\) and \(P_0\) specified by the model, and \[M + \mathbf{n}= \{ \mathbf{m}+ \mathbf{n}: \mathbf{m}\in M\}.\]. Hence, the following changes will be applied to the input process of the function:

if new values are observed, they are included in y.star.
the vector \(\mathbf{n}\) will be added to each row M.
the vector \(w\) ill be appropriately modified and normalized in order to sum up to one.

For the details of the role of Polya urn scheme on the update of mixtures of Dirichlet Processes, see Antoniak (1974) and Blackwell and MacQueen (1973).

Propagation

The propagation of the signal, also known as prediction, aims to estimate the state of the process at a time after the data is collected: in other words, in the future. If updating the signal one can get \(X_{t_n} \ | \ Y_0, \dots, Y_n\), the use of the propagation leads to \[X_{t_n + t}\ |\ Y_0, \dots, Y_n \sim \sum_{\mathbf{n} \in L(M)} w_\mathbf{n}^\psi \Pi_{\alpha + \sum_{j=1}^K n_j \delta_{y_j^\ast}}.\] This means that the probability mass is shifted to a set \[L(M) = \{ \mathbf{n}\in M : \exists \ \mathbf{m}\in M : \mathbf{n}\leq \mathbf{m}\}\] where the notation \(\mathbf{n}\leq \mathbf{m}\) implies that \(n_j \leq m_j \ \forall j \in \{1, \dots, K\}\). The new weights are such that \[w_\mathbf{n}^\phi = \sum_{\mathbf{m}\in M : \mathbf{n}\leq \mathbf{m}} w_\mathbf{m}p_{\mathbf{m}, \mathbf{n}}(t)\] and \(p_{\mathbf{m}, \mathbf{n}}(t)\) represents the probability of reaching \(\mathbf{n}\) starting from \(\mathbf{m}\) in a time \(t\) for a \(K\)-dimensional death process, as shown in Tavaré (1984); however, the exact value of such probability, stated in Papaspiliopoulos, Ruggiero, and Spanò (2016) is \[p_{\mathbf{m}, \mathbf{n}}(t)= \begin{cases} e^{-\lambda_{|\mathbf{m}|}t} \quad &\text{if} \ \mathbf{n}= \mathbf{m}\\ C_{|\mathbf{m}|, |\mathbf{n}|}(t) \mathrm{MVH} (\mathbf{n}; |\mathbf{n}|, \mathbf{m}) \quad &\text{if} \ \mathbf{n}< \mathbf{m}\\ \end{cases}\] where \(\lambda_{|\mathbf{m}|} = \frac{|\mathbf{m}|(\theta + |\mathbf{m}| -1)}{2}\) and \[C_{|\mathbf{m}|, |\mathbf{n}|}(t) = \big(\prod_{h=|\mathbf{n}| + 1}^{|\mathbf{m}|} \lambda_{h} \big) (-1)^{|\mathbf{m}|-|\mathbf{n}|} \sum_{k=|\mathbf{n}|}^{|\mathbf{m}|} \frac{e^{-\lambda_{k} t}}{\prod_{|\mathbf{n}| \leq h \leq |\mathbf{m}|, h \neq k }(\lambda_{k} - \lambda_{h})}\] and \(|\mathbf{m}|\) represents the \(L^1\) norm (i.e. the sum) of the vector \(\mathbf{m}\).

The propagate() function can be exploited to propagate the signal. Its arguments are an fvddp object and a (positive) time. The result is the propagated process, whose matrix M will be larger and whose weights will be as described in the formulae above.

FVDDP = propagate(fvddp = FVDDP, delta.t = 0.6)

FVDDP
#> theta: 1.28 
#> 
#> P0.sample: function(x) rpois(x, 5)
#> 
#> P0.density/mass: function(x) dpois(x, 5)
#> 
#> is.atomic: TRUE 
#> 
#> Unique values (y.star): 
#> [1] 4 7 9
#> 
#> Multiplicities (M): 
#>       4 7 9
#>  [1,] 0 0 0
#>  [2,] 1 0 0
#>  [3,] 0 1 0
#>  [4,] 1 1 0
#>  [5,] 0 2 0
#>  [6,] 1 2 0
#>  [7,] 0 0 1
#>  [8,] 1 0 1
#>  [9,] 0 1 1
#> [10,] 1 1 1
#> [11,] 0 2 1
#> [12,] 1 2 1
#> 
#> Weights (w): 
#>  [1] 0.060274058 0.099035520 0.198071041 0.142898157 0.071449079 0.027252056
#>  [7] 0.099035520 0.071449079 0.142898157 0.054504111 0.027252056 0.005881167

The example shows the propagation of the signal introduced in the previous section, with \(t = 0.6\). Note that y.star does not vary. Also, note that in examples like this, it is sufficient a time \(t \simeq 2\) to shift almost all the mass to the component of the mixture characterized by the zero vector.

FVDDP = update(fvddp = FVDDP, y.new = c(4, 7, 7, 10, 10, 5))

FVDDP
#> theta: 1.28 
#> 
#> P0.sample: function(x) rpois(x, 5)
#> 
#> P0.density/mass: function(x) dpois(x, 5)
#> <bytecode: 0x12b7a6180>
#> 
#> is.atomic: TRUE 
#> 
#> Unique values (y.star): 
#> [1]  4  5  7  9 10
#> 
#> Multiplicities (M): 
#>       4 5 7 9 10
#>  [1,] 1 1 2 0  2
#>  [2,] 2 1 2 0  2
#>  [3,] 1 1 3 0  2
#>  [4,] 2 1 3 0  2
#>  [5,] 1 1 4 0  2
#>  [6,] 2 1 4 0  2
#>  [7,] 1 1 2 1  2
#>  [8,] 2 1 2 1  2
#>  [9,] 1 1 3 1  2
#> [10,] 2 1 3 1  2
#> [11,] 1 1 4 1  2
#> [12,] 2 1 4 1  2
#> 
#> Weights (w): 
#>  [1] 0.032822343 0.051700651 0.302672433 0.327846322 0.083102652 0.061084451
#>  [7] 0.009482191 0.010270845 0.060128867 0.044197613 0.011203233 0.005488398

The latter chunk shows an application of update() on a larger process. A larger vector y.new may induce large variations in the weights. Being it of size \(3\), the example does not cause an immediately recognizable effect.

Smoothing

In the theory of Hidden Markov Models, the smoothing operator is used to infer the state of the signal given observations from the past, the present and the future. In other words, one can estimate \(X_t\) when \(t \leq t_n\), exploiting all collected data.

To do so, it has been shown by Ascolani, Lijoi, and Ruggiero (2023) that it is required to create two processes. The first has to be propagated forward from \(t_0\) to \(t-{i-1}\) as in the previous sections; the second one has to be run backward using the same strategy: initialize and update it at \(t_n\) and propagate it towards \(t_{n-1}\) (with a positive time \(t_n - t_{n-1}\) in the function), and sequentially update and propagate until \(t_{i+1}\) is reached.

Doing this, one will get that \[X_{t_{i-1}} \ |\ Y_0, \dots, Y_{i-1} \sim \sum_{\mathbf{n}_{i-1} \in M_{i-1}} u_{\mathbf{n}_{i-1}} \Pi_{\alpha + \sum_{j=1}^K n_{i-1,j}\delta_{y_j^\ast}}\] and \[X_{t_{i+1}} \ |\ Y_{i+1}, \dots, Y_{n}= \sum_{\mathbf{n}_{i+1}\in M_{i+1}} v_{\mathbf{n}_{i+1}} \Pi_{\alpha + \sum_{j=1}^K n_{i+1,j}\delta_{y_j^\ast}}\] where the subscript \(i-1\) and \(i+1\) are necessary to specify elements from the past or the future mixture, \(v\) stands for the weights and for example \(n_{i-1, j}\) is the \(j\)-th component of the vector \(\mathbf{n}_{i-1}\) (same for \(\mathbf{n}_{i+1}\)).

Provided this description based on available data from past and future, call \(\mathbf{n}_i\) the multiplicities generated by the vector \(Y_i\). Then

\[X_{t_i} \ |\ X_{t_{i-1}}, X_{t_{i+1}}, Y_i \sim \sum_{\substack{\mathbf{n}_{i-1}\\ \in M_{i-1}}}\sum_{\substack{\mathbf{n}_{i+1}\\ \in M_{i+1}}} u_{\mathbf{n}_{i-1}} v_{\mathbf{n}_{i+1}} \sum_{\substack{(\mathbf{k}_{i-1}, \mathbf{k}_{i+1}) \\ \in D^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}}} w_{\mathbf{k}_{i-1}, \mathbf{n}_i, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}} \Pi_{\alpha + \sum_{j=1}^K (\mathbf{k}_{i-1, j} + \mathbf{n}_{i,j} + \mathbf{k}_{i+1, j} )\delta_{y_j^\ast}}\] where:

if \(P_0\) is non-atomic, define the sets \[ D_{i-1} := \{ j \in \{ 1, \dots, K\} : n_{i-1, j} > 0 \ \text{and either} \ n_{i,j}>0 \ \text{or} \ n_{i+1,j}>0 \},\] \[D_{i+1} := \{ j \in \{ 1, \dots, K\} : n_{i+1, j} > 0 \ \text{and either} \ n_{i,j}>0 \ \text{or} \ n_{i-1,j}>0 \}\] and \[ S := D_{i-1} \cup D_{i+1}\] to express the indices of shared values among different times. Then \[\begin{align*} D^{\mathbf{n}_{i-1},\mathbf{n}_{i+1}} = \{ (\mathbf{k}_{i-1}, \mathbf{k}_{i+1}) : &\mathbf{k}_{i-1}\leq \mathbf{n}_{i-1}\ \text{and} \ k_{i-1, j} > 0 \ \forall \ j \in D_{i-1},\\ &\mathbf{k}_{i+1}\leq \mathbf{n}_{i+1}\ \text{and} \ k_{i+1, j} > 0 \ \forall \ j \in D_{i+1} \} \end{align*}\] and the weights are such that:
- if \(S = \emptyset\): \[w_{\mathbf{k}_{i-1}, \mathbf{n}_i, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}\ \propto \tilde{p}_{\mathbf{k}_{i-1}, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}} \frac{\theta^{(|\mathbf{k}_{i-1}|)} \theta^{(|\mathbf{k}_{i+1}|)}}{(\theta + |\mathbf{n}_i|)^{(|\mathbf{k}_{i-1}|+|\mathbf{k}_{i+1}|)}}\]
- if \(S \neq \emptyset\): \[w_{\mathbf{k}_{i-1}, \mathbf{n}_i, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}} \ \propto \tilde{p}_{\mathbf{k}_{i-1}, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}} \frac{\theta^{(|\mathbf{k}_{i-1}|)} \theta^{(|\mathbf{k}_{i+1}|)}}{(\theta + |\mathbf{n}_i|)^{(|\mathbf{k}_{i-1}|+|\mathbf{k}_{i+1}|)}} \\ \times \prod_{j \in S}\frac{(k_{i-1, j} + n_{i,j} + k_{i+1,j}-1)!}{(k_{i-1,j}-1)! (n_{i,j}-1)! (k_{i-1,j}-1)!} \] if \((\mathbf{k}_{i-1}, \mathbf{k}_{i+1}) \in D\), and \(0\) otherwise, under the convention that \((-1)!=1\).
if \(P_0\) is atomic, let \[D^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}:=\{ (\mathbf{k}_{i-1}, \mathbf{k}_{i+1}) : \mathbf{k}_{i-1}\leq \mathbf{n}_{i-1}, \mathbf{k}_{i+1}\leq \mathbf{n}_{i+1}\}\] and the weights can be expressed as \[ w_{\mathbf{k}_{i-1}, \mathbf{n}_i, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}\ \propto \tilde{p}_{\mathbf{k}_{i-1}, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}} \frac{m(\mathbf{k}_{i-1}+ \mathbf{n}_i +\mathbf{k}_{i+1})}{m(\mathbf{k}_{i-1})m(\mathbf{n}_i)m(\mathbf{k}_{i+1})}\]

where \[\tilde{p}_{\mathbf{k}_{i-1}, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}} = p_{\mathbf{n}_{i-1}, \mathbf{k}_{i-1}}(t_i - t_{i-1})p_{\mathbf{n}_{i+1}, \mathbf{k}_{i+1}}(t_{i+1} - t_i)\] is the joint transition probability from \(\mathbf{n}_{i-1}\) to \(\mathbf{k}_{i-1}\) in time \(t_i - t_{i-1}\) and from \(\mathbf{n}_{i+1}\) to \(\mathbf{k}_{i+1}\) in time \(t_{i+1} - t_i\) and \(m(\cdot)\) is the marginal likelihood function of multiplicities in the atomic case.

This peculiar structure, developed in Ascolani, Lijoi, and Ruggiero (2023), can be better understood with some examples. They can be shown in this implementation with smooth(). The arguments are two latent signals (fvddp.past and fvddp.future), the positive times \(t_i - t_{i-1}\) (t.past) and \(t_{i+1} - t_i\) (t.future) and the data collected at time \(t_i\) (y.new).

FVDDP_NONATOMIC = initialize(theta = 0.7, sampling.f = function(x) rbeta(x, 4, 7),
                          density.f = function(x) dbeta(x, 4, 7), atomic = FALSE)
FVDDP_PAST_NONATOMIC = update(fvddp = FVDDP_NONATOMIC, y.new = c(0.210, 0.635, .541))
FVDDP_FUTURE_NONATOMIC = update(fvddp = FVDDP_NONATOMIC, y.new = c(0.210))
FVDDP_FUTURE_NONATOMIC = propagate(fvddp = FVDDP_FUTURE_NONATOMIC, delta.t = 0.4)
FVDDP_FUTURE_NONATOMIC = update(fvddp = FVDDP_FUTURE_NONATOMIC, y.new = c(.635))

In the example above, two process were created with \(\theta = 0.7\) and \(P_0 \sim \mathrm{Beta}(4, 7)\). The signal was updated once in the past, and twice in the future (with a propagation between the two updates).

FVDDP_SMOOTH_NONATOMIC = smooth(fvddp.past = FVDDP_PAST_NONATOMIC, fvddp.future = FVDDP_FUTURE_NONATOMIC,
                                t.past = 0.75, t.future = 0.3, y.new = c(0.210, 0.635, 0.479))

FVDDP_SMOOTH_NONATOMIC
#> theta: 0.7 
#> 
#> P0.sample: function(x) rbeta(x, 4, 7)
#> 
#> P0.density/mass: function(x) dbeta(x, 4, 7)
#> 
#> is.atomic: FALSE 
#> 
#> Unique values (y.star): 
#> [1] 0.210 0.479 0.541 0.635
#> 
#> Multiplicities (M): 
#>      0.21 0.479 0.541 0.635
#> [1,]    2     1     0     3
#> [2,]    2     1     1     3
#> [3,]    2     1     2     3
#> 
#> Weights (w): 
#> [1] 0.23344663 0.03392615 0.73262722

Using the function on the two processes, it is possible to see that the structure described above for the nonatomic case causes a shrinkage of the mixture. Indeed, the set \(M\) only contains three vectors.

In order to make a comparison, one can try to do something similar taking \(P_0 \sim \mathrm{Binom}(10, 0.6)\).

FVDDP_ATOMIC = initialize(theta = 0.7, sampling.f = function(x) rbeta(x, 10, 0.6),
                          density.f = function(x) dbinom(x, 10, 0.6), atomic = TRUE)
FVDDP_PAST_ATOMIC = update(fvddp = FVDDP_ATOMIC, y.new = c(2, 6, 5))
FVDDP_FUTURE_ATOMIC = update(fvddp = FVDDP_ATOMIC, y.new = c(2))
FVDDP_FUTURE_ATOMIC = propagate(fvddp = FVDDP_FUTURE_ATOMIC, delta.t = 0.4)
FVDDP_FUTURE_ATOMIC = update(fvddp = FVDDP_FUTURE_ATOMIC, y.new = c(6))

As before, the mixture referred to past observations is updated once, the one referred to future observations is updated twice.

FVDDP_SMOOTH_ATOMIC = smooth(fvddp.past = FVDDP_PAST_ATOMIC, fvddp.future = FVDDP_FUTURE_ATOMIC,
                             t.past = 0.75, t.future = 0.3, y.new = c(2, 6, 4))

FVDDP_SMOOTH_ATOMIC
#> theta: 0.7 
#> 
#> P0.sample: function(x) rbeta(x, 10, 0.6)
#> 
#> P0.density/mass: function(x) dbinom(x, 10, 0.6)
#> <bytecode: 0x129a106d8>
#> 
#> is.atomic: TRUE 
#> 
#> Unique values (y.star): 
#> [1] 2 4 5 6
#> 
#> Multiplicities (M): 
#>       2 4 5 6
#>  [1,] 1 1 0 1
#>  [2,] 1 1 0 2
#>  [3,] 1 1 0 3
#>  [4,] 1 1 1 1
#>  [5,] 1 1 1 2
#>  [6,] 1 1 1 3
#>  [7,] 1 1 2 1
#>  [8,] 1 1 2 2
#>  [9,] 1 1 2 3
#> [10,] 2 1 0 1
#> [11,] 2 1 0 2
#> [12,] 2 1 0 3
#> [13,] 2 1 1 1
#> [14,] 2 1 1 2
#> [15,] 2 1 1 3
#> [16,] 2 1 2 1
#> [17,] 2 1 2 2
#> [18,] 2 1 2 3
#> 
#> Weights (w): 
#>  [1] 0.0001721823 0.0025036150 0.0094628016 0.0002320301 0.0024384330
#>  [6] 0.0068287189 0.0004682311 0.0037328174 0.0075867038 0.0117827759
#> [11] 0.1266737735 0.3103950448 0.0112750878 0.0913256329 0.1717719912
#> [16] 0.0153587293 0.0979424891 0.1300489424

In this case, the mixture is clearly bigger. The reason is that when \(P_0\) is atomic, the set \(D^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}\) does not put constraints based on the appearance of shared values across different times.

Approximations

Past examples should also provide an insight on the main issue related to propagate() and smooth(): the size of the matrix \(M\) grows polynomially with respect to the amount of collected data; moreover, it can be seen that as the number of weights increases, many of them become almost negligible.

In order to avoid long computations, it is possible to use approx.propagate(): it reproduces the propagation of the signal by means of Monte Carlo method. The idea, proposed by Ascolani, Lijoi, and Ruggiero (2021), is to mimic the evolution of the \(K\)-dimensional death process using a one-dimensional one, and then extracting a multidimensional vector with a multivariate hypergeometic distributions.

FVDDP =initialize(theta = 3, sampling.f= function(x) rnorm(x, -1, 3),
                  density.f = function(x) dnorm(x, -1, 3), atomic = FALSE)
FVDDP = update(fvddp = FVDDP, y.new = c(-1.145, 0.553, 0.553, 0.553))

In the previous chunk, a process with hyperparameters \(\theta = 3\) and \(P_0 \sim \mathcal{N}(-1, 3)\) was created and updated. The syntax of the approximating functions is just the same as in propagate(), with the exceptions that one must specify the number of samples N to be drawn.

FVDDP_APPR_PROP = approx.propagate(fvddp = FVDDP, delta.t = 0.45, N = 20000)

FVDDP_APPR_PROP
#> theta: 3 
#> 
#> P0.sample: function(x) rnorm(x, -1, 3)
#> 
#> P0.density/mass: function(x) dnorm(x, -1, 3)
#> 
#> is.atomic: FALSE 
#> 
#> Unique values (y.star): 
#> [1] -1.145  0.553
#> 
#> Multiplicities (M): 
#>      -1.145 0.553
#> [1,]      0     0
#> [2,]      0     1
#> [3,]      0     2
#> [4,]      0     3
#> [5,]      1     0
#> [6,]      1     1
#> [7,]      1     2
#> [8,]      1     3
#> 
#> Weights (w): 
#> [1] 0.13530 0.33260 0.17440 0.02000 0.10310 0.17210 0.05785 0.00465

The results is again an fvddp object. In order to measure the accuracy of such approximation, one has to compute the exact output of the propagation, again with time \(t=0.45\).

FVDDP_EXACT_PROP = propagate(fvddp = FVDDP, delta.t = 0.45)

Then one can measure the difference in the weights with error.estimate(). The arguments are fvddp.exact and fvddp.approx, and the output is a vector containing the difference among the weights, in absolute value. The option remove.unmatched allows to choose whenever a vector is in the exact propagation but not in the approximate: if TRUE, the missing weight is assumed to be \(0\), if FALSE, this comparison will not be reported in the output (which will result to be shorter).

error.estimate(FVDDP_EXACT_PROP, FVDDP_APPR_PROP)
#> [1] 0.0058286503 0.0068326918 0.0028019247 0.0014386014 0.0008613986
#> [6] 0.0015530747 0.0001989751 0.0001334191

Something similar can be done for the smoothing via approximate.smooth(); in this case the Monte Carlo method is necessary to support importance sampling, where the importances are given by the right hand side of the formulae for \(w_{\mathbf{k}_{i-1}, \mathbf{n}_i, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}\). For this reason, the result of the simulation may be less stable than in the case of the propagation seen above, and a larger amount of samples will be required to achieve a good accuracy.

In the following example, one can see how to copy wht was done in the exact smoothing.

FVDDP_SMOOTH_APPR = approx.smooth(fvddp.past = FVDDP_PAST_ATOMIC, fvddp.future = FVDDP_FUTURE_ATOMIC,
                             t.past = 0.75, t.future = 0.3, y.new = c(2, 6, 4), N = 50000)

FVDDP_SMOOTH_APPR
#> theta: 0.7 
#> 
#> P0.sample: function(x) rbeta(x, 10, 0.6)
#> 
#> P0.density/mass: function(x) dbinom(x, 10, 0.6)
#> <bytecode: 0x129a106d8>
#> 
#> is.atomic: TRUE 
#> 
#> Unique values (y.star): 
#> [1] 2 4 5 6
#> 
#> Multiplicities (M): 
#>       2 4 5 6
#>  [1,] 1 1 0 1
#>  [2,] 1 1 0 2
#>  [3,] 1 1 0 3
#>  [4,] 1 1 1 1
#>  [5,] 1 1 1 2
#>  [6,] 1 1 1 3
#>  [7,] 1 1 2 1
#>  [8,] 1 1 2 2
#>  [9,] 1 1 2 3
#> [10,] 2 1 0 1
#> [11,] 2 1 0 2
#> [12,] 2 1 0 3
#> [13,] 2 1 1 1
#> [14,] 2 1 1 2
#> [15,] 2 1 1 3
#> [16,] 2 1 2 1
#> [17,] 2 1 2 2
#> [18,] 2 1 2 3
#> 
#> Weights (w): 
#>  [1] 0.0002318445 0.0032076427 0.0117206721 0.0002089701 0.0021740870
#>  [6] 0.0057754325 0.0003174570 0.0025240938 0.0051318324 0.0161197332
#> [11] 0.1568137178 0.3993952863 0.0088537112 0.0776609913 0.1427394035
#> [16] 0.0102407155 0.0676997460 0.0891846631

error.estimate(FVDDP_SMOOTH_ATOMIC, FVDDP_SMOOTH_APPR)
#>  [1] 5.966216e-05 7.040276e-04 2.257871e-03 2.305999e-05 2.643460e-04
#>  [6] 1.053286e-03 1.507740e-04 1.208724e-03 2.454871e-03 4.336957e-03
#> [11] 3.013994e-02 8.900024e-02 2.421377e-03 1.366464e-02 2.903259e-02
#> [16] 5.118014e-03 3.024274e-02 4.086428e-02

Pruning

Another tool to cut the computational cost of predictive of smoothing inference is given by the prune() function. It allows to remove from the mixture all vectors \(\mathbf{m}\) whose weight \(w_\mathbf{m}\) is under some treshold \(\varepsilon\). Such eights are then normalized such that their sum is \(1\).

In the example, the function will be applied to one of the processes prevously calculated, fixing \(\varepsilon = 10^{-2}\).

PRUNED = prune(fvddp = FVDDP_SMOOTH_ATOMIC, eps = 1e-02)

PRUNED
#> theta: 0.7 
#> 
#> P0.sample: function(x) rbeta(x, 10, 0.6)
#> 
#> P0.density/mass: function(x) dbinom(x, 10, 0.6)
#> <bytecode: 0x129a106d8>
#> 
#> is.atomic: TRUE 
#> 
#> Unique values (y.star): 
#> [1] 2 4 5 6
#> 
#> Multiplicities (M): 
#>       2 4 5 6
#>  [1,] 2 1 0 1
#>  [2,] 2 1 0 2
#>  [3,] 2 1 0 3
#>  [4,] 2 1 1 1
#>  [5,] 2 1 1 2
#>  [6,] 2 1 1 3
#>  [7,] 2 1 2 1
#>  [8,] 2 1 2 2
#>  [9,] 2 1 2 3
#> 
#> Weights (w): 
#> [1] 0.01219024 0.13105433 0.32112895 0.01166500 0.09448380 0.17771211 0.01588986
#> [8] 0.10132948 0.13454622

In this context, the treshold is insanely high; this is done for the unique purpose of showing how the function works; in the practical use of the package, a reasonable \(\varepsilon\) is between \(10^{-9}\) and the machine epsilon of the computer in use.

Posterior sampling

The last task that it can be performed is sampling values from Fleming-Viot dependent Dirichlet Processes. This can be done by simply choosing a vector \(w_\mathbf{m}\) and drawing a value from \(P_0\) with probability \(\frac{\theta}{\theta + |\mathbf{m}|}\), or choosing \(y_m^\ast\) with probability \(\frac{m_j}{\theta + |\mathbf{m}|}\). To get a sample of size \(N\), it is sufficient to replicate this mechanism \(n\) times.

The implementation is named posterior.sample(). Its arguments are the signal and the number N of values to draw.

y = posterior.sample(fvddp = FVDDP_EXACT_PROP, N = 100)

table(round(y, 3))
#> 
#>  -8.67 -7.232 -7.197 -6.912 -6.764 -5.968 -5.659 -5.249  -5.19 -5.114 -5.105 
#>      1      1      1      1      1      1      1      1      1      1      1 
#> -4.826 -4.375 -4.326  -4.21 -4.073 -3.836 -3.728 -3.562 -3.428 -2.793 -2.746 
#>      1      1      1      1      1      1      1      1      1      1      1 
#> -2.304 -2.162  -2.04 -1.687 -1.624 -1.555 -1.447 -1.286  -1.21 -1.145  -1.07 
#>      1      1      1      1      1      1      1      1      1      3      1 
#> -0.965 -0.706 -0.656 -0.561 -0.465 -0.462 -0.298 -0.256 -0.134 -0.059 -0.032 
#>      1      1      1      1      1      1      1      1      1      1      1 
#>  0.128  0.208  0.369  0.384  0.444  0.553  0.652  0.817   0.95  1.014  1.036 
#>      1      1      1      1      1     26      1      1      1      1      1 
#>  1.092  1.367  1.442  1.535  1.603  2.016  2.233   2.29  2.341  2.447  2.766 
#>      1      1      1      1      1      1      1      1      1      1      1 
#>  2.959  3.393  3.893  4.828  4.854  5.744  5.833 
#>      1      1      1      1      1      1      1

The command table() was used here to display more efficiently how many times each value has been sampled.

Predictive structure

In the Bayesian Nonparametric framework, scientists prefer to use the predictive structure of the Dirichlet process when they want to picture how future observations will be like. This choice is strongly related to the exchangeability assumption underlying the model (more in Ghosal and Vaart (2017)); in this context, however, it is sufficient to say that predictive structure is nothing but the sequential use of posterior sampling and update. In fact, a value is repeatedly drawn from the mixture and it is incorporated within each vector \(\mathbf{m}\in M\) via an update. A full description of this mechanism was developed by Ascolani, Lijoi, and Ruggiero (2021).

This is implemented efficiently via predictive.struct(); the arguments are the same as in posterior.sample().

y = predictive.struct(fvddp = FVDDP_EXACT_PROP, N = 100)

table(round(y, 3))
#> 
#> -5.296 -4.938 -3.849 -2.951 -1.369 -1.145   -1.1  0.003  0.507  0.553  0.691 
#>      4      1      1      1      1      1      7      2     10      4      1 
#>  0.981  1.718  1.819 
#>      1     65      1

Bibliography

Antoniak, Charles E. 1974. “Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems.” The Annals of Statistics 2 (6): 1152–74. https://doi.org/10.1214/aos/1176342871.

Ascolani, Filippo, Antonio Lijoi, and Matteo Ruggiero. 2021. “Predictive inference with Fleming–Viot-driven dependent Dirichlet processes.” Bayesian Analysis 16 (2): 371–95. https://doi.org/10.1214/20-BA1206.

———. 2023. “Smoothing distributions for conditional Fleming–Viot and Dawson–Watanabe diffusions.” Bernoulli 29 (2): 1410–34. https://doi.org/10.3150/22-BEJ1504.

Blackwell, David, and James B. MacQueen. 1973. “Ferguson Distributions Via Polya Urn Schemes.” The Annals of Statistics 1 (2): 353–55. https://doi.org/10.1214/aos/1176342372.

Ethier, S. N., and R. C. Griffiths. 1993. “The Transition Function of a Fleming-Viot Process.” The Annals of Probability 21 (3): 1571–90. https://doi.org/10.1214/aop/1176989131.

Fleming, Wendell H., and Michel Viot. 1979. “Some Measure-Valued Markov Processes in Population Genetics Theory.” Indiana University Mathematics Journal 28 (5): 817–43. http://www.jstor.org/stable/24892583.

Ghosal, Subhashis, and Aad van der Vaart. 2017. Fundamentals of Nonparametric Bayesian Inference. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press. https://doi.org/10.1017/9781139029834.

Papaspiliopoulos, Omiros, and Matteo Ruggiero. 2014. “Optimal Filtering and the Dual Process.” Bernoulli 20 (4). https://doi.org/10.3150/13-bej548.

Papaspiliopoulos, Omiros, Matteo Ruggiero, and Dario Spanò. 2016. “Conjugacy properties of time-evolving Dirichlet and gamma random measures.” Electronic Journal of Statistics 10 (2): 3452–89. https://doi.org/10.1214/16-EJS1194.

Tavaré, Simon. 1984. “Lines-of-Descent and Genealogical Processes, and Their Applications in Population Genetics Models.” Advances in Applied Probability 16 (1): 27–27. https://doi.org/10.1017/S000186780002228X.