# The ‘gfiExtremes’ package

The ‘gfiExtremes’ package offers two main functions: gfigpd1 and gfigpd2. Each of them generates simulations from the fiducial distribution of a model involving the generalized Pareto distribution. The difference is that the threshold $$\mu$$ of the generalized Pareto distribution is assumed to be known by gfigpd1, whereas gfigpd2 estimates this threshold.

The algorithms are implemented in C++. They are translated from the original R code written by the authors of the reference paper.

### Note

The package allows to run some MCMC chains in parallel. In the examples below, as well as in the examples of the package documentation, I set nthreads = 2L because of CRAN restrictions: CRAN does not allow more than two R processes in parallel and then it would not accept the package if a higher number of threads were set.

library(gfiExtremes)
#> Le chargement a nécessité le package : coda

## The model with a known threshold

When the threshold $$\mu$$ is known, it is assumed that the data are independent realizations of a random variable $$X$$ which follows a generalized Pareto distribution $$GP(\mu,\gamma,\sigma)$$ conditionally to $$X \geqslant \mu$$. On the event $$X < \mu$$, no distributional assumption is made.

Then the algorithm performed by the gfigpd1 function produces some simulations of the fiducial distributions of $$\gamma$$, $$\sigma$$ and of the $$100\beta\%$$-quantiles of $$X$$ at the requested values of $$\beta$$. These are MCMC chains.

For example, assume that $$X$$ follows the $$GP(\mu,\gamma,\sigma)$$ distribution (so $$X < \mu$$ never happens).

set.seed(666L)
X <- rgpareto(200L, mu = 10, gamma = 0.3, sigma = 2)
gf1 <- gfigpd1(
X, beta = c(0.99, 0.995, 0.999), threshold = 10, iter = 10000L,
nchains = 4L, nthreads = 2L
)

The output of gfigpd1 is a R object ready for analysis with the ‘coda’ package. In particular, it has a summary method.

summary(gf1)
#>
#> Iterations = 2001:61995
#> Thinning interval = 6
#> Number of chains = 4
#> Sample size per chain = 10000
#>
#> 1. Empirical mean and standard deviation for each variable,
#>    plus standard error of the mean:
#>
#>          Mean      SD  Naive SE Time-series SE
#> gamma  0.3132  0.1017 0.0005086      0.0009532
#> sigma  1.9692  0.2351 0.0011757      0.0022424
#> beta1 30.9276  5.4264 0.0271322      0.0467451
#> beta2 38.0272  9.0772 0.0453861      0.0799586
#> beta3 63.5831 27.2337 0.1361684      0.2454226
#>
#> 2. Quantiles for each variable:
#>
#>          2.5%     25%     50%     75%    97.5%
#> gamma  0.1347  0.2415  0.3055  0.3769   0.5319
#> sigma  1.5371  1.8051  1.9585  2.1210   2.4602
#> beta1 23.9020 27.2199 29.7925 33.3429  44.5348
#> beta2 27.0268 31.9434 35.9689 41.6726  61.1854
#> beta3 35.3511 46.5221 56.5144 72.0191 134.1137
# compare with the true quantiles:
qgpareto(c(0.99, 0.995, 0.999), mu = 10, gamma = 0.3, sigma = 2)
#> [1] 29.87381 36.00849 56.28855

Every parameter is very well estimated by the median of its fiducial distribution.

We can get the shortest fiducial confidence intervals with the ‘coda’ function HPDinterval:

HPDinterval(joinMCMCchains(gf1))
#>            lower       upper
#> gamma  0.1196782   0.5116534
#> sigma  1.5219247   2.4336786
#> beta1 22.7531531  41.4748133
#> beta2 25.1379496  55.3255893
#> beta3 31.1060321 113.4594573
#> attr(,"Probability")
#> [1] 0.95

## The model with an unknown threshold

When the threshold $$\mu$$ is unknown, it is also assumed that the data are independent realizations of a random variable $$X$$ which follows a generalized Pareto distribution $$GP(\mu,\gamma,\sigma)$$ conditionally to $$X \geqslant \mu$$, but there are additional assumptions.

These additional assumptions have no meaningful interpretation but this is not important in order to estimate the quantiles of $$X$$: it is possible that the parameters $$\gamma$$ and $$\sigma$$ cannot be estimated (it is always possible if $$X$$ strictly follows the unrealistic assumptions of the model) but $$\mu$$ can always be estimated and the fiducial distributions of the quantiles are available.

Let’s assume for example that $$X$$ follows a log-normal distribution:

set.seed(666L)
X <- rlnorm(400L, meanlog = 1)
gf2 <- gfigpd2(
X, beta = c(0.99, 0.995, 0.999), iter = 10000L, burnin = 10000L,
nchains = 4L, nthreads = 2L
)
summary(gf2)
#>
#> Iterations = 10001:69995
#> Thinning interval = 6
#> Number of chains = 4
#> Sample size per chain = 10000
#>
#> 1. Empirical mean and standard deviation for each variable,
#>    plus standard error of the mean:
#>
#>        Mean     SD Naive SE Time-series SE
#> beta1 25.64  4.025  0.02013         0.1160
#> beta2 32.77  6.471  0.03235         0.1970
#> beta3 55.60 17.167  0.08584         0.5701
#>
#> 2. Quantiles for each variable:
#>
#>        2.5%   25%   50%   75%  97.5%
#> beta1 19.73 22.75 24.97 27.77  35.44
#> beta2 23.69 28.19 31.59 35.96  48.99
#> beta3 34.07 43.71 51.84 62.69 100.44
# compare with the true quantiles:
qlnorm(c(0.99, 0.995, 0.999), meanlog = 1)
#> [1] 27.83649 35.72423 59.75377

As you can see, the inference on the quantiles is good.