VARMA (commodity prices)

library(ldt)

Introduction

The search.varma() function is one of the three main functions in the ldt package. This vignette demonstrates basic usage of this function with the commodity prices dataset (International Commodity Prices (2023)). Commodity prices are the prices at which raw materials or primary foodstuffs are bought and sold. The dataset contains monthly primary commodity price indices for 68 commodities, some starting in January 1990 and others in later periods.

Data

For this example, we use just the first 5 columns of the data:

data <- data.pcp$data[,1:5]

Here are the last few observations from this subset of the data:

tail(data)
#>         PALLFNF  PEXGALL   PNFUEL   PFANDB    PFOOD
#> 2023M4 170.7647 171.9614 155.8291 144.1869 146.1422
#> 2023M5 157.1340 156.8779 148.7744 137.3836 138.6572
#> 2023M6 154.0691 153.8750 145.9433 135.2476 136.0733
#> 2023M7 157.9088 158.0988 146.1088 135.6483 136.6740
#> 2023M8 161.3679 162.2299 142.8028 130.7136 131.3730
#> 2023M9 168.4047 170.0978 143.4144 129.3996 129.7991

And here are some summary statistics for each variable:

sapply(data, summary)
#>          PALLFNF   PEXGALL    PNFUEL    PFANDB     PFOOD
#> Min.     61.8872  65.91441  55.03738  54.72416  55.14206
#> 1st Qu. 106.9555 108.45966  97.47020  64.14606  63.48380
#> Median  125.5690 129.27773 108.51537  90.77461  91.48711
#> Mean    133.2516 137.56116 111.14108  89.21294  89.77894
#> 3rd Qu. 166.4862 172.35819 131.78403 106.95780 108.36253
#> Max.    241.9187 253.29973 178.30364 162.22220 165.74817
#> NA's    156.0000 156.00000 156.00000  24.00000  24.00000
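The NA counts reflect the different start dates of the series mentioned above. A quick base R check (not part of ldt) shows where each series begins; it assumes the missing values are leading observations, i.e., later start dates:

```r
# Index of the first non-missing observation in each column. With a
# January 1990 start, an index of 157 corresponds to January 2003,
# assuming the NAs are all at the beginning of the series.
sapply(data, function(x) which(!is.na(x))[1])
```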

The columns of the data are commodity price indices; see the data.pcp documentation for descriptions of the variables.

Modelling

We use the first variable (i.e., PALLFNF) as the target and the MAPE metric to find the best-predicting model. Out-of-sample evaluation constrains the maximum model complexity we can afford, because it requires reestimating each model by maximum likelihood several times. Although the simUsePreviousEstim argument helps initialize the maximum likelihood estimation, VARMA estimation remains time-consuming due to its large number of parameters. We therefore restrict the model set: we cap the number of equations allowed in a model, and we set a maximum order for the parameters of the VARMA model.


search_res <- search.varma(data = get.data(data, endogenous = 5),
                           combinations = get.combinations(sizes = c(1,2,3),
                                                           numTargets = 1),
                           maxParams = c(2,0,0),
                           metrics = get.search.metrics(typesIn = c(),
                                                        typesOut = c("mape"),
                                                        simFixSize = 6),
                           maxHorizon = 5)
#> Warning in search.varma(data = get.data(data, endogenous = 5), combinations =
#> get.combinations(sizes = c(1, : 'maxHorizon' argument is different from the
#> maximum horizon in the 'metrics' argument.
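The warning indicates that maxHorizon = 5 differs from the forecast horizon used by the metrics (by default, horizon 1). Assuming get.search.metrics() accepts a horizons argument (an assumption to verify against its documentation), the two can be aligned explicitly:

```r
# Hypothetical variant: evaluate MAPE at horizons 1 through 5 so that the
# metric horizons match maxHorizon. The 'horizons' argument is an
# assumption here; check ?get.search.metrics before relying on it.
metrics <- get.search.metrics(typesIn = c(),
                              typesOut = c("mape"),
                              simFixSize = 6,
                              horizons = c(1:5))
```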
print(search_res)
#> LDT search result:
#>  Method in the search process: VARMA 
#>  Expected number of models: 22, searched: 22 , failed: 0 (0%)
#>  Elapsed time: 0.01677709 minutes 
#>  Length of results: 1 
#> --------
#>  Target (PALLFNF):
#>    Evaluation (mape):
#>       Best model:
#>        endogenous: (3x1) PALLFNF, PEXGALL, PNFUEL
#>        exogenous: (Intercept)
#>        metric: 2.705854
#> --------

The output of the search.varma() function does not contain estimation results; it contains only the information required to replicate them. The summary() function returns a similar structure, but with the estimation results included.

search_sum <- summary(search_res)

We can plot the predicted values along with the out-of-sample evaluations:

best_model <- search_sum$results[[1]]$value
pred <- predict(best_model, 
                actualCount = 10, 
                startFrequency = tdata::f.monthly(data.pcp$start,1))
plot(pred, simMetric = "mape")

Conclusion

This package is a useful tool for empirical studies that need to reduce the role of untested assumptions and to summarize the resulting uncertainty. This vignette is just a demonstration; there are other options you can explore with the search.varma() function. For instance, you can experiment with different evaluation metrics or restrict the model set to your specific needs. There is also an alternative approach that combines the modelling with Principal Component Analysis (PCA) (see the estim.varma() function). I encourage you to experiment with these options and see how they can enhance your analysis.
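As a sketch of that direct-estimation route, the specification found by the search above can be estimated without a model search. The argument names below mirror the search.varma() call; params (assumed here to hold the VARMA orders) and the exact interface should be checked against the estim.varma() documentation:

```r
# Hypothetical sketch: estimate a VAR(1) (i.e., VARMA with p = 1, q = 0)
# for the three series selected by the search. 'params' is assumed to be
# the vector of orders (p, d, q, P, D, Q); verify against ?estim.varma.
fit <- estim.varma(data = get.data(data[, c("PALLFNF", "PEXGALL", "PNFUEL")],
                                   endogenous = 3),
                   params = c(1, 0, 0, 0, 0, 0),
                   maxHorizon = 5)
```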

References

International Commodity Prices. 2023. “Primary Commodity Prices (Excel Database).”