Introduction to ‘ForecastTB’ package

Neeraj Dhanraj Bokde (neerajdhanraj@eng.au.dk) and Gorm Bruun Andresen (gba@eng.au.dk)

Demonstration of ‘ForecastTB’ package:

This document demonstrates the R package ‘ForecastTB’, which is intended for comparing the performance of forecasting methods. The package assists in developing the background, strategies, policies and environment needed for such comparisons. A comparison report for the defined framework is produced as output. Load the package as follows:

library(ForecastTB)
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
#> Registered S3 methods overwritten by 'forecast':
#>   method             from    
#>   fitted.fracdiff    fracdiff
#>   residuals.fracdiff fracdiff

The basic function of the package is prediction_errors(). The following parameters are considered by this function:

The prediction_errors() function returns an object with two slots. The first slot is output, which provides Error_Parameters, the error values for the forecasting methods and error parameters defined in the framework, and Predicted_Values, the values forecasted with the same forecasting methods. The second slot is parameters, which returns the parameters used by or provided to the prediction_errors() function.

a <- prediction_errors(data = nottem)  # `nottem` is a sample dataset shipped with R (datasets package)

a
#> An object of class "prediction_errors"
#> Slot "output":
#> $Error_Parameters
#>            RMSE       MAE      MAPE exec_time
#> ARIMA 2.3400915 1.9329816 4.2156087 0.1356769
#> 
#> $Predicted_Values
#>                    1        2        3        4        5        6        7
#> Test values 39.40000 40.90000 42.40000 47.80000 52.40000 58.00000 60.70000
#> ARIMA       37.41933 37.69716 41.18252 46.29926 52.24804 57.10696 59.71674
#>                    8        9      10       11       12
#> Test values 61.80000 58.20000 46.7000 46.60000 37.80000
#> ARIMA       59.41173 56.38197 51.4756 46.04203 41.52592
#> 
#> 
#> Slot "parameters":
#> $data
#>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
#> 1920 40.6 40.8 44.4 46.7 54.1 58.5 57.7 56.4 54.3 50.5 42.9 39.8
#> 1921 44.2 39.8 45.1 47.0 54.1 58.7 66.3 59.9 57.0 54.2 39.7 42.8
#> 1922 37.5 38.7 39.5 42.1 55.7 57.8 56.8 54.3 54.3 47.1 41.8 41.7
#> 1923 41.8 40.1 42.9 45.8 49.2 52.7 64.2 59.6 54.4 49.2 36.3 37.6
#> 1924 39.3 37.5 38.3 45.5 53.2 57.7 60.8 58.2 56.4 49.8 44.4 43.6
#> 1925 40.0 40.5 40.8 45.1 53.8 59.4 63.5 61.0 53.0 50.0 38.1 36.3
#> 1926 39.2 43.4 43.4 48.9 50.6 56.8 62.5 62.0 57.5 46.7 41.6 39.8
#> 1927 39.4 38.5 45.3 47.1 51.7 55.0 60.4 60.5 54.7 50.3 42.3 35.2
#> 1928 40.8 41.1 42.8 47.3 50.9 56.4 62.2 60.5 55.4 50.2 43.0 37.3
#> 1929 34.8 31.3 41.0 43.9 53.1 56.9 62.5 60.3 59.8 49.2 42.9 41.9
#> 1930 41.6 37.1 41.2 46.9 51.2 60.4 60.1 61.6 57.0 50.9 43.0 38.8
#> 1931 37.1 38.4 38.4 46.5 53.5 58.4 60.6 58.2 53.8 46.6 45.5 40.6
#> 1932 42.4 38.4 40.3 44.6 50.9 57.0 62.1 63.5 56.3 47.3 43.6 41.8
#> 1933 36.2 39.3 44.5 48.7 54.2 60.8 65.5 64.9 60.1 50.2 42.1 35.8
#> 1934 39.4 38.2 40.4 46.9 53.4 59.6 66.5 60.4 59.2 51.2 42.8 45.8
#> 1935 40.0 42.6 43.5 47.1 50.0 60.5 64.6 64.0 56.8 48.6 44.2 36.4
#> 1936 37.3 35.0 44.0 43.9 52.7 58.6 60.0 61.1 58.1 49.6 41.6 41.3
#> 1937 40.8 41.0 38.4 47.4 54.1 58.6 61.4 61.8 56.3 50.9 41.4 37.1
#> 1938 42.1 41.2 47.3 46.6 52.4 59.0 59.6 60.4 57.0 50.7 47.8 39.2
#> 1939 39.4 40.9 42.4 47.8 52.4 58.0 60.7 61.8 58.2 46.7 46.6 37.8
#> 
#> $nval
#> [1] 12
#> 
#> $ePara
#> [1] "RMSE" "MAE"  "MAPE"
#> 
#> $ePara_name
#> [1] "RMSE" "MAE"  "MAPE"
#> 
#> $Method
#> [1] "ARIMA"
#> 
#> $MethodName
#> [1] "ARIMA"
#> 
#> $Strategy
#> [1] "Recursive"
#> 
#> $dval
#> [1] 240
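
The values under Error_Parameters can be cross-checked by hand. The sketch below copies the Test values and ARIMA rows from the Predicted_Values table printed above and applies the textbook definitions of RMSE, MAE and MAPE in base R. The helper names rmse(), mae() and mape() are illustrative and are not taken from the package; it is only assumed that the package uses the same standard definitions. Because the printed predictions are rounded, the results should agree with the reported values to roughly three decimal places.

```r
# Values copied from the Predicted_Values table printed above.
test_values <- c(39.4, 40.9, 42.4, 47.8, 52.4, 58.0,
                 60.7, 61.8, 58.2, 46.7, 46.6, 37.8)
arima_pred  <- c(37.41933, 37.69716, 41.18252, 46.29926, 52.24804, 57.10696,
                 59.71674, 59.41173, 56.38197, 51.4756, 46.04203, 41.52592)

# Textbook definitions (illustrative helpers, not the package's own code).
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))        # root mean squared error
mae  <- function(obs, pred) mean(abs(obs - pred))             # mean absolute error
mape <- function(obs, pred) mean(abs((obs - pred) / obs)) * 100  # mean absolute percentage error

errors <- c(RMSE = rmse(test_values, arima_pred),
            MAE  = mae(test_values, arima_pred),
            MAPE = mape(test_values, arima_pred))
print(round(errors, 4))
```

The test values are the last 12 observations of nottem (the year 1939), which matches nval = 12 in the parameters slot.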

A quick visualization of the object returned by the prediction_errors() function can be produced with the plot() function, as below:

b <- plot(a)


Comparison of multiple methods:

As discussed above, the prediction_errors() function evaluates the performance of the ARIMA method by default. In addition, it allows the performance of other methods to be compared with ARIMA. In the following example, two methods (LPSF and PSF) are compared along with ARIMA. Each method is wrapped in a function that takes data and nval as input parameters and must return nval forecasted values as a vector. In the following code, the test1() and test2() functions are used for the LPSF and PSF methods, respectively.

library(decomposedPSF)
#> Warning: package 'decomposedPSF' was built under R version 3.6.2
test1 <- function(data, nval){
   return(lpsf(data = data, n.ahead = nval))
}

library(PSF)
#> Warning: package 'PSF' was built under R version 3.6.2
test2 <- function(data, nval){
  a <- psf(data = data, cycle = 12)
  b <- predict(object = a, n.ahead = nval)
  return(b)
}
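
To make the required function shape concrete without any extra packages, here is a minimal sketch of a seasonal-naive forecaster that follows the same (data, nval) contract. The name snaive_method() and the fallback period of 12 are assumptions made for illustration; this method is not part of ForecastTB.

```r
# Hypothetical method: repeat the last observed season to produce nval forecasts.
snaive_method <- function(data, nval) {
  # Use the series frequency when data is a ts object; otherwise assume monthly data.
  period <- if (is.ts(data)) frequency(data) else 12
  last_season <- tail(as.numeric(data), period)
  # Recycle the last season until exactly nval values are produced.
  rep_len(last_season, nval)
}

fc <- snaive_method(nottem, nval = 48)
print(length(fc))  # a plain numeric vector with nval elements
```

A function of this shape would presumably be supplied through the Method argument as the string "snaive_method(data, nval)", in the same way as test1() and test2().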

The following code chunk shows how the user can attach various methods in the prediction_errors() function. In this chunk, the append_ parameter is set to 1 to append the new methods (LPSF and PSF) to the default ARIMA method. On the contrary, if the append_ parameter is set to 0, only the newly added LPSF and PSF methods would be compared.

a1 <- prediction_errors(data = nottem, nval = 48, 
                        Method = c("test1(data, nval)", "test2(data, nval)"), 
                        MethodName = c("LPSF","PSF"), append_ = 1)
a1@output$Error_Parameters
#>            RMSE       MAE      MAPE exec_time
#> ARIMA 2.5233156 2.1280641 4.5135378 0.1659489
#> LPSF   2.391580  1.936111  4.238650  0.441232
#> PSF    2.467598  1.854861  3.943937  0.098737
b1 <- plot(a1)


Appending new methods:

Consider another function, test3(), which is to be added to an already existing prediction_errors object, e.g. a1.

library(forecast)
test3 <- function(data, nval){
  b <- as.numeric(forecast(ets(data), h = nval)$mean)
  return(b)
}

For this purpose, the append_() function can be used as follows:

The append_() function has the object, Method, MethodName, ePara and ePara_name parameters, with the same meaning as in the prediction_errors() function. Other hidden parameters of the append_() function are automatically synced with the prediction_errors() function.

c1 <- append_(object = a1, Method = c("test3(data,nval)"), MethodName = c('ETS'))
c1@output$Error_Parameters
#>              RMSE         MAE        MAPE   exec_time
#> ARIMA   2.5233156   2.1280641   4.5135378   0.1659489
#> LPSF     2.391580    1.936111    4.238650    0.441232
#> PSF      2.467598    1.854861    3.943937    0.098737
#> ETS   38.29743056 36.85216463 73.47667823  0.03786588
d1 <- plot(c1)


Removing methods:

When more than one method is established in the environment and the user wishes to remove one or more of them, the choose_() function can be used. This function takes a prediction_errors object as input, shows all methods established in the environment, and asks for the indices of the methods that the user wants to remove.

In the following example, the user supplied 4 as input, which corresponds to Method 4: ETS. In response, the choose_() function provides a new object with an updated list of methods.

# > e1 <- choose_(object = c1)
# Following are the methods attached with the object:
#         [,1]    [,2]   [,3]  [,4] 
# Indices "1"     "2"    "3"   "4"  
# Methods "ARIMA" "LPSF" "PSF" "ETS"
#
# Enter the indices of methods to remove:4
#
# > e1@output$Error_Parameters
#            RMSE       MAE exec_time
# ARIMA 2.5233156 2.1280641 0.1963789
# LPSF  2.3915796 1.9361111 0.2990961
# PSF   2.2748736 1.8301389 0.1226711

Adding new Error metrics:

In the default scenario, the prediction_errors() function compares forecasting methods in terms of RMSE, MAE and MAPE. In addition, it allows multiple new error metrics to be appended. The percent change in variance (PCV) is another error metric, with the following definition:

\(PCV = \frac{\mid var(Predicted) - var(Observed) \mid}{var(Observed)}\)

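where \(var(Predicted)\) and \(var(Observed)\) are the variances of the predicted and observed values. The following code chunk defines the function for the PCV error metric; note that it multiplies the ratio by 100, so the result is reported as a percentage: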

pcv <- function(obs, pred){
  d <- (var(obs) - var(pred)) * 100/ var(obs)
  d <- abs(as.numeric(d))
  return(d)
}
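
As a small worked example, the sketch below applies the same pcv() definition to toy vectors whose variances are easy to verify by hand. The vectors are made up for illustration and do not come from the package or from nottem.

```r
# Same definition as above, repeated so that this block runs on its own.
pcv <- function(obs, pred) {
  abs(as.numeric((var(obs) - var(pred)) * 100 / var(obs)))
}

obs          <- c(1, 2, 3, 4, 5)        # var(obs) = 2.5
pred_shifted <- obs + 1                 # same variance, constant bias of 1
pred_flat    <- c(2, 2.5, 3, 3.5, 4)    # variance 0.625
pred_wide    <- c(1, 3, 5, 7, 9)        # variance 10

pcv_shifted <- pcv(obs, pred_shifted)   # |2.5 - 2.5|   * 100 / 2.5 = 0
pcv_flat    <- pcv(obs, pred_flat)      # |2.5 - 0.625| * 100 / 2.5 = 75
pcv_wide    <- pcv(obs, pred_wide)      # |2.5 - 10|    * 100 / 2.5 = 300
print(c(shifted = pcv_shifted, flat = pcv_flat, wide = pcv_wide))
```

The first case shows that PCV is insensitive to a constant offset: a biased forecast with the right spread still scores 0, so PCV is best read alongside RMSE, MAE or MAPE rather than on its own.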

The following code chunk passes PCV as an additional error metric through the ePara and ePara_name parameters of prediction_errors().

a1 <- prediction_errors(data = nottem, nval = 48, 
                        Method = c("test1(data, nval)", "test2(data, nval)"), 
                        MethodName = c("LPSF","PSF"), 
                        ePara = "pcv(obs, pred)", ePara_name = 'PCV',
                        append_ = 1)
a1@output$Error_Parameters
#>             RMSE        MAE       MAPE        PCV  exec_time
#> ARIMA   2.523316   2.128064   4.513538  13.757073   0.152633
#> LPSF   2.5366583  1.9791667  4.2695528 13.6065832  0.3272281
#> PSF    2.1729411  1.7230385  3.7067049  0.9558439  0.1047201
b1 <- plot(a1)

A unique plot:

The following function offers a distinctive way of showing forecasted values, especially seasonal ones. The plot shows how the forecasted observations behave over an increasing number of seasonal time horizons.

plot_circle(a1)
#> Note: 48 points are out of plotting region in sector 'a', track '1'.


Monte-Carlo strategy:

Monte-Carlo is a popular strategy for comparing the performance of forecasting methods. It randomly selects multiple patches of the dataset, tests the performance of the forecasting methods on each patch, and returns the average error values.

The Monte-Carlo strategy helps ensure an accurate comparison of forecasting methods and avoids biased results obtained by chance.

This package provides the monte_carlo() function as follows:

The parameters used in this function are:

This function returns:

a1 <- prediction_errors(data = nottem, nval = 48, 
                        Method = c("test1(data, nval)"), 
                        MethodName = c("LPSF"), append_ = 1)
monte_carlo(object = a1, size = 180, iteration = 10)
#>         ARIMA     LPSF
#> 9    3.446114 5.009127
#> 11   3.807793 5.367784
#> 30   3.477685 5.195355
#> 32   2.590447 4.758633
#> 24   4.570650 4.871212
#> 2    4.476293 5.698093
#> 48   2.815019 5.143118
#> 35   2.692311 4.714380
#> 18   3.309056 5.182951
#> 3    4.974178 5.665410
#> Mean 3.615955 5.160606
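
The idea behind the strategy can be sketched in base R as follows. This is not the package's implementation of monte_carlo(): the function monte_carlo_sketch(), its seed argument, the choice of RMSE as the error value, and the seasonal-naive forecaster are all assumptions made for illustration. The sketch draws random contiguous patches of the series, holds out the last nval values of each patch, forecasts them, and averages the errors.

```r
# Hypothetical forecaster: repeat the last observed season of the training part.
snaive_forecast <- function(train, nval, period = 12) {
  rep_len(tail(as.numeric(train), period), nval)
}

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Illustrative Monte-Carlo loop (not the package's own code).
monte_carlo_sketch <- function(data, size, nval, iteration, seed = 1) {
  set.seed(seed)                      # fixed seed so the sketch is reproducible
  x <- as.numeric(data)
  # Candidate start positions such that a patch of length `size` fits in the series.
  starts <- sample(seq_len(length(x) - size + 1), iteration)
  errors <- vapply(starts, function(s) {
    patch <- x[s:(s + size - 1)]      # contiguous random patch
    train <- head(patch, size - nval) # first part is used for fitting
    test  <- tail(patch, nval)        # last nval values are held out
    rmse(test, snaive_forecast(train, nval))
  }, numeric(1))
  names(errors) <- starts             # label each error by its patch start index
  c(errors, Mean = mean(errors))
}

res <- monte_carlo_sketch(nottem, size = 180, nval = 48, iteration = 10)
print(res)
```

Averaging over several patches reduces the chance that a method looks good or bad only because of one particular train/test split.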


The monte_carlo() function with the fval and figs flags switched ON:

monte_carlo(object = a1, size = 144, iteration = 2, fval = 1, figs = 1)

#> $Error_Parameters
#>         ARIMA     LPSF
#> 82   2.801016 5.705584
#> 25   6.308795 5.714629
#> Mean 4.554905 5.710106
#> 
#> $Predicted_Values
#> $Predicted_Values[[1]]
#>                    1        2        3        4        5        6      7
#> Test values 41.60000 41.30000 40.80000 41.00000 38.40000 47.40000 54.100
#> ARIMA       44.60171 39.45832 37.59085 38.51563 42.28439 47.66747 53.274
#> LPSF        51.20000 42.80000 45.80000 40.00000 42.60000 43.50000 47.100
#>                    8        9       10       11      12       13       14
#> Test values 58.60000 61.40000 61.80000 56.30000 50.9000 41.40000 37.10000
#> ARIMA       57.59768 59.55312 58.68989 55.31748 50.3917 45.24989 41.24716
#> LPSF        50.00000 60.50000 64.60000 64.00000 56.8000 48.60000 44.20000
#>                   15       16       17       18       19       20       21
#> Test values 42.10000 41.20000 47.30000 46.60000 52.40000 59.00000 59.60000
#> ARIMA       39.40119 40.13448 43.18072 47.67437 52.39491 56.09726 57.83918
#> LPSF        36.40000 37.30000 35.00000 44.00000 43.90000 52.70000 58.60000
#>                   22       23       24       25       26      27       28
#> Test values 60.40000 57.00000 50.70000 47.80000 39.20000 39.4000 40.90000
#> ARIMA       57.21981 54.46933 50.37048 46.03734 42.61356 40.9714 41.49068
#> LPSF        60.00000 61.10000 58.10000 51.20000 42.80000 45.8000 40.00000
#>                   29       30       31       32       33       34       35
#> Test values 42.40000 47.80000 52.40000 58.00000 60.70000 61.80000 58.20000
#> ARIMA       43.97321 47.71136 51.68835 54.85384 56.40054 55.96901 53.72917
#> LPSF        42.60000 43.50000 47.10000 50.00000 60.50000 64.60000 64.00000
#>                   36       37       38       39       40       41       42
#> Test values 46.70000 46.60000 37.80000       NA       NA       NA       NA
#> ARIMA       50.32052 46.67092 43.74485 42.28932 42.64408 44.66417 47.77185
#> LPSF        56.80000 48.60000 44.20000 36.40000 37.30000 35.00000 44.00000
#>                   43       44       45       46       47       48
#> Test values       NA       NA       NA       NA       NA       NA
#> ARIMA       51.12056 53.82475 55.19338 54.90562 53.08444 50.25162
#> LPSF        43.90000 52.70000 58.60000 60.00000 61.10000 58.10000
#> 
#> $Predicted_Values[[2]]
#>                    1       2        3        4        5        6        7
#> Test values 38.40000 40.3000 44.60000 50.90000 57.00000 62.10000 63.50000
#> ARIMA       42.44271 44.5907 48.15155 51.44866 53.88731 54.85896 54.08873
#> LPSF        41.20000 39.1000 42.00000 47.10000 51.05000 58.40000 61.15000
#>                    8       9       10       11       12       13       14
#> Test values 56.30000 47.3000 43.60000 41.80000 36.20000 39.30000 44.50000
#> ARIMA       51.85287 48.7946 45.74997 43.53307 42.71215 43.46521 45.54511
#> LPSF        61.05000 56.2000 50.55000 43.00000 38.05000 35.95000 34.85000
#>                   15       16       17       18       19      20       21
#> Test values 48.70000 54.20000 60.80000 65.50000 64.90000 60.1000 50.20000
#> ARIMA       48.35754 51.12987 53.12282 53.82692 53.09187 51.1561 48.57179
#> LPSF        39.70000 45.20000 53.30000 57.65000 61.55000 59.2500 56.80000
#>                   22       23       24       25       26      27       28
#> Test values 42.10000 35.80000 39.40000 38.20000 40.40000 46.9000 53.40000
#> ARIMA       46.04816 44.25745 43.65648 44.37093 46.17147 48.5454 50.84189
#> LPSF        47.90000 44.20000 41.25000 41.20000 39.10000 42.0000 47.10000
#>                   29       30       31       32       33       34       35
#> Test values 59.60000 66.50000 60.40000 59.20000 51.20000 42.80000 45.80000
#> ARIMA       52.44986 52.95999 52.26818 50.59439 48.41441 46.32531 44.88238
#> LPSF        51.05000 58.40000 61.15000 61.05000 56.20000 50.55000 43.00000
#>                   36       37       38       39       40       41       42
#> Test values 40.00000 42.60000 43.50000 47.10000 50.00000 60.50000 64.60000
#> ARIMA       44.45211 45.11975 46.67485 48.67608 50.57589 51.86983 52.23004
#> LPSF        38.05000 35.95000 34.85000 39.70000 45.20000 53.30000 57.65000
#>                   43       44       45       46       47       48
#> Test values 64.00000 56.80000 48.60000 44.20000 36.40000 37.30000
#> ARIMA       51.58766 50.14361 48.30704 46.57997 45.42046 45.12159
#> LPSF        61.55000 59.25000 56.80000 47.90000 44.20000 41.25000

Functions in Future Versions: