The ARIMA method has limitations, among them small sample sizes. Although analyses of small-sample series are available in a few cases, there is currently no widely applicable and easily accessible method for small-sample inference. Methods like Edgeworth expansions involve a lot of algebra (which may discourage users) and apply only in very special cases. The regular bootstrap, a potential alternative, fails on the ground of conflicting assumptions: it assumes that observations are independent and identically distributed (i.i.d.), while typical time series data are dependent in nature.
To avoid the i.i.d. assumption of the ordinary bootstrap of Efron (1979) while still maintaining the dependence structure of time series data, one can preserve a reasonable amount of that dependence by slicing the series into a number of blocks, each of length l. The dependence structure within each block is thereby kept intact. Instead of sampling individual observations with replacement (as in the traditional bootstrap), whole blocks are sampled. Only a limited amount of dependence is distorted, namely between blocks (serial correlation is broken across block boundaries), while the resampled blocks themselves are i.i.d. In this way, the i.i.d. assumption of the regular bootstrap and the serial correlation of typical time series data are reconciled in a single procedure. The broad name for methods that achieve these two opposing objectives is Block Bootstrap Methods.
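The chunk-and-resample idea can be sketched in a few lines. The following is an illustrative Python sketch of the general block bootstrap scheme, not the OBL package's implementation; the function `block_bootstrap` and its argument names are made up for this example:

```python
import random

def block_bootstrap(series, l, seed=None):
    """Illustrative block bootstrap: slice the series into blocks of
    length l, then sample whole blocks with replacement to rebuild a
    series of (roughly) the original length."""
    rng = random.Random(seed)
    # slice into non-overlapping blocks of length l (any remainder is dropped)
    blocks = [series[i:i + l] for i in range(0, len(series) - l + 1, l)]
    m = len(blocks)  # number of blocks
    resampled = []
    for _ in range(m):
        resampled.extend(rng.choice(blocks))  # sample blocks, not points
    return resampled

x = list(range(1, 13))        # a toy "series" of length 12
y = block_bootstrap(x, l=3, seed=1)
print(len(y))                 # 12: same length as the original
```

Within each resampled chunk of length 3 the original ordering (and hence the short-range dependence) is preserved; only the joins between chunks break the serial correlation.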
The main challenge with block bootstrap procedures is the sensitivity of the Root Mean Squared Error (RMSE) to the choice of block length (l), or equivalently the number of blocks (m). It is one problem defined in two ways. The OBL: Optimum Block Length package approaches it through the choice of block length. Diverse methods can be used (each is explained briefly below), and each method has numerous candidate block lengths that must be considered; it is this problem that the OBL: Optimum Block Length package sets out to solve.
The OBL package provides the optimum block length for five (5) different block bootstrap methods, viz.:
The Non-overlapping Block Bootstrap (NBB) uses the method described in Carlstein (1986), which splits the original series into non-overlapping blocks and thereafter resamples the blocks a number of times (named R) to form a new series.
The Moving Block Bootstrap (MBB), otherwise called the Overlapping Block Bootstrap, uses the method described in Kunsch (1989), which splits the original series into overlapping blocks and thereafter resamples the blocks a number of times (named R) to form a new series.
The Circular Block Bootstrap (CBB) uses the method described in Politis and Romano (1992). It improves on the MBB of Kunsch (1989) by making provision for observations at the tail end of the original series that would otherwise be cut off from resampling, simply because the leftover element(s) cannot fill a block of the predetermined length \(l\), where \(1 < l < n\) and \(n\) is the length of the original series. The provision is made by completing the leftover block with the first element(s) of the original series, so that the series forms a circle. Afterwards, the blocks are resampled a number of times (named R) to form a new series.
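The three published schemes differ only in how the block start positions are chosen. A minimal Python sketch of the block construction for NBB, MBB and CBB (illustrative only, not the package's code; the function names are hypothetical):

```python
def nbb_blocks(x, l):
    """Non-overlapping blocks (Carlstein 1986): step by l, drop any remainder."""
    return [x[i:i + l] for i in range(0, len(x) - l + 1, l)]

def mbb_blocks(x, l):
    """Moving (overlapping) blocks (Kunsch 1989): n - l + 1 blocks, step 1."""
    return [x[i:i + l] for i in range(len(x) - l + 1)]

def cbb_blocks(x, l):
    """Circular blocks (Politis & Romano 1992): wrap the series so that
    tail observations still start full-length blocks."""
    wrapped = x + x[:l - 1]
    return [wrapped[i:i + l] for i in range(len(x))]

x = [1, 2, 3, 4, 5]
print(nbb_blocks(x, 2))  # [[1, 2], [3, 4]]
print(mbb_blocks(x, 2))  # [[1, 2], [2, 3], [3, 4], [4, 5]]
print(cbb_blocks(x, 2))  # the last block wraps around: [5, 1]
```

Note how the circular scheme gives every observation, including the last one, a chance to start a block, which is exactly the provision the CBB makes for the tail of the series.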
The Tapered Moving Block Bootstrap (TMBB, unpublished) is designed to reduce the under-representation of the extreme members of the series, from \(2l\) affected elements to just 2. Reducing the number of under-represented elements helps improve the model evaluation metrics (RMSE and MAE). Afterwards, the blocks are resampled a number of times (named R) to form a new series.
The Tapered Circular Block Bootstrap (TCBB, unpublished) extends the TMBB so that the last block contains the first element of the parent series as its final element. It reduces the under-representation of the extreme members of the series from \(2l\) (in the case of the MBB) and 2 (in the case of the TMBB) to just 1. Reducing the number of under-represented elements helps improve the model evaluation metrics (leading to a reduced RMSE). Afterwards, the blocks are resampled a number of times (named R) to form a new series.
The package also checks every possible block length \(l\) (where \(1 < l < n\) and \(n\) is the length of the original time series) for each method to find the optimal one, calculating the RMSE for every candidate block length of each method and picking out the minimum. The minimum RMSE value for each method is collected in a data frame (with three (3) columns, namely Methods, lb and RMSE), so that users of the OBL: Optimum Block Length package can choose the method and block length with the minimal RMSE value from the output data frame.
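The search itself is a plain grid over candidate block lengths. A hedged Python sketch, where `score` stands in for whatever bootstrap RMSE criterion a method uses (the toy deterministic score below is purely for illustration, not the package's criterion):

```python
def best_block_length(series, score, l_range=None):
    """Grid search over candidate block lengths 1 < l < n:
    score(series, l) returns an RMSE-like value; keep the l
    with the minimum score."""
    n = len(series)
    candidates = l_range if l_range is not None else range(2, n)
    results = {l: score(series, l) for l in candidates}
    l_opt = min(results, key=results.get)
    return l_opt, results[l_opt]

series = list(range(10))
# toy stand-in for a bootstrap RMSE criterion, minimised at l = 4
toy_score = lambda s, l: abs(l - 4) + 0.5
l_opt, val = best_block_length(series, toy_score)
print(l_opt, val)  # 4 0.5
```

In the package, one such minimisation is run per method, and the per-method minima are what end up as the rows of the output data frame.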
You can install the development version from GitHub with:
It is observed that the optimum block length of any time series is contingent (dependent) on the uniqueness of that series. Block bootstrap users therefore need flexibility in choosing the optimum block length, through a method that is concise and clear while remaining easy to use. To meet that need, the OBL: Optimum Block Length package was created: it helps users search for the best block length and the best method, i.e. the combination with the minimum RMSE value.
The blockboot function produces a data frame with three (3) columns (Method, lb & RMSE).
The lolliblock function plots a lollipop chart of the data frame produced by the blockboot function. It shows the optimum block length for each method in colours ranging from red to green: red marks the worst-performing method (the one with the highest RMSE), while green marks the method with the smallest RMSE. The corresponding block length of each method appears in a legend with matching colours.
The minimum arguments to the blockboot() function are ts, which should be a univariate time series, and R, which is the number of resampling replicates.
blockboot(ts, R, seed, n_cores, methods = c("optnbb", "optmbb", "optcbb", "opttmbb", "opttcbb"))
Likewise, the minimum arguments to the lolliblock() function are ts, which should be a univariate time series, and R, which is the number of resampling replicates.
lolliblock(ts, R, seed, n_cores, methods = c("optnbb", "optmbb", "optcbb", "opttmbb", "opttcbb"))
ts: univariate time series data
R: number of replications for resampling
n_cores: number of core(s) to be used on your operating system
methods: optional; if specified, it must be any combination of the following: "optnbb", "optmbb", "optcbb", "opttmbb", "opttcbb"
The function outputs a data frame with 5 rows and 3 columns, namely "Methods", "lb" and "RMSE". The method with the minimum RMSE value is the preferred one.
``` r
# simulate univariate time series data
set.seed(289805)
ts <- arima.sim(n = 10, model = list(ar = 0.8, order = c(1, 0, 0)), sd = 1)

# get the optimal block length table
OBL::blockboot(ts = ts, R = 100, seed = 6, n_cores = 2)
#>   Methods lb      RMSE
#> 1     nbb  9 0.2402482
#> 2     mbb  9 0.1023012
#> 3     cbb  8 0.2031448
#> 4    tmbb  4 0.2654746
#> 5    tcbb  9 0.4048711
```
The function outputs a lollipop chart with 5 pops for the 5 methods, separated by 5 distinct colours: the red lollipop indicates the least desired method (the one with the highest RMSE), while the green lollipop indicates the preferred method (the one with the lowest RMSE). The legend beside the chart indicates the optimum block length for each method.
``` r
# simulate univariate time series data
set.seed(289805)
ts <- arima.sim(n = 10, model = list(ar = 0.8, order = c(1, 0, 0)), sd = 1)

# plot the lollipop chart of optimal block lengths
OBL::lolliblock(ts = ts, R = 100, seed = 6, n_cores = 2)
```