imputeTestbench : Test bench for Missing Data Imputing Models/Methods Comparison

Neeraj Bokde

2016-06-19

This document introduces the R package ‘imputeTestbench’, a testing workbench for comparing missing data imputation models/methods. It compares imputation methods in terms of the RMSE, MAE, or MAPE error measures. New methods can be added to the test bench and compared with the existing ones. The function append_method() can be called repeatedly to add any number of methods to those already available in the test bench.
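For readers unfamiliar with the three error measures, the sketch below computes them in base R using their usual textbook definitions. The helper names rmse(), mae() and mape() and the two sample vectors are assumptions made for this illustration; they are not functions of imputeTestbench, and the package may compute its errors differently in detail.

```r
# Hypothetical helpers (not part of imputeTestbench), textbook definitions.
rmse <- function(actual, imputed) sqrt(mean((actual - imputed)^2))
mae  <- function(actual, imputed) mean(abs(actual - imputed))
# MAPE is undefined when 'actual' contains zeros.
mape <- function(actual, imputed) mean(abs((actual - imputed) / actual)) * 100

# Illustrative data: the true series and a version with two imperfect fills.
actual  <- c(1, 2, 3, 4, 5)
imputed <- c(1, 2, 3.5, 4, 4)

rmse(actual, imputed)
mae(actual, imputed)
mape(actual, imputed)
```

RMSE penalises large individual errors more heavily than MAE, while MAPE expresses the error relative to the size of the true values.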

The following example describes how the package works.

Consider the sample data datax:

datax <- c(1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5)

Load the imputeTestbench package as follows:

library(imputeTestbench)

The function impute_errors() compares imputation methods in terms of the RMSE, MAE, or MAPE error measures. The syntax of impute_errors() is shown below:

impute_errors(dataIn, missPercentFrom, missPercentTo, interval, repetition, errorParameter, MethodPath, MethodName)

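where

dataIn is the input data to be tested,
missPercentFrom and missPercentTo are the first and last percentages of missing values to test,
interval is the step between consecutive missing percentages,
repetition is the number of repetitions at each missing percentage,
errorParameter is the error measure to report (RMSE, MAE, or MAPE),
MethodPath is the source location of a user-defined imputation function, and
MethodName is the label under which that function is reported.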

In its simplest form, impute_errors() can be used as:

q <- impute_errors(datax)
q
## $Parameter
## [1] "RMSE Plot"
## 
## $Missing_Percent
## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
## 
## $Historic_Mean
## [1] 0.4789879 0.6250889 0.7886534 0.9018024 1.0024853 1.0969543 1.1952286
## [8] 1.2724180
## 
## $Interpolation
## [1] 0.6220167 0.7748639 0.9952267 1.3633658 1.5071357 1.9272482 1.7588836
## [8] 1.6804524
plot_errors(q)
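As a rough illustration of what a comparison like the one above involves, the following base R sketch removes values at random, imputes them with a mean fill and a linear interpolation, and averages the RMSE over several repetitions. It is not the package's implementation: bench(), impute_mean(), impute_linear(), the sample series x and the seed are all assumptions made for this example, and the package may select missing positions or compute errors differently.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable

# Illustrative series (an assumption; similar in shape to datax).
x <- rep(1:5, times = 20)

rmse <- function(actual, imputed) sqrt(mean((actual - imputed)^2))

# Fill every gap with the mean of the observed values.
impute_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

# Linear interpolation between observed neighbours; rule = 2 carries the
# nearest observed value outward when a gap touches either end.
impute_linear <- function(v) {
  idx <- seq_along(v)
  approx(idx[!is.na(v)], v[!is.na(v)], xout = idx, rule = 2)$y
}

# For each missing fraction, blank out random positions, impute with both
# methods, and average the RMSE over 'repetition' runs.
bench <- function(v, percents = seq(0.1, 0.8, by = 0.1), repetition = 10) {
  res <- sapply(percents, function(p) {
    errs <- replicate(repetition, {
      gappy <- v
      gappy[sample(length(v), round(p * length(v)))] <- NA
      c(mean_fill = rmse(v, impute_mean(gappy)),
        linear    = rmse(v, impute_linear(gappy)))
    })
    rowMeans(errs)
  })
  colnames(res) <- percents
  res
}

res <- bench(x)
res
```

Each column of res corresponds to one missing fraction and each row to one method, which is the same kind of information that plot_errors() presents graphically.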

By default, this function compares two basic imputation methods: historic mean and interpolation. The plot_errors() function plots the comparison between methods. The test bench also allows a new imputation method to be added and compared with the existing ones. The only requirement is that the new method be written as a function that returns the imputed data as its output. Suppose the following function is the method to be added to the test bench.

===============================

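inter <- function(outs) {
  # Requires the imputeTS package
  library(imputeTS)
  # Convert the input to a time series object
  outs <- ts(outs)
  # Replace each missing value with a random value
  d <- na.random(outs)
  return(d)
}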

===============================

Save this function in a new R script file and note its source location, in the form "source('~/imputeTestbench/R/inter.R')". Then use the append_method() function as:

#aa <- append_method(existing_method = q,dataIn= datax,missPercentFrom = 10, missPercentTo = 80, interval = 10, MethodPath = "source('~/imputeTestbench/R/inter.R')", MethodName = "Random")

#aa
#plot_errors(aa)

The code above is commented out because it depends on a function in an external file, at a location that is not included in this package.
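The workflow of saving a method to a script and noting its source location can be tried out without any external file by writing the script to a temporary directory. The sketch below is only an illustration under stated assumptions: locf() (last observation carried forward) is a hypothetical method written in base R, and the file name and the variable method_path are made up for this example. It stops short of calling append_method().

```r
# Hypothetical method: last observation carried forward, base R only.
method_code <- c(
  "locf <- function(outs) {",
  "  filled <- as.numeric(outs)",
  "  # A leading gap takes the first observed value.",
  "  if (is.na(filled[1])) filled[1] <- filled[which(!is.na(filled))[1]]",
  "  for (i in seq_along(filled)[-1]) {",
  "    if (is.na(filled[i])) filled[i] <- filled[i - 1]",
  "  }",
  "  filled",
  "}"
)

# Save the method in its own script file, as the text describes.
method_file <- file.path(tempdir(), "locf.R")
writeLines(method_code, method_file)
method_file <- normalizePath(method_file, winslash = "/")

# Check that the script loads and that the function returns imputed data
# of the same length as its input.
source(method_file)
gappy <- c(NA, 2, NA, NA, 5, NA)
locf(gappy)

# String in the same form as the MethodPath value shown in this document.
method_path <- sprintf("source('%s')", method_file)
method_path
```

Checking the function on a small vector with gaps before adding it to the test bench makes it easier to separate errors in the method itself from errors in the path.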

If the user wishes to add more than one imputation method to the test bench, append_method() is used again:

#bb <- append_method(existing_method = aa, dataIn= datax,missPercentFrom = 10, missPercentTo = 80, interval = 10, MethodPath = "source('~/imputeTestbench/R/PSFimpute.R')", MethodName = "PSFimpute")

#bb
#plot_errors(bb)

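where existing_method is the result of a previous call (here aa), and MethodPath and MethodName give the source location and label of the method being added.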

Similarly, the user can remove an imputation method from the test bench with the following function:

#cc <- remove_method(existing_method = bb, method_number = 1)
#cc
#plot_errors(cc)
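The result of impute_errors() printed earlier in this document looks like a named list with one error vector per method. Assuming that structure, the sketch below rebuilds the printed values by hand and arranges them as a table with base R. The object q_printed and the column Best are names introduced for this illustration only.

```r
# Values copied from the printed output of q earlier in this document.
q_printed <- list(
  Parameter       = "RMSE Plot",
  Missing_Percent = seq(0.1, 0.8, by = 0.1),
  Historic_Mean   = c(0.4789879, 0.6250889, 0.7886534, 0.9018024,
                      1.0024853, 1.0969543, 1.1952286, 1.2724180),
  Interpolation   = c(0.6220167, 0.7748639, 0.9952267, 1.3633658,
                      1.5071357, 1.9272482, 1.7588836, 1.6804524)
)

# Drop the label and bind the equal-length vectors into a data frame.
errors <- data.frame(q_printed[-1])

# For each missing percentage, name the method with the smallest error.
method_cols <- names(errors)[-1]
errors$Best <- method_cols[apply(errors[method_cols], 1, which.min)]
errors
```

For this particular sample run, the historic mean has the lower RMSE at every missing percentage, which is plausible for a short repeating series such as datax but should not be read as a general result.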