Data transformation

Introduction

For an unbiased statistical analysis, data often have to be transformed so that they satisfy the model assumptions. The AFR package includes a default time-series dataset, macroKZ, with macroeconomic parameters for the 2010-2022 period. The dataset is raw: it is not ordered and may contain missing values and other irregularities.

AFR recommends the following steps:

Step 1. Check the data for format, missing values, outliers and summary statistics (min, max, etc.).

Step 2. Check the data for stationarity.

Step 3. If the data are non-stationary, transform them to stationarity with an appropriate transformation method.

Step 4. Once the data are transformed, choose the regressors for the model.

Step 1

Once the default dataset macroKZ is loaded, inspect it with the checkdata and summary functions:

library(AFR)
data(macroKZ)
checkdata(macroKZ)
#> There are 0 missing items in the dataset.
#> There are 0 items in non-numeric format in the dataset.
#> There are 0 outliers in the dataset.
#> --------------------------------------------------------------------------------
#>                                  Missing items                                   
#> --------------------------------------------------------------------------------
#> [[1]]
#>               real_gdp              GDD_Agr_R              GDD_Min_R 
#>                      0                      0                      0 
#>              GDD_Man_R              GDD_Elc_R              GDD_Con_R 
#>                      0                      0                      0 
#>              GDD_Trd_R              GDD_Trn_R              GDD_Inf_R 
#>                      0                      0                      0 
#>              GDD_Est_R                  GDD_R              Rincpop_q 
#>                      0                      0                      0 
#>              Rexppop_q                Rwage_q                    imp 
#>                      0                      0                      0 
#>                    exp                 usdkzt                 eurkzt 
#>                      0                      0                      0 
#>                 rurkzt                   poil                GDP_DEF 
#>                      0                      0                      0 
#>                    cpi     realest_resed_prim      realest_resed_sec 
#>                      0                      0                      0 
#>           realest_comm   index_stock_weighted             ntrade_Agr 
#>                      0                      0                      0 
#>             ntrade_Min             ntrade_Man             ntrade_Elc 
#>                      0                      0                      0 
#>             ntrade_Con             ntrade_Trd             ntrade_Trn 
#>                      0                      0                      0 
#>             ntrade_Inf          fed_fund_rate     govsec_rate_kzt_3m 
#>                      0                      0                      0 
#>     govsec_rate_kzt_1y     govsec_rate_kzt_7y    govsec_rate_kzt_10y 
#>                      0                      0                      0 
#>             tonia_rate    rate_kzt_mort_0y_1y    rate_kzt_mort_1y_iy 
#>                      0                      0                      0 
#>    rate_kzt_corp_0y_1y    rate_usd_corp_0y_1y    rate_kzt_corp_1y_iy 
#>                      0                      0                      0 
#>    rate_usd_corp_1y_iy    rate_kzt_indv_0y_1y    rate_kzt_indv_1y_iy 
#>                      0                      0                      0 
#> realest_resed_prim_rus  realest_resed_sec_rus         cred_portfolio 
#>                      0                      0                      0 
#>                coef_k1                coef_k3             provisions 
#>                      0                      0                      0 
#>         percent_margin                com_inc                com_exp 
#>                      0                      0                      0 
#>               oper_inc                oth_inc                     DR 
#>                      0                      0                      0 
#> 
#> --------------------------------------------------------------------------------
#> 
#> --------------------------------------------------------------------------------
#>                                  Numeric format                                  
#> --------------------------------------------------------------------------------
#> [[1]]
#>               real_gdp              GDD_Agr_R              GDD_Min_R 
#>                      0                      0                      0 
#>              GDD_Man_R              GDD_Elc_R              GDD_Con_R 
#>                      0                      0                      0 
#>              GDD_Trd_R              GDD_Trn_R              GDD_Inf_R 
#>                      0                      0                      0 
#>              GDD_Est_R                  GDD_R              Rincpop_q 
#>                      0                      0                      0 
#>              Rexppop_q                Rwage_q                    imp 
#>                      0                      0                      0 
#>                    exp                 usdkzt                 eurkzt 
#>                      0                      0                      0 
#>                 rurkzt                   poil                GDP_DEF 
#>                      0                      0                      0 
#>                    cpi     realest_resed_prim      realest_resed_sec 
#>                      0                      0                      0 
#>           realest_comm   index_stock_weighted             ntrade_Agr 
#>                      0                      0                      0 
#>             ntrade_Min             ntrade_Man             ntrade_Elc 
#>                      0                      0                      0 
#>             ntrade_Con             ntrade_Trd             ntrade_Trn 
#>                      0                      0                      0 
#>             ntrade_Inf          fed_fund_rate     govsec_rate_kzt_3m 
#>                      0                      0                      0 
#>     govsec_rate_kzt_1y     govsec_rate_kzt_7y    govsec_rate_kzt_10y 
#>                      0                      0                      0 
#>             tonia_rate    rate_kzt_mort_0y_1y    rate_kzt_mort_1y_iy 
#>                      0                      0                      0 
#>    rate_kzt_corp_0y_1y    rate_usd_corp_0y_1y    rate_kzt_corp_1y_iy 
#>                      0                      0                      0 
#>    rate_usd_corp_1y_iy    rate_kzt_indv_0y_1y    rate_kzt_indv_1y_iy 
#>                      0                      0                      0 
#> realest_resed_prim_rus  realest_resed_sec_rus         cred_portfolio 
#>                      0                      0                      0 
#>                coef_k1                coef_k3             provisions 
#>                      0                      0                      0 
#>         percent_margin                com_inc                com_exp 
#>                      0                      0                      0 
#>               oper_inc                oth_inc                     DR 
#>                      0                      0                      0 
#> 
#> --------------------------------------------------------------------------------
#> 
#> --------------------------------------------------------------------------------
#>                                     Outliers                                     
#> --------------------------------------------------------------------------------
#> [[1]]
#>               real_gdp              GDD_Agr_R              GDD_Min_R 
#>                      0                      0                      0 
#>              GDD_Man_R              GDD_Elc_R              GDD_Con_R 
#>                      0                      0                      0 
#>              GDD_Trd_R              GDD_Trn_R              GDD_Inf_R 
#>                      0                      0                      0 
#>              GDD_Est_R                  GDD_R              Rincpop_q 
#>                      0                      0                      0 
#>              Rexppop_q                Rwage_q                    imp 
#>                      0                      0                      0 
#>                    exp                 usdkzt                 eurkzt 
#>                      0                      0                      0 
#>                 rurkzt                   poil                GDP_DEF 
#>                      0                      0                      0 
#>                    cpi     realest_resed_prim      realest_resed_sec 
#>                      0                      0                      0 
#>           realest_comm   index_stock_weighted             ntrade_Agr 
#>                      0                      0                      0 
#>             ntrade_Min             ntrade_Man             ntrade_Elc 
#>                      0                      0                      0 
#>             ntrade_Con             ntrade_Trd             ntrade_Trn 
#>                      0                      0                      0 
#>             ntrade_Inf          fed_fund_rate     govsec_rate_kzt_3m 
#>                      0                      0                      0 
#>     govsec_rate_kzt_1y     govsec_rate_kzt_7y    govsec_rate_kzt_10y 
#>                      0                      0                      0 
#>             tonia_rate    rate_kzt_mort_0y_1y    rate_kzt_mort_1y_iy 
#>                      0                      0                      0 
#>    rate_kzt_corp_0y_1y    rate_usd_corp_0y_1y    rate_kzt_corp_1y_iy 
#>                      0                      0                      0 
#>    rate_usd_corp_1y_iy    rate_kzt_indv_0y_1y    rate_kzt_indv_1y_iy 
#>                      0                      0                      0 
#> realest_resed_prim_rus  realest_resed_sec_rus         cred_portfolio 
#>                      0                      0                      0 
#>                coef_k1                coef_k3             provisions 
#>                      0                      0                      0 
#>         percent_margin                com_inc                com_exp 
#>                      0                      0                      0 
#>               oper_inc                oth_inc                     DR 
#>                      0                      0                      0 
#> 
#> --------------------------------------------------------------------------------

Depending on the output, apply the functions needed to fix problematic properties of the data. For instance, if there are missing values, remove them. Here checkdata reports no missing items, so for macroKZ the call below is shown for illustration only:

macroKZ <- na.remove(macroKZ)  # na.remove() comes from the tseries package; base R's na.omit() is an alternative
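To show what Step 1 checks involve, the following base-R sketch runs similar checks on a small hypothetical data frame. It is not the implementation of checkdata: the helper name check_data_sketch and the 1.5 * IQR outlier rule are assumptions chosen for illustration, and complete.cases() is used as a base-R way to drop rows with missing values.

```r
# Base-R sketch of Step 1 checks (NOT the AFR checkdata implementation).
# Assumption: outliers are flagged with the 1.5 * IQR rule.
check_data_sketch <- function(df) {
  is_num  <- vapply(df, is.numeric, logical(1))
  missing <- colSums(is.na(df))            # missing values per column
  outliers <- vapply(df[is_num], function(x) {
    q   <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
    iqr <- q[2] - q[1]
    sum(x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr, na.rm = TRUE)
  }, integer(1))                           # outliers per numeric column
  list(missing = missing,
       non_numeric = names(df)[!is_num],   # columns not in numeric format
       outliers = outliers)
}

# Hypothetical toy data: one missing value, one extreme value, one text column
toy <- data.frame(
  a = c(1, 2, NA, 4, 100),
  b = c(5, 6, 7, 8, 9),
  label = c("x", "y", "z", "w", "v")
)

res <- check_data_sketch(toy)
print(res)

# Drop rows that contain any missing value
toy_clean <- toy[complete.cases(toy), ]
```

Here column a has one missing value and one outlier (100), column b has neither, and label is reported as non-numeric.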

Step 2

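Once the dataset has been cleaned, check that the time series are stationary. A series is (weakly) stationary when its statistical properties do not depend on time: its mean and variance are constant, and its autocovariances depend only on the lag. In R, stationarity can be checked with the Augmented Dickey-Fuller test (adf.test) and/or the Kwiatkowski-Phillips-Schmidt-Shin test (kpss.test), both provided by the tseries package. The two tests have opposite null hypotheses: adf.test takes a unit root (non-stationarity) as the null, whereas kpss.test takes stationarity as the null.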

To see at a glance which variables in macroKZ are stationary, apply the tests to every column with the sapply function.
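A minimal sketch of that idea, assuming the tseries package (which provides adf.test() and kpss.test()) is installed. The helper stationarity_table and the rule that a series counts as stationary only when ADF rejects and KPSS does not reject at the 5% level are assumptions for illustration; the data are simulated, not taken from macroKZ. Warnings are suppressed because both functions warn when the p-value falls outside their lookup tables.

```r
# Sketch: run ADF and KPSS on every column of a data frame with sapply.
# Assumes the tseries package is installed.
library(tseries)

set.seed(42)
n <- 200
# Hypothetical series: white noise (stationary) and a random walk (non-stationary)
sim <- data.frame(
  noise = rnorm(n),
  walk  = cumsum(rnorm(n))
)

stationarity_table <- function(df, alpha = 0.05) {
  t(sapply(df, function(x) {
    adf_p  <- unname(suppressWarnings(adf.test(x)$p.value))   # H0: unit root
    kpss_p <- unname(suppressWarnings(kpss.test(x)$p.value))  # H0: level stationarity
    c(adf_p = adf_p, kpss_p = kpss_p,
      stationary = as.numeric(adf_p < alpha && kpss_p >= alpha))
  }))
}

print(stationarity_table(sim))
# With AFR loaded, the same helper could be called as stationarity_table(macroKZ)
```

Each row of the result is one variable; the stationary column is 1 when both tests agree that the series is stationary.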

Step 3

If the dataset as a whole, or individual variables, are non-stationary, apply transformation techniques to make the data stationary. The most common transformations are differencing (first and second order), taking logarithms, differences of logarithms, detrending, etc. Taking logarithms alone mainly stabilizes the variance and usually does not remove a trend, which is why it is often combined with differencing. After applying the transformation(s), re-run the stationarity tests to make sure the data are now stationary.

new <- log(macroKZ)  # log of every column; zero values give -Inf and negative values give NaN
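The following sketch shows the transformations listed above in base R, assuming a simulated, strictly positive series rather than macroKZ. The helper transform_dlog is hypothetical and only suitable for data frames whose columns are all strictly positive.

```r
set.seed(1)
n <- 120
# Hypothetical series with exponential growth: its log is a random walk with drift
x <- exp(cumsum(rnorm(n, mean = 0.01, sd = 0.02)))

log_x  <- log(x)                              # logarithm (defined only for x > 0)
d1_x   <- diff(x)                             # first difference
d2_x   <- diff(x, differences = 2)            # second difference
dlog_x <- diff(log(x))                        # difference of logs, ~ period-on-period growth rate
detr_x <- residuals(lm(log_x ~ seq_len(n)))   # linear detrending of the log series

# Apply the difference of logs to every (strictly positive) column of a data frame
transform_dlog <- function(df) {
  as.data.frame(lapply(df, function(col) diff(log(col))))
}
```

Differencing shortens a series by one observation per order, so transformed columns have fewer rows than the original data; after transforming, the Step 2 tests should be repeated.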

Step 4

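To build a good regression model, the regressors (independent variables) should not be strongly correlated with each other. If this condition is violated, the model suffers from multicollinearity: the OLS estimators remain unbiased, but their variances (standard errors) are inflated, so individual coefficients become unstable and hard to interpret. The AFR package offers the corsel function, which computes the correlations between the regressors in the dataset and compares them with a user-defined threshold (thrs). The result can be presented numerically or logically (TRUE/FALSE), selected with the num argument.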

corsel(macroKZ, num = FALSE, thrs = 0.65)
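The exact algorithm behind corsel is not described here, so the following is only a base-R sketch of threshold-based correlation screening, assuming that pairs with an absolute Pearson correlation above thrs are flagged. The helper high_cor_pairs and the simulated data are hypothetical.

```r
# Sketch of threshold-based correlation screening (not corsel's implementation)
high_cor_pairs <- function(df, thrs = 0.65) {
  cm  <- cor(df, use = "pairwise.complete.obs")
  # Upper triangle only, so each pair is reported once and the diagonal is skipped
  idx <- which(abs(cm) > thrs & upper.tri(cm), arr.ind = TRUE)
  data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx],
    row.names = NULL
  )
}

set.seed(7)
n <- 100
a <- rnorm(n)
# Hypothetical regressors: b is almost a copy of a, c is unrelated
sim <- data.frame(a = a, b = a + rnorm(n, sd = 0.1), c = rnorm(n))

flagged <- high_cor_pairs(sim, thrs = 0.65)
print(flagged)
```

A logical matrix, similar in spirit to num = FALSE, can be obtained with abs(cor(sim)) > 0.65. From each flagged pair, typically only one variable is kept as a regressor.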

Once the regressors are chosen, a linear regression model can be built with the lm function:

model <- lm(real_gdp ~ imp + exp + usdkzt + eurkzt, data = macroKZ)
summary(model)  # coefficients, standard errors, R-squared
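As a complementary check for the multicollinearity described in Step 4, variance inflation factors (VIF) can be computed by hand with lm(): VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing regressor j on all the other regressors. This is a sketch on simulated data; the helper vif_sketch is hypothetical, and the commonly quoted cut-offs of 5 or 10 are rules of thumb rather than strict limits.

```r
# Sketch: variance inflation factors computed by hand with lm()
vif_sketch <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    # R^2 of regressing variable v on all other regressors
    r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
    1 / (1 - r2)
  })
}

set.seed(7)
n <- 100
a <- rnorm(n)
# Hypothetical regressors: a and b are nearly collinear, c is independent
regs <- data.frame(a = a, b = a + rnorm(n, sd = 0.1), c = rnorm(n))

vifs <- vif_sketch(regs)
print(vifs)
```

The nearly collinear pair a and b gets very large VIFs, while c stays close to 1.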