SUR (long-run output growth)

Introduction

The search.sur() function is one of the three main functions in the ldt package. This vignette explains its basic usage with the World Bank dataset (World Bank, 2022). Output growth is a widely discussed topic in economics. Several factors can influence the rate and quality of output growth, including physical and human capital, technological progress, institutions, trade openness, and macroeconomic stability (Chirwa and Odhiambo, 2016). We use this package to identify the long-run determinants of GDP per capita growth while making minimal assumptions.

Data

To minimize user discretion, we use all available data to select the set of potential regressors. Additionally, to avoid the endogeneity problem, we use information from before 2005 to explain the dependent variable after this year. This results in 571 potential regressors and 208 observations.
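
The construction of data.wdi itself is not shown in this vignette. As a rough illustration of the idea of explaining post-2005 outcomes with pre-2005 information, the following base R sketch uses a small simulated panel. The column names, the treatment of the cutoff year, and the use of simple period means are assumptions made for this example, not the procedure used to build the package data.

```r
# Hypothetical yearly panel for three countries (simulated values).
set.seed(1)
panel <- expand.grid(country = c("AAA", "BBB", "CCC"), year = 1995:2014,
                     stringsAsFactors = FALSE)
panel$growth <- rnorm(nrow(panel), mean = 2, sd = 1)
panel$x      <- rnorm(nrow(panel))   # a candidate regressor

cutoff <- 2005
before <- panel[panel$year <  cutoff, ]
after  <- panel[panel$year >= cutoff, ]

# One period average per country; split() keeps the country codes as names.
period_mean <- function(values, groups) sapply(split(values, groups), mean)

y     <- period_mean(after$growth,  after$country)    # dependent variable
y_lag <- period_mean(before$growth, before$country)   # pre-cutoff growth
x_pre <- period_mean(before$x,      before$country)   # pre-cutoff regressor

# Align on country code so every row mixes post-cutoff y with pre-cutoff data.
cross_section <- cbind(y = y, y.lag = y_lag[names(y)], x = x_pre[names(y)])
cross_section
```

Because every right-hand-side column is measured before the cutoff, none of them can be affected by the later values of the dependent variable.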

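For this illustration, we use only the first five columns of the regressor data:

library(ldt)  # provides data.wdi and the functions used below
data <- cbind(data.wdi$y, data.wdi$x[,1:5])
# column 2 carries the same indicator code as the dependent variable; mark it as the lag
colnames(data)[2] <- paste0(colnames(data)[2],".lag")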

Here are the last few observations from this subset of the data:

tail(data)
#>     NY.GDP.PCAP.KD NY.GDP.PCAP.KD.lag AG.AGR.TRAC.NO AG.CON.FERT.PT.ZS
#> WSM      0.6948973                 NA             NA                NA
#> XKX      3.5026405                 NA             NA                NA
#> YEM     -5.7036924                 NA             NA                NA
#> ZAF     -0.2084907         0.83394060      -1.533726         0.2149429
#> ZMB      1.9830446        -0.63088082             NA                NA
#> ZWE      1.2915497        -0.05297394             NA        -3.1003477
#>     AG.CON.FERT.ZS AG.LND.AGRI.K2
#> WSM     5.16498292    -0.65382289
#> XKX             NA             NA
#> YEM    14.88937834     0.01804223
#> ZAF     2.20864028    -0.08807695
#> ZMB     4.42032159     0.37414717
#> ZWE    -0.01642054     0.86883765

And here are some summary statistics for each variable:

sapply(as.data.frame(data), summary)
#>         NY.GDP.PCAP.KD NY.GDP.PCAP.KD.lag AG.AGR.TRAC.NO AG.CON.FERT.PT.ZS
#> Min.        -5.7036924         -2.7562067      -1.533726       -16.9560997
#> 1st Qu.     -0.1431228          0.7235014       1.308611        -2.9008332
#> Median       1.0235845          1.7697597       2.876800        -1.2511855
#> Mean         1.1094147          1.9232678       3.856278        -1.6759932
#> 3rd Qu.      2.4052532          2.8698123       5.600846         0.2268538
#> Max.         7.1613101         12.7823340      20.814750         7.3208970
#> NA's         9.0000000         73.0000000     134.000000       146.0000000
#>         AG.CON.FERT.ZS AG.LND.AGRI.K2
#> Min.         -6.526751    -6.62157767
#> 1st Qu.       1.310606    -0.29306446
#> Median        4.326556     0.01489903
#> Mean          4.329299     0.05915160
#> 3rd Qu.       6.856201     0.57567407
#> Max.         15.949830     2.23869809
#> NA's         80.000000     7.00000000
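
The NA counts matter because a model can only be estimated on rows where all of its variables are observed. The sketch below illustrates this with a toy matrix in base R. The values and the column names x1 and x2 are made up for the example, and how ldt handles missing values internally is not shown here.

```r
# Toy data with the same layout as above: y, its lag, and two candidates.
# All values are invented for illustration.
toy <- cbind(
  y     = c(1.2, 0.5,  NA, 2.1, -0.3, 1.8),
  y.lag = c(0.9,  NA, 1.1, 1.7,  0.2, 2.4),
  x1    = c( NA, 0.3, 0.8, 1.5,   NA, 0.6),
  x2    = c(0.1, 0.4, 0.7,  NA,  0.9, 1.0)
)

fixed      <- c("y", "y.lag")                  # present in every model
candidates <- setdiff(colnames(toy), fixed)

# Number of rows usable by a model that adds one candidate to the fixed columns.
usable <- sapply(candidates, function(v)
  sum(complete.cases(toy[, c(fixed, v), drop = FALSE])))
usable
```

Each candidate leaves a different number of usable rows, which is why models built from different regressors can end up with different observation counts.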

The columns of the data represent the following variables:

NY.GDP.PCAP.KD: GDP per capita (constant 2015 US$)
AG.AGR.TRAC.NO: Agricultural machinery, tractors
AG.CON.FERT.PT.ZS: Fertilizer consumption (% of fertilizer production)
AG.CON.FERT.ZS: Fertilizer consumption (kilograms per hectare of arable land)
AG.LND.AGRI.K2: Agricultural land (sq. km)

Modelling

We use the AIC metric to find the four best explanatory models. We restrict the model set through the sizes argument of get.combinations(), which limits how large the models can be. The intercept and the lag of the dependent variable are included in every equation through the numFixPartitions argument.


search_res <- search.sur(data = get.data(data, endogenous = 1),
                         combinations = get.combinations(sizes = c(1,2,3),
                                                         numTargets = 1,
                                                         numFixPartitions = 2),
                         metrics = get.search.metrics(typesIn = c("aic")),
                         items = get.search.items(bestK = 4))
print(search_res)
#> LDT search result:
#>  Method in the search process: SUR 
#>  Expected number of models: 5, searched: 5 , failed: 0 (0%)
#>  Elapsed time: 0.01687205 minutes 
#>  Length of results: 4 
#> --------
#>  Target (NY.GDP.PCAP.KD):
#>    Evaluation (aic):
#>       Best model:
#>        endogenous: NY.GDP.PCAP.KD
#>        exogenous: (3x1) (Intercept), NY.GDP.PCAP.KD.lag, AG.CON.FERT.PT.ZS
#>        metric: 213.8385
#> --------
#>  ** results for 4 best model(s) are saved
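
To make the search more concrete, here is a minimal base R sketch of the same idea: keep some terms fixed, try each remaining candidate, and rank the fits by AIC. It uses lm() on simulated data as a stand-in, since a system with one endogenous variable has a single equation. It is not ldt's implementation, the variable names are hypothetical, and AIC() from the stats package may be scaled differently from the metric ldt reports.

```r
# Simulated stand-in data (not the World Bank data).
set.seed(42)
n <- 80
d <- data.frame(y.lag = rnorm(n), x1 = rnorm(n), x2 = rnorm(n),
                x3 = rnorm(n), x4 = rnorm(n))
d$y <- 0.5 + 0.3 * d$y.lag + 0.4 * d$x2 + rnorm(n)

fixed      <- "y.lag"                      # the intercept is added by lm()
candidates <- c("x1", "x2", "x3", "x4")

# Model set: the fixed terms alone, plus the fixed terms with one candidate.
specs <- c(list(character(0)), as.list(candidates))
length(specs)

# Fit every specification and collect its AIC.
fits <- lapply(specs, function(extra)
  lm(reformulate(c(fixed, extra), response = "y"), data = d))
aics <- sapply(fits, AIC)

# Keep the four specifications with the smallest AIC.
best <- order(aics)[1:4]
data.frame(extra = sapply(specs[best], paste, collapse = ""),
           aic   = aics[best])
```

With two fixed terms and four candidates, this toy model set has five members, the same number as "Expected number of models: 5" in the output above. That match rests on the assumption that the sizes count includes the fixed terms.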

The output of the search.sur() function does not contain any estimation results, only the information required to replicate them. The summary() function returns a similar structure with the estimation results included.

search_sum <- summary(search_res)

The following code extracts the four best models and builds a table of their coefficients:

models <- lapply(0:3, function(i)
  search_sum$results[which(sapply(search_sum$results, function(d)
    d$info==i && d$typeName=="best model"))][[1]]$value)
names(models) <- paste("Best",c(1:4))
table <- coefs.table(models, latex = FALSE, 
                     regInfo = c("obs", "aic", "sic"))
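
The nested lapply/which/sapply call above is dense, so the selection logic is sketched here on a mock list. Only the field names info, typeName and value are taken from the code above. The mock entries, the "other" type and the helper pick_best() are hypothetical and exist only for this illustration.

```r
# Mock results list; real entries hold estimated models in `value`.
results <- list(
  list(typeName = "best model", info = 0, value = "model A"),
  list(typeName = "best model", info = 1, value = "model B"),
  list(typeName = "other",      info = 0, value = "something else")
)

# Return the value of the first entry that is a best model with rank index i.
pick_best <- function(results, i) {
  hits <- Filter(function(d) d$typeName == "best model" && d$info == i, results)
  hits[[1]]$value
}

# Rank indices start at 0, so index 0 is the best model.
models <- lapply(0:1, function(i) pick_best(results, i))
names(models) <- paste("Best", 1:2)
models
```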

(Automatically Selected) Determinants of long-run GDP per capita growth
	Best 1	Best 2	Best 3	Best 4
(Intercept)	0.34	0.80^*	0.41	0.85^***
NY.GDP.PCAP.KD.lag	0.41^*	-0.10	0.20^*	0.04
AG.CON.FERT.PT.ZS	0.08
AG.AGR.TRAC.NO		0.07
AG.CON.FERT.ZS			0.08^**
AG.LND.AGRI.K2				0.21
obs	51	58	106	133
aic	213.84	234.47	430.61	546.35
sic	219.63	240.65	438.60	555.02
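
The aic and sic rows can be cross-checked against each other. Under the textbook definitions AIC = -2 logL + 2k and SIC = -2 logL + k ln(n), the two differ by k (ln(n) - 2). The sketch below applies this to the values in the table, assuming k = 3 coefficients per model (intercept, lag, and one regressor). Whether ldt counts parameters exactly this way is an assumption, but the reported numbers are consistent with it up to rounding.

```r
# Values copied from the table above.
obs <- c(51, 58, 106, 133)
aic <- c(213.84, 234.47, 430.61, 546.35)
sic <- c(219.63, 240.65, 438.60, 555.02)

k <- 3  # assumed number of estimated coefficients per model

# SIC implied by the reported AIC and the number of observations.
implied_sic <- aic + k * (log(obs) - 2)

# Largest gap between implied and reported SIC; below 0.01 here.
max(abs(implied_sic - sic))
```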

Conclusion

The ldt package is a useful tool for empirical studies that aim to reduce assumptions and summarize the results of an uncertainty analysis. This vignette is only a demonstration; the search.sur() function offers other options to explore. For instance, you can use different evaluation metrics or restrict the model set to suit your needs. You can also combine modeling with Principal Component Analysis (PCA) (see the estim.sur() function). We encourage you to experiment with these options.
