Introduction

saeSim is a package for the R language. It aims to make the data simulation process more compact while remaining flexible enough for customization. It is designed to cover the needs of simulation studies in small area estimation.

Data Set-up

Consider a linear mixed model. It contains two components: a fixed effects part and an error component. The error component can be split into a random effects part and a model error. Each component to be simulated is simply added to the data as a variable.

library(saeSim)
setup <- sim_base() %>% sim_gen_x() %>% sim_gen_e() %>% 
  sim_resp_eq(y = 100 + 2 * x + e) %>% sim_simName("Doku")
setup
##   idD idU        x        e      y
## 1   1   1 -1.25381  2.93784 100.43
## 2   1   2  0.54743 -2.12883  98.97
## 3   1   3 -1.40585  1.01607  98.20
## 4   1   4 -1.77577 -2.26078  94.19
## 5   1   5 -0.43167  0.07433  99.21
## 6   1   6  0.08791 -0.62434  99.55

sim_base() supplies a data.frame to which variables can be added. By default it creates a data.frame with the indicator variables idD and idU (a two-level model), which together uniquely identify observations. A sim_resp component, here sim_resp_eq(), adds the variable y as the response.
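To make the structure concrete, the following is a minimal base R sketch of such a two-level data set. It does not use saeSim and is not how the package is implemented. The sizes (3 domains, 4 units), the standard normal draws, the seed, and the extra column v for the random effect are all assumptions chosen for illustration, as is the reading of idD as a domain identifier and idU as a unit identifier.

```r
# Base R sketch of a two-level data set (illustration only, not saeSim code).
set.seed(1)        # arbitrary seed, for reproducible draws
nDomains <- 3      # hypothetical number of domains
nUnits   <- 4      # hypothetical number of units per domain

# One row per unit: idD identifies the domain, idU the unit within it.
dat <- data.frame(
  idD = rep(seq_len(nDomains), each = nUnits),
  idU = rep(seq_len(nUnits), times = nDomains)
)

# Regressor for the fixed effects part and the unit-level model error.
dat$x <- rnorm(nrow(dat))
dat$e <- rnorm(nrow(dat))

# Random effect: one draw per domain, shared by all units of that domain.
v <- rnorm(nDomains)
dat$v <- v[dat$idD]

# Response: fixed effects part + random effect + model error.
dat$y <- 100 + 2 * dat$x + dat$v + dat$e

head(dat)
```

Every component is an ordinary column, so the response is just an expression over columns that already exist. That is the idea behind adding components to the data one at a time.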

# Run the set-up R = 10 times
dataList <- sim(setup, R = 10)
# Generate a single data set from a set-up
simData <- sim_base() %>% sim_gen_x() %>% sim_gen_e() %>% as.data.frame()

sim_base

For the simulation set-up you start with the 'base', which is just a data.frame.

sim_gen

This section gives an overview of data which can be generated.

Pre-configured set-ups

There are several pre-configured set-ups you can use for an easy start.

sim_agg

If you are interested in aggregated information, you can either draw directly from the model by specifying nUnits = 1 or use the aggregation component. Aggregation is a separate component that can be applied to the population or to the sample. It is carried out after sampling. If you have not specified a sampling component, the population is aggregated, which makes sense if you draw samples directly from the model.

sim_base_lm() %>% sim_agg()
## Source: local data frame [6 x 4]
## 
##   idD        x        e      y
## 1   1  0.69208 -0.87883  99.81
## 2   2  0.07763 -0.88760  99.19
## 3   3 -0.07306  0.53755 100.46
## 4   4 -0.38365 -0.07519  99.54
## 5   5 -0.12094 -0.47715  99.40
## 6   6 -0.82195  0.25876  99.44
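As a rough illustration of what aggregation to the domain level means, here is a base R sketch that does not use saeSim. It assumes that the aggregate is the domain mean of each variable, and it uses the response equation y = 100 + x + e, which is read off the printed output above rather than taken from the package documentation. The domain and unit counts and the seed are hypothetical.

```r
# Base R sketch of domain-level aggregation (illustration only, not saeSim code).
set.seed(2)                                   # arbitrary seed
pop <- data.frame(idD = rep(1:6, each = 5))   # 6 domains, 5 units each (assumed sizes)
pop$x <- rnorm(nrow(pop))
pop$e <- rnorm(nrow(pop))
pop$y <- 100 + pop$x + pop$e                  # assumed response equation

# One row per domain: the mean of every variable within idD.
agg <- aggregate(cbind(x, e, y) ~ idD, data = pop, FUN = mean)
agg
```

Because the model is linear, the aggregated values satisfy the same equation as the unit-level data: in each row of agg, y equals 100 + x + e up to floating point error.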

Methods

You will want to check your results regularly while building a set-up. sim_setup objects come with some methods for doing that without simulating redundant data all the time:

setup <- sim_base_lmm()
plot(setup)

library(ggplot2)
autoplot(setup)

autoplot(setup, "e")

autoplot(setup %>% sim_gen_vc())
