saeSim is a package for the R language. It is developed to make the data simulation process more compact, yet flexible enough for customization. It is designed for use in the context of small area estimation.
Consider a linear mixed model. It contains two components: a fixed-effects part and an error component. The error component can be split into a random-effects part and a model error. Each component to be simulated is simply added to the data as a separate step:
library(saeSim)
setup <- sim_base() %>% sim_gen_x() %>% sim_gen_e() %>%
sim_resp_eq(y = 100 + 2 * x + e) %>% sim_simName("Doku")
setup
## idD idU x e y
## 1 1 1 -1.25381 2.93784 100.43
## 2 1 2 0.54743 -2.12883 98.97
## 3 1 3 -1.40585 1.01607 98.20
## 4 1 4 -1.77577 -2.26078 94.19
## 5 1 5 -0.43167 0.07433 99.21
## 6 1 6 0.08791 -0.62434 99.55
sim_base() is responsible for supplying a data.frame to which variables can be added. The default is to create a data.frame with the indicator variables idD and idU (a 2-level model), which uniquely identify observations. sim_resp (here in the form of sim_resp_eq) adds a variable y as response.
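To make explicit what such a set-up computes, the following base R sketch builds a comparable data set by hand, without saeSim. The sizes (3 domains, 4 units each), the standard normal distributions and the extra random effect v are assumptions chosen for illustration; they are not taken from the package defaults.

```r
set.seed(1)

# Assumed sizes for illustration only
nDomains <- 3
nUnits <- 4

# Base: idD identifies the domain, idU the unit within the domain
dat <- data.frame(
  idD = rep(seq_len(nDomains), each = nUnits),
  idU = rep(seq_len(nUnits), times = nDomains)
)

# Fixed-effects part: one regressor
dat$x <- rnorm(nrow(dat))

# Random effect: one draw per domain, repeated for every unit in it
v <- rnorm(nDomains)
dat$v <- v[dat$idD]

# Model error: one draw per observation
dat$e <- rnorm(nrow(dat))

# Response built from all components
dat$y <- 100 + 2 * dat$x + dat$v + dat$e

head(dat)
```

Each step only adds a column to the same data.frame, which is the idea the pipeline of sim_* components expresses more compactly.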
To run a simulation, use the function sim. It returns a list containing data.frames as elements:
dataList <- sim(setup, R = 10)
A set-up can also be coerced to a data.frame directly:
simData <- sim_base() %>% sim_gen_x() %>% sim_gen_e() %>% as.data.frame
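As a rough mental model of the list returned by sim(setup, R = 10), the base R sketch below generates ten independent data sets and collects them in a list. The helper gen_data is hypothetical and only mimics the equation y = 100 + 2 * x + e from the example; it says nothing about how saeSim works internally.

```r
# Hypothetical helper: one simulated data set from y = 100 + 2 * x + e
gen_data <- function(nDomains = 3, nUnits = 4) {
  dat <- data.frame(
    idD = rep(seq_len(nDomains), each = nUnits),
    idU = rep(seq_len(nUnits), times = nDomains)
  )
  dat$x <- rnorm(nrow(dat))
  dat$e <- rnorm(nrow(dat))
  dat$y <- 100 + 2 * dat$x + dat$e
  dat
}

set.seed(1)
R <- 10  # number of data sets to generate

# A list with one data.frame per repetition
dataList <- lapply(seq_len(R), function(i) gen_data())

length(dataList)
head(dataList[[1]])
```

Keeping the results in a list makes it easy to apply an estimator to every data set with lapply afterwards.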
For the simulation set-up you start with the 'base', which is just a data.frame.
base_id(nDomains, nUnits)
- nDomains: the number of domains/clusters/areas in the data
- nUnits: the number of observations in each domain

This section gives an overview of the data which can be generated.
There are several pre-configured set-ups you can use for an easy start.
- sim_base_lm(): returns a simulation set-up for a linear model, i.e. one regressor and one error component
- sim_base_lmm(): a linear mixed model
- sim_base_lmc(): a linear model with contamination in the model error (5% outliers in each area)
- sim_base_lmmc(): a linear mixed model with contamination in the model error and the random effect (5% outliers in each area, and 5% of the areas are outliers)

If you are interested in aggregated information, you can either draw directly from the model by specifying nUnits = 1 or use the aggregate component. Aggregation is another component, which can be applied to the population or to the sample. It is carried out after the sampling; if you have not specified a sampling component, the population is aggregated (which makes sense if you draw samples directly from the model).
sim_base_lm() %>% sim_agg()
## Source: local data frame [6 x 4]
##
## idD x e y
## 1 1 0.69208 -0.87883 99.81
## 2 2 0.07763 -0.88760 99.19
## 3 3 -0.07306 0.53755 100.46
## 4 4 -0.38365 -0.07519 99.54
## 5 5 -0.12094 -0.47715 99.40
## 6 6 -0.82195 0.25876 99.44
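The effect of aggregating can be sketched in base R with aggregate(). The example assumes that aggregation means taking the mean of every variable within each domain, which is consistent with the output above but is an assumption about the component's default. The sizes (6 domains, 5 units each) and the distributions are likewise chosen for illustration.

```r
set.seed(1)

# Assumed sizes for illustration only
nDomains <- 6
nUnits <- 5

# Unit-level population from y = 100 + 2 * x + e
pop <- data.frame(idD = rep(seq_len(nDomains), each = nUnits))
pop$x <- rnorm(nrow(pop))
pop$e <- rnorm(nrow(pop))
pop$y <- 100 + 2 * pop$x + pop$e

# One row per domain: the mean of x, e and y within each idD
agg <- aggregate(cbind(x, e, y) ~ idD, data = pop, FUN = mean)
agg
```

Because the response is linear in x and e, the aggregated y equals 100 + 2 * x + e evaluated at the domain means.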
You will want to check your results regularly to see how things work out. When working with sim_setup objects, some methods are supplied to do that without simulating redundant data all the time:
- show: the print method for S4 classes. You don't have to call show explicitly
- plot: calls smoothScatter to visualize the data
- autoplot: imitates smoothScatter with ggplot2

setup <- sim_base_lmm()
plot(setup)
library(ggplot2)
autoplot(setup)
autoplot(setup, "e")
autoplot(setup %>% sim_gen_vc())
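For comparison, smoothScatter can also be called directly on plain vectors. The sketch below uses base R only; the sample size and the distributions are assumptions for illustration and are unrelated to the set-up plotted above.

```r
set.seed(1)

# Assumed sample size for illustration only
n <- 200
x <- rnorm(n)
y <- 100 + 2 * x + rnorm(n)

# Smoothed density representation of the scatter plot of y against x
smoothScatter(x, y, xlab = "x", ylab = "y")
```

A smoothed density plot stays readable for large populations, where an ordinary scatter plot would suffer from overplotting.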