Log-Multiplicative Association (LMA) models are special cases of log-linear models with two-way interactions and are extensions of the RC(M) association model for two variables to multivariate categorical data. The variables may be dichotomous or multi-category (polytomous). In LMA models, a multiplicative structure is imposed on the matrices of interaction parameters, thereby reducing the number of parameters and easing the interpretation and description of relationships between variables. For example, 20 five-category variables result in a cross-classification with \(5^{20} \approx 9.5\times 10^{13}\) cells. Maximum likelihood estimation (MLE) for small cross-classifications can be carried out by the ‘gnm’ package (Turner and Firth (2020)), among others; for two-way tables there is the ‘logmult’ package (Bouchet-Valat et al. (2020)), which wraps ‘gnm’. However, MLE becomes unfeasible for a moderate to large number of variables. This package uses pseudo-likelihood estimation to remove limitations on the number of variables and the number of categories per variable. For LMA models, pseudo-likelihood estimation has been shown to be a viable alternative to MLE that yields parameter estimates nearly identical to the MLE ones (Paek (2016); Paek and Anderson (2017)). Furthermore, pseudo-likelihood estimators are consistent and asymptotically multivariate normally distributed (Arnold and Straus (1991); Geys, Molenberghs, and Ryan (1999)).
LMA models have been derived from a number of different starting points, including statistical graphical models (Anderson and Vermunt (2000)), (multidimensional) item response theory models (Anderson and Yu (2007); Anderson, Li, and Vermunt (2007); Anderson, Verkuilen, and Peyton (2010); Chen et al. (2018); Hessen (2012); Holland (1990); Marsman et al. (2018)), underlying multivariate normality (Goodman (1981); Becker (1989); Rom and Sarkar (1990); Wang (1987); Wang (1997)), distance-based models (Rooij (2007); Rooij (2009); Rooij and Heiser (2005)), and others. The LMA models fit by the pleLMA package are the log-linear model of independence (a baseline model), models in the Rasch family of item response theory models, flexible generalized partial credit models (GPCM), and the nominal response model. For more details see Anderson, Kateri, and Moustaki (2021).
This document starts with a brief description of the models and how they are related to item response theory models; in the following section, the algorithm is described. Subsequently, the use of the package is explained and illustrated using data included with the package. These data consist of responses to 42 four-category items designed to measure three different constructs, with a random sub-sample of 1,000 cases; in our example, we use only 9 items (3 per construct) and 250 of the cases. In the final section, future developments are sketched.
Let \(\mathbf{Y}\) be an \((I\times 1)\) vector of random categorical variables and \(\mathbf{y}= (y_1, \ldots, y_I)'\) its realization. Both \(i\) and \(k\) will denote variables, and \(j_i\) and \(\ell_k\) will denote categories of variables \(i\) and \(k\), respectively; however, to keep the notation simpler, the subscripts on categories will be suppressed. The subscript \(n=1,\ldots, N\) indexes individuals (cases), but is suppressed until we describe the algorithm. The most general LMA for the probability that \(\mathbf{Y}=\mathbf{y}\) is \[\log (P(\mathbf{Y}=\mathbf{y})) = \lambda + \sum_{i=1}^I \lambda_{ij} + \sum_i \sum_{k>i} \sum_m \sum_{m'\ge m} \sigma_{mm'}\nu_{ijm}\nu_{k\ell m'}, \] where \(\lambda\) ensures that the probabilities sum to 1, \(\lambda_{ij}\) is the marginal effect parameter for category \(j\) of variable \(i\), \(\sigma_{mm'}\) is the association parameter for dimensions \(m\) and \(m'\), and \(\nu_{ijm}\) and \(\nu_{k\ell m'}\) are category scale values for items \(i\) and \(k\) on dimensions \(m\) and \(m'\), respectively. The association parameters measure the strength of the associations between items, and the category scale values represent their structure.
When derived as a latent variable model using a statistical graphical model, the observed discrete variables (i.e., \(\mathbf{y}\)) are related to unobserved (potentially correlated) continuous ones (i.e., \(\boldsymbol{\theta}= \{\theta_m\}\)). The assumptions required to yield the general LMA model given above are that the observed variables are independent conditional on the latent variables, and that the distribution of the latent variables conditional on the observed ones is multivariate normal.
There are no latent variables in the LMA; rather, the parameters of the distribution of the latent variables equal, or are functions of, the parameters of the LMA. The elements of the conditional covariance matrix \(\mathbf{\Sigma}\) are the \(\sigma_{mm'}\) parameters in the LMA model. The conditional means equal \[E(\theta_m|\mathbf{y}) = \sigma_{mm} \left(\sum_i \nu_{ijm}\right) + \sum_{m'\ne m}\sigma_{mm'} \left(\sum_i \nu_{ijm'}\right).\]
The LMA above is very general and represents the model where each categorical variable is directly related to each of the latent variables and all latent variables are correlated. This model can be fit with sufficient identification constraints, but the current version of the pleLMA package only fits models where each categorical variable loads on one and only one latent variable. This is not a limitation of pseudo-likelihood estimation, but of the current package. The identification constraints used in the package are \(\sum_j\lambda_{ij}= 0\) and \(\sum_j \nu_{ijm}= 0\). Scaling constraints are also required, but these differ depending on which specific case of the LMA is fit to data. The scaling constraints are given below for each case of the model.
Different IRT models can be fit by placing restrictions on the \(\nu_{ijm}\) parameters. For models in the Rasch (‘rasch’) family, the restriction is \[\nu_{ijm} = x_j,\] where the \(x_j\)s are typically equally spaced integers (e.g., 0, 1, 2, 3) and are the same for all items. The generalized partial credit model (‘gpcm’) places fewer restrictions on the \(\nu_{ijm}\) by allowing the weights to differ over items and dimensions; namely, \[\nu_{ijm} = a_{im}x_j.\] The “nominal” model places no restrictions on the category scale values \(\nu_{ijm}\).
As a default, the package sets the \(\nu_{ijm}\) to equally spaced numbers with \(\sum_{j}\nu_{ijm}=0\) and \(\sum_j\nu_{ijm}^2=1\). These are the starting values when fitting the nominal model and the fixed \(x_j\)’s for the Rasch and GPCM models. For both the Rasch model and GPCM, the \(x_j\)’s are set equal to equally spaced numbers; however, in the LMA framework, the \(x_j\)’s need not be equally spaced nor the same over items. In other words, the pleLMA package allows for flexible category scaling, and the user can set the \(x_j\) to whatever values they want.
Important for the pseudo-likelihood algorithm are the conditional distributions of a response on one item given the values on all the others. The algorithm maximizes the sum of the log-likelihoods over all of these conditionals, which can be done by maximum likelihood estimation of each conditional distribution. In short, the conditional models that pleLMA works with are \[ P(Y_{in}=j|\mathbf{y}_{-i,n}) = \frac{\exp (\lambda_{ij} + \nu_{ijm} \sum_{k\ne i}\sum_{m'} \sigma_{mm'}\nu_{k\ell(n) m'})} { \sum_h \exp(\lambda_{ih} + \nu_{ihm} \sum_{k\ne i}\sum_{m'} \sigma_{mm'}\nu_{k\ell(n) m'})} \hspace{1in} \\ = \frac{\exp (\lambda_{ij} + \nu_{ijm}\tilde{\theta}_{-i,mn})} { \sum_h \exp(\lambda_{ih} + \nu_{ihm} \tilde{\theta}_{-i,mn})}\qquad\qquad (1)\\ = \frac{\exp (\lambda_{ij} + \sum_{m'}\sigma_{mm'}\ddot{\theta}_{ijm'n})} { \sum_h \exp(\lambda_{ih} + \sum_{m'}\sigma_{mm'}\ddot{\theta}_{ijm'n})}, \qquad (2)\] where \(n\) indicates a specific individual (subject, case, etc.), \(\mathbf{y}_{-i,n}\) contains the responses by person \(n\) to the items excluding item \(i\), the subscript \(\ell(n)\) indicates that person \(n\) selected category \(\ell\) of item \(k\), and the predictor variables \(\tilde{\theta}_{-i,mn}\) and \(\ddot{\theta}_{ijm'n}\) are functions of the parameters representing the interactions. The predictor \(\tilde{\theta}_{-i,mn}\) in (1) is a weighted sum of person \(n\)’s category scale values over the items \(k\ne i\), \[\tilde{\theta}_{-i,mn} = \sum_{k\ne i} \sum_{m'} \sigma_{mm'}\nu_{k\ell(n)m'}.\] Fitting the conditional multinomial logistic regression model (1) using \(\tilde{\theta}_{-i,mn}\) yields estimates of \(\lambda_{ij}\) and \(\nu_{ijm}\). We refer to these as item regressions.
In model (2), the predictor is defined as \[\ddot{\theta}_{ijm'n} = \nu_{ijm}\sum_{k\ne i} \nu_{k\ell(n)m'}.\] The predictor \(\ddot{\theta}_{ijm'n}\) depends not only on the individual, but also on the category \(j\) of item \(i\) that is modeled in (2). Using \(\ddot{\theta}_{ijm'n}\) yields estimates of \(\lambda_{ij}\) and all of the \(\sigma_{mm'}\)s. Note that the \(\sigma_{mm'}\) parameters are restricted to be equal over the equations for different items.
The algorithm is modular and has three components: Algorithm I yields estimates of the \(\lambda_{ij}\)s and the \(\nu_{ijm}\)s (nominal model) or the \(a_{im}\)s (GPCM); Algorithm II yields estimates of the \(\lambda_{ij}\)s and the \(\sigma_{mm'}\) parameters (uni- or multidimensional models); and Algorithm III combines the first two into a single algorithm (multidimensional GPCM and nominal models).
All algorithms work with a Master data set that concatenates the data sets for each item; its number of rows equals the number of cases \(\times\) number of items \(\times\) number of categories per item. Model (1) is fit to subsets of rows of the Master data set for a specific item (“item data”), and model (2) is fit to the entire Master data set (“stacked data”). The Master data set is properly formatted for input to ‘mnlogit’, and therefore so is the item data.
Algorithm I requires values for the \(\sigma_{mm'}\). For uni-dimensional models, we can set \(\sigma_{11}=1\); for multidimensional models, we need to input a matrix of \(\sigma\)s. The matrix could be based on prior knowledge or obtained by running Algorithm II. By default, the ‘pleLMA’ package sets the starting matrix of \(\sigma\)s equal to an identity matrix.
Obtaining the \(a_{im}\) parameters requires a slight change in the computation of the predictor variable; namely, \[\tilde{\theta}_{-i,mn} = x_j \sum_{k\ne i} \sum_{m'} \sigma_{mm'}\nu_{k\ell(n)m'}.\] The coefficient for this predictor variable is \(a_{im}\).
By fitting (2) to the stacked data, the equality restrictions on the \(\sigma_{mm'}\) parameters over the items are imposed.
A new step is added to Algorithm III: imposing a scaling constraint, which is required for identification of the joint distribution (i.e., the LMA model). Without the required identification constraints, the algorithm will not converge: for example, the scale values could become very large and the association parameters very small while their product remains the same. For convergence, the conditional covariance matrix is transformed into a conditional correlation matrix; that is, \(\sigma_{mm}^{*}=\sigma_{mm} \times c = 1\) and \(\sigma_{mm'}^{*}=\sigma_{mm'} \times \sqrt{c}\). The scale values also need to be adjusted, \(\nu_{ijm}^{*} = \nu_{ijm}/\sqrt{c}\). This method of imposing the scaling constraint differs from (???), who used an option in ‘PROC MDC’ in the SAS software that allows parameters to be fixed to particular values.
The order of steps 2 and 3 in Algorithm III is not of great importance; it is more a matter of convenience after the algorithm has converged. In Algorithm III, we start by getting updated estimates of the \(\nu_{ijm}\) parameters, which can put the algorithm in a good starting place (i.e., guard against singular estimates of \(\mathbf{\Sigma}\) in the first iteration). At convergence, the value of the maximum log likelihood from step 2 of Algorithm III equals the sum over items of the maximum log likelihoods from step 4; both are maximums of the log of the pseudo-likelihood function and so should be equal.
The different models use the algorithms as follows:
Dimensions | Model | Algorithm |
---|---|---|
0 | Independence | II |
1 | Rasch | II |
1 | GPCM | I |
1 | Nominal | I |
> 1 | Rasch | II |
> 1 | GPCM | III |
> 1 | Nominal | III |
Algorithm II was proposed by Anderson, Li, and Vermunt (2007) for models in the Rasch family, and Algorithms I and III for the nominal model were proposed and studied by Paek (2016) and Paek and Anderson (2017). Algorithms I and III for the GPCM, and the adaptation of Algorithm II for the independence model, are (as far as I know) novel here. In relatively small data sets (simulated and from different studies), the parameter estimates from MLE and PLE for the LMA models are nearly identical, with correlations \(r\ge .98\).
The pleLMA package uses base R for data manipulation, ‘stats’ for specifying formulas, and ‘graphics’ for plotting results. The current package uses the mnlogit package (Hasan, Wang, and Mahani (2016)) to fit the conditional multinomial models (i.e., discrete choice models) to the data. Given the use of base R, ‘stats’, and ‘graphics’, we expect the package to be forward compatible with future releases of R.
The function “ple.lma” is the main function: it takes data and model specifications as input, computes various constants and objects needed to fit the models, fits the model, and outputs results. Auxiliary functions are provided to aid in examining the results. The package is modular in nature, and all functions can be run outside of the ple.lma function provided that the required input is supplied.
The data, DASS (retrieved July 2020 from OpenPsychometrics.org), consist of responses to 42 items collected from 38,776 respondents during 2017–2019. Only a random sample of 1,000 respondents is included with the package. The items were presented online to respondents in a random order. The items included in DASS are responses to scales designed to measure depression (d1–d14), anxiety (a1–a13), and stress (s1–s15). For more information about the data (e.g., the response options and items), see the package documentation.
The data should be in a data frame where the rows are individuals or cases and the columns are the variables. The rows can be thought of as response patterns or cells of a cross-classification of the variables. The categories of each variable should run from 1 to the number of categories. In this version of the package, the number of categories must be the same for all variables.
Further information can be found in the documentation; for example, assuming the pleLMA package is installed:
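library(pleLMA)  # load the package
data(dass)       # the DASS data included with the package
?dass            # documentation describing the items and response options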
The dass data for this example consist of a subset of N=250 cases and 3 items from each of the three scales designed to measure depression (d1–d3), anxiety (a1–a3), and stress (s1–s3). The input data frame is called inData and is created by
data(dass)
items.to.use <- c("d1","d2","d3","a1","a2","a3","s1","s2","s3")
inData <- dass[1:250,(items.to.use)]
head(inData)
## d1 d2 d3 a1 a2 a3 s1 s2 s3
## 1 3 3 3 1 3 1 1 1 2
## 2 2 2 2 1 1 1 2 1 2
## 3 4 2 4 1 1 1 4 4 4
## 4 1 4 1 2 1 1 3 2 2
## 5 2 4 3 1 1 1 3 3 3
## 6 3 3 2 4 3 1 4 4 4
Uni-dimensional models are those where there is only one latent trait (i.e., \(M=1\)). In graphical modeling terms, each categorical variable is directly connected to the single (latent) continuous variable.
Additional input objects are required to fully specify a model, and the values of these will differ depending on the specific structure and model desired. “inTraitAdj” is an \((M\times M)\) trait by trait adjacency matrix where a 1 indicates that traits are correlated and a 0 that they are uncorrelated. For a uni-dimensional model this is simply a \((1\times 1)\) matrix containing a 1, which can be created as follows (a minimal sketch):
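inTraitAdj <- matrix(1, nrow=1, ncol=1)  # one trait
inTraitAdj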
## [,1]
## [1,] 1
The second required input object is an \((I\times M)\) item by trait adjacency matrix, which for a simple uni-dimensional model is a vector of ones; a minimal sketch:
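inItemTraitAdj <- matrix(1, nrow=9, ncol=1)  # all 9 items load on the single trait
inItemTraitAdj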
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 1
## [5,] 1
## [6,] 1
## [7,] 1
## [8,] 1
## [9,] 1
The final required input is the model type (‘model.type’). The possible types and the default scaling of the category scale values \(\nu_{ijm}\) are described below.
The package defaults set the \(x_j\)’s to equally spaced numbers centered at 0, which also act as starting values for the nominal model. For both the Rasch model and GPCMs, the \(x_j\)’s are set equal to equally spaced numbers; however, in the LMA framework, the \(x_j\)’s need not be equally spaced nor the same over items. In other words, the pleLMA package allows for flexible category scaling, and the user can set these to any desired values.
For the Rasch model, the elements of \(\mathbf{\Sigma}\) are all estimated; however, for the GPCM and Nominal models, one scaling constraint is required for each latent variable. When fitting the models to data, we use \(\sigma_{mm}=1\) for all \(m\); hence we estimate a conditional correlation matrix. An alternative is to constrain one item per dimension \(m\) such that \(\sum_{j} \nu_{ijm}^2=1\). An auxiliary function is provided to change to this scaling after the model has been fit to data. With the alternative identification constraints, the strength and structure of the associations are teased apart. This option is described later.
The minimal commands for each type of model are illustrated below. Since there are no interactions in the log-linear model of independence, only two objects are input. A sketch of the call follows; the exact model.type string for the independence model is an assumption here (check the package documentation):
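#--- Log-linear model of independence ('i1' and the model.type string are assumed)
i1 <- ple.lma(inData, model.type="independence")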
## [1] "No errors detected in the input"
## Basic set up is complete
Note that two messages are printed to the console: “No errors detected in the input” and “Basic set up is complete”. The first step in ‘ple.lma’ is to check the input for 11 possible errors. If an error is detected, the function will stop and issue an error message stating the problem. The second step is to set up the data and objects needed by all models. The third step is to call specific functions that fit the specified model type and the final step is to set up output. These steps are performed for all models.
All other models require ‘inItemTraitAdj’ and ‘inTraitAdj’. For models in the Rasch family, we simply change the model.type as follows:
#--- Model in the rasch family
r1 <- ple.lma(inData, model.type="rasch", inItemTraitAdj, inTraitAdj)
## [1] "No errors detected in the input"
## Basic set up is complete
The same messages as for the log-linear models are printed to the console.
The independence log-linear model and models in the Rasch family only involve iterations within the package ‘mnlogit’. The tolerance and convergence information reported for these models is from ‘mnlogit’.
The gpcm and nominal models involve iteratively fitting discrete choice models (i.e., conditional multinomial logistic regression models). In addition to messages about errors and set up, information about the progress of the algorithm is printed to the console. For the GPCM, the minimal input is
#--- Generalized partial credit model
g1 <- ple.lma(inData, model.type="gpcm", inItemTraitAdj, inTraitAdj)
## [1] "No errors detected in the input"
## Basic set up is complete
## [1] "3.69889486681001 > 1e-06"
## [1] "1.08069902531423 > 1e-06"
## [1] "0.487348105030833 > 1e-06"
## [1] "0.0911130954370947 > 1e-06"
## [1] "0.0120229012337063 > 1e-06"
## [1] "0.00599175559784726 > 1e-06"
## [1] "0.0016258872317394 > 1e-06"
## [1] "0.000135284849534401 > 1e-06"
## [1] "6.5337454998371e-05 > 1e-06"
## [1] "2.59626516481148e-05 > 1e-06"
## [1] "3.67867289696733e-06 > 1e-06"
## [1] "The Alogithm has converged: 7.24560891285364e-07 < 1e-06"
The first number is the convergence criterion, which is the maximum of the absolute differences between the log likelihoods of the item regressions on the current and previous iterations. The second number is the tolerance, which determines whether the algorithm has converged. In this case, the criterion decreases until it is less than the tolerance (default is 1e-06).
The minimal input for the nominal model follows the same pattern; the call below is a sketch consistent with the object ‘n1’ used in later sections:
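#--- Nominal response model
n1 <- ple.lma(inData, model.type="nominal", inItemTraitAdj, inTraitAdj)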
## [1] "No errors detected in the input"
## Basic set up is complete
## [1] "236.649900724201 > 1e-06"
## [1] "2.71035273432182 > 1e-06"
## [1] "0.840025530686773 > 1e-06"
## [1] "0.287426180234718 > 1e-06"
## [1] "0.0515304028613741 > 1e-06"
## [1] "0.00793552086946647 > 1e-06"
## [1] "0.00194535478169655 > 1e-06"
## [1] "0.000510250419097247 > 1e-06"
## [1] "5.78027153892435e-05 > 1e-06"
## [1] "3.59947190986531e-05 > 1e-06"
## [1] "1.67384064297948e-05 > 1e-06"
## [1] "2.89708532363875e-06 > 1e-06"
## [1] "Alogithm has converged: 3.44122241813238e-07 < 1e-06"
It produces the same messages as did the GPCM model.
The option “starting.sv” is an (I x J) matrix of the starting scale values (i.e., the \(\nu_{ijm}\)s) for nominal models or of the fixed category scores \(x_j\) for Rasch and GPCMs. By default the program sets these to equally spaced values centered around 0. If you want to use alternate values, this option can be used. For example, instead of equally spaced, centered, and scaled \(x_j\)’s for a GPCM, the ple.lma function accepts other values for the \(x_j\)’s:
xj <- matrix(c(0, 1, 2, 5), nrow=9, ncol=4, byrow=TRUE)
g1b <- ple.lma(inData, inItemTraitAdj, inTraitAdj, model.type="gpcm", starting.sv=xj)
## [1] "No errors detected in the input"
## Basic set up is complete
## [1] "19.5690090045299 > 1e-06"
## [1] "16.3282987513332 > 1e-06"
## [1] "2.50602954945845 > 1e-06"
## [1] "0.465388638886907 > 1e-06"
## [1] "0.186234011089994 > 1e-06"
## [1] "0.0270392602000697 > 1e-06"
## [1] "0.0105616367803805 > 1e-06"
## [1] "0.00420017331703093 > 1e-06"
## [1] "0.000631168798918225 > 1e-06"
## [1] "0.000126195818211272 > 1e-06"
## [1] "5.83288578468455e-05 > 1e-06"
## [1] "1.27835623970896e-05 > 1e-06"
## [1] "1.80859996135041e-06 > 1e-06"
## [1] "The Alogithm has converged: 7.69594919347583e-07 < 1e-06"
To determine which model is better, we take a quick look at the values of the maximum of the log pseudo-likelihood function (MLPL); per the table of output objects given later, these are stored in ‘mlpl.item’ for models fit with the item regressions (a sketch):
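g1$mlpl.item   # GPCM with the default equally spaced scores
g1b$mlpl.item  # GPCM with scores c(0, 1, 2, 5)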
## [1] -2427.44
## [1] -2537.687
Note that with scores that are neither equally spaced nor centered, the maximum of the log pseudo-likelihood (MLPL) is less than that of the original model; therefore, the original model is the better fitting model. Alternative scores \(x_j\) can also be input for models in the Rasch family.
Using ‘starting.sv’ with the Nominal model sets the starting values for the \(\nu_{ijm}\) parameters. For all models, the starting values for \(\mathbf{\Sigma}\) can also be input. For example, we should be able to fit the nominal model in fewer iterations if we start it at the parameter estimates from the earlier fit. The tolerance can also be changed. For example,
sv <- n1$estimates[, 6:9]
sigma <- n1$Phi.mat
n1.alt <- ple.lma(inData, model.type="nominal", inItemTraitAdj, inTraitAdj,
tol= 1e-04, starting.sv = sv, starting.phi= sigma)
## [1] "No errors detected in the input"
## Basic set up is complete
## [1] "238.369981206883 > 1e-04"
## [1] "Alogithm has converged: 2.50882976615685e-08 < 1e-04"
Note that the algorithm converged after 1 full iteration and the criterion is much smaller than the tolerance that we set.
One trick to check whether the nominal or GPCM model is better is to use the parameters from the Nominal model as input to the GPCM one. If they are equally good, then the \(a\) parameters from the GPCM should be close to one.
g1.alt <- ple.lma(inData, model.type="gpcm", inItemTraitAdj, inTraitAdj,
tol= 1e-04, starting.sv = sv, starting.phi= sigma)
## [1] "No errors detected in the input"
## Basic set up is complete
## [1] "1.28412834068641 > 1e-04"
## [1] "0.205658442700155 > 1e-04"
## [1] "0.0822753149610662 > 1e-04"
## [1] "0.0142889178932819 > 1e-04"
## [1] "0.00129086467723027 > 1e-04"
## [1] "0.000694908491823298 > 1e-04"
## [1] "0.000192646042705746 > 1e-04"
## [1] "The Alogithm has converged: 1.61028816023645e-05 < 1e-04"
## d1 d2 d3 a1 a2 a3 s1 s2
## 0.6454977 0.6719836 0.6542037 0.3093343 0.6144895 0.4321719 0.7960569 0.7968151
## s3
## 0.9982284
The uni-dimensional Nominal model appears to be better than the GPCM model. These \(a\) parameter estimates suggest that a 3-dimensional model may be appropriate: the three parameters within each scale are similar in value.
The ‘pleLMA’ package produces a large number of objects, some of which are NULL for some models due to the requirements of the particular algorithm used to fit the model. Below is a table of the objects with brief descriptions and an indication of the algorithms in which each object is used (i.e., the object is not NULL).
Object | Description | Algorithm |
---|---|---|
model.type | model type fit to data | all |
TraitByTrait | trait adjacency matrix | all |
ItemByTrait | item \(\times\) trait adjacency matrix | all |
item.by.trait | vector indicating the trait each item loads on | I, III |
ItemNames | names of items used | all |
PhiNames | names of \(\ddot{\theta}_{ijmn}\) in stacked data | II, III |
formula.item | formula for item data | I, III |
formula.phi | formula for stacked regressions | II, III |
npersons | number of individuals | all |
nitems | number of items | all |
ncat | number of categories per item | all |
nless | ncat \(-\) 1 = number of estimated \(\lambda_{ij}\) and \(\nu_{ijm}\) (or \(a_{im}\)) per item | all |
Maxnphi | maximum number of \(\sigma\)s estimated | II, III |
ntraits | number of unobserved traits | II, III |
starting.sv | starting category scale values/fixed scores | all |
tol | convergence tolerance used | I, III |
criterion | maximum difference between log(like) on last 2 iterations | all |
item.log | log file of \(\hat{\nu}_{ijm}\)s (or \(\hat{a}_{im}\)s) and \(\hat{\lambda}_{ij}\)s | I, III |
phi.log | log file of \(\hat{\phi}_{mm'}\)s and \(\hat{\lambda}_{ij}\)s | II, III |
estimates | item by estimated item parameters and log(likelihood) | all |
Phi.mat | estimated \(\mathbf{\Sigma}\) | all |
item.mnlogit | list of ‘mnlogit’ output from item regressions after convergence | I, III |
phi.mnlogit | ‘mnlogit’ output for \(\sigma\)s from stacked regression after convergence | II, III |
mlpl.item | maximum of the log pseudo-likelihood function from item regressions | I, III |
mlpl.phi | maximum of the log pseudo-likelihood function from stacked regression | II, III |
AIC | Akaike information criterion (smaller is better) | all |
BIC | Bayesian information criterion (smaller is better) | all |
Typically not all of the output objects need to be examined. The function ‘lma.summary()’ organizes a summary of the output that is generally of most interest into 5 parts: [1] a report of information about the data, convergence, and fit statistics; [2] the specified Trait \(\times\) Trait adjacency matrix; [3] the specified Item \(\times\) Trait adjacency matrix; [4] the estimated \(\lambda_{ij}\) and \(\nu_{ijm}\) (or \(x_j\)’s for gpcm and rasch models); and [5] the estimated association parameters. For example, the basic summary report can be printed as follows (a sketch; treating the report as the first component of the returned list is an assumption):
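lma.summary(n1)[[1]]  # the report component (indexing is an assumption)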
## [,1]
## [1,]
## [2,] =========================================================
## [3,] Pseudo-likelihood Estimation of nominal model
## [4,] =========================================================
## [5,] Report Date: 2021-03-14 13:16:58
## [6,]
## [7,] Data Information:
## [8,] Number of items 250
## [9,] Number of items 9
## [10,] Number of categories per item 4
## [11,] Number of dimensions: 1
## [12,]
## [13,] Model Specification:
## [14,] Number of unique parameters 54
## [15,] Number of unique marginal effects: 27
## [16,] Number of unique category parameters (nu's or a's): 27
## [17,] Number of unique association parameters (phis): 0
## [18,]
## [19,] Convergence Information:
## [20,] Number of iterations: 13
## [21,] Tolerence set tol 1e-06
## [22,] Criterion 3.44122241813238e-07
## [23,]
## [24,] Model Fit Statistics:
## [25,] Maximum log pseudo-likelihood function: -2400.06537390311
## [26,] AIC: 2346.06537390311
## [27,] BIC: 4501.97185824166
## [28,]
The AIC and BIC are computed as follows and may differ from the output of mnlogit: \[\mbox{AIC} = -2\, mlpl + p\] \[\mbox{BIC} = -2\, mlpl + p\log(N),\] where \(mlpl\) is the maximum of the log pseudo-likelihood function, \(p\) is the number of parameters, and \(N\) is the sample size. Models with smaller values are better. Note that AIC tends to select more complex models and BIC tends to select simpler models. Deciding on a model should not rest solely on global fit statistics.
To complete the model specification, the summary also includes the trait by trait and item by trait adjacency matrices:
## [,1]
## [1,] 1
## NULL
The last two parts contain the parameter estimates. The item parameter estimates can also be accessed directly (a sketch):
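n1$estimates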
## lam1 lam2 lam3 lam4 nu1
## d1 -271.8619 -0.3072021 0.5729386 0.20220949 -0.46794596 -0.6244714
## d2 -278.1155 -0.6993158 0.3928989 0.32443810 -0.01802119 -0.6136358
## d3 -269.7006 -0.4754176 0.2657661 0.09164878 0.11800270 -0.6412963
## a1 -299.0364 0.3177562 0.3023306 -0.25703234 -0.36305449 -0.2799779
## a2 -238.3700 1.1304292 0.9194283 -0.04690730 -2.00295021 -0.6211806
## a3 -265.5721 0.6755937 0.4940931 -0.32983550 -0.83985136 -0.3996865
## s1 -262.8586 -1.8326803 0.8193632 0.53566652 0.47765061 -0.8423171
## s2 -267.1777 -1.1236020 0.4818352 0.44781900 0.19394787 -0.7782509
## s3 -247.3727 -0.9227078 0.6880824 0.14336625 0.09125915 -0.9037258
## nu2 nu3 nu4
## d1 -0.07818970 0.17098361 0.5316775
## d2 -0.12937559 0.20606888 0.5369425
## d3 -0.06718668 0.18456161 0.5239214
## a1 -0.06605803 -0.01894353 0.3649794
## a2 -0.21095776 -0.01795778 0.8500962
## a3 -0.05546513 0.10737125 0.3477804
## s1 -0.09575662 0.31423719 0.6238365
## s2 -0.13235774 0.30384290 0.6067658
## s3 -0.28199819 0.41075799 0.7749660
## [,1]
## [1,] 1
The rows of ‘estimates’ correspond to items and the columns to the parameter estimates. The “lam”s are the marginal effects for categories 1 through 4 and the “nu”s are the category scale values. Note that only three \(\lambda_{ij}\)s and three \(\nu_{ijm}\)s were estimated. The fourth value is found by the identification constraint on the locations of the parameters; that is, \(\lambda_{i1} = - \sum_{j=2}^4\lambda_{ij}\) and \(\nu_{i1} = - \sum_{j=2}^4 \nu_{ij}\). The first column of ‘estimates’ contains the values of the log-likelihood from using MLE to fit each item’s data (conditional on the rest). The sum of these log-likelihoods equals ‘mlpl.item’.
For the GPCM (and models in the Rasch family), ‘estimates’ also includes the \(x_j\)’s used to fit the model to data; for example (a sketch):
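g1$estimates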
## loglike lambda1 lambda2 lambda3 lambda4 a x1
## d1 -273.5212 -0.1366067 0.4895941 0.19915638 -0.552143755 0.8754070 -0.6708204
## d2 -279.1964 -0.5731578 0.2981472 0.33436123 -0.059350576 0.8728512 -0.6708204
## d3 -273.1440 -0.2915821 0.1145274 0.09507058 0.081984154 0.8497781 -0.6708204
## a1 -302.1068 0.2861255 0.2509015 -0.27432781 -0.262699199 0.4242113 -0.6708204
## a2 -243.3139 0.9285292 0.7411042 -0.37057057 -1.299062744 0.8250464 -0.6708204
## a3 -268.1178 0.7339107 0.4886953 -0.33246797 -0.890137988 0.5667876 -0.6708204
## s1 -266.8711 -1.1926597 0.5551032 0.40194028 0.235616226 0.9920283 -0.6708204
## s2 -269.6688 -0.8145342 0.3257054 0.43824226 0.050586572 1.0340729 -0.6708204
## s3 -251.5000 -0.8155969 0.6201395 0.19692038 -0.001463042 1.2499241 -0.6708204
## x2 x3 x4
## d1 -0.2236068 0.2236068 0.6708204
## d2 -0.2236068 0.2236068 0.6708204
## d3 -0.2236068 0.2236068 0.6708204
## a1 -0.2236068 0.2236068 0.6708204
## a2 -0.2236068 0.2236068 0.6708204
## a3 -0.2236068 0.2236068 0.6708204
## s1 -0.2236068 0.2236068 0.6708204
## s2 -0.2236068 0.2236068 0.6708204
## s3 -0.2236068 0.2236068 0.6708204
Again, the first column contains the log-likelihoods for each item. The remaining columns contain the parameter estimates for the \(\lambda_{ij}\)s, the “slope” parameters \(a\), and the fixed category scores \(x_j\).
Other potentially useful information includes the history of parameter estimates (i.e., the log files) and the actual output from ‘mnlogit’. Below are the classes of these objects:
Dimensions | Model | item.log | phi.log | item.mnlogit | phi.mnlogit |
---|---|---|---|---|---|
0 | independence | NULL | NULL | NULL | mnlogit |
1 | rasch | NULL | NULL | NULL | mnlogit |
1 | gpcm | list | NULL | list | NULL |
1 | nominal | list | NULL | list | NULL |
>1 | rasch | NULL | NULL | NULL | mnlogit |
>1 | gpcm | list | matrix | list | mnlogit |
>1 | nominal | list | matrix | list | mnlogit |
The package mnlogit is used to fit the model to either the “stacked” or the item-level data. The above commands return the output produced by ‘mnlogit’ from fitting models to the stacked data. The prefix “phi” is the name used for the association parameters within the package; these correspond to the \(\sigma\)s in the LMA at the beginning of this document. The phi parameters are estimated by stacking the conditional regressions over items and individuals, and they are the variances and covariances of the continuous variables conditional on the observed responses (i.e., cells of the cross-classification of the variables/items). All output that mnlogit normally returns can be extracted from phi.mnlogit and/or item.mnlogit.
To further examine the convergence of the algorithm, we can look at the log files and the convergence statistics for all parameters. For the iteration history (log file), we can print the values of the parameters at each iteration. The object “n1$item.log” is a list in which the third dimension indexes the items. The history of iterations for item 1 can be printed with (a sketch; the exact indexing is an assumption):
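n1$item.log[[1]]  # iteration history of the estimates for item 1 (indexing assumed)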
Alternatively, these can be plotted using the ‘iterationPlot’ function. The call below is a sketch patterned on the call shown later for the phi log; whether the same signature applies to the item log is an assumption:
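iterationPlot(history=n1$item.log, n1$nitems, n1$ncat, n1$nless, n1$ItemNames)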
If you run the above command, you will see that the algorithm gets very close to the final values in about 5 iterations, but continues until it meets the more stringent criterion given by “tol”.
For another view of how well the algorithm converged, we can look at the differences between the values from the last two iterations, which are given for the log-likelihoods and all item parameters by a function in the package:
## $diff.last.Item
## [1] 1 2 3 4 5 6 7 8 9
##
## $diff.last.LogLike
## [1] -1.135754e-07 -1.714000e-09 -3.595849e-07 1.972672e-07 3.441222e-07
## [6] 8.999734e-08 7.582196e-08 4.030994e-09 1.782178e-07
##
## $diff.last.Lambda2
## [1] -2.899275e-09 -2.412602e-09 -2.610266e-09 -8.206348e-10 2.656871e-09
## [6] 7.331320e-10 -3.150512e-10 -5.599418e-10 -4.940359e-11
##
## $diff.last.Lambda3
## [1] 2.384649e-09 3.831433e-09 1.844995e-09 9.264018e-10 2.354453e-09
## [6] 6.403919e-10 2.294988e-09 8.630995e-10 1.188768e-09
##
## $diff.last.Lambda4
## [1] 1.350045e-08 1.568674e-08 1.537976e-08 7.358500e-09 -3.958744e-09
## [6] -1.727329e-09 3.935193e-09 2.510343e-09 1.418283e-09
##
## $diff.last.Nu2
## [1] 1.853086e-09 2.676497e-09 2.720449e-09 1.612524e-10 -1.283265e-09
## [6] -5.427573e-10 -1.048473e-09 -9.378177e-10 -2.186578e-09
##
## $diff.last.Nu3
## [1] -2.282709e-09 9.101780e-11 -1.401513e-10 1.724816e-09 -4.642253e-11
## [6] 2.835269e-10 2.990071e-09 2.324798e-09 2.745951e-09
##
## $diff.last.Nu4
## [1] -9.994187e-09 -9.844689e-09 -9.974597e-09 -2.001206e-09 6.178669e-09
## [6] 2.790979e-09 5.502973e-09 4.125997e-09 3.617784e-09
##
## $criterion.loglike
## [1] 3.441222e-07
##
## $criterion.items
## [1] 6.366022e-08
There is a “\(\$\)diff.last” component for the log likelihoods, the lambda parameters, and the scale values; the length of each corresponds to the number of items. Even though tol\(=1e-06\), the largest difference for the item parameters is \(1.7e-09\); the rest are all smaller. The same function can be applied to the GPCM fit.
For the nominal model, the function “scalingPlot” graphs the scale values against integers and overlays a linear regression line. These plots can be used to determine the ordering of the categories and whether a linear restriction could be imposed on them, as in the simpler GPCM. The plots also convey how strongly the items are related to the latent trait. A sketch of the call is below; the exact arguments are an assumption (see the package documentation):
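scalingPlot(n1$estimates)  # hypothetical arguments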
If you run this command, you will notice that for the variables d2 and a3 the scale values are close to linear, whereas the others deviate from linearity to varying degrees. The items a2, s1, and s2 have the steepest slopes, which indicates that these items are more strongly related to the latent variable than the others.
To fit the models, we imposed identification constraints: for the GPCM and Nominal models we set the conditional variances equal to 1. However, we can change this such that a scaling constraint is put on one item (for each latent variable) and the phis (i.e., sigmas) are estimated. This teases apart the strength and structure of the relationships between the items and the latent trait. One item should be selected and is indicated using the vector ‘anchor’. To do the rescaling using item d1,
anchor <- matrix(0, nrow=1, ncol=9)
anchor[1,1] <- 1
rescale <- ScaleItem(n1$item.log, Phi.mat=n1$Phi.mat, anchor=anchor,
                     n1$item.by.trait, nitems=n1$nitems, nless=n1$nless,
                     ncat=n1$ncat, ntraits=n1$ntraits, ItemNames=n1$ItemNames)
If the log-multiplicative models are being used for measurement, estimates of values on the latent variables can be computed using the estimated item category scale values and the conditional variances/covariances (see the formula for \(E(\theta_m|\mathbf{y})\)). The function theta.estimates computes these values:
theta.r1 <- theta.estimates(r1, inData, scores=r1$estimates)
theta.gn1 <- theta.estimates(g1, inData, scores=g1$estimates)
theta.n1 <- theta.estimates(n1, inData, scores=n1$estimates)
The rows correspond to individuals and the columns to values for each latent variable (only 1 column for uni-dimensional models).
Since the items of DASS are designed to assess three different constructs, we fit a 3-dimensional model and allow the latent variables to be conditionally correlated (i.e., within a response pattern). We only need to change inTraitAdj and inItemTraitAdj to fit these models. The Trait by Trait adjacency matrix can be created as follows (a sketch matching the construction used later for the full data):
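inTraitAdj <- matrix(1, nrow=3, ncol=3)  # all three traits conditionally correlated
inTraitAdj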
## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 1 1 1
## [3,] 1 1 1
The 1s in the off-diagonal indicate that the latent variables are to be conditionally correlated. If we do not want, for example, \(\theta_1\) and \(\theta_2\) to be conditionally correlated, a 0 should be put in the (1,2) and (2,1) cells of the matrix.
For the Item by Trait adjacency matrix,
d <- matrix(c(1, 0, 0),nrow=3,ncol=3,byrow=TRUE)
a <- matrix(c(0, 1, 0),nrow=3,ncol=3,byrow=TRUE)
s <- matrix(c(0, 0, 1),nrow=3,ncol=3,byrow=TRUE)
das <- list(d, a, s)
inItemTraitAdj <- rbind(das[[1]], das[[2]], das[[3]])
inItemTraitAdj
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 1 0 0
## [3,] 1 0 0
## [4,] 0 1 0
## [5,] 0 1 0
## [6,] 0 1 0
## [7,] 0 0 1
## [8,] 0 0 1
## [9,] 0 0 1
The command to fit the models is the same as before; e.g., for the nominal model (a sketch consistent with the object ‘n3’ used below):
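n3 <- ple.lma(inData, model.type="nominal", inItemTraitAdj, inTraitAdj)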
## [1] "No errors detected in the input"
## Basic set up is complete
## [1] "20.0124133942919 > 1e-06"
## [1] "3.1445597430361 > 1e-06"
## [1] "0.922360553985698 > 1e-06"
## [1] "0.371834343187174 > 1e-06"
## [1] "0.161436320522853 > 1e-06"
## [1] "0.0702979058287383 > 1e-06"
## [1] "0.0305282256241526 > 1e-06"
## [1] "0.0132444019772606 > 1e-06"
## [1] "0.00574638701340291 > 1e-06"
## [1] "0.00249376173513838 > 1e-06"
## [1] "0.00108222517155809 > 1e-06"
## [1] "0.000469562561306702 > 1e-06"
## [1] "0.000203764231628156 > 1e-06"
## [1] "8.8337244790182e-05 > 1e-06"
## [1] "3.83344780630068e-05 > 1e-06"
## [1] "1.65688461493119e-05 > 1e-06"
## [1] "7.23861660389957e-06 > 1e-06"
## [1] "3.09877691506699e-06 > 1e-06"
## [1] "1.37349479700788e-06 > 1e-06"
## [1] "Alogithm has converged: 5.86059115903481e-07 < 1e-06"
The same evaluation and post-fitting functions can be used. The one change is that iteration plots of the phis (and lambdas) can be graphed, as well as those of the item parameters, using
iterationPlot(history=n3$phi.log, n3$nitems, n3$ncat, n3$nless, n3$ItemNames)
Note that for the multi-dimensional GPCM and Nominal models, the object phi.mnlogit is no longer NULL: stacked regressions are required to get estimates of the \(\sigma_{mm'}\) parameters. Unlike the independence and Rasch models, the GPCM and Nominal models estimate the \(\sigma_{mm'}\) parameters iteratively.
The matrix of association parameters (the conditional correlation matrix) is in the object ‘Phi.mat’:
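n3$Phi.mat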
## [,1] [,2] [,3]
## [1,] 1.0000000 0.1754632 0.2977876
## [2,] 0.1754632 1.0000000 0.1934630
## [3,] 0.2977876 0.1934630 1.0000000
The same basic set-up is needed for 42 items. For models fit to the larger data set we need
# the full data set
inData <- dass
# A (3 x 3) trait by trait adjacency matrix
inTraitAdj <- matrix(c(1,1,1, 1,1,1, 1,1,1), nrow=3, ncol=3)
# A (42 x 3) item by trait adjacency matrix
d <- matrix(c(1, 0, 0),nrow=14,ncol=3,byrow=TRUE)
a <- matrix(c(0, 1, 0),nrow=13,ncol=3,byrow=TRUE)
s <- matrix(c(0, 0, 1),nrow=15,ncol=3,byrow=TRUE)
das <- list(d, a, s)
inItemTraitAdj <- rbind(das[[1]], das[[2]], das[[3]])
The ple.lma and all other functions work in the same manner as shown for our smaller example.
The small examples presented here, with \(I=9\) four-category items, took less than 30 seconds on my desktop, but with more items, a larger sample size, and more dimensions, the computational time increases. For the Nominal model with 42 items, 3 dimensions, and N=1000, the elapsed time was ~900 seconds (i.e., ~15 minutes) over 15 iterations. The GPCM and Nominal models tend to take about the same amount of time and the same number of iterations. The independence and Rasch models are much faster.
All the functions used by pleLMA are available as source. The pleLMA algorithm is modular and can be “cannibalized” for specific uses or alternative models. For example, in a replication study, the problem can be set up once using the “set.up” function, which can be time consuming, and then “fit.rasch”, “fit.gpcm”, or “fit.nominal” can be called to set up the log files and formulas and to fit the models. In each replication, only the response vector in the “master” data frame needs to be changed (i.e., the master data frame does not have to be re-created), so a loop would go around the function that fits the model. This can be sped up further by pulling the code out of the functions and keeping only what is absolutely necessary. The same strategy can be used to perform a jackknife or bootstrap to obtain standard errors for the parameters. Alternatively, functions can be pulled out and modified to allow some items to be fit by a GPCM and others by the Nominal model.
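As a simple illustration of the bootstrap idea, the following sketch resamples cases and refits the model with the top-level ple.lma function. It is slower than cannibalizing the internal functions as described above, but it uses only calls already shown; the number of replications is arbitrary, and summarizing the (1,2) conditional correlation is purely an example.

set.seed(1)
n.boot <- 100                              # number of bootstrap samples (illustrative)
boot.phi12 <- numeric(n.boot)              # holds the (1,2) conditional correlation
for (b in 1:n.boot) {
  # resample cases with replacement and refit the 3-dimensional nominal model
  bootData <- inData[sample(nrow(inData), replace=TRUE), ]
  fit <- ple.lma(bootData, model.type="nominal", inItemTraitAdj, inTraitAdj)
  boot.phi12[b] <- fit$Phi.mat[1, 2]
}
sd(boot.phi12)                             # bootstrap standard error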
In future versions, options for fitting different models to different items will be added, along with more complex latent structures, multiple methods for estimating standard errors, the ability to handle different numbers of categories per item, and the ability to include collateral information. Even though all of these extensions are planned, the current version of the pleLMA package already opens up more widespread use of association models for categorical data.
Anderson, Carolyn J., Maria Kateri, and Irini Moustaki. 2021. “Log-Linear and Log-Multiplicative Association Models for Categorical Data.” Under Review.
Anderson, Carolyn J., Zhushan Li, and Jeroen J. K. Vermunt. 2007. “Estimation of Models in a Rasch Family for Polytomous Items and Multiple Latent Variables.” Journal of Statistical Software. https://doi.org/10.18637/jss.v020.i06.
Anderson, Carolyn J., Jay V. Verkuilen, and Buddy Peyton. 2010. “Modeling Polytomous Item Responses Using Simultaneously Estimated Multinomial Logistic Regression Models.” Journal of Educational and Behavioral Statistics 35: 422–52. https://doi.org/10.3102/1076998609353117.
Anderson, Carolyn J., and Jeroen J. K. Vermunt. 2000. “Log-Multiplicative Association Models as Latent Variable Models for Nominal and/or Ordinal Data.” Sociological Methodology 30: 81–121. https://doi.org/10.1111/0081-1750.00076.
Anderson, Carolyn J., and Hsiu-Ting Yu. 2007. “Log-Multiplicative Association Models as Item Response Models.” Psychometrika 72: 5–23. https://doi.org/10.1007/s11336-005-1419-2.
Arnold, Barry C., and David Straus. 1991. “Pseudolikelihood Estimation: Some Examples.” The Indian Journal of Statistics 53: 233–43. http://www.jstor.org/stable/25052695.
Becker, Mark. 1989. “On the Bivariate Normal Distribution and Association Models for Ordinal Categorical Data.” Statistics & Probability Letters 8: 435–40. https://doi.org/10.1016/0167-7152(89)90023-0.
Bouchet-Valat, Milan, Heather Turner, Michael Friendly, Jim Lemon, and Gabor Csardi. 2020. Package “logmult”. https://github.com/nalimilan/logmult.
Chen, Yunxiao, Xiaoou Li, Jingchen Liu, and Zhiliang Ying. 2018. “Robust Measurement via a Fused Latent Variable and Graphical Item Response Theory Model.” Psychometrika 85: 538–62. https://doi.org/10.1007/s11336-018-9610-4.
Geys, Helena, Geert Molenberghs, and Louise M. Ryan. 1999. “Pseudolikelihood Modeling in Multivariate Outcomes in Developmental Toxicology.” Journal of the American Statistical Association 94: 734–45. https://doi.org/10.2307/2669986.
Goodman, Leo A. 1981. “Association Models and the Bivariate Normal for Contingency Tables With Ordered Categories.” Biometrika 68: 347–55. https://doi.org/10.1093/biomet/68.2.347.
Hasan, Asad, Zhiyu Wang, and Alireza S. Mahani. 2016. “Fast Estimation of Multinomial Logit Models: R Package mnlogit.” Journal of Statistical Software 75: 1–24. https://doi.org/10.18637/jss.v075.i03.
Hessen, David J. 2012. “Fitting and Testing Conditional Multinomial Partial Credit Models.” Psychometrika 77: 693–709. https://doi.org/10.1007/s11336-012-9277-1.
Holland, Paul W. 1990. “The Dutch Identity: A New Tool for the Study of Item Response Models.” Psychometrika 55: 5–18. https://doi.org/10.1007/BF02294739.
Marsman, M., D. Borsboom, J. Kruis, S. Epskamp, R. van Bork, L. J. Waldorp, H. L. J. van der Maas, and G. Maris. 2018. “An Introduction to Network Psychometrics: Relating Ising Network Models to Item Response Theory Models.” Multivariate Behavioral Research 53: 15–35. https://doi.org/10.1080/00273171.2017.1379379.
Paek, Youngshil. 2016. “Pseudo-Likelihood Estimation of Multidimensional Item Response Theory Model.” PhD thesis, University of Illinois, Urbana-Champaign.
Paek, Youngshil, and Carolyn J. Anderson. 2017. “Pseudo-Likelihood Estimation of Multidimensional Response Models: Polytomous and Dichotomous Items.” In Quantitative Psychology — the 81st Annual Meeting of the Psychometric Society, edited by Andries van der Ark, Marie Wiberg, Steven A. Culpepper, Jeffrey A. Douglas, and Wen-Chung Wang, 21–30. NYC: Springer. https://doi.org/10.1007/978-3-319-56294-0_3.
Rom, Dror, and Sanat K. Sarkar. 1990. “Approximating Probability Integrals of Multivariate Normal Using Association Models.” Journal of Statistical Computation and Simulation 35 (1-2): 109–19. https://doi.org/10.1080/00949659008811237.
Rooij, Mark de. 2007. “The Analysis of Change, Newton’s Law of Gravity and Association Models.” Journal of the Royal Statistical Society: Statistics in Society, Series A 171: 137–57. https://doi.org/10.1111/j.1467-985X.2007.00498.x.
———. 2009. “Ideal Point Discriminant Analysis Revisited with a Special Emphasis on Visualization.” Psychometrika 74: 317–30. https://doi.org/10.1007/s11336-008-9105-9.
Rooij, Mark de, and Willem Heiser. 2005. “Graphical Representations and Odds Ratios in a Distance-Association Model for the Analysis of Cross-Classified Data.” Psychometrika 70: 99–122. https://doi.org/10.1007/s11336-000-0848-1.
Turner, Heather, and David Firth. 2020. Generalized Nonlinear Models in R: An Overview of the Gnm Package. https://cran.r-project.org/package=gnm.
Wang, Yuchung J. 1987. “The Probability Integrals of Bivariate Normal Distributions: A Contingency Table Approach.” Biometrika 74: 185–90. https://doi.org/10.1093/biomet/74.1.185.
———. 1997. “Multivariate Normal Integrals and Contingency Tables with Ordered Categories.” Psychometrika 62: 267–84. https://doi.org/10.1007/BF02295280.