RRPP is an acronym for randomization of residuals in a permutation
procedure. *RRPP* refers to the package utilizing RRPP.

Analysis of variance (ANOVA) means different things to different people, but generally speaking, one recognizes an ANOVA table as a table of statistics including sums of squares, \(SS\), mean squares , \(MS\), \(F\)-statistics, and \(P\)-values. In this table, such statistics might be found for different model terms and can be calculated for different types of \(SS\).

ANOVA in *RRPP* is a concept that generalizes the statistics
used in univariate ANOVA to multivariate data. The fundamental statistic
is \(SS\), calculated as the trace of
the sums of squares and cross-products matrix, \(\mathbf{S}\). Thus, \(SS\) is sum of each variable’s \(SS\), meaning for univariate data, \(SS\) is found for a single variable, only.
Dividing this trace form of \(SS\) by
appropriate degrees of freedom, produces \(MS\) values, which can be used to calculate
\(F\)-statistics; i.e.,

\[ F = \frac{MS_{effect}}{MS_{Random}} = \frac{SS_{effect}/df_{effect}}{SS_{random}/df_{random}}= \frac{trace(\mathbf{S}_{effect})/df_{effect}}{trace(\mathbf{S}_{random})/df_{random}},\]

where \(random\) can refer to a random effect or residuals, and \(df\) is the appropriate degrees of freedom for effect and random \(SS\).

In this manner, there is no fundamental difference in how univariate
and multivariate statistics are calculated (the univariate statistics
are a simplified version of the same multivariate statistics).
Therefore, ANOVA in *RRPP* does not distinguish between
univariate and multivariate data, but refers to this consistency in
calculation of statistics.

Multivariate analysis of variance (MANOVA) is perhaps typically thought of as an analog to ANOVA when data are multivariate. In what is classically referred to as, “MANOVA,” multivariate statistics are calculated, including Roy’s maximum root, Pillai trace, Hotelling-Lawley trace, and Wilks \(\Lambda\). These statistics do not have probability distributions (density functions), so they are converted to statistics that are assumed to approximately follow \(F\)-distributions. This works in cases that the number of observations are sufficiently larger than the number of response variables. Therefore, the M in MANOVA can be thought of as an indicator that multivariate data are analyzed and/or that multivariate statistics are calculated as a means to estimate \(F\)-statistics, but the ANOVA in MANOVA is not different than parametric ANOVA for univariate data, once \(F\)-statistics are calculated.

In *RRPP*, the M in MANOVA references the calculation of
multivariate statistics and generating their empirical sampling
distributions. Calculating multivariate statistics is not different than
parametric MANOVA, but in *RRPP*, the statistics are calculated
in every random permutation, eliminating the need to find an approximate
probability distribution (the empirical sampling distribution is
sufficient as an approximation). Whereas parametric MANOVA could be
thought of as the forcing of multivariate statistics into a univariate
ANOVA-style framework, MANOVA in *RRPP* is better defined as
using randomization of residuals in a permutation procedure on
multivariate statistics in much the same way as the traces of \(\mathbf{S}\) matrices in ANOVA. Description
of how RRPP works can be found in another
vignette

All multivariate statistics are derived from eigenanalysis of

\[\mathbf{S}_{random}^{-1}\mathbf{S}_{effect},\]

where \(\mathbf{S}_{random}\) is typically the \(\mathbf{S}\) matrix for residuals. Singular matrices are not a concern if data are projected into an Euclidean space of appropriate dimensions before matrix inversion. However, Wilks \(\Lambda\) might be less appropriate than other multivariate statistics, as it relies on products of eigenvalues rather than summation.

When fitting a linear model in *RRPP* over many permutations,
some ANOVA statistics are calculated automatically in every permutation.
Others can be obtained from those statsistics.

```
library(RRPP)
data(Pupfish)
fit <- lm.rrpp(coords ~ Sex*Pop, SS.type = "I",
data = Pupfish, print.progress = FALSE)
attributes(fit)
```

```
## $names
## [1] "call" "LM" "ANOVA" "PermInfo" "turbo" "verbose" "subTest"
##
## $class
## [1] "lm.rrpp"
```

```
## $names
## [1] "SS.type" "SS" "RSS" "TSS" "RSS.model"
## [6] "Rsq" "full.resid" "n" "p" "df"
```

```
## $names
## [1] "SS" "MS" "Rsq" "Fs" "cohenf"
```

The distributions of ANOVA statistics can then be used to construct an ANOVA table, via the anova.lm.rrpp S3 generic; i.e.,

```
##
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals
## Number of permutations: 1000
## Estimation method: Ordinary Least Squares
## Sums of Squares and Cross-products: Type I
## Effect sizes (Z) based on F distributions
##
## Df SS MS Rsq F Z Pr(>F)
## Sex 1 0.015780 0.0157802 0.28012 28.209 4.7773 0.001 ***
## Pop 1 0.009129 0.0091294 0.16206 16.320 4.7097 0.001 ***
## Sex:Pop 1 0.003453 0.0034532 0.06130 6.173 3.7015 0.001 ***
## Residuals 50 0.027970 0.0005594 0.49651
## Total 53 0.056333
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call: lm.rrpp(f1 = coords ~ Sex * Pop, SS.type = "I", data = Pupfish,
## print.progress = FALSE)
```

To switch to MANOVA statistics, the linear model fit must be updated to include MANOVA statistics, in addition to the ANOVA statistics already generated. The MANOVA statistics take more time to calculate, because of matrix inversion and eigenanalysis in every permutation, so it is not performed unless requested.

```
## $names
## [1] "call" "LM" "ANOVA" "PermInfo" "turbo" "verbose" "subTest"
## [8] "MANOVA"
##
## $class
## [1] "manova.lm.rrpp" "lm.rrpp"
```

```
## $names
## [1] "SSCP" "invR.H" "eigs" "error"
## [5] "PCA" "manova.pc.dims" "e.rank" "SS.tot"
```

The “eigs” object returns all the eigenvalues of \(\mathbf{S}_{random}^{-1}\mathbf{S}_{effect}\), because the default argument. One can return the matrices, themselves, with “invR.H”, where \(\mathbf{H} = \mathbf{S}_{effect}\), and \(\mathbf{R} = \mathbf{S}_{residual}\), in line with common notation. (\(\mathbf{H}\) and \(\mathbf{R}\) are SSCP matrices for model effects and residuals, respectively.) The class lm.rrpp fit has class manova.lm.rrpp added, and the S3 summary function produces a MANOVA-like table.

```
##
## Linear Model fit with lm.rrpp
##
## Number of observations: 54
## Number of dependent variables: 112
## Data space dimensions: 53
## Residual covariance matrix rank: 50
## Sums of Squares and Cross-products: Type I
## Number of permutations: 1000
##
## Df Rand Roy Z Pr(>Roy)
## Sex 1 Residuals 17.73326 6.8151620 0.001
## Pop 1 Residuals 50.34889 1.5093978 0.070
## Sex:Pop 1 Residuals 12.71263 -0.5293145 0.706
## Full.Model 3 Residuals 54.35676 8.0972310 0.001
## Residuals 50
```

```
##
## Linear Model fit with lm.rrpp
##
## Number of observations: 54
## Number of dependent variables: 112
## Data space dimensions: 53
## Residual covariance matrix rank: 50
## Sums of Squares and Cross-products: Type I
## Number of permutations: 1000
##
## Df Rand Pillai Z Pr(>Pillai)
## Sex 1 Residuals 0.9466190 4.9765825 0.001
## Pop 1 Residuals 0.9805254 1.3801816 0.070
## Sex:Pop 1 Residuals 0.9270745 -0.3379822 0.706
## Full.Model 3 Residuals 2.8339784 9.6895441 0.001
## Residuals 50
```

It should be apparent that although the ANOVA and MANOVA results have some similarities, the effect sizes (\(Z\)-scores) and \(P\)-values can vary. Comparatively, the results of the MANOVA statistics can be influenced by variable covariances more so than ANOVA results (which are influenced solely by variances; i.e., the dispersion of values in the data space). The results are also influenced by dimensionality. In this case, there are more shape variables (112) than residual degrees of freedom (50), so data are projected into a lower-dimension Euclidean space to allow matrix inversion; i.e., \(\mathbf{S}_{residual}^{-1}\mathbf{S}_{effect}\). A tolerance of 0 means all possible PCs were retained for analysis. (Tolerance is the permissible relative change in eigenvalues. For example, tol = 0.01 means that if the k + 1st eigenvalue is not at least 1% smaller than the kth eigenvalue, retain only k eigenvectors. The k + 1, k + 2, k + 3, … eigenvalues decay at such a small rate that they are considered inconsequential.)

When the number of observations far exceeds the data space dimensions, this is okay, and the results will be qualitatively more similar to ANOVA results. Consider this example (although reducing dimensions is not recommended as a typical solution):

```
##
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals
## Number of permutations: 1000
## Estimation method: Ordinary Least Squares
## Sums of Squares and Cross-products: Type I
## Effect sizes (Z) based on F distributions
##
## Df SS MS Rsq F Z Pr(>F)
## Sex 1 0.015780 0.0157802 0.28012 28.209 4.7773 0.001 ***
## Pop 1 0.009129 0.0091294 0.16206 16.320 4.7097 0.001 ***
## Sex:Pop 1 0.003453 0.0034532 0.06130 6.173 3.7015 0.001 ***
## Residuals 50 0.027970 0.0005594 0.49651
## Total 53 0.056333
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call: lm.rrpp(f1 = coords ~ Sex * Pop, SS.type = "I", data = Pupfish,
## print.progress = FALSE)
```

```
##
## Linear Model fit with lm.rrpp
##
## Number of observations: 54
## Number of dependent variables: 112
## Data space dimensions: 53
## Residual covariance matrix rank: 50
## Sums of Squares and Cross-products: Type I
## Number of permutations: 1000
##
## Df Rand Roy Z Pr(>Roy)
## Sex 1 Residuals 17.73326 6.8151620 0.001
## Pop 1 Residuals 50.34889 1.5093978 0.070
## Sex:Pop 1 Residuals 12.71263 -0.5293145 0.706
## Full.Model 3 Residuals 54.35676 8.0972310 0.001
## Residuals 50
```

Alternatively, one could specify an exact number of PCs to use in analysis.

```
##
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals
## Number of permutations: 1000
## Estimation method: Ordinary Least Squares
## Sums of Squares and Cross-products: Type I
## Effect sizes (Z) based on F distributions
##
## Df SS MS Rsq F Z Pr(>F)
## Sex 1 0.015780 0.0157802 0.28012 28.209 4.7773 0.001 ***
## Pop 1 0.009129 0.0091294 0.16206 16.320 4.7097 0.001 ***
## Sex:Pop 1 0.003453 0.0034532 0.06130 6.173 3.7015 0.001 ***
## Residuals 50 0.027970 0.0005594 0.49651
## Total 53 0.056333
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call: lm.rrpp(f1 = coords ~ Sex * Pop, SS.type = "I", data = Pupfish,
## print.progress = FALSE)
```

```
##
## Linear Model fit with lm.rrpp
##
## Number of observations: 54
## Number of dependent variables: 112
## Data space dimensions: 53
## Residual covariance matrix rank: 50
## Data reduced to 10 PCs, as required or prescribed.
## 2 % of overall variation explained by these PCs.
## See $MANOVA$PCA from manova.lm.rrpp object for more information.
## Sums of Squares and Cross-products: Type I
## Number of permutations: 1000
##
## Df Rand Roy Z Pr(>Roy)
## Sex 1 Residuals 7.357775 8.928089 0.001
## Pop 1 Residuals 18.318603 11.680405 0.001
## Sex:Pop 1 Residuals 5.132268 8.500135 0.001
## Full.Model 3 Residuals 20.461447 10.525117 0.001
## Residuals 50
```

If the maximum number of possible PCs is chosen (full data space
dimensionality) **AND** the number of variables exceeds the
number of observations, *RRPP* will attempt to assess the number
of real dimensions in each random permutation and adjust SSCP matrices,
accordingly. (It is possible to randomly generate SSCP matrices of lower
rank in random permutations, via RRPP.) However, it cannot be guaranteed
that eigenvalues will be positive in such cases. If a warning message is
delivered, “NaNs produced,” this might be the reason. Using a slightly
lower number of PCs will probably resolve the issue.

ANOVA in *RRPP* generalizes univariate ANOVA to multivariate
data, its statistics are directly associated with the amount of
dispersion in multivariate data spaces, irrespective of variable
covariances, and data dimensionality has no effect or limitation on
their calculation.

MANOVA in *RRPP* produces the same multivariate statistics
found in parametric MANOVA (when the number of observations exceeds
residual degrees of freedom), but unlike parametric MANOVA, \(P\)-values are estimated from empirical
sampling distributions of statistics (rather than estimating values that
approximately follow \(F\)-distributions, with constraints). The
choice of multivariate statistic is inconsequential for \(P\)-values, but makes a difference for
effect sizes.

Because high-dimensional data force a projection of data into Euclidean subspaces, to make matrix inversion and eigenanalysis possible, MANOVA results could differ from ANOVA results (both qualitatively and in terms of effect sizes). For multivariate data with many more observations than variables (ideal conditions), results using either ANOVA or MANOVA will be comparable. For high-dimensional data, ANOVA results might be more reliable.