ksample.e {energy}                                        R Documentation

E-statistic (Energy Statistic) for Multivariate k-sample Test of Equal Distributions

Description

Returns the E-statistic (energy statistic) for the multivariate k-sample test of equal distributions.

Usage

 ksample.e(x, sizes, distance = FALSE, ix = 1:sum(sizes), 
           incomplete = FALSE, N = 100)

Arguments

x            data matrix of pooled sample
sizes        vector of sample sizes
distance     logical: if TRUE, x is a distance matrix
ix           a permutation of the row indices of x
incomplete   logical: if TRUE, compute incomplete E-statistics
N            incomplete sample size

Details

The k-sample multivariate E-statistic for testing equal distributions is returned. The statistic is computed from the original pooled samples, stacked in matrix x where each row is a multivariate observation, or from the distance matrix x of the original data. The first sizes[1] rows of x are the first sample, the next sizes[2] rows of x are the second sample, etc. Incomplete statistics are supported for the two-sample case. If incomplete==TRUE, at most N observations from each sample (by sampling without replacement) are used in the calculation of the statistic. If distance==TRUE complete statistics are always computed.
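The input layout described above can be sketched in base R as follows. This is only an illustration: the sample names s1, s2, s3, the seed, and the subsampling step are assumptions made for the example, and the subsampling shows the idea of using at most N observations per sample rather than the package's internal procedure.

```r
set.seed(1)                                       # arbitrary seed, for reproducibility only
s1 <- matrix(rnorm(30 * 2), ncol = 2)             # first sample, 30 observations
s2 <- matrix(rnorm(20 * 2, mean = 1), ncol = 2)   # second sample, 20 observations
s3 <- matrix(rnorm(40 * 2, mean = 2), ncol = 2)   # third sample, 40 observations

x     <- rbind(s1, s2, s3)                        # pooled sample, one row per observation
sizes <- c(nrow(s1), nrow(s2), nrow(s3))          # sample sizes, in stacking order

# Row ranges of x implied by 'sizes'
last  <- cumsum(sizes)
first <- last - sizes + 1

# Alternative input when distance = TRUE: the full Euclidean distance matrix
d <- as.matrix(dist(x))

# Idea behind incomplete statistics (illustration only): keep at most N rows
# from each sample, drawn without replacement
N <- 25
keep <- unlist(lapply(seq_along(sizes), function(i) {
  rows <- first[i]:last[i]
  if (length(rows) > N) sort(sample(rows, N)) else rows
}))
sizes_sub <- pmin(sizes, N)

# With the energy package attached, the corresponding calls would be:
# ksample.e(x, sizes)
# ksample.e(d, sizes, distance = TRUE)
```

Here x[keep, ] together with sizes_sub describes the reduced pooled sample in the same stacked layout.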

The two-sample E-statistic proposed by Szekely and Rizzo (2003) is the e-distance e(S_i,S_j), defined for two samples S_i, S_j of size n_i, n_j by

e(S_i, S_j) = (n_i n_j)/(n_i + n_j) [2 M_(ij) - M_(ii) - M_(jj)],

where

M_(ij) = 1/(n_i n_j) sum[p = 1:n_i, q = 1:n_j] ||X_(ip) - X_(jq)||,

||.|| denotes the Euclidean norm, and X_(ip) denotes the p-th observation in the i-th sample. The k-sample E-statistic is defined by summing the pairwise e-distances over all k(k-1)/2 pairs of samples:

E = sum[i<j] e(S_i,S_j).

Large values of E are significant, that is, they are evidence against the hypothesis of equal distributions.
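The definition above can be evaluated directly in base R. The function energy_stat_sketch below is a hypothetical helper written for this illustration and is not part of the energy package; it follows the formulas term by term, and whether it agrees exactly with the value returned by ksample.e depends on the package implementation.

```r
# Hypothetical helper (not part of the energy package):
# E = sum over i < j of (n_i n_j)/(n_i + n_j) * (2 M_ij - M_ii - M_jj)
energy_stat_sketch <- function(x, sizes) {
  x <- as.matrix(x)                      # a vector becomes a one-column matrix
  stopifnot(nrow(x) == sum(sizes))
  d <- as.matrix(dist(x))                # Euclidean distances between all rows
  k <- length(sizes)
  g <- rep(seq_len(k), times = sizes)    # sample label of each row

  # M[i, j] = mean distance between observations of sample i and sample j
  M <- matrix(0, k, k)
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      M[i, j] <- mean(d[g == i, g == j, drop = FALSE])
    }
  }

  # Sum the pairwise e-distances over all k(k-1)/2 pairs
  E <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      w <- sizes[i] * sizes[j] / (sizes[i] + sizes[j])
      E <- E + w * (2 * M[i, j] - M[i, i] - M[j, j])
    }
  }
  E
}

# Two single-point samples {0} and {1}: M_12 = 1, M_11 = M_22 = 0, so E = (1/2) * 2 = 1
energy_stat_sketch(c(0, 1), c(1, 1))

# Same layout as the iris example in this page
energy_stat_sketch(iris[, 1:4], c(50, 50, 50))
```

Note that M_(ii) averages over all n_i^2 ordered pairs, including the zero distances on the diagonal, which is what the mean over the full diagonal block computes.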

Value

The value of the multisample E-statistic corresponding to the permutation ix is returned.

Note

This function computes the E-statistic only. For the test decision, a nonparametric bootstrap test (approximate permutation test) is provided by the function eqdist.etest.
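To illustrate how a statistic indexed by a permutation ix can feed an approximate permutation test, here is a minimal base R sketch. The helper e_stat_perm, the number of replicates R = 199, the seed, and the p-value convention (1 + count)/(R + 1) are assumptions chosen for the example; this is not the implementation used by eqdist.etest.

```r
# Hypothetical helper (not part of the energy package): E-statistic computed
# from a distance matrix d, with the rows of the pooled sample taken in order ix
e_stat_perm <- function(d, sizes, ix = seq_len(sum(sizes))) {
  k <- length(sizes)
  g <- integer(sum(sizes))
  g[ix] <- rep(seq_len(k), times = sizes)   # row ix[m] gets the label of position m
  E <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      Mij <- mean(d[g == i, g == j])
      Mii <- mean(d[g == i, g == i])
      Mjj <- mean(d[g == j, g == j])
      w <- sizes[i] * sizes[j] / (sizes[i] + sizes[j])
      E <- E + w * (2 * Mij - Mii - Mjj)
    }
  }
  E
}

set.seed(42)                                # arbitrary seed, for reproducibility only
sizes <- c(50, 50, 50)
d <- as.matrix(dist(as.matrix(iris[, 1:4])))

observed <- e_stat_perm(d, sizes)           # identity permutation: the original labels

# Replicates under random relabeling of the pooled sample
R <- 199
perm <- replicate(R, e_stat_perm(d, sizes, ix = sample(nrow(d))))

# Large values of E are significant, so count replicates at least as large
p_value <- (1 + sum(perm >= observed)) / (R + 1)
p_value
```

Computing the distance matrix once and permuting only the labels avoids recomputing distances for every replicate.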

Author(s)

Maria L. Rizzo rizzo@math.ohiou.edu and Gabor J. Szekely gabors@bgnet.bgsu.edu

References

Szekely, G. J. and Rizzo, M. L. (2003) Testing for Equal Distributions in High Dimension, submitted.

Szekely, G. J. (2000) E-statistics: Energy of Statistical Samples, preprint.

See Also

eqdist.etest, edist, energy.hclust

Examples

## compute 3-sample E-statistic for 4-dimensional iris data
 data(iris)
 ksample.e(iris[,1:4], c(50,50,50))

## compute a 3-sample univariate E-statistic
 ksample.e(rnorm(150), c(25,75,50))
