codec

Dependence and Conditional Dependence

For a random variable \(Y\) and random vectors \(Z\) and \(X\), the conditional dependence coefficient \(T(Y, Z\mid X)\in[0, 1]\) measures the dependence of \(Y\) on \(Z\) given \(X\). \(T(Y, Z\mid X)\) is 0 if and only if \(Y\) is independent of \(Z\) given \(X\), and is 1 if and only if \(Y\) is a function of \(Z\) given \(X\). The measure is well-defined if \(Y\) is not almost surely a function of \(X\). For more details on the definition of \(T\), its properties, and its estimator, see the paper A Simple Measure Of Conditional Dependence.
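
For orientation, the population quantity can be written as follows. This is a sketch of the definition as recalled from the cited paper, where \(\mu\) denotes the law of \(Y\); the paper remains the authoritative statement.

\[
T(Y, Z\mid X) = \frac{\int \mathbb{E}\big(\mathrm{Var}\big(\mathbb{P}(Y \ge t \mid Z, X) \mid X\big)\big)\, d\mu(t)}{\int \mathbb{E}\big(\mathrm{Var}\big(1_{\{Y \ge t\}} \mid X\big)\big)\, d\mu(t)}
\]

When there is no conditioning vector \(X\), the conditional variances reduce to ordinary variances:

\[
T(Y, Z) = \frac{\int \mathrm{Var}\big(\mathbb{P}(Y \ge t \mid Z)\big)\, d\mu(t)}{\int \mathrm{Var}\big(1_{\{Y \ge t\}}\big)\, d\mu(t)}
\]

The denominator is nonzero when \(Y\) is not almost surely a function of \(X\), which is the well-definedness condition stated above.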

Given a sample of \(n\) i.i.d. observations of the triple \((X, Y, Z)\), we can estimate \(T(Y, Z\mid X)\) efficiently in a non-parametric fashion. The function codec computes this estimate. The default value of \(X\) is NULL; if \(X\) is not provided by the user, codec returns the estimate of the unconditional dependence measure of \(Y\) on \(Z\), \(T(Y, Z)\).

In the following examples, we illustrate the behavior of this estimator in different settings.

library(FOCI)

In this example we generate a \(10000 \times 3\) matrix \(x\) with i.i.d. elements from \(\mathrm{Unif}[0, 1]\). The observed value of \(y\) is the sum of the elements of each row of \(x\) mod \(1\). Although \(y\) is a function of \(x\), it is independent of each single column of \(x\) and of each pair of its columns. On the other hand, conditional on the last column, \(y\) is a function of the first two columns, but it is still independent of either of the first two columns separately.
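
The independence claims follow from a standard fact about uniform variables, sketched here with the notation \(U\), \(w\) and \(x_1, x_2, x_3\) (for the three columns of \(x\)) introduced only for this illustration. If \(U \sim \mathrm{Unif}[0, 1]\) and \(w\) is any fixed real number, then \((U + w) \bmod 1\) is again uniform on \([0, 1)\). Applying this with \(U = x_3\) and \(w = x_1 + x_2\) gives, for every \(t \in [0, 1]\),

\[
\mathbb{P}\big(y \le t \mid x_1, x_2\big) = \mathbb{P}\big((x_3 + x_1 + x_2) \bmod 1 \le t \mid x_1, x_2\big) = t .
\]

The conditional law of \(y\) does not depend on \((x_1, x_2)\), so \(y\) is independent of that pair, and by symmetry of every pair and every single column. The same argument with \(U = x_2\) and \(w = x_1 + x_3\) shows that, conditional on \(x_3\), \(y\) is still independent of \(x_1\), while \(y = (x_1 + x_2 + x_3) \bmod 1\) is an exact function of \((x_1, x_2)\) once \(x_3\) is given.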

n = 10000
p = 3
x = matrix(runif(n * p), ncol = p)
y = (x[, 1] + x[, 2] + x[, 3]) %% 1
# y is independent of each column of x
codec(y, x[, 1])
#> [1] -0.00225687
codec(y, x[, 2])
#> [1] 0.01326747
codec(y, x[, 3])
#> [1] 0.00130965

# y is independent of the first two columns of x, x[, c(1, 2)]
codec(y, x[, c(1, 2)])
#> [1] -0.00191775

# y is a function of x
codec(y, x)
#> [1] 0.8649652

# conditional on the last column of x, y is a function of the first two columns
codec(y, x[, c(1, 2)], x[, 3])
#> [1] 0.8647881
# conditional on x[, 3], y is independent of x[, 1]
codec(y, x[, 1], x[, 3])
#> [1] 0.02025655

In the following example we generate a \(1000 \times 2\) matrix \(x\) with i.i.d. standard normal elements. Each row of this matrix represents a point in the 2-dimensional plane. We call the squared distance of this point from the origin \(y\) and its angle with the horizontal axis \(z\). It can be seen that \(y\) and \(z\) are independent of each other, but conditional on either coordinate of the given point, \(y\) can be fully determined using \(z\).
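
The functional relationship can be made explicit. This is a short derivation using the definition \(z = \arctan(x_1 / x_2)\) from the code, with \(x_1, x_2\) denoting the two columns of \(x\) (notation introduced here for illustration). Since \(\tan z = x_1 / x_2\), we have \(x_2 = x_1 / \tan z\) almost surely, and therefore

\[
y = x_1^2 + x_2^2 = x_1^2 \left(1 + \frac{1}{\tan^2 z}\right) = \frac{x_1^2}{\sin^2 z} .
\]

Given \(x_1\), \(y\) is thus a deterministic function of \(z\). Without conditioning, the independence of \(y\) and \(z\) reflects the fact that the radius and the angle of a standard bivariate normal point are independent, and \(z\) depends on the point only through its angle.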

n = 1000
p = 2
x = matrix(rnorm(n * p), ncol = p)
y = x[, 1]^2 + x[, 2]^2
z = atan(x[, 1] / x[, 2])
# y is independent of z
codec(y, z)
#> [1] -0.01983902
# conditional on x[, 1], y is a function of z
codec(y, z, x[, 1])
#> [1] 0.8077018