disttools

Zachary Colburn

2022-02-04

Overview

disttools provides the functionality needed to rapidly and intuitively retrieve information from large ‘dist’ objects. This functionality is encoded in the get_dists function. The function’s main use cases are outlined below.

Usage

After installing the package, it can be loaded by executing:

# Load the package.
library(disttools)

Below, some example data is randomly generated.

# Create some data to play with.
set.seed(123456789)
mat <- matrix(rnorm(10), ncol = 2)

A ‘dist’ object can be generated for these points by executing:

# Generate a 'dist' object.
mat_dists <- dist(mat)

Below, a set of pairs of points are specified for distance retrieval.

# Specify index pairs of interest.
indices <- matrix(c(1,2,3,2,4,4), ncol = 2, byrow = TRUE)

Example 1 - Matrix-based retrieval

The function get_dists can be used to access the distance between pairs of points. This can be accomplished via two methods. First, a matrix of index pairs can be passed to the function along with the ‘dist’ object itself.

# Retrieve distances using the matrix-based method.
get_dists(mat_dists, indices)

Example 2 - Vector-based retrieval

Second, two vectors corresponding to the columns of the index matrix can be passed to the function along with the ‘dist’ object.

# Create vectors i and j from the above data.
i <- indices[,1]
j <- indices[,2]

# Retrieve distances using the paired vectors method.
get_dists(mat_dists, i, j)

Example 3 - Distance retrieval for all combinations of a subset of points

Sometimes, the distances for all combinations of a set of points are desired. This information can be easily extracted by executing the following:

# Create a matrix of unique index pairs.
index_pairs <- combn(1:3, 2) # Generate the combinations
index_pairs <- t(index_pairs) # Transpose to put the data in tall format.

# Retrieve the distances as above.
get_dists(mat_dists, index_pairs)

Example 4 - Return both indices and distances

It is often desirable to create a matrix or data.frame composed of two columns that indicate the indices being compared and a third column giving the distances between those indices. For convenience, the argument return_indices can be set to TRUE. Doing so results in a three column matrix being returned. It can be converted into a data.frame using the function as.data.frame.

# Retrieve distances using the matrix-based method.
get_dists(mat_dists, indices, return_indices = TRUE)

Disclaimer

The views expressed are those of the author(s) and do not reflect the official policy of the Department of the Army, the Department of Defense or the U.S. Government.