frequentdirections

An implementation of the Frequent-Directions algorithm for efficient matrix sketching [E. Liberty, SIGKDD 2013].

Installation

# Not yet on CRAN
install.packages("frequentdirections")

# Or the development version from GitHub:
install.packages("devtools")
devtools::install_github("shinichi-takayanagi/frequentdirections")

Example

Download example data

Here, we use the USPS handwritten digits dataset as sample data. The following example assumes that you have saved the dataset to the /tmp directory.

Load data

The dataset contains 7291 training images and 2007 test images in HDF5 (h5) format. Each image is 16 x 16 grayscale pixels, stored as one row of 256 values.

library("h5")
file <- h5file("/tmp/usps.h5")
x <- file["train/data"][]
y <- file["train/target"][]
str(x)
#>  num [1:7291, 1:256] 0 0 0 0 0 0 0 0 0 0 ...

Plot example image

For example, the digit 8:

image(matrix(x[338,], nrow=16, byrow = FALSE))

Plot SVD

Plot the original data on the plane spanned by the first and second singular vectors.

x <- scale(x)
frequentdirections::plot_svd(x, y)
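
To make the idea of a "singular vector plane" concrete, here is a minimal base R sketch of such a projection on synthetic data. It is an assumption about what a plot of this kind computes, not the package's actual code; `project_2d`, `a`, and `labels` are hypothetical names introduced for illustration.

```r
# Project the rows of `a` onto the top-2 right singular vectors of `basis`.
# Hypothetical helper; not part of the frequentdirections package.
project_2d <- function(a, basis = a) {
  v <- svd(basis, nu = 0, nv = 2)$v  # d x 2 matrix of right singular vectors
  a %*% v                            # n x 2 matrix of plane coordinates
}

set.seed(42)
a <- scale(matrix(rnorm(100 * 6), nrow = 100))  # synthetic stand-in for the data
labels <- rep(1:2, each = 50)                   # synthetic class labels

coords <- project_2d(a)
plot(coords, col = labels,
     xlab = "1st singular vector", ylab = "2nd singular vector")
```

Passing a smaller matrix as `basis` would project the same rows onto the plane defined by that matrix instead, which is one way to compare a sketch with the original data.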

Matrix Sketching

l = 8 case

eps <- 10^(-8)
# 7291 x 256 -> 8 * 256 matrix
b <- frequentdirections::sketching(x, 8, eps)
frequentdirections::plot_svd(x, y, b)

l = 32 case

# 7291 x 256 -> 32 * 256 matrix
b <- frequentdirections::sketching(x, 32, eps)
frequentdirections::plot_svd(x, y, b)

l = 128 case

# 7291 x 256 -> 128 * 256 matrix
b <- frequentdirections::sketching(x, 128, eps)
frequentdirections::plot_svd(x, y, b)

This result is almost the same as the SVD plot of the original data.

In other words, the original data can be represented well by a sketch of only 128 rows.
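
For readers who want to see the mechanics, the following is a minimal base R sketch of one simple Frequent-Directions variant, which shrinks by the smallest squared singular value after every inserted row. It is illustrative only and is not the package's implementation: `fd_sketch` is a hypothetical name, it has no `eps` argument, and it assumes `l` is at most the number of columns. For this variant, the spectral-norm error between `t(a) %*% a` and `t(b) %*% b` should be at most the squared Frobenius norm of `a` divided by `l`, which the example checks on random data.

```r
# Minimal Frequent-Directions sketch (illustrative; not the package's code).
# a: n x d numeric matrix, l: number of sketch rows (assumes 1 <= l <= d).
fd_sketch <- function(a, l) {
  d <- ncol(a)
  stopifnot(l >= 1, l <= d)
  b <- matrix(0, nrow = l, ncol = d)
  for (i in seq_len(nrow(a))) {
    b[l, ] <- a[i, ]                   # the last row is zero at this point
    s <- svd(b, nu = 0, nv = l)        # b = U diag(s$d) t(s$v)
    delta <- s$d[l]^2                  # smallest squared singular value
    shrunk <- sqrt(pmax(s$d^2 - delta, 0))
    b <- shrunk * t(s$v)               # scales row j by shrunk[j]; last row becomes zero
  }
  b
}

set.seed(1)
a <- matrix(rnorm(200 * 10), nrow = 200)
b <- fd_sketch(a, l = 5)

# Compare the covariance error with the bound for this variant.
err <- norm(crossprod(a) - crossprod(b), type = "2")
bound <- sum(a^2) / 5
c(error = err, bound = bound)
```

Running an SVD for every row keeps the code short but is slow; practical implementations typically buffer several rows between SVD calls.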