A performance example

Before we measure the performance of the package's main functionality, note that something as simple as '(a:b)[-i]' can be, and has been, accelerated in this package:

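library(bit)
library(microbenchmark)

times <- 5  # replications per benchmark, as stated in the performance settings
a <- 1L
b <- 1e7L
i <- sample(a:b,1e3)
# median timings for base R, the bit-based and the merge-based range difference
x <- c(
  R = median(microbenchmark((a:b)[-i], times=times)$time)
, bit = median(microbenchmark(bit_rangediff(c(a,b), i), times=times)$time)
, merge = median(microbenchmark(merge_rangediff(c(a,b), bit_sort(i)), times=times)$time)
)
# express each timing as a percentage of the base R timing
knitr::kable(as.data.frame(as.list(x/x["R"]*100)), caption="% of time relative to R", digits=1)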

Table: % of time relative to R

| R | bit | merge |
|--:|--:|--:|
| 100 | 22.7 | 22.6 |
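The equivalence being timed can be checked on a small range. The following sketch uses base R for the reference result and calls `bit_rangediff` only if the bit package is installed; the small values of `a`, `b` and `i` and the seed are illustrative assumptions, not the vignette's settings. Note that `(a:b)[-i]` drops positions, which coincide with values only because `a` is 1.

```r
# Small illustrative inputs (assumed for this sketch; the benchmark uses b = 1e7)
a <- 1L
b <- 20L
set.seed(1)
i <- sample(a:b, 5)

# Base R: negative subscripts drop the positions listed in i
r_neg <- (a:b)[-i]

# Value-based formulation of the same result
r_set <- setdiff(a:b, i)

# Positions equal values only because the range starts at 1
stopifnot(identical(r_neg, r_set))

# Optional comparison with the package function, if bit is available
if (requireNamespace("bit", quietly = TRUE)) {
  r_bit <- bit::bit_rangediff(c(a, b), i)
  stopifnot(length(r_bit) == length(r_neg), all(r_bit == r_neg))
}
```

For a range that does not start at 1, the positional form `(a:b)[-i]` and the value-based range difference would no longer agree.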

The vignette is compiled with the following performance settings: 5 replications, domain sizes of 1000 (small) and 10^6 (big), and sample sizes of 1000 (small) and 10^6 (big).

Boolean data types

“A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.”
“Il semble que la perfection soit atteinte non quand il n'y a plus rien à ajouter, mais quand il n'y a plus rien à retrancher”
(Antoine de Saint-Exupéry, Terre des Hommes (Gallimard, 1939), p. 60.)

We compare memory consumption (n=1e+06) and runtime (median of 5 replications) of the different booltypes for the following filter scenarios:

Table: selection characteristic

| coin | often | rare | chunk |
|:--|:--|:--|:--|
| random 50% | random 99% | random 1% | contiguous chunk of 5% |
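The four scenarios can be reproduced with plain logical vectors. This is a minimal sketch: the helper names `random_filter` and `chunk_filter`, the seed, and the position of the contiguous chunk are assumptions made for illustration, since the vignette does not show how its filters are generated.

```r
n <- 1000000L        # filter length, matching n = 1e+06 in the text
set.seed(42)         # arbitrary seed, assumed for reproducibility

# Hypothetical helper: logical filter with exactly pct percent TRUE at random positions
random_filter <- function(n, pct) {
  x <- logical(n)
  x[sample.int(n, n %/% 100L * pct)] <- TRUE
  x
}

# Hypothetical helper: logical filter with one contiguous run covering pct percent
chunk_filter <- function(n, pct, start = 1L) {
  x <- logical(n)
  x[seq.int(start, length.out = n %/% 100L * pct)] <- TRUE
  x
}

filters <- list(
  coin  = random_filter(n, 50L),
  often = random_filter(n, 99L),
  rare  = random_filter(n, 1L),
  chunk = chunk_filter(n, 5L)
)

# Percentage of selected elements per scenario
selected_pct <- 100 * vapply(filters, mean, numeric(1))
```

Each logical vector could then be converted to the other booltypes before measuring size and runtime.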

There are substantial savings in skewed filter situations:

Figure: % size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'rare' scenario

Figure: % size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'often' scenario

Even in non-skewed situations the new booltypes are competitive:

Figure: % size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'coin' scenario

Detailed tables follow.

% memory consumption of filter

Table: % bytes of logical

| | coin | often | rare | chunk |
|:--|--:|--:|--:|--:|
| logical | 100.0 | 100.0 | 100.0 | 100.0 |
| bit | 3.2 | 3.2 | 3.2 | 3.2 |
| bitwhich | 50.0 | 1.0 | 1.0 | 5.0 |
| which | 50.0 | 99.0 | 1.0 | 5.0 |
| ri | NA | NA | NA | 0.0 |
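These percentages follow roughly from the storage layouts, which a little arithmetic makes visible. The sketch assumes 4 bytes per logical element and per stored integer position, 1 bit per element for bit, and that bitwhich stores whichever of the selected or unselected positions is fewer. Object headers and attributes are ignored, so small deviations from the measured values (such as 3.2 versus 3.125 for bit) are to be expected.

```r
# Fraction of selected elements in each scenario
p <- c(coin = 0.50, often = 0.99, rare = 0.01, chunk = 0.05)

bytes_logical <- 4   # R stores each logical element in 4 bytes
bytes_index   <- 4   # an integer position takes 4 bytes

# Expected size in percent of the logical vector, ignoring fixed overhead
expected <- rbind(
  logical  = rep(100, length(p)),
  bit      = rep(100 * (1 / 8) / bytes_logical, length(p)),      # 1 bit versus 32 bits
  bitwhich = 100 * pmin(p, 1 - p) * bytes_index / bytes_logical, # the rarer side is stored
  which    = 100 * p * bytes_index / bytes_logical               # selected positions only
)
colnames(expected) <- names(p)
round(expected, 1)
```

The asymmetry between bitwhich and which in the 'often' scenario comes from the `pmin` term: storing the 1% of excluded positions is far cheaper than storing the 99% of included ones.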

% time extracting

Table: % time of logical

| | coin | often | rare | chunk |
|:--|--:|--:|--:|--:|
| logical | 4.3 | 3.8 | 3.9 | NA |
| bit | 18.0 | 17.4 | 18.0 | NA |
| bitwhich | 81.1 | 61.2 | 2.3 | NA |
| which | NA | NA | NA | NA |
| ri | NA | NA | NA | NA |

% time assigning

Table: % time of logical

| | coin | often | rare | chunk |
|:--|--:|--:|--:|--:|
| logical | 29.6 | 29.3 | 30.3 | NA |
| bit | 58.9 | 25.8 | 16.9 | NA |
| bitwhich | 164.8 | 51.9 | 47.4 | NA |
| which | NA | NA | NA | NA |
| ri | NA | NA | NA | NA |

% time subscripting with 'which'

Table: % time of logical

| | coin | often | rare | chunk |
|:--|--:|--:|--:|--:|
| logical | 11.1 | 18.8 | 0.5 | NA |
| bit | 31.0 | 60.8 | 0.8 | NA |
| bitwhich | 44.3 | 91.0 | 1.4 | NA |
| which | NA | NA | NA | NA |
| ri | NA | NA | NA | NA |

% time assigning with 'which'

Table: % time of logical

| | coin | often | rare | chunk |
|:--|--:|--:|--:|--:|
| logical | 11.6 | 21.8 | 0.4 | NA |
| bit | 18.8 | 34.6 | 0.6 | NA |
| bitwhich | 88.7 | 27.9 | 1.5 | NA |
| which | NA | NA | NA | NA |
| ri | NA | NA | NA | NA |

% time Boolean NOT

Table: % time for Boolean NOT

| | coin | often | rare | chunk |
|:--|--:|--:|--:|--:|
| logical | 11.3 | 11.4 | 11.4 | 11.4 |
| bit | 1.0 | 1.0 | 1.0 | 1.0 |
| bitwhich | 20.2 | 0.8 | 0.7 | 2.5 |
| which | NA | NA | NA | NA |
| ri | NA | NA | NA | NA |

% time Boolean AND

Table: % time for Boolean &

| | coin | often | rare | chunk |
|:--|--:|--:|--:|--:|
| logical | 47.2 | 19.4 | 12.9 | 12.4 |
| bit | 2.6 | 2.6 | 2.7 | 2.6 |
| bitwhich | 32.8 | 3.3 | 3.1 | 5.5 |
| which | NA | NA | NA | NA |
| ri | NA | NA | NA | NA |

% time Boolean OR

Table: % time for Boolean |

| | coin | often | rare | chunk |
|:--|--:|--:|--:|--:|
| logical | 41.3 | 12.4 | 15.0 | 14.3 |
| bit | 2.7 | 2.7 | 2.6 | 2.6 |
| bitwhich | 30.3 | 3.0 | 3.2 | 5.6 |
| which | NA | NA | NA | NA |
| ri | NA | NA | NA | NA |

% time Boolean EQUALITY

Table: % time for Boolean ==

| | coin | often | rare | chunk |
|:--|--:|--:|--:|--:|
| logical | 13.5 | 14.3 | 14.3 | 14.3 |
| bit | 2.8 | 2.7 | 2.6 | 2.7 |
| bitwhich | 18.2 | 2.7 | 2.6 | 3.9 |
| which | NA | NA | NA | NA |
| ri | NA | NA | NA | NA |

% time Boolean XOR

Table: % time for Boolean !=

| | coin | often | rare | chunk |
|:--|--:|--:|--:|--:|
| logical | 15.8 | 15.6 | 15.5 | 15.5 |
| bit | 3.1 | 3.0 | 3.0 | 2.9 |
| bitwhich | 17.9 | 2.7 | 2.7 | 3.9 |
| which | NA | NA | NA | NA |
| ri | NA | NA | NA | NA |
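One plausible reason for the low Boolean-operator times of packed representations is that a single operation combines many elements at once. The sketch below illustrates the idea with base R only: `packBits` stores 8 logical values per byte, and `&` and `!` act bitwise on raw vectors. It is a conceptual illustration under these assumptions, not the package's C implementation; the vector length and seed are arbitrary.

```r
set.seed(3)
n <- 64L                                   # must be a multiple of 8 for packing
x <- sample(c(TRUE, FALSE), n, replace = TRUE)
y <- sample(c(TRUE, FALSE), n, replace = TRUE)

# Pack 8 logical values into each byte
px <- packBits(x, type = "raw")
py <- packBits(y, type = "raw")

# On raw vectors, & and ! are bitwise, so one byte operation covers 8 elements
and_packed <- px & py
not_packed <- !px

# Unpack back to logical and compare with the element-wise results
unpack <- function(p) as.logical(rawToBits(p))
stopifnot(identical(unpack(and_packed), x & y),
          identical(unpack(not_packed), !x))
```

The packed vectors are also 8 elements per byte instead of 4 bytes per element, which is the same 1/32 ratio seen in the memory table.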

% time Boolean SUMMARY

Table: % time for Boolean summary

| | coin | often |
|:--|--:|--:|
| logical | 100 | 35.4 |
| bit | 48 | 12.6 |

Fast methods for integer set operations

“The space-efficient structure of bitmaps dramatically reduced the run time of sorting”
(Jon Bentley, Programming Pearls, Cracking the oyster, p. 7)
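The idea behind the quotation can be sketched in a few lines of base R: mark every value that occurs in a flag vector spanning the value range, then read the flags back in order. The function name `bitmap_sort_unique` is hypothetical, and the sketch uses one logical per flag rather than one bit, so it shows the principle rather than the package's implementation.

```r
# Conceptual bitmap sort for integers in a known range [lo, hi]
bitmap_sort_unique <- function(x, lo = min(x), hi = max(x)) {
  present <- logical(hi - lo + 1L)   # one flag per possible value
  present[x - lo + 1L] <- TRUE       # mark each value that occurs
  which(present) + lo - 1L           # read the flags back in ascending order
}

set.seed(7)
x <- sample.int(1000L, 200L, replace = TRUE)
stopifnot(identical(bitmap_sort_unique(x), sort(unique(x))))
```

The work is proportional to the number of values plus the width of the range, with no comparisons between elements, which is why the approach pays off when the range is not much larger than the data.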

Figure: Execution time for R (R) and bit (b)

Figure: Execution time for R, bit and merge relative to most expensive R in 'unsorted bigbig' scenario

Figure: Execution time for R, bit and merge in 'sorted bigbig' scenario

% time for sorting

Table: sorted data relative to R's sort

| | small | big |
|:--|--:|--:|
| sort | 216.8 | 607.7 |
| sortunique | 95.2 | 48.8 |

Table: unsorted data relative to R's sort

| | small | big |
|:--|--:|--:|
| sort | 47.3 | 67.5 |
| sortunique | 21.9 | 12.6 |

% time for unique

Table: sorted data relative to R

| | small | big |
|:--|--:|--:|
| bit | 129.5 | 32.2 |
| merge | 29.5 | 13.0 |
| sort | 0.0 | 0.0 |

Table: unsorted data relative to R

| | small | big |
|:--|--:|--:|
| bit | 142.7 | 19.5 |
| merge | 217.2 | 57.0 |
| sort | 185.1 | 49.6 |

% time for duplicated

Table: sorted data relative to R

| | small | big |
|:--|--:|--:|
| bit | 242.1 | 33.7 |
| merge | 36.4 | 16.9 |
| sort | 0.0 | 0.0 |

Table: unsorted data relative to R

| | small | big |
|:--|--:|--:|
| bit | 256.8 | 18.5 |
| merge | 265.7 | 62.4 |
| sort | 228.4 | 53.8 |

% time for anyDuplicated

Table: sorted data relative to R

| | small | big |
|:--|--:|--:|
| bit | 165.8 | 38.1 |
| merge | 30.5 | 14.8 |
| sort | 0.0 | 0.0 |

Table: unsorted data relative to R

| | small | big |
|:--|--:|--:|
| bit | 159.4 | 20.7 |
| merge | 251.7 | 69.1 |
| sort | 221.9 | 61.7 |

% time for sumDuplicated

Table: sorted data relative to R

| | small | big |
|:--|--:|--:|
| bit | 146.0 | 33.6 |
| merge | 25.9 | 12.7 |
| sort | 0.0 | 0.0 |

Table: unsorted data relative to R

| | small | big |
|:--|--:|--:|
| bit | 120.1 | 17.6 |
| merge | 186.7 | 59.7 |
| sort | 164.8 | 53.4 |

% time for match

Table: sorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | NA | NA | NA | NA |
| merge | 54.2 | 0 | 24.6 | 16.8 |
| sort | 0.0 | 0 | 0.0 | 0.0 |

Table: unsorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | NA | NA | NA | NA |
| merge | 435.3 | 59 | 117.6 | 55.5 |
| sort | 381.2 | 59 | 104.1 | 48.9 |

% time for in

Table: sorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 199.4 | 7.9 | 14.7 | 27.0 |
| merge | 40.1 | 0.0 | 21.2 | 14.3 |
| sort | 0.0 | 0.0 | 0.0 | 0.0 |

Table: unsorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 229.4 | 3.9 | 9.0 | 11.5 |
| merge | 386.1 | 60.0 | 104.5 | 53.9 |
| sort | 341.8 | 60.0 | 91.8 | 47.8 |

% time for notin

Table: sorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 288.3 | 8 | 13.8 | 25.9 |
| merge | 46.6 | 0 | 18.6 | 16.0 |
| sort | 0.0 | 0 | 0.0 | 0.0 |

Table: unsorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 300.2 | 3.7 | 8.8 | 11.1 |
| merge | 319.7 | 55.7 | 96.2 | 52.8 |
| sort | 276.0 | 55.7 | 84.5 | 45.9 |

% time for union

Table: sorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 81.6 | 27.6 | 27.8 | 13.6 |
| merge | 50.0 | 11.1 | 10.6 | 7.6 |
| sort | 0.0 | 0.0 | 0.0 | 0.0 |

Table: unsorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 94.8 | 15.5 | 18.4 | 8.6 |
| merge | 207.2 | 53.3 | 54.1 | 32.8 |
| sort | 155.6 | 46.6 | 47.6 | 28.4 |

% time for intersect

Table: sorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 85.3 | 17.2 | 24.0 | 18.7 |
| merge | 32.5 | 0.1 | 0.1 | 11.2 |
| sort | 0.0 | 0.0 | 0.0 | 0.0 |

Table: unsorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 89.7 | 9.5 | 14.2 | 9.3 |
| merge | 189.1 | 58.2 | 95.6 | 36.0 |
| sort | 156.7 | 58.1 | 95.6 | 31.0 |
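The 'merge' rows presumably refer to the functions with the `merge_` prefix (such as `merge_rangediff` in the opening example), which expect sorted input. The principle can be sketched as a two-pointer pass over two ascending vectors. The function name `merge_intersect_sketch` is hypothetical, and the loop is a base-R illustration of the idea rather than the package's implementation; it assumes both inputs are sorted and free of duplicates.

```r
# Two-pointer intersection of two ascending integer vectors without duplicates
merge_intersect_sketch <- function(x, y) {
  out <- integer(min(length(x), length(y)))  # result can be no longer than this
  i <- 1L; j <- 1L; k <- 0L
  while (i <= length(x) && j <= length(y)) {
    if (x[i] < y[j]) {
      i <- i + 1L            # x[i] cannot occur in y, advance x
    } else if (x[i] > y[j]) {
      j <- j + 1L            # y[j] cannot occur in x, advance y
    } else {
      k <- k + 1L            # common value: record it and advance both
      out[k] <- x[i]
      i <- i + 1L; j <- j + 1L
    }
  }
  out[seq_len(k)]
}

x <- c(1L, 3L, 5L, 7L, 9L)
y <- c(3L, 4L, 5L, 6L, 9L, 10L)
stopifnot(identical(merge_intersect_sketch(x, y), intersect(x, y)))
```

Each element is visited at most once, so the pass is linear in the combined length. When the inputs are not already sorted, the cost of sorting has to be added, which is consistent with the higher merge percentages in the unsorted tables.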

% time for setdiff

Table: sorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 97.8 | 12.3 | 18.9 | 24.2 |
| merge | 41.7 | 0.1 | 6.9 | 17.5 |
| sort | 0.0 | 0.0 | 0.0 | 0.0 |

Table: unsorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 99 | 6.4 | 12.1 | 10.5 |
| merge | 270 | 61.0 | 33.7 | 53.8 |
| sort | 226 | 60.9 | 29.7 | 46.7 |

% time for symdiff

Table: sorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 78.6 | 12.7 | 12.7 | 20.8 |
| merge | 16.9 | 3.5 | 3.5 | 7.2 |
| sort | 0.0 | 0.0 | 0.0 | 0.0 |

Table: unsorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 76.0 | 8.0 | 8.1 | 9.1 |
| merge | 114.8 | 16.9 | 17.1 | 27.3 |
| sort | 98.6 | 14.9 | 15.0 | 24.2 |

% time for setequal

Table: sorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 107.0 | 108.1 | 13.4 | 13.1 |
| merge | 28.4 | 26.5 | 6.4 | 6.1 |
| sort | 0.0 | 0.0 | 0.0 | 0.0 |

Table: unsorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 113.3 | 112.5 | 7.2 | 6.9 |
| merge | 192.9 | 75703.2 | 19.7 | 34.8 |
| sort | 166.2 | 75674.7 | 16.2 | 31.6 |

% time for setearly

Table: sorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 105.1 | 7.1 | 17.1 | 13.4 |
| merge | 28.9 | 0.0 | 0.1 | 6.6 |
| sort | 0.0 | 0.0 | 0.0 | 0.0 |

Table: unsorted data relative to R

| | smallsmall | smallbig | bigsmall | bigbig |
|:--|--:|--:|--:|--:|
| bit | 114.1 | 3.3 | 9.2 | 5.5 |
| merge | 193.8 | 36.5 | 104.0 | 26.9 |
| sort | 166.4 | 36.5 | 103.9 | 24.3 |