Dynamic Time Warping is a well known and often applied technic to measure the distance between two time series. (Google) finds about 750,000 matches for the term Dynamic Time Warping. Searching the scientific search engines for research articles for this subject results in:
To learn more about DTW interested readers may also read (Wikipedia DTW) or have look at this (Dummy Example). This (tutorial) also explains the main idea of DTW, described in the original paper by (Sakoe and Chiba: Dynamic programming algorithm optimization for spoken word recognition)
Outline:
The distance measure DTW calcuates the minimal costs of the shortest non-linear alignment of two time series \(Q\) (the query time series) and \(C\) (the candidate time series), conditional to
Remark: The DTW distance measure is not a distance metric in the conventional sense, since it does not fulfill the triangle inequality.
In the following we show by aid of toy example how to use the functions of IncDTW and what they are doing. We start with the basic functions. Suppose we have two time series and we need to calculate the DTW distance of these:
set.seed(1090)
Q <- sin(1:10)#+rnorm(20)
C <- sin(-2:10)#+rnorm(15)
tmp <- IncDTW::dtw(Q = Q, C = C, return_diffM = TRUE, return_wp = TRUE,
return_QC = TRUE, return_cm = TRUE, return_diffp = TRUE)
names(tmp)
## [1] "distance" "gcm" "dm" "diffp" "ii" "jj" "wp" "cm" "diffM" "Q" "C"
Another toy example:
set.seed(1090)
Q <- c(1, 1, 2, 3, 2, 0)
C <- c(0, 1, 1, 2, 3, 2, 1)
tmp <- IncDTW::dtw(Q = Q, C = C, return_diffM = TRUE, return_wp = TRUE,
return_QC = TRUE, return_cm = TRUE, return_diffp = TRUE)
names(tmp)
## [1] "distance" "gcm" "dm" "diffp" "ii" "jj" "wp" "cm" "diffM" "Q" "C"
The matrix of differences diffM simply stores the differences of the two time series \(Q\) and \(C\).
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 1 0 0 -1 -2 -1 0
## [2,] 1 0 0 -1 -2 -1 0
## [3,] 2 1 1 0 -1 0 1
## [4,] 3 2 2 1 0 1 2
## [5,] 2 1 1 0 -1 0 1
## [6,] 0 -1 -1 -2 -3 -2 -1
The global cost matrix is calculated by walking through the matrix of absolute differences abs(diffM)
from top left to bottom right applying the following rule:
cm <- abs(diffM) # cost matrix
gcm <- cm # initialize the global cost matrix
for(j in 1:m){
for(i in 1:n){
gcm[ , 1] <- cm[ ,1]
gcm[1, ] <- cm[1, ]
gcm[i, j] <- cm[i, j] + min(c(gcm[i-1, j-1],
gcm[i , j-1],
gcm[i-1, j ]))}}
The direction matrix dm is calculated simultaneously to the global cost matrix and stores for each position where to go next to make the cheapest step.
dm <- matrix(NA, ncol=length(C), nrow = length(Q))
dm[1, ] <- 3
dm[ , 1] <- 2
dm[1, 1] <- NA
for(j in 1:m){
for(i in 1:n){
min_index <- which.min(c(gcm[i-1, j-1],
gcm[i , j-1],
gcm[i-1, j ]))
if( min_index == 1){
dm[i, j] <- 1
} else if( min_index == 2){
dm[i, j] <- 2
} else if( min_index == 3){
dm[i, j] <- 3
} } }
In case the cost matrix, or the matrix of differences is already in storage, and you want to save calculation time, you can use these matrices also as input:
The main intention of idtw() is to compare the DTW distance for different subsets of two time series. That is to compare the DTW(\(Q\), \(C_0\)) with DTW(\(Q\), \(C_1\)) where \(C_1\) could be the extension of \(C_0\) for couple of observations.
Another possible application is a live data stream and you want to update the DTW calculation each time a set of values of \(Q\) are observed. Of course this depends on the application and certainly does not make sense in any case, since a time warp of two time series where the lengths differ too much is questionable. A reasonable range for the ratio n/m depends on the application and purposes of the data analysis.
Assume we already have calculated the DTW of \(C\) and \(Q\), when an update of \(C\) provides new observations:
set.seed(1030)
C0 <- cumsum(rnorm(1000))
Q <- cumsum(rnorm(800))
tmp0 <- IncDTW::dtw(Q = Q, C = C0)
gcm0 <- tmp0$gcm
dm0 <- tmp0$dm
To update the DTW distance for new observations of \(C\) we recycle as much from previous results as possible to keep calculation time low:
C_new <- cumsum(rnorm(10))
C_update <- c(C0, C_new)
# result from incremental calculation
res_inc <- IncDTW::idtw(Q = Q, C = C_update, newO = C_new,
gcm = tmp0$gcm, dm = tmp0$dm)
Finally we compare the results from the incremental calculation with the one from scratch:
# result from scratch
res_scratch <- IncDTW::dtw(Q = Q, C = C_update)
sapply(names(res_inc), function(x){identical(res_inc[[x]], res_scratch[[x]])})
## distance gcm dm
## TRUE TRUE TRUE
We compare the calculation time for these different functions
where the last one is used as benchmark. Moreover we test calculation times for different ways of using these functions and different input parameters, respectively. To compare the calculation times we use the package rbenchmark.
I can not use the package microbenchmark here in the official vignette on CRAN because it has currently the status of ‘currently-unstated OS requirements’, and does not run reliably on some OSes. For interested readers/ users please visit my github page https://github.com/maxar/IncDTW/blob/master/vignettes/Incremental_Dynamic_Time_Warping.Rmd where you will find more detailed run-time comparisons with the help of the microbenchmark package. Also there the function my_check()` is applied to compare the values of the calculated DTW distances to be identical.
First we define the functions with special settings to be tested. Then we test the computation times and compare implicitly the computed values (by the function my_check()
), first without warping window, and then with the sakoe chiba window.
WS <- 40
my_check <- function(values) {
all(sapply(values[-1], function(x) identical(values[[1]], x)))
}
#--- define 'benchmark' functions from package: dtw
dtw_0 <- function(C, Q){
dtw::dtw(C, Q, step.pattern = symmetric1, distance.only = TRUE)$distance }
dtw_sc <- function(C, Q){
dtw::dtw(C, Q, step.pattern = symmetric1,
window.type = "sakoechiba", distance.only = TRUE, window.size = WS)$distance }
dtw_cm<- function(cm){
dtw::dtw(x=cm, step.pattern = symmetric1, distance.only = TRUE)$distance }
#--- define functions to be tested from package: IncDTW
idtw_dtw2vec <- function(C, Q){
IncDTW::dtw2vec(Q = Q, C = C) }
idtw_dtw2vec_sc <- function(C, Q){
IncDTW::dtw2vec(Q = Q, C = C, ws = WS) }
idtw_0 <- function(C, Q){
IncDTW::dtw(Q = Q, C = C)$distance }
idtw_sc <- function(C, Q){
IncDTW::dtw(Q = Q, C = C, ws = WS)$distance }
idtw_diff <- function(diffM){
IncDTW::dtw(Q = diffM, C = "diffM")$distane }
idtw_cm <- function(cm){
IncDTW::dtw(Q = cm, C = "cm")$distance }
idtw_inc <- function(C, Q, gcm00, dm00){
IncDTW::idtw(Q = Q, C = C, newO = C[(length(C)-9) : length(C)],
gcm = gcm00, dm = dm00)$distance }
After defining the functions, we execute them for many synthetic random walks inside of lapply()
.
tmp <- lapply(1:20, function(pseudoseed){
set.seed(pseudoseed)
C <- cumsum(rnorm(500))
Q <- cumsum(rnorm(480))
tmp00 <- IncDTW::dtw(Q = Q, C = C[1:(length(C)-10)], return_diffM = TRUE)
gcm00 <- tmp00$gcm
dm00 <- tmp00$dm
tmp <- IncDTW::dtw(Q = Q, C = C, return_diffM = TRUE)
diffM <- tmp$diffM
cm <- abs(diffM)
mic <- rbenchmark::benchmark( dtw_0(C, Q),
dtw_cm(cm),
#----
idtw_dtw2vec(C, Q),
idtw_0(C, Q),
idtw_diff(diffM),
idtw_cm(cm),
idtw_inc(C, Q, gcm00, dm00),
replications = benchmark_replications[1]
)
return(as.data.frame(mic))
})
mics <- do.call(rbind, tmp)
# class(mics) <- c("microbenchmark", "data.frame")
Now we aggregate the results and print them. This table shows how much faster our proposed functions are than the baseline, and how much calculation time can be saved by recycling already existing calculation results. On average our proposed method in the most basic form idtw_0
is about 3 times faster than the appropriate pendant dtw_0
. Also the incremental way of calculating saves time, that is idtw_inc
is about 5 times faster than calculating from scratch with itdw_0
and about 2-3 times faster compared to the case when the difference matrix is already given idtw_diff
or similar, when the cost matrix is given idtw_cm
.
## test replications elapsed relative user.self sys.self user.child sys.child
## 1 dtw_0(C, Q) 100 2.75 45.833 1.75 1.00 NA NA
## 2 dtw_cm(cm) 100 1.52 25.333 0.79 0.72 NA NA
## 4 idtw_0(C, Q) 100 0.89 14.833 0.47 0.42 NA NA
## 6 idtw_cm(cm) 100 0.33 5.500 0.25 0.08 NA NA
## 5 idtw_diff(diffM) 100 0.43 7.167 0.24 0.20 NA NA
## 3 idtw_dtw2vec(C, Q) 100 0.06 1.000 0.07 0.00 NA NA
## Group.1 x xrel
## 1 dtw_0(C, Q) 2.1595 33.74219
## 2 dtw_cm(cm) 1.1605 18.13281
## 3 idtw_0(C, Q) 0.8660 13.53125
## 4 idtw_cm(cm) 0.3240 5.06250
## 5 idtw_diff(diffM) 0.4420 6.90625
## 6 idtw_dtw2vec(C, Q) 0.0640 1.00000
## 7 idtw_inc(C, Q, gcm00, dm00) 0.1760 2.75000
Next we compare the functions with the sakoe chiba warping window. These need to be tested separately, since the return value can differ from the one without warping window, so the function my_check()
would stop the execution. Nevertheless we simulate exactly (using the same seeds) the same time series as above, such that the computation times are comparable.
tmp <- lapply(1:20, function(pseudoseed){
set.seed(pseudoseed)
C <- cumsum(rnorm(500))
Q <- cumsum(rnorm(480))
tmp00 <- IncDTW::dtw(Q = Q, C = C[1:(length(C)-10)], return_diffM = TRUE)
gcm00 <- tmp00$gcm
dm00 <- tmp00$dm
tmp <- IncDTW::dtw(Q = Q, C = C, return_diffM = TRUE)
diffM <- tmp$diffM
cm <- abs(diffM)
mic <- rbenchmark::benchmark( dtw_sc(C, Q),
#----
idtw_dtw2vec_sc(C,Q),
idtw_sc(C, Q),
replications = benchmark_replications[1])
return(as.data.frame(mic))
})
mics_sc <- do.call(rbind, tmp)
Again the idtw
function outperforms the baseline even by a factor of about 4 to 5. The vector based function is too fast to be benchmarked by the package rbenchmark. Interested readers may repeat the experiment with the microbenchmark package. This is here not possible due to problems between CRAN and microbenchmark. Both functions are about 3 milliseconds faster than the appropriate pendants without warping window dtw_0
and idtw_0
.
## test replications elapsed relative user.self sys.self user.child sys.child
## 1 dtw_sc(C, Q) 100 2.21 110.5 1.17 1.03 NA NA
## 2 idtw_dtw2vec_sc(C, Q) 100 0.02 1.0 0.01 0.00 NA NA
## 3 idtw_sc(C, Q) 100 0.74 37.0 0.41 0.35 NA NA
## 4 dtw_sc(C, Q) 100 2.14 214.0 1.28 0.86 NA NA
## 5 idtw_dtw2vec_sc(C, Q) 100 0.01 1.0 0.02 0.00 NA NA
## 6 idtw_sc(C, Q) 100 0.72 72.0 0.37 0.34 NA NA
## Group.1 x xrel
## 1 dtw_sc(C, Q) 2.1860 132.48485
## 2 idtw_dtw2vec_sc(C, Q) 0.0165 1.00000
## 3 idtw_sc(C, Q) 0.7320 44.36364
Now that we have understood the incremental approach and tested the calculation time and correctness of results, we want to introduce another concept, which is just the reverse: Instead of adding new observations to the time series x, in this section we introduce an easy and fast way of reducing most current observations, conditional to already existing calculation results including all observations of \(C\).
Of course this is only interesting if you are interested not only in the DTW distance, but also want to know about the warping path. Meanwhile the DTW distance could be directly taken from the global cost matrix from previous calculations (by selecting the value of the last row and last but k column, k is the number of reduced observations of \(C\)), we cannot update or modify the previous warping path but need to calculate it again from scratch. This is required, since adding or neglecting observations causes an updated alignment of the two updated time series, and in general the new alignment could be completely different from the original.
set.seed(1150)
Q <- cos(1:100)
C <- cumsum(rnorm(80))
Ndec <- 4
# the ordinary calculation
result_base <- IncDTW::dtw(Q=Q, C=C)
gcm0 <- result_base$gcm
To update the DTW distance by decreasing x for some observations (say 4), again we recycle as much from previous results as possible to keep calculation time low:
# the ordinary calculation without the last 4 observations
result_decr1 <- IncDTW::dtw(Q=Q, C=C[1:(length(C) - Ndec)],
return_wp = TRUE)
gcm1 <- result_decr1$gcm
# the decremental step: reduce C for 4 observation
result_decr2 <- IncDTW::dec_dm(result_base$dm, Ndec = Ndec)
# compare the results: ii, jj, wp and the gcm of
# result_decr1 (conventional) and result_decr2 (recycling previous results)
comparison_0 <- c(
identical(result_decr1$ii, result_decr2$ii),
identical(result_decr1$jj, result_decr2$jj),
identical(result_decr1$wp, result_decr2$wp))
comparison_0
## [1] TRUE TRUE TRUE