Introduction

Cdsw optimizes the extraction windows for DIA/SWATH so that each window contains the same number of precursors. The optimization is based on spectral library data or on non-redundant .blib files (BiblioSpec).

Prerequisites

# The prozor package provides Cdsw and the masses example data.
library(prozor)

Constant width method

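# Load the example precursor masses.
data("masses")
# Create the optimizer with 25 windows; the initial windows have constant width.
cdsw <- Cdsw(masses, nbins = 25, digits = 1)
cdsw$plot()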

knitr::kable(cdsw$asTable())
from to mid width counts
349.63 384.62 367.125 34.99 6688
383.62 418.62 401.120 35.00 8357
417.62 452.61 435.115 34.99 9661
451.61 486.61 469.110 35.00 10452
485.61 520.60 503.105 34.99 10725
519.60 554.59 537.095 34.99 10837
553.59 588.59 571.090 35.00 10433
587.59 622.58 605.085 34.99 9750
621.58 656.58 639.080 35.00 9276
655.58 690.57 673.075 34.99 8406
689.57 724.56 707.065 34.99 7848
723.56 758.56 741.060 35.00 7116
757.56 792.55 775.055 34.99 6355
791.55 826.55 809.050 35.00 5666
825.55 860.54 843.045 34.99 4923
859.54 894.53 877.035 34.99 4359
893.53 928.53 911.030 35.00 3807
927.53 962.52 945.025 34.99 3344
961.52 996.52 979.020 35.00 2724
995.52 1030.51 1013.015 34.99 2357
1029.51 1064.50 1047.005 34.99 2042
1063.50 1098.50 1081.000 35.00 1807
1097.50 1132.49 1114.995 34.99 1313
1131.49 1166.49 1148.990 35.00 1088
1165.49 1200.48 1182.985 34.99 881
constError <- cdsw$error()
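
The constant width layout can be reproduced in outline with base R. The following is a minimal sketch, not the Cdsw implementation: it uses simulated m/z values in place of masses, ignores the overlap between neighbouring windows that is visible in the table above, and all variable names are illustrative.

```r
# Simulated precursor m/z values; an assumption standing in for data("masses").
set.seed(1)
mz <- 350 + 850 * rbeta(5000, 2, 5)

nbins <- 25
# nbins + 1 equally spaced break points spanning the observed m/z range.
const_breaks <- seq(min(mz), max(mz), length.out = nbins + 1)

# Count precursors per window; include.lowest keeps the smallest m/z value.
const_counts <- as.vector(table(cut(mz, breaks = const_breaks, include.lowest = TRUE)))

const_windows <- data.frame(
  from   = head(const_breaks, -1),
  to     = tail(const_breaks, -1),
  counts = const_counts
)
const_windows$width <- const_windows$to - const_windows$from
```

All windows have the same width, so the counts simply follow the density of the precursor masses, which is why they differ so strongly between windows.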

Classical method based on quantiles

This method places approximately the same number of MS1 precursors in each window.

cdsw$quantile_breaks()
cdsw$plot()

knitr::kable(cdsw$asTable())
quantile from to mid width counts
0% 349.63 381.03 365.330 31.40 5956
4% 380.03 406.71 393.370 26.68 6131
8% 405.71 429.24 417.475 23.53 6070
12% 428.24 450.05 439.145 21.81 6086
16% 449.05 470.06 459.555 21.01 6095
20% 469.06 488.80 478.930 19.74 6107
24% 487.80 508.12 497.960 20.32 6173
28% 507.12 526.81 516.965 19.69 6150
32% 525.81 545.79 535.800 19.98 6166
36% 544.79 565.29 555.040 20.50 6123
40% 564.29 584.80 574.545 20.51 6139
44% 583.80 605.12 594.460 21.32 6121
48% 604.12 626.34 615.230 22.22 6113
52% 625.34 648.36 636.850 23.02 6108
56% 647.36 672.34 659.850 24.98 6074
60% 671.34 696.53 683.935 25.19 6082
64% 695.53 722.89 709.210 27.36 6054
68% 721.89 751.40 736.645 29.51 6053
72% 750.40 782.43 766.415 32.03 6023
76% 781.43 817.40 799.415 35.97 5982
80% 816.40 857.96 837.180 41.56 6026
84% 856.96 905.62 881.290 48.66 5971
88% 904.62 964.93 934.775 60.31 5943
92% 963.93 1049.48 1006.705 85.55 5903
96% 1048.48 1200.48 1124.480 152.00 5863
quantileError <- cdsw$error()
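
The idea behind quantile-based breaks can be sketched with the base R function quantile. This is a simplified illustration on simulated data, not the code behind quantile_breaks: it does not add the overlap between neighbouring windows, and the variable names are made up for the example.

```r
# Simulated precursor m/z values; an assumption standing in for data("masses").
set.seed(1)
mz <- 350 + 850 * rbeta(5000, 2, 5)

nbins <- 25
# Break points at equally spaced probabilities, so each window
# receives (nearly) the same share of the precursors.
quant_breaks <- quantile(mz, probs = seq(0, 1, length.out = nbins + 1))

quant_counts <- as.vector(table(cut(mz, breaks = quant_breaks, include.lowest = TRUE)))
quant_widths <- diff(unname(quant_breaks))
```

The counts are balanced by construction, while the widths vary: windows are narrow where precursors are dense and wide where they are sparse.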

Adjust windows

With this method, the start and end of each window are shifted to a mass range that contains as few MS1 peaks as possible.

knitr::kable(cdsw$optimizeWindows(maxbin = 10, plot = TRUE))

from to mid width counts
350.13 380.95 365.54 30.82 5952
380.45 406.35 393.40 25.90 5932
406.05 429.05 417.55 23.00 5948
428.65 449.65 439.15 21.00 5891
449.45 469.65 459.55 20.20 5872
469.45 488.35 478.90 18.90 5860
488.15 508.05 498.10 19.90 6137
507.45 526.45 516.95 19.00 5892
526.15 545.45 535.80 19.30 6022
545.15 565.05 555.10 19.90 5992
564.55 584.45 574.50 19.90 5976
584.15 605.05 594.60 20.90 6035
604.55 626.05 615.30 21.50 5893
625.55 648.15 636.85 22.60 5987
647.55 672.15 659.85 24.60 6023
671.75 696.15 683.95 24.40 5890
695.55 722.55 709.05 27.00 5981
722.25 751.15 736.70 28.90 5932
750.65 782.15 766.40 31.50 5927
781.65 817.15 799.40 35.50 5944
816.65 857.55 837.10 40.90 5901
857.35 905.25 881.30 47.90 5904
905.05 964.65 934.85 59.60 5881
964.35 1049.15 1006.75 84.80 5864
1048.95 1200.05 1124.50 151.10 5843
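
Shifting a window border to a sparsely populated mass range can be sketched as follows. This is only an illustration of the idea under stated assumptions: shift_break, halfwidth and step are hypothetical names and parameters, they are not claimed to correspond to maxbin, and the search strategy used by optimizeWindows may differ.

```r
# Simulated precursor m/z values; an assumption standing in for data("masses").
set.seed(1)
mz <- 350 + 850 * rbeta(5000, 2, 5)

nbins <- 25
breaks <- quantile(mz, probs = seq(0, 1, length.out = nbins + 1), names = FALSE)

# Hypothetical helper: move one break to the centre of the emptiest
# fine-grained bin within +/- halfwidth of its current position.
shift_break <- function(b, mz, halfwidth = 0.5, step = 0.1) {
  edges  <- seq(b - halfwidth, b + halfwidth, by = step)
  fine   <- cut(mz, breaks = edges)      # values outside the search range become NA
  counts <- as.vector(table(fine))       # NA values are not counted
  best   <- which.min(counts)            # the first emptiest bin wins ties
  (edges[best] + edges[best + 1]) / 2
}

# Only interior breaks are moved; the outer limits stay fixed.
interior <- breaks[-c(1, length(breaks))]
adjusted <- c(
  breaks[1],
  vapply(interior, shift_break, numeric(1), mz = mz),
  breaks[length(breaks)]
)
```

Each border moves by at most halfwidth, so the window counts change only slightly while fewer precursors sit directly on a border.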

Dynamic SWATH windows with constraints

cdsw$sampling_breaks(maxwindow = 100, plot = TRUE)

cdsw$plot()

knitr::kable(cdsw$asTable())
quantile from to mid width counts
0% 349.63 381.71 365.670 32.08 6113
4% 380.71 408.41 394.560 27.70 6371
8% 407.41 432.26 419.835 24.85 6578
12% 431.26 454.93 443.095 23.67 6622
16% 453.93 476.38 465.155 22.45 6625
20% 475.38 497.11 486.245 21.73 6678
24% 496.11 518.13 507.120 22.02 6763
28% 517.13 538.79 527.960 21.66 6730
32% 537.79 560.07 548.930 22.28 6771
36% 559.07 581.30 570.185 22.23 6691
40% 580.30 603.26 591.780 22.96 6621
44% 602.26 626.18 614.220 23.92 6603
48% 625.18 649.87 637.525 24.69 6572
52% 648.87 675.24 662.055 26.37 6430
56% 674.24 701.24 687.740 27.00 6402
60% 700.24 728.90 714.570 28.66 6322
64% 727.90 758.92 743.410 31.02 6244
68% 757.92 791.39 774.655 33.47 6093
72% 790.39 826.91 808.650 36.52 5891
76% 825.91 866.98 846.445 41.07 5730
80% 865.98 912.46 889.220 46.48 5469
84% 911.46 963.54 937.500 52.08 5119
88% 962.54 1026.85 994.695 64.31 4672
92% 1025.85 1101.01 1063.430 75.16 4134
96% 1100.01 1200.48 1150.245 100.47 3118
knitr::kable(cdsw$optimizeWindows(maxbin = 10, plot = TRUE))

from to mid width counts
350.13 381.35 365.74 31.22 6053
381.05 408.05 394.55 27.00 6191
407.65 432.05 419.85 24.40 6448
431.65 454.45 443.05 22.80 6448
454.35 476.05 465.20 21.70 6389
475.85 496.85 486.35 21.00 6526
496.45 517.85 507.15 21.40 6587
517.45 538.45 527.95 21.00 6492
538.15 560.05 549.10 21.90 6688
559.55 581.05 570.30 21.50 6482
580.55 603.05 591.80 22.50 6526
602.55 626.05 614.30 23.50 6452
625.55 649.45 637.50 23.90 6335
649.25 675.15 662.20 25.90 6392
674.55 701.15 687.85 26.60 6293
700.55 728.55 714.55 28.00 6160
728.25 758.55 743.40 30.30 6119
758.25 791.25 774.75 33.00 6038
790.65 826.55 808.60 35.90 5815
826.25 866.55 846.40 40.30 5622
866.35 912.05 889.20 45.70 5408
911.85 963.15 937.50 51.30 5055
962.75 1026.35 994.55 63.60 4631
1026.25 1100.65 1063.45 74.40 4105
1100.45 1200.05 1150.25 99.60 3091
mixedError <- cdsw$error()
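
One simple way to combine quantile breaks with a maximum window width is sketched below. It is not the algorithm used by sampling_breaks, which is not described here. The sketch clamps windows to maxwindow starting from the upper end and recomputes quantile breaks for the remaining mass range, and it assumes that oversized windows occur only at high m/z, where precursors are sparse. The function name constrained_breaks is hypothetical.

```r
# Simulated precursor m/z values; an assumption standing in for data("masses").
set.seed(1)
mz <- 350 + 850 * rbeta(5000, 2, 5)

# Hypothetical helper: clamp windows from the top, re-quantile the rest.
constrained_breaks <- function(mz, nbins, maxwindow) {
  upper <- max(mz)
  fixed <- upper                         # breaks already fixed at the upper end
  repeat {
    k <- nbins - length(fixed) + 1       # number of windows still to place
    q <- quantile(mz[mz <= upper], probs = seq(0, 1, length.out = k + 1),
                  names = FALSE)
    lower <- q[seq_len(k)]               # left edges of the remaining windows
    # Stop when the topmost remaining window fits, or when only one
    # window is left (in which case the constraint cannot be met).
    if (k == 1 || upper - lower[k] <= maxwindow) {
      return(c(lower, fixed))
    }
    upper <- upper - maxwindow           # clamp one more window to maxwindow
    fixed <- c(upper, fixed)
  }
}

capped <- constrained_breaks(mz, nbins = 25, maxwindow = 100)
capped_counts <- as.vector(table(cut(mz, breaks = capped, include.lowest = TRUE)))
```

The clamped windows at the upper end hold fewer precursors than the others, which is the same trade-off seen in the table above: the width constraint is met at the cost of less balanced counts.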

Benchmarking of the methods

We compare the optimal number of MS1 peaks per SWATH window (the same in every window) with the counts obtained from each of the three implemented methods.

barplot(c(const = constError$score1, quantile = quantileError$score1, mixed = mixedError$score1), ylab = "Manhattan distance")

barplot(c(const = constError$score2, quantile = quantileError$score2, mixed = mixedError$score2), ylab = "Euclidean distance")

Method 3 (dynamic windows with constraints, labelled mixed in the plots) has a relatively small error even though it also satisfies constraints such as a maximum window width.
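
The two distances can be sketched in a few lines. The exact definition used by cdsw$error() is not shown in this document, so the following makes two assumptions: the optimal count is taken as the mean of the window counts, and score1 and score2 correspond to the Manhattan and Euclidean distances named in the plot labels. The function window_error is hypothetical; the counts are copied from the constant width and quantile tables above.

```r
# Hypothetical helper: distance between observed window counts and
# an equal count per window (assumed here to be the mean count).
window_error <- function(counts) {
  optimal <- rep(mean(counts), length(counts))
  list(
    manhattan = sum(abs(counts - optimal)),
    euclidean = sqrt(sum((counts - optimal)^2))
  )
}

# Counts from the constant width table.
const_counts <- c(6688, 8357, 9661, 10452, 10725, 10837, 10433, 9750, 9276,
                  8406, 7848, 7116, 6355, 5666, 4923, 4359, 3807, 3344, 2724,
                  2357, 2042, 1807, 1313, 1088, 881)
# Counts from the quantile table.
quant_counts <- c(5956, 6131, 6070, 6086, 6095, 6107, 6173, 6150, 6166, 6123,
                  6139, 6121, 6113, 6108, 6074, 6082, 6054, 6053, 6023, 5982,
                  6026, 5971, 5943, 5903, 5863)

err_const <- window_error(const_counts)
err_quant <- window_error(quant_counts)
```

Under these assumptions both distances are much larger for the constant width windows than for the quantile windows, which matches the ordering expected from the tables.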

Session info

## R version 4.1.1 (2021-08-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=C                          
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dplyr_1.0.7  Matrix_1.3-4 prozor_0.3.1
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.7            highr_0.9             bslib_0.3.1          
##  [4] compiler_4.1.1        pillar_1.6.4          jquerylib_0.1.4      
##  [7] tools_4.1.1           bit_4.0.4             digest_0.6.28        
## [10] docopt_0.7.1          jsonlite_1.7.2        evaluate_0.14        
## [13] lifecycle_1.0.1       tibble_3.1.4          lattice_0.20-44      
## [16] AhoCorasickTrie_0.1.2 pkgconfig_2.0.3       rlang_0.4.11         
## [19] DBI_1.1.1             cli_3.1.0             parallel_4.1.1       
## [22] yaml_2.2.1            xfun_0.26             fastmap_1.1.0        
## [25] stringr_1.4.0         knitr_1.36            generics_0.1.1       
## [28] sass_0.4.0            vctrs_0.3.8           hms_1.1.1            
## [31] tidyselect_1.1.1      bit64_4.0.5           ade4_1.7-18          
## [34] grid_4.1.1            glue_1.4.2            R6_2.5.1             
## [37] fansi_0.5.0           vroom_1.5.6           rmarkdown_2.11       
## [40] tzdb_0.1.2            purrr_0.3.4           readr_2.0.1          
## [43] seqinr_4.2-8          magrittr_2.0.1        htmltools_0.5.2      
## [46] ellipsis_0.3.2        MASS_7.3-54           assertthat_0.2.1     
## [49] utf8_1.2.2            stringi_1.7.4         crayon_1.4.2