This vignette contains the use cases presented in the KibioR paper.



1 Biological data importation

The $import() method tries to guess the file format from known file extensions. If it cannot guess the format, call directly the format-specific method it would otherwise delegate to:

  • $import_sequences(): import fasta format
  • $import_features(): import gtf, gff, bed formats
  • $import_alignments(): import bam format
  • $import_tabular(): import csv, txt, tab formats
  • $import_json(): import json format
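As a rough illustration of the dispatch described above, the sketch below maps a file name to the import method listed for its extension. The helper guess_import_method() and its lookup table are hypothetical and are not part of kibior; the real $import() may perform additional checks. The fa and gff3 extensions are included because files with these extensions are imported later in this section.

```r
# Hypothetical helper (not part of the kibior API): returns the name of the
# import method that the list above associates with a file extension.
guess_import_method <- function(path) {
  # Drop a trailing ".gz" compression suffix before reading the extension
  name <- sub("\\.gz$", "", basename(path), ignore.case = TRUE)
  ext <- tolower(tools::file_ext(name))
  lookup <- c(
    fasta = "import_sequences", fa = "import_sequences",
    gtf = "import_features", gff = "import_features",
    gff3 = "import_features", bed = "import_features",
    bam = "import_alignments",
    csv = "import_tabular", txt = "import_tabular", tab = "import_tabular",
    json = "import_json"
  )
  # Unknown extensions yield NA so the caller can pick a method explicitly
  if (ext %in% names(lookup)) lookup[[ext]] else NA_character_
}

guess_import_method("genes.gff3.gz")  # "import_features"
guess_import_method("reads.bam.bai")  # NA: extension not in the list
```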

For instance, we can import all data from extdata in one command.
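The sketch below shows one way such a one-command import could look. The wrapper import_all() is a hypothetical helper, not a kibior function. It takes the import function as an argument, so that a failure on one file is reported without stopping the others. The commented usage assumes a connected Kibior client named kc and a reachable Elasticsearch instance; the host and port shown are assumptions.

```r
# Hypothetical helper: apply an import function to every file of a directory.
# `import_fn` must accept a file path, e.g. kc$import for a Kibior client `kc`.
import_all <- function(dir, import_fn) {
  files <- list.files(dir, full.names = TRUE)
  results <- lapply(files, function(f) {
    tryCatch(
      import_fn(f),
      error = function(e) {
        # Report the failure and keep going with the remaining files
        message("Not imported: ", basename(f), " (", conditionMessage(e), ")")
        NULL
      }
    )
  })
  names(results) <- basename(files)
  results
}

# Intended usage with kibior (requires a running Elasticsearch instance):
# kc <- Kibior$new(host = "localhost", port = 9200)
# imported <- import_all(system.file("extdata", package = "kibior"), kc$import)
```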

Files with gff3.gz, bed, fa.gz and json extensions were imported in one shot. For some reason, the file with the bai extension was not. We can try to import it directly with $import_alignments(), which skips some of the format checks and calls the appropriate importer.

Using published data: for instance, we can import the Hetionet JSON dataset from the Greene Lab, the Baranzini Lab, et al.





2 Use cases

These use cases come from the Kibio and KibioR publication.

2.1 Use case 1: Characterize the effects of drugs using tissue-specific interactions

The objective of this analysis is to investigate targets for the drug metformin within the framework of tissue-specific gene expression patterns and Protein-Protein Interactions (PPIs) of the target proteins. The resources used to reach this objective are DrugBank (Wishart, Feunang, et al. 2018a), the HUGO Gene Nomenclature Committee (HGNC (Gray et al. 2015)) and the Database of Human Tissue Protein-Protein Interactions (TissueNet v.2 (Basha et al. 2017)). In this example, we perform consecutive search/pull commands with KibioR to emulate what one does when exploring datasets in a classical iterative manner.

A literature search had identified metformin (Rojas and Gomes 2013) as a drug that helps control blood sugar and is used to treat pre-diabetes, type 2 diabetes and gestational diabetes. We searched for metformin in DrugBank to obtain its 3 target genes (proteins): protein kinase AMP-activated non-catalytic subunit beta 1 (PRKAB1), glycerol-3-phosphate dehydrogenase 1 (GPD1) and electron transfer flavoprotein dehydrogenase (ETFDH). The gene symbols in HGNC gave us the associated Ensembl Gene IDs. We then used these Ensembl IDs in another search against the TissueNet v.2 database to find tissue-specific interactions for these target genes. This returned all the subcutaneous adipose interactions (170) and all the whole blood interactions (167). With these parameters, we obtained the 3 non-redundant interactions between subcutaneous adipose and whole blood tissues: solute carrier family 25 member 10 (SLC25A10, ENSG00000183048), cell death inducing DFFA like effector A (CIDEA, ENSG00000176194) and coagulation factor X (F10, ENSG00000126218).

Findings in mice suggest SLC25A10 as a possible target for anti-obesity treatments (Kulyté et al. 2017), and mice without a functional CIDEA are resistant to diet-induced obesity and diabetes (Puri et al. 2008). These tissue-specific interactions seem to be interesting targets for diabetes, and could explain potential mechanisms of action in subcutaneous adipose. These results emphasize the need to consider prior knowledge of tissue-specific drug-protein interactions for drug design, as it could potentially improve the prediction of drug effects and/or adverse effects.

2.2 Use case 2: Identification of miRNAs linked to inflammation in the prostate

It has been demonstrated that miRNAs regulate antitumor immunity (Yang, Yan, and Liu 2018). Aberrant expression of miRNAs was observed in prostate cancer (Sharma and Baruah 2019). The objective of this case study was to identify miRNAs that could regulate genes linked to immunity mainly expressed in the prostate, and that could be involved in cancer. This analysis required 4 resources: 1/ InnateDB, a database of genes involved in innate immune response (Breuer et al. 2013), 2/ the Human Protein Atlas, a database of all the human proteins in cells, tissues and organs (Uhlén et al. 2015), 3/ miRTarBase, a curated database of miRNA-target interactions (Chou et al. 2018) and 4/ miRBase, a database of microRNA sequences and annotations (Kozomara, Birgaoanu, and Griffiths-Jones 2019). Here, we simply used 3 joins with filters between the 4 databases to get our answer.

We started the exploration from a list of human genes involved in innate immune response from InnateDB and selected those expressed in “glandular cell” type and “prostate” tissues using the Human Protein Atlas dataset. This resulted in a list of 768 unique genes. We then found the miRNA targets for these genes using the miRTarBase. This search was restricted to human miRNA-mRNA associations with strong validation evidence (Kuhn et al. 2008), namely those with at least “qRT-PCR”, “Luciferase reporter assay” and “Western blot” as methods of validation. This resulted in a list of 1180 miRNA-mRNA interactions. Finally, the miRNA identifiers of these interactions were linked to the miRBase where the “experimental” filter was applied to obtain a reduced list of 48 unique miRNAs.
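The chain of three joins with filters can be sketched on toy data frames with base R. Every table, column name and value below is a made-up placeholder standing in for the four resources, so the output says nothing about the real datasets. The actual analysis ran the equivalent joins through KibioR; base merge() is used here only so that the sketch runs without an Elasticsearch instance.

```r
# Toy stand-ins for the four resources (placeholder values, hypothetical columns)
innatedb <- data.frame(gene = c("geneA", "geneB", "geneC"))
hpa <- data.frame(
  gene      = c("geneA", "geneB", "geneD"),
  tissue    = c("prostate", "liver", "prostate"),
  cell_type = c("glandular cell", "hepatocyte", "glandular cell")
)
mirtarbase <- data.frame(
  gene     = c("geneA", "geneA", "geneB"),
  mirna    = c("mirX", "mirY", "mirX"),
  evidence = c("strong", "weak", "strong")
)
mirbase <- data.frame(
  mirna         = c("mirX", "mirY"),
  evidence_type = c("experimental", "not_experimental")
)

# Join 1: innate immunity genes expressed in prostate glandular cells
expressed <- subset(hpa, tissue == "prostate" & cell_type == "glandular cell")
step1 <- merge(innatedb, expressed, by = "gene")

# Join 2: miRNA-target interactions with strong validation evidence
step2 <- merge(step1, subset(mirtarbase, evidence == "strong"), by = "gene")

# Join 3: keep only miRNAs flagged as experimental
step3 <- merge(step2, subset(mirbase, evidence_type == "experimental"), by = "mirna")

unique(step3$mirna)
```

Filtering each table before joining keeps the intermediate results small, which mirrors the order of the steps described above.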

The expression profiles of two of those miRNAs have evidence of being involved in prostate cancer (Vanacore et al. 2017). In fact, miR-375 (RF00700) is known to be important for early diagnosis (Wen, Deng, and Wang 2014) and miR-650 (RF00952) can suppress the cellular stress response 1 (CSR1) expression and promote tumor growth (Zuo et al. 2015). Finding new targets could help to develop miRNA-based strategies for more effective immunotherapeutic interventions in cancer. This finding confirms the utility of our approach and motivates further investigation of other miRNAs that were identified for their potential roles in prostate cancer pathophysiology and/or treatment.

2.3 Use case 3: Identification of pathways for metabolites produced by the microbiome

The aim of this exploration was to find drug metabolism pathways linked to metabolites of human gut microbiome origin (Wilson and Nicholson 2017). The resources used to reach this objective are HMDB (Wishart, Feunang, et al. 2018b) and the Small Molecule Pathway Database (SMPDB (Jewison et al. 2014)). We used a more complex set of commands to show how easily KibioR calls fit into common R command steps. We also did not limit the first query to a single column, so that it truly searched all textual fields and returned results from free-text comments containing the word “microbial”.

We started by searching the HMDB metabolites dataset for metabolites of “microbial” origin. We obtained 546 metabolite records listed as such out of the 114,100 initial records. From these selected metabolites, we changed the frame of reference by joining them to the SMPDB protein database relation, where we retrieved the pathways associated with the microbial metabolites. Finally, to find all data linked to these pathways, the SMPDB pathways database was mined, keeping only the “drug metabolism” pathways. This search resulted in 41 unique pathways.

This investigation identified drug metabolism pathways that could be influenced by different microbiota composition and/or variations in microbial-derived metabolite precursor availability. It is noteworthy that data exploration can be tailored to diverse applications. For example, if the interest is understanding skin aging, one could navigate to the Digital Ageing Atlas (Ageing Map) (Craig et al. 2015) database to highlight pathways and metabolites that could potentially act on aging. This could be an interesting lead for the development of new therapeutics.





3 How to…

3.1 Search elements from a vector of IDs

Basically, the general form is kc$search("index", query = "column:(id1 || id2 || id3)").

To automate in a script, one can use:
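A minimal sketch of building that query string from a character vector. The index name "my_index" and the column name "gene_id" are placeholders, and the final call is commented out because it assumes a connected Kibior client named kc.

```r
# Placeholder IDs; in practice these would come from a previous search or a file
ids <- c("id1", "id2", "id3")

# Collapse the vector into the "column:(id1 || id2 || id3)" form
query <- paste0("gene_id:(", paste(ids, collapse = " || "), ")")
query

# With a connected client (requires a running Elasticsearch instance):
# kc$search("my_index", query = query)
```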

You can create a utils function for that:
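One possible shape for such a function is sketched below. The name ids_to_query() is hypothetical and not part of kibior, and the column name "ensembl_id" in the usage line is a placeholder. The IDs are the three Ensembl Gene IDs reported in use case 1.

```r
# Hypothetical utility (not part of kibior): build a query string matching
# any of `ids` in `column`, in the "column:(id1 || id2 || id3)" form.
ids_to_query <- function(column, ids) {
  # Drop missing values and duplicates so the query stays as short as possible
  ids <- unique(as.character(ids[!is.na(ids)]))
  if (length(ids) == 0) {
    stop("`ids` must contain at least one non-missing value")
  }
  paste0(column, ":(", paste(ids, collapse = " || "), ")")
}

q <- ids_to_query(
  "ensembl_id",
  c("ENSG00000183048", "ENSG00000176194", "ENSG00000126218", "ENSG00000183048")
)
q
# "ensembl_id:(ENSG00000183048 || ENSG00000176194 || ENSG00000126218)"

# With a connected client named kc and a placeholder index name:
# kc$search("my_index", query = q)
```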

3.2 Delete all indices

Be very cautious with this one. Deleting an index cannot be undone.

By default, kibior will not allow you to delete everything at once with $delete("*"). But, if you need to do it, you can forcefully apply it with kc$list() %>% kc$delete().
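Since the deletion is irreversible, it can help to wrap the call in a guard. The sketch below is one hypothetical way to do so: delete_all_indices() is not a kibior function, and it only assumes that kc is a connected Kibior client exposing the $list() and $delete() methods mentioned above.

```r
# Hypothetical wrapper around kc$list() and kc$delete().
# Nothing is deleted unless the caller passes confirm = TRUE explicitly.
delete_all_indices <- function(kc, confirm = FALSE) {
  if (!isTRUE(confirm)) {
    stop("Refusing to delete: call again with confirm = TRUE")
  }
  indices <- kc$list()
  if (length(indices) == 0) {
    message("No index to delete")
    return(invisible(character(0)))
  }
  kc$delete(indices)
  # Return the names of the deleted indices, invisibly, for logging
  invisible(indices)
}

# delete_all_indices(kc)                  # stops with an error
# delete_all_indices(kc, confirm = TRUE)  # irreversible
```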

3.3 Import data from a single big JSON file

You can find a lot of data already available on the web. We can take Hetionet (doi:10.7554/elife.26726), available on het.io, as an example. The file is big and contains graph data (nodes and edges). One way of integrating it into Kibio is to separate the nodes from the edges inside the file with a few lines of code.
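A minimal sketch of that separation, assuming the JSON document exposes top-level nodes and edges arrays. It uses a tiny inline document with made-up records in place of the real Hetionet file and relies on the jsonlite package. The push step is commented out because it needs a connected Kibior client (kc), and the index names are placeholders.

```r
library(jsonlite)

# Tiny stand-in for the downloaded file (made-up records and field names);
# for real data, pass the path of the JSON file to fromJSON() instead.
json_txt <- '{
  "nodes": [
    {"kind": "Gene", "identifier": "g1", "name": "gene one"},
    {"kind": "Compound", "identifier": "c1", "name": "compound one"}
  ],
  "edges": [
    {"source": "c1", "target": "g1", "kind": "binds"}
  ]
}'

graph <- fromJSON(json_txt)

# Arrays of flat objects are simplified to data frames, one row per record
nodes <- graph$nodes
edges <- graph$edges
str(nodes)
str(edges)

# Each table can then be pushed to its own index:
# kc$push(nodes, "hetionet_nodes")
# kc$push(edges, "hetionet_edges")
```

Records in the real file may contain nested fields, which would need to be flattened or dropped before pushing.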

4 Session info

This vignette has been built using the following session:

```r
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] kibior_0.1.1   magrittr_2.0.1 readr_1.4.0    stringr_1.4.0  dplyr_1.0.3   
## [6] ggplot2_3.3.3  knitr_1.30    
## 
## loaded via a namespace (and not attached):
##  [1] zip_2.1.1         Rcpp_1.0.6        cellranger_1.1.0  pillar_1.4.7     
##  [5] compiler_4.0.3    forcats_0.5.0     elastic_1.1.0     tools_4.0.3      
##  [9] digest_0.6.27     jsonlite_1.7.2    evaluate_0.14     lifecycle_0.2.0  
## [13] tibble_3.0.5      gtable_0.3.0      pkgconfig_2.0.3   rlang_0.4.10     
## [17] openxlsx_4.2.3    crul_1.0.0        curl_4.3          yaml_2.2.1       
## [21] haven_2.3.1       xfun_0.20         rio_0.5.16        withr_2.4.1      
## [25] generics_0.1.0    vctrs_0.3.6       hms_1.0.0         grid_4.0.3       
## [29] tidyselect_1.1.0  glue_1.4.2        httpcode_0.3.0    data.table_1.13.6
## [33] R6_2.5.0          readxl_1.3.1      foreign_0.8-80    rmarkdown_2.6    
## [37] tidyr_1.1.2       purrr_0.3.4       scales_1.1.1      ellipsis_0.3.1   
## [41] htmltools_0.5.1.1 colorspace_2.0-0  stringi_1.5.3     munsell_0.5.0    
## [45] crayon_1.3.4
```






References

Basha, Omer, Ruth Barshir, Moran Sharon, Eugene Lerman, Binyamin F. Kirson, Idan Hekselman, and Esti Yeger-Lotem. 2017. “The TissueNet V.2 Database: A Quantitative View of Protein-Protein Interactions Across Human Tissues.” Nucleic Acids Research 45 (D1): D427–D431. https://doi.org/10.1093/nar/gkw1088.

Breuer, Karin, Amir K. Foroushani, Matthew R. Laird, Carol Chen, Anastasia Sribnaia, Raymond Lo, Geoffrey L. Winsor, Robert E. W. Hancock, Fiona S. L. Brinkman, and David J. Lynn. 2013. “InnateDB: Systems Biology of Innate Immunity and Beyond—Recent Updates and Continuing Curation.” Nucleic Acids Research 41 (D1): D1228–D1233. https://doi.org/10.1093/nar/gks1147.

Chou, Chih-Hung, Sirjana Shrestha, Chi-Dung Yang, Nai-Wen Chang, Yu-Ling Lin, Kuang-Wen Liao, Wei-Chi Huang, et al. 2018. “miRTarBase Update 2018: A Resource for Experimentally Validated microRNA-Target Interactions.” Nucleic Acids Research 46 (D1): D296–D302. https://doi.org/10.1093/nar/gkx1067.

Craig, Thomas, Chris Smelick, Robi Tacutu, Daniel Wuttke, Shona H. Wood, Henry Stanley, Georges Janssens, et al. 2015. “The Digital Ageing Atlas: Integrating the Diversity of Age-Related Changes into a Unified Resource.” Nucleic Acids Research 43 (D1): D873–D878. https://doi.org/10.1093/nar/gku843.

Gray, Kristian A., Bethan Yates, Ruth L. Seal, Mathew W. Wright, and Elspeth A. Bruford. 2015. “Genenames.org: The HGNC Resources in 2015.” Nucleic Acids Research 43 (D1): D1079–D1085. https://doi.org/10.1093/nar/gku1071.

Jewison, Timothy, Yilu Su, Fatemeh Miri Disfany, Yongjie Liang, Craig Knox, Adam Maciejewski, Jenna Poelzer, et al. 2014. “SMPDB 2.0: Big Improvements to the Small Molecule Pathway Database.” Nucleic Acids Research 42 (D1): D478–D484. https://doi.org/10.1093/nar/gkt1067.

Kozomara, Ana, Maria Birgaoanu, and Sam Griffiths-Jones. 2019. “miRBase: From microRNA Sequences to Function.” Nucleic Acids Research 47 (D1): D155–D162. https://doi.org/10.1093/nar/gky1141.

Kuhn, Donald E., Mickey M. Martin, David S. Feldman, Alvin V. Terry, Gerard J. Nuovo, and Terry S. Elton. 2008. “Experimental Validation of miRNA Targets.” Methods, MicroRNAs: Part B, 44 (1): 47–54. https://doi.org/10.1016/j.ymeth.2007.09.005.

Kulyté, Agné, Anna Ehrlund, Peter Arner, and Ingrid Dahlman. 2017. “Global Transcriptome Profiling Identifies KLF15 and SLC25A10 as Modifiers of Adipocytes Insulin Sensitivity in Obese Women.” PLOS ONE 12 (6): e0178485. https://doi.org/10.1371/journal.pone.0178485.

Puri, Puneet, Faridoddin Mirshahi, Onpan Cheung, Ramesh Natarajan, James W. Maher, John M. Kellum, and Arun J. Sanyal. 2008. “Activation and Dysregulation of the Unfolded Protein Response in Nonalcoholic Fatty Liver Disease.” Gastroenterology 134 (2): 568–76. https://doi.org/10.1053/j.gastro.2007.10.039.

Rojas, Lilian Beatriz Aguayo, and Marilia Brito Gomes. 2013. “Metformin: An Old but Still the Best Treatment for Type 2 Diabetes.” Diabetology & Metabolic Syndrome 5 (1): 6. https://doi.org/10.1186/1758-5996-5-6.

Sharma, N., and M. M. Baruah. 2019. “The microRNA Signatures: Aberrantly Expressed miRNAs in Prostate Cancer.” Clinical and Translational Oncology 21 (2): 126–44. https://doi.org/10.1007/s12094-018-1910-8.

Uhlén, Mathias, Linn Fagerberg, Björn M. Hallström, Cecilia Lindskog, Per Oksvold, Adil Mardinoglu, Åsa Sivertsson, et al. 2015. “Tissue-Based Map of the Human Proteome.” Science 347 (6220). https://doi.org/10.1126/science.1260419.

Vanacore, Daniela, Mariarosaria Boccellino, Sabrina Rossetti, Carla Cavaliere, Carmine D’Aniello, Rossella Di Franco, Francesco Jacopo Romano, et al. 2017. “Micrornas in Prostate Cancer: An Overview.” Oncotarget 8 (30): 50240–51. https://doi.org/10.18632/oncotarget.16933.

Wen, Xin, Fang-Ming Deng, and Jinhua Wang. 2014. “MicroRNAs as Predictive Biomarkers and Therapeutic Targets in Prostate Cancer.” American Journal of Clinical and Experimental Urology 2 (3): 219–30. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4219315/.

Wilson, Ian D., and Jeremy K. Nicholson. 2017. “Gut Microbiome Interactions with Drug Metabolism, Efficacy, and Toxicity.” Translational Research, Microbiome and Human Disease Pathogenesis, 179 (January): 204–22. https://doi.org/10.1016/j.trsl.2016.08.002.

Wishart, David S., Yannick D. Feunang, An C. Guo, Elvis J. Lo, Ana Marcu, Jason R. Grant, Tanvir Sajed, et al. 2018a. “DrugBank 5.0: A Major Update to the DrugBank Database for 2018.” Nucleic Acids Research 46 (D1): D1074–D1082. https://doi.org/10.1093/nar/gkx1037.

Wishart, David S., Yannick Djoumbou Feunang, Ana Marcu, An Chi Guo, Kevin Liang, Rosa Vázquez-Fresno, Tanvir Sajed, et al. 2018b. “HMDB 4.0: The Human Metabolome Database for 2018.” Nucleic Acids Research 46 (D1): D608–D617. https://doi.org/10.1093/nar/gkx1089.

Yang, Ju, Jing Yan, and Baorui Liu. 2018. “Targeting VEGF/VEGFR to Modulate Antitumor Immunity.” Frontiers in Immunology 9. https://doi.org/10.3389/fimmu.2018.00978.

Zuo, Jianwei, Yuanqing Guo, Xinsheng Peng, Yubo Tang, Xintao Zhang, Peiheng He, Shuaihua Li, et al. 2015. “Inhibitory Action of Pristimerin on Hypoxia‑mediated Metastasis Involves Stem Cell Characteristics and EMT in PC-3 Prostate Cancer Cells.” Oncology Reports 33 (3): 1388–94. https://doi.org/10.3892/or.2015.3708.