ontology_index
An ontology_index
can be created by loading a pre-existing one - for example by calling data(hpo)
, reading ontologies encoded in OBO or OWL format into R using get_ontology
, or by calling the function ontology_index
explicitly. An ontology_index
is a named list
of properties for each term, where each property is represented by a list
or vector
named by term (thus lookups by term name are simple). All ontology_index
objects contain the id
, name
, parents
, children
and ancestors
for each term, however additional properties can be added. For details on how to use an ontology_index
, see the ‘Introduction to ontologyX’ vignette.
The functions get_OBO
, get_OWL
and get_ontology
can read ontologies into R as ontology_index
objects, the latter determining which function to call based on the file name. By default, the properties id
, name
, parents
, children
, ancestors
and obsolete
are populated.
Note: the xml2
package is required in order to read ontologies encoded in OWL format.
The syntax is simple:
ontology <- get_ontology(file)
Additional information is often present in the original files - for example definitions (the def
field in OBO format) and part_of
relationships to other terms. It is possible to include these extra properties by passing additional arguments to get_ontology
or get_OBO
/get_OWL
as appropriate. The given name of the additional arguments then go on to form the names of the properties included in the returned ontology_index
, and the format of the arguments depends on whether you are reading an OBO or OWL formatted file.
For OBO files, these arguments should be character vectors where the first element is a regular expression specifying what text to extract as the property and the second element giving the mode of the returned array (defaulting to character
, but potentially also taking the values logical
,numeric
,list
, etc.). Here for example, we additionally extract the "definition"
field for each term:
ontology <- get_OBO(file, definition=c("^def: (\\.*)", "character"))
Now, ontology
will have the ontology$definition
field, a character vector (named by term as usual) populated, in addition to the standard properties. Thus, definitions can be extracted simply using ontology$definition[termID]
.
For OWL files the additional arguments should be functions which given an xml_nodeset
(see the xml2
package), corresponding to all the terms in the ontology, return lists or vectors of term properties. A selection of such functions are included with the package, including OWL_IDs
, OWL_labels
, OWL_obsolete
and OWL_is_a
. The following expressions can be used to load ontology_index
es with the part_of
relationship populated from OWL formatted ontologies:
ontology <- get_OWL(file, part_of=OWL_part_of)
Here, OWL_part_of
is a function which returns a list of character
vectors containing the part_of
relationships for each term. Customs functions based on XPath expressions can be defined if other term properties are desired, see ?OWL_strings_from_nodes
for details.
By default, if there are terms in parent-offspring relationships specified by the ontology which are missing, an error is thrown. However, the ontology reading functions get_ontology
, get_OWL
and get_OBO
have a remove_missing
argument (FALSE
by default), which causes parent-offspring relationships with a missing term to be ignored, hence the complete parts of the ontology can still be used.
When reading ontologies from OWL files, the term IDs are often given as URLs:
"http://purl.obolibrary.org/obo/<ontology code>_<number>"
,
and thus the term IDs stored by the ontology_index
object obtained with get_OWL
will also be in this form. However, data is often supplied with the OBO ID format: <ontology code>:<number>
. Possible solutions to making such data compatible with the ontologyX packages are to transform the term IDs in the data to the OWL format (i.e. to match the index) or transform the ontology_index
object itself (to match the data, so that it’s IDs are in OBO style). The former is simpler and can be done by paste
ing data strings onto the appropriate URL and replacing the underscore with a colon. The latter can be done with the help of regular expressions, for example as demonstrated below, by looping over the properties in the ontology_index
, and transforming them as appropriate depending on whether they are lists or vectors:
term_url <- "http://purl.obolibrary.org/obo/(\\w+)_(\\d+)"
for (property in names(ontology)) {
if (is.character(unlist(use.names=FALSE, ontology[[property]]))) ontology[[property]] <- if (is.list(ontology[[property]])) lapply(ontology[[property]], function(term_properties) gsub(x=term_properties, pattern=term_url, replacement="\\1:\\2")) else gsub(x=ontology[[property]], pattern=term_url, replacement="\\1:\\2")
names(ontology[[property]]) <- gsub(x=names(ontology[[property]]), pattern=term_url, replacement="\\1:\\2")
}
Modifying an existing ontology_index
to add term properties is as simple as modifying a list. In the example below, we’ll add the number of children for each term.
ontology$number_of_children <- sapply(ontology$children, length)
In the same manner, a valid ontology_index
can be built up from scratch as a list, of course requiring that the standard properties are included for use with functions in ontologyIndex
.