2-PhylteR in a container

Damien de Vienne

2023-08-24

1 PhylteR in a container

In addition to the R package phylter available on CRAN (https://CRAN.R-project.org/package=phylter) and on GitHub (https://github.com/damiendevienne/phylter), containerized versions of phylter (Docker and Singularity) are also available.

This may ease the use of phylter on some computing infrastructures (e.g., clusters) or for users who prefer not to work in R.

The containers host python3 scripts that make it easy to run phylter with the same options as the R package, and that also perform additional tasks such as removing (pruning) outliers from input trees and/or filtering outlier sequences out of (aligned) sequence files (fasta format).

Using phylter from the container simply consists of running the phylter.py script, specifying options such as the directory containing the gene trees (with -t), the job name (with -j), etc.

The containers also include a toy dataset of 255 Carnivora gene trees and alignments from Allio et al. (2021), which lets you test both the installation of the container(s) and the use of the phylter.py script and its options.

1.1 Run PhylteR with Docker

PhylteR is available as a Docker container: https://hub.docker.com/r/treecoutheo/phylter_docker.

Here are the steps needed to use the docker version of phylter:

Warning: you may need administrator rights to use docker!

  1. Pull the latest version of the phylter container from the Docker Hub repository:
sudo docker pull treecoutheo/phylter_docker:latest
  2. Run phylter on the example Carnivora dataset:
sudo docker run -v $PWD:$PWD -w $PWD treecoutheo/phylter_docker phylter.py -j Carnivora_docker -t /usr/container-data/trees

The command above creates the directory Carnivora_docker that will contain:

  3. Run phylter on the example Carnivora dataset AND prune trees AND remove outliers from fasta files

You may want to run phylter and then remove the identified outliers from both the gene trees and the sequence files. For this to work, the name of each sequence file must contain the name of the corresponding tree file, minus the extension if any. For example, a sequence file named ENSG00000274286_ADRA2B_final_align_NT.aln will be matched automatically to a tree file named ENSG00000274286_ADRA2B.treefile: phylter.py will identify the gene ID as ENSG00000274286_ADRA2B.
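The naming rule can be checked on your own files before launching a run. The Python sketch below is not the code used by phylter.py; it only assumes the rule stated above (a sequence file matches a tree file when its name contains the tree file name minus its extension). The helper names gene_id and match_alignments are hypothetical.

```python
from pathlib import PurePath


def gene_id(tree_file):
    """Return the tree file name without its last extension (the assumed gene ID)."""
    return PurePath(tree_file).stem


def match_alignments(tree_files, sequence_files):
    """Map each gene ID to the sequence files whose names contain that ID."""
    matches = {}
    for tree in tree_files:
        gid = gene_id(tree)
        matches[gid] = [s for s in sequence_files if gid in PurePath(s).name]
    return matches


# File names taken from the example in the text
trees = ["ENSG00000274286_ADRA2B.treefile"]
alignments = ["ENSG00000274286_ADRA2B_final_align_NT.aln"]
print(match_alignments(trees, alignments))
```

A gene ID mapped to an empty list, or to more than one file, would point to a naming problem worth fixing before the run.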

sudo docker run -v $PWD:$PWD -w $PWD treecoutheo/phylter_docker phylter.py -j Carnivora_docker -t /usr/container-data/trees -p TRUE -s /usr/container-data/alignments -g TRUE

The command above generates, in addition to the two files described in the previous example:

  4. Prune trees and filter out sequences AFTER the phylter run.

Instead of performing the phylter analysis and the filtering of outliers at the same time, you can proceed in several steps. Here is how, on the example dataset:

sudo docker run -v $PWD:$PWD -w $PWD treecoutheo/phylter_docker phylter.py -j Carnivora_docker -t /usr/container-data/trees

The output file phylter.out will be used for performing the pruning and/or the sequence filtering (see below).

sudo docker run -v $PWD:$PWD -w $PWD treecoutheo/phylter_docker prune_tree_outliers.R container-data_phylter /usr/container-data/trees Carnivora_docker/phylter.out
sudo docker run -v $PWD:$PWD -w $PWD treecoutheo/phylter_docker remove_sequence_outliers.py -j container-data_phylter -s /usr/container-data/alignments -o Carnivora_docker/phylter.out
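
To make the sequence-filtering step more concrete, here is a minimal Python sketch of what removing outlier sequences from a fasta alignment amounts to. It is an illustration only, not the implementation of remove_sequence_outliers.py: the function name filter_fasta is hypothetical, the outlier names are passed as a plain set rather than read from phylter.out, and the species names and sequences are made up.

```python
def filter_fasta(fasta_text, outlier_names):
    """Return fasta_text without the records whose name is in outlier_names."""
    kept = []
    keep_record = True
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # The sequence name is taken as the first word after ">"
            parts = line[1:].split()
            name = parts[0] if parts else ""
            keep_record = name not in outlier_names
        if keep_record:
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


# Made-up alignment for one gene, with one sequence flagged as an outlier
alignment = ">Felis_catus\nATGC--\n>Canis_lupus\nATGCAA\n>Ursus_arctos\nATG-AA\n"
print(filter_fasta(alignment, {"Canis_lupus"}))
```

The header line and all sequence lines of a flagged record are dropped together, so the remaining records stay aligned.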
  5. View all phylter options

phylter.py accepts all the options available in the R package. To list them, simply use the -h option:

sudo docker run -v $PWD:$PWD -w $PWD treecoutheo/phylter_docker phylter.py -h
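
The docker commands in this section all share the same prefix. When several jobs have to be launched from a script, the command can be assembled programmatically. The Python sketch below is one possible way to do so; it only reuses the image name and the flags shown in the commands above, the function name docker_phylter_command is hypothetical, and the command is printed rather than executed.

```python
import os
import shlex


def docker_phylter_command(job, trees_dir, alignments_dir=None, workdir=None):
    """Build the docker command line for a phylter.py run as a list of arguments."""
    workdir = workdir or os.getcwd()
    cmd = [
        "sudo", "docker", "run",
        "-v", f"{workdir}:{workdir}",  # mount the working directory in the container
        "-w", workdir,                 # and use it as the working directory
        "treecoutheo/phylter_docker",
        "phylter.py", "-j", job, "-t", trees_dir,
    ]
    if alignments_dir is not None:
        # Same flags as in the combined run shown above
        cmd += ["-p", "TRUE", "-s", alignments_dir, "-g", "TRUE"]
    return cmd


cmd = docker_phylter_command(
    "Carnivora_docker", "/usr/container-data/trees", workdir="/tmp/work"
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run to actually start the container; shlex.join is only used here to display the command in a copy-pastable form.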

1.2 Run PhylteR with Singularity

PhylteR is also available as a Singularity container (https://cloud.sylabs.io/library/theo.treecou/tool/phylter_singularity). Here are instructions to install (or build) and run it:

  1. Pull the latest version of the phylter container from the Sylabs repository:
sudo singularity pull PhylteR.sif library://theo.treecou/tool/phylter_singularity:latest

Alternatively, you can build a singularity image from the Docker Hub repository:

sudo singularity pull PhylteR.sif docker://treecoutheo/phylter_docker:latest

2.a Run phylter on the Carnivora example dataset:

singularity exec -B $PWD PhylteR.sif phylter.py -j Carnivora_singularity -t /usr/container-data/trees

Note: For more options, please refer to the description of the Docker container above, which shows how to use the options available with the phylter.py script.

2.b Alternatively, you can open a console in the singularity container as follows and use R in that console:

singularity shell -B $PWD PhylteR.sif

R # this launches the version of R installed inside the Singularity container

Then:


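library(phylter)

# List the example gene tree files shipped in the container
list_trees <- Sys.glob("/usr/container-data/trees/ENSG*.treefile")

# Read each tree file with ape
trees <- lapply(list_trees, ape::read.tree)

# Run phylter on the list of trees, without parallelization
results <- phylter(trees, parallel = FALSE)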

2 References


For comments, suggestions and bug reports, please open an issue on the phylter GitHub repository (https://github.com/damiendevienne/phylter).