How to visualize nLTT value distributions

Richel Bilderbeek

2023-08-21

Calculating the average nLTT plot of multiple phylogenies is not a trivial task.

The function get_nltt_values collects the nLTT values of a collection of phylogenies as tidy data.

This allows for a good interplay with ggplot2.

Example: Easy trees

Create two easy trees:

newick1 <- "((A:1,B:1):2,C:3);"
newick2 <- "((A:2,B:2):1,C:3);"
phylogeny1 <- ape::read.tree(text = newick1)
phylogeny2 <- ape::read.tree(text = newick2)
phylogenies <- c(phylogeny1, phylogeny2)

They are very similar. phylogeny1 has short tips:

ape::plot.phylo(phylogeny1)
ape::add.scale.bar() #nolint

This can be observed in the nLTT plot:

nLTT::nltt_plot(phylogeny1, ylim = c(0, 1))

As a collection of timepoints:

t <- nLTT::get_phylogeny_nltt_matrix(phylogeny1)
knitr::kable(t)
time N
0.0000000 0.3333333
0.6666667 0.6666667
1.0000000 1.0000000

Plotting those timepoints:

df <- as.data.frame(nLTT::get_phylogeny_nltt_matrix(phylogeny1))
ggplot2::qplot(
  time, N, data = df, geom = "step", ylim = c(0, 1), direction = "vh",
  main = "NLTT plot of phylogeny 1"
)
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
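
The deprecation warning above suggests moving away from qplot(). Below is a minimal sketch of the same step plot written with ggplot(), assuming ggplot2 is installed. The data frame is typed in by hand from the timepoints table of phylogeny1, so the example runs without the nLTT package:

```r
# nLTT timepoints of phylogeny1, copied by hand from the table above
# (1/3 and 2/3 are shown there rounded to 7 decimals)
df <- data.frame(time = c(0, 2 / 3, 1), N = c(1 / 3, 2 / 3, 1))

# Same step plot as the qplot() call, built with ggplot()
p <- ggplot2::ggplot(df, ggplot2::aes(x = time, y = N)) +
  ggplot2::geom_step(direction = "vh") +
  ggplot2::ylim(0, 1) +
  ggplot2::ggtitle("nLTT plot of phylogeny 1")
print(p)
```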

phylogeny2 has longer tips:

ape::plot.phylo(phylogeny2)
ape::add.scale.bar() #nolint

This can also be observed in the nLTT plot:

nLTT::nltt_plot(phylogeny2, ylim = c(0, 1))

As a collection of timepoints:

t <- nLTT::get_phylogeny_nltt_matrix(phylogeny2)
knitr::kable(t)
time N
0.0000000 0.3333333
0.3333333 0.6666667
1.0000000 1.0000000

Plotting those timepoints:

df <- as.data.frame(nLTT::get_phylogeny_nltt_matrix(phylogeny2))
ggplot2::qplot(
  time, N, data = df, geom = "step", ylim = c(0, 1), direction = "vh",
  main = "NLTT plot of phylogeny 2"
)

The average nLTT plot should be somewhere in the middle.

It is constructed from stretched nLTT matrices.

Here is the stretched nLTT matrix of the first phylogeny:

t <- nLTT::stretch_nltt_matrix(
  nLTT::get_phylogeny_nltt_matrix(phylogeny1), dt = 0.20, step_type = "upper"
)
knitr::kable(t)
0.0 0.6666667
0.2 0.6666667
0.4 0.6666667
0.6 0.6666667
0.8 1.0000000
1.0 1.0000000
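
Comparing this table with the unstretched matrix of phylogeny1 suggests how the "upper" step type behaves: at each sampled time, the value of the next branching event is used. The base R sketch below reproduces the table under that assumption. The helper sample_upper is hypothetical and is not the implementation used by nLTT::stretch_nltt_matrix. In particular, what happens when a sampled time coincides exactly with an event time is a guess.

```r
# Unstretched nLTT matrix of phylogeny1 (values from the earlier table)
nltt_matrix <- cbind(time = c(0, 2 / 3, 1), N = c(1 / 3, 2 / 3, 1))

# Hypothetical helper: sample a step function at regular intervals,
# taking the N of the first event strictly later than each sampled time
sample_upper <- function(m, dt) {
  ts <- seq(0, 1, by = dt)
  ns <- vapply(ts, function(x) {
    later <- which(m[, "time"] > x)
    # After the last event, keep the final value
    i <- if (length(later) > 0) later[1] else nrow(m)
    unname(m[i, "N"])
  }, numeric(1))
  cbind(t = ts, nltt = ns)
}

stretched <- sample_upper(nltt_matrix, dt = 0.2)
print(stretched)
```

The printed matrix can be compared row by row with the table above.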

Here is the stretched nLTT matrix of the second phylogeny:

t <- nLTT::stretch_nltt_matrix(
  nLTT::get_phylogeny_nltt_matrix(phylogeny2), dt = 0.20, step_type = "upper"
)
knitr::kable(t)
0.0 0.6666667
0.2 0.6666667
0.4 1.0000000
0.6 1.0000000
0.8 1.0000000
1.0 1.0000000

Here is the average nLTT matrix of both phylogenies:

t <- nLTT::get_average_nltt_matrix(phylogenies, dt = 0.20)
knitr::kable(t)
0.0 0.6666667
0.2 0.6666667
0.4 0.8333333
0.6 0.8333333
0.8 1.0000000
1.0 1.0000000

Observe how the numbers get averaged.
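
The averaging can be checked by hand. Here is a minimal base R sketch, using the stretched values of both phylogenies copied from the tables above (2/3 appears there as 0.6666667):

```r
# Sampled time points, as produced with dt = 0.20
time <- seq(0, 1, by = 0.2)

# Stretched nLTT values of phylogeny1 and phylogeny2, from the tables above
nltt_1 <- c(2 / 3, 2 / 3, 2 / 3, 2 / 3, 1, 1)
nltt_2 <- c(2 / 3, 2 / 3, 1, 1, 1, 1)

# The average nLTT value at each time point is the row-wise mean
average <- rowMeans(cbind(nltt_1, nltt_2))
print(cbind(time, average))
```

At t = 0.4 and t = 0.6 the two phylogenies differ (2/3 versus 1), which gives the 0.8333333 in the averaged matrix.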

The same, now shown as a plot:

nLTT::nltts_plot(phylogenies, dt = 0.20, plot_nltts = TRUE)

Here is a demo of how get_nltt_values works:

t <- nLTT::get_nltt_values(c(phylogeny1, phylogeny2), dt = 0.2)
knitr::kable(t)
id t nltt
1 0.0 0.6666667
1 0.2 0.6666667
1 0.4 0.6666667
1 0.6 0.6666667
1 0.8 1.0000000
1 1.0 1.0000000
2 0.0 0.6666667
2 0.2 0.6666667
2 0.4 1.0000000
2 0.6 1.0000000
2 0.8 1.0000000
2 1.0 1.0000000
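
Because the values are in tidy format (one row per phylogeny per time point), the average per time point can be computed with a single grouping operation. Here is a minimal base R sketch, with the data frame typed in by hand from the table above so that it runs without the nLTT package. Storing id as a factor is an assumption about the column type:

```r
# Tidy nLTT values of the two easy trees, copied from the table above
nltt_values <- data.frame(
  id = factor(rep(c(1, 2), each = 6)),
  t = rep(seq(0, 1, by = 0.2), times = 2),
  nltt = c(
    2 / 3, 2 / 3, 2 / 3, 2 / 3, 1, 1, # phylogeny 1
    2 / 3, 2 / 3, 1, 1, 1, 1          # phylogeny 2
  )
)

# Mean nLTT value per time point, over all phylogenies
averages <- aggregate(nltt ~ t, data = nltt_values, FUN = mean)
print(averages)
```

The resulting column of means matches the average nLTT matrix computed by nLTT::get_average_nltt_matrix earlier in this example.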

There are several plotting options. First, create a data frame with a finer time resolution:

df <- nLTT::get_nltt_values(c(phylogeny1, phylogeny2), dt = 0.01)

Here we see an averaged nLTT plot, where the original nLTT values are still visible:

ggplot2::qplot(
  t, nltt, data = df, geom = "point", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies", color = id, size = I(0.1)
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)

Here we see an averaged nLTT plot, with the original nLTT values omitted:

ggplot2::qplot(t, nltt, data = df, geom = "blank", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies"
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
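
Note that stat_summary with fun.data = "mean_cl_boot" relies on the Hmisc package being installed. As an alternative sketch, the plain mean per time point can be drawn with ggplot() and fun = mean, which needs only ggplot2. The data frame below is typed in by hand with the dt = 0.2 values of the two easy trees, an assumption made to keep the example self-contained:

```r
# Tidy nLTT values of the two easy trees at dt = 0.2, typed in by hand
nltt_values <- data.frame(
  id = factor(rep(c(1, 2), each = 6)),
  t = rep(seq(0, 1, by = 0.2), times = 2),
  nltt = c(
    2 / 3, 2 / 3, 2 / 3, 2 / 3, 1, 1, # phylogeny 1
    2 / 3, 2 / 3, 1, 1, 1, 1          # phylogeny 2
  )
)

# Draw only the mean nLTT value per time point, without a bootstrap interval
p <- ggplot2::ggplot(nltt_values, ggplot2::aes(x = t, y = nltt)) +
  ggplot2::stat_summary(fun = mean, geom = "line", color = "red") +
  ggplot2::ylim(0, 1) +
  ggplot2::ggtitle("Average nLTT plot of phylogenies")
print(p)
```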

Example: Harder trees

Create two harder trees:

newick1 <- "((A:1,B:1):1,(C:1,D:1):1);"
newick2 <- paste0("((((XD:1,ZD:1):1,CE:2):1,(FE:2,EE:2):1):4,((AE:1,BE:1):1,",
  "(WD:1,YD:1):1):5);"
)
phylogeny1 <- ape::read.tree(text = newick1)
phylogeny2 <- ape::read.tree(text = newick2)
phylogenies <- c(phylogeny1, phylogeny2)

They are different. phylogeny1 is relatively simple, with two branching events happening at the same time:

ape::plot.phylo(phylogeny1)
ape::add.scale.bar() #nolint

This can be observed in the nLTT plot:

nLTT::nltt_plot(phylogeny1, ylim = c(0, 1))

As a collection of timepoints:

t <- nLTT::get_phylogeny_nltt_matrix(phylogeny1)
knitr::kable(t)
time N
0.0 0.25
0.5 0.50
0.5 0.75
1.0 1.00

phylogeny2 is more elaborate:

ape::plot.phylo(phylogeny2)
ape::add.scale.bar() #nolint

This can also be observed in the nLTT plot:

nLTT::nltt_plot(phylogeny2, ylim = c(0, 1))

As a collection of timepoints:

t <- nLTT::get_phylogeny_nltt_matrix(phylogeny2)
knitr::kable(t)
time N
0.0000000 0.1111111
0.5714286 0.2222222
0.7142857 0.3333333
0.7142857 0.4444444
0.7142857 0.5555556
0.8571429 0.6666667
0.8571429 0.7777778
0.8571429 0.8888889
1.0000000 1.0000000

The average nLTT plot should be somewhere in the middle.

It is constructed from stretched nLTT matrices.

Here is the stretched nLTT matrix of the first phylogeny:

t <- nLTT::stretch_nltt_matrix(
  nLTT::get_phylogeny_nltt_matrix(phylogeny1), dt = 0.20, step_type = "upper"
)
knitr::kable(t)
0.0 0.5
0.2 0.5
0.4 0.5
0.6 1.0
0.8 1.0
1.0 1.0

Here is the stretched nLTT matrix of the second phylogeny:

t <- nLTT::stretch_nltt_matrix(
  nLTT::get_phylogeny_nltt_matrix(phylogeny2), dt = 0.20, step_type = "upper"
)
knitr::kable(t)
0.0 0.2222222
0.2 0.2222222
0.4 0.2222222
0.6 0.3333333
0.8 0.6666667
1.0 1.0000000

Here is the average nLTT matrix of both phylogenies:

t <- nLTT::get_average_nltt_matrix(phylogenies, dt = 0.20)
knitr::kable(t)
0.0 0.3611111
0.2 0.3611111
0.4 0.3611111
0.6 0.6666667
0.8 0.8333333
1.0 1.0000000

Observe how the numbers get averaged.

Here is a demo of how get_nltt_values works:

t <- nLTT::get_nltt_values(c(phylogeny1, phylogeny2), dt = 0.2)
knitr::kable(t)
id t nltt
1 0.0 0.5000000
1 0.2 0.5000000
1 0.4 0.5000000
1 0.6 1.0000000
1 0.8 1.0000000
1 1.0 1.0000000
2 0.0 0.2222222
2 0.2 0.2222222
2 0.4 0.2222222
2 0.6 0.3333333
2 0.8 0.6666667
2 1.0 1.0000000

There are several plotting options. First, create a data frame with a finer time resolution:

df <- nLTT::get_nltt_values(c(phylogeny1, phylogeny2), dt = 0.01)

Here we see an averaged nLTT plot, where the original nLTT values are still visible:

ggplot2::qplot(
  t, nltt, data = df, geom = "point", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies", color = id, size = I(0.1)
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)

Here we see an averaged nLTT plot, with the original nLTT values omitted:

ggplot2::qplot(t, nltt, data = df, geom = "blank", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies"
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)

Example: Seven random trees

Create seven random trees:

set.seed(42)
phylogeny1 <- ape::rcoal(10)
phylogeny2 <- ape::rcoal(20)
phylogeny3 <- ape::rcoal(30)
phylogeny4 <- ape::rcoal(40)
phylogeny5 <- ape::rcoal(50)
phylogeny6 <- ape::rcoal(60)
phylogeny7 <- ape::rcoal(70)
phylogenies <- c(
  phylogeny1, phylogeny2, phylogeny3,
  phylogeny4, phylogeny5, phylogeny6, phylogeny7
)

Here is a demo of how get_nltt_values works:

t <- nLTT::get_nltt_values(phylogenies, dt = 0.2)
knitr::kable(t)
id t nltt
1 0.0 0.2000000
1 0.2 0.2000000
1 0.4 0.2000000
1 0.6 0.2000000
1 0.8 0.3000000
1 1.0 1.0000000
2 0.0 0.1000000
2 0.2 0.1000000
2 0.4 0.1000000
2 0.6 0.1000000
2 0.8 0.2000000
2 1.0 1.0000000
3 0.0 0.0666667
3 0.2 0.0666667
3 0.4 0.1000000
3 0.6 0.1333333
3 0.8 0.2333333
3 1.0 1.0000000
4 0.0 0.0500000
4 0.2 0.0500000
4 0.4 0.0500000
4 0.6 0.1000000
4 0.8 0.2750000
4 1.0 1.0000000
5 0.0 0.0400000
5 0.2 0.0600000
5 0.4 0.0600000
5 0.6 0.0600000
5 0.8 0.1000000
5 1.0 1.0000000
6 0.0 0.0333333
6 0.2 0.0333333
6 0.4 0.0666667
6 0.6 0.0666667
6 0.8 0.0833333
6 1.0 1.0000000
7 0.0 0.0285714
7 0.2 0.0285714
7 0.4 0.0285714
7 0.6 0.0428571
7 0.8 0.1000000
7 1.0 1.0000000

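As before, first create a data frame with a finer time resolution, now from all seven phylogenies:

df <- nLTT::get_nltt_values(phylogenies, dt = 0.01)

Here we see an averaged nLTT plot, where the original nLTT values are still visible: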

ggplot2::qplot(t, nltt, data = df, geom = "point", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies", color = id, size = I(0.1)
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)

Here we see an averaged nLTT plot, with the original nLTT values omitted:

ggplot2::qplot(t, nltt, data = df, geom = "blank", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies"
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)