When preparing a secsse analysis, it can be daunting to set up the different matrices and settings required for a meaningful analysis. Starting with secsse package version 2.6, general helper functions are available that prepare all matrices for some general cases. Often these general cases are directly applicable; alternatively, they can be modified later on to better reflect the intricacies of the specific studied system.
To perform a secsse analysis, we want to use maximum likelihood to find the most likely values for our parameters, given a phylogenetic tree and tip states. To do so, secsse requires the user to specify how speciation changes the state of the daughter species in relation to the parent species, and requires the user to specify the number of unique speciation rates to be fitted. Here, we will explore a basic example.
Now, we can use our settings to perform an analysis. Because we lack empirical data in this example, we will simulate a tree instead. To do so, we first specify our focal rates, and then fill them into the matrices.
speciation <- 0.5
extinction <- 0.0
sp_sn <- 0.2
sp_ns <- 0.2
q_ab <- 0.5
q_ba <- 0.5
params <- c(speciation,
            extinction,
            sp_sn, sp_ns,
            q_ab, q_ba)
lambda_matrices_p <- secsse::fill_in(lambda_matrices,
                                     params)
trans_matrix_p <- secsse::fill_in(trans_matrix,
                                  params)
mus_p <- secsse::fill_in(mus,
                         params)
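To make the substitution step concrete, here is a minimal base-R sketch of the idea behind filling in a matrix. The helper `fill_by_index`, the toy matrix, and the toy values are hypothetical illustrations and not the secsse implementation; the sketch only assumes that positive integer entries in an index matrix point at positions in the parameter vector.

```r
# Hypothetical helper (not part of secsse): replace parameter indices
# in a matrix by the corresponding parameter values.
fill_by_index <- function(index_matrix, values) {
  filled <- index_matrix
  # Entries > 0 are treated as positions in `values`; NA and 0 stay untouched.
  is_index <- !is.na(index_matrix) & index_matrix > 0
  filled[is_index] <- values[index_matrix[is_index]]
  filled
}

# Toy 2 x 2 index matrix: parameter 5 governs A -> B, parameter 6 governs B -> A.
toy_index <- matrix(c(NA, 6, 5, NA), nrow = 2,
                    dimnames = list(c("A", "B"), c("A", "B")))

# Arbitrary illustrative values, chosen to be distinct for positions 5 and 6.
toy_values <- c(0.5, 0.0, 0.2, 0.2, 0.3, 0.7)

toy_filled <- fill_by_index(toy_index, toy_values)
toy_filled
```

Each index is simply looked up in the parameter vector, which is why the order of the entries in `params` has to match the numbering used in the matrices.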
With the values replaced, we can now simulate an “empirical” dataset:
simulated_tree <- secsse::secsse_sim(lambdas = lambda_matrices_p,
                                     mus = mus_p,
                                     qs = trans_matrix_p,
                                     num_concealed_states = num_hidden_states,
                                     crown_age = 5,
                                     conditioning = "obs_states",
                                     verbose = TRUE,
                                     seed = 26)
sim_traits <- simulated_tree$obs_traits
focal_tree <- simulated_tree$phy
Given these data, we can now perform our maximum likelihood analysis. Here, we initialize the optimizer with the parameter values used in the simulation (dropping extinction, which is fixed at zero), we use multithreading to speed up the analysis, and we use the subplex optimization method, as this has been shown to be more reliable.
param_posit <- list()
param_posit[[1]] <- lambda_matrices
param_posit[[2]] <- mus
param_posit[[3]] <- trans_matrix

initpars <- params
initpars <- initpars[-2]

answ <- secsse::cla_secsse_ml(phy = focal_tree,
                              traits = sim_traits,
                              num_concealed_states = num_hidden_states,
                              idparslist = param_posit,
                              idparsopt = c(1, 3, 4, 5, 6),
                              initparsopt = initpars,
                              idparsfix = c(0, 2),
                              parsfix = c(0.0, 0.0),
                              sampling_fraction = c(1, 1),
                              optimmethod = "subplex",
                              verbose = FALSE,
                              num_threads = 6,
                              atol = 0.1, # high values for demonstration
                              rtol = 0.1) # purposes, don't use at home!
## Warning in secsse::cla_secsse_ml(phy = focal_tree, traits = sim_traits, : Note:
## you set some transitions as impossible to happen.
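The bookkeeping of optimized versus fixed parameters in the call above can be checked with a few lines of base R. This is a sketch under the assumption that index 0 marks rates that are fixed at zero (impossible events) and that index 2 is the extinction rate, as in the parameter ordering used in this example; it does not call secsse itself.

```r
# Parameter values in index order:
# speciation, extinction, sp_sn, sp_ns, q_ab, q_ba
params <- c(0.5, 0.0, 0.2, 0.2, 0.5, 0.5)

# Assumption: index 0 denotes impossible events, index 2 is extinction.
idparsfix <- c(0, 2)
parsfix <- c(0.0, 0.0)

# Every remaining index is optimized.
idparsopt <- setdiff(seq_along(params), idparsfix)
initparsopt <- params[idparsopt]

# Consistency checks: one starting value per optimized index,
# one fixed value per fixed index, and no index in both sets.
stopifnot(length(initparsopt) == length(idparsopt))
stopifnot(length(parsfix) == length(idparsfix))
stopifnot(!any(idparsopt %in% idparsfix))

idparsopt
```

Deriving `idparsopt` from `idparsfix` in this way avoids the two vectors drifting apart when the model, and with it the number of parameters, changes.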
We can now extract our parameters to get them in the right place:
found_pars_vals <- secsse::extract_par_vals(param_posit, answ$MLpars)
found_pars_vals
## [1] 0.6105537 0.0000000 0.1472296 0.1313448 0.2067287 0.7870417
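To see how close the estimates are to the values used in the simulation, the two vectors can be placed side by side. The sketch below copies the printed estimates by hand and assumes that they are returned in parameter-index order (speciation, extinction, sp_sn, sp_ns, q_ab, q_ba); the data frame and its column names are illustrative only.

```r
# Values used to simulate the tree, in parameter-index order.
true_vals <- c(speciation = 0.5, extinction = 0.0,
               sp_sn = 0.2, sp_ns = 0.2,
               q_ab = 0.5, q_ba = 0.5)

# Estimates copied from the output above (assumed to be in the same order).
est_vals <- c(0.6105537, 0.0000000, 0.1472296,
              0.1313448, 0.2067287, 0.7870417)

comparison <- data.frame(parameter = names(true_vals),
                         simulated = unname(true_vals),
                         estimated = est_vals,
                         difference = est_vals - unname(true_vals))
comparison
```

Keep in mind that these estimates come from a single simulated tree and were obtained with deliberately loose `atol` and `rtol` settings, so deviations from the simulated values are to be expected.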
We have done this now only for the CR model, but we can also use the CTD and ETD model. Let’s do that semi-automagically! We first define a generic function to optimize for a model:
fit_model <- function(focal_tree, traits, model) {
  focal_list <- secsse::create_default_lambda_list(state_names = used_states,
                                                   model = model)
  lambda_matrices <- secsse::create_lambda_matrices(state_names = used_states,
                                                    num_concealed_states = num_hidden_states,
                                                    transition_list = focal_list,
                                                    model = model)
  mus <- secsse::create_mus(state_names = used_states,
                            num_concealed_states = num_hidden_states,
                            model = model,
                            lambdas = lambda_matrices)
  q_list <- secsse::create_default_q_list(state_names = used_states,
                                          num_concealed_states = num_hidden_states,
                                          mus = mus)
  trans_matrix <- secsse::create_transition_matrix(state_names = used_states,
                                                   num_concealed_states = num_hidden_states,
                                                   transition_list = q_list,
                                                   diff.conceal = TRUE)
  param_posit <- list()
  param_posit[[1]] <- lambda_matrices
  param_posit[[2]] <- mus
  param_posit[[3]] <- trans_matrix

  max_indicator <- max(trans_matrix, na.rm = TRUE)

  # we cheat a bit by setting extinction to zero -
  # in a real analysis this should be avoided.
  extinct_rates <- unique(mus)
  idparsopt <- 1:max_indicator
  idparsopt <- idparsopt[-extinct_rates]
  idparsfix <- c(0, extinct_rates)
  parsfix <- rep(0.0, length(idparsfix))

  initpars <- c(rep(params[1], min(extinct_rates) - 1),
                params[-c(1, 2)])

  answ <- secsse::cla_secsse_ml(phy = focal_tree,
                                traits = traits,
                                num_concealed_states = num_hidden_states,
                                idparslist = param_posit,
                                idparsopt = idparsopt,
                                initparsopt = initpars,
                                idparsfix = idparsfix,
                                parsfix = parsfix,
                                sampling_fraction = c(1, 1),
                                optimmethod = "subplex",
                                verbose = FALSE,
                                num_threads = 6,
                                atol = 0.1, # high values for demonstration
                                rtol = 0.1) # purposes, don't use at home!
  found_pars_vals <- secsse::extract_par_vals(param_posit, answ$MLpars)
  aic <- 2 * max_indicator - 2 * as.numeric(answ$ML)
  return(list(pars = found_pars_vals,
              ml = as.numeric(answ$ML),
              aic = aic))
}
And then we can loop over the different models:
found <- c()
for (focal_model in c("CR", "CTD", "ETD")) {
  local_answ <- fit_model(focal_tree = focal_tree,
                          traits = sim_traits,
                          model = focal_model)
  found <- rbind(found, c(focal_model, local_answ$ml, local_answ$aic))
}
## Warning in secsse::cla_secsse_ml(phy = focal_tree, traits = traits,
## num_concealed_states = num_hidden_states, : Note: you set some transitions as
## impossible to happen.
## Warning in secsse::cla_secsse_ml(phy = focal_tree, traits = traits,
## num_concealed_states = num_hidden_states, : Note: you set some transitions as
## impossible to happen.
## Warning in secsse::cla_secsse_ml(phy = focal_tree, traits = traits,
## num_concealed_states = num_hidden_states, : Note: you set some transitions as
## impossible to happen.
colnames(found) <- c("model", "LL", "AIC")
found <- as.data.frame(found)
found$LL <- as.numeric(found$LL)
found$AIC <- as.numeric(found$AIC)
found
## model LL AIC
## 1 CR -128.1962 268.3923
## 2 CTD -127.8295 271.6590
## 3 ETD -127.9006 271.8012
Because we have simulated the tree using the CR model, we expect the model with the lowest AIC to be the CR model again, and indeed we do find this!
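The AIC comparison can be reproduced by hand from the log-likelihoods in the table, using AIC = 2k - 2LL. In the sketch below, the parameter counts `k` (6 for CR, 8 for CTD and ETD) are an assumption back-calculated from the reported AIC values rather than taken from the secsse output, and the `delta_AIC` column is an illustrative addition.

```r
# Log-likelihoods copied from the table above; k is back-calculated
# from the reported AIC values (an assumption, see the text).
model_fits <- data.frame(model = c("CR", "CTD", "ETD"),
                         LL = c(-128.1962, -127.8295, -127.9006),
                         k = c(6, 8, 8))

# AIC = 2k - 2LL
model_fits$AIC <- 2 * model_fits$k - 2 * model_fits$LL

# Difference to the best (lowest-AIC) model.
model_fits$delta_AIC <- model_fits$AIC - min(model_fits$AIC)

best_model <- model_fits$model[which.min(model_fits$AIC)]
model_fits
best_model
```

The CTD and ETD models reach a slightly higher likelihood than the CR model, but the gain does not outweigh the penalty for their two additional parameters.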