Composite variables and biotic-abiotic interactions

In this vignette we explain how to use composite variables in trophic SDM. These variables are useful especially in species-rich network, where some predators might have so many preys that assuming that each of them has a differential effect on predator distribution is not only a problem, but might seem ecologically unjustified. Instead, we could expect the richness or diversity of prey, or whether at least a prey is available, to be important. Hence, we implemented the use of composite variables, i.e., variables that summarize the information of a large number of variables from the graph in a few summary variables. Examples of composite variables are prey richness or diversity, or a binary variable set to one if the number of preys is above a certain threshold. These variables assume that all species have the same impact on the predator. An alternative is to group species in the metaweb to represent trophic groups that clump together species that feed on, or are eaten by, the same type of species. We can then construct composite variables (e.g., their richness) for each of those trophic groups to better represent the variety of resources for species like generalist or top predators.
In webSDM it is possible to define an extremely large set of composite variables using the arguments sp.formula and sp.partition. In general, sp.formula allows the definition of the composite variables, while sp.partition allow to define the groups of preys (or predator) for which each composite variable is calculate. Hereafter we don’t provide a complete analyses of a case-study using composite variables, but rather focus on the description of how to implement them in webSDM. Finally, we will see how to include an interaction between preys and the environment.

library(webSDM)
data(X ,Y, G)

Composite variables

sp.formula can be a right hand side formula including richness (i.e. the sum of its preys) and any transformations of richness. Notice that only “richness” is allowed for now. However, we will see how this already allow to build an extremely large set of composite variable. We start with a simple case where we model species as a function of the richness of their prey, with a polynomial of degree two.

m = trophicSDM(Y, X, G,
               env.formula = "~ X_1 + X_1^2", 
               family = binomial(link = "logit"), penal = NULL, 
               sp.formula = "richness + I(richness^2)",
               mode = "prey", method = "glm")
#> Formula was modified since it led to identical columns of the design matrix (e.g. Y_1 or Y_1^2 for binary data)

Notice that for predators that feed on a single prey (with presence-absence data), their richness and the square of their richness is exactly the same variable. In this case, trophicSDM() removes the redundant variable but prints a warning message. We can see the formulas created by the argument $form.all of the fitted model.

m$form.all
#> $Y1
#> [1] "y ~ X_1 + X_1^2"
#> 
#> $Y2
#> [1] "y ~ X_1 + X_1^2"
#> 
#> $Y3
#> [1] "y ~ X_1 + X_1^2"
#> 
#> $Y5
#> [1] "y ~ X_1+I(Y1 + Y2 + Y3)+I(I(Y1 + Y2 + Y3)^2)"
#> 
#> $Y4
#> [1] "y ~ X_1+I(Y3)"
#> 
#> $Y6
#> [1] "y ~ X_1+I(Y3 + Y5)+I(I(Y3 + Y5)^2)"

We now set as composite variable a dummy variable specifying whether there is at least one available prey or not. To do so we define sp.formula = "I(richness>0)".

m = trophicSDM(Y,X,G,
               env.formula = "~ X_1 + X_1^2", 
               family = binomial(link = "logit"), penal = NULL, 
               sp.formula = "I(richness>0)",
               mode = "prey", method = "glm")

Which leads to the following formulas:

m$form.all
#> $Y1
#> [1] "y ~ X_1 + X_1^2"
#> 
#> $Y2
#> [1] "y ~ X_1 + X_1^2"
#> 
#> $Y3
#> [1] "y ~ X_1 + X_1^2"
#> 
#> $Y5
#> [1] "y ~ X_1+I(I(Y1 + Y2 + Y3) > 0)"
#> 
#> $Y4
#> [1] "y ~ X_1+I(I(Y3) > 0)"
#> 
#> $Y6
#> [1] "y ~ X_1+I(I(Y3 + Y5) > 0)"

Composite variables and groups of preys

A halfway between considering that each species has a differential effect on the predator (i.e., no composite variable), or that every prey has the same effect (i.e., modeling predator as a function of prey richness), is to create group of species that we assume to have the same effect on the predator. Then, we can model predator as a function of composite variables calculated on each of these groups. To do so, we rely on the argument sp.partition. This parameters has to be a list, where each element contain the species of the given group.
For example, we can put species Y1 and Y2 in the same group, species Y3 and Y4 are alone in their group and Y5 and Y6 to form another group.

sp.partition = list(c("Y1","Y2"),c("Y3"),c("Y4"), c("Y5","Y6"))

Then, we can specify whatever sp.formula, for example using richness (with a quadratic term).

m = trophicSDM(Y,X,G, "~ X_1 + X_1^2", 
               family = binomial(link = "logit"), penal = NULL, 
               sp.partition = sp.partition,
               sp.formula = "richness + I(richness^2)",
               mode = "prey", method = "glm")
#> Formula was modified since it led to identical columns of the design matrix (e.g. Y_1 or Y_1^2 for binary data)
#> Formula was modified since it led to identical columns of the design matrix (e.g. Y_1 or Y_1^2 for binary data)
#> Formula was modified since it led to identical columns of the design matrix (e.g. Y_1 or Y_1^2 for binary data)

Which leads to the following formulas:

m$form.all
#> $Y1
#> [1] "y ~ X_1 + X_1^2"
#> 
#> $Y2
#> [1] "y ~ X_1 + X_1^2"
#> 
#> $Y3
#> [1] "y ~ X_1 + X_1^2"
#> 
#> $Y5
#> [1] "y ~ X_1+I((Y1 + Y2))+I(I((Y1 + Y2)^2))+I((Y3))"
#> 
#> $Y4
#> [1] "y ~ X_1+I((Y3))"
#> 
#> $Y6
#> [1] "y ~ X_1+I((Y3))+I((Y5))"

Custom formula

In case we want to specify a composite variable that cannot be created from a function of richness, the user can specify directly the list of the biotic formulas in the sp.formula argument. Importantly, notice that in this case trophicSDM() will not create a new formula from the argument G and will not check that the user-defined formula actually derives from G (i.e. that predator-prey interactions described in the formulas derive from G). For example, we might want to quantify the effect of the prey Y5 on the predator Y6 when the prey Y3 is not available. To do so, we define for species Y6 an interaction term between species Y5 and one minus Y3 (i.e. a dummy variable that takes the value of 1 when species Y5 is present and Y3 is absent). To do so, we define sp.formula as a list describing the biotic formula for each species.

sp.formula = list(Y1 = "",
                  Y2 = "",
                  Y3 = "",
                  Y4 = "Y3",
                  Y5 = "Y1",
                  Y6 = "I(Y5*(1-Y3))")

m = trophicSDM(Y,X,G, "~ X_1 + X_1^2", 
               family = binomial(link = "logit"), penal = NULL, 
               sp.formula = sp.formula,
               mode = "prey", method = "glm")
#> We don't check that G and the graph induced by the sp.formula specified by the user match, nor that the latter is a graph. Please be careful about their consistency.

m$form.all
#> $Y1
#> [1] "y ~ X_1 + X_1^2"
#> 
#> $Y2
#> [1] "y ~ X_1 + X_1^2"
#> 
#> $Y3
#> [1] "y ~ X_1 + X_1^2"
#> 
#> $Y5
#> [1] "y ~ X_1+Y1"
#> 
#> $Y4
#> [1] "y ~ X_1+Y3"
#> 
#> $Y6
#> [1] "y ~ X_1+I(Y5 * (1 - Y3))"

On computeVariableImportance with composite variable

When computing the importance of variables using the function computeVariableImportance, you should be careful with the definition of groups. Indeed, the function computes the variable importance of each group as the sum of the standardised regression coefficients containing some variables from the groups. Therefore, if species from different groups of variable are merged in the same composite variable, computeVariableImportance will sum the regression coefficient of the composite variable in different groups, leading to wrong results. For example, if sp.formula = "richness" and sp.partition = NULL, you should put all species in the same group in the argument groups.

m = trophicSDM(Y,X,G,
               env.formula = "~ X_1 + X_1^2", 
               family = binomial(link = "logit"), penal = NULL, 
               sp.formula = "richness + I(richness^2)",
               mode = "prey", method = "glm")
#> Formula was modified since it led to identical columns of the design matrix (e.g. Y_1 or Y_1^2 for binary data)

computeVariableImportance(m, groups = list("abiotic" = c("X_1", "X_2"),
                                           "biotic" = c("Y1", "Y2", "Y3",
                                                        "Y4", "Y5", "Y6")))
#> Warning in computeVariableImportance(m, groups = list(abiotic = c("X_1", : If you use composite variables, you
#> should group together species that belong to the same composite variable. For example, if sp.formula = 'richness' and
#> sp.partition = NULL, you should put all species in the same group in the argument 'groups'. If you define a partition
#> of species in sp.partition, then species in the same group in sp.partition should put all species in the same group
#> in the argument 'groups'
#>         Y1 Y2 Y3         Y5         Y4         Y6
#> abiotic  0  0  0 0.05380788 0.17251449 0.04623329
#> biotic   0  0  0 0.08470730 0.07633953 0.38116851

If you use composite variables, you should group together species that belong to the same composite variable.If you define a partition of species in sp.partition, then species in the same group in sp.partition should put all species in the same group in the argument ‘groups’.

sp.partition = list(c("Y1","Y2"),c("Y3"),c("Y4"), c("Y5","Y6"))

m = trophicSDM(Y,X,G, "~ X_1 + X_1^2", 
               family = binomial(link = "logit"), penal = NULL, 
               sp.partition = sp.partition,
               sp.formula = "richness + I(richness^2)",
               mode = "prey", method = "glm")
#> Formula was modified since it led to identical columns of the design matrix (e.g. Y_1 or Y_1^2 for binary data)
#> Formula was modified since it led to identical columns of the design matrix (e.g. Y_1 or Y_1^2 for binary data)
#> Formula was modified since it led to identical columns of the design matrix (e.g. Y_1 or Y_1^2 for binary data)

computeVariableImportance(m, groups = list("abiotic" = c("X_1", "X_2"),
                                           "Group1" = c("Y1", "Y2"),
                                           "Group2" = c("Y3"),
                                           "Group3" = c("Y4", "Y5", "Y6")))
#> Warning in computeVariableImportance(m, groups = list(abiotic = c("X_1", : If you use composite variables, you
#> should group together species that belong to the same composite variable. For example, if sp.formula = 'richness' and
#> sp.partition = NULL, you should put all species in the same group in the argument 'groups'. If you define a partition
#> of species in sp.partition, then species in the same group in sp.partition should put all species in the same group
#> in the argument 'groups'
#>         Y1 Y2 Y3        Y5         Y4         Y6
#> abiotic  0  0  0 0.1016899 0.17251449 0.03603087
#> Group1   0  0  0 0.8621983 0.00000000 0.00000000
#> Group2   0  0  0 0.2581345 0.07633953 0.22671385
#> Group3   0  0  0 0.0000000 0.00000000 0.11579149

Interaction with biotic and abiotic terms

Assuming that the effect of prey on predators does not vary with environmental conditions might be a too strong assumption. In order to integrate species plasticity, it is possible to define a sp.formula that includes both biotic and abiotic terms. For example, we hereafter define an interaction between the prey and the environmental variable X_2.

sp.formula = list(Y1 = "",
                  Y2 = "",
                  Y3 = "",
                  Y4 = "I(X_2*Y3)",
                  Y5 = "I(X_2*Y1)",
                  Y6 = "I(X_2*Y3) + I(X_2*Y5)")

m = trophicSDM(Y,X,G,
               env.formula = "~ X_1 + X_1^2", 
               family = binomial(link = "logit"), penal = NULL, 
               sp.formula = sp.formula,
               mode = "prey", method = "glm")
#> We don't check that G and the graph induced by the sp.formula specified by the user match, nor that the latter is a graph. Please be careful about their consistency.