The sensiPhy package provides simple functions to perform sensitivity analysis in phylogenetic comparative methods. It uses several simulation methods to estimate the impact of different types of uncertainty on phylogenetic generalized least squares (PGLS) models.
sensiPhy functions use a common syntax that combines the type of uncertainty and the type of model: uncertainty.type_phylm (for linear regressions) or uncertainty.type_phyglm (for logistic regressions).
Function | Uncertainty | Model |
---|---|---|
samp_phylm | sample size | linear regression |
samp_phyglm | sample size | logistic regression |
influ_phylm | influential species | linear regression |
influ_phyglm | influential species | logistic regression |
clade_phylm | influential clades | linear regression |
clade_phyglm | influential clades | logistic regression |
tree_phylm | phylogenetic uncertainty | linear regression |
tree_phyglm | phylogenetic uncertainty | logistic regression |
intra_phylm | data uncertainty | linear regression |
intra_phyglm | data uncertainty | logistic regression |
All of these functions share the same mandatory arguments: function(formula, data, phy, ...)
sensiPhy also provides two additional functions:
Function | Description |
---|---|
match_dataphy | Match data and phylogeny based on model formula |
miss.phylo.d | Calculates the phylogenetic signal for missing data |
The following examples describe the basic usage of sensiPhy functions.
Loading the package and data
set.seed(1234)
library(sensiPhy)
### Loading data:
data(alien)
data(primates) # see ?alien & ?primates for details about the data.
The samp_phylm function analyses sensitivity to species sampling by randomly removing species and measuring the effect on parameter estimates in a phylogenetic linear regression.
# run analysis:
samp <- samp_phylm(log(gestaLen) ~ log(adultMass), phy = alien$phy[[1]],
data = alien$data, times = 10, track = F)
## Used dataset has 84 species that match data and phylogeny
# You can change the number of repetitions and break intervals:
samp2 <- samp_phylm(log(gestaLen) ~ log(adultMass), phy = alien$phy[[1]], track = F,
data = alien$data, times = 100, breaks = c(0.1, 0.2, 0.3, 0.4))
## Used dataset has 84 species that match data and phylogeny
# You can change the phylogenetic model:
samp <- samp_phylm(log(gestaLen) ~ log(adultMass), phy = alien$phy[[1]],
data = alien$data, model = "kappa", track = F)
## Used dataset has 84 species that match data and phylogeny
# Check results:
knitr::kable(summary(samp))
## 150 simulations saved, see output$samp.model.estimates to acess all simulations
% Species Removed | % Significant Intercepts | Mean Intercept Change (%) | Mean sDFintercept | % Significant Slopes | Mean Slope Change (%) | Mean sDFslope |
---|---|---|---|---|---|---|
10 | 100 | 1.966667 | -0.1970550 | 100 | 3.686667 | 0.2384257 |
20 | 100 | 4.103333 | -0.1311984 | 100 | 7.776667 | 0.3018753 |
30 | 100 | 5.003333 | -0.4332537 | 100 | 9.896667 | 0.6141183 |
40 | 100 | 6.873333 | -0.3434639 | 100 | 13.570000 | 0.7573968 |
50 | 100 | 9.203333 | -0.5721201 | 100 | 18.300000 | 1.3674361 |
# Visual diagnostics
sensi_plot(samp2)
# You can specify which graph and parameter ("slope" or "intercept") to print:
sensi_plot(samp2, graphs = 1)
sensi_plot(samp2, param = "intercept")
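The resampling idea behind this kind of analysis can be sketched in base R. This is only an illustration under stated assumptions: it uses simulated data and ordinary `lm()` instead of a phylogenetic regression, and the object names (`dat`, `breaks`, `times`, `res`) are chosen here for the example. It is not the package's implementation.

```r
# Sketch of sampling sensitivity on simulated data (no phylogeny involved).
set.seed(1)
n <- 80
dat <- data.frame(x = rnorm(n))
dat$y <- 2 + 0.5 * dat$x + rnorm(n, sd = 0.3)

# Slope estimated from the full data set
full_slope <- coef(lm(y ~ x, data = dat))[["x"]]

breaks <- c(0.1, 0.3, 0.5)  # fractions of species to remove
times <- 20                 # repetitions per fraction

res <- do.call(rbind, lapply(breaks, function(b) {
  # Refit the model 'times' times, each time on a random subset of species
  slopes <- replicate(times, {
    keep <- sample(n, size = n - round(n * b))
    coef(lm(y ~ x, data = dat[keep, ]))[["x"]]
  })
  data.frame(
    removed_pct = b * 100,
    mean_slope_change_pct =
      mean(abs(slopes - full_slope) / abs(full_slope)) * 100
  )
}))
print(res)
```

Each row of `res` reports the average absolute change in slope, as a percentage of the full-data slope, for one removal fraction.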
The function influ_phylm performs leave-one-out deletion analysis for phylogenetic linear regression and detects influential species.
# run analysis:
influ <- influ_phylm(log(gestaLen) ~ log(adultMass), phy = alien$phy[[1]],
data = alien$data, track = F)
## Used dataset has 84 species that match data and phylogeny
# To check summary results:
summary(influ)
## $`Influential species for the Slope`
## [1] "Ovis_ammon" "Ovis_aries" "Equus_hemionus"
## [4] "Camelus_bactrianus" "Axis_porcinus" "Axis_axis"
##
## $`Slope Estimates`
## Species removed Slope DFslope Change(%) Pval
## 1 Ovis_ammon 0.1585335 0.011033226 7.5 1.465612e-10
## 2 Ovis_aries 0.1567383 0.009238004 6.3 2.949203e-10
## 3 Equus_hemionus 0.1384028 -0.009097490 6.2 3.326380e-09
## 4 Camelus_bactrianus 0.1400746 -0.007425680 5.0 2.741126e-09
## 5 Axis_porcinus 0.1546280 0.007127723 4.8 3.792296e-10
## 6 Axis_axis 0.1531949 0.005694542 3.9 4.646006e-10
##
## $`Influential species for the Intercept`
## [1] "Ornithorhynchus_anatinus" "Ovis_ammon"
## [3] "Ovis_aries" "Axis_porcinus"
## [5] "Sorex_cinereus"
##
## $`Intercept Estimates`
## Species removed Intercept DFintercept Change(%) Pval
## 1 Ornithorhynchus_anatinus 2.459667 0.09744173 4.1 4.298004e-10
## 2 Ovis_ammon 2.275301 -0.08692386 3.7 2.181632e-09
## 3 Ovis_aries 2.289397 -0.07282803 3.1 2.279349e-09
## 4 Axis_porcinus 2.306031 -0.05619439 2.4 1.890367e-09
## 5 Sorex_cinereus 2.412289 0.05006331 2.1 7.463555e-10
# Most influential species
influ$influential.species
## $influ.sp.slope
## [1] "Ovis_ammon" "Ovis_aries" "Equus_hemionus"
## [4] "Camelus_bactrianus" "Axis_porcinus" "Axis_axis"
##
## $influ.sp.intercept
## [1] "Ornithorhynchus_anatinus" "Ovis_ammon"
## [3] "Ovis_aries" "Axis_porcinus"
## [5] "Sorex_cinereus"
# Visual diagnostics
sensi_plot(influ)
# Check most influential species on the original regression plot:
sensi_plot(influ, graphs = 2)
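Leave-one-out deletion can be illustrated with a short base-R sketch. It assumes simulated data, uses ordinary `lm()` as a stand-in for the phylogenetic model, and flags species whose standardised change in slope exceeds an assumed cutoff of 2. The names `loo_slope`, `sdif` and `cutoff` are illustrative and the exact rule used by the package may differ.

```r
# Sketch of leave-one-out influence analysis on simulated data.
set.seed(2)
n <- 30
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 0.4 * dat$x + rnorm(n, sd = 0.2)
rownames(dat) <- paste0("sp_", seq_len(n))

# Plant one clear outlier so that there is something to detect
dat$x[1] <- 4
dat$y[1] <- -2

full_slope <- coef(lm(y ~ x, data = dat))[["x"]]

# Refit the model once per species, leaving that species out
loo_slope <- vapply(rownames(dat), function(sp) {
  coef(lm(y ~ x, data = dat[rownames(dat) != sp, ]))[["x"]]
}, numeric(1))

dif <- loo_slope - full_slope   # raw change in slope
sdif <- dif / sd(dif)           # standardised change
cutoff <- 2                     # assumed threshold for "influential"
influential <- names(sdif)[abs(sdif) > cutoff]
print(influential)
```

The planted outlier `sp_1` ends up in `influential` because removing it shifts the slope far more than removing any other species.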
The function clade_phylm estimates the impact on the estimates of a phylogenetic linear regression of removing whole clades from the analysis.
# Original data set:
knitr::kable(head(primates$data))
| | family | adultMass | sexMaturity | homeRange |
---|---|---|---|---|
Cercopithecus_diana | Cercopithecidae | 4370 | 1977.08 | 1.27 |
Cercopithecus_neglectus | Cercopithecidae | 5320 | 1809.79 | 0.06 |
Cercopithecus_pogonias | Cercopithecidae | 3580 | 1472.16 | 1.17 |
Cercopithecus_cephus | Cercopithecidae | 3440 | 1320.08 | 0.24 |
Cercopithecus_ascanius | Cercopithecidae | 3540 | 1490.41 | 0.17 |
Cercopithecus_mitis | Cercopithecidae | 5040 | 1779.37 | 0.11 |
# run analysis:
clade <- clade_phylm(log(sexMaturity) ~ log(adultMass), phy = primates$phy[[1]],
data = primates$data, clade.col = "family", times = 99, track = F)
## Used dataset has 95 species that match data and phylogeny
# To check summary results and most influential clades:
summary(clade)
## $Slope
## clade.removed N.species slope DFslope Change (%) Pval
## 3 Cercopithecidae 32 0.3075424 0.05706804 22.8 5.742404e-11
## 2 Cebidae 19 0.2199214 -0.03055296 12.2 7.285954e-07
## 1 Callitrichidae 9 0.2260405 -0.02443393 9.8 5.285826e-08
## 4 Lemuridae 8 0.2581974 0.00772303 3.1 1.323367e-09
## m.null.slope Pval.randomization
## 3 0.2799503 0.18181818
## 2 0.2644762 0.00000000
## 1 0.2580067 0.01010101
## 4 0.2553345 0.33333333
##
## $Intercept
## clade.removed N.species intercept DFintercept Change (%) Pval
## 3 Cercopithecidae 32 4.496911 -0.38912425 8.0 5.742404e-11
## 2 Cebidae 19 5.051216 0.16518056 3.4 7.285954e-07
## 1 Callitrichidae 9 5.072203 0.18616707 3.8 5.285826e-08
## 4 Lemuridae 8 4.829880 -0.05615518 1.1 1.323367e-09
## m.null.intercept Pval.randomization
## 3 4.652871 0.2323232
## 2 4.769680 0.0000000
## 1 4.825502 0.0000000
## 4 4.847405 0.3535354
# Visual diagnostics for clade removal:
sensi_plot(clade, "Cercopithecidae")
sensi_plot(clade, "Cebidae")
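The summary above reports a null slope and a randomization p-value for each clade. One simple way to build such a comparison is sketched below in base R: the change caused by removing a clade is compared with the change caused by removing the same number of randomly chosen species. The simulated data, the `lm()` stand-in, the two-sided criterion and all object names are assumptions made for illustration, not the package's exact procedure.

```r
# Sketch of a clade-removal randomization test on simulated data.
set.seed(3)
n <- 60
dat <- data.frame(
  clade = rep(c("A", "B", "C"), times = c(30, 20, 10)),
  x = rnorm(n)
)
dat$y <- 1 + 0.3 * dat$x + rnorm(n, sd = 0.2)

slope <- function(d) coef(lm(y ~ x, data = d))[["x"]]
full_slope <- slope(dat)

# Observed effect of removing one clade
target <- "C"
n_clade <- sum(dat$clade == target)
obs_slope <- slope(dat[dat$clade != target, ])

# Null distribution: remove the same number of random species
times <- 99
null_slopes <- replicate(times, {
  removed <- sample(n, n_clade)
  slope(dat[-removed, ])
})

# Proportion of random removals that change the slope at least as much
obs_dif <- abs(obs_slope - full_slope)
null_dif <- abs(null_slopes - full_slope)
p_rand <- mean(null_dif >= obs_dif)
print(c(null_mean_slope = mean(null_slopes), p_randomization = p_rand))
```

A small `p_rand` suggests that the clade's influence is larger than expected from its size alone.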
The function tree_phylm performs phylogenetic linear regression while evaluating uncertainty in tree topology.
# This analysis needs a multiPhylo object:
class(alien$phy)
## [1] "multiPhylo"
alien$phy
## 101 phylogenetic trees
# run PGLS accounting for phylogenetic uncertainty:
tree <- tree_phylm(log(gestaLen) ~ log(adultMass), phy = alien$phy,
data = alien$data, times = 100, track = F)
## Used dataset has 84 species that match data and phylogeny
# To check summary results:
knitr::kable(summary(tree))
| | mean | CI_low | CI_high |
---|---|---|---|
intercept | 2.326 | 2.319 | 2.334 |
se.intercept | 0.342 | 0.342 | 0.343 |
pval.intercept | 0.000 | 0.000 | 0.000 |
slope | 0.152 | 0.151 | 0.153 |
se.slope | 0.022 | 0.022 | 0.022 |
pval.slope | 0.000 | 0.000 | 0.000 |
# Visual diagnostics
sensi_plot(tree)
The function intra_phylm performs phylogenetic linear regression while evaluating intraspecific variability.
# run PGLS accounting for intraspecific variation:
intra <- intra_phylm(gestaLen ~ adultMass, phy = alien$phy[[1]], track = F,
data = alien$data, Vy = "SD_gesta", Vx = "SD_mass",
times = 100, x.transf = log, y.transf = log)
## Used dataset has 84 species that match data and phylogeny
# To check summary results:
knitr::kable(summary(intra))
| | mean | CI_low | CI_high |
---|---|---|---|
intercept | 2.390 | 2.359 | 2.421 |
se.intercept | 0.368 | 0.364 | 0.373 |
pval.intercept | 0.000 | 0.000 | 0.000 |
slope | 0.144 | 0.140 | 0.148 |
se.slope | 0.023 | 0.022 | 0.023 |
pval.slope | 0.000 | 0.000 | 0.000 |
# Visual diagnostics
sensi_plot(intra)
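The logic of propagating intraspecific variation can be sketched in base R. The example assumes that each species value is redrawn from a normal distribution defined by its mean and standard deviation, that both variables are log-transformed after the draw, and that `lm()` stands in for the phylogenetic model. The data are simulated and all names are illustrative.

```r
# Sketch of propagating intraspecific variation on simulated data.
set.seed(4)
n <- 40
mass_mean <- exp(rnorm(n, mean = 7, sd = 1))
gest_mean <- exp(2 + 0.15 * log(mass_mean) + rnorm(n, sd = 0.1))
dat <- data.frame(
  mass = mass_mean, sd_mass = 0.05 * mass_mean,  # 5% intraspecific SD
  gest = gest_mean, sd_gest = 0.05 * gest_mean
)

times <- 100
est <- t(replicate(times, {
  # Draw one value per species from its own distribution
  x <- rnorm(n, mean = dat$mass, sd = dat$sd_mass)
  y <- rnorm(n, mean = dat$gest, sd = dat$sd_gest)
  # Transform after drawing, then refit the model
  coef(lm(log(y) ~ log(x)))
}))
colnames(est) <- c("intercept", "slope")

# Summarise the estimates across replicates
summary_tab <- data.frame(mean = colMeans(est), sd = apply(est, 2, sd))
print(round(summary_tab, 3))
```

The spread of `est` across replicates shows how much the estimates depend on within-species variation.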
The function miss.phylo.d calculates the D statistic (Fritz & Purvis 2010), a measure of phylogenetic signal, for missing data. Missingness is recoded into a binary variable (1 = missing, 0 = not missing).
# Load caper:
library(caper)
# Load data
data(alien)
knitr::kable(head(alien$data))
| | family | adultMass | gestaLen | homeRange | SD_mass | SD_gesta | SD_range |
---|---|---|---|---|---|---|---|
Tachyglossus_aculeatus | Tachyglossidae | 4020.767 | 28.375 | 0.9991117 | 1218.29240 | 4.1995 | 0.7922419 |
Ornithorhynchus_anatinus | Ornithorhynchidae | 1458.208 | 15.000 | 0.1120000 | 180.81779 | 2.1600 | 0.0430000 |
Ondatra_zibethicus | Cricetidae | 1135.014 | 27.100 | 0.0044500 | 388.17479 | 3.3333 | 0.0000000 |
Mesocricetus_auratus | Cricetidae | 97.125 | 15.500 | NA | 12.52913 | 0.4960 | NA |
Castor_canadensis | Castoridae | 18085.634 | 110.000 | NA | 2875.61581 | 13.0900 | NA |
Chinchilla_lanigera | Chinchillidae | 452.575 | NA | NA | 40.73175 | NA | NA |
data <- alien$data
phy <- alien$phy[[1]]
# Test phylogenetic signal for missing data:
homeNAsig <- miss.phylo.d(data, phy, binvar = homeRange)
## [1] "Percentage of missing data in traits:"
## family adultMass gestaLen homeRange SD_mass SD_gesta SD_range
## 0.00 2.13 9.57 44.68 2.13 9.57 44.68
print(homeNAsig)
##
## Calculation of D statistic for the phylogenetic structure of a binary variable
##
## Data : data
## Binary variable : homeRange
## Counts of states: 0 = 52
## 1 = 42
## Phylogeny : phy
## Number of permutations : 1000
##
## Estimated D : 0.9469183
## Probability of E(D) resulting from no (random) phylogenetic structure : 0.336
## Probability of E(D) resulting from Brownian phylogenetic structure : 0
plot(homeNAsig)
massNAsig <- miss.phylo.d(data, phy, binvar = adultMass)
## [1] "Percentage of missing data in traits:"
## family adultMass gestaLen homeRange SD_mass SD_gesta SD_range
## 0.00 2.13 9.57 44.68 2.13 9.57 44.68
print(massNAsig)
##
## Calculation of D statistic for the phylogenetic structure of a binary variable
##
## Data : data
## Binary variable : adultMass
## Counts of states: 0 = 92
## 1 = 2
## Phylogeny : phy
## Number of permutations : 1000
##
## Estimated D : 0.8647909
## Probability of E(D) resulting from no (random) phylogenetic structure : 0.362
## Probability of E(D) resulting from Brownian phylogenetic structure : 0.269
plot(massNAsig)
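The recoding of missingness into a binary variable can be shown with a few lines of base R. The values below are the homeRange entries of the six species printed in the table above. The object names are illustrative, and the snippet only demonstrates the recoding step, not the calculation of D.

```r
# homeRange values for the first six species shown above
home_range <- c(
  Tachyglossus_aculeatus   = 0.9991117,
  Ornithorhynchus_anatinus = 0.1120000,
  Ondatra_zibethicus       = 0.0044500,
  Mesocricetus_auratus     = NA,
  Castor_canadensis        = NA,
  Chinchilla_lanigera      = NA
)

# Recode missingness: 1 = missing, 0 = not missing
missing <- setNames(as.integer(is.na(home_range)), names(home_range))
print(missing)

# Percentage of missing values (3 of 6 here, so 50)
pct_missing <- round(mean(is.na(home_range)) * 100, 2)
print(pct_missing)
```

The binary vector `missing` is the kind of variable whose phylogenetic structure the D statistic describes.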
The function match_dataphy combines phylogeny and data to ensure that the tips of the phylogeny match the data and that observations with missing values are removed.
This function uses all variables provided in the ‘formula’ to match data and phylogeny. To avoid cropping the full dataset, ‘match_dataphy’ searches for NA values only in the variables included in ‘formula’. Missing values in other variables, not included in ‘formula’, do not cause rows to be removed from the data.
# Load data:
data(alien)
# Match data and phy based on model formula:
comp.data <- match_dataphy(gestaLen ~ homeRange, data = alien$data, phy = alien$phy[[1]])
## Used dataset has 49 species that match data and phylogeny
# With a `multiPhylo` object:
comp.data2 <- match_dataphy(homeRange ~ homeRange, data = alien$data, alien$phy)
## Used dataset has 52 species that match data and phylogeny
# Check combined data:
knitr::kable(comp.data$data)
| | family | adultMass | gestaLen | homeRange | SD_mass | SD_gesta | SD_range |
---|---|---|---|---|---|---|---|
Tachyglossus_aculeatus | Tachyglossidae | 4020.767 | 28.375 | 0.9991117 | 1218.29240 | 4.199500 | 0.7922419 |
Ornithorhynchus_anatinus | Ornithorhynchidae | 1458.208 | 15.000 | 0.1120000 | 180.81779 | 2.160000 | 0.0430000 |
Ondatra_zibethicus | Cricetidae | 1135.014 | 27.100 | 0.0044500 | 388.17479 | 3.333300 | 0.0000000 |
Myocastor_coypus | Myocastoridae | 6135.768 | 131.737 | 0.0376000 | 546.08335 | 3.425162 | 0.0156771 |
Marmota_monax | Sciuridae | 3747.182 | 31.600 | 0.0333582 | 528.35266 | 2.275200 | 0.0429752 |
Tamiasciurus_hudsonicus | Sciuridae | 209.452 | 35.724 | 0.0117357 | 23.66808 | 3.072264 | 0.0072091 |
Sciurus_carolinensis | Sciuridae | 538.715 | 44.100 | 0.0270056 | 49.02307 | 0.220500 | 0.0470268 |
Sciurus_vulgaris | Sciuridae | 649.843 | 40.545 | 0.0818883 | 223.54599 | 3.649050 | 0.1449527 |
Glis_glis | Gliridae | 146.000 | 27.750 | 0.0030500 | 21.17000 | 2.747250 | 0.0025500 |
Lepus_arcticus | Leporidae | 4166.869 | 52.000 | 18.9500050 | 145.84042 | 0.988000 | 16.0499950 |
Lepus_timidus | Leporidae | 3048.556 | 50.133 | 1.0090000 | 408.50650 | 0.100266 | 0.1200000 |
Lepus_europaeus | Leporidae | 3934.287 | 41.200 | 1.1241550 | 605.88020 | 1.606800 | 1.2592213 |
Daubentonia_madagascariensis | Daubentoniidae | 2634.074 | 165.750 | 0.2893750 | 213.35999 | 5.304000 | 0.3792628 |
Ovis_canadensis | Bovidae | 81829.394 | 177.088 | 22.9816067 | 35513.95700 | 5.843904 | 4.4763964 |
Capra_hircus | Bovidae | 45689.284 | 158.000 | 176.8260900 | 23027.39914 | 8.374000 | 132.8704278 |
Ammotragus_lervia | Bovidae | 97562.268 | 152.227 | 11.0133367 | 39902.96761 | 18.571694 | 9.3912932 |
Oreamnos_americanus | Bovidae | 73490.939 | 174.089 | 22.9285800 | 26456.73804 | 14.971654 | 1.0514030 |
Bos_taurus | Bovidae | 590397.500 | 280.500 | 17.9750000 | 295198.75000 | 2.524500 | 5.7347079 |
Alces_alces | Cervidae | 461860.313 | 235.840 | 182.3697044 | 115465.07825 | 9.669440 | 226.7850513 |
Odocoileus_hemionus | Cervidae | 87031.783 | 204.836 | 14.5387520 | 25326.24885 | 3.277376 | 16.2703626 |
Odocoileus_virginianus | Cervidae | 78331.245 | 205.069 | 2.6916700 | 28199.24820 | 14.764968 | 0.2033300 |
Axis_axis | Cervidae | 68752.082 | 230.222 | 1.8092367 | 13819.16848 | 11.971544 | 1.3112559 |
Axis_porcinus | Cervidae | 37015.550 | 221.938 | 0.7000000 | 5108.14590 | 19.308606 | 0.1000000 |
Cervus_nippon | Cervidae | 66598.688 | 222.553 | 2.9257450 | 38427.44298 | 8.457014 | 2.0742550 |
Dama_dama | Cervidae | 59049.963 | 229.623 | 2.2787525 | 15884.44005 | 9.184920 | 2.9084486 |
Moschus_moschiferus | Moschidae | 13357.143 | 171.000 | 0.8516667 | 1215.50001 | 27.189000 | 0.9905414 |
Antilocapra_americana | Antilocapridae | 46892.188 | 247.944 | 10.9361500 | 6986.93601 | 11.901312 | 5.2666490 |
Sus_scrofa | Suidae | 82160.027 | 117.735 | 5.7871275 | 31138.65023 | 6.004485 | 6.1376236 |
Mustela_putorius | Mustelidae | 952.942 | 41.063 | 0.8729158 | 188.68252 | 0.739134 | 0.8328026 |
Mustela_erminea | Mustelidae | 206.598 | 54.519 | 0.2205200 | 98.13405 | 19.408764 | 0.3400689 |
Mustela_nivalis | Mustelidae | 90.777 | 37.891 | 0.3641075 | 37.40012 | 4.433247 | 0.5252465 |
Mephitis_mephitis | Mephitidae | 2573.499 | 64.717 | 2.0949033 | 707.71222 | 2.653397 | 1.4576314 |
Procyon_lotor | Procyonidae | 7232.221 | 63.911 | 7.1197650 | 2256.45295 | 2.045152 | 3.8152350 |
Canis_latrans | Canidae | 12197.470 | 61.821 | 39.7382275 | 2110.16231 | 1.978272 | 37.1539830 |
Nyctereutes_procyonoides | Canidae | 4636.990 | 61.450 | 1.7072350 | 978.40489 | 2.396550 | 0.8172350 |
Vulpes_vulpes | Canidae | 5077.005 | 52.343 | 7.1448013 | 964.63095 | 3.978068 | 16.6391120 |
Vulpes_lagopus | Canidae | 4260.696 | 52.413 | 19.6376689 | 1218.55906 | 1.048260 | 12.0364612 |
Herpestes_javanicus | Herpestidae | 720.714 | 47.233 | 0.2462257 | 178.01636 | 2.645048 | 0.2516805 |
Desmana_moschata | Talpidae | 428.875 | 46.250 | 0.0045000 | 49.32062 | 1.248750 | 0.0000000 |
Erinaceus_europaeus | Erinaceidae | 865.896 | 35.556 | 0.1438589 | 282.28210 | 2.417808 | 0.1420654 |
Sorex_cinereus | Soricidae | 4.198 | 18.000 | 0.0046864 | 0.44079 | 0.000000 | 0.0000000 |
Trichosurus_vulpecula | Phalangeridae | 2665.563 | 17.660 | 0.0450433 | 455.81127 | 1.130240 | 0.0108214 |
Bettongia_gaimardi | Potoroidae | 1680.472 | 20.880 | 0.4533350 | 18.48519 | 0.730800 | 0.0033350 |
Macropus_rufogriseus | Macropodidae | 16242.858 | 30.182 | 0.2436667 | 2192.78583 | 3.410566 | 0.1470518 |
Macropus_robustus | Macropodidae | 24879.172 | 34.333 | 1.6569700 | 7861.81835 | 0.480662 | 0.9497399 |
Macropus_giganteus | Macropodidae | 33630.005 | 36.385 | 1.8885950 | 15268.02227 | 0.254695 | 2.4557433 |
Phascolarctos_cinereus | Phascolarctidae | 7752.538 | 32.775 | 0.0150000 | 2209.47333 | 3.080850 | 0.0075000 |
Perameles_gunnii | Peramelidae | 810.000 | 12.375 | 0.0302500 | 59.94000 | 0.210375 | 0.0124574 |
Didelphis_marsupialis | Didelphidae | 1182.023 | 13.600 | 0.4161250 | 177.30345 | 0.734400 | 0.4894138 |
# Check phy:
plot(comp.data$phy)
# See species dropped from phy or data:
comp.data$dropped
## [1] "Mesocricetus_auratus" "Castor_canadensis"
## [3] "Hystrix_brachyura" "Chinchilla_lanigera"
## [5] "Marmota_bobak" "Tamias_townsendii"
## [7] "Atlantoxerus_getulus" "Sciurus_niger"
## [9] "Sciurus_aureogaster" "Oryctolagus_cuniculus"
## [11] "Macaca_arctoides" "Macaca_mulatta"
## [13] "Macaca_fascicularis" "Ovis_ammon"
## [15] "Ovis_aries" "Hemitragus_jemlahicus"
## [17] "Capra_ibex" "Rupicapra_rupicapra"
## [19] "Ovibos_moschatus" "Gazella_subgutturosa"
## [21] "Saiga_tatarica" "Bubalus_bubalis"
## [23] "Tragelaphus_strepsiceros" "Capreolus_capreolus"
## [25] "Rangifer_tarandus" "Rusa_timorensis"
## [27] "Cervus_elaphus" "Rusa_unicolor"
## [29] "Camelus_bactrianus" "Equus_hemionus"
## [31] "Mustela_sibirica" "Mustela_lutreola"
## [33] "Neovison_vison" "Nasua_nasua"
## [35] "Lycalopex_griseus" "Felis_catus"
## [37] "Pseudocheirus_peregrinus" "Bettongia_lesueur"
## [39] "Macropus_eugenii" "Macropus_parma"
## [41] "Petrogale_lateralis" "Petrogale_penicillata"
## [43] "Thylogale_billardierii" "Potorous_tridactylus"
## [45] "Lasiorhinus_latifrons"
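The matching rule described for match_dataphy (drop rows with NA only in the formula variables, then keep species present in both data and tree) can be sketched in base R. The helper `match_sketch()`, the `other` column and the tip `hypothetical_tip` are invented for this illustration, and a plain character vector stands in for the tip labels of a tree. This is not the package's code.

```r
# Hypothetical helper illustrating the matching rule with base R only.
match_sketch <- function(formula, data, tips) {
  vars <- all.vars(formula)  # variables used in the model
  complete <- stats::complete.cases(data[, vars, drop = FALSE])
  keep <- intersect(rownames(data)[complete], tips)
  list(
    data = data[keep, , drop = FALSE],
    tips = keep,
    dropped = setdiff(union(rownames(data), tips), keep)
  )
}

dat <- data.frame(
  gestaLen  = c(28.375, 15.000, 15.500, NA),
  homeRange = c(0.9991117, 0.1120000, NA, NA),
  other     = c(1, NA, 3, 4),  # made-up column, not in the formula
  row.names = c("Tachyglossus_aculeatus", "Ornithorhynchus_anatinus",
                "Mesocricetus_auratus", "Chinchilla_lanigera")
)
tips <- c(rownames(dat), "hypothetical_tip")  # one tip without data

out <- match_sketch(gestaLen ~ homeRange, dat, tips)
print(out$tips)
print(out$dropped)
```

Ornithorhynchus_anatinus is kept even though `other` is NA for it, because `other` is not part of the formula. The two species with missing formula variables and the tip without data end up in `dropped`.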