Sensitivity Analysis for Comparative Methods

Functions for sensitivity analysis

The sensiPhy package provides simple functions to perform sensitivity analysis in phylogenetic comparative methods. It uses several simulation methods to estimate the impact of different types of uncertainty on PGLS models:

  1. Species Sampling uncertainty (sample size; influential species and clades)
  2. Phylogenetic uncertainty
  3. Data uncertainty (intraspecific variation and measurement error)

sensiPhy functions use a common syntax that combines the type of uncertainty and the type of model: uncertainty.type_phylm (for linear regressions) or uncertainty.type_phyglm (for logistic regressions).

Function        Uncertainty                Model
samp_phylm      sample size                linear regression
samp_phyglm     sample size                logistic regression
influ_phylm     influential species        linear regression
influ_phyglm    influential species        logistic regression
clade_phylm     influential clades         linear regression
clade_phyglm    influential clades         logistic regression
tree_phylm      phylogenetic uncertainty   linear regression
tree_phyglm     phylogenetic uncertainty   logistic regression
intra_phylm     data uncertainty           linear regression
intra_phyglm    data uncertainty           logistic regression

All of these functions share the same mandatory arguments: function(formula, data, phy, ...)

Additional functions

Function        Description
match_dataphy   Match data and phylogeny based on model formula
miss.phylo.d    Calculates the phylogenetic signal for missing data

The following examples describe the basic usage of sensiPhy functions.

Examples:

Loading the package and data

set.seed(1234)
library(sensiPhy)

### Loading data:
data(alien)
data(primates) # see ?alien & ?primates for details about the data.

1. Sampling uncertainty

1.1 Sensitivity analysis for sample size:

The samp_phylm function performs analyses of sensitivity to species sampling by randomly removing species and detecting the effects on parameter estimates in a phylogenetic linear regression.

  • Additional arguments:
    breaks: A vector containing the percentages of species to remove
    times: The number of times species are randomly deleted for each break
# run analysis:
samp <- samp_phylm(log(gestaLen) ~ log(adultMass), phy = alien$phy[[1]], 
                   data = alien$data, times = 10, track = F)
## Used dataset has  84  species that match data and phylogeny
# You can change the number of repetitions and break intervals:
samp2 <- samp_phylm(log(gestaLen) ~ log(adultMass), phy = alien$phy[[1]], track = F,
                    data = alien$data, times = 100, breaks = c(0.1, 0.2, 0.3, 0.4))
## Used dataset has  84  species that match data and phylogeny
# You can change the phylogenetic model:
samp <- samp_phylm(log(gestaLen) ~ log(adultMass), phy = alien$phy[[1]], 
                   data = alien$data, model = "kappa", track = F)
## Used dataset has  84  species that match data and phylogeny
# Check results:
knitr::kable(summary(samp))
## 150 simulations saved, see output$samp.model.estimates to acess all simulations
% Species Removed   % Significant Intercepts   Mean Intercept Change (%)   Mean sDFintercept   % Significant Slopes   Mean Slope Change (%)   Mean sDFslope
10                  100                        1.966667                    -0.1970550          100                    3.686667                0.2384257
20                  100                        4.103333                    -0.1311984          100                    7.776667                0.3018753
30                  100                        5.003333                    -0.4332537          100                    9.896667                0.6141183
40                  100                        6.873333                    -0.3434639          100                    13.570000               0.7573968
50                  100                        9.203333                    -0.5721201          100                    18.300000               1.3674361
# Visual diagnostics
sensi_plot(samp2)

# You can specify which graph and parameter ("slope" or "intercept") to print: 
sensi_plot(samp2, graphs = 1)

sensi_plot(samp2, param = "intercept")
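
To make the resampling idea behind samp_phylm concrete, here is a minimal base-R sketch. It is not the package's implementation: it uses ordinary lm() on simulated data with no phylogeny, and the names breaks_pct, full_slope and res are hypothetical. It only illustrates the loop of removing a given percentage of species at random, refitting, and summarising the change in the slope.

```r
# Conceptual sketch only: ordinary least squares on simulated data, no tree.
set.seed(1)
n <- 80
dat <- data.frame(x = rnorm(n))
dat$y <- 2 + 0.5 * dat$x + rnorm(n, sd = 0.3)

# Slope estimated from the full data set.
full_slope <- unname(coef(lm(y ~ x, data = dat))["x"])

breaks_pct <- c(10, 30, 50)  # percentage of rows ("species") to remove
times <- 20                  # random deletions per break

res <- do.call(rbind, lapply(breaks_pct, function(b) {
  n_drop <- round(nrow(dat) * b / 100)
  # Refit the model after each random deletion and keep the slope.
  slopes <- replicate(times, {
    reduced <- dat[-sample(nrow(dat), n_drop), ]
    unname(coef(lm(y ~ x, data = reduced))["x"])
  })
  data.frame(
    removed_pct = b,
    mean_change_pct = mean(abs(slopes - full_slope) / abs(full_slope)) * 100
  )
}))
print(res)
```

With more species removed, the refitted slopes are expected to scatter more widely around the full-data estimate, which is the pattern the summary table reports as the mean percentage change.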



1.2 Sensitivity analysis for influential species:

The function influ_phylm performs a leave-one-out deletion analysis for phylogenetic linear regression and detects influential species.

# run analysis:
influ <- influ_phylm(log(gestaLen) ~ log(adultMass), phy = alien$phy[[1]], 
                     data = alien$data, track = F)
## Used dataset has  84  species that match data and phylogeny
# To check summary results:
summary(influ)
## $`Influential species for the Slope`
## [1] "Ovis_ammon"         "Ovis_aries"         "Equus_hemionus"    
## [4] "Camelus_bactrianus" "Axis_porcinus"      "Axis_axis"         
## 
## $`Slope Estimates`
##      Species removed     Slope      DFslope Change(%)         Pval
## 1         Ovis_ammon 0.1585335  0.011033226       7.5 1.465612e-10
## 2         Ovis_aries 0.1567383  0.009238004       6.3 2.949203e-10
## 3     Equus_hemionus 0.1384028 -0.009097490       6.2 3.326380e-09
## 4 Camelus_bactrianus 0.1400746 -0.007425680       5.0 2.741126e-09
## 5      Axis_porcinus 0.1546280  0.007127723       4.8 3.792296e-10
## 6          Axis_axis 0.1531949  0.005694542       3.9 4.646006e-10
## 
## $`Influential species for the Intercept`
## [1] "Ornithorhynchus_anatinus" "Ovis_ammon"              
## [3] "Ovis_aries"               "Axis_porcinus"           
## [5] "Sorex_cinereus"          
## 
## $`Intercept Estimates`
##            Species removed Intercept DFintercept Change(%)         Pval
## 1 Ornithorhynchus_anatinus  2.459667  0.09744173       4.1 4.298004e-10
## 2               Ovis_ammon  2.275301 -0.08692386       3.7 2.181632e-09
## 3               Ovis_aries  2.289397 -0.07282803       3.1 2.279349e-09
## 4            Axis_porcinus  2.306031 -0.05619439       2.4 1.890367e-09
## 5           Sorex_cinereus  2.412289  0.05006331       2.1 7.463555e-10
# Most influential species
influ$influential.species
## $influ.sp.slope
## [1] "Ovis_ammon"         "Ovis_aries"         "Equus_hemionus"    
## [4] "Camelus_bactrianus" "Axis_porcinus"      "Axis_axis"         
## 
## $influ.sp.intercept
## [1] "Ornithorhynchus_anatinus" "Ovis_ammon"              
## [3] "Ovis_aries"               "Axis_porcinus"           
## [5] "Sorex_cinereus"
# Visual diagnostics
sensi_plot(influ)

# Check most influential species on the original regression plot:
sensi_plot(influ, graphs = 2)
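
The columns of the summary can be read as follows: DFslope is the slope estimated without a species minus the slope from the full data, and Change(%) is the absolute value of that difference relative to the full-data slope. For example, removing Ovis_ammon gives a slope of 0.1585335 with DFslope 0.011033226, which implies a full-data slope of about 0.1475 and a change of 0.011033 / 0.1475, or roughly 7.5%. The base-R sketch below reproduces this bookkeeping with ordinary lm() on simulated data. It is an illustration, not the package code; the standardisation by sd() and the cutoff of 2 are assumptions made for the example.

```r
# Conceptual sketch only: leave-one-out deletion with lm(), no phylogeny.
set.seed(2)
n <- 30
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 0.4 * dat$x + rnorm(n, sd = 0.2)
rownames(dat) <- paste0("sp_", seq_len(n))
# Make one observation deliberately unusual so that it stands out.
dat["sp_1", c("x", "y")] <- c(3, 0.5)

full_slope <- unname(coef(lm(y ~ x, data = dat))["x"])

# Refit the model n times, each time leaving one row out.
loo_slope <- vapply(seq_len(n), function(i) {
  unname(coef(lm(y ~ x, data = dat[-i, ]))["x"])
}, numeric(1))

loo <- data.frame(
  removed = rownames(dat),
  slope = loo_slope,
  DFslope = loo_slope - full_slope,
  change_pct = abs(loo_slope - full_slope) / abs(full_slope) * 100
)
# Assumed rule for this sketch: standardise DFslope and flag |value| > 2.
loo$sDFslope <- loo$DFslope / sd(loo$DFslope)
cutoff <- 2
influential <- loo$removed[abs(loo$sDFslope) > cutoff]

print(head(loo[order(-loo$change_pct), ], 3))
print(influential)
```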



1.3 Sensitivity analysis for influential clades (Primates data):

The function clade_phylm estimates how the parameter estimates of a phylogenetic linear regression change when whole clades are removed from the analysis.

  • Additional arguments:
    clade.col: The name of the column in the provided data frame that specifies the clades (a character vector with clade names).
    n.species: Minimum number of species a clade must contain to be included in the leave-one-out deletion analysis. Default is 5.
    times: The number of repetitions for the randomization test
# Original data set:
knitr::kable(head(primates$data))
                          family            adultMass   sexMaturity   homeRange
Cercopithecus_diana       Cercopithecidae   4370        1977.08       1.27
Cercopithecus_neglectus   Cercopithecidae   5320        1809.79       0.06
Cercopithecus_pogonias    Cercopithecidae   3580        1472.16       1.17
Cercopithecus_cephus      Cercopithecidae   3440        1320.08       0.24
Cercopithecus_ascanius    Cercopithecidae   3540        1490.41       0.17
Cercopithecus_mitis       Cercopithecidae   5040        1779.37       0.11
# run analysis:
clade <- clade_phylm(log(sexMaturity) ~ log(adultMass), phy = primates$phy[[1]],
                     data = primates$data, clade.col = "family", times = 99, track = F)
## Used dataset has  95  species that match data and phylogeny
# To check summary results and most influential clades:
summary(clade)
## $Slope
##     clade.removed N.species     slope     DFslope Change (%)         Pval
## 3 Cercopithecidae        32 0.3075424  0.05706804       22.8 5.742404e-11
## 2         Cebidae        19 0.2199214 -0.03055296       12.2 7.285954e-07
## 1  Callitrichidae         9 0.2260405 -0.02443393        9.8 5.285826e-08
## 4       Lemuridae         8 0.2581974  0.00772303        3.1 1.323367e-09
##   m.null.slope Pval.randomization
## 3    0.2799503         0.18181818
## 2    0.2644762         0.00000000
## 1    0.2580067         0.01010101
## 4    0.2553345         0.33333333
## 
## $Intercept
##     clade.removed N.species intercept DFintercept Change (%)         Pval
## 3 Cercopithecidae        32  4.496911 -0.38912425        8.0 5.742404e-11
## 2         Cebidae        19  5.051216  0.16518056        3.4 7.285954e-07
## 1  Callitrichidae         9  5.072203  0.18616707        3.8 5.285826e-08
## 4       Lemuridae         8  4.829880 -0.05615518        1.1 1.323367e-09
##   m.null.intercept Pval.randomization
## 3         4.652871          0.2323232
## 2         4.769680          0.0000000
## 1         4.825502          0.0000000
## 4         4.847405          0.3535354
# Visual diagnostics for clade removal:
sensi_plot(clade, "Cercopithecidae")

sensi_plot(clade, "Cebidae")
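
The randomization columns (m.null.slope and Pval.randomization) compare the effect of removing a clade with the effect of removing the same number of randomly chosen species. The following base-R sketch shows one plausible way to set up such a test. It uses lm() on simulated data with hypothetical clades A, B and C, and the two-sided p-value used here is an assumption for illustration; it may differ from the formula the package applies.

```r
# Conceptual sketch only: clade removal versus random removal, no phylogeny.
set.seed(3)
n <- 60
dat <- data.frame(
  clade = rep(c("A", "B", "C"), times = c(30, 20, 10)),
  x = rnorm(n)
)
dat$y <- 1 + 0.3 * dat$x + rnorm(n, sd = 0.2)

slope_of <- function(d) unname(coef(lm(y ~ x, data = d))["x"])
full_slope <- slope_of(dat)

clade_test <- function(clade_name, times = 99) {
  in_clade <- dat$clade == clade_name
  k <- sum(in_clade)
  # Observed effect: refit without the clade.
  obs <- slope_of(dat[!in_clade, ])
  # Null distribution: drop k randomly chosen species instead of the clade.
  null <- replicate(times, slope_of(dat[-sample(nrow(dat), k), ]))
  data.frame(
    clade = clade_name,
    n_species = k,
    slope = obs,
    DFslope = obs - full_slope,
    m_null_slope = mean(null),
    # Share of random removals that shift the slope at least as much.
    p_random = mean(abs(null - full_slope) >= abs(obs - full_slope))
  )
}

res <- do.call(rbind, lapply(c("A", "B", "C"), clade_test))
print(res)
```

Comparing against random removals of equal size separates the effect of a clade's identity from the effect of simply having fewer species in the model.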



2. Phylogenetic uncertainty:

The function tree_phylm performs phylogenetic linear regression while evaluating uncertainty in tree topology.

2.1 Sensitivity analysis for phylogenetic trees:

# This analysis needs a multiPhylo object:
class(alien$phy)
## [1] "multiPhylo"
alien$phy
## 101 phylogenetic trees
# run PGLS accounting for phylogenetic uncertainty:
tree <- tree_phylm(log(gestaLen) ~ log(adultMass), phy = alien$phy, 
                   data = alien$data, times = 100, track = F)
## Used dataset has  84  species that match data and phylogeny
# To check summary results:
knitr::kable(summary(tree))
                 mean    CI_low   CI_high
intercept        2.326   2.319    2.334
se.intercept     0.342   0.342    0.343
pval.intercept   0.000   0.000    0.000
slope            0.152   0.151    0.153
se.slope         0.022   0.022    0.022
pval.slope       0.000   0.000    0.000
# Visual diagnostics
sensi_plot(tree)
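
The logic of refitting one regression under many candidate trees can be sketched in base R without any phylogenetic package. In the example below each tree is replaced by a hypothetical covariance matrix among species (species in the same made-up clade are correlated by rho), and the generalised least squares estimator is computed by hand. All names and the covariance structure are assumptions for illustration; the sketch does not reproduce how tree_phylm samples trees or builds its confidence intervals.

```r
# Conceptual sketch only: GLS under several hypothetical covariance matrices.
set.seed(5)
n <- 24
group <- rep(1:4, each = 6)                  # four made-up clades
same_group <- outer(group, group, "==") * 1  # 1 if two species share a clade
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.3)
X <- cbind(1, x)

# GLS estimator: (X' V^-1 X)^-1 X' V^-1 y
gls_coef <- function(V) {
  Vi <- solve(V)
  drop(solve(t(X) %*% Vi %*% X, t(X) %*% Vi %*% y))
}

# Each rho stands in for one candidate tree.
rhos <- seq(0.1, 0.8, by = 0.1)
est <- t(sapply(rhos, function(rho) {
  V <- (1 - rho) * diag(n) + rho * same_group
  gls_coef(V)
}))
colnames(est) <- c("intercept", "slope")

# Summarise how much the estimates move across the candidate "trees".
print(round(rbind(mean = colMeans(est),
                  min = apply(est, 2, min),
                  max = apply(est, 2, max)), 3))
```

A narrow spread across trees, as in the tree_phylm summary above, suggests that the conclusions do not depend much on which tree is used.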



3. Data uncertainty:

The function intra_phylm performs phylogenetic linear regression while evaluating intraspecific variability.

3.1 Sensitivity analysis for intraspecific variation and measurement error:

# run PGLS accounting for intraspecific variation:
intra <- intra_phylm(gestaLen ~ adultMass, phy = alien$phy[[1]], track = F, 
                     data = alien$data, Vy = "SD_gesta", Vx = "SD_mass",
                     times = 100, x.transf = log, y.transf = log)
## Used dataset has  84  species that match data and phylogeny
# To check summary results:
knitr::kable(summary(intra))
                 mean    CI_low   CI_high
intercept        2.390   2.359    2.421
se.intercept     0.368   0.364    0.373
pval.intercept   0.000   0.000    0.000
slope            0.144   0.140    0.148
se.slope         0.023   0.022    0.023
pval.slope       0.000   0.000    0.000
# Visual diagnostics
sensi_plot(intra)
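
One way to picture this analysis is as repeated resampling of each species' trait value from its mean and standard deviation, followed by transformation and refitting. The base-R sketch below follows that idea with lm() on simulated data. The normal distribution, the 5% standard deviations, the percentile interval and all variable names are assumptions for illustration and are not taken from the package.

```r
# Conceptual sketch only: resample trait values within species, then refit.
set.seed(4)
n <- 40
mass_mean <- exp(rnorm(n, mean = 7, sd = 1))
dat <- data.frame(
  mass = mass_mean,
  gest = exp(2 + 0.15 * log(mass_mean) + rnorm(n, sd = 0.1))
)
dat$sd_mass <- 0.05 * dat$mass  # hypothetical within-species SDs
dat$sd_gest <- 0.05 * dat$gest

times <- 100
est <- t(replicate(times, {
  # Draw one plausible value per species, then log-transform and refit.
  x <- rnorm(n, mean = dat$mass, sd = dat$sd_mass)
  y <- rnorm(n, mean = dat$gest, sd = dat$sd_gest)
  coef(lm(log(y) ~ log(x)))
}))
colnames(est) <- c("intercept", "slope")

# Summarise the estimates across the resampled data sets.
summary_tab <- data.frame(
  mean = colMeans(est),
  low = apply(est, 2, quantile, probs = 0.025),
  high = apply(est, 2, quantile, probs = 0.975)
)
print(round(summary_tab, 3))
```

Note that the transformation is applied after the values are drawn, which mirrors why the example above passes the untransformed formula gestaLen ~ adultMass together with x.transf and y.transf.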



4. Additional functions

4.1 Phylogenetic signal for missing data

The function miss.phylo.d calculates the D statistic (Fritz & Purvis 2010), a measure of phylogenetic signal, for missing data. Missingness is recoded into a binary variable (1 = missing, 0 = not missing).
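
The recoding step can be illustrated in base R. The sketch below uses the homeRange values of the first six species of the alien data set, as printed by head() in the example that follows; the variable names home_range, missing_flag and pct_missing are hypothetical and the code is not the package's own.

```r
# Recode missingness into a binary variable: 1 = missing, 0 = not missing.
home_range <- c(0.9991117, 0.1120000, 0.0044500, NA, NA, NA)
names(home_range) <- c("Tachyglossus_aculeatus", "Ornithorhynchus_anatinus",
                       "Ondatra_zibethicus", "Mesocricetus_auratus",
                       "Castor_canadensis", "Chinchilla_lanigera")

missing_flag <- as.integer(is.na(home_range))
names(missing_flag) <- names(home_range)

# Percentage of missing values in this six-species subset.
pct_missing <- round(100 * mean(is.na(home_range)), 2)

print(missing_flag)
print(pct_missing)
```

It is this 0/1 variable, rather than the trait itself, whose phylogenetic structure the D statistic measures.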

# Load caper:
library(caper)
# Load data
data(alien)
knitr::kable(head(alien$data))
                              family             adultMass   gestaLen  homeRange    SD_mass       SD_gesta   SD_range
Tachyglossus_aculeatus        Tachyglossidae     4020.767    28.375    0.9991117    1218.29240    4.1995     0.7922419
Ornithorhynchus_anatinus      Ornithorhynchidae  1458.208    15.000    0.1120000    180.81779     2.1600     0.0430000
Ondatra_zibethicus            Cricetidae         1135.014    27.100    0.0044500    388.17479     3.3333     0.0000000
Mesocricetus_auratus          Cricetidae         97.125      15.500    NA           12.52913      0.4960     NA
Castor_canadensis             Castoridae         18085.634   110.000   NA           2875.61581    13.0900    NA
Chinchilla_lanigera           Chinchillidae      452.575     NA        NA           40.73175      NA         NA
data <- alien$data
phy <- alien$phy[[1]]

# Test phylogenetic signal for missing data:
homeNAsig <- miss.phylo.d(data, phy, binvar = homeRange)
## [1] "Percentage of missing data in traits:"
##    family adultMass  gestaLen homeRange   SD_mass  SD_gesta  SD_range 
##      0.00      2.13      9.57     44.68      2.13      9.57     44.68
print(homeNAsig)
## 
## Calculation of D statistic for the phylogenetic structure of a binary variable
## 
##   Data :  data
##   Binary variable :  homeRange
##   Counts of states:  0 = 52
##                      1 = 42
##   Phylogeny :  phy
##   Number of permutations :  1000
## 
## Estimated D :  0.9469183
## Probability of E(D) resulting from no (random) phylogenetic structure :  0.336
## Probability of E(D) resulting from Brownian phylogenetic structure    :  0
plot(homeNAsig)

massNAsig <- miss.phylo.d(data, phy, binvar = adultMass)
## [1] "Percentage of missing data in traits:"
##    family adultMass  gestaLen homeRange   SD_mass  SD_gesta  SD_range 
##      0.00      2.13      9.57     44.68      2.13      9.57     44.68
print(massNAsig)
## 
## Calculation of D statistic for the phylogenetic structure of a binary variable
## 
##   Data :  data
##   Binary variable :  adultMass
##   Counts of states:  0 = 92
##                      1 = 2
##   Phylogeny :  phy
##   Number of permutations :  1000
## 
## Estimated D :  0.8647909
## Probability of E(D) resulting from no (random) phylogenetic structure :  0.362
## Probability of E(D) resulting from Brownian phylogenetic structure    :  0.269
plot(massNAsig)

4.2 Combine data and phylogeny automatically

The function match_dataphy combines phylogeny and data to ensure that the tips in the phylogeny match the data and that observations with missing values are removed.

This function uses all variables provided in the ‘formula’ to match data and phylogeny. To avoid cropping the full dataset, ‘match_dataphy’ searches for NA values only on variables provided by formula. Missing values on other variables, not included in ‘formula’, will not be removed from data.
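
The matching rule described above can be sketched in a few lines of base R. The helper match_sketch and the toy data are hypothetical, and a plain character vector of tip labels stands in for a real tree; the sketch only illustrates the logic of checking for NA values in the formula variables and keeping the species shared with the tree.

```r
# Conceptual sketch only: match rows of a data frame to a vector of tip labels.
match_sketch <- function(formula, data, tip_labels) {
  vars <- all.vars(formula)
  # Drop rows with NA only in the variables named in the formula.
  complete <- data[complete.cases(data[, vars, drop = FALSE]), , drop = FALSE]
  # Keep species present both in the tree and in the complete data.
  shared <- intersect(tip_labels, rownames(complete))
  list(
    data = complete[shared, , drop = FALSE],
    tips = shared,
    dropped = setdiff(union(tip_labels, rownames(data)), shared)
  )
}

dat <- data.frame(
  mass = c(10, 20, NA, 40),
  gest = c(1.0, NA, 3.0, 4.0),
  range = c(NA, 0.2, 0.3, 0.4),
  row.names = c("sp_a", "sp_b", "sp_c", "sp_d")
)
tips <- c("sp_a", "sp_b", "sp_d", "sp_e")

out <- match_sketch(gest ~ mass, dat, tips)
print(out$data)
print(out$dropped)
```

In this toy case sp_a is kept even though its range value is missing, because range is not part of the formula.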

# Load data:
data(alien)
# Match data and phy based on model formula:
comp.data <- match_dataphy(gestaLen ~ homeRange, data = alien$data, alien$phy[[1]])
## Used dataset has  49  species that match data and phylogeny
# With a `multiPhylo` object:
comp.data2 <- match_dataphy(homeRange ~ homeRange, data = alien$data, alien$phy)
## Used dataset has  52  species that match data and phylogeny
# Check combined data:
knitr::kable(comp.data$data)
                              family             adultMass   gestaLen  homeRange    SD_mass       SD_gesta   SD_range
Tachyglossus_aculeatus        Tachyglossidae     4020.767    28.375    0.9991117    1218.29240    4.199500   0.7922419
Ornithorhynchus_anatinus      Ornithorhynchidae  1458.208    15.000    0.1120000    180.81779     2.160000   0.0430000
Ondatra_zibethicus            Cricetidae         1135.014    27.100    0.0044500    388.17479     3.333300   0.0000000
Myocastor_coypus              Myocastoridae      6135.768    131.737   0.0376000    546.08335     3.425162   0.0156771
Marmota_monax                 Sciuridae          3747.182    31.600    0.0333582    528.35266     2.275200   0.0429752
Tamiasciurus_hudsonicus       Sciuridae          209.452     35.724    0.0117357    23.66808      3.072264   0.0072091
Sciurus_carolinensis          Sciuridae          538.715     44.100    0.0270056    49.02307      0.220500   0.0470268
Sciurus_vulgaris              Sciuridae          649.843     40.545    0.0818883    223.54599     3.649050   0.1449527
Glis_glis                     Gliridae           146.000     27.750    0.0030500    21.17000      2.747250   0.0025500
Lepus_arcticus                Leporidae          4166.869    52.000    18.9500050   145.84042     0.988000   16.0499950
Lepus_timidus                 Leporidae          3048.556    50.133    1.0090000    408.50650     0.100266   0.1200000
Lepus_europaeus               Leporidae          3934.287    41.200    1.1241550    605.88020     1.606800   1.2592213
Daubentonia_madagascariensis  Daubentoniidae     2634.074    165.750   0.2893750    213.35999     5.304000   0.3792628
Ovis_canadensis               Bovidae            81829.394   177.088   22.9816067   35513.95700   5.843904   4.4763964
Capra_hircus                  Bovidae            45689.284   158.000   176.8260900  23027.39914   8.374000   132.8704278
Ammotragus_lervia             Bovidae            97562.268   152.227   11.0133367   39902.96761   18.571694  9.3912932
Oreamnos_americanus           Bovidae            73490.939   174.089   22.9285800   26456.73804   14.971654  1.0514030
Bos_taurus                    Bovidae            590397.500  280.500   17.9750000   295198.75000  2.524500   5.7347079
Alces_alces                   Cervidae           461860.313  235.840   182.3697044  115465.07825  9.669440   226.7850513
Odocoileus_hemionus           Cervidae           87031.783   204.836   14.5387520   25326.24885   3.277376   16.2703626
Odocoileus_virginianus        Cervidae           78331.245   205.069   2.6916700    28199.24820   14.764968  0.2033300
Axis_axis                     Cervidae           68752.082   230.222   1.8092367    13819.16848   11.971544  1.3112559
Axis_porcinus                 Cervidae           37015.550   221.938   0.7000000    5108.14590    19.308606  0.1000000
Cervus_nippon                 Cervidae           66598.688   222.553   2.9257450    38427.44298   8.457014   2.0742550
Dama_dama                     Cervidae           59049.963   229.623   2.2787525    15884.44005   9.184920   2.9084486
Moschus_moschiferus           Moschidae          13357.143   171.000   0.8516667    1215.50001    27.189000  0.9905414
Antilocapra_americana         Antilocapridae     46892.188   247.944   10.9361500   6986.93601    11.901312  5.2666490
Sus_scrofa                    Suidae             82160.027   117.735   5.7871275    31138.65023   6.004485   6.1376236
Mustela_putorius              Mustelidae         952.942     41.063    0.8729158    188.68252     0.739134   0.8328026
Mustela_erminea               Mustelidae         206.598     54.519    0.2205200    98.13405      19.408764  0.3400689
Mustela_nivalis               Mustelidae         90.777      37.891    0.3641075    37.40012      4.433247   0.5252465
Mephitis_mephitis             Mephitidae         2573.499    64.717    2.0949033    707.71222     2.653397   1.4576314
Procyon_lotor                 Procyonidae        7232.221    63.911    7.1197650    2256.45295    2.045152   3.8152350
Canis_latrans                 Canidae            12197.470   61.821    39.7382275   2110.16231    1.978272   37.1539830
Nyctereutes_procyonoides      Canidae            4636.990    61.450    1.7072350    978.40489     2.396550   0.8172350
Vulpes_vulpes                 Canidae            5077.005    52.343    7.1448013    964.63095     3.978068   16.6391120
Vulpes_lagopus                Canidae            4260.696    52.413    19.6376689   1218.55906    1.048260   12.0364612
Herpestes_javanicus           Herpestidae        720.714     47.233    0.2462257    178.01636     2.645048   0.2516805
Desmana_moschata              Talpidae           428.875     46.250    0.0045000    49.32062      1.248750   0.0000000
Erinaceus_europaeus           Erinaceidae        865.896     35.556    0.1438589    282.28210     2.417808   0.1420654
Sorex_cinereus                Soricidae          4.198       18.000    0.0046864    0.44079       0.000000   0.0000000
Trichosurus_vulpecula         Phalangeridae      2665.563    17.660    0.0450433    455.81127     1.130240   0.0108214
Bettongia_gaimardi            Potoroidae         1680.472    20.880    0.4533350    18.48519      0.730800   0.0033350
Macropus_rufogriseus          Macropodidae       16242.858   30.182    0.2436667    2192.78583    3.410566   0.1470518
Macropus_robustus             Macropodidae       24879.172   34.333    1.6569700    7861.81835    0.480662   0.9497399
Macropus_giganteus            Macropodidae       33630.005   36.385    1.8885950    15268.02227   0.254695   2.4557433
Phascolarctos_cinereus        Phascolarctidae    7752.538    32.775    0.0150000    2209.47333    3.080850   0.0075000
Perameles_gunnii              Peramelidae        810.000     12.375    0.0302500    59.94000      0.210375   0.0124574
Didelphis_marsupialis         Didelphidae        1182.023    13.600    0.4161250    177.30345     0.734400   0.4894138
# Check phy:
plot(comp.data$phy)

# See species dropped from phy or data:
comp.data$dropped
##  [1] "Mesocricetus_auratus"     "Castor_canadensis"       
##  [3] "Hystrix_brachyura"        "Chinchilla_lanigera"     
##  [5] "Marmota_bobak"            "Tamias_townsendii"       
##  [7] "Atlantoxerus_getulus"     "Sciurus_niger"           
##  [9] "Sciurus_aureogaster"      "Oryctolagus_cuniculus"   
## [11] "Macaca_arctoides"         "Macaca_mulatta"          
## [13] "Macaca_fascicularis"      "Ovis_ammon"              
## [15] "Ovis_aries"               "Hemitragus_jemlahicus"   
## [17] "Capra_ibex"               "Rupicapra_rupicapra"     
## [19] "Ovibos_moschatus"         "Gazella_subgutturosa"    
## [21] "Saiga_tatarica"           "Bubalus_bubalis"         
## [23] "Tragelaphus_strepsiceros" "Capreolus_capreolus"     
## [25] "Rangifer_tarandus"        "Rusa_timorensis"         
## [27] "Cervus_elaphus"           "Rusa_unicolor"           
## [29] "Camelus_bactrianus"       "Equus_hemionus"          
## [31] "Mustela_sibirica"         "Mustela_lutreola"        
## [33] "Neovison_vison"           "Nasua_nasua"             
## [35] "Lycalopex_griseus"        "Felis_catus"             
## [37] "Pseudocheirus_peregrinus" "Bettongia_lesueur"       
## [39] "Macropus_eugenii"         "Macropus_parma"          
## [41] "Petrogale_lateralis"      "Petrogale_penicillata"   
## [43] "Thylogale_billardierii"   "Potorous_tridactylus"    
## [45] "Lasiorhinus_latifrons"