Available prediction performance metrics and indices

Adrian Correndo

2022-05-11

Performance metrics available in metrica

The metrica package contains more than 40 functions. Two arguments are always required: observed (Oi; a.k.a. actual, measured, truth, target) and predicted (Pi; a.k.a. simulated, fitted) values. There is also an optional data argument that lets you point to an existing data frame containing both the observed and predicted vectors.

Some functions, such as the slope of the linear regression describing the bivariate scatter, also require defining the axis orientation. The functions currently included cover "regression error" metrics (i.e. prediction performance for continuous variables). Classification error metrics are coming soon.

Always keep in mind that predicted values should come from out-of-bag samples (i.e. not seen by the training set) to avoid overestimating prediction performance.
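
As a minimal sketch of the workflow described above, the snippet below builds a toy predicted-observed data set and computes a few of the metrics by hand from their standard definitions. The commented lines show an assumed call pattern for the package functions (named after the abbreviations in the list below, with obs and pred arguments); check each function's help file for the exact signature.

    # Toy predicted-observed data (in practice, predictions should be out-of-bag)
    set.seed(1)
    obs  <- rnorm(50, mean = 100, sd = 10)
    pred <- obs + rnorm(50, mean = 2, sd = 5)

    # Standard definitions computed by hand
    mbe  <- mean(obs - pred)            # Mean Bias Error (negative = overestimation)
    mae  <- mean(abs(obs - pred))       # Mean Absolute Error
    rmse <- sqrt(mean((obs - pred)^2))  # Root Mean Squared Error

    # Assumed call pattern for the package functions (verify against the help files):
    # library(metrica)
    # MBE(obs = obs, pred = pred)
    # RMSE(data = data.frame(obs, pred), obs = obs, pred = pred)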

The metrics and indices currently available are listed below, with their abbreviation, full name, and interpretation details:
  A. RSS (Residual sum of squares, a.k.a. sum of squared errors): The sum of squared differences between predicted and observed values. It is the basis of many error metrics that use the squared scale, such as the MSE.

  B. TSS (Total sum of squares): The sum of squared differences between the observations and their mean. It is used as a reference error, for example, to estimate the explained variance.

  C. var_u (Uncorrected sample variance): The mean of the squared differences between the values of a variable x and its mean (divided by n, not n-1).

  D. uSD (Uncorrected sample standard deviation): The square root of the mean of the squared differences between the values of a variable x and its mean (divided by n, not n-1).
  1. B0 (Intercept of the SMA regression): SMA is a symmetric linear regression (its results and interpretation are invariant to axis orientation), recommended over OLS regression (the classic linear model, whose results vary with the axis orientation) to describe the bivariate scatter. B0 can be used, along with B1, to test agreement (H0: B0 = 0, B1 = 1). Warton et al. (2006).

  2. B1 (Slope of the SMA regression): As above, SMA is a symmetric linear regression recommended over OLS regression to describe the bivariate scatter. B1 can be used to test isometry of the predicted-observed (PO) scatter (H0: B1 = 1). B1 also represents the ratio of standard deviations (So and Sp); see the SMA sketch after this list. Warton et al. (2006).

  3. r (Pearson's correlation coefficient): Strength of the linear association between P and O. However, it measures precision but not accuracy. Kirch (2008).

  4. R2 (Coefficient of determination): Strength of the linear association between P and O. As with r, it measures precision but not accuracy.

  5. Xa (Accuracy coefficient): Measures accuracy. Used to adjust the precision measured by r when estimating agreement.

  6. CCC (Concordance correlation coefficient): Tests agreement. It combines a precision (r) and an accuracy (Xa) component. Easy to interpret. Lin (1989).
  7. MAE (Mean Absolute Error): Measures both lack of accuracy and lack of precision on the absolute scale. It keeps the same units as the response variable. Less sensitive to outliers than the MSE or RMSE. Willmott & Matsuura (2005).

  8. RMAE (Relative Mean Absolute Error): The MAE normalized by the mean of the observations.

  9. MAPE (Mean Absolute Percentage Error): Expressed in percentage units (scale-independent). Easy to explain and to compare performance across models with different response variables. Asymmetric and unbounded.

  10. SMAPE (Symmetric Mean Absolute Percentage Error): Tackles the asymmetry issues of MAPE and has lower (0%) and upper (200%) bounds. Makridakis (1993).

  11. RAE (Relative Absolute Error): Normalizes the MAE with respect to the total absolute error. Lower bound of 0 (perfect fit) and no upper bound (infinity).

  12. RSE (Relative Squared Error): The proportion of the total sum of squares that corresponds to the differences between predictions and observations (the residual sum of squares).

  13. MBE (Mean Bias Error): The main bias error metric, also known as average error. Same units as the response variable. Related to the difference between the means of the predictions and the observations. Negative values indicate overestimation; positive values indicate underestimation. Unbounded. Janssen & Heuberger (1995).

  14. PBE (Percentage Bias Error): Useful to identify systematic over- or under-prediction. Expressed in percentage units. As with the MBE, negative values indicate overestimation and positive values indicate underestimation. Unbounded. Gupta et al. (1999).

  15. PAB (Percentage Additive Bias): The percentage of the MSE related to systematic additive issues in the predictions. Related to the difference between the means of predictions and observations.

  16. PPB (Percentage Proportional Bias): The percentage of the MSE related to systematic proportionality issues in the predictions. Related to the slope of the regression line describing the bivariate scatter.
  17. MSE (Mean Squared Error): Comprises both accuracy and precision. Highly sensitive to outliers.

  18. RMSE (Root Mean Squared Error): Comprises both precision and accuracy, and has the same units as the variable of interest. Very sensitive to outliers.

  19. RRMSE (Relative Root Mean Squared Error): The RMSE normalized by the mean of the observations.

  20. RSR (Root Mean Standard Deviation Ratio): The RMSE normalized by the standard deviation of the observations. Moriasi et al. (2007).

  21. iqRMSE (Inter-quartile Normalized Root Mean Squared Error): The RMSE normalized by the length of the interquartile range of the observations (between the 25th and 75th percentiles).

  22. MLA (Mean Lack of Accuracy): The bias component of the MSE decomposition. Correndo et al. (2021).

  23. MLP (Mean Lack of Precision): The variance component of the MSE decomposition. Correndo et al. (2021).

  24. PLA (Percentage Lack of Accuracy): The percentage of the MSE related to lack of accuracy (systematic differences) in the predictions. Correndo et al. (2021).

  25. PLP (Percentage Lack of Precision): The percentage of the MSE related to lack of precision (unsystematic differences) in the predictions. Correndo et al. (2021).
  26. SB (Squared Bias): The additive bias component of the MSE decomposition. Kobayashi and Salam (2000).

  27. SDSD (Squared difference of standard deviations): The proportional bias component of the MSE decomposition. Kobayashi and Salam (2000).

  28. LCS (Lack of Correlation): The random error component of the MSE decomposition (see the decomposition sketch after this list). Kobayashi and Salam (2000).
  29. Ue (Random error proportion): Estimates the proportion of the total sum of squares related to random error (unsystematic error or variance), following the sum of squares decomposition suggested by Smith and Rose (1995), also known as Theil's partial inequalities.

  30. Uc (Lack of consistency error proportion): Estimates the proportion of the total sum of squares related to lack of consistency (proportional bias), following the sum of squares decomposition suggested by Smith and Rose (1995), also known as Theil's partial inequalities.

  31. Ub (Mean bias error proportion): Estimates the proportion of the total sum of squares related to mean bias, following the sum of squares decomposition suggested by Smith and Rose (1995), also known as Theil's partial inequalities.

  32. NSE (Nash and Sutcliffe's Model Efficiency): Model efficiency based on squared residuals normalized by the variance of the observations. Nash and Sutcliffe (1970).

  33. E1 (Absolute Model Efficiency): Model efficiency. A modification of the NSE that uses absolute residuals instead of squared residuals. Legates and McCabe (1999).

  34. Erel (Relative Model Efficiency): Compared to the NSE, the Erel is suggested to be more sensitive to systematic over- or under-prediction. Krause et al. (2005).

  35. KGE (Kling-Gupta Model Efficiency): Model efficiency with accuracy, precision, and consistency components. Kling et al. (2012).

  36. d (Index of Agreement): Measures accuracy and precision using squared residuals. Dimensionless (normalized), bounded [0, 1], asymmetric. Willmott (1981).

  37. d1 (Modified Index of Agreement): Measures accuracy and precision using absolute residuals. Dimensionless (normalized), bounded [0, 1], asymmetric. Willmott et al. (1985).

  38. d1r (Refined Index of Agreement): Refines d1 by modifying the denominator (the potential error) used to normalize the absolute error. Willmott et al. (2012).
  39. RAC (Robinson's Agreement Coefficient): Measures both accuracy and precision (general agreement). Dimensionless (normalized), bounded [0, 1], symmetric. Robinson (1957; 1959).
  40. AC (Ji and Gallo's Agreement Coefficient): Measures both accuracy and precision (general agreement). Dimensionless (normalized), bounded above at 1 and unbounded below, symmetric. Ji and Gallo (2006).
  41. lambda (Duveiller's Lambda Coefficient): Measures both accuracy and precision. Dimensionless (normalized), bounded [-1, 1], symmetric. Equivalent to the CCC when r is greater than or equal to 0. Duveiller et al. (2016).
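
To make the B0 and B1 entries concrete: under standard major axis (SMA) regression, the slope is the ratio of standard deviations carrying the sign of r, and the intercept follows from the means. The sketch below assumes predictions on the vertical axis and observations on the horizontal axis (the package lets you choose the axis orientation, so the exact parameterization may differ), and sma_fit is a hypothetical helper, not a package function.

    # SMA (standard major axis) line for a predicted-observed scatter,
    # assuming predictions (P) on the vertical axis and observations (O)
    # on the horizontal axis; swap the roles to change the orientation.
    sma_fit <- function(O, P) {
      B1 <- sign(cor(O, P)) * sd(P) / sd(O)  # slope = signed ratio of SDs
      B0 <- mean(P) - B1 * mean(O)           # intercept from the means
      c(B0 = B0, B1 = B1)
    }
    # Perfect agreement corresponds to B0 = 0 and B1 = 1.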
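Similarly, the SB, SDSD, and LCS entries can be checked numerically: the Kobayashi and Salam (2000) identity MSE = SB + SDSD + LCS holds exactly when the uncorrected (divide-by-n) standard deviations of entries C and D are used. The sketch below verifies the identity on simulated data; it illustrates the standard decomposition rather than the package's internal code.

    # Kobayashi & Salam (2000) decomposition: MSE = SB + SDSD + LCS,
    # using uncorrected (divide-by-n) standard deviations (uSD).
    set.seed(2)
    obs  <- rnorm(100, mean = 50, sd = 8)
    pred <- 0.9 * obs + rnorm(100, sd = 4) + 3

    usd  <- function(x) sqrt(mean((x - mean(x))^2))           # uSD (entry D)
    MSE  <- mean((obs - pred)^2)
    SB   <- (mean(obs) - mean(pred))^2                        # additive bias
    SDSD <- (usd(obs) - usd(pred))^2                          # difference in variability
    LCS  <- 2 * usd(obs) * usd(pred) * (1 - cor(obs, pred))   # lack of correlation

    all.equal(MSE, SB + SDSD + LCS)  # TRUE: the components sum to the MSE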


References:

  1. Correndo et al. (2021). Revisiting linear regression to test agreement in continuous predicted-observed datasets. Agric. Syst. 192, 103194.

  2. Duveiller et al. (2016). Revisiting the concept of a symmetric index of agreement for continuous datasets. Sci. Rep. 6, 1-14.

  3. Gupta et al. (1999). Status of automatic calibration for hydrologic models: Comparison with multilevel expert calibration. J. Hydrologic Eng. 4(2): 135-143.

  4. Janssen & Heuberger (1995). Calibration of process-oriented models. Ecol. Modell. 83, 55-66.

  5. Ji & Gallo (2006). An agreement coefficient for image comparison. Photogramm. Eng. Remote Sensing 7, 823–833.

  6. Kling et al. (2012). Runoff conditions in the upper Danube basin under an ensemble of climate change scenarios. J. Hydrol., 424-425, 264-277.

  7. Kirch (2008). Pearson’s Correlation Coefficient. In: Kirch W. (eds) Encyclopedia of Public Health. Springer, Dordrecht.

  8. Krause et al. (2005). Comparison of different efficiency criteria for hydrological model assessment. Adv. Geosci. 5, 89–97.

  9. Kobayashi & Salam (2000). Comparing simulated and measured values using mean squared deviation and its components. Agron. J. 92, 345–352.

  10. Legates & McCabe (1999). Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. Water Resour. Res.

  11. Lin (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics 45 (1), 255–268.

  12. Makridakis (1993). Accuracy measures: theoretical and practical concerns. Int. J. Forecast. 9, 527-529.

  13. Moriasi et al. (2007). Model Evaluation Guidelines for Systematic Quantification of Accuracy in Watershed Simulations. Trans. ASABE 50, 885–900.

  14. Nash & Sutcliffe (1970). River flow forecasting through conceptual models part I - A discussion of principles. J. Hydrol. 10(3), 282-290.

  15. Robinson (1957). The statistical measurement of agreement. Am. Sociol. Rev. 22(1), 17-25.

  16. Robinson (1959). The geometric interpretation of agreement. Am. Sociol. Rev. 24(3), 338-345.

  17. Smith & Rose (1995). Model goodness-of-fit analysis using regression and related techniques. Ecol. Model. 77, 49–64.

  18. Warton et al. (2006). Bivariate line-fitting methods for allometry. Biol. Rev. Camb. Philos. Soc. 81, 259–291.

  19. Willmott (1981). On the validation of models. Phys. Geogr. 2, 184–194.

  20. Willmott et al. (1985). Statistics for the evaluation and comparison of models. J. Geophys. Res. 90, 8995.

  21. Willmott & Matsuura (2005). Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 30, 79–82.

  22. Willmott et al. (2012). A refined index of model performance. Int. J. Climatol. 32, 2088–2094.

  23. Yang et al. (2014). An evaluation of the statistical methods for testing the performance of crop models with observed data. Agric. Syst. 127, 81-89.