The metrica package was developed to visualize and compute the level of agreement between observed ground-truth values and model-derived (e.g., mechanistic or empirical) predictions.
This package is intended to fit into the following workflow:
- the metrica package is used to compute goodness-of-fit and error metrics based on observed and predicted values
- the metrica package is used to visualize model fit and selected fit metrics
This vignette introduces the functionality of the metrica package applied to observed and model-predicted values of wheat grain nitrogen (N) content (in grams of N \(m^{-2}\)).
Let’s begin by loading the packages needed.
library(ggplot2)
library(dplyr)
library(metrica)
Now we load the wheat data set included in the metrica package.
# Load
data(wheat)
# Printing first observations
head(wheat)
#> pred obs
#> 1 2.577314 2.544
#> 2 3.989590 4.831
#> 3 5.645253 6.121
#> 4 13.125101 10.960
#> 5 4.955917 5.767
#> 6 6.687800 8.222
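This data set contains two columns:
- pred: model-predicted wheat grain N content (g N \(m^{-2}\))
- obs: observed wheat grain N content (g N \(m^{-2}\))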
The simplest way to visually assess agreement between observed and predicted values is with a scatterplot.
We can use the function scatter_plot() from the metrica package to create one.
The function requires specifying at least:
- the data frame containing the observed and predicted values (data argument)
- the column of observed values (obs argument)
- the column of predicted values (pred argument)
Besides the points, this function also adds the 1:1 line (solid line) and the linear regression line (dashed line) to the plot.
scatter_plot(data = wheat,
obs = obs,
pred = pred)
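For readers who want to see what such a plot involves, the sketch below rebuilds a comparable figure directly in ggplot2 from the first six rows printed by head(wheat). It is an approximation, not metrica's implementation: the dashed line here is an ordinary least-squares fit (an assumption), which may differ from the regression line that scatter_plot() draws. The name wheat_head is hypothetical.

```r
library(ggplot2)

# First six rows of the wheat data set, copied from the head(wheat) output above
wheat_head <- data.frame(
  pred = c(2.577314, 3.989590, 5.645253, 13.125101, 4.955917, 6.687800),
  obs  = c(2.544, 4.831, 6.121, 10.960, 5.767, 8.222)
)

# Predicted on the y axis vs. observed on the x axis, as with orientation = "PO"
p <- ggplot(wheat_head, aes(x = obs, y = pred)) +
  geom_point() +
  # 1:1 line (solid): points on this line indicate perfect agreement
  geom_abline(intercept = 0, slope = 1, linetype = "solid") +
  # OLS regression line (dashed); assumption: metrica may fit a different line
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = "dashed") +
  labs(x = "Observed", y = "Predicted")

p
```

Points above the 1:1 line are over-predictions and points below it are under-predictions.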
The default behavior of scatter_plot() places the obs column on the x axis and the pred column on the y axis (orientation = "PO"). This can be inverted by changing the argument orientation to "OP":
scatter_plot(data = wheat,
obs = obs,
pred = pred,
orientation = "OP")
The output of the scatter_plot() function is a ggplot2 object that can be further customized:
scatter_plot(data = wheat,
obs = obs,
pred = pred,
orientation = "OP")+
labs(x ="Predicted wheat N content (g N/m2)",
y = "Observed wheat N content (g N/m2)")+
theme_dark()
The Bland-Altman plot is another way of visually assessing observed vs. predicted agreement. It plots the difference between observed and predicted values on the y axis, and the observed values on the x axis:
bland_altman_plot(data = wheat,
obs = obs,
pred = pred)
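The quantities behind this plot can also be computed by hand. The sketch below uses the first six rows printed by head(wheat) and follows the convention stated above (difference = observed minus predicted, plotted against observed values). The mean bias and the 95% limits of agreement (bias ± 1.96 standard deviations of the differences) are the classic Bland-Altman summaries; whether bland_altman_plot() draws them is an assumption not confirmed here.

```r
# First six rows of the wheat data set, copied from the head(wheat) output above
obs  <- c(2.544, 4.831, 6.121, 10.960, 5.767, 8.222)
pred <- c(2.577314, 3.989590, 5.645253, 13.125101, 4.955917, 6.687800)

# y axis of the plot: observed minus predicted; x axis: observed values
diffs <- obs - pred

# Classic Bland-Altman summaries (assumption: not necessarily what metrica draws)
bias <- mean(diffs)                         # mean difference (systematic bias)
loa  <- bias + c(-1.96, 1.96) * sd(diffs)   # lower and upper limits of agreement

bias
loa
```

A positive mean difference indicates that, on average, the model under-predicts these six observations.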
The metrica package contains functions for 41 metrics to assess agreement between observed and predicted values for continuous data (i.e., regression error).
For a list of all the metrics, including their name, definition, details, formula, and function name, please check [here].
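All of the metric functions take the same three arguments as the plotting functions:
- the data frame containing the observed and predicted values (data argument)
- the column of observed values (obs argument)
- the column of predicted values (pred argument)
The user can choose to calculate a single metric, or to calculate all metrics at once.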
To calculate a single metric, call the corresponding metric function.
For example, to calculate \(R^{2}\), we can use the R2() function:
R2(data = wheat,
obs = obs,
pred = pred)
#> [1] 0.8455538
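As a cross-check, this value can be reproduced in base R. The metrics_summary() output later in this vignette reports r = 0.91953997 and R2 = 0.84555376, and 0.91953997^2 ≈ 0.84555376, which suggests that R2 here is the squared Pearson correlation between observed and predicted values. The helper r2_manual() is a hypothetical name used only for illustration, not a metrica function.

```r
# Hypothetical helper: coefficient of determination computed as the squared
# Pearson correlation between observed and predicted values
r2_manual <- function(obs, pred) {
  cor(obs, pred)^2
}

# A moderately correlated example (r = 0.8, so R2 = 0.64)
r2_manual(obs = c(1, 2, 3, 4), pred = c(1, 3, 2, 4))

# A perfectly linear but strongly biased prediction still gives R2 = 1
r2_manual(obs = 1:5, pred = 2 * (1:5) + 1)

# With metrica's data set loaded: r2_manual(wheat$obs, wheat$pred)
```

The second call shows why R2 alone is not enough: it measures linear association, not agreement, so it is usually reported alongside error metrics such as RMSE.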
Similarly, to calculate the root mean squared error (RMSE), we can use the RMSE() function:
RMSE(data = wheat,
obs = obs,
pred = pred)
#> [1] 1.666441
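RMSE is the square root of the mean of the squared differences between observed and predicted values, so it is expressed in the same units as the data (here, g N \(m^{-2}\)). A minimal base-R sketch follows; rmse_manual() is a hypothetical helper, not part of metrica.

```r
# Hypothetical helper: root mean squared error, sqrt(mean((obs - pred)^2))
rmse_manual <- function(obs, pred) {
  sqrt(mean((obs - pred)^2))
}

# Small example: the errors are 0, 0 and -2, so RMSE = sqrt(4 / 3)
rmse_manual(obs = c(1, 2, 3), pred = c(1, 2, 5))

# With metrica's data set loaded: rmse_manual(wheat$obs, wheat$pred)
```

Because the errors are squared before averaging, a single large miss (the third point above) dominates the result; this is what makes RMSE more sensitive to outliers than the mean absolute error.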
The user can also calculate all 41 metrics at once using the function metrics_summary():
metrics_summary(data = wheat,
obs = obs,
pred = pred)
#> Metric Score
#> 1 B0 0.11315564
#> 2 B1 0.95057797
#> 3 r 0.91953997
#> 4 R2 0.84555376
#> 5 Xa 0.99564191
#> 6 CCC 0.91553253
#> 7 MAE 1.32781184
#> 8 RMAE 0.15214665
#> 9 MAPE 17.51424366
#> 10 SMAPE 17.43518492
#> 11 RAE 0.37156585
#> 12 RSE 0.16128874
#> 13 MBE 0.31815953
#> 14 PBE 3.64561486
#> 15 PAB 3.64510277
#> 16 PPB 1.51438787
#> 17 MSE 2.77702701
#> 18 RMSE 1.66644142
#> 19 RRMSE 0.19094834
#> 20 RSR 0.09678632
#> 21 iqRMSE 0.25237725
#> 22 MLA 0.14328045
#> 23 MLP 2.63374656
#> 24 SB 0.10122549
#> 25 SDSD 0.04205496
#> 26 LCS 2.63374656
#> 27 PLA 5.15949064
#> 28 PLP 94.84050936
#> 29 Ue 94.84050936
#> 30 Uc 1.51438787
#> 31 Ub 3.64510277
#> 32 NSE 0.99999141
#> 33 E1 0.62843415
#> 34 Erel 0.77057561
#> 35 KGE 0.91064709
#> 36 d 0.95632264
#> 37 d1 0.80649196
#> 38 d1r 0.80649196
#> 39 RAC 0.95770115
#> 40 AC 0.84174217
#> 41 lambda 0.91553253
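Several of these scores are simple transformations of one another. In the output above, MSE equals RMSE squared (1.66644142^2 ≈ 2.777), and MAE / RMAE ≈ RMSE / RRMSE ≈ 8.73, which is consistent with RMAE and RRMSE being MAE and RMSE divided by the same constant, presumably the mean of the observed values. The sketch below assumes that normalization; error_summary() is a hypothetical helper that mimics the Metric/Score layout, not a metrica function.

```r
# Hypothetical helper: a few absolute and relative error metrics.
# Assumption: relative metrics are normalized by the mean of the observed values.
error_summary <- function(obs, pred) {
  err  <- obs - pred
  mae  <- mean(abs(err))
  rmse <- sqrt(mean(err^2))
  data.frame(
    Metric = c("MAE", "RMAE", "MSE", "RMSE", "RRMSE"),
    Score  = c(mae, mae / mean(obs), rmse^2, rmse, rmse / mean(obs)),
    stringsAsFactors = FALSE
  )
}

# Errors are -1, 0 and 2; the mean of the observed values is 4
error_summary(obs = c(2, 4, 6), pred = c(3, 4, 4))
```

Relative metrics make errors comparable across variables or data sets measured on different scales.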
If the user only needs specific metrics, a vector with the names of the desired metrics can be passed to the same metrics_summary() function through the metrics_list argument:
my.metrics <- c("R2", "MAE", "RMSE", "RSR", "NSE", "KGE")
metrics_summary(data = wheat,
obs = obs,
pred = pred,
metrics_list = my.metrics)
#> Metric Score
#> 1 R2 0.84555376
#> 2 MAE 1.32781184
#> 3 RMSE 1.66644142
#> 4 RSR 0.09678632
#> 5 NSE 0.99999141
#> 6 KGE 0.91064709
The user can also create a scatter plot that includes not only the predicted vs. observed points, the 1:1 line, and the regression line, but also selected metrics and their values plus the SMA regression equation.
This is accomplished with the same scatter_plot() function:
scatter_plot(data = wheat,
obs = obs,
pred = pred)
To print the metrics on the plot, set print_metrics = TRUE. Do not forget to also specify which metrics to display with the metrics_list argument:
my.metrica.plot <- scatter_plot(data = wheat,
obs = obs,
pred = pred,
print_metrics = TRUE,
metrics_list = my.metrics)
my.metrica.plot
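The SMA (standardized major axis) regression mentioned above has a closed form: slope = sign(r) × sd(y) / sd(x) and intercept = mean(y) − slope × mean(x), where x and y are the variables on the horizontal and vertical axes and r is their Pearson correlation. The sketch below implements that textbook formula; sma_fit() is a hypothetical helper and is not taken from metrica's source code.

```r
# Hypothetical helper: standardized major axis (SMA) regression of y on x.
# Unlike OLS, SMA treats x and y symmetrically: swapping the axes inverts the slope.
sma_fit <- function(x, y) {
  slope     <- sign(cor(x, y)) * sd(y) / sd(x)
  intercept <- mean(y) - slope * mean(x)
  c(intercept = intercept, slope = slope)
}

# Predicted (y) vs. observed (x), matching orientation = "PO"
sma_fit(x = c(1, 2, 3, 4), y = c(1.5, 2.0, 3.5, 4.0))

# With metrica's data set loaded: sma_fit(x = wheat$obs, y = wheat$pred)
```

This symmetry is why SMA is often preferred for observed vs. predicted comparisons, where neither variable is free of error.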
Because the output is a ggplot2 object, it can be further edited:
my.metrica.plot +
# Modify labels
labs(x = "Observed wheat N content (g N/m2)", y = "Predicted wheat N content (g N/m2)") +
# Modify theme
theme_light()
my.metrica.plot +
# Modify labels
labs(x = "Observed (g N/m2)", y = "Predicted (g N/m2)") +
# Modify theme
theme_dark()
To export the metrics summary table, the user can simply write it to file with the function write.csv():
metrics_summary(data = wheat,
obs = obs,
pred = pred) %>%
write.csv("metrics_summary.csv")
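By default, write.csv() also writes the row names as an extra, unnamed first column. Passing row.names = FALSE keeps only the Metric and Score columns. The sketch below illustrates this with a small stand-in table (two rows copied from the metrics_summary() output above) and a temporary file, so it runs without metrica; the names scores and out_file are hypothetical.

```r
# Stand-in for a metrics_summary() result: two rows copied from the output above
scores <- data.frame(Metric = c("R2", "RMSE"),
                     Score  = c(0.84555376, 1.66644142),
                     stringsAsFactors = FALSE)

# Write to a temporary file without the row-name column, then read it back
out_file <- tempfile(fileext = ".csv")
write.csv(scores, out_file, row.names = FALSE)
read.csv(out_file)
```

With the real data, the same option can be added to the pipeline above, e.g. write.csv("metrics_summary.csv", row.names = FALSE).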
Similarly, to export a plot, the user can simply write it to file with the function ggsave():
ggsave(plot = my.metrica.plot,
"scatter_metrics.png",
width = 5,
height = 5)