Assessing model agreement in wheat grain nitrogen content prediction - a use case

Leo Bastos & Adrian Correndo

2022-05-11

1 Introduction

The metrica package was developed to visualize and compute the level of agreement between observed ground-truth values and model-derived (e.g., mechanistic or empirical) predictions.

This package is intended to fit into the following workflow:

  1. a data set containing the observed values is used to train a model
  2. the trained model is used to generate predictions
  3. a data frame containing at least the observed and model-predicted values is created
  4. the metrica package is used to compute goodness-of-fit and error metrics based on observed and predicted values
  5. the metrica package is used to visualize model fit and selected fit metrics
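
As a rough illustration of steps 1 to 3, the sketch below fits a simple linear model to simulated data and builds the observed/predicted data frame that the later steps work on. The simulated data, the object names (train, model, fit_df), and the choice of lm() are assumptions made for this example only; any model that returns predictions would do.

```r
# Hypothetical example data: one predictor and a noisy linear response
set.seed(42)
n <- 50
train <- data.frame(x = runif(n, min = 0, max = 10))
train$obs <- 2 + 1.5 * train$x + rnorm(n, sd = 1)

# Step 1: train a model on the observed values
model <- lm(obs ~ x, data = train)

# Step 2: generate predictions from the trained model
pred <- unname(predict(model, newdata = train))

# Step 3: build a data frame holding observed and predicted values
fit_df <- data.frame(obs = train$obs, pred = pred)
head(fit_df)
```

The resulting fit_df has the same two-column layout (obs, pred) as the wheat data set used in the rest of this vignette, so it is ready for steps 4 and 5.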

This vignette introduces the functionality of the metrica package applied to observed and model-predicted values of wheat grain nitrogen (N) content (in grams of N \(m^{-2}\)).

2 Wheat grain N content

Let’s begin by loading the packages needed.

library(ggplot2)
library(dplyr)
library(metrica)

Now we load the wheat data set included in the metrica package.

# Load
data(wheat)

# Printing first observations
head(wheat)
#>        pred    obs
#> 1  2.577314  2.544
#> 2  3.989590  4.831
#> 3  5.645253  6.121
#> 4 13.125101 10.960
#> 5  4.955917  5.767
#> 6  6.687800  8.222

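This data set contains two columns: pred, the model-predicted wheat grain N content, and obs, the observed wheat grain N content (both in g N \(m^{-2}\)).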

3 Visual assessment of agreement

3.1 Scatterplot of pred vs. obs

The simplest way to visually assess agreement between observed and predicted values is with a scatterplot.

We can use the function scatter_plot() from the metrica package to create a scatterplot.

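The function requires specifying at least the data frame (data), the column with the observed values (obs), and the column with the predicted values (pred).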

Besides a scatterplot, this function also adds to the plot the 1:1 line (solid line) and the linear regression line (dashed line).

scatter_plot(data = wheat, 
             obs = obs, 
             pred = pred)

The default behavior of scatter_plot() places the obs column on the x axis and the pred column on the y axis (orientation = "PO"). This can be inverted by setting the argument orientation to "OP":

scatter_plot(data = wheat, 
             obs = obs, 
             pred = pred,
             orientation = "OP")

The output of the scatter_plot() function is a ggplot2 object that can be further customized:

scatter_plot(data = wheat, 
             obs = obs, 
             pred = pred,
             orientation = "OP")+
  labs(x ="Predicted wheat N content (g N/m2)",
       y = "Observed wheat N content (g N/m2)")+
  theme_dark()

3.2 Bland-Altman plot

The Bland-Altman plot is another way of visually assessing observed vs. predicted agreement. It plots the difference between observed and predicted values on the y axis, and the observed values on the x axis:

bland_altman_plot(data = wheat,
                  obs = obs, 
                  pred = pred)
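
To show what goes into such a plot, the following sketch computes the differences by hand for the six rows printed by head(wheat) above, together with the mean difference and the conventional 95% limits of agreement (mean difference plus or minus 1.96 standard deviations). The sign convention (obs minus pred) and the object names are assumptions for illustration; bland_altman_plot() may differ in these details.

```r
# First six rows of the wheat data set, as printed by head(wheat)
wheat_head <- data.frame(
  pred = c(2.577314, 3.989590, 5.645253, 13.125101, 4.955917, 6.687800),
  obs  = c(2.544, 4.831, 6.121, 10.960, 5.767, 8.222)
)

# Difference for each pair (assumed convention: observed minus predicted)
diffs <- wheat_head$obs - wheat_head$pred

# Mean difference (bias) and conventional 95% limits of agreement
bias <- mean(diffs)
loa  <- bias + c(-1.96, 1.96) * sd(diffs)

# Coordinates of the points: observed values on x, differences on y
ba_points <- data.frame(obs = wheat_head$obs, diff = diffs)
print(ba_points)
print(bias)
print(loa)
```

Points scattered evenly around a mean difference close to zero suggest little systematic bias; a trend in the differences along the x axis suggests that the error depends on the magnitude of the observed value.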

4 Numerical assessment of agreement

The metrica package contains functions for 41 metrics to assess agreement between observed and predicted values for continuous data (i.e., regression error).

For a list of all the metrics, including their name, definition, details, formula, and function name, please check [here].

All of the metric functions take the same three arguments as the plotting functions: data, obs, and pred.

The user can choose to calculate a single metric, or to calculate all metrics at once.

To calculate a single metric, the metric function can be called.
For example, to calculate \(R^{2}\), we can use the R2() function:

R2(data = wheat,
   obs = obs, 
   pred = pred)
#> [1] 0.8455538

Similarly, to calculate root mean squared error, we can use the RMSE() function:

RMSE(data = wheat,
     obs = obs, 
     pred = pred)
#> [1] 1.666441
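
To make these two metrics concrete, the sketch below reproduces them by hand in base R, assuming that R2 is the squared Pearson correlation and that RMSE is the square root of the mean squared difference. Both assumptions agree with the summary table shown later in this section, where R2 equals r squared and RMSE equals the square root of MSE. Only the six rows printed by head(wheat) are used, so the results differ from the full-data values above.

```r
# First six rows of the wheat data set, as printed by head(wheat)
pred <- c(2.577314, 3.989590, 5.645253, 13.125101, 4.955917, 6.687800)
obs  <- c(2.544, 4.831, 6.121, 10.960, 5.767, 8.222)

# Coefficient of determination as the squared Pearson correlation (assumed definition)
r2_by_hand <- cor(obs, pred)^2

# Root mean squared error: square root of the mean squared difference
rmse_by_hand <- sqrt(mean((obs - pred)^2))

print(r2_by_hand)
print(rmse_by_hand)
```

R2 is unitless and reflects how closely the points follow a straight line, while RMSE is expressed in the units of the variable (here g N \(m^{-2}\)) and reflects the typical size of the error.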

The user can also calculate all 41 metrics at once using the function metrics_summary():

metrics_summary(data = wheat,
                obs = obs, 
                pred = pred)
#>    Metric       Score
#> 1      B0  0.11315564
#> 2      B1  0.95057797
#> 3       r  0.91953997
#> 4      R2  0.84555376
#> 5      Xa  0.99564191
#> 6     CCC  0.91553253
#> 7     MAE  1.32781184
#> 8    RMAE  0.15214665
#> 9    MAPE 17.51424366
#> 10  SMAPE 17.43518492
#> 11    RAE  0.37156585
#> 12    RSE  0.16128874
#> 13    MBE  0.31815953
#> 14    PBE  3.64561486
#> 15    PAB  3.64510277
#> 16    PPB  1.51438787
#> 17    MSE  2.77702701
#> 18   RMSE  1.66644142
#> 19  RRMSE  0.19094834
#> 20    RSR  0.09678632
#> 21 iqRMSE  0.25237725
#> 22    MLA  0.14328045
#> 23    MLP  2.63374656
#> 24     SB  0.10122549
#> 25   SDSD  0.04205496
#> 26    LCS  2.63374656
#> 27    PLA  5.15949064
#> 28    PLP 94.84050936
#> 29     Ue 94.84050936
#> 30     Uc  1.51438787
#> 31     Ub  3.64510277
#> 32    NSE  0.99999141
#> 33     E1  0.62843415
#> 34   Erel  0.77057561
#> 35    KGE  0.91064709
#> 36      d  0.95632264
#> 37     d1  0.80649196
#> 38    d1r  0.80649196
#> 39    RAC  0.95770115
#> 40     AC  0.84174217
#> 41 lambda  0.91553253

If only specific metrics are needed, the user can pass a vector with the names of the desired metrics to the same function, metrics_summary(), through the argument metrics_list, as follows:


my.metrics <- c("R2","MAE", "RMSE", "RSR", "NSE", "KGE")

metrics_summary(data = wheat,   
                obs = obs,    
                pred = pred,
                metrics_list = my.metrics) 
#>   Metric      Score
#> 1     R2 0.84555376
#> 2    MAE 1.32781184
#> 3   RMSE 1.66644142
#> 4    RSR 0.09678632
#> 5    NSE 0.99999141
#> 6    KGE 0.91064709

5 Visual and numerical assessment combined

The user can also create a scatter plot that includes not only the predicted vs. observed points, 1:1 line, and regression line, but also selected metrics and their values plus the SMA regression equation.

This is accomplished with the function scatter_plot():

scatter_plot(data = wheat,
             obs = obs, 
             pred = pred)
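
For reference, an SMA (standardized major axis) line can be sketched in base R as follows. The slope is the ratio of the two standard deviations, signed by the correlation, and the line passes through the point of means. The sketch uses the six rows printed by head(wheat) and the default orientation (pred on the y axis, obs on the x axis); whether scatter_plot() computes its equation in exactly this way is an assumption.

```r
# First six rows of the wheat data set, as printed by head(wheat)
pred <- c(2.577314, 3.989590, 5.645253, 13.125101, 4.955917, 6.687800)
obs  <- c(2.544, 4.831, 6.121, 10.960, 5.767, 8.222)

# SMA slope: ratio of standard deviations, carrying the sign of the correlation
sma_slope <- sign(cor(obs, pred)) * sd(pred) / sd(obs)

# SMA intercept: forces the line through the point (mean(obs), mean(pred))
sma_intercept <- mean(pred) - sma_slope * mean(obs)

# Equation in the form pred = intercept + slope * obs
cat(sprintf("pred = %.3f + %.3f * obs\n", sma_intercept, sma_slope))
```

Unlike ordinary least squares, the SMA line treats both variables symmetrically, which suits comparisons where the observed values also carry error.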

To print the metrics on the plot, set print_metrics = TRUE. Do not forget to specify the metrics to display with the metrics_list argument:


my.metrica.plot <- scatter_plot(data = wheat,
                                obs = obs, 
                                pred = pred,
                                print_metrics = TRUE,
                                metrics_list = my.metrics)

my.metrica.plot

Because the output is a ggplot object, it can be further edited:


my.metrica.plot +
  # Modify labels
  labs(x = "Observed wheat N content (g N/m2)", y = "Predicted wheat N content (g N/m2)")+
  # Modify theme
  theme_light()


my.metrica.plot +
  # Modify labels
  labs(x = "Observed wheat N content (g N/m2)", y = "Predicted wheat N content (g N/m2)")+
  # Modify theme
  theme_dark()

6 Exporting

To export the metrics summary table, the user can simply write it to file with the function write.csv():

metrics_summary(data = wheat,
                obs = obs, 
                pred = pred) %>%
  write.csv("metrics_summary.csv")
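
Note that write.csv() writes row names as an extra first column by default. A minimal sketch of exporting without them and reading the file back is shown below; the small table reuses the scores printed earlier for the selected metrics, and the object names and temporary file path are only for illustration.

```r
# Small metrics table with the scores printed earlier in this vignette
summary_tbl <- data.frame(
  Metric = c("R2", "MAE", "RMSE", "RSR", "NSE", "KGE"),
  Score  = c(0.84555376, 1.32781184, 1.66644142,
             0.09678632, 0.99999141, 0.91064709),
  stringsAsFactors = FALSE
)

# Write to a temporary file, leaving out the row-name column
out_file <- tempfile(fileext = ".csv")
write.csv(summary_tbl, out_file, row.names = FALSE)

# Read the file back to check that both columns survived the round trip
read_back <- read.csv(out_file, stringsAsFactors = FALSE)
print(read_back)
```

Dropping the row names keeps the file to the two columns Metric and Score, which is usually easier to work with in a spreadsheet or a later script.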

Similarly, to export a plot, the user can simply write it to file with the function ggsave():


ggsave(filename = "scatter_metrics.png",
       plot = my.metrica.plot,
       width = 5,
       height = 5)