pRecipe | Precipitation R Recipes

Mijael Rodrigo Vargas Godoy, Yannis Markonis

2023-05-26


pRecipe was conceived back in 2020 as part of MRVG’s doctoral dissertation at the Faculty of Environmental Sciences, Czech University of Life Sciences Prague, Czechia. Designed with reproducible science in mind, pRecipe facilitates the download, exploration, visualization, and analysis of multiple precipitation data products across various spatiotemporal scales.


~The Global Water Cycle Budget | Vargas Godoy et al. (2021)

“Like civilization and technology, our understanding of the global water cycle has been continuously evolving, and we have adapted our quantification methods to better exploit new technological resources. The accurate quantification of global water fluxes and storage is crucial in studying the global water cycle.”


Before We Start

Like many other R packages, pRecipe has some system requirements:

Data

The pRecipe database hosts 27 different precipitation data sets: seven gauge-based, eight satellite-based, seven reanalysis, and five hydrological model precipitation products. Their native specifications, links to their providers, and their respective references are detailed in the following subsections. We have already homogenized them, compacted each to a single file, and stored them in a Zenodo repository under the following naming convention:

<data set>_<variable>_<units>_<coverage>_<start date>_<end date>_<resolution>_<time step>.nc

The pRecipe data collection was homogenized to these specifications:

E.g., GPCP v2.3 (Adler et al. 2018) would be:

gpcp_tp_mm_global_197901_202205_025_monthly.nc
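The naming convention above can be unpacked programmatically. The following is a minimal sketch in base R, assuming that none of the eight fields contains an underscore; `parse_precip_name()` is a hypothetical helper written for illustration and is not part of pRecipe.

```r
# Split a pRecipe-style file name into its eight underscore-separated fields.
# parse_precip_name() is a hypothetical helper, not a pRecipe function.
parse_precip_name <- function(file_name) {
  fields <- c("data_set", "variable", "units", "coverage",
              "start_date", "end_date", "resolution", "time_step")
  # Drop any directory and the ".nc" extension, then split on underscores
  stem <- sub("\\.nc$", "", basename(file_name))
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  if (length(parts) != length(fields)) {
    stop("Expected ", length(fields), " fields but found ", length(parts))
  }
  stats::setNames(as.list(parts), fields)
}

info <- parse_precip_name("gpcp_tp_mm_global_197901_202205_025_monthly.nc")
info$data_set    # "gpcp"
info$start_date  # "197901"
info$time_step   # "monthly"
```

Returning a named list keeps each field addressable by name, which can be handy when filtering a folder of downloaded files by coverage or time step.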

Gauge-Based Products

| Data Set | Spatial Resolution | Coverage: Global | Coverage: Land | Coverage: Ocean | Temporal Resolution | Record Length | Get Data | Reference |
|---|---|---|---|---|---|---|---|---|
| CPC-Global | 0.5° | | x | | Daily | 1979/01-2022/08 | Download | P. Xie, Chen, and Shi (2010) |
| CRU TS v4.06 | 0.5° | | x | | Monthly | 1901/01-2021/12 | Download | Harris et al. (2020) |
| EM-EARTH | 0.1° | | x | | Daily | 1950/01-2019/12 | Download | Tang, Clark, and Papalexiou (2022) |
| GHCN v2 | | | x | | Monthly | 1900/01-2015/05 | Download | Peterson and Vose (1997) |
| GPCC v2020 | 0.25° | | x | | Monthly | 1891/01-2022/08 | Download | Schneider et al. (2011) |
| PREC/L | 0.5° | | x | | Monthly | 1948/01-2022/08 | Download | Chen et al. (2002) |
| UDel v5.01 | 0.5° | | x | | Monthly | 1901/01-2017/12 | Download | Willmott and Matsuura (2001) |

Satellite-Based Products

| Data Set | Spatial Resolution | Coverage: Global | Coverage: Land | Coverage: Ocean | Temporal Resolution | Record Length | Get Data | Reference |
|---|---|---|---|---|---|---|---|---|
| CHIRPS v2.0 | 0.05° | | 50°SN | | Monthly | 1981/01-2022/07 | Download | Funk et al. (2015) |
| CMAP | 2.5° | x | x | x | Monthly | 1979/01-2022/07 | Download | Pingping Xie and Arkin (1997) |
| CMORPH | 0.25° | 60°SN | 60°SN | 60°SN | Daily | 1998/01-2021/12 | Download | Joyce et al. (2004) |
| GPCP v2.3 | 0.5° | x | x | x | Monthly | 1979/01-2022/05 | Download | Adler et al. (2018) |
| GPM IMERGM v06 | 0.1° | x | x | x | Monthly | 2000/06-2020/12 | Download | G. J. Huffman et al. (2019) |
| MSWEP v2.8 | 0.1° | x | x | x | Monthly | 1979/02-2022/06 | Download | Beck et al. (2019) |
| PERSIANN-CDR | 0.25° | 60°SN | 60°SN | 60°SN | Monthly | 1983/01-2022/06 | Download | Ashouri et al. (2015) |
| TRMM 3B43 v7 | 0.25° | 50°SN | 50°SN | 50°SN | Monthly | 1998/01-2019/12 | Download | George J. Huffman et al. (2010) |

Reanalysis Products

| Data Set | Spatial Resolution | Coverage: Global | Coverage: Land | Coverage: Ocean | Temporal Resolution | Record Length | Get Data | Reference |
|---|---|---|---|---|---|---|---|---|
| 20CR v3 | | x | x | x | Monthly | 1836/01-2015/12 | Download | Slivinski et al. (2019) |
| ERA-20C | 1.125° | x | x | x | Monthly | 1900/01-2010/12 | Download | Poli et al. (2016) |
| ERA5 | 0.25° | x | x | x | Monthly | 1959/01-2021/12 | Download | Hersbach et al. (2020) |
| JRA-55 | 1.25° | x | x | x | Monthly | 1958/01-2021/12 | Download | Kobayashi et al. (2015) |
| MERRA-2 | 0.5° x 0.625° | x | x | x | Monthly | 1980/01-2023/01 | Download | Gelaro et al. (2017) |
| NCEP/NCAR R1 | 1.875° | x | x | x | Monthly | 1948/01-2022/08 | Download | Kalnay et al. (1996) |
| NCEP/DOE R2 | 1.875° | x | x | x | Monthly | 1979/01-2022/08 | Download | Kanamitsu et al. (2002) |

Hydrological Model Forcing

| Data Set | Spatial Resolution | Coverage: Global | Coverage: Land | Coverage: Ocean | Temporal Resolution | Record Length | Get Data | Reference |
|---|---|---|---|---|---|---|---|---|
| FLDAS | 0.1° | | x | | Monthly | 1982/01-2021/12 | Download | McNally et al. (2017) |
| GLDAS CLSM v2.0 | 0.25° | | x | | Daily | 1948/01-2014/12 | Download | Rodell et al. (2004) |
| GLDAS NOAH v2.0 | 0.25° | | x | | Monthly | 1948/01-2014/12 | Download | Rodell et al. (2004) |
| GLDAS VIC v2.0 | | | x | | Monthly | 1948/01-2014/12 | Download | Rodell et al. (2004) |
| TerraClimate | 4 km | | x | | Monthly | 1958/01-2021/12 | Download | Abatzoglou et al. (2018) |

Recipe

In this introductory recipe we will first download the GPM-IMERGM data set. We will then subset the downloaded data over Central Europe for the 2001-2010 period, and crop it to the national scale for Czechia. In the next step, we will generate time series for our data sets and conclude with the visualization of our data.

NOTE: While the functions in pRecipe are intended to work directly with its data inventory, they can also handle most other precipitation data sets in “.nc” format, as well as any other “.nc” file generated by its functions.

Installation

install.packages('pRecipe')
library(pRecipe)

Download

Downloading the entire data collection or only a few data sets is straightforward. We call the download_data function, which has four arguments: data_name, destination, domain, and time_res.

Let’s download the GPM-IMERGM data set and inspect its content with show_info:

download_data(data_name = 'gpm-imerg')
gpm_global <- raster::brick('gpm-imerg_tp_mm_global_200006_202012_025_monthly.nc')
show_info(gpm_global)
[1] "class      : RasterBrick "                                         
[2] "dimensions : 720, 1440, 1036800, 247  (nrow, ncol, ncell, nlayers)"
[3] "resolution : 0.25, 0.25  (x, y)"
[4] "extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)"
[5] "crs        : +proj=longlat +datum=WGS84 "
[6] "source     : gpm-imerg_tp_mm_global_200006_202012_025_monthly.nc "
[7] "names      : X2000.06.01, X2000.07.01, X2000.08.01, X2000.09.01, X2000.10.01, X2000.11.01, X2000.12.01, X2001.01.01, X2001.02.01, X2001.03.01, X2001.04.01, X2001.05.01, X2001.06.01, X2001.07.01, X2001.08.01, ... "
[8] "Date/time  : 2000-06-01, 2020-12-01 (min, max)"
[9] "varname    : tp " 
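The layer names printed by show_info encode the time stamp of each layer, so they can be converted back into dates when needed. The sketch below uses only base R and a few names copied from the output above; it also checks that the number of monthly steps between the reported minimum and maximum dates agrees with the reported number of layers.

```r
# Layer names as printed above: "X" followed by year.month.day
layer_names <- c("X2000.06.01", "X2000.07.01", "X2000.08.01")

# Strip the leading "X" and parse the remainder as a date
layer_dates <- as.Date(sub("^X", "", layer_names), format = "%Y.%m.%d")
layer_dates

# Count the monthly steps between the first and last reported dates
all_months <- seq(as.Date("2000-06-01"), as.Date("2020-12-01"), by = "month")
length(all_months)  # 247, the nlayers value reported above
```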

Processing

Once we have downloaded our database, we can start processing the data with:

Subset

To subset our data to a desired region and period of interest, we use the subset_spacetime function, which has four arguments: data, years, bbox, and autosave.

  • data is the path to the data set of interest or a RasterBrick object.
  • years is the period of interest in the form (start_year, end_year).
  • bbox is the bounding box of the region of interest, with coordinates in degrees, in the form (xmin, xmax, ymin, ymax).
  • autosave is set to FALSE by default. If TRUE, the data are automatically stored in the same location as the input file.

Let’s subset the GPM-IMERGM data set over Central Europe (2,28,42,58) for the 2001-2010 period, and inspect its content with show_info:

gpm_subset <- subset_spacetime(gpm_global, years = c(2001, 2010), bbox = c(2,28,42,58))
show_info(gpm_subset)
[1] "class      : RasterBrick "
[2] "dimensions : 64, 104, 6656, 120  (nrow, ncol, ncell, nlayers)"
[3] "resolution : 0.25, 0.25  (x, y)"
[4] "extent     : 2, 28, 42, 58  (xmin, xmax, ymin, ymax)"
[5] "crs        : +proj=longlat +datum=WGS84 "
[6] "source     : memory"
[7] "names      : X2001.01.01,  X2001.02.01,  X2001.03.01,  X2001.04.01,  X2001.05.01,  X2001.06.01,  X2001.07.01,  X2001.08.01,  X2001.09.01,  X2001.10.01,  X2001.11.01,  X2001.12.01,  X2002.01.01,  X2002.02.01,  X2002.03.01, ... "
[8] "min values : 1.272205e+01, 4.698483e+00, 5.927317e+00, 2.240815e+00, 1.315575e+01, 1.301118e+00, 3.831070e+00, 4.547474e-13, 2.739577e+01, 1.662540e+00, 2.002276e+01, 1.084265e+00, 4.843051e+00, 3.975639e+00, 5.638179e+00, ... "
[9] "max values :     443.4645,     158.5196,     374.7221,     229.5028,     163.2903,     251.5495,     330.9900,     336.4113,     456.0420,     454.0903,     452.1386,     236.0807,     277.7888,     255.8143,     195.8183, ... "
[10] "time       : 2001-01-01, 2010-12-01 (min, max)"
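The dimensions reported above follow directly from the bounding box, the grid resolution, and the requested years. As a quick sanity check, the arithmetic can be sketched in base R, assuming a monthly data set, a bounding box aligned with the 0.25° grid, and whole years; `expected_dims()` is a hypothetical helper, not a pRecipe function.

```r
# Expected RasterBrick dimensions for a bbox/years subset of a monthly grid.
# expected_dims() is a hypothetical helper, not part of pRecipe.
expected_dims <- function(bbox, years, res = 0.25) {
  # bbox is (xmin, xmax, ymin, ymax) in degrees
  n_col <- round((bbox[2] - bbox[1]) / res)
  n_row <- round((bbox[4] - bbox[3]) / res)
  # Twelve monthly layers per year, both end years included
  n_layers <- (years[2] - years[1] + 1) * 12
  c(nrow = n_row, ncol = n_col, ncell = n_row * n_col, nlayers = n_layers)
}

dims <- expected_dims(bbox = c(2, 28, 42, 58), years = c(2001, 2010))
dims  # nrow = 64, ncol = 104, ncell = 6656, nlayers = 120
```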

Crop

To further crop our data to a desired polygon other than a rectangle, we use the crop_data function, which has three arguments: x, shp_path, and autosave.

  • x is the path to a “.nc” data set file or a RasterBrick object.
  • shp_path is the path to a “.shp” file that we want to use to crop our data.
  • autosave is set to FALSE by default. If TRUE, the data are automatically stored in the same location as the input file.

Let’s crop our GPM-IMERGM subset to cover only Czechia with the respective shapefile, and inspect its content with show_info:

gpm_cz <- crop_data(x = gpm_subset, shp_path = "CZE_adm0.shp")
show_info(gpm_cz)
[1] "class      : RasterBrick "
[2] "dimensions : 64, 104, 6656, 480  (nrow, ncol, ncell, nlayers)"
[3] "resolution : 0.25, 0.25  (x, y)"
[4] "extent     : 2, 28, 42, 58  (xmin, xmax, ymin, ymax)"
[5] "crs        : +proj=longlat +datum=WGS84 "
[6] "source     : memory"
[7] "names      : X2001.01.01, X2001.02.01, X2001.03.01, X2001.04.01, X2001.05.01, X2001.06.01, X2001.07.01, X2001.08.01, X2001.09.01, X2001.10.01, X2001.11.01, X2001.12.01, X2002.01.01, X2002.02.01, X2002.03.01, ... "
[8] "min values :   43.226040,   30.070290,   65.995613,   46.767975,   44.382591,   52.406155,   83.416138,   51.177319,   88.692894,   14.673723,   49.876202,   55.442097,   21.179314,   57.682911,   33.612221, ... "
[9] "max values :    89.99401,    70.54952,   158.95328,   106.25800,    90.21087,   135.75002,   248.73044,   138.78595,   158.15816,    51.39417,   113.63100,   141.96646,    77.34425,   162.56750,   132.85863, ... "
[10] "time       : 2001-01-01, 2010-12-01 (min, max)"

Generate Time series

To make a time series out of our data, we use the make_ts function, which has two arguments: data and autosave.

  • data is the path to a “.nc” data set file or a RasterBrick object.
  • autosave is set to FALSE by default. If TRUE, the data are automatically stored in the same location as the input file.

Let’s generate the time series for our three different GPM-IMERGM data sets (Global, Central Europe, and Czechia), and inspect the first 12 rows of each:

gpm_global_ts <- make_ts(gpm_global)
head(gpm_global_ts, 12)
          date     value           name            type
 1: 2000-06-01  93.60162 GPM IMERGM v06 Satellite-based
 2: 2000-07-01  96.01442 GPM IMERGM v06 Satellite-based
 3: 2000-08-01  94.16792 GPM IMERGM v06 Satellite-based
 4: 2000-09-01  90.38524 GPM IMERGM v06 Satellite-based
 5: 2000-10-01  93.90120 GPM IMERGM v06 Satellite-based
 6: 2000-11-01  93.55994 GPM IMERGM v06 Satellite-based
 7: 2000-12-01  96.68792 GPM IMERGM v06 Satellite-based
 8: 2001-01-01  94.71431 GPM IMERGM v06 Satellite-based
 9: 2001-02-01  85.94786 GPM IMERGM v06 Satellite-based
10: 2001-03-01  96.12793 GPM IMERGM v06 Satellite-based
11: 2001-04-01  96.99244 GPM IMERGM v06 Satellite-based
12: 2001-05-01 100.50446 GPM IMERGM v06 Satellite-based
gpm_subset_ts <- make_ts(gpm_subset)
head(gpm_subset_ts, 12)
          date     value           name            type
 1: 2001-01-01  96.67884 GPM IMERGM v06 Satellite-based
 2: 2001-02-01  58.80170 GPM IMERGM v06 Satellite-based
 3: 2001-03-01  96.04202 GPM IMERGM v06 Satellite-based
 4: 2001-04-01  80.09136 GPM IMERGM v06 Satellite-based
 5: 2001-05-01  55.94958 GPM IMERGM v06 Satellite-based
 6: 2001-06-01  92.74124 GPM IMERGM v06 Satellite-based
 7: 2001-07-01  95.06115 GPM IMERGM v06 Satellite-based
 8: 2001-08-01  76.70639 GPM IMERGM v06 Satellite-based
 9: 2001-09-01 141.68700 GPM IMERGM v06 Satellite-based
10: 2001-10-01  62.51384 GPM IMERGM v06 Satellite-based
11: 2001-11-01  97.12927 GPM IMERGM v06 Satellite-based
12: 2001-12-01  71.00100 GPM IMERGM v06 Satellite-based
gpm_cz_ts <- make_ts(gpm_cz)
head(gpm_cz_ts, 12)
          date     value           name            type
 1: 2001-01-01  59.36666 GPM IMERGM v06 Satellite-based
 2: 2001-02-01  50.59915 GPM IMERGM v06 Satellite-based
 3: 2001-03-01  96.69115 GPM IMERGM v06 Satellite-based
 4: 2001-04-01  73.23477 GPM IMERGM v06 Satellite-based
 5: 2001-05-01  64.74244 GPM IMERGM v06 Satellite-based
 6: 2001-06-01  86.48493 GPM IMERGM v06 Satellite-based
 7: 2001-07-01 127.52908 GPM IMERGM v06 Satellite-based
 8: 2001-08-01  94.31304 GPM IMERGM v06 Satellite-based
 9: 2001-09-01 119.28491 GPM IMERGM v06 Satellite-based
10: 2001-10-01  30.82040 GPM IMERGM v06 Satellite-based
11: 2001-11-01  72.33474 GPM IMERGM v06 Satellite-based
12: 2001-12-01  91.77480 GPM IMERGM v06 Satellite-based

NOTE: When working with objects in memory (i.e., without saving “.nc” files), the make_ts function may not generate the name and type columns appropriately.
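The tables above show that make_ts reduces each layer to one value per date. How the spatial aggregation is performed is not stated here; one common choice on a regular longitude-latitude grid is a mean weighted by the cosine of latitude, so that cells nearer the poles (which cover less area) count less. The sketch below illustrates that idea in base R on a made-up array. It is an assumption for illustration, not pRecipe's implementation, and `weighted_spatial_mean()` is a hypothetical helper.

```r
# Hypothetical 3 x 4 x 2 array: rows = latitudes, columns = longitudes,
# slices = months. The values are made up for illustration.
lat <- c(50.25, 50.00, 49.75)
precip <- array(c(rep(60, 12), rep(90, 12)), dim = c(3, 4, 2))

# Mean of one layer weighted by cos(latitude); NA cells (e.g., cells that
# fall outside a cropped region) are skipped.
weighted_spatial_mean <- function(layer, lat) {
  # Each column of w repeats cos(lat), so every row gets its own weight
  w <- matrix(cos(lat * pi / 180), nrow = nrow(layer), ncol = ncol(layer))
  keep <- !is.na(layer)
  sum(layer[keep] * w[keep]) / sum(w[keep])
}

# Apply the reduction to every time slice to obtain one value per date
ts_values <- apply(precip, 3, weighted_spatial_mean, lat = lat)
precip_ts <- data.frame(date = as.Date(c("2001-01-01", "2001-02-01")),
                        value = ts_values)
precip_ts
```

Because each made-up layer is spatially constant, the weighted mean returns that constant for each month, which makes the sketch easy to verify by eye.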

Visualize

Whether we have processed our data as required or have only just downloaded it, we have six different options to visualize it: plot_map, plot_line, plot_heatmap, plot_box, plot_density, and plot_summary.

Let’s plot our three different GPM-IMERGM data sets (Global, Central Europe, and Czechia).

Maps

To see a map of any data set, raw or processed, we use plot_map, which takes only one layer of the RasterBrick as input.

plot_map(gpm_global[[1]])

plot_map(gpm_subset[[1]])

plot_map(gpm_cz[[1]])

Time Series Visuals

To draw a time series generated by make_ts, we can use any of the options below, each of which takes only make_ts output as input (the object returned above or a “.csv” file generated by make_ts).

Line

plot_line(gpm_global_ts)

plot_line(gpm_subset_ts)

plot_line(gpm_cz_ts)

Heatmap

plot_heatmap(gpm_global_ts)

plot_heatmap(gpm_subset_ts)

plot_heatmap(gpm_cz_ts)

Boxplot

plot_box(gpm_global_ts)

plot_box(gpm_subset_ts)

plot_box(gpm_cz_ts)

Density

plot_density(gpm_global_ts)

plot_density(gpm_subset_ts)

plot_density(gpm_cz_ts)

Summary

NOTE: For good aesthetics we recommend saving plot_summary with ggsave(<filename>, <plot>, width = 16.3, height = 15.03).

plot_summary(gpm_global_ts)
#plot_summary(gpm_subset_ts)
#plot_summary(gpm_cz_ts)

Time Series Analysis

Once we have generated our time series, we can start evaluating the data with:

The above functions have three arguments x, ref, and th (except for nse).

NOTE: These functions are not demonstrated in the current demo because such metrics are intended for data with a higher temporal resolution than monthly, e.g., daily or subdaily (coming soon).
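For orientation, the sketch below shows one such metric in base R, assuming that nse refers to the Nash-Sutcliffe efficiency, which compares the squared errors of an estimate x against the variance of a reference ref. It follows the textbook definition and is not taken from pRecipe; `nse_sketch()` and the sample values are hypothetical.

```r
# Nash-Sutcliffe efficiency (textbook definition):
#   NSE = 1 - sum((x - ref)^2) / sum((ref - mean(ref))^2)
# nse_sketch() is a hypothetical helper, not the pRecipe implementation.
nse_sketch <- function(x, ref) {
  # Use only the time steps where both series have a value
  ok <- !is.na(x) & !is.na(ref)
  x <- x[ok]
  ref <- ref[ok]
  1 - sum((x - ref)^2) / sum((ref - mean(ref))^2)
}

ref <- c(1, 2, 3, 4, 5)               # made-up reference series
perfect <- nse_sketch(ref, ref)       # 1: perfect agreement
baseline <- nse_sketch(rep(mean(ref), 5), ref)  # 0: no better than the mean
c(perfect = perfect, baseline = baseline)
```

A value of 1 indicates perfect agreement, 0 indicates that the estimate is no better than the mean of the reference, and negative values indicate that it is worse.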

Coming Soon

More functions for data processing and analysis and expanding the database.

References

Abatzoglou, John T, Solomon Z Dobrowski, Sean A Parks, and Katherine C Hegewisch. 2018. “TerraClimate, a High-Resolution Global Dataset of Monthly Climate and Climatic Water Balance from 1958–2015.” Scientific Data 5 (1): 1–12.
Adler, Robert F., Mathew R. P. Sapiano, George J. Huffman, Jian-Jian Wang, Guojun Gu, David Bolvin, Long Chiu, et al. 2018. “The Global Precipitation Climatology Project (GPCP) Monthly Analysis (New Version 2.3) and a Review of 2017 Global Precipitation.” Atmosphere 9 (4): 138. https://doi.org/10.3390/atmos9040138.
Ashouri, Hamed, Kuo-Lin Hsu, Soroosh Sorooshian, Dan K. Braithwaite, Kenneth R. Knapp, L. Dewayne Cecil, Brian R. Nelson, and Olivier P. Prat. 2015. “PERSIANN-CDR: Daily Precipitation Climate Data Record from Multisatellite Observations for Hydrological and Climate Studies.” Bulletin of the American Meteorological Society 96 (1): 69–83.
Beck, Hylke E, Eric F Wood, Ming Pan, Colby K Fisher, Diego G Miralles, Albert IJM Van Dijk, Tim R McVicar, and Robert F Adler. 2019. “MSWEP V2 Global 3-Hourly 0.1° Precipitation: Methodology and Quantitative Assessment.” Bulletin of the American Meteorological Society 100 (3): 473–500.
Chen, Mingyue, Pingping Xie, John E. Janowiak, and Phillip A. Arkin. 2002. “Global Land Precipitation: A 50-Yr Monthly Analysis Based on Gauge Observations.” Journal of Hydrometeorology 3 (3): 249–66.
Funk, Chris, Pete Peterson, Martin Landsfeld, Diego Pedreros, James Verdin, Shraddhanand Shukla, Gregory Husak, et al. 2015. “The Climate Hazards Infrared Precipitation with Stations—a New Environmental Record for Monitoring Extremes.” Scientific Data 2 (1): 150066. https://doi.org/10.1038/sdata.2015.66.
Gelaro, Ronald, Will McCarty, Max J. Suárez, Ricardo Todling, Andrea Molod, Lawrence Takacs, Cynthia A. Randles, et al. 2017. “The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2).” Journal of Climate 30 (14): 5419–54. https://doi.org/10.1175/JCLI-D-16-0758.1.
Harris, Ian, Timothy J Osborn, Phil Jones, and David Lister. 2020. “Version 4 of the CRU TS Monthly High-Resolution Gridded Multivariate Climate Dataset.” Scientific Data 7 (1): 1–18.
Hersbach, Hans, Bill Bell, Paul Berrisford, Shoji Hirahara, András Horányi, Joaquín Muñoz-Sabater, Julien Nicolas, et al. 2020. “The ERA5 Global Reanalysis.” Quarterly Journal of the Royal Meteorological Society 146 (730): 1999–2049.
Huffman, G. J., E. F. Stocker, D. T. Bolvin, E. J. Nelkin, and Jackson Tan. 2019. GPM IMERG Final Precipitation L3 1 Month 0.1 Degree x 0.1 Degree V06, Greenbelt, MD, Goddard Earth Sciences Data and Information Services Center (GES DISC).
Huffman, George J, Robert F Adler, David T Bolvin, and Eric J Nelkin. 2010. “The TRMM Multi-Satellite Precipitation Analysis (TMPA).” In Satellite Rainfall Applications for Surface Hydrology, 3–22. Springer.
Joyce, Robert J, John E Janowiak, Phillip A Arkin, and Pingping Xie. 2004. “CMORPH: A Method That Produces Global Precipitation Estimates from Passive Microwave and Infrared Data at High Spatial and Temporal Resolution.” Journal of Hydrometeorology 5 (3): 487–503.
Kalnay, Eugenia, Masao Kanamitsu, Robert Kistler, William Collins, Dennis Deaven, Lev Gandin, Mark Iredell, Suranjana Saha, Glenn White, and John Woollen. 1996. “The NCEP/NCAR 40-Year Reanalysis Project.” Bulletin of the American Meteorological Society 77 (3): 437–72.
Kanamitsu, Masao, Wesley Ebisuzaki, Jack Woollen, Shi-Keng Yang, J. J. Hnilo, M. Fiorino, and G. L. Potter. 2002. “NCEP–DOE AMIP-II Reanalysis (R-2).” Bulletin of the American Meteorological Society 83 (11): 1631–44.
Kobayashi, Shinya, Yukinari Ota, Yayoi Harada, Ayataka Ebita, Masami Moriya, Hirokatsu Onoda, Kazutoshi Onogi, et al. 2015. “The JRA-55 Reanalysis: General Specifications and Basic Characteristics.” Journal of the Meteorological Society of Japan. Ser. II 93 (1): 5–48. https://doi.org/10.2151/jmsj.2015-001.
McNally, Amy, Kristi Arsenault, Sujay Kumar, Shraddhanand Shukla, Pete Peterson, Shugong Wang, Chris Funk, Christa D. Peters-Lidard, and James P. Verdin. 2017. “A Land Data Assimilation System for Sub-Saharan Africa Food and Water Security Applications.” Scientific Data 4 (1): 170012. https://doi.org/10.1038/sdata.2017.12.
Peterson, Thomas C., and Russell S. Vose. 1997. “An Overview of the Global Historical Climatology Network Temperature Database.” Bulletin of the American Meteorological Society 78 (12): 2837–50.
Poli, Paul, Hans Hersbach, Dick P Dee, Paul Berrisford, Adrian J Simmons, Frédéric Vitart, Patrick Laloyaux, et al. 2016. “ERA-20C: An Atmospheric Reanalysis of the Twentieth Century.” Journal of Climate 29 (11): 4083–97.
Rodell, Matthew, P. R. Houser, U. E. A. Jambor, J. Gottschalck, K. Mitchell, C.-J. Meng, K. Arsenault, B. Cosgrove, J. Radakovich, and M. Bosilovich. 2004. “The Global Land Data Assimilation System.” Bulletin of the American Meteorological Society 85 (3): 381–94.
Schneider, Udo, Andreas Becker, Peter Finger, Anja Meyer-Christoffer, Bruno Rudolf, and Markus Ziese. 2011. “GPCC Full Data Reanalysis Version 6.0 at 0.5°: Monthly Land-Surface Precipitation from Rain-Gauges Built on GTS-Based and Historic Data.” GPCC Data Rep., Doi 10.
Slivinski, Laura C, Gilbert P Compo, Jeffrey S Whitaker, Prashant D Sardeshmukh, Benjamin S Giese, Chesley McColl, Rob Allan, et al. 2019. “Towards a More Reliable Historical Reanalysis: Improvements for Version 3 of the Twentieth Century Reanalysis System.” Quarterly Journal of the Royal Meteorological Society 145 (724): 2876–2908.
Tang, Guoqiang, Martyn P. Clark, and Simon Michael Papalexiou. 2022. “EM-Earth: The Ensemble Meteorological Dataset for Planet Earth.” Bulletin of the American Meteorological Society 103 (4): E996–1018. https://doi.org/10.1175/BAMS-D-21-0106.1.
Vargas Godoy, Mijael Rodrigo, Yannis Markonis, Martin Hanel, Jan Kyselý, and Simon Michael Papalexiou. 2021. “The Global Water Cycle Budget: A Chronological Review.” Surveys in Geophysics 42 (5): 1075–1107.
Willmott, C. J., and K. Matsuura. 2001. Terrestrial Air Temperature and Precipitation: Monthly and Annual Time Series (1950 - 1999).
Xie, P., M. Chen, and W. Shi. 2010. “CPC Global Unified Gauge-Based Analysis of Daily Precipitation.” In Preprints, 24th Conf. On Hydrology, Atlanta, GA, Amer. Meteor. Soc. Vol. 2.
Xie, Pingping, and Phillip A. Arkin. 1997. “Global Precipitation: A 17-Year Monthly Analysis Based on Gauge Observations, Satellite Estimates, and Numerical Model Outputs.” Bulletin of the American Meteorological Society 78 (11): 2539–58.