Vignette of AeRobiology

A Computational Tool for Aerobiological Data

Jesus Rojo1 Antonio Picornell2 Jose Oteros3

2019-05-06

AeRobiology R package

AeRobiology R package

This package gathers different tools for managing aerobiological databases, elaborating the main calculations and visualization of results. In a first step, data may be checked using tools for quality control and all missing gaps can be completed. Then, the main parameters of the pollen season can be calculated and represented graphically. Multiple graphical tools are available: pollen calendars, phenological plots, time series, tendencies, interactive plots, abundance plots…

Please, take into account that not all the arguments of each function are explained in this document. This is a quick guide and further details can be consulted in the official document of the package.
The first thing you have to do is to install from CRAN repository and to load the package AeRobiology.
install.packages("AeRobiology")
library (AeRobiology)

Using the attached data

During this tutorial we are going to use the pollen data from Munich which are integrated in the package. This will allow you to follow the tutorial obtaining the same results. If you want to follow the tutorial by using your own data, check the next section.

munich_pollen is a data set containing information of daily concentrations of pollen in the atmosphere of Munich during the years 2010_2015. Pollen types included: “Alnus”, “Betula”, “Taxus”, “Fraxinus”, “Poaceae”, “Quercus”, “Ulmus” and “Urtica”.
Data were obtained at Munich (Zentrum Allergie und Umwelt, ZAUM) using a Hirst_type volumetric pollen trap (Hirst, 1952) following the standard methodology (VDI4252_4_2016). Some gaps have been added to test some functions of the package (e.g. quality_control(), interpollen()).
The data were obtained by the research team of Prof. Jeroen Buters (Christine Weil & Ingrid Weichenmeier). At the Zentrum Allergie und Umwelt (ZAUM, directed by Prof. Carsten B. Schmidt_Weber).

You can load the data in your working environment by typing:

data("munich_pollen")

Loading your database from Excel

This section has been developed for people who are not familiar with R. If you have experience in loading databases in R, you might not be interested in this section and you can skip it.

There are several functions to import data from Excel. Arbitrarily, we are going to use the package readxl.

install.packages("readxl")
library (readxl)

Now we can import the data with the function read_xlsx(). I strongly recommend you to have your Excel file in the same folder in which you are working with R. This will avoid long paths to your files and also will prevent broken paths when some files have been moved. If you are working with an old database, you might be interested in the read_xls() function.

Mydata<-read_xlsx("C:/Users/Antonio/Desktop/Prueba Markdown/mydata.xlsx")

You can also select the sheet you want to import:

Mydata<-read_xlsx("C:/Users/Antonio/Desktop/Prueba Markdown/mydata.xlsx", sheet=2)
IMPORTANT: You must remove all the symbols or letters from your database. If you have missing gaps within your database don’t put “-” or “No Data” or any letter in the gaps, keep the cell empty or remove the cell. The dates should be in a complete format (E.g.:“14/02/2019”). There must be a first column with the date followed by as many columns as pollen types with only numeric values (check decimal separation: commas must be replaced by dots).

Now you should have loaded your data in the object Mydata. Let’s check if there is any mistake:

str(munich_pollen)
## 'data.frame':    2191 obs. of  9 variables:
##  $ date    : Date, format: "2010-01-01" "2010-01-02" ...
##  $ Alnus   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Betula  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Taxus   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Fraxinus: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Poaceae : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Quercus : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Ulmus   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Urtica  : num  NA NA NA NA NA NA NA NA NA NA ...

Note: I’m using munich_pollen database as example but you should use Mydata.

As you can see, the object is a data.frame, the first column is “Date” type and the rest are “num”, which means numeric. This is the format you database must have. Normally it is automatically recognized by the import function, but if not, check the functions: as.Date(), as.numeric() and as.data.frame().

Your database must have this format to start working with the package. The functions are designed to warn you in case the format is not correct but there may be some exceptions. This is the most important step when using the package and some strange errors reported by the functions can be solved by doing this. Sometimes the date column has 2 different types simultaneously (“POSIXlt” and “POSIXct”) and it might cause mistakes. You can use one or another, but not both. Furthermore, it is strongly recommended to use “Date” format instead of “POSIXlt” or “POSIXct” despite theoretically there may not be problems by using them.


Please, don`t exasperate. This is the slowest step of using the package and the most important. Once you have your data loaded, you only have to spend a few minutes in each function to have your results. If you have some unsolving problems, search on the internet. Some other people must have had your same problem and it should be solved in some forums. If not, write us and we will help you as soon as possible:

If you want to import your data from csv files, you might be interested in read.csv() function.

Function quality_control()

This function was designed to check the quality of an historical database of several pollen types. Since many of the quality requirements depend on the criteria selected to establish the main pollen season, this function has the calculate_ps() function integrated. You can insert arguments of calculate_ps() in this function to select the method you want for calculating the main pollen season. calculate_ps() details can be consulted later in this same manuscript.

If result = “plot”, this function returns a graphical resume of the quality of the database marking with different red intensities the “weak-points” of it, and if result = “table” a data.frame with the detailed reasons of their “weakness”.

It establishes the quality according to 5 different criteria:

When running the function, it gives you a graphical resume with different color intensity depending on the risk of including each pollen/spore type for a concrete year and a data.frame with detailed information about the reasons of this evaluation.

If result = “plot”, the function returns a list of objects of class ggplot2; if result = “table”, the function returns a data.frame. By default, result = “table”.
QualityControl<-quality_control(munich_pollen, result = "table")

If the filter has been successfully passed, it appears as TRUE in the generated data.frame. If not, it appears as FALSE.

head(QualityControl)
##    type seasons Complete Start  Peak   End Comp.MPS Risk
## 1 Alnus    2010     TRUE  TRUE  TRUE FALSE    FALSE    2
## 2 Alnus    2011     TRUE FALSE  TRUE  TRUE    FALSE    2
## 3 Alnus    2012     TRUE  TRUE  TRUE  TRUE     TRUE    0
## 4 Alnus    2013     TRUE FALSE  TRUE FALSE     TRUE    2
## 5 Alnus    2014     TRUE FALSE  TRUE FALSE    FALSE    3
## 6 Alnus    2015     TRUE FALSE FALSE  TRUE    FALSE    3
quality_control(munich_pollen, result = "plot")

There are lots of arguments for this function. You can consult them in the “help” section of the function in R or in the official document of the package.

Let’s suppose we want to check the quality of our database, but we want to calculate it for a 80% defined Main Pollen Season (arguments ps.method and perc). We don’t want to take into account years with a missing data less than 4 days away from the start, peak or end dates (argument int.window). Furthermore, we only want to exclude years with more than 50% of interpolated data within the main pollen season (argument perc.miss):

quality_control(munich_pollen, int.window = 4, perc.miss = 50, ps.method = "percentage", perc = 80)

As you would have noticed, there are several differences between this result and the previous one.

interpollen() function is integrated in this function. Consult the “help” section of this function to use it.

Function interpollen()

The function interpollen() was designed to complete all the missing gaps within a database in just one step. All the gaps of each pollen/spore type are completed simultaneously. There are different methods to do so which may be more or less appropriate according to the particularities of your database. The arguments are:

Interpolated<-interpollen(munich_pollen[,c(1,6)], method="lineal", plot = TRUE, result = "long")

As you can see above, when result = “long”, the missing gaps which have been completed by the algorithm are marked in red. It displays similar graphs for each pollen/spore type within your database. Furthermore, the “Interpolated” data.frame is produced as “long” with another new column indicating if each data has been interpolated or not (1 means interpolated and 0 means original data).

head(Interpolated)
##         Date    Type Pollen Interpolated
## 1 2010-02-24 Poaceae      0            0
## 2 2010-02-25 Poaceae      0            0
## 3 2010-02-26 Poaceae      0            0
## 4 2010-02-27 Poaceae      0            0
## 5 2010-02-28 Poaceae      0            0
## 6 2010-03-01 Poaceae      0            0

If you want to integrate all the new data in your database, you can assign the function to a new object and specify the argument result = “wide”:

CompleteData<-interpollen(munich_pollen[,c(1,6)], method="lineal", plot = FALSE, result = "wide")

Now your database without gaps appears as “CompleteData”.

As you might have noticed, if you add plot=FALSE, the graphs are not shown.

You don’t have to do this before applying another function of this package. “interpollen” function has been integrated in the main functions of the package so you can choose whether you want your data to be completed or not by a specific argument in each function.E.g.:

calculate_ps(munich_pollen, method="percentage", interpolation=TRUE, int.method = "lineal", plot = F)
FREQUENT ERROR: Sometimes when applying interpollen you receive the following error: “Error in rep(NA, ncasos): invalid ‘times’ argument”. This is due to a mistake in your database: you have some dates repeated or disorganized. The algorithm of interpollen() searches for correlative dates and, when they are not consecutive natural days, it carries out the interpolation. When the second date is an earlier date than the previous one, it reports an error. You have to correct these issues in your database before applying the interpollen function.

Methods of interpolation

In this function there are 5 different methods to interpolate missing gaps. Each method can be more appropriate than the others under some specific circumstances. Interpolated data are not real data, but including such estimations can reduce the error of some calculations such as the main pollen season by percentage or a pollen calendar, i.e.: if you keep the gap, these days would count as 0 for the calculations and this would suppose a bigger error than estimating pollen concentrations for these days. Obviously, this will also suppose a bigger error than having the real data.

“lineal” method

As mentioned above, there are many methods to interpolate your missing data. The simplest one was the showed in the previous example: “lineal”. It traces a straight line between each extreme of the gap. The pollen/spore concentration of the missing days is calculated according to this line. This method may be appropriate for dates without pollen/spores or for small gaps.

Nevertheless, there are other method which can be more effective during the pollen season.

“movingmean” method

This method calculates the moving mean of the pollen daily concentrations. The gaps are fulfilled by calculating the moving mean of the concentrations for this particular day. The window of the moving mean is centered on this day and its size is the result of multiplicating the gap size by the factor argument. E.g.: If you have a gap of 3 days and your factor is 2, the gaps are replaced by the value of a moving mean of 6 days (3 x 2) centered in this particular day and not taking into account the missing days for the mean value. Furthermore, it is a dynamic function: for each gap of the database the window size of the moving mean changes depending of each gap size.

“spline” method

This method carries out the interpolation by tracing a polinomic function to link both extremes of the gap. The polinomic function is chosen according to the best fitting equation to the previous and posterior days of the gap (second, third, fourth degree…). The number of days of each side of the gap which will be taken into account for calculating the spline regression are specified by ndays argument. The smoothness of the adjustment can be specified by the spar argument.

Note: By changing the ndays and spar argument, very different results can be obtained. E.g.:

“tseries” method

If you have long time series of data, this might be the most suitable method for you. This method analyses the time series of the pollen/spore database and performs a seasonal-trend decomposition based on LOESS (Cleveland et al., 1990). It extracts the seasonality of the historical database and uses it to predict the missing data by performing a linear regression with the target year.

Seasonality is represented in grey, real data in black and interpolated data in red. This method improves with long time series of years. Strange results may be caused by lack of years or small sampling periods.

“neighbour” method

Other near stations provided by the user are used to interpolate the missing data of the target station. First of all, a Spearman correlation is performed between the target station and the neighbour stations to discard the neighbour stations with a correlation coefficient smaller than mincorr value. For each gap, a linear regression is performed between the neighbour stations and the target stations to determine the equation which converts the pollen concentrations of the neighbour stations into the pollen concentration of the target station. Only neighbour stations without any missing data during the gap period are taken into account for each gap.

You can include 4 different databases of neighbour stations with the arguments data2, data3, data4 and data5. The format of these databases must be the same than the original one and the name of the pollen/spore types must be exactly the same.

With the mincorr argument you can specify the minimal correlation coefficient (Spearman correlation) that the neighbour stations must have with the target station to be taken into account for a concrete gap. This process is completely independent for each gap of each pollen type.

Function calculate_ps()

The function calculate_ps() is one of the core functions of the AeRobiology package. It was designed to calculate the main parameters of the pollen season with regard to phenology and pollen intensity from a historical database of several pollen types. The function can use the most common methods to define the main pollen season.

Please, be patient: This function has tons of arguments. Some of them are only useful for specific situations. You can use the function without knowing all of them. Nevertheless, we decided to include all the possible arguments in order to provide specific tools for the advanced users.

This function has several arguments, but don’t worry, we are going to perform examples later. The main arguments are:

IMPORTANT : if export.plot=TRUE, the plots are EXPORTED in “plot_AeRobiology” folder.

Methods of calculating the Main Pollen Season

This function allows to calculate the pollen season using five different methods which are described below. After calculating the start_date, end_date and peak_date for the pollen season all rest of parameters are calculated:

IMPORTANT : If export.result=TRUE, these objects will be exported as xlsx file within the “table_AeRobiology” directory created in your working directory. This Excel file will have a sheet for each pollen type and a last sheet with the legend of all the abbreviations and the method selected to calculate them. We strongly recommend to use export.result=TRUE

Although the results can be extracted in the working directory in a xlsx file, they can also be assigned to an object, and visualized, for example (an extract):

type seasons st.dt st.jd en.dt en.jd ln.ps sm.tt sm.ps pk.val pk.dt pk.jd ln.prpk sm.prpk ln.pspk sm.pspk daysth
24 Fraxinus 2015 2015-03-28 87 2015-04-21 111 25 4762.5 4538.5 910.5 2015-04-13 103 17 2960 8 1578.5 8
25 Poaceae 2010 2010-05-18 138 2010-08-25 237 100 1861.0 1773.0 179.0 2010-06-11 162 25 705 75 1068.0 4
26 Poaceae 2011 2011-05-02 122 2011-08-24 236 115 2244.0 2134.0 163.0 2011-05-25 145 24 588 91 1546.0 2
27 Poaceae 2012 2012-05-07 128 2012-08-17 230 103 2127.0 2023.0 84.0 2012-05-25 146 19 349 84 1674.0 0
28 Poaceae 2013 2013-05-18 138 2013-09-13 256 119 2481.0 2367.0 166.0 2013-06-13 164 27 773 92 1594.0 4
29 Poaceae 2014 2014-05-06 126 2014-08-09 221 96 2413.0 2306.0 97.0 2014-06-11 162 37 1218 59 1088.0 0
30 Poaceae 2015 2015-05-12 132 2015-08-10 222 91 4175.0 3998.0 322.0 2015-06-04 155 24 2089 67 1909.0 9
31 Quercus 2010 2010-04-08 98 2010-05-26 146 49 737.5 701.5 189.0 2010-04-30 120 23 416 26 285.5 1

The result is also plotted by default (plot = TRUE).

## [1] "2010 Poaceae"
## [1] "2011 Poaceae"
## [1] "2012 Poaceae"
## [1] "2013 Poaceae"
## [1] "2014 Poaceae"

## [1] "2015 Poaceae"
##      type seasons      st.dt st.jd      en.dt en.jd ln.ps sm.tt sm.ps
## 1 Poaceae    2010 2010-05-18   138 2010-08-25   237   100  1861  1773
## 2 Poaceae    2011 2011-05-02   122 2011-08-24   236   115  2244  2134
## 3 Poaceae    2012 2012-05-07   128 2012-08-17   230   103  2127  2023
## 4 Poaceae    2013 2013-05-18   138 2013-09-13   256   119  2481  2367
## 5 Poaceae    2014 2014-05-06   126 2014-08-09   221    96  2413  2306
## 6 Poaceae    2015 2015-05-12   132 2015-08-10   222    91  4175  3998
##   pk.val      pk.dt pk.jd ln.prpk sm.prpk ln.pspk sm.pspk daysth
## 1    179 2010-06-11   162      25     705      75    1068      4
## 2    163 2011-05-25   145      24     588      91    1546      2
## 3     84 2012-05-25   146      19     349      84    1674      0
## 4    166 2013-06-13   164      27     773      92    1594      4
## 5     97 2014-06-11   162      37    1218      59    1088      0
## 6    322 2015-06-04   155      24    2089      67    1909      9

“percentage” method

This is a commonly used method for defining the pollen season based on the elimination of a certain percentage in the beginning and the end of the pollen season ( Nilsson and Persson, 1981; Andersen, 1991). For example, if the pollen season is based on the 95% of the total annual pollen (perc = 95), the start_date of the pollen season is marked as the day in which 2.5% of the total pollen is registered and the end_date of the pollen season is marked as the day in which 97.5% of the total pollen is registered.

In this case we have calculated the main pollen season based on the 90% of the total annual pollen. Results are stored in the “table_AeRobiology” folder since export.result=TRUE.

You can select different methods of interpolation or even to not interpolate gaps:

IMPORTANT : if export.plot=TRUE, the plots are EXPORTED in “plot_AeRobiology” folder.

“logistic” method

This method was developed by Ribeiro et al. (2007) and modified by Cunha et al. (2015). It is based on fitting annually a non-linear logistic regression model to the daily accumulated curve for each pollen type. This logistic function and the different derivatives were considered to calculate the start_date and end_date of the pollen season, based on the asymptotes when pollen amounts are stabilized on the beginning and the end of the accumulated curve. For more information about the method, see Ribeiro et al. (2007) and Cunha et al. (2015). Three different derivatives may be used (derivative argument) 4, 5 or 6 that represent from higher to lower restrictive criterion for defining the pollen season.

This method may be complemented with an optional method for reduction the peaks values (reduction = TRUE), thus avoiding the effect of the great influence of extreme peaks. In this sense, peaks values will be cut below a certain level that the user may select based on a percentile analysis of peaks. For example, red.level = 0.90 will cut all peaks above the percentile 90.

type seasons st.dt st.jd en.dt en.jd ln.ps sm.tt sm.ps pk.val pk.dt pk.jd ln.prpk sm.prpk ln.pspk sm.pspk daysth
Poaceae 2010 2010-05-16 136 2010-07-22 203 68 1861 1666.0 179 2010-06-11 162 27 724 41 942.0 4
Poaceae 2011 2011-04-27 117 2011-07-16 197 81 2244 1994.0 163 2011-05-25 145 29 617 52 1377.0 2
Poaceae 2012 2012-04-30 121 2012-07-23 205 85 2127 1932.0 84 2012-05-25 146 26 390 59 1542.0 0
Poaceae 2013 2013-02-16 47 NA NA NA 2481 NA 166 2013-06-13 164 NA NA NA NA NA
Poaceae 2014 2014-05-07 127 2014-07-18 199 73 2413 2149.0 97 2014-06-11 162 36 1206 37 943.0 0
Poaceae 2015 2015-05-08 128 2015-07-08 189 62 4175 3491.5 322 2015-06-04 155 28 2132 34 1359.5 9

In the previous case, the reduction wasn’t carried out (reduction=FALSE) and all the peaks were conserved. We can cut some peaks and change the derivative:

type seasons st.dt st.jd en.dt en.jd ln.ps sm.tt sm.ps pk.val pk.dt pk.jd ln.prpk sm.prpk ln.pspk sm.pspk daysth
Poaceae 2010 2010-05-08 128 2010-07-30 211 84 1861 1731 179 2010-06-11 162 35 748 49 983 4
Poaceae 2011 2011-04-17 107 2011-07-26 207 101 2244 2073 163 2011-05-25 145 39 640 62 1433 2
Poaceae 2012 2012-04-19 110 2012-08-03 216 107 2127 2035 84 2012-05-25 146 37 400 70 1635 0
Poaceae 2013 2013-02-20 51 NA NA NA 2481 NA 166 2013-06-13 164 NA NA NA NA NA
Poaceae 2014 2014-04-28 118 2014-07-27 208 91 2413 2270 97 2014-06-11 162 45 1240 46 1030 0
Poaceae 2015 2015-05-01 121 2015-07-15 196 76 4175 3678 322 2015-06-04 155 35 2158 41 1520 9

As you can observe, results are a bit different.

IMPORTANT : if export.plot=TRUE, the plots are EXPORTED in “plot_AeRobiology” folder.

“clinical” method

This method was proposed by Pfaar et al. (2017). It is based on the expert consensus in relation to pollen exposure and the relationship with allergic symptoms derived of the literature.

Different periods may be defined by this method: the pollen season, the high pollen season and the high pollen days:

  1. The start_date and end_date of the pollen season were defined as a certain number of days (n.clinical argument) within a time window period (window.clinical argument) exceeding a certain pollen threshold (th.pollen argument) which summation is above a certain pollen sum (th.sum argument).
    All these parameters are established for each pollen type according to Pfaar et al. (2017) and using the type argument these parameters may be automatically adjusted for the specific pollen types (“birch”, “grasses”, “cypress”, “olive” or “ragweed”). Furthermore, the user may change all parameters to do a customized definition of the pollen season.

  2. The start_date and end_date of the high pollen season were defined as three consecutive days exceeding a certain pollen threshold (th.day argument).

  3. The number of high pollen days will also be calculated exceeding this pollen threshold (th.day). For more information about the method, see Pfaar et al. (2017).

Running the following example, the main pollen season will be established according to the birch requirements: more than 5 days within a week in which more than 10 pollen grains/m3 are registered and whose sum exceeds 100 pollen grains/m3. The high pollen days are those which exceed 100 pollen grains/m3 (n.clinical=5, window.clinical=7, th.pollen=10, th.sum=100, th.day=100)

type seasons st.dt st.jd en.dt en.jd ln.ps sm.tt sm.ps pk.val pk.dt pk.jd ln.prpk sm.prpk ln.pspk sm.pspk daysth st.dt.hs st.jd.hs en.dt.hs en.jd.hs
Poaceae 2010 2010-06-04 155 2010-07-15 196 42 1861 1475 179 2010-06-11 162 8 598 34 877 4 2010-06-10 161 2010-06-13 164
Poaceae 2011 2011-05-17 137 2011-07-04 185 49 2244 1673 163 2011-05-25 145 9 436 40 1237 2 NA NA NA NA
Poaceae 2012 2012-05-19 140 2012-07-19 201 62 2127 1799 84 2012-05-25 146 7 284 55 1515 0 NA NA NA NA
Poaceae 2013 2013-06-05 156 2013-07-27 208 53 2481 1910 166 2013-06-13 164 9 708 44 1202 4 NA NA NA NA
Poaceae 2014 2014-05-19 139 2014-07-07 188 50 2413 1948 97 2014-06-11 162 24 1107 26 841 0 NA NA NA NA
Poaceae 2015 2015-05-07 127 2015-07-27 208 82 4175 3893 322 2015-06-04 155 29 2144 53 1749 9 2015-05-29 149 2015-06-06 157

The code above returns the same result than:

type seasons st.dt st.jd en.dt en.jd ln.ps sm.tt sm.ps pk.val pk.dt pk.jd ln.prpk sm.prpk ln.pspk sm.pspk daysth st.dt.hs st.jd.hs en.dt.hs en.jd.hs
Poaceae 2010 2010-06-04 155 2010-07-15 196 42 1861 1475 179 2010-06-11 162 8 598 34 877 4 2010-06-10 161 2010-06-13 164
Poaceae 2011 2011-05-17 137 2011-07-04 185 49 2244 1673 163 2011-05-25 145 9 436 40 1237 2 NA NA NA NA
Poaceae 2012 2012-05-19 140 2012-07-19 201 62 2127 1799 84 2012-05-25 146 7 284 55 1515 0 NA NA NA NA
Poaceae 2013 2013-06-05 156 2013-07-27 208 53 2481 1910 166 2013-06-13 164 9 708 44 1202 4 NA NA NA NA
Poaceae 2014 2014-05-19 139 2014-07-07 188 50 2413 1948 97 2014-06-11 162 24 1107 26 841 0 NA NA NA NA
Poaceae 2015 2015-05-07 127 2015-07-27 208 82 4175 3893 322 2015-06-04 155 29 2144 53 1749 9 2015-05-29 149 2015-06-06 157

We have resumed all these parameters under the argument type=birch to facilitate its application.

IMPORTANT : if export.plot=TRUE, the plots are EXPORTED in “plot_AeRobiology” folder.

“grains” method

This method was proposed by Galan et al. (2001) originally in olive pollen but after also applied in other pollen types.

The start_date and end_date of the pollen season were defined as a certain number of days (window.grains argument) exceeding a certain pollen threshold (th.pollen argument). For more information about the method, see Galan et al. (2001).

We want to establish the main pollen season start in the first day when more than 3 consecutive days have more than 2 pollen grains/m3, and the end in the last day in which these conditions are fulfilled:

“moving” method

This method is proposed the first time by the authors of this package. We are developing a research paper explaining the method in detail.

The definition of the pollen season is based on the application of a moving average to the pollen series in order to obtain the general seasonality of the pollen curve avoiding the great variability of the daily fluctuations. Thus, the start_date and the end_date will be established when the curve of the moving average reaches a given pollen threshold (th.ma argument). Also the order of the moving average may be customized by the user (man argument). By default, man = 11 and th.ma = 5.

The idea of this method is to be able to calculate the start of the main pollen season when the season has not finished yet. Moreover, it allows to establish the start and end even if there is a huge amount of lacking data within the main pollen season. Its similar to the “grains” method but taking into account the moving mean to avoid daily variability.

You might understand it better by consulting the plots obtained.

IMPORTANT : if export.plot=TRUE, the plots are EXPORTED in “plot_AeRobiology” folder.

Let’s calculate it with a moving average of 7 days. We are going to establish the start and end of the main pollen season when the moving average reaches 4 pollen grains/m3:

Advanced examples

Southern Hemisphere & interannual types

As you may have noticed, we have been using the function under an “European” point of view: the calculations are from 1st January to 31th December. The researchers of the Southern Hemisphere are used to work with interanual pollen seasons. Don’t worry, we haven’t forgotten you!

  1. You can work from 1st June to 31th May by means of the argument def.season="interannual":

In this method, the season belongs to the first year of the pair of years, i.e: from June 2017 to May 2018 -> season “2017”.

  1. You can center the main pollen season in the average peak day (182 days before and after the average date of the peak):

In this last method, the season belongs to the year in which the average peak date - 182 days is located, i.e.: if the average peak date is in January 2013, the season is called “2012” in the data.frames.

Interpolation in detail

Pollen time series frequently have different gaps with no data and this fact could be a problem for the calculation of specific methods for defining the pollen season even providing incorrect results. In this sense by default a linear interpolation will be carried out to complete these gaps before to define the pollen season (interpolation = TRUE). Additionally, the users may select other interpolation methods using the int.method argument, as “lineal”, “movingmean”, “spline” or “tseries”.

Some advanced users may have noticed that you can’t use directly all the arguments of interpollen() through calculate_ps(). You only are able to select the interpolation method. Nevertheless, it is not impossible to use them:

  1. Use interpollen() with all the arguments you want and overwrite your interpolated database in an object:

  1. Then, use “CompleteData” instead of “munich_pollen” (or your database) in the following steps:

Note: You might select interpolation=FALSE in the other functions you use after interpolating manually. In some cases it doesn’t matter, but if you want to interpolate only gaps with less than 3 days and you did in the first step, if you don’t use interpolation=FALSE in calculate_ps function, a second interpolation will be carried out by using the default method (gaps with less than 30 days and lineal interpolation). You will obtain a database with the gaps of less than 3 days interpolated by “spline” method and the rest with the “lineal” method.

Function pollen_calendar

This function calculates the pollen calendar from a historical database of several pollen types using different designs. The main arguments are:

Data exportation:

Exclusive arguments for pollen calendars generated as “heatplot”:

Exclusive arguments for pollen calendars generated as “phenological”:

By default the database will be interpolated according to the interpollen function, and also other arguments for this function are incorporated in the pollen_calendar function.

“heatplot” method

This pollen calendar is constructed based on the daily or weekly average of pollen concentrations (depending on the preferences of the user, who may select “daily” or “weekly” as period argument). Then, these averages may be classified in different categories following different methods selected by the user. An example of this pollen calendar may be consulted in Rojo et al. (2016). This method to design pollen calendars is an adaptation from the pollen calendar proposed by Spieksma (1991), who considered 10-day periods instead of daily or weekly periods.

First, we are going to generate a pollen calendar based on the heatplot, designed with green color and constructed with daily averages:

In all cases, the table used by the pollen calendar with the averaged values will be created. This table can be visualized. To do that we have to set the argument result = “table” For example (an extract):

jd Alnus Betula Taxus Fraxinus Poaceae Quercus Ulmus Urtica
82 82 33.166667 0.000000 416.1667 0.1666667 0.0000000 0.0000000 8.333333 0
83 83 21.000000 1.833333 443.4167 10.5833333 0.0000000 0.0000000 17.666667 0
84 84 15.000000 3.333333 387.7500 32.5833333 0.0000000 0.8333333 17.500000 0
85 85 3.800000 6.333333 263.5833 62.5000000 0.1666667 0.0000000 12.166667 0
86 86 10.400000 11.833333 116.2000 7.3333333 0.0000000 1.3333333 8.000000 0
87 87 8.400000 26.666667 37.3000 30.0000000 0.0000000 1.1666667 6.000000 0
88 88 4.800000 52.833333 205.7000 68.6000000 0.0000000 0.6666667 13.416667 0
89 89 7.000000 118.333333 335.2500 240.2000000 0.0000000 2.6666667 34.750000 0
90 90 7.333333 121.200000 480.1667 185.6000000 0.0000000 1.1666667 17.500000 0

By default the method for defining the classes for the pollen calendar is according to the exponential method. Nevertheless, the classes can be customized by classes argument:

In addition, for species with their pollen season occurring between two natural years, the start of the pollen calendar can be selected by the start.month argument:

NA (no data) can be removed by using na.remove argument.

Also we can generate a pollen calendar based on the heatplot design with weekly averages.

In this case, we have included other restrictive arguments as n.types limiting the number of pollen types and y.start and y.end, limiting the period to be considered for the pollen calendar.

“phenological” method

This pollen calendar is based on the phenological definition of the pollen season and adapted from the methodology proposed by Werchan et al. (2018).

After obtaining the daily average pollen concentrations for the most abundant pollen types, different pollination periods are calculated. The main pollination period is calculated according to the percentage defined by perc1 argument (selected by the user, 80% by default; red) of the annual total pollen. For example, if perc1 = 80, the beginning of the high season is marked when the 10% of the annual value is reached; the end is selected when 90% is reached. The early/late pollination period is defined with the perc2 argument (selected by the user, 99% of the total annual pollen by default; orange), i.e.: the start of this period will be when the 0.5% is reached and the end when the 99.5% is reached. For this kind of pollen calendar the th.pollen argument defines the possible occurrence period as adapted by Werchan et al. (2018), considering the entire period between the first and the last day when this pollen level is reached (yellow).

Furthermore, different criteria can be customized:

In this last case, the pollen calendar has been generated with more restrictive criteria for perc1, perc2 and th.pollen.

“violinplot” method

This pollen calendar is based on the pollen intensity, and adapted from the pollen calendar published by ORourke (1990). At first, the daily averages of the pollen concentrations are calculated and then, these averages are represented by using the violin plot graph.

The shape of the violin plot represents the pollen intensity of the pollen types in a relative way, i.e.: the values will be calculated as relative measurements regarding to the most abundant pollen type in annual amounts. Therefore, this pollen calendar shows a relative comparison between the pollen intensity of the pollen types but without scales and units.

In addition, th.pollen can be established, specifying the minimum pollen concentration considered (E.g.: 10 pollen grains/m3):

Functions iplot_pollen() and iplot_years()

These functions have been designed for a quick view of your data for discussions or interpretations of them. Interactive plots are displayed. It might be interesting for group meetings or real time exposition of results. The functions create a emergent screen in which you can select the pollen/spore type or the years you want to plot.

IMPORTANT : To stop the real time visualization and continue using the package you must click in the “Stop” signal.

The function iplot_pollen() is to plot the pollen data during one season. The arguments are:

The function iplot_years() is to plot the pollen data during one season. The arguments are:

iplot_pollen(munich_pollen, year = 2012)
iplot_years(munich_pollen, pollen = "Betula")

Note: We are not able to plot interactive figures in this document. Please, run the codes above in your R session.

Function plot_summary()

The function plot_summary() is to plot the pollen data during several seasons. Also to plot the averaged pollen season over the study period. It is possible to plot the relative abundance per day and smoothing the pollen season by calculating a moving average. The main arguments are:

plot_summary(munich_pollen, pollen = "Betula")

In some cases the user may want to reduce the “noise” of the daily values by calculating moving means (E.g.: 5 days moving mean; mave = 5):

plot_summary(munich_pollen, pollen = "Betula", mave = 5)

You might be also interested in representing as background the percentage each day suppose to the main pollen season (normalized = TRUE):

plot_summary(munich_pollen, pollen = "Betula", mave = 5, normalized = TRUE)

interpollen() function is integrated in this function. Consult the “help” section of this function to use it.

Function plot_normsummary()

The function plot_normsummary() has been designed to plot the pollen data amplitude during several seasons: daily average pollen concentration over the study period, maximum pollen concentration of each day over the study period and minimum pollen concentration of each day value over the study period. It is possible to plot the relative abundance per day and smoothing the pollen season by calculating a moving average. The main arguments similar to plot_summary(), but as a result you will obtain the the max-min range of the study period instead of the values for every year.

plot_normsummary(munich_pollen, pollen = "Betula", color.plot = "red")

The maximum values are marked in red and the minimum values in white. The average is the black line.

Of course, you can change the color (color.plot argument):

plot_normsummary(munich_pollen, pollen = "Betula", color.plot = "green", mave = 5, normalized = TRUE)

interpollen() function is integrated in this function. Consult the “help” section of this function to use it.

Function analyse_trend()

The function analyse_trend() has been created to calculate the main seasonal indexes of the pollen season (“Start Date”, “Peak Date”, “End Date” and “Pollen Integral”), as well as trends analysis of those parameters over the seasons. It is a summary dot plot showing the distribution of the main seasonal indexes over the years.

The results can also be stored in two folders termed “plot_AeRobiology” and “table_AeRobiology”, which will be created in your working directory (only if export.result=TRUE & export.plot=TRUE). You can decide which result to be returned from the function by setting the argument result: result = “table” or result = “plot”.

The function allows you to decide if you want to interpolate the data or not, by the argument interpolation, it also allows you to select the interpolation method by the argument int.method. Furthermore, it allows you to select the pollen season definition method, by the argument method and additional arguments for the function calculate_ps(). Some arguments about the visualization are:

analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, result="plot")

Lets see what happen if we don’t split the graphics:

analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, split = FALSE, quantil = 1, result="plot")

Now the results are less readable, but this kind of graphical representation may be interesting for some people. When plotting all the results together, it might be useful to exclude some outliers:

analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, split=FALSE, quantil = 0.5, result="plot")

As you can appreciate, a lot of points have been omitted.

You can also change the significance level as mentioned above. Why don`t we try?:

analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, significant = 1, result = "plot")

Now everything is significant!

Have a look to the numbers by setting result = “table” (the default output).

analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, significant = 1, result="table")
interpollen() function is integrated in this function. Consult the “help” section of this function to use it.
If result = “plot”, the function returns a list of objects of class ggplot2; if result = “table”, the function returns a data.frame. By default, result = “table”.

Function plot_trend()

The function plot_trend() has been created to calculate the main seasonal indexes of the pollen season (“Start Date”, “Peak Date”, “End Date” and “Pollen Integral”) and their trends analysis over the seasons. It produces plots showing the distribution of the main seasonal indexes over the years.

The results are stored in two folders termed “plot_AeRobiology” and “table_AeRobiology” which will be located in your working directory (only if export.result=TRUE & export.plot=TRUE).

plot_trend(munich_pollen, interpolation = FALSE, export.plot = TRUE, export.result = TRUE)
NOTE: The plots are not shown in your R environment. Because of the high amount of graphs, they are stored in new folders created in your working directory (search were you have saved the R project and you will find the folders).

The interval of confidence only appears if more than 6 dots are plotted. If not, a line crossing all the dots is plotted.

interpollen() function is integrated in this function. Consult the “help” section of this function to use it.

Function iplot_abundance()

The function iplot_abundance() generates a barplot based on the relative abundance in the air (as percentage) of each pollen/spore type with respect to the total amounts.

The main arguments are:

iplot_abundance(munich_pollen, interpolation = FALSE, export.plot = FALSE, export.result = FALSE)

Now we are going to reduce the number of types:

iplot_abundance(munich_pollen, interpolation = FALSE, export.plot = FALSE, export.result = FALSE, n.types = 3)

We can also select the abundance for only one year and change the color:

iplot_abundance(munich_pollen, interpolation = FALSE, export.plot = FALSE, export.result = FALSE, n.types = 3, y.start = 2011, y.end = 2011, col.bar = "#d63131")

Furthermore, we can make it interactive:

iplot_abundance(munich_pollen, interpolation = FALSE, export.plot = FALSE, export.result = FALSE, n.types = 3, y.start = 2011, y.end = 2011, col.bar = "#d63131", type.plot = "dynamic")
interpollen() function is integrated in this function. Consult the “help” section of this function to use it.

Function iplot_pheno()

The function iplot_pheno() generates a boxplot based on phenological parameters (start_dates and end_dates) which are calculated by the estimation of the main parameters of the pollen season. The main arguments are:

iplot_pheno(munich_pollen, method= "percentage", perc=80, int.method="spline", n.types = 8)

We can change the method to establish the main pollen season and the number of pollen types to show:

iplot_pheno(munich_pollen, method= "clinical", n.clinical = 3, int.method="spline", n.types = 4)

Furthermore, we can make the plot interactive to obtain more information by clicking in each object:

iplot_pheno(munich_pollen, method= "clinical", n.clinical = 3, int.method="spline", n.types = 4, type.plot = "dynamic")

Function plot_ps()

The function plot_ps() function was designed to plot the main pollen season of a single pollen type and year. Some of the arguments are:

calculate_ps function is integrated in this function. Consult here for more information. Interpolation is mandatory for this function. Due to technical encoding, you must use interpollen function through calculate_ps function. Consult calculate_ps for more information.
plot_ps(munich_pollen, pollen.type="Alnus", year=2011)

If you misspell some pollen type name, the function will tell you:

plot_ps(munich_pollen, pollen.type="Alnuscdscscr", year=2011)
## Error in plot_ps(munich_pollen, pollen.type = "Alnuscdscscr", year = 2011): pollen.type: Please, insert only a pollen type which is in your database (type the exact name)

Let`s test more arguments:

plot_ps(munich_pollen, pollen.type="Alnus", year=2013, method= "percentage", perc=95 ,int.method = "lineal")

As you have noticed, the arguments method, perc and int.method are from calculate_ps function.

What about changing the interpolation method?

plot_ps(munich_pollen, pollen.type="Alnus", year=2013, method= "percentage", perc=95 ,int.method = "movingmean")

Do you want a larger scale?

plot_ps(munich_pollen, pollen.type="Alnus", year=2013, days = 90)

Maybe a different color and y-axis name?

plot_ps(munich_pollen, pollen.type="Alnus", year=2013, fill.col = "orange", axisname = "AeRobiology custom units")

Function plot_hour()

Please keep in mind that the function plot_hour() is only available after package version 2.0.

The input data must be a data.frame object with the structure long. Where the first two columns are factors indicating the pollen and the location. The 3 and 4 columns are POSIXct, showing the date with the hour. Where the third column is the beginning of the concentration from and the fourth column is the end time of the concentration data to. The fifth column shows the concentrations of the different pollen types as numeric. Please see the example 3-hourly data from the automatic pollen monitor BAA500 from Munich and Viechtach in Bavaria (Germany) data(“POMO_pollen”), supplied by ePIN Network supported by the Bavarian Government.

We will load an example dataset about 3-hourly data from the ePIN network in Bavaria (Germany). The dataset contains information of 3-hourly concentrations of pollen in the atmosphere of Munich (DEBIED) and Viechtach (DEVIEC) during the year 2018. Pollen types included: “Poaceae” and “Pinus”.

The data were obtained by the automatic pollen monitor BAA500 from Munich and Viechtach in Bavaria (Germany) data(“POMO_pollen”), supplied by the public ePIN Network supported by the Bavarian Government. The ePIN Network was built by Das Bayerische Landesamt für Gesundheit und Lebensmittelsicherheit (LGL) in collaboration with Zentrum Allergie und Umwelt (ZAUM).

data("POMO_pollen")

The function plots pollen data expressed in concentrations with time resolution higher than 1 day (e.g. hourly, bi-hourly concentrations). If the argument result = “plot”, the function returns a list of objects of class ggplot2; if result = “table”, the function returns a data.frame with the hourly patterns.

plot_hour(POMO_pollen)
## [[1]]

## 
## [[2]]

## 
## [[3]]

To display a table we have to set the argument result = “table”.

3-Hourly patterns
pollen location from to value date hour hour2 Hour percent
Pinus DEVIEC 2018-05-08 2018-05-08 03:00:00 0.0 2018-05-08 0 3 0-3 0.0
Pinus DEVIEC 2018-05-09 2018-05-09 03:00:00 424.4 2018-05-09 0 3 0-3 19.3
Pinus DEVIEC 2018-05-10 2018-05-10 03:00:00 28.6 2018-05-10 0 3 0-3 1.4
Pinus DEVIEC 2018-05-11 2018-05-11 03:00:00 143.0 2018-05-11 0 3 0-3 9.7
Pinus DEVIEC 2018-05-12 2018-05-12 03:00:00 367.2 2018-05-12 0 3 0-3 17.3
Pinus DEVIEC 2018-05-13 2018-05-13 03:00:00 133.3 2018-05-13 0 3 0-3 8.1
Pinus DEVIEC 2018-05-14 2018-05-14 03:00:00 261.5 2018-05-14 0 3 0-3 15.8
Pinus DEVIEC 2018-05-15 2018-05-15 03:00:00 57.2 2018-05-15 0 3 0-3 9.3
Pinus DEVIEC 2018-05-16 2018-05-16 03:00:00 23.8 2018-05-16 0 3 0-3 14.7
Pinus DEVIEC 2018-05-17 2018-05-17 03:00:00 4.8 2018-05-17 0 3 0-3 2.2

We can also split the different stations by setting the argument locations = TRUE.

plot_hour(POMO_pollen, locations = TRUE)
## [[1]]

## 
## [[2]]

## 
## [[3]]

Function plot_heathour()

An alternative to plot_hour() is plot_heathour(), which shows a summary of all particles with a heatplot. The input data should have the same format as for plot_hour().

plot_heathour(POMO_pollen)

You can also set the colors by the arguments: low.col, mid.col and high.col. E.g.

plot_heathour(POMO_pollen, low.col = "darkgreen", mid.col = "moccasin", high.col = "brown")

By setting locations = TRUE you can split the result by locations:

plot_heathour(POMO_pollen, locations = TRUE)
## [[1]]

## 
## [[2]]


  1. University of Castilla-La Mancha & ZAUM (TUM/Helmholtz Zentrum)

  2. University of Malaga

  3. Zentrum Allergie und Umwelt - ZAUM (TUM/Helmholtz Zentrum)