It is assumed here that the reader is already familiar with Weighted Regressions on Time, Discharge and Season (WRTDS) and the EGRET package. The purpose of this vignette is to provide a way that users can compute flow-weighted mean concentrations for individual water years or some other PA (Period of Analysis, such as a month or a season).
A flow weighted mean concentration should not be confused with a Flow Normalized Concentration. The process described here does not include the Flow Normalization which is a part of the WRTDS method. Flow normalization is explained in the EGRET User Guide. There are many good reasons to use flow normalization, but one drawback to the technique is that it requires an assumption of stationarity of discharge over the period of record. This assumption might be problematic if there were significant changes in the human actions that influence streamflow that happened during the period of record (e.g. a dam built, and dam removed, a significant change in diversions in or out of the watershed) or if there is strong evidence that the climate changed dramatically during the period of record (examples shown below in section on streamflow stationarity). Thus, if we choose not to flow normalize and we are using WRTDS (via EGRET) then we are left with two options of annual time series that we can use to characterize the history of water quality at an annual time scale. These are the estimates of annual concentration and of annual flux. If the issue of interest is one that is strongly related to concentration itself (concerns about in-stream water quality) then the annual concentrations are a good option to work with.
We can create a time series of estimated annual mean concentrations after running modelEstimation, and then we can do a couple of different kinds of trend tests on those annual results: linear regression against time or the Mann-Kendall trend test. To do the Mann-Kendall test, the R package rkt
must be installed. The script presented below uses the example data set Choptank_eList. For your own data set you would substitute your own eList in its place. It also assumes that modelEstimation
has already been run. If it hasn’t been run the following command would need to be inserted in the script. To do that the commands would be: eList <- modelEstimation(eList)
library(EGRET)
library(rkt)
eList <- Choptank_eList
Daily <- eList$Daily
AnnualResults <- setupYears(Daily)
# note, if you want some other period of analysis this can be done in the arguments to setupYears
plotConcHist(eList, plotFlowNorm = FALSE)
modConc <- lm(Conc~DecYear,data=AnnualResults) # linear regression
summary(modConc)
##
## Call:
## lm(formula = Conc ~ DecYear, data = AnnualResults)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.163337 -0.034037 0.005431 0.032926 0.103044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -26.95261 2.03615 -13.24 4.63e-14 ***
## DecYear 0.01410 0.00102 13.82 1.50e-14 ***
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05329 on 30 degrees of freedom
## Multiple R-squared: 0.8643, Adjusted R-squared: 0.8598
## F-statistic: 191.1 on 1 and 30 DF, p-value: 1.5e-14
mannKendallConc <- rkt(AnnualResults$DecYear,AnnualResults$Conc)
# Mann-Kendall trend
mannKendallConc
##
## Standard model
## Tau = 0.8064516
## Score = 400
## var(Score) = 3802.667
## 2-sided p-value = 9.776668e-11
## Theil-Sen's (MK) or seasonal/regional Kendall (SKT/RKT) slope= 0.01436003
The results tell us that by either test, there is a highly significant upwards trend. The regression estimate of the slope is 0.014 mg/L per year. The Theil-Sen’s slope (from the Mann-Kendall test) is also 0.014 mg/L per year.
The downsides of this approach are these:
The method doesn’t remove the variability due to discharge and thus the trends are harder to observe above the noise. We may also have the problem that a series of wet years or a series of dry years at the end of our record might mislead us into thinking that there is a real trend. Both of these are the main reasons we might want to do flow normalization.
The other issue may be that concentration isn’t what we are really interested in. Rather, we may be concerned about the flux of the analyte into some receiving water body (estuary, lake, reservoir).
This second reason might lead us to want to look at the annual flux estimates. We can do all the same types of things with flux that we just discussed doing with concentration:
library(EGRET)
library(rkt)
eList <- Choptank_eList
Daily <- eList$Daily
plotFluxHist(eList, plotFlowNorm = FALSE)
modFlux <- lm(Flux~DecYear,data=AnnualResults)
summary(modFlux)
##
## Call:
## lm(formula = Flux ~ DecYear, data = AnnualResults)
##
## Residuals:
## Min 1Q Median 3Q Max
## -275.834 -88.414 -1.315 78.903 315.052
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -12485.537 4767.315 -2.619 0.0137 *
## DecYear 6.439 2.389 2.696 0.0114 *
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 124.8 on 30 degrees of freedom
## Multiple R-squared: 0.195, Adjusted R-squared: 0.1682
## F-statistic: 7.267 on 1 and 30 DF, p-value: 0.0114
mannKendallFlux <- rkt(AnnualResults$DecYear,AnnualResults$Flux)
mannKendallFlux
##
## Standard model
## Tau = 0.266129
## Score = 132
## var(Score) = 3802.667
## 2-sided p-value = 0.03364044
## Theil-Sen's (MK) or seasonal/regional Kendall (SKT/RKT) slope= 6.414844
The drawbacks here are that we still suffer from the first problem mentioned above. However, in many cases the problem of there being a strong influence of discharge on the annual values can be much worse than it was for concentration. The flux in a high flow year can be extremely high and trend detection can be very problematic, but it does resolve the second problem. We are dealing with mass and not concentration.
The results show a p-value on the regression trend test for annual flux is 0.0114 and a slope of 6.44 kg/yr/yr. The Mann-Kendall results show a p-value of 0.0336 and a slope of 6.41 kg/yr/yr. Thus, in this case, we still get a significant upwards trend although the results are less certain than the results for concentration.
There is a compromise solution and that is the use of a flow-weighted concentration. It will still be related to discharge, but not as strongly as flux, but it will be a representation of flux but with units of concentration. It is likely to be less variable year to year than flux would be. The idea of flow-weighted mean concentration is that it is like having a big bucket at the monitoring site and all of the flow for the year goes into the bucket and at the end of the year we make sure the bucket is totally mixed and then we measure the concentration of what is in the bucket. So the units are concentrations (mg/L). The concentration of what is in the bucket will strongly reflect what came in on the high flow days. The concentrations on the low flow days will have very little influence on the concentration of what is in the bucket.
EGRET does not currently compute flow-weighted mean concentration but a simple function provided here will make that happen. What follows is a short script that will read in that function, load a data set (already in the form of an eList – having already been created by modelEstimation
), create a new data frame with the annual flow-weighted mean concentrations in it, and then run through some graphics and statistical tests.
We will assume here that the data have all been assembled into eList and the modelEstimation
function has been run.
eList <- modelEstimation(eList)
Now this script can be run to create the flow-weighted annual mean concentrations and store them in the data frame called AnnualResults
. Note that this is set up to be run on the example data set Choptank_eList
. To run it on your own data set modify this script to use the eList for your data set. If you have not already run modelEstimation, then that step will need to be inserted before running flowWeightedYears
.
library(EGRET)
library(rkt)
eList <- Choptank_eList
flowWeightedYears <- function(localDaily, paLong = 12, paStart = 10) {
numDays <- length(localDaily$MonthSeq)
firstMonthSeq <- localDaily$MonthSeq[1]
lastMonthSeq <- localDaily$MonthSeq[numDays]
Starts <- seq(paStart, lastMonthSeq, 12)
Ends <- Starts + paLong - 1
StartEndSeq <- data.frame(Starts, Ends)
StartEndSeq <- StartEndSeq[(StartEndSeq$Starts >= firstMonthSeq) &
(StartEndSeq$Ends <= lastMonthSeq), ]
firstMonth <- StartEndSeq[1, 1]
numYears <- length(StartEndSeq$Starts)
DecYear <- rep(NA, numYears)
Q <- rep(NA, numYears)
Conc <- rep(NA, numYears)
Flux <- rep(NA, numYears)
FNConc <- rep(NA, numYears)
FNFlux <- rep(NA, numYears)
FWConc <- rep(NA, numYears)
for (i in 1:numYears) {
startMonth <- (i - 1) * 12 + firstMonth
stopMonth <- startMonth + paLong - 1
DailyYear <- localDaily[which(localDaily$MonthSeq %in%
startMonth:stopMonth), ]
counter <- ifelse(is.na(DailyYear$ConcDay), 0, 1)
if (length(counter) > 0) {
good <- (sum(counter) > 25)
} else {
good <- FALSE
}
DecYear[i] <- mean(DailyYear$DecYear)
Q[i] <- mean(DailyYear$Q)
if (good) {
Conc[i] <- mean(DailyYear$ConcDay, na.rm = TRUE)
Flux[i] <- mean(DailyYear$FluxDay, na.rm = TRUE)
FNConc[i] <- mean(DailyYear$FNConc, na.rm = TRUE)
FNFlux[i] <- mean(DailyYear$FNFlux, na.rm = TRUE)
FWConc[i] <- mean(DailyYear$ConcDay * DailyYear$Q,
na.rm = TRUE)
denom <- mean(DailyYear$ConcDay * DailyYear$Q/DailyYear$ConcDay,
na.rm = TRUE)
FWConc[i] <- FWConc[i]/denom
}
}
PeriodStart <- rep(paStart, numYears)
PeriodLong <- rep(paLong, numYears)
AnnualResults <- data.frame(DecYear, Q, Conc, FWConc, Flux,
FNConc, FNFlux, PeriodLong, PeriodStart)
return(AnnualResults)
}
AnnualResults <- flowWeightedYears(eList$Daily)
Now we have a new version of AnnualResults that contains the same information as before, but has one additional column, called AnnualResults$FWConc, which contains the annual flow-weighted mean concentrations. By the way, if we want to make all of these calculations for a Period of Analysis other than water years, we can do that by adding the paLong and paStart arguments to the command. So, for example for a period of March-April-May-June the command would be:
AnnualResults <- flowWeightedYears(eList$Daily, paStart = 3, paLong = 4)
We can run parametric or non-parametric trend tests on that record as follows:
library(EGRET)
library(rkt)
AnnualResults <- flowWeightedYears(eList$Daily)
modFWConc <- lm(FWConc ~ DecYear, data = AnnualResults)
# linear regression
summary(modFWConc)
##
## Call:
## lm(formula = FWConc ~ DecYear, data = AnnualResults)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.207193 -0.080647 0.006826 0.083383 0.160139
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -10.027537 3.841502 -2.610 0.01398 *
## DecYear 0.005556 0.001925 2.886 0.00716 **
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1005 on 30 degrees of freedom
## Multiple R-squared: 0.2173, Adjusted R-squared: 0.1913
## F-statistic: 8.331 on 1 and 30 DF, p-value: 0.007158
mannKendallFWConc <- rkt(AnnualResults$DecYear, AnnualResults$FWConc)
# Mann-Kendall trend
mannKendallFWConc
##
## Standard model
## Tau = 0.3306452
## Score = 164
## var(Score) = 3802.667
## 2-sided p-value = 0.008210665
## Theil-Sen's (MK) or seasonal/regional Kendall (SKT/RKT) slope= 0.006659257
What we see in the results is that the trend in the flow-weighted mean concentrations is highly significant, p-value is 0.00716 (by regression) or 0.0082 (by Mann-Kendall). The slope is estimated to be 0.00556 mg/L/yr (by regression) or 0.006559 mg/L/yr (by Mann-Kendall-Sen slope).
How would we go about plotting the annual mean concentrations and showing the regression fit? Let’s assume that we have run the flowWeightedYears function and have the data frame AnnualResults and have created the regression models for AnnualResults\(Conc (modConc) and AnnualResults\)FWConc (modFWConc).
nYears <- length(AnnualResults$DecYear)
xlim <- c(AnnualResults$DecYear[1] - 1, AnnualResults$DecYear[nYears] +
1)
xTicks <- pretty(xlim)
ylim <- c(0, 1.05 * max(AnnualResults$Conc))
yTicks <- yPretty(ylim[2])
plotTitle = "Annual Mean Concentrations"
# note that you can make more complex titles using the
# approach used in the code for plotConcHist
genericEGRETDotPlot(AnnualResults$DecYear, AnnualResults$Conc,
xlim = xlim, ylim = ylim, xTicks = xTicks, yTicks = yTicks,
xaxs = "i", yaxs = "i", xlab = "Year", ylab = "Concentration in mg/L",
plotTitle = plotTitle, xDate = TRUE)
abline(a = modConc$coef[1], b = modConc$coef[2], lwd = 2)
and we can make a similar plot for flow weighted concentrations
AnnualResults <- flowWeightedYears(eList$Daily)
nYears <- length(AnnualResults$DecYear)
xlim <- c(AnnualResults$DecYear[1] - 1, AnnualResults$DecYear[nYears] +
1)
xTicks <- pretty(xlim)
ylim <- c(0, 1.05 * max(AnnualResults$FWConc))
yTicks <- yPretty(ylim[2])
plotTitle = "Annual Flow-Weighted Mean Concentrations"
# note that you can make more complex titles using the
# approach used in the code for plotConcHist
genericEGRETDotPlot(AnnualResults$DecYear, AnnualResults$FWConc,
xlim = xlim, ylim = ylim, xTicks = xTicks, yTicks = yTicks,
xaxs = "i", yaxs = "i", xlab = "Year", ylab = "Concentration in mg/L",
plotTitle = plotTitle, xDate = TRUE)
abline(a = modFWConc$coef[1], b = modFWConc$coef[2], lwd = 2)
Or, we could do a single plot showing both (flow weighted in red)
AnnualResults <- flowWeightedYears(eList$Daily)
nYears <- length(AnnualResults$DecYear)
xlim <- c(AnnualResults$DecYear[1] - 1, AnnualResults$DecYear[nYears] +
1)
xTicks <- pretty(xlim)
yMax <- max(c(AnnualResults$Conc, AnnualResults$FWConc))
ylim <- c(0, 1.05 * yMax)
yTicks <- yPretty(ylim[2])
plotTitle = "Annual Mean Concentrations in Black\nFlow Weighted Mean in Red"
# note that you can make more complex titles using the
# approach used in the code for plotConcHist
genericEGRETDotPlot(AnnualResults$DecYear, AnnualResults$Conc,
xlim = xlim, ylim = ylim, xTicks = xTicks, yTicks = yTicks,
xaxs = "i", yaxs = "i", xlab = "Year", ylab = "Concentration in mg/L",
plotTitle = plotTitle, xDate = TRUE)
abline(a = modConc$coef[1], b = modConc$coef[2], lwd = 2)
par(new = TRUE)
genericEGRETDotPlot(AnnualResults$DecYear, AnnualResults$FWConc,
xlim = xlim, ylim = ylim, xTicks = xTicks, yTicks = yTicks,
xaxs = "i", yaxs = "i", xlab = "", ylab = "", plotTitle = "",
xDate = TRUE, col = "red")
abline(a = modFWConc$coef[1], b = modFWConc$coef[2], lwd = 2,
col = "red")
You may wish to show the results with a LOWESS (locally weighted scatterplot smooth) for each of them rather than the linear regression fit. This script will produce such a plot of both the annual mean concentrations and annual flow weighted mean concentration.
lowConc <- loess(Conc ~ DecYear, data = AnnualResults, span = 0.9)
lowFWConc <- loess(FWConc ~ DecYear, data = AnnualResults, span = 0.9)
nYears <- length(AnnualResults$DecYear)
xlim <- c(AnnualResults$DecYear[1] - 1, AnnualResults$DecYear[nYears] +
1)
xTicks <- pretty(xlim)
yMax <- max(c(AnnualResults$Conc, AnnualResults$FWConc))
ylim <- c(0, 1.05 * yMax)
yTicks <- yPretty(ylim[2])
plotTitle = "Annual Mean Concentrations in Black\nFlow Weighted Mean in Red"
# note that you can make more complex titles using the
# approach used in the code for plotConcHist
genericEGRETDotPlot(AnnualResults$DecYear, AnnualResults$Conc,
xlim = xlim, ylim = ylim, xTicks = xTicks, yTicks = yTicks,
xaxs = "i", yaxs = "i", xlab = "Year", ylab = "Concentration in mg/L",
plotTitle = plotTitle, xDate = TRUE)
par(new = TRUE)
genericEGRETDotPlot(AnnualResults$DecYear, lowConc$fit, xlim = xlim,
ylim = ylim, xTicks = xTicks, yTicks = yTicks, xaxs = "i",
yaxs = "i", xlab = "", ylab = "", plotTitle = "", xDate = TRUE,
type = "l", lwd = 2)
par(new = TRUE)
genericEGRETDotPlot(AnnualResults$DecYear, AnnualResults$FWConc,
xlim = xlim, ylim = ylim, xTicks = xTicks, yTicks = yTicks,
xaxs = "i", yaxs = "i", xlab = "", ylab = "", plotTitle = "",
xDate = TRUE, col = "red")
par(new = TRUE)
genericEGRETDotPlot(AnnualResults$DecYear, lowFWConc$fit, xlim = xlim,
ylim = ylim, xTicks = xTicks, yTicks = yTicks, xaxs = "i",
yaxs = "i", xlab = "", ylab = "", plotTitle = "", xDate = TRUE,
type = "l", lwd = 2, col = "red")
This figure, showing the LOWESS smooths really makes clear that while the annual mean concentrations seem to be progressing upwards at a rather constant slope over the 32 years, the flow weighted mean concentrations began by progressing upwards but in the latter half of the record they appear to have leveled off. What this suggests is that at base flow conditions concentrations continue to rise over the whole period, but at the higher discharges the increase seems to have haulted and the decrease at higher discharges seems to roughly balance (in terms of total mass delivered) the increase that continues to occur at lower discharges. A reasonable hypothesis in this watershed is that discharge that originates from ground water continues to carry larger and larger concentrations of nitrate (a legacy of nitrate inputs to the landscape) but efforts to reduce nitrate transport from surface runoff (in high flow events) is meeting with some success. In all, from an average flux perspective these results suggest that the watershed delivery of nitrate is about at steady state.
Describing results in terms of annual (or seasonal) flow-weighted mean concentrations is a worthwhile approach to consider. It has the merit of not being dependent on an assumption of stationary discharge distributions, and it may be substantially easier to explain to an audience than flow-normalized concentration or flux. This shouldn’t negate the generally useful properties of flow normalized results. They carry the strong advantage of being independent of the temporal pattern of streamflow and are thus unlikely to produce a temporal pattern of water quality estimates that are largely an artifact of the particular history of discharge.
Here is an example of a streamgage that shows a very strong trend in discharge over about a 40 year period, The James River near Scotland, South Dakota. Let’s assume that you have used EGRET to assemble the data set you are going to be using and that you have the INFO data frame (which contains the site metadata) and you have the Daily data frame (which contains the daily streamflow data) and that the total period for this analysis is water years 1971 through 2014, and that furthermore, you would like to divide the period into two roughly equal parts with the dividing line being the end of water year 1992. The following is a script for producing a set of side-by-side boxplots that indicates how different those two periods are.
packagePath <- system.file("extdata", package = "EGRET")
filePath <- file.path(packagePath, "James.rds")
eList <- readRDS(filePath)
Daily <- eList$Daily
Daily$group <- ifelse(Daily$Date >= "1992-10-01", "Second", "First")
title <- paste(eList$INFO$shortName, "\nDischarge for two periods\nWY 1971-1992 and 1993-2014")
boxplot(Daily$Q ~ Daily$group, log = "y", main = title, xlab = "",
ylab = "Discharge in cms")
We don’t need a formal statistical test to conclude that these two periods are very different from each other. The whole distribution of discharges is shifted way up in the latter period. Therefore we should conclude that it would be a bad idea to use flow normalization on this entire 1972-2014 period.
Here is another example data set. It is the Susquehanna River at Conowingo Maryland.
Here again, the same process was used as described for the James River.
filePath <- file.path(packagePath, "Susquehanna.rds")
eList <- readRDS(filePath)
Daily <- eList$Daily
Daily$group <- ifelse(Daily$Date >= "1998-10-01", "Second", "First")
title <- paste(eList$INFO$shortName, "\nDischarge for two periods\nWY 1985-1998 and 1999-2013")
boxplot(Daily$Q ~ Daily$group, log = "y", main = title, xlab = "",
ylab = "Discharge in cms")
This case, in contrast to the James River example, shows very little indication of non-stationarity is streamflow. Thus, using flow normalization would be very reasonable for this data set.
Software created by USGS employees along with contractors and grantees (unless specific stipulations are made in a contract or grant award) are to be released as Public Domain and free of copyright or license. Contributions of software components such as specific algorithms to existing software licensed through a third party are encouraged, but those contributions should be annotated as freely available in the Public Domain wherever possible. If USGS software uses existing licensed components, those licenses must be adhered to and redistributed.
Although this software has been used by the U.S. Geological Survey (USGS), no warranty, expressed or implied, is made by the USGS or the U.S. Government as to accuracy and functionality, nor shall the fact of distribution constitute any such warranty, and no responsibility is assumed by the USGS in connection therewith.