Filtering Example

Brett Johnson

19/04/2021

library(hakaiApi)

# Initialize the client
#   follow link in console and paste auth. code in console
#   (ignore alignment issue)
client <- hakaiApi::Client$new()

Filtering data with the API

Here’s a simple demonstration of how to filter data from the API. The API has limits in terms of how much data you can download in one query. So, instead of querying for all data of a certain type, it’s good to narrow it down to the time period, sites, or parameters that you are really interested in. Though the API has many options for querying, filtering, and sorting data, most R users will be more comfortable filtering data using R packages such as dplyr.

A good way to build a query is in steps. Let’s say we want all the chlorophyll data from QU39, from after 2016, with only accepted values, and only glass fibre filters (GF/F) from the surface.

# First return some chl data with no filters
endpoint <- "/eims/views/output/chlorophyll"
all_chl <- client$get(paste0(client$api_root, endpoint))
# by default only 20 rows are returned
str(all_chl) # Look at what columns are available to filter on

# Narrow it down to QU39
query <- "?site_id=QU39"
client$get(paste0(client$api_root, endpoint, query))

#Get back only accepted values
query <- "?site_id=QU39&chla_flag=AV"
client$get(paste0(client$api_root, endpoint, query))

# Get values from 2017
query <- "?site_id=QU39&chla_flag=AV&date.year=2017"
client$get(paste0(client$api_root, endpoint, query))

# Include only GF/F from the surface
# It may be useful to split up the queries and rejoin them over multiple lines
# Using the paste() or paste0() functions
query <- paste("site_id=QU39",
               "chla_flag=AV",
               "date.year=2017",
               "filter_type=GF/F",
               "line_out_depth=0", sep = "&")
client$get(paste0(client$api_root, endpoint, "?", query))

# Select only the columns I'm interested in
query <- paste("site_id=QU39",
               "chla_flag=AV",
               "date.year=2017",
               "filter_type=GF/F",
               "line_out_depth=0",
               "fields=date,chla,lab_technician", sep = "&")
client$get(paste0(client$api_root, endpoint, "?", query))

# This looks good so now you can assign a variable and remove the limit
query <- paste("site_id=QU39",
               "chla_flag=AV",
               "date.year=2017",
               "filter_type=GF/F",
               "line_out_depth=0",
               "fields=date,chla,lab_technician",
               "limit=-1", sep = "&")
a_great_chl_query <- client$get(paste0(client$api_root, endpoint, "?", query))

plot(a_great_chl_query$date,
     a_great_chl_query$chla,
     xlim = as.Date(c("2017-01-01", "2018-01-01")),
     ylim = c(0, 5))

Let’s say you’re only interested in receiving the highest 10 values for chlorophyll from 2017 from the portal. We can do that with the API as well using the sort descending capability and limiting the return to only 10 values.

endpoint <- "/eims/views/output/chlorophyll"
query <- paste("fields=date,chla,site_id,line_out_depth",
               "chla>0",  # Note: added chla>0 to remove NA values
               "date.year=2017",
               "sort=-chla",
               "limit=10", sep = "&")
top_10_chl <- client$get(paste0(client$api_root, endpoint, "?", query))

For more great querying capabilities see the querying-data docs