This package provides data ingestion functions for almost any data stored on the open data platform for environmental sensor data https://opensensemap.org. Its main goals are to provide means for:
Before we look at actual observations, let’s get a grasp of the openSenseMap datasets’ structure.
## boxes total: 3653
##
## boxes by exposure:
## indoor mobile outdoor unknown
## 550 153 2930 20
##
## boxes by model:
## custom hackair_home_v2 homeEthernet
## 544 52 98
## homeEthernetFeinstaub homeV2Ethernet homeV2EthernetFeinstaub
## 56 3 16
## homeV2Lora homeV2Wifi homeV2WifiFeinstaub
## 19 94 275
## homeWifi homeWifiFeinstaub luftdaten_pms1003
## 217 190 1
## luftdaten_pms1003_bme280 luftdaten_pms3003_bme280 luftdaten_pms5003_bme280
## 2 2 16
## luftdaten_pms7003 luftdaten_pms7003_bme280 luftdaten_sds011
## 2 8 110
## luftdaten_sds011_bme280 luftdaten_sds011_bmp180 luftdaten_sds011_dht11
## 519 37 89
## luftdaten_sds011_dht22
## 1303
##
## $last_measurement_within
## 1h 1d 30d 365d never
## 1771 1810 2048 2280 1373
##
## oldest box: 2014-05-28 15:36:14 (CALIMERO)
## newest box: 2019-03-10 11:32:20 (Mosina Ul.Marii Konopnickiej)
##
## sensors per box:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 4.000 4.000 4.687 5.000 33.000
This already gives a good overview: as of writing, there are more than 3,600 sensor stations, of which roughly 50% are currently running (1,771 reported a measurement within the last hour). Most of them are placed outdoors and have around 5 sensors each. The oldest station is from May 2014, while the newest station was registered a couple of minutes ago.
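The shares quoted here can be checked against the summary output with a few lines of arithmetic. This is only a sketch: the counts are copied by hand from the output above rather than fetched live, and the variable names are illustrative.

```r
# Counts copied by hand from the summary output above (not fetched live).
boxes_total = 3653
active_1h   = 1771  # boxes with a measurement within the last hour
outdoor     = 2930  # boxes with exposure "outdoor"

# Shares in percent, rounded to one decimal place.
share_active  = round(100 * active_1h / boxes_total, 1)
share_outdoor = round(100 * outdoor / boxes_total, 1)

cat(sprintf("active within 1h: %.1f%%, outdoor: %.1f%%\n",
            share_active, share_outdoor))
```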
Another feature of interest is the spatial distribution of the boxes: plot()
can help us out here. This function requires a bunch of optional dependencies though.
if (!require('maps')) install.packages('maps')
if (!require('maptools')) install.packages('maptools')
if (!require('rgeos')) install.packages('rgeos')
plot(all_sensors)
It seems we have to reduce our area of interest to Germany.
But what do these sensor stations actually measure? Let’s find out. osem_phenomena()
gives us a named list of the counts of each observed phenomenon for the given set of sensor stations:
## List of 742
## $ Temperatur : int 3232
## $ rel. Luftfeuchte : int 2943
## $ PM10 : int 2772
## $ PM2.5 : int 2767
## $ Luftdruck : int 1683
## $ Beleuchtungsstärke : int 875
## $ UV-Intensität : int 865
## $ Temperature : int 133
## $ Luftfeuchtigkeit : int 118
## $ Humidity : int 102
## $ Pressure : int 52
## $ Umgebungslautstärke : int 39
## $ temperature : int 39
## $ Helligkeit : int 37
## $ humidity : int 35
## $ Lautstärke : int 27
## $ UV : int 27
## $ Latitude : int 25
## $ Longtitude : int 25
## $ PM01 : int 25
## $ Luftfeuchte : int 23
## $ Schall : int 20
## $ Licht : int 19
## $ co2 : int 19
## $ Signal : int 16
## $ Windrichtung : int 14
## $ rel. Luftfeuchtigkeit : int 14
## $ Feinstaub PM10 : int 13
## $ Lämpötila : int 13
## $ Windgeschwindigkeit : int 13
## $ Ilmanpaine : int 12
## $ NO2 AE : int 12
## $ NO2 WE : int 12
## $ Niederschlag : int 12
## $ O3 AE : int 12
## $ Speed : int 12
## $ Feinstaub PM2.5 : int 11
## $ O3 WE : int 11
## $ pressure : int 11
## $ Ozon : int 10
## $ Stickoxid : int 10
## $ Temperatur DHT22 : int 10
## $ Light : int 9
## $ Temperatura : int 9
## $ Wind speed : int 9
## $ Kosteus : int 8
## $ Temp : int 8
## $ UV-Strahlung : int 8
## $ Valonmäärä : int 8
## $ Wassertemperatur : int 8
## $ UV-säteily : int 7
## $ Air pressure : int 5
## $ Beleuchtungstärke : int 5
## $ Druck : int 5
## $ Feuchtigkeit : int 5
## $ Illuminance : int 5
## $ Ilmankosteus : int 5
## $ NO2 : int 5
## $ Regen : int 5
## $ UV Index : int 5
## $ Wind direction : int 5
## $ rel. Luftfeuchte DHT22 : int 5
## $ Air Pressure : int 4
## $ Battery : int 4
## $ CO : int 4
## $ CO2 : int 4
## $ Feinstaub : int 4
## $ Feuchte : int 4
## $ Luftqualität : int 4
## $ PM 10 : int 4
## $ PM 2.5 : int 4
## $ PM1.0 : int 4
## $ Rel. Luftfeuchte : int 4
## $ Relative Humidity : int 4
## $ Sound : int 4
## $ Temperature 1 : int 4
## $ UV-Index : int 4
## $ UV-Säteily : int 4
## $ lautstärke : int 4
## $ rel. Luftfeuchte 1 : int 4
## $ relative Luftfeuchtigkeit : int 4
## $ Batterie : int 3
## $ Batteriespannung : int 3
## $ Battery voltage : int 3
## $ DS18B20_Probe01 : int 3
## $ DS18B20_Probe02 : int 3
## $ DS18B20_Probe03 : int 3
## $ DS18B20_Probe04 : int 3
## $ DS18B20_Probe05 : int 3
## $ Durchschnitt Umgebungslautstärke : int 3
## $ Geschwindigkeit : int 3
## $ H2 : int 3
## $ Licht (digital) : int 3
## $ Luftdruck (BME280) : int 3
## $ Lufttemperatur : int 3
## $ NH3 : int 3
## $ Niederschlagsmenge : int 3
## $ Noise : int 3
## $ PM2,5 : int 3
## [list output truncated]
That’s quite some noise there: many phenomena are measured by a single sensor only, and many others are duplicated due to slightly different spellings. We should clean that up, but for now let’s just filter out the noise and find those phenomena with high sensor numbers:
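A minimal sketch of such a filter, using a handful of counts copied by hand from the listing above as stand-in data. The variable names `phenoms` and `frequent` and the threshold of 20 are assumptions; the threshold is chosen because it is consistent with the output that follows, which ends at a count of 23.

```r
# Stand-in data: a few counts copied by hand from the listing above.
phenoms = list(
  Temperatur         = 3232L,
  `rel. Luftfeuchte` = 2943L,
  PM2.5              = 2767L,
  Luftfeuchte        = 23L,
  Schall             = 20L,
  Licht              = 19L
)

# Keep only phenomena observed by more than 20 sensors.
frequent = phenoms[unlist(phenoms) > 20]
names(frequent)
```

Applied to the full list of phenomena, the same expression would keep every entry down to `Luftfeuchte` and drop `Schall` and everything below it.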
## $Temperatur
## [1] 3232
##
## $`rel. Luftfeuchte`
## [1] 2943
##
## $PM10
## [1] 2772
##
## $PM2.5
## [1] 2767
##
## $Luftdruck
## [1] 1683
##
## $Beleuchtungsstärke
## [1] 875
##
## $`UV-Intensität`
## [1] 865
##
## $Temperature
## [1] 133
##
## $Luftfeuchtigkeit
## [1] 118
##
## $Humidity
## [1] 102
##
## $Pressure
## [1] 52
##
## $Umgebungslautstärke
## [1] 39
##
## $temperature
## [1] 39
##
## $Helligkeit
## [1] 37
##
## $humidity
## [1] 35
##
## $Lautstärke
## [1] 27
##
## $UV
## [1] 27
##
## $Latitude
## [1] 25
##
## $Longtitude
## [1] 25
##
## $PM01
## [1] 25
##
## $Luftfeuchte
## [1] 23
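The filtered list still contains near-duplicates such as `Temperature` and `temperature`. A first, deliberately naive cleanup step could merge names that differ only in letter case or surrounding whitespace. The sketch below uses counts copied by hand from the listing; translations (for example `Temperatur` versus `Temperature`) would need a hand-written mapping, which is not attempted here.

```r
# Counts copied by hand from the phenomena listing above.
counts = c(
  Temperature = 133L, temperature = 39L,
  Humidity    = 102L, humidity    = 35L,
  Pressure    = 52L,  pressure    = 11L
)

# Normalize the names: trim whitespace and ignore letter case.
key = tolower(trimws(names(counts)))

# Sum the counts of all spellings that share a normalized name.
merged = tapply(counts, key, sum)
merged
```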
Alright, temperature it is! Fine particulate matter (PM2.5) seems to be more interesting to analyze though. We should check how many sensor stations provide useful data: we want only those boxes that have a PM2.5 sensor, are placed outdoors, and are currently submitting measurements:
pm25_sensors = osem_boxes(
  exposure = 'outdoor',
  date = Sys.time(), # ±4 hours
  phenomenon = 'PM2.5'
)
## boxes total: 1622
##
## boxes by exposure:
## outdoor
## 1622
##
## boxes by model:
## custom hackair_home_v2 homeEthernetFeinstaub
## 51 18 32
## homeV2EthernetFeinstaub homeV2Lora homeV2WifiFeinstaub
## 3 3 70
## homeWifi homeWifiFeinstaub luftdaten_pms1003
## 3 51 1
## luftdaten_pms1003_bme280 luftdaten_pms5003_bme280 luftdaten_pms7003
## 1 8 1
## luftdaten_pms7003_bme280 luftdaten_sds011 luftdaten_sds011_bme280
## 4 52 345
## luftdaten_sds011_bmp180 luftdaten_sds011_dht11 luftdaten_sds011_dht22
## 24 58 897
##
## $last_measurement_within
## 1h 1d 30d 365d never
## 1581 1592 1595 1600 22
##
## oldest box: 2016-06-02 12:09:47 (BalkonBox Mindener Str.)
## newest box: 2019-03-10 11:32:20 (Mosina Ul.Marii Konopnickiej)
##
## sensors per box:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 4.000 4.000 4.581 5.000 15.000
That’s still more than 1,600 measuring stations; we can work with that.
Having analyzed the available data sources, let’s finally get some measurements. We could call osem_measurements(pm25_sensors)
now, but we are focusing on a restricted area of interest, the city of Berlin. Luckily, we can get the measurements filtered by a bounding box:
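library(sf)    # st_point(), st_sfc(), st_transform(), st_buffer(), st_bbox()
library(units) # set_units()
## Linking to GEOS 3.6.1, GDAL 2.2.4, PROJ 4.9.3
## udunits system database from /usr/share/udunits
library(lubridate)
library(dplyr)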
# construct a bounding box: 12 kilometers around Berlin
berlin = st_point(c(13.4034, 52.5120)) %>%
  st_sfc(crs = 4326) %>%
  st_transform(3857) %>% # allow setting a buffer in meters
  st_buffer(set_units(12, km)) %>%
  st_transform(4326) %>% # the opensensemap expects WGS 84
  st_bbox()
pm25 = osem_measurements(
  berlin,
  phenomenon = 'PM2.5',
  from = now() - days(3), # defaults to 2 days
  to = now()
)
plot(pm25)
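One caveat on the buffer step, offered as a rough check rather than a correction: distances in EPSG:3857 (Web Mercator) are stretched by roughly 1/cos(latitude), so a buffer of 12 km in projected units covers less ground at Berlin's latitude. The sketch below uses the spherical approximation of that scale factor; the variable names are illustrative.

```r
lat_deg = 52.5120          # latitude of the Berlin point used above
buffer_projected_km = 12   # buffer distance given in EPSG:3857 units

# Spherical Mercator scale factor: projected length / ground length.
scale_factor = 1 / cos(lat_deg * pi / 180)

# Approximate radius actually covered on the ground.
ground_km = buffer_projected_km / scale_factor
round(ground_km, 1)
```

If a true 12 km radius matters, buffering in a local metric projection such as ETRS89 / UTM zone 33N (EPSG:25833) instead of EPSG:3857 would be one option.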
Now we can get started with actual spatiotemporal data analysis. First, let’s mask the seemingly uncalibrated sensors:
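# sensors with any reading above 100 are treated as uncalibrated
outliers = filter(pm25, value > 100)$sensorId
# unique IDs; works whether sensorId is a factor or a character vector
bad_sensors = unique(as.character(outliers))
# flag every measurement that comes from one of these sensors
pm25 = mutate(pm25, invalid = sensorId %in% bad_sensors)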
Then plot the measuring locations, flagging the outliers:
Removing these sensors yields a nicer time series plot:
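The removal step can be sketched on a toy data frame in base R. The sensor IDs and values below are made up for illustration; only the column names `sensorId`, `value` and `invalid` and the threshold of 100 follow the code above.

```r
# Toy stand-in for the measurements table; IDs and values are made up.
toy = data.frame(
  sensorId = c("a", "a", "b", "b", "c"),
  value    = c(12, 15, 8, 250, 20),
  stringsAsFactors = FALSE
)

# A sensor is flagged if any of its readings exceeds the threshold.
bad_sensors = unique(toy$sensorId[toy$value > 100])
toy$invalid = toy$sensorId %in% bad_sensors

# Drop every reading of a flagged sensor, not just the extreme ones.
clean = toy[!toy$invalid, ]
clean
```

Note that the plausible reading of sensor `b` (value 8) is dropped as well: the mask works per sensor, not per measurement.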
Further analysis: comparison with LANUV data TODO