Currently, there are 8 functions associated with the sample verb in the sgsR package:
sample_srs() - simple random sampling
sample_systematic() - systematic sampling in a grid or hexagon tessellation
sample_strat() - stratified sampling within an sraster
sample_nc() - nearest centroid sampling. See Melville & Stone (2016)
sample_clhs() - Latin hypercube sampling
sample_balanced() - see BalancedSampling
sample_ahels() - adapted hypercube evaluation of a legacy sample (ahels)
sample_existing() - sub-sample within an existing sample using clhs
One key feature of some sample_* functions is the ability to define access corridors. Users can supply a road access network (which must be sf line objects) and define buffers around access within which samples are included or excluded. A minimal sketch of supplying an access layer is shown after the parameter descriptions below.
Relevant and applicable parameters when access is defined are:
buff_inner - Inner buffer parameter that defines the distance from access where samples cannot be taken. Can be left as NULL (default). For example, if you do not want samples within 50 m of your access layer, set buff_inner = 50.
buff_outer - Outer buffer parameter that defines the maximum distance that samples can be located from access. For example, if you do not want samples more than 200 m from your access layer, set buff_outer = 200.
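As an illustration, an access layer is simply an sf object with line geometries. The sketch below is an example only: "roads.gpkg" is a hypothetical file path and the buffer distances are arbitrary; the same parameters are demonstrated with the package's own data throughout the examples that follow.
#--- example only: read a hypothetical road layer as sf line features ---#
access <- sf::st_read("roads.gpkg", quiet = TRUE)
#--- exclude samples closer than 50 m and farther than 200 m from access ---#
sample_srs(raster = sraster, # input sraster
           nSamp = 50, # number of desired samples
           access = access, # sf line network
           buff_inner = 50, # inner buffer
           buff_outer = 200) # outer buffer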
sample_srs
We have demonstrated a simple example of using the sample_srs() function in vignette("sgsR"). We will demonstrate additional examples below.
The input required for sample_srs() is a raster. This means that both sraster and mraster are supported for this function.
#--- perform simple random sampling ---#
sample_srs(raster = sraster, # input sraster
nSamp = 200, # number of desired samples
plot = TRUE) # plot
#> Simple feature collection with 200 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431110 ymin: 5337710 xmax: 438550 ymax: 5343210
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (438190 5341150)
#> 2 POINT (438190 5341150)
#> 3 POINT (433750 5342190)
#> 4 POINT (433490 5341990)
#> 5 POINT (437410 5340770)
#> 6 POINT (437790 5342790)
#> 7 POINT (434630 5338990)
#> 8 POINT (438510 5340890)
#> 9 POINT (433350 5340870)
#> 10 POINT (433710 5340110)
sample_srs(raster = mraster, # input mraster
nSamp = 200, # number of desired samples
access = access, # define access road network
mindist = 200, # minimum distance samples must be apart from one another
buff_inner = 50, # inner buffer - no samples within this distance from road
buff_outer = 200, # outer buffer - no samples further than this distance from road
plot = TRUE) # plot
#> Simple feature collection with 200 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431150 ymin: 5337730 xmax: 438550 ymax: 5343230
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (434510 5339090)
#> 2 POINT (434390 5338790)
#> 3 POINT (434430 5341390)
#> 4 POINT (432870 5342250)
#> 5 POINT (433430 5340570)
#> 6 POINT (434550 5341690)
#> 7 POINT (434870 5342490)
#> 8 POINT (437170 5343210)
#> 9 POINT (438410 5340850)
#> 10 POINT (433910 5340350)
sample_systematic
The sample_systematic() function applies systematic sampling across an area, with the cellsize parameter defining the resolution of the tessellation. The tessellation shape can be modified using the square parameter: assigning TRUE (default) results in a regular grid, while assigning FALSE results in a hexagonal grid. The location of samples within each tessellation can be adjusted using the location parameter, where centers takes the center, corners takes all corners, and random takes a random location within each tessellation.
#--- perform grid sampling ---#
sample_systematic(raster = sraster, # input sraster
cellsize = 1000, # grid distance
plot = TRUE) # plot
#> Simple feature collection with 36 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431108.5 ymin: 5338035 xmax: 438207.8 ymax: 5343117
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (438118.3 5338035)
#> 2 POINT (436704.4 5338064)
#> 3 POINT (438148.2 5339448)
#> 4 POINT (435290.5 5338094)
#> 5 POINT (436734.3 5339478)
#> 6 POINT (437456.1 5340170)
#> 7 POINT (438178 5340862)
#> 8 POINT (433876.6 5338124)
#> 9 POINT (434598.5 5338816)
#> 10 POINT (435320.4 5339508)
#--- perform grid sampling ---#
sample_systematic(raster = sraster, # input sraster
cellsize = 500, # grid distance
square = FALSE, # hexagonal tessellation
location = "random", # random sample within tessellation
plot = TRUE) # plot
#> Simple feature collection with 164 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431140.2 ymin: 5337727 xmax: 438500.6 ymax: 5343215
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (431944.8 5343194)
#> 2 POINT (431578.8 5342953)
#> 3 POINT (432492 5342958)
#> 4 POINT (432066.2 5342587)
#> 5 POINT (432748.9 5343095)
#> 6 POINT (433098.5 5342592)
#> 7 POINT (434292.7 5342995)
#> 8 POINT (434776.5 5343197)
#> 9 POINT (432227.9 5342218)
#> 10 POINT (433022.4 5342361)
sample_systematic(raster = sraster, # input sraster
cellsize = 500, # grid distance
access = access, # define access road network
buff_outer = 200, # outer buffer - no samples further than this distance from road
square = FALSE, # hexagonal tessellation
location = "corners", # take corners instead of centers
plot = TRUE)
#> Simple feature collection with 632 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431144.1 ymin: 5337739 xmax: 438522.6 ymax: 5343238
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (438522.6 5337880)
#> 2 POINT (438522.6 5337880)
#> 3 POINT (437759.6 5337845)
#> 4 POINT (438027.7 5337952)
#> 5 POINT (438522.6 5337880)
#> 6 POINT (438027.7 5337952)
#> 7 POINT (438069.2 5338237)
#> 8 POINT (438337.3 5338344)
#> 9 POINT (437759.6 5337845)
#> 10 POINT (437264.8 5337917)
sample_strat
The sample_strat() function contains two methods to perform sampling:
"Queinnec" - Hierarchical sampling using a focal window to isolate contiguous groups of stratum pixels, originally developed by Martin Queinnec.
"random" - Traditional stratified random sampling. This method ignores much of the functionality of the algorithm, allowing users to apply standard stratified random sampling approaches without the use of a focal window to locate contiguous stratum cells.
method = "Queinnec"
Queinnec, M., White, J. C., & Coops, N. C. (2021). Comparing airborne and spaceborne photon-counting LiDAR canopy structural estimates across different boreal forest types. Remote Sensing of Environment, 262(August 2020), 112510.
This algorithm uses a moving window (wrow and wcol parameters) to filter the input sraster and prioritize sample locations where stratum pixels are spatially grouped, rather than dispersed individually across the landscape.
Sampling is performed using 2 rules:
Rule 1 - Sample within spatially grouped stratum pixels. The moving window is defined by wrow and wcol.
Rule 2 - If no more samples exist to satisfy the desired sampling count, individual stratum pixels are sampled.
The rule applied to select a particular sample is recorded in the rule attribute of the output samples. We give a few examples below:
#--- perform stratified sampling ---#
sample_strat(sraster = sraster, # input sraster
nSamp = 200, # desired sample number
plot = TRUE) # plot
#> Simple feature collection with 200 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431110 ymin: 5337810 xmax: 438490 ymax: 5343170
#> CRS: NA
#> First 10 features:
#> strata type rule geometry
#> x 1 new rule1 POINT (438190 5339570)
#> x1 1 new rule2 POINT (438170 5341090)
#> x2 1 new rule2 POINT (433770 5340850)
#> x3 1 new rule2 POINT (436870 5339210)
#> x4 1 new rule2 POINT (432870 5340990)
#> x5 1 new rule2 POINT (437930 5342770)
#> x6 1 new rule2 POINT (432830 5343090)
#> x7 1 new rule2 POINT (437550 5339230)
#> x8 1 new rule2 POINT (437990 5343150)
#> x9 1 new rule2 POINT (434970 5341570)
In some cases, users might want to include existing samples within the algorithm. In order to adjust the total number of samples needed per stratum to reflect those already present in existing, we can use the intermediate function extract_strata().
This function takes the sraster and existing samples and extracts the stratum for each sample. These samples can then be included within sample_strat(), which adjusts the total number of samples required per class based on their representation in existing.
#--- extract strata values to existing samples ---#
e.sr <- extract_strata(sraster = sraster, # input sraster
                       existing = existing) # existing samples to add strata value to
Notice that e.sr now has an attribute named strata. If that attribute is not present, sample_strat() will give an error.
sample_strat(sraster = sraster, # input sraster
nSamp = 200, # desired sample number
access = access, # define access road network
existing = e.sr, # existing samples with strata values
mindist = 200, # minimum distance samples must be apart from one another
buff_inner = 50, # inner buffer - no samples within this distance from road
buff_outer = 200, # outer buffer - no samples further than this distance from road
plot = TRUE) # plot
#> Simple feature collection with 400 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431150 ymin: 5337730 xmax: 438530 ymax: 5343230
#> CRS: NA
#> First 10 features:
#> strata type rule geometry
#> 1 1 existing existing POINT (437950 5338130)
#> 2 1 existing existing POINT (435630 5342550)
#> 3 1 existing existing POINT (435350 5339170)
#> 4 1 existing existing POINT (437990 5340110)
#> 5 1 existing existing POINT (437890 5339590)
#> 6 1 existing existing POINT (436950 5342410)
#> 7 1 existing existing POINT (433710 5338610)
#> 8 1 existing existing POINT (438250 5338210)
#> 9 1 existing existing POINT (435250 5342150)
#> 10 1 existing existing POINT (433330 5341150)
As seen in the code above, the mindist parameter specifies the minimum Euclidean distance that samples must be from one another.
Notice that the sample outputs have type and rule attributes, which indicate whether the samples are existing or new and whether rule1 or rule2 was used to select each individual sample. If type is existing (a user-provided existing sample), rule will be existing as well, as seen above.
sample_strat(sraster = sraster, # input
nSamp = 200, # desired sample number
access = access, # define access road network
existing = e.sr, # existing samples with strata values
include = TRUE, # include existing plots in nSamp total
buff_outer = 200, # outer buffer - no samples further than this distance from road
plot = TRUE) # plot
#> Simple feature collection with 200 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431150 ymin: 5337730 xmax: 438530 ymax: 5343230
#> CRS: NA
#> First 10 features:
#> strata type rule geometry
#> 1 1 existing existing POINT (437950 5338130)
#> 2 1 existing existing POINT (435630 5342550)
#> 3 1 existing existing POINT (435350 5339170)
#> 4 1 existing existing POINT (437990 5340110)
#> 5 1 existing existing POINT (437890 5339590)
#> 6 1 existing existing POINT (436950 5342410)
#> 7 1 existing existing POINT (433710 5338610)
#> 8 1 existing existing POINT (438250 5338210)
#> 9 1 existing existing POINT (435250 5342150)
#> 10 1 existing existing POINT (433330 5341150)
The include parameter determines whether existing samples should be included in the total count of samples defined by nSamp. By default, the include parameter is set to FALSE.
method = "random
Stratified random sampling with equal probability for all cells (using default algorithm values for mindist and no use of access functionality). In essence, this method performs the sample_srs algorithm for each stratum separately to meet the specified sample allocation.
#--- perform stratified random sampling ---#
sample_strat(sraster = sraster, # input sraster
method = "random", #stratified random sampling
nSamp = 200, # desired sample number
plot = TRUE) # plot
#> Simple feature collection with 200 features and 1 field
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431150 ymin: 5337750 xmax: 438550 ymax: 5343230
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> strata geometry
#> 1 1 POINT (435930 5342430)
#> 2 1 POINT (435930 5342430)
#> 3 1 POINT (434050 5341230)
#> 4 1 POINT (434350 5342170)
#> 5 1 POINT (437490 5342730)
#> 6 1 POINT (435370 5340490)
#> 7 1 POINT (433650 5343230)
#> 8 1 POINT (437430 5338190)
#> 9 1 POINT (438550 5338890)
#> 10 1 POINT (435050 5339650)
sample_nc
The sample_nc() function implements the Nearest Centroid sampling algorithm described in Melville & Stone (2016). The algorithm uses k-means clustering where the number of clusters (centroids) is equal to the desired number of samples (nSamp). Once the cluster centers are located, the nearest-neighbour mraster pixel to each center is selected (assuming the default k parameter). These nearest neighbours are the output samples. Basic usage is as follows:
#--- perform nearest centroid sampling ---#
sample_nc(mraster = mraster, # input
nSamp = 25, # desired sample number
plot = TRUE)
#> K-means being performed on 3 layers with 25 centers.
#> Simple feature collection with 25 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431290 ymin: 5337870 xmax: 438010 ymax: 5342610
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> zq90 pzabove2 zsd kcenter geometry
#> 32048 26.50 87.4 8.24 1 POINT (437950 5341530)
#> 78491 3.00 7.6 0.58 2 POINT (434310 5339030)
#> 94291 8.38 85.8 1.83 3 POINT (436990 5338190)
#> 41299 19.10 61.9 5.70 4 POINT (436470 5341030)
#> 95218 15.80 95.3 2.68 5 POINT (433150 5338130)
#> 27675 15.60 62.6 4.32 6 POINT (432550 5341750)
#> 73674 23.20 88.9 6.68 7 POINT (434950 5339290)
#> 100176 16.20 84.8 4.11 8 POINT (435330 5337870)
#> 95922 14.60 90.7 3.44 9 POINT (432310 5338090)
#> 94656 6.27 61.8 1.45 10 POINT (436830 5338170)
Altering the k parameter leads to a multiplicative increase in output samples, where total output samples = nSamp * k.
#--- perform nearest centroid sampling ---#
samples <- sample_nc(mraster = mraster, # input
                     k = 2, # number of nearest neighbours to take for each kmeans center
                     nSamp = 25, # desired sample number
                     plot = TRUE)
#> K-means being performed on 3 layers with 25 centers.
#--- total samples = nSamp * k (25 * 2) = 50 ---#
nrow(samples)
#> [1] 50
The k-means centers and the nearest-neighbour samples can be visualized by using details = TRUE. The $kplot output provides a quick visualization of where the centers are located, based on a scatter plot of the first 2 layers in mraster. Notice that the centers are well distributed in covariate space and that the chosen samples are the closest pixels to each center (nearest neighbours).
#--- perform nearest centroid sampling with details ---#
details <- sample_nc(mraster = mraster, # input
                     nSamp = 25, # desired sample number
                     details = TRUE)
#> K-means being performed on 3 layers with 25 centers.
#--- plot ggplot output ---#
details$kplot
sample_clhs
The sample_clhs() function implements conditioned Latin hypercube (clhs) sampling methodology from the clhs package. A number of other functions in the sgsR package help provide guidance on clhs sampling, including calculate_pop() and calculate_lhsOpt(). Check out these functions to better understand how sample numbers could be optimized.
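For orientation only, the sketch below shows how those two helpers are typically chained: population matrices are derived from the mraster and then passed to the optimization routine. The argument names shown (mraster, mats) are our assumption based on the function documentation and should be verified before use.
#--- sketch only: derive population matrices, then evaluate candidate sample sizes ---#
mat <- calculate_pop(mraster = mraster) # population-level matrices from the mraster (assumed signature)
calculate_lhsOpt(mats = mat) # tests a range of sample sizes; can be slow (assumed signature)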
The syntax for this function is similar to others shown above, although parameters like iter, which defines the number of iterations within the Metropolis-Hastings process, are important to consider. In these examples we use a low iter value because it takes less time to run. The default value for iter within the clhs package is 10,000.
sample_clhs(mraster = mraster, # input
nSamp = 200, # desired sample number
plot = TRUE, # plot
iter = 100) # number of iterations
sample_clhs(mraster = mraster, # input
nSamp = 300, # desired sample number
iter = 100, # number of iterations
existing = existing, # existing samples
access = access, # define access road network
buff_inner = 100, # inner buffer - no samples within this distance from road
buff_outer = 300, # outer buffer - no samples further than this distance from road
plot = TRUE) # plot
The cost parameter defines the mraster covariate that is used to constrain the clhs sampling. This could be any number of variables. Examples include the distance of a pixel from road access (e.g. from calculate_distance(); see the example below), terrain slope, the output from calculate_coobs(), or many others.
#--- cost constrained examples ---#
#--- calculate distance to access layer for each pixel in mr ---#
mr.c <- calculate_distance(raster = mraster, # input
                           access = access, # define access road network
                           plot = TRUE) # plot
sample_clhs(mraster = mr.c, # input
nSamp = 250, # desired sample number
iter = 100, # number of iterations
cost = "dist2access", # cost parameter - name defined in calculate_distance()
plot = TRUE) # plot
sample_balanced
The sample_balanced() algorithm performs a balanced sampling methodology from the BalancedSampling / SamplingBigData packages.
sample_balanced(mraster = mraster, # input
nSamp = 200, # desired sample number
plot = TRUE) # plot
#> Simple feature collection with 200 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431110 ymin: 5337710 xmax: 438550 ymax: 5343230
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (432510 5343230)
#> 2 POINT (435590 5343210)
#> 3 POINT (437210 5343210)
#> 4 POINT (437530 5343150)
#> 5 POINT (435570 5343130)
#> 6 POINT (436550 5343030)
#> 7 POINT (436030 5343010)
#> 8 POINT (436990 5342990)
#> 9 POINT (436570 5342970)
#> 10 POINT (431610 5342930)
sample_balanced(mraster = mraster, # input
nSamp = 100, # desired sample number
algorithm = "lcube", # algorithm type
access = access, # define access road network
buff_inner = 50, # inner buffer - no samples within this distance from road
buff_outer = 200) # outer buffer - no samples further than this distance from road
#> Simple feature collection with 100 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431130 ymin: 5337750 xmax: 438550 ymax: 5343210
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (432810 5342050)
#> 2 POINT (434490 5341930)
#> 3 POINT (436270 5339330)
#> 4 POINT (438310 5338990)
#> 5 POINT (431450 5340570)
#> 6 POINT (436330 5341430)
#> 7 POINT (432710 5340230)
#> 8 POINT (433370 5341670)
#> 9 POINT (433830 5340530)
#> 10 POINT (434850 5337750)
sample_ahels
The sample_ahels() function performs the adapted Hypercube Evaluation of a Legacy Sample (ahels) algorithm using existing sample data and an mraster. New samples are allocated based on quantile ratios between the existing sample and the mraster covariate dataset.
This algorithm was adapted from that presented in the paper below, which we highly recommend.
Malone BP, Minansy B, Brungard C. 2019. Some methods to improve the utility of conditioned Latin hypercube sampling. PeerJ 7:e6451 DOI 10.7717/peerj.6451
This algorithm:
Determines the quantile distributions of existing
samples and mraster
covariates.
Determines quantiles where there is a disparity between samples and covariates.
Prioritizes sampling within those quantiles to improve representation.
To use this function, users must first specify the number of quantiles (nQuant) followed by either the nSamp parameter (total number of desired samples to be added) or the threshold parameter (sampling ratio vs. covariate coverage ratio for quantiles; default is 0.9). We recommend setting the threshold value at or below 0.9.
sample_ahels(mraster = mraster,
existing = existing, # existing samples
plot = TRUE) # plot
#> Simple feature collection with 230 features and 7 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431150 ymin: 5337730 xmax: 438530 ymax: 5343230
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> type.x zq90 pzabove2 zsd strata type.y rule geometry
#> 1 existing 5.64 0.4 1.36 1 new rule1 POINT (437950 5338130)
#> 2 existing 8.76 44.0 2.08 1 new rule2 POINT (435630 5342550)
#> 3 existing 4.83 9.5 1.08 1 new rule2 POINT (435350 5339170)
#> 4 existing 10.80 80.5 2.67 1 new rule2 POINT (437990 5340110)
#> 5 existing 8.24 70.0 2.03 1 new rule2 POINT (437890 5339590)
#> 6 existing 1.59 0.0 0.13 1 new rule2 POINT (436950 5342410)
#> 7 existing 10.20 76.9 2.42 1 new rule2 POINT (433710 5338610)
#> 8 existing 10.30 1.2 2.59 1 new rule2 POINT (438250 5338210)
#> 9 existing 6.85 37.5 1.77 1 new rule2 POINT (435250 5342150)
#> 10 existing 8.11 39.2 2.06 1 new rule2 POINT (433330 5341150)
Notice that no threshold, nSamp, or nQuant were defined. That is because the defaults are threshold = 0.9 and nQuant = 10.
The first matrix output shows the quantile ratios between the sample and the covariates. A value of 1.0 indicates that samples are represented relative to the quantile coverage. Values > 1.0 indicate over representation of samples, while < 1.0 indicate under representation of samples.
sample_ahels(mraster = mraster,
existing = existing, # existing samples
nQuant = 20, # define 20 quantiles
nSamp = 300) # total samples desired
#> Simple feature collection with 500 features and 7 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431110 ymin: 5337710 xmax: 438530 ymax: 5343230
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> type.x zq90 pzabove2 zsd strata type.y rule geometry
#> 1 existing 5.64 0.4 1.36 1 new rule1 POINT (437950 5338130)
#> 2 existing 8.76 44.0 2.08 1 new rule2 POINT (435630 5342550)
#> 3 existing 4.83 9.5 1.08 1 new rule2 POINT (435350 5339170)
#> 4 existing 10.80 80.5 2.67 1 new rule2 POINT (437990 5340110)
#> 5 existing 8.24 70.0 2.03 1 new rule2 POINT (437890 5339590)
#> 6 existing 1.59 0.0 0.13 1 new rule2 POINT (436950 5342410)
#> 7 existing 10.20 76.9 2.42 1 new rule2 POINT (433710 5338610)
#> 8 existing 10.30 1.2 2.59 1 new rule2 POINT (438250 5338210)
#> 9 existing 6.85 37.5 1.77 1 new rule2 POINT (435250 5342150)
#> 10 existing 8.11 39.2 2.06 1 new rule2 POINT (433330 5341150)
Notice that the total number of samples is 500. This value is the sum of the existing samples (200) and the number of samples defined by nSamp = 300.
sample_existing
Acknowledging that existing sample networks exist is important. There is significant investment in these samples, and in order to keep inventories up to date, we often need to collect new data at these locations. The sample_existing algorithm provides a method for sub-sampling an existing sample network should the financial / logistical resources not be available to collect data at all sample units. The algorithm leverages Latin hypercube sampling using the clhs package to effectively sample within an existing network.
The algorithm has two fundamental approaches:
Sample exclusively using the sample network and the attributes it contains.
Should raster information be available and co-located with the sample, use these data as population values to improve sub-sampling of existing.
Much like the sample_clhs() algorithm, users can define a cost parameter, which will be used to constrain sub-sampling. A cost parameter is a user-defined metric/attribute such as distance from roads (e.g. calculate_distance()), elevation, etc.
Here are some basic examples:
existing
First we can create an existing dataset for our example. Let's imagine we have a systematically sampled dataset of ~900 samples, and we know we only have resources to sample 300 of them. We have some ALS data available (mraster), which we will use as the distributions to sample within.
#--- generate existing samples and extract metrics ---#
existing <- sample_systematic(raster = mraster, cellsize = 200, plot = TRUE) %>%
  extract_metrics(mraster = mraster, existing = .)
We see our systematic sample. Notice that we used extract_metrics() after creating it. If the user provides a raster for the algorithm, this isn't necessary; it will be handled internally when no attributes are present. If only samples are given, however, attributes must be provided, and sampling will be conducted on all included attributes. Now let's sub-sample within it.
#--- sub sample using ---#
sample_existing(existing = existing, # our existing sample
nSamp = 300, # the number of samples we want
plot = TRUE) # plot
#> Simple feature collection with 300 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431142.9 ymin: 5337701 xmax: 438551.7 ymax: 5343230
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> zq90 pzabove2 zsd geometry
#> 835 6.90 88.0 2.2300000 POINT (431189.8 5340883)
#> 327 10.90 73.1 2.9900000 POINT (435146.9 5341619)
#> 509 8.75 77.4 2.2600000 POINT (436201.8 5337903)
#> 834 4.96 45.8 1.0799999 POINT (431310.5 5340724)
#> 882 12.00 70.3 3.4700000 POINT (432322.5 5337728)
#> 380 4.68 40.1 0.9899999 POINT (434707.2 5341537)
#> 660 16.50 88.7 3.9499998 POINT (432547.7 5341409)
#> 54 10.40 54.1 2.7300000 POINT (437746.1 5341829)
#> 247 16.20 76.7 4.7999997 POINT (434781 5343098)
#> 469 15.40 90.3 3.0999999 POINT (433987.4 5341494)
We see from the output that we get 300 samples that are a sub-sample of the original existing sample. The plotted output shows cumulative frequency distributions of the population (all existing samples) and the sub-sample (the 300 samples we requested). Notice that the distributions match quite well. This is a simple example, so let's do another with a bit more complexity.
raster distributions
Our systematic sample of ~900 plots is fairly comprehensive; however, we can generate a true population distribution through the inclusion of the ALS metrics in the sampling process. The metrics will be included in the internal Latin hypercube sampling to help guide sub-sampling of existing.
#--- sub sample using ---#
sample_existing(existing = existing, # our existing sample
nSamp = 300, # the number of samples we want
raster = mraster, # include mraster metrics to guide sampling of existing
plot = TRUE) # plot
#> Simple feature collection with 300 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431104 ymin: 5337709 xmax: 438559.7 ymax: 5343230
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> zq90 pzabove2 zsd geometry
#> 791 18.30 88.0 3.96 POINT (431991.4 5340486)
#> 78827 13.80 80.1 3.54 POINT (432474 5339848)
#> 8941 11.80 97.2 2.53 POINT (436582.6 5342705)
#> 39577 10.90 92.5 2.79 POINT (436839.9 5338386)
#> 40695 17.90 59.8 5.59 POINT (435150.9 5340619)
#> 52010 13.00 1.1 4.98 POINT (434633.5 5339977)
#> 1421 6.73 12.5 1.69 POINT (438034.3 5339791)
#> 7187 14.00 80.9 3.57 POINT (432228.7 5341167)
#> 20964 10.60 50.0 3.11 POINT (436026.3 5341783)
#> 39356 7.23 70.7 1.66 POINT (437081.2 5338067)
The sample distribution again mimics the population distribution quite well! Now let's try using a cost variable to constrain the sub-sample.
#--- create distance from roads metric ---#
dist <- calculate_distance(raster = mraster, access = access)
#--- sub sample using ---#
sample_existing(existing = existing, # our existing sample
nSamp = 300, # the number of samples we want
raster = dist, # include mraster metrics to guide sampling of existing
cost = 4, # either provide the index (band number) or the name of the cost layer
plot = TRUE) # plot
#> Simple feature collection with 300 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431104 ymin: 5337740 xmax: 438508.8 ymax: 5343230
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> zq90 pzabove2 zsd dist2access geometry
#> 81328 11.10 56.7 3.13 5.216921 POINT (431711.3 5340525)
#> 49528 3.83 9.1 0.77 10.508362 POINT (434189.8 5340895)
#> 1310 12.70 71.6 3.87 385.821767 POINT (438263.5 5342472)
#> 6331 8.33 81.1 1.76 55.423892 POINT (438189.8 5340911)
#> 437100 11.30 89.1 2.89 48.324298 POINT (434508.8 5341136)
#> 5050 16.90 92.6 4.08 75.245042 POINT (438228.6 5341191)
#> 40473 23.40 82.2 9.19 39.464954 POINT (435392.2 5340300)
#> 650 22.00 98.0 5.03 538.312387 POINT (433874.8 5339654)
#> 4681 4.84 35.0 1.32 91.021975 POINT (434108 5341335)
#> 150100 20.50 93.1 5.35 510.795275 POINT (437069.1 5341067)
Finally, should the user wish to further constrain the sample based on access, as in other sampling approaches in sgsR, that is also possible.
#--- ensure access and existing are in the same CRS ---#
sf::st_crs(existing) <- sf::st_crs(access)
#--- sub sample using ---#
sample_existing(existing = existing, # our existing sample
nSamp = 300, # the number of samples we want
raster = dist, # include mraster metrics to guide sampling of existing
cost = 4, # either provide the index (band number) or the name of the cost layer
access = access, # roads layer
buff_inner = 50, # inner buffer - no samples within this distance from road
buff_outer = 300, # outer buffer - no samples further than this distance from road
plot = TRUE) # plot
#> Simple feature collection with 300 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431112.1 ymin: 5337709 xmax: 438559.7 ymax: 5343214
#> Projected CRS: UTM_Zone_17_Northern_Hemisphere
#> First 10 features:
#> zq90 pzabove2 zsd dist2access geometry
#> 4095 15.30 91.6 4.03 245.96526 POINT (434641.5 5337977)
#> 1758 8.45 37.4 2.21 65.21969 POINT (434543.7 5342417)
#> 48287 18.00 95.1 3.95 43.50581 POINT (431801 5338086)
#> 1923 18.60 92.0 3.76 234.61997 POINT (438508.8 5341153)
#> 225100 22.40 88.4 6.26 72.04452 POINT (435512.8 5340141)
#> 202100 9.38 42.2 2.44 140.60120 POINT (436034.3 5339783)
#> 45423 17.30 91.6 4.07 92.44668 POINT (431831.9 5340366)
#> 4036 7.60 88.2 1.61 211.00646 POINT (432146.9 5341607)
#> 483100 19.20 93.0 4.80 153.00920 POINT (431680.4 5338245)
#> 2503 14.30 95.5 2.40 72.02930 POINT (433423.1 5342572)
The greater the constraints we place on the samples, the less likely we are to see strong correlations between the population and the sample, so it's always important to understand these limitations and plan accordingly.
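One quick way to check how well a constrained sub-sample represents the population is to compare empirical cumulative distributions of a metric. The sketch below assumes the constrained sample_existing() output above was assigned to a hypothetical object named sub_samp; zq90 is one of the mraster layers shown in the outputs above.
#--- sketch only: compare population vs. sub-sample distributions for zq90 ---#
pop_zq90 <- na.omit(terra::values(mraster)[, "zq90"]) # population pixel values
plot(ecdf(pop_zq90), main = "zq90: population vs. sub-sample") # population ECDF
lines(ecdf(sub_samp$zq90), col = "red") # sub_samp is hypothetical (assign the call above)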