VS1.1 - Example: An Artificial 2D Dataset

Elvan Ceyhan

2023-12-18

First we load the pcds package:

library(pcds)

We start our exposition of pcds functions for testing/detecting spatial interaction between classes (or species) in \(\mathbb R^2\) (i.e. 2D space) using two data sets, an artificial data and a real-life forestry data (see file “VS1_2_SwampTrees” for the latter).

1 Illustration of PCDs on an Artificial 2D Dataset

This data set consists of simulated points from two classes, \(\mathcal{X}\) and \(\mathcal{Y}\), where \(\mathcal{X}\) points are uniformly distributed on the unit square \([0,1]^2\), while \(\mathcal{Y}\) points are chosen closer to the vertices of the unit square for better illustration. Here \(n_x\) is the size of class \(\mathcal{X}\) points, \(n_y\) is the size of class \(\mathcal{Y}\) points, and for better visualization of certain structures and graph constructs.

nx<-10; ny<-5;  #try also nx<-40; ny<-10 or nx<-1000; ny<-20;
set.seed(123)
Xp<-cbind(runif(nx),runif(nx))
Yp<-cbind(runif(ny,0,.25),runif(ny,0,.25))+cbind(c(0,0,0.5,1,1),c(0,1,.5,0,1))  
#try also Yp<-cbind(runif(ny,0,1),runif(ny,0,1))

We take \(n_x=\) 10 and \(n_y=\) 5 (however, one is encouraged to try the specifications that follow in the comments after “#try also” in the commented script here and henceforth.) \(\mathcal{X}\) points are denoted as Xp and \(\mathcal{Y}\) points are denoted as Yp in the following scripts.

The scatterplot of \(\mathcal{X}\) points (black circles) and \(\mathcal{Y}\) points (red triangles) can be obtained with the below code.

XYpts = rbind(Xp,Yp) #combined Xp and Yp
lab=c(rep(1,nx),rep(2,ny))
lab.fac=as.factor(lab)
plot(XYpts,col=lab,pch=lab,xlab="x",ylab="y",main="Scatterplot of 2D Points from Two Classes")

The PCDs are constructed with vertices from \(\mathcal{X}\) points and the binary relation that determines the arcs are based on proximity regions which depend on the Delaunay triangulation of \(\mathcal{Y}\) points. More specifically, the proximity regions are defined with respect to the Delaunay triangles based on \(\mathcal{Y}\) points and vertex regions in each triangle are based on the center \(M\) or \(M=(\alpha,\beta,\gamma)\) in barycentric coordinates in the interior of each Delaunay triangle.

Convex hull of \(\mathcal{Y}\) points is partitioned by the Delaunay triangles constructed with the same \(\mathcal{Y}\) points (i.e., multiple triangles are the set of these Delaunay triangles whose union constitutes the convex hull of \(\mathcal{Y}\) points).

See Ceyhan (2005), Ceyhan (2010), and Ceyhan (2012) for more on AS-PCDs. Also see Okabe et al. (2000), Ceyhan (2010), and Sinclair (2016) for more on Delaunay triangulation and the corresponding algorithm (to compute the triangulation).

Below we plot the \(\mathcal{X}\) points together with the Delaunay triangulation of \(\mathcal{Y}\) points.

Xlim<-range(Xp[,1],Yp[,1])
Ylim<-range(Xp[,2],Yp[,2])
xd<-Xlim[2]-Xlim[1]
yd<-Ylim[2]-Ylim[1]
plot(Xp,xlab="x", ylab="y",xlim=Xlim+xd*c(-.05,.05),
     ylim=Ylim+yd*c(-.05,.05),pch=".",cex=3,main="X points and Delaunay Triangulation of Y Points")
#now, we add the Delaunay triangulation based on $Y$ points
DT<-interp::tri.mesh(Yp[,1],Yp[,2],duplicate="remove")
interp::plot.triSht(DT, add=TRUE, do.points = TRUE)
The scatterplot of the X points in the artificial data set together with the Delaunay triangulation of $Y$ points (dashed lines).

Figure 1.1: The scatterplot of the X points in the artificial data set together with the Delaunay triangulation of \(Y\) points (dashed lines).

Or, alternatively, we can use the plotDelaunay.tri function in pcds to obtain the same plot by executing plotDelaunay.tri(Xp,Yp,xlab="x",ylab="y",main="X points and Delaunay Triangulation of Y Points") command.

The number of Delaunay triangles based on \(\mathcal{Y}\) points can be obtained by the function num.delaunay.tri.

num.delaunay.tri(Yp)
#> [1] 4

1.1 Summary and Visualization with Arc-Slice PCDs

For AS-PCDs, the default center used to construct the vertex regions is M="CC" i.e., circumcenter in each triangle.

Number of arcs of the AS-PCD can be computed by the function num.arcsAS, which is an object of class “NumArcs” and takes the arguments

Its Call (with Narcs in the below script) just returns the description of the digraph. Its summary returns a description of the digraph, number of arcs of the AS-PCD, number of data (Xp) points in the convex hull of Yp (nontarget) points, number of data points in the Delaunay triangles based on Yp points, numbers of arcs in the induced subdigraphs in the Delaunay triangles, areas of the Delaunay triangles, indices of the vertices of the Delaunay triangles, indices of the Delaunay triangles data points resides. The plot function (i.e., plot.NumArcs) returns the plot of the Delaunay triangulation of Yp points, scatter plot of the Xp points and the number of arcs of the induced subdigraphs for each Delaunay triangle in the centroid of the triangle.

This function returns the following list as output:

M<-"CC" #try also M<-c(1,1,1) #or M<-c(1,2,3)
Narcs = num.arcsAS(Xp,Yp,M)
Narcs
#> Call:
#> num.arcsAS(Xp = Xp, Yp = Yp, M = M)
#>
#> Description:
#> Number of Arcs of the AS-PCD with vertices Xp and Related Quantities for the Induced Subdigraphs for the Points in the Delaunay Triangles 

summary(Narcs)
#> Call:
#> num.arcsAS(Xp = Xp, Yp = Yp, M = M)
#>
#> Description of the output:
#> Number of Arcs of the AS-PCD with vertices Xp and Related Quantities for the Induced Subdigraphs for the Points in the Delaunay Triangles
#>
#> Number of data (Xp) points in the convex hull of Yp (nontarget) points =  7
#> Number of data points in the Delaunay triangles based on Yp points =  2 1 1 3
#> Number of arcs in the entire digraph =  3
#> Numbers of arcs in the induced subdigraphs in the Delaunay triangles =  0 0 0 3
#> Areas of the Delaunay triangles (used as weights in the arc density of multi-triangle case):
#> 0.2214646 0.2173192 0.2593852 0.2648197
#>
#> Indices of the vertices of the Delaunay triangles (each column refers to a triangle):
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    5    3    3
#> [2,]    3    2    4    1
#> [3,]    2    3    5    4
#>
#> Indices of the Delaunay triangles data points resides:
#>  1  4  1  3 NA NA  4 NA  4  2
plot(Narcs)

The incidence matrix of the AS-PCD can be found by inci.matAS. Below, we only print the top 6 rows and columns of the incidence matrix.

IM<-inci.matAS(Xp,Yp,M)
IM[1:6,1:6]
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    1    0    0    0    0    0
#> [2,]    0    1    0    0    0    0
#> [3,]    0    0    1    0    0    0
#> [4,]    0    0    0    1    0    0
#> [5,]    0    0    0    0    1    0
#> [6,]    0    0    0    0    0    1

Technical aside: Once we have the incidence matrix of a digraph (or a graph), we can find an exact or an approximate dominating set and hence the exact or approximate domination number of the digraph (or graph). The function dom.num.greedy finds the approximate domination number (an upper bound) and also provides the indices of the points in the approximate dominating set. On the other hand, the function dom.num.exact finds the exact domination number and the indices of the points in an exact dominating set. dom.num.exact might take a long time for large \(n_x\) (e.g. \(n_x \ge 19\)), as it checks all possible subsets of the dataset to find a minimum dominating set.

dom.num.greedy(IM)  #try also dom.num.exact(IM)  #this might take a longer time for large  nx (i.e. nx >= 19)
#> $approx.dom.num
#> [1] 8
#> 
#> $ind.approx.mds
#> [1]  9  1 10  5  8  4  3  6

Plot of the arcs in the digraph AS-PCD is provided by the function plotASarcs, which take the arguments

For all the plots for AS-PCD, we use the option asp=1 so that the circles actually do look like circles (i.e., the arc-slices which are the boundary of the circles restricted to triangles look circular as they should).

plotASarcs(Xp,Yp,M,asp=1,xlab="",ylab="")
The arcs of the AS-PCD for the 2D artificial data set using the CC-vertex regions together with the Delaunay triangles based on the $Y$ points (dashed lines).

Figure 1.2: The arcs of the AS-PCD for the 2D artificial data set using the CC-vertex regions together with the Delaunay triangles based on the \(Y\) points (dashed lines).

Plot of the AS proximity regions is provided by the function plotASregs, which has the same arguments as the function plotASarcs.

plotASregs(Xp,Yp,M,xlab="",ylab="")
The AS proximity regions for all $X$ points in the 2D artificial data set using the CC-vertex regions together with the Delaunay triangles based on the $Y$ points (dashed lines).

Figure 1.3: The AS proximity regions for all \(X\) points in the 2D artificial data set using the CC-vertex regions together with the Delaunay triangles based on the \(Y\) points (dashed lines).

The function arcsAS is an object of class “PCDs” and has the same arguments as in num.arcsAS. Its Call (with Arcs in the below script) just returns the description of the digraph. Its summary returns a description of the digraph, selected tail (or source) points of the arcs in the digraph (first 6 or fewer are printed), selected head (or end) points of the arcs in the digraph (first 6 or fewer are printed), the parameters of the digraph (here it is only the center, “CC”), and various quantities of the digraph (namely, the number of vertices, number of partition points, number of triangles, number of arcs, and arc density. The plot function (i.e., plot.PCDs) returns the same plot as in plotASarcs, i.e., the plot of the arcs in the digraph together with the Delaunay triangles based on the \(\mathcal{Y}\) points, hence we comment it out below.

Arcs<-arcsAS(Xp,Yp,M)
Arcs
#> Call:
#> arcsAS(Xp = Xp, Yp = Yp, M = M)
#> 
#> Type:
#> [1] "Arc Slice Proximity Catch Digraph (AS-PCD) for 2D Points in Multiple Triangles with CC-Vertex Regions"
summary(Arcs)
#> Call:
#> arcsAS(Xp = Xp, Yp = Yp, M = M)
#> 
#> Type of the digraph:
#> [1] "Arc Slice Proximity Catch Digraph (AS-PCD) for 2D Points in Multiple Triangles with CC-Vertex Regions"
#> 
#>  Vertices of the digraph =  Xp 
#>  Partition points of the region =  Yp 
#> 
#>  Selected tail (or source) points of the arcs in the digraph
#>       (first 6 or fewer are printed) 
#>           [,1]      [,2]
#> [1,] 0.5281055 0.2460877
#> [2,] 0.5514350 0.3279207
#> [3,] 0.5514350 0.3279207
#> 
#>  Selected head (or end) points of the arcs in the digraph
#>       (first 6 or fewer are printed) 
#>           [,1]      [,2]
#> [1,] 0.5514350 0.3279207
#> [2,] 0.7883051 0.4533342
#> [3,] 0.5281055 0.2460877
#> 
#> Parameters of the digraph
#> $center
#> [1] "CC"
#> 
#> Various quantities of the digraph
#>         number of vertices number of partition points 
#>                 7.00000000                 5.00000000 
#>        number of triangles             number of arcs 
#>                 4.00000000                 3.00000000 
#>                arc density 
#>                 0.07142857
plot(Arcs, asp=1)

1.2 Summary and Visualization with Proportional Edge PCDs

The functions for PE-PCD have similar arguments as the AS-PCDs except (i) PE-PCDs have the additional argument for the expansion parameter \(r \ge 1\) and (ii) for PE-PCDs, the default center used to construct the vertex regions is M="CM" i.e., center of mass of each triangle.

Number of arcs of the PE-PCD can be computed by the function num.arcsPE which is an object of class “NumArcs”. The function takes arguments Xp,Yp,r,M where r is the expansion parameter and returns the same type of output as the function num.arcsAS.

M<-c(1,1,1) #try also M<-c(1,2,3) #or M<-"CC"
r<-1.5 #try also r<-2 or r=1.25
Narcs = num.arcsPE(Xp,Yp,r,M)
summary(Narcs)
#> Call:
#> num.arcsPE(Xp = Xp, Yp = Yp, r = r, M = M)
#>
#> Description of the output:
#> Number of Arcs of the PE-PCD with vertices Xp and Related Quantities for the Induced Subdigraphs for the Points in the Delaunay Triangles
#>
#> Number of data (Xp) points in the convex hull of Yp (nontarget) points =  7
#> Number of data points in the Delaunay triangles based on Yp points =  2 1 1 3
#> Number of arcs in the entire digraph =  3
#> Numbers of arcs in the induced subdigraphs in the Delaunay triangles =  1 0 0 2
#> Areas of the Delaunay triangles (used as weights in the arc density of multi-triangle case):
#> 0.2214646 0.2173192 0.2593852 0.2648197
#>
#> Indices of the vertices of the Delaunay triangles (each column refers to a triangle):
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    5    3    3
#> [2,]    3    2    4    1
#> [3,]    2    3    5    4
#>
#> Indices of the Delaunay triangles data points resides:
#>  1  4  1  3 NA NA  4 NA  4  2 

plot(Narcs)

The incidence matrix of the PE-PCD can be found by inci.matPE. Once the incidence matrix is found, approximate and exact dominating sets and hence domination numbers can be found by the functions dom.num.greedy and dom.num.exact, respectively.

Plot of the arcs in the digraph can be obtained by plotPEarcs.

plotPEarcs(Xp,Yp,r,M,xlab="",ylab="")
The arcs of the PE-PCD for the 2D artificial data set using the CM-vertex regions and expansion parameter $r=1.5$ together with the Delaunay triangles based on the $Y$ points (dashed lines).

Figure 1.4: The arcs of the PE-PCD for the 2D artificial data set using the CM-vertex regions and expansion parameter \(r=1.5\) together with the Delaunay triangles based on the \(Y\) points (dashed lines).

Plot of the PE proximity regions can be obtained by plotPEregs.

plotPEregs(Xp,Yp,r,M,xlab="",ylab="")
The PE proximity regions for all the points the 2D artificial data set  using the CM-vertex regions and expansion parameter $r=1.5$ together with the Delaunay triangles based on the $Y$ points (dashed lines).

Figure 1.5: The PE proximity regions for all the points the 2D artificial data set using the CM-vertex regions and expansion parameter \(r=1.5\) together with the Delaunay triangles based on the \(Y\) points (dashed lines).

The function arcsPE is an object of class “PCDs”. Its Call, summary, and plot are as in arcsAS with the addition of the expansion parameter (see the Parameters of the digraph part) in the summary.PCDs. The plot function returns the same plot as in plotPEarcs, hence we comment it out below.

Arcs<-arcsPE(Xp,Yp,r,M)
Arcs
#> Call:
#> arcsPE(Xp = Xp, Yp = Yp, r = r, M = M)
#> 
#> Type:
#> [1] "Proportional Edge Proximity Catch Digraph (PE-PCD) for 2D points in Multiple Triangles with Expansion parameter r = 1.5 and Center M = (1,1,1)"
summary(Arcs)
#> Call:
#> arcsPE(Xp = Xp, Yp = Yp, r = r, M = M)
#> 
#> Type of the digraph:
#> [1] "Proportional Edge Proximity Catch Digraph (PE-PCD) for 2D points in Multiple Triangles with Expansion parameter r = 1.5 and Center M = (1,1,1)"
#> 
#>  Vertices of the digraph =  Xp 
#>  Partition points of the region =  Yp 
#> 
#>  Selected tail (or source) points of the arcs in the digraph
#>       (first 6 or fewer are printed) 
#>           [,1]      [,2]
#> [1,] 0.4089769 0.6775706
#> [2,] 0.5281055 0.2460877
#> [3,] 0.5514350 0.3279207
#> 
#>  Selected head (or end) points of the arcs in the digraph
#>       (first 6 or fewer are printed) 
#>           [,1]      [,2]
#> [1,] 0.2875775 0.9568333
#> [2,] 0.5514350 0.3279207
#> [3,] 0.5281055 0.2460877
#> 
#> Parameters of the digraph
#> $center
#> [1] 1 1 1
#> 
#> $`expansion parameter`
#> [1] 1.5
#> 
#> Various quantities of the digraph
#>         number of vertices number of partition points 
#>                 7.00000000                 5.00000000 
#>        number of triangles             number of arcs 
#>                 4.00000000                 3.00000000 
#>                arc density 
#>                 0.07142857
plot(Arcs)

1.2.1 Testing Spatial Interaction with the PE-PCDs

We can test the spatial pattern or interaction of segregation/association in the 2D setting based on arc density or domination number of PE-PCDs.

The Use of Arc Density of PE-PCDs for Testing Spatial Interaction

We can test the spatial interaction between two classes or species based on the arc density of PE-PCDs using the function PEarc.dens.test which takes the arguments

  • Xp, a set of 2D points which constitute the vertices of the AS-PCD (i.e., class \(\mathcal{X}\) points),
  • Yp, a set of 2D points which constitute the vertices of the Delaunay triangles (i.e., class \(\mathcal{Y}\) points),
  • r a positive real number which serves as the expansion parameter in PE proximity region; must be \(\ge 1\),
  • ch.cor, a logical argument for convex hull correction, default ch.cor=FALSE, recommended when both Xp and Yp have the same or similar rectangular support.
  • alternative, type of the alternative hypothesis in the test, one of "two.sided", "less", "greater".
  • conf.level, the level of the confidence interval, default is 0.95, for the arc density of PE-PCD based on the 2D data set Xp.

This function is an object of class “htest” (i.e., hypothesis test) and performs a hypothesis test of complete spatial randomness (CSR) or uniformity of Xp points in the convex hull of Yp points against the alternatives of segregation (where Xp points cluster away from Yp points) and association (where Xp points cluster around Yp points) based on the normal approximation of the arc density of the PE-PCD for uniform 2D data utilizing the asymptotic normality of the \(U\)-statistics.

The function returns the test statistic, \(p\)-value for the corresponding alternative, the confidence interval, estimate and null value for the parameter of interest (which is the arc density here), and method and name of the data set used.

Under the null hypothesis of uniformity of Xp points in the convex hull of Yp points, arc density of PE-PCD whose vertices are Xp points equals to its expected value under the uniform distribution and alternative could be “two-sided" (i.e.,”two.sided”), or "left-sided" (i.e.,“less”for the case in which $\X$ points are accumulated around the $\Y$ points, or association) or "right-sided" (i.e.,“greater”` for the case in which \(\mathcal{X}\) points are accumulated around the centers of the triangles whose vertices are from the \(\mathcal{Y}\) points, or segregation).

See Ceyhan, Priebe, and Wierman (2006) and Ceyhan (2014) for more detail on the arc density of PE-PCD and its use for testing 2D spatial interactions. We only provide the two-sided test below, although both one-sided versions are also available.

PEarc.dens.test(Xp,Yp,r) #try also PEarc.dens.test(Xp,Yp,r,alt="l") or with alt="g"
#> 
#>  Large Sample z-Test Based on Arc Density of PE-PCD for Testing
#>  Uniformity of 2D Data ---
#>  without Convex Hull Correction
#> 
#> data:  Xp
#> standardized arc density (i.e., Z) = -0.21983, p-value = 0.826
#> alternative hypothesis: true (expected) arc density is not equal to 0.09712203
#> 95 percent confidence interval:
#>  0.04234726 0.14084889
#> sample estimates:
#> arc density 
#>  0.09159807

The Use of Domination Number of PE-PCDs for Testing Spatial Interaction

We first provide two functions to compute the domination number of PE-PCDs: PEdom.num and PEdom.num.nondeg.

The function PEdom.num takes the same arguments as num.arcsPE and returns a list with three elements as output:

  • dom.num, the overall domination number of the PE-PCD whose vertices are Xp points,
  • ind.mds, the data indices of the minimum dominating set of the PE-PCD whose vertices are Xp points,
  • tri.dom.nums, the vector of domination numbers of the PE-PCD components for the Delaunay triangles.

This function takes any center in the interior of the triangles as its argument or circumcenter (“CC”). The vertex regions in each triangle are based on the center \(M=(\alpha,\beta,\gamma)\) in barycentric coordinates in the interior of each Delaunay triangle or based on circumcenter of each Delaunay triangle (default for \(M=(1,1,1)\) which is the center of mass of the triangle).

On the other hand, PEdom.num.nondeg takes only the arguments Xp,Yp,r and returns the same output as in PEdom.num function, but uses one of the non-degeneracy centers in the multiple triangle case (hence M is not an argument for this function). That is, the center M is one of the three centers that renders the asymptotic distribution of domination number to be non-degenerate for a given value of \(r \in (1,1.5]\) and M is center of mass for \(r=1.5\).

These two functions are different from the function dom.num.greedy since they give an exact minimum dominating set and the exact domination number and from dom.num.exact, since they give a minimum dominating set and number in polynomial time (in the number of vertices of the digraph, i.e., number of Xp points).

PEdom.num(Xp,Yp,r,M) #try also PEdom.num(Xp,Yp,r=2,M)
#> $dom.num
#> [1] 5
#> 
#> $ind.mds
#> [1]  3 10  4  9  2
#> 
#> $tri.dom.nums
#> [1] 1 1 1 2
PEdom.num.nondeg(Xp,Yp,r) #try also PEdom.num.nondeg(Xp,Yp,r=1.25)
#> $dom.num
#> [1] 5
#> 
#> $ind.mds
#> [1]  3 10  4  2  9
#> 
#> $tri.dom.nums
#> [1] 1 1 1 2

We can test the spatial patterns of segregation or association based on domination number of PE-PCD using the functions PEdom.num.binom.test or PEdom.num.norm.test. Each of these functions is an object of class “htest” (i.e., hypothesis test) and performs the same hypothesis test as in PEarc.dens.test. Both functions take the arguments Xp,Yp,r,ch.cor,ndt,alternative,conf.level where ndt is the number of Delaunay triangles based on Yp points with default NULL and other variables are as in PEarc.dens.test. They return the test statistic, \(p\)-value for the corresponding alternative, the confidence interval, estimate and null value for the parameter of interest (which is \(P(\mbox{domination number}\le 2)\)), and method and name of the data set used.

Under the null hypothesis of uniformity of Xp points in the convex hull of Yp points, probability of success (i.e., \(P(\mbox{domination number}\le 2)\)) equals to its expected value under the uniform distribution and alternative could be two-sided, or right-sided (i.e., data is accumulated around the Yp points, or association) or left-sided (i.e., data is accumulated around the centers of the triangles, or segregation).

In this case, the PE proximity region is constructed with the expansion parameter \(r \in (1,1.5]\) and \(M\)-vertex regions where M is a center that yields non-degenerate asymptotic distribution of the domination number.

The test statistic in PEdom.num.binom.test is based on the binomial distribution, when success is defined as domination number being less than or equal to 2 in the one triangle case (i.e., number of failures is equal to number of times restricted domination number = 3 in the triangles). For this approximation to work, number of Xp points must be at least 7 times more than number of Yp points.

We only provide the two-sided tests below, although both one-sided versions are also available.

PEdom.num.binom.test(Xp,Yp,r) #try also PEdom.num.binom.test(Xp,Yp,r,alt="g") or with alt="l"
#> 
#>  Large Sample Binomial Test based on the Domination Number of PE-PCD for
#>  Testing Uniformity of 2D Data ---
#>  without Convex Hull Correction
#> 
#> data:  Xp
#> # of times domination number is <= 2 = 4, p-value = 0.5785
#> alternative hypothesis: true Pr(Domination Number <=2) is not equal to 0.7413
#> 95 percent confidence interval:
#>  0.3976354 1.0000000
#> sample estimates:
#>           domination number   || Pr(domination number <= 2) 
#>                             5                             1

The test statistic in PEdom.num.norm.test is based on the normal approximation to the binomial distribution of the (asymptotic) distribution of the domination number of PE-PCDs. For this approximation to work, number of Yp points must be at least 5 (i.e., about 7 or more Delaunay triangles) and number of Xp points must be at least 7 times more than number of Yp points.

We only provide the two-sided tests below, although both one-sided versions are also available.

PEdom.num.norm.test(Xp,Yp,r) #try also PEdom.num.norm.test(Xp,Yp,r,alt="g") or with alt="l"
#> 
#>  Normal Approximation to the Domination Number of PE-PCD for Testing
#>  Uniformity of 2D Data ---
#>  without Convex Hull Correction
#> 
#> data:  Xp
#> standardized domination number (i.e., Z) = 1.1815, p-value = 0.2374
#> alternative hypothesis: true expected domination number is not equal to 2.9652
#> 95 percent confidence interval:
#>  3.283383 6.716617
#> sample estimates:
#>          domination number   || Pr(domination number <= 2) 
#>                            5                            1

See Ceyhan and Priebe (2007), Ceyhan (2011), and Ceyhan (2012) for more on the domination number of PE-PCDs and their use in testing spatial interaction.

In all the test functions (based on arc density and domination number) above, the option ch.cor is for convex hull correction which is recommended when Xp points tend to be mostly within the support of the Yp points and is implemented with a correction factor (default is “no convex hull correction”, i.e., ch.cor=FALSE). When the symmetric difference of the supports of \(\mathcal{X}\) and \(\mathcal{Y}\) is non-negligible, the tests can be adjusted to account for the \(\mathcal{X}\) points outside the convex hull of \(\mathcal{Y}\) points with the option ch.cor=TRUE. However, currently, the convex hull correction factor is derived under the assumption of uniformity of Xp and Yp points in the study window. For example, PEarc.dens.test(Xp,Yp,r,ch=TRUE) would yield the convex hull corrected version of the arc-based test of spatial interaction.

1.3 Summary and Visualization with Central Similarity PCDs

The functions for CS-PCD have similar arguments as the num.arcsPE with the expansion parameter \(t >0\). For CS-PCDs, the default center used to construct the edge regions is M="CM" i.e., center of mass of each triangle.

The functions for CS-PCD have similar arguments as the PE-PCDs except (i) the expansion parameter is\(t>0\) and (ii) for CS-PCDs, the default center used to construct the edge regions is M="CM" i.e., center of mass of each triangle.

Number of arcs of the CS-PCD can be computed by the function num.arcsCS, which is an object of class “NumArcs” and takes similar arguments and returns the similar output items as in num.arcsPE.

M<-c(1,1,1) #try also M<-c(1,2,3)
tau<-1.5 #try also tau<-2
Narcs = num.arcsCS(Xp,Yp,tau,M)
summary(Narcs)
#> Call:
#> num.arcsCS(Xp = Xp, Yp = Yp, t = tau, M = M)
#>
#> Description of the output:
#> Number of Arcs of the CS-PCD with vertices Xp and Related Quantities for the Induced Subdigraphs for the Points in the Delaunay Triangles
#>
#> Number of data (Xp) points in the convex hull of Yp (nontarget) points =  7
#> Number of data points in the Delaunay triangles based on Yp points =  2 1 1 3
#> Number of arcs in the entire digraph =  3
#> Numbers of arcs in the induced subdigraphs in the Delaunay triangles =  1 0 0 2
#> Areas of the Delaunay triangles (used as weights in the arc density of multi-triangle case):
#> 0.2214646 0.2173192 0.2593852 0.2648197
#>
#> Indices of the vertices of the Delaunay triangles (each column refers to a triangle):
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    5    3    3
#> [2,]    3    2    4    1
#> [3,]    2    3    5    4
#>
#> Indices of the Delaunay triangles data points resides:
#>  1  4  1  3 NA NA  4 NA  4  2 
#>  
#plot(Narcs)

The incidence matrix of the CS-PCD can be found by inci.matCS. Below, we only print the top 6 rows of the incidence matrix. Once the incidence matrix is found, approximate and exact domination numbers can be found by the functions dom.num.greedy and dom.num.exact, respectively.

Plot of the arcs in the digraph can be obtained by the function plotCSarcs.

plotCSarcs(Xp,Yp,tau,M,xlab="",ylab="")
The arcs of the CS-PCD for the 2D artificial data set using the CM-edge regions and expansion parameter $t=1.5$ together with the Delaunay triangles based on the $Y$ points (dashed lines).

Figure 1.6: The arcs of the CS-PCD for the 2D artificial data set using the CM-edge regions and expansion parameter \(t=1.5\) together with the Delaunay triangles based on the \(Y\) points (dashed lines).

Plot of the CS proximity regions can be obtained by the function plotCSregs.

plotCSregs(Xp,Yp,tau,M,xlab="",ylab="")
The CS proximity regions for all the points the 2D artificial data set  using the CM-edge regions and expansion parameter $t=1.5$ together with the Delaunay triangles based on the $Y$ points (dashed lines).

Figure 1.7: The CS proximity regions for all the points the 2D artificial data set using the CM-edge regions and expansion parameter \(t=1.5\) together with the Delaunay triangles based on the \(Y\) points (dashed lines).

The function arcsCS is an object of class “PCDs”. Its arguments and the output for Call, summary, and plot are as in arcsPE. The plot function returns the same plot as in plotCSarcs, hence we comment it out below.

Arcs<-arcsCS(Xp,Yp,tau,M)
Arcs
#> Call:
#> arcsCS(Xp = Xp, Yp = Yp, t = tau, M = M)
#> 
#> Type:
#> [1] "Central Similarity Proximity Catch Digraph (CS-PCD) for 2D Points in the Multiple Triangles with Expansion Parameter t = 1.5 and Center M = (1,1,1)"
summary(Arcs)
#> Call:
#> arcsCS(Xp = Xp, Yp = Yp, t = tau, M = M)
#> 
#> Type of the digraph:
#> [1] "Central Similarity Proximity Catch Digraph (CS-PCD) for 2D Points in the Multiple Triangles with Expansion Parameter t = 1.5 and Center M = (1,1,1)"
#> 
#>  Vertices of the digraph =  Xp 
#>  Partition points of the region =  Yp 
#> 
#>  Selected tail (or source) points of the arcs in the digraph
#>       (first 6 or fewer are printed) 
#>           [,1]      [,2]
#> [1,] 0.4089769 0.6775706
#> [2,] 0.5281055 0.2460877
#> [3,] 0.5514350 0.3279207
#> 
#>  Selected head (or end) points of the arcs in the digraph
#>       (first 6 or fewer are printed) 
#>           [,1]      [,2]
#> [1,] 0.2875775 0.9568333
#> [2,] 0.5514350 0.3279207
#> [3,] 0.5281055 0.2460877
#> 
#> Parameters of the digraph
#> $center
#> [1] 1 1 1
#> 
#> $`expansion parameter`
#> [1] 1.5
#> 
#> Various quantities of the digraph
#>         number of vertices number of partition points 
#>                 7.00000000                 5.00000000 
#>        number of triangles             number of arcs 
#>                 4.00000000                 3.00000000 
#>                arc density 
#>                 0.07142857
plot(Arcs)

1.3.1 Testing Spatial Interaction with the CS-PCDs

We can test the spatial pattern or interaction of segregation/association based on arc density of CS-PCDs (as the distribution of the domination number of CS-PCDs is still a topic of ongoing work).

The Use of Arc Density of CS-PCDs for Testing Spatial Interaction

We can test the spatial interaction between two classes or species based on the arc density of CS-PCDs using the function CSarc.dens.test. This function is an object of class “htest” and performs the same hypothesis test as in Section 1.2.1 Moreover, it takes similar arguments and returns similar output as in PEarc.dens.test function, for the same null and alternative hypotheses. See also Ceyhan, Priebe, and Marchette (2007) and Ceyhan (2014) for more detail.

We only provide the two-sided test below, although both one-sided versions are also available.

CSarc.dens.test(Xp,Yp,tau) #try also CSarc.dens.test(Xp,Yp,tau,alt="l") or with alt="g"
#> 
#>  Large Sample z-Test Based on Arc Density of CS-PCD for Testing
#>  Uniformity of 2D Data ---
#>  without Convex Hull Correction
#> 
#> data:  Xp
#> standardized arc density (i.e., Z) = 0.6039, p-value = 0.5459
#> alternative hypothesis: true (expected) arc density is not equal to 0.06749794
#> 95 percent confidence interval:
#>  0.0252619 0.1473522
#> sample estimates:
#> arc density 
#>  0.08630702

As in the tests based on PE-PCD, it is possible to account for \(\mathcal{X}\) points outside the convex hull of \(\mathcal{Y}\) points, with the option ch.cor=TRUE. For example, CSarc.dens.test(Xp,Yp,tau,ch=TRUE) would yield the convex hull corrected version of the arc-based test of spatial interaction.

References

Ceyhan, E. 2005. “An Investigation of Proximity Catch Digraphs in Delaunay Tessellations, Also Available as Technical Monograph Titled Proximity Catch Digraphs: Auxiliary Tools, Properties, and Applications.” PhD thesis, The Johns Hopkins University, Baltimore, MD, 21218.
———. 2010. “Extension of One-Dimensional Proximity Regions to Higher Dimensions.” Computational Geometry: Theory and Applications 43(9): 721–48.
———. 2011. “Spatial Clustering Tests Based on Domination Number of a New Random Digraph Family.” Communications in Statistics - Theory and Methods 40(8): 1363–95.
———. 2012. “An Investigation of New Graph Invariants Related to the Domination Number of Random Proximity Catch Digraphs.” Methodology and Computing in Applied Probability 14(2): 299–334.
———. 2014. “Comparison of Relative Density of Two Random Geometric Digraph Families in Testing Spatial Clustering.” TEST 23(1): 100–134.
Ceyhan, E., and C. E. Priebe. 2007. “On the Distribution of the Domination Number of a New Family of Parametrized Random Digraphs.” Model Assisted Statistics and Applications 1(4): 231–55.
Ceyhan, E., C. E. Priebe, and D. J. Marchette. 2007. “A New Family of Random Graphs for Testing Spatial Segregation.” Canadian Journal of Statistics 35(1): 27–50.
Ceyhan, E., C. E. Priebe, and J. C. Wierman. 2006. “Relative Density of the Random \(r\)-Factor Proximity Catch Digraphs for Testing Spatial Patterns of Segregation and Association.” Computational Statistics & Data Analysis 50(8): 1925–64.
Okabe, A., B. Boots, K. Sugihara, and S. N. Chiu. 2000. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. Wiley, New York.
Sinclair, D. 2016. “S-Hull: A Fast Radial Sweep-Hull Routine for Delaunay Triangulation.” https://arxiv.org/abs/1604.01428.