In most application fields and disciplines, the continuous increase in the sample size of spatial data, both in terms of volume and richness of explanatory variables, has raised problems in the use of Geographically Weighted Regression (GWR, Brunsdon et al. 1996; McMillen, 1996) in recent years. As the spatial extent of a sample increases, its spatial resolution and the richness of the explanatory variables also increase and it becomes increasingly necessary to take spatial heterogeneity into account. This can impose computation times that may seem prohibitive for GWR methods. In 2017, the use of GWR methods on samples of more than 50,000 observations was not really feasible and some authors have proposed improvements to meet the challenges of using GWR with bigdata. The two main problems concern here the time required to calculate each local coefficients and the memory requirements imposed for storing the hat matrix of size \(n \times n\) for estimating variance of parameters. To answer to these two issues, various avenues have been explored:

- Optimization of routines for calculating local regression and spatial weight matrix (lower level language, sparse weight matrix),
- Avoiding to store the hat matrix,
- Use of adaptive kernels based on K-nearest neighbors,
- Paralellization of local models and distance calculations,
- Improvement of bandwidth selection procedures,
- Use a subset of local models, i.e. reduce the number of target points.

Indeed, one can use a limited set of evaluation points (or target points) and interpolation methods to generalize the fit to others points for speeding up the computation. In the more general context of locally weighted regression, Loader (1999) proposed an interpolation method based on an adaptive decision tree approach using local density of points. In non-parametric literature, this type of two stages strategy that consists to approximate a function on a limited set of points and to extrapolate it is fairly usual: the famous Super Smoother algorithm of Friedman (1984) is based on this idea. There are a wide variety of proposals for KDEs and regression kernels on how to choose a subset of points to obtain the best extrapolations with the least computational load possible. It could be based on the density of observations, the complexity of curves or by equidistributing them on the support of the function.

The release 1.0 of mgwrsar package proposes two improvements to speed
up the computation of the GWR like models considered in the previous
release of mgwrsar package, with or without spatial autocorrelation,
through the joint use of Gaussian rough kernel and estimations based on
target points subset. It proposes: - use of target points : a method
that considers a two stage procedure with computation of local
regression on a subset of target points with three ways of selecting the
and extrapolation on the full sample in a second stage.

- use of gaussian rough kernel : a method in which only the distance to
the \(NN\) first neighbors are
computed, with \(NN\) sufficiently
large to contain optimal bandwidth, but sufficiently small to reduce
significantly the number of distance computation and the number on non
null weights in local samples.

If we consider a sample of 4000 observations, the use of Gaussian rough kernel accelerates GWR computation by a factor of 2.5 without significant bias in the coefficients. Moreover, the use of suitable target points set accelerates GWR computation by a factor of 50 and surprisingly reduces the bias for final local coefficients by improving the choice of optimal bandwidth (see Geniaux 2022, in prep).

This package proposes two ways of selecting target points, first by using a quadcell algorithm based on the local density of locations, second by considering spatial smooth of residuals of a first stage OLS model. The later method outperforms in general the equidistributed method based on the local density of locations, particularly when the size of target points is smaller than 25 % of the locations. This way of selecting target points is proposed as default method in this release.

```
library(mgwrsar)
#> Loading required package: Rcpp
#> Loading required package: sp
#> Loading required package: leaflet
#> Loading required package: Matrix
## loading data example
data(mydata)
coord=as.matrix(mydata[,c("x","y")])
```

The GWR model can be expressed as: \[\begin{equation} y_{i}=\sum_{j=1}^{J}\beta_{j}(u_{i},v_{i})x_{ij}+\epsilon_{i} \;\;\; i=1,2,..,n \label{modelGWR} \end{equation}\] where \(J\) is the number of exogenous regressors, \(x_{ij}\) is the regressor \(j\) for the observation \(i\) and may be an intercept.

The estimator for \(\beta_{j}(u_{i},v_{i})\) is : \[\begin{equation} \hat{\beta}(u_{i},v_{i})=(X'W_{i}X)^{-1} X'W_{i}Y \label{BetaGWR} \end{equation}\]

If we note \(W\) a spatial weight matrix based on a distance decay kernel or using k-nearest neighbors, then \(W_{i}\) is the diagonal spatial weight matrix \(n\times n\) specific to location \(i\), with a diagonal composed by the ith row of \(W\) and zero for all others elements. Usual kernels are the Gaussian and bisquare kernels, with a bandwidth defined in distance or in number or neighbours for adaptive kernels.

One way to speed up GWR model estimation and to reduce the required amount of memory consists in increasing sparsity of the weight matrix \(W\). When one use \(k\) first neighbors weighting scheme or an adaptive kernel with null weights beyond bandwidth (like bisquare, trisquare, epanelechnikov, triangle or rectangle), then \(W\) is very sparse when the bandwidth is small. When Gaussian like kernels are used then one way to increase sparsity of \(W\) is to use truncated Gaussian kernels that shrink to zero weights when weights are very small or when it concerns observations beyond a given number of neighbors.

In the first example, we show how to compute a GWR model without or
with rough Gaussian kernel: we use an optimal bandwidth \(H\) of 0.3, corresponding on average to the
distance to the 25th neighbor and for the rough Gaussian kernel, only
the first 300 neighbors (*NN=300*) are taken into account in the
distance and weight calculations. We can observe in this case that the
computation time is divided by 2. The larger the sample, the greater the
relative improvement. We notice in this example where the optimal
bandwidth is far from the bound used to truncate the kernel, that the
use of a rough Gaussian kernel has no detrimental effect. When the
optimal bandwidth is closer to the neighborhood used to truncate the
Gaussian kernel, then the differences in the approximation will be
larger and it may be recommended to increase the \(NN\) parameter that defines the truncation
neighborhood of the Gaussian kernel.

```
## without rough gaussian kernel
ptm1<-proc.time()
model_GWR<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,kernels=c('gauss'),H=0.03, Model = 'GWR',control=list(SE=TRUE))
(proc.time()-ptm1)[3]
#> elapsed
#> 0.809
## with rough gaussian kernel
ptm1<-proc.time()
model_GWR_grk<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,kernels=c('gauss'),H=0.03, Model = 'GWR',control=list(SE=TRUE,NN=300))
(proc.time()-ptm1)[3]
#> elapsed
#> 0.297
summary(model_GWR$Betav)
#> Intercept X1 X2 X3
#> Min. :-3.7932 Min. :0.01693 Min. :-0.7656 Min. :-2.03293
#> 1st Qu.:-1.4802 1st Qu.:0.35118 1st Qu.: 1.0952 1st Qu.:-1.29209
#> Median :-0.8356 Median :0.54202 Median : 1.6313 Median :-1.00811
#> Mean :-0.8307 Mean :0.50006 Mean : 1.6746 Mean :-1.00113
#> 3rd Qu.:-0.2363 3rd Qu.:0.63705 3rd Qu.: 2.4534 3rd Qu.:-0.72407
#> Max. : 2.1026 Max. :0.94252 Max. : 4.0355 Max. : 0.04574
summary(model_GWR_grk$Betav)
#> Intercept X1 X2 X3
#> Min. :-3.7932 Min. :0.01693 Min. :-0.7656 Min. :-2.03293
#> 1st Qu.:-1.4802 1st Qu.:0.35118 1st Qu.: 1.0952 1st Qu.:-1.29209
#> Median :-0.8356 Median :0.54202 Median : 1.6313 Median :-1.00811
#> Mean :-0.8307 Mean :0.50006 Mean : 1.6746 Mean :-1.00113
#> 3rd Qu.:-0.2363 3rd Qu.:0.63705 3rd Qu.: 2.4534 3rd Qu.:-0.72407
#> Max. : 2.1026 Max. :0.94252 Max. : 4.0355 Max. : 0.04574
```

In the second example, we show how to compute a GWR model with sample
of target points using mgwrsar 1.0. Choosing target points with respect
to the spatial distribution of residuals of a first stage OLS regression
is preferable to select target point with respect to density of
locations; this option can be choosen using *type=‘residuals’* as
control variable in mgwrsar function.

The selection of target points with *type=‘residuals’* uses 3
steps:

- In a first step, we fit an initial linear model without taking into account spatial heterogeneity, using OLS, and get the residuals.
- In a second step, in order to identify hot spots of residuals, we smooth the first stage residuals using a k-nearest neighbors kernel with \(K\) as number of neighbors,
- In a third step, the selection of the target points is done by selecting the points where the absolute value of the smoothed residuals is locally the highest among a \(kWtp\) neighborhood.

In order to get local maxima at fine scale, we use between twelve and sixteen neighbors (default) for \(kWtp\). In the last step, the value of \(K\) allow to increase/decrease the number of local maxima chosen. The choice of \(kWtp\) is less critical than \(K\). The resulting number of target points depends both on \(K\) and on the spatial distribution of residuals. The more \(K\) increases, the more the number of target points decreases: the size of resulting target points set is less than \(n/K\).

To illustrate the methodology, we simulate a single run of the GWR DGP proposed by Geniaux and Martinetti (2017a) and apply the selection of target points with respect to first stage OLS residuals. The two figures 1 maps the value of the absolute value of the smoothed (with \(kWtp = 16\)) residuals of the linear model, and the chosen target points when \(K\) is set to 48 (green points), and when parameter \(K\) is set to 8 (blue points). We can see on the map on the left that the absolute value of the smoothed residual of the green target points is higher than for their 48 closest neighbors.

In the following example, the function \(find_TP\) is used to identify a target point set with \(K\)=6 that leads to choose 91 targets points, that are then used in MGWRSAR function : MGWRSAR function first compute the local \(\beta(u_i,v_i)\) coefficients for the 91 sub-samples (\(i %in% TP\)) and then extrapolate the local \(\beta(u_i,v_i)\) coefficients for the 909 other locations.

As expected approximation of\(\beta(u_i,v_i)\) coefficients using this target point set are different from those obtained with full GWR, with small difference for the mean of coefficients, but with smaller range and variance, for result with target points. The time calculation is divided by 10 and, as mentionned before, the larger the sample, the greater the relative improvement is, with a time calculation divided by 50 for big sample, particularly if this method is used in conjunction with rough gaussian kernel (see Geniaux 2022, in prep).

```
TP=find_TP(formula = 'Y_gwr~X1+X2+X3', data =mydata,coord=coord,K=6,type='residuals')
# only 60 targets points are used
length(TP)
#> [1] 91
ptm1<-proc.time()
model_GWR_tp<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,kernels=c('gauss'),H=0.03, Model = 'GWR',control=list(SE=TRUE,TP=TP,kWtp=12))
(proc.time()-ptm1)[3]
#> elapsed
#> 0.218
ptm1<-proc.time()
model_GWR_tp_NN<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,kernels=c('gauss'),H=0.03, Model = 'GWR',control=list(SE=TRUE,TP=TP,kWtp=12,NN=300))
(proc.time()-ptm1)[3]
#> elapsed
#> 0.184
summary(model_GWR$Betav)
#> Intercept X1 X2 X3
#> Min. :-3.7932 Min. :0.01693 Min. :-0.7656 Min. :-2.03293
#> 1st Qu.:-1.4802 1st Qu.:0.35118 1st Qu.: 1.0952 1st Qu.:-1.29209
#> Median :-0.8356 Median :0.54202 Median : 1.6313 Median :-1.00811
#> Mean :-0.8307 Mean :0.50006 Mean : 1.6746 Mean :-1.00113
#> 3rd Qu.:-0.2363 3rd Qu.:0.63705 3rd Qu.: 2.4534 3rd Qu.:-0.72407
#> Max. : 2.1026 Max. :0.94252 Max. : 4.0355 Max. : 0.04574
summary(model_GWR_tp$Betav)
#> Intercept X1 X2 X3
#> Min. :-3.5402 Min. :0.02275 Min. :-0.5035 Min. :-2.0329
#> 1st Qu.:-1.5069 1st Qu.:0.35801 1st Qu.: 1.1687 1st Qu.:-1.2881
#> Median :-0.9151 Median :0.56755 Median : 1.6790 Median :-0.9909
#> Mean :-0.8548 Mean :0.50704 Mean : 1.6974 Mean :-0.9969
#> 3rd Qu.:-0.3134 3rd Qu.:0.64612 3rd Qu.: 2.4576 3rd Qu.:-0.7323
#> Max. : 1.6742 Max. :0.94252 Max. : 3.6257 Max. :-0.1023
summary(model_GWR_tp_NN$Betav)
#> Intercept X1 X2 X3
#> Min. :-3.5402 Min. :0.02275 Min. :-0.5035 Min. :-2.0329
#> 1st Qu.:-1.5069 1st Qu.:0.35801 1st Qu.: 1.1687 1st Qu.:-1.2881
#> Median :-0.9151 Median :0.56755 Median : 1.6790 Median :-0.9909
#> Mean :-0.8548 Mean :0.50704 Mean : 1.6974 Mean :-0.9969
#> 3rd Qu.:-0.3134 3rd Qu.:0.64612 3rd Qu.: 2.4576 3rd Qu.:-0.7323
#> Max. : 1.6742 Max. :0.94252 Max. : 3.6257 Max. :-0.1023
```

In this example, we use an in-sample of size 800 for model estimation and an out-sample of size 200 for prediction. Note that for model with spatial autocorrelation you must provide a global weight matrix of size 1000, ordered by in-sample then out-sample locations.

For GWR and MGWR_1_0_kv, where all coefficients vary in space, the predictions are carried out using the corresponding model estimate with the out-sample location as target points, so the estimated coefficients nor the target points choosen for estimating the model are not used: only the call and the data of the estimated model are used.

For mixed model that have some constant coefficients (MGWR, MGXWR_0_0_kv, MGXWR_1_kc_kv, MGXWR_1_kc_0), the estimated coefficients (only the one of target points) are extrapolated using a weighting matrix. By default, the weighting matrix for prediction is based on the expected weights of outsample data if they were had been added to in-sample data to estimate the corresponding MGWRSAR (see Geniaux 2022, in prep, for further detail). The user can also choose to extrapolate the model coefficients using a shepperd kernel with custom number of neighbors or using the same kernel and bandwidth as the estimated model.

Predictions of GWR model using sheppard kernel with 8 neighbors:

```
length_out=800
index_in=sample(1:1000,length_out)
index_out=(1:1000)[-index_in]
coord_in=coord[index_in,]
data_in=mydata[index_in,]
TP=find_TP(formula = 'Y_gwr~X1+X2+X3', data =data_in,coord=coord_in,K=6,type='residuals')
# only 60 targets points are used
length(TP)
model_GWR_insample<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata[index_in,],coord=coord[index_in,], fixed_vars=NULL,kernels=c('gauss'),H=10, Model = 'GWR',control=list(adaptive=T,TP=TP))
summary_mgwrsar(model_GWR_insample)
newdata=mydata[index_out,]
newdata_coord=coord[index_out,]
newdata$Y_gwr=0
Y_pred=predict_mgwrsar(model_GWR_insample, newdata=newdata, newdata_coord=newdata_coord,method_pred='tWtp_model')
head(Y_pred)
head(mydata$Y_gwr[index_out])
sqrt(mean((mydata$Y_gwr[index_out]-Y_pred)^2)) # RMSE
```

Prediction of *MGWRSAR_1_0_kv* model using sheppard kernel
with 4 neighbors and Best Linear Unbiased Predictor (BLUP) :

```
length_out=800
index_in=sample(1:1000,length_out)
index_out=(1:1000)[-index_in]
### Global Spatial Weight matrix W should be ordered by in_ sample (S) then out_sample
W=kernel_matW(H=4,kernels='rectangle',coord_i=rbind(coord[index_in,],coord[index_out,]),NN=4,adaptive=TRUE,diagnull=TRUE,rowNorm=T)
W_in=kernel_matW(H=4,kernels='rectangle',coord_i=coord[index_in,],NN=4,adaptive=TRUE,diagnull=TRUE,rowNorm=T)
model_MGWRSAR_1_0_kv_insample<-MGWRSAR(formula = 'Y_mgwrsar_1_0_kv~X1+X2+X3', data = mydata[index_in,],coord=coord[index_in,], fixed_vars=NULL,kernels=c('gauss'),H=11, Model = 'MGWRSAR_1_0_kv',control=list(W=W_in,adaptive=TRUE,isgcv=F))
model_MGWRSAR_1_0_kv_insample$RMSE
summary_mgwrsar(model_MGWRSAR_1_0_kv_insample)
## without Best Linear Unbiased Predictor
newdata=mydata[index_out,]
newdata_coord=coord[index_out,]
newdata$Y_mgwrsar_1_0_kv=0
Y_pred=predict_mgwrsar(model_MGWRSAR_1_0_kv_insample, newdata=newdata, newdata_coord=newdata_coord,W=W,type='YTC')
head(Y_pred)
RMSE_YTC=sqrt(mean((mydata$Y_mgwrsar_1_0_kv[index_out]-Y_pred)^2))
RMSE_YTC
## Using Best Linear Unbiased Predictor
Y_pred=predict_mgwrsar(model_MGWRSAR_1_0_kv_insample, newdata=newdata, newdata_coord=newdata_coord,W=W,type='BPN')
head(Y_pred)
RMSE_BPN=sqrt(mean((mydata$Y_mgwrsar_1_0_kv[index_out]-Y_pred)^2))
RMSE_BPN
```

The two approximation methods proposed before can be use to reduce
time for finding the optimal bandwith, and are available in the
*bandwiths_MGWRASR* function. The next example compares the time
for finding an optimal bandwidth for full GWR and for GWR with target
points sample and rough gaussian kernel.

Time for finding the optimal bandwith is divided by around 8, and here again, larger improvements can be achieved when sample size are larger. In this example, the optimal bandwidth is smaller for methods with target points: this result is specific to this simulation and the opposite can be true with different data. If GWR with target points reached lower LOOCV in this example, it can not be compared to the LOOCV of full GWR because computation of Leave-One-Out Cross validation criteria (LOOCV) that are comparable between full GWR and GWR with target point is extremely complex and time consuming, mgwrsar 1.0 use only the LOOCV of target points for choosing the optimal bandwidth. If the user want to compare out-sample accuracy of the two methods, k-fold cross validation can be performed to compare the GWR method with and without target points.

```
ptm1<-proc.time()
mytab_TP_NN<-bandwidths_mgwrsar(formula = 'Y_gwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,Models=c('GWR'),candidates_Kernels=c('gauss'),control=list(TP=TP,NN=300,adaptive=FALSE),control_search=list())
(proc.time()-ptm1)[3]
names(mytab_TP_NN)
names(mytab_TP_NN[['GWR_gauss']])
mytab_TP_NN[['GWR_gauss']]$config_model
ptm1<-proc.time()
mytab<-bandwidths_mgwrsar(formula = 'Y_gwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,Models=c('GWR'),candidates_Kernels=c('gauss'),control=list(adaptive=FALSE),control_search=list())
(proc.time()-ptm1)[3]
names(mytab)
names(mytab[['GWR_gauss']])
mytab[['GWR_gauss']]$config_model
```

```
##### GWR #####
##### gauss adaptive= FALSE #####
########
Search bandwidth stage
########
..............kernel = gauss objective = 0.09682568 minimum = 0.04315604
elapsed
0.704
[1] "GWR_gauss"
[1] "config_model" "CV" "SSR" "model"
[5] "ctime"
$Model
[1] "GWR"
$kernels
[1] "gauss"
$adaptive
[1] FALSE
$H
0.1%
0.04315604
$kernel_w
[1] "rectangle"
$h_w
NULL
##### GWR #####
##### gauss adaptive= FALSE #####
########
Search bandwidth stage
########
.................kernel = gauss objective = 0.08993343 minimum = 0.030763
elapsed
6.972
[1] "GWR_gauss"
[1] "config_model" "CV" "SSR" "model"
[5] "ctime"
$Model
[1] "GWR"
$kernels
[1] "gauss"
$adaptive
[1] FALSE
$H
0.1%
0.030763
$kernel_w
[1] "rectangle"
$h_w
NULL
```

Brunsdon, C., Fotheringham, A., Charlton, M., 1996.
*Geographically weighted regression: a method for exploring spatial
nonstationarity*. Geogr. Anal. 28 (4), 281–298.

Friedman, J. H. 1984. *A variable span smoother*. Laboratory
for Computational Statistics, Department of Statistics, Stanford
University: Technical Report(5).

Geniaux, G. and Martinetti, D. 2017a. *A new method for dealing
simultaneously with spatial autocorrelation and spatial heterogeneity in
regression models*. Regional Science and Urban Economics, 2017

Geniaux, G. and Martinetti, D. 2017b. R package mgwrsar: Functions for computing (mixed) geographycally weighted regression with spatial autocorrelation,. https://CRAN.R-project.org/package=mgwrsar.

Geniaux, G. 2022. Speeding up estimation of spatially varying coefficients models, INRAe working Paper.

Loader, C. 1999. *Local regression and likelihood*, volume 47.
springer New York.

McMillen, D. P. 1996. *One hundred fifty years of land values in
chicago: A nonparametric approach*. Journal of Urban Economics,
40(1):100 – 124.