13.4 Spatial Autocorrelation Statistic
Relying on identifying spatial structure in the data by eye is bound to lead to spurious results, especially since the human brain is wired to find patterns. Instead, a spatial autocorrelation statistic is used.
A statistic is a summary measure computed from the data that allows one to assess the extent to which a given null hypothesis holds. Typically, this is accomplished by deriving the distribution of the statistic if the null hypothesis were true. If the observed value of the statistic is extreme relative to this so-called null distribution (i.e., in the extreme tails of the distribution), the null is rejected.
In classical statistics (as opposed to a Bayesian perspective), a fundamental notion in this regard is the Type I error, or the probability that the null is rejected when in fact it is true. In other words, this is the chance one takes of making the wrong decision. A critical value of the Type I error is the so-called significance level or p-value, a target cut-off point. If the probability of obtaining a value of the statistic at least as extreme as the one computed from the data, assuming the null distribution holds, is less than this cut-off, one typically rejects the null hypothesis. In this context, the choice of the p-value becomes crucial. Customarily, this has been set to 0.05, but recently there has been a lot of discussion around this choice in the context of the replicability of scientific experiments. The current consensus is that a p-value should be much smaller than 0.05, or even be replaced by alternative concepts (Efron and Hastie 2016; Benjamin and 72 others 2018). The choice of a suitable p-value will be revisited in the discussion of inference for local indicators of spatial association (LISA, Chapter 16).
Spatial autocorrelation is about the coincidence of attribute similarity (how alike attribute values are) with locational similarity (how close observations are to each other). One aspect of the spatial case that is distinct from a standard correlation coefficient is that the attribute similarity pertains to the same variable at different locations, hence autocorrelation. More precisely, it is a univariate measure, in contrast to the bivariate correlation coefficient.
A spatial autocorrelation statistic is then a summary computed from the data that combines these two notions. The objective is to assess the observed value of the statistic relative to the distribution of values it would take on under the null hypothesis of spatial randomness.
Formally, such a statistic can be expressed in cross-product form as: \[SA = \sum_i \sum_j f(x_i,x_j).l(i,j),\] where the sum is over all pairs of observations, \(x_i\) and \(x_j\) are observations on the variable \(x\) at locations \(i\) and \(j\), \(f(x_i,x_j)\) is a measure of attribute similarity, and \(l(i,j)\) is a measure of locational similarity between \(i\) and \(j\).
Different choices for the functions \(f\) and \(l\) lead to specific spatial autocorrelation statistics. A few common options are briefly reviewed next.
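The cross-product form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `spatial_autocorrelation`, the four data values, and the adjacency rule used for \(l(i,j)\) are all hypothetical and chosen to keep the arithmetic easy to follow.

```python
def spatial_autocorrelation(x, f, l):
    """Cross-product statistic: sum over all (i, j) of f(x_i, x_j) * l(i, j)."""
    n = len(x)
    return sum(f(x[i], x[j]) * l(i, j) for i in range(n) for j in range(n))


# Hypothetical data: four locations along a line.
x = [1.0, 2.0, 4.0, 7.0]


# Assumed locational similarity: 1 for adjacent positions, 0 otherwise,
# so a location is never counted as its own neighbor.
def adjacent(i, j):
    return 1.0 if abs(i - j) == 1 else 0.0


# Two choices of attribute (dis)similarity yield two different statistics.
sa_product = spatial_autocorrelation(x, lambda a, b: a * b, adjacent)
sa_sq_diff = spatial_autocorrelation(x, lambda a, b: (a - b) ** 2, adjacent)

print(sa_product, sa_sq_diff)
```

Because the double sum visits both \((i,j)\) and \((j,i)\), each neighboring pair contributes twice when \(l\) is symmetric.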
13.4.1 Attribute similarity
The measure of attribute similarity \(f(x_i,x_j)\) summarizes the similarity or dissimilarity between the values observed at a pair of locations. Typical choices are the cross product, \(x_i.x_j\), the squared difference, \((x_i - x_j)^2\), and the absolute difference, \(| x_i - x_j |\). The cross product focuses on similarity, with extremely large or extremely small values indicating greater similarity (i.e., the product of adjoining large values, or the product of adjoining small values). In contrast, the squared and absolute differences are measures of dissimilarity, with smaller values indicating more alike neighbors.
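As a rough illustration, the three measures can be computed for all pairs at once with NumPy. The three data values are hypothetical, and the cross product is taken on deviations from the mean, which is an assumption made here so that the sign of the product is informative.

```python
import numpy as np

x = np.array([2.0, 3.0, 10.0])   # hypothetical values at three locations
z = x - x.mean()                 # deviations from the mean: [-3, -2, 5]

# Pairwise matrices: entry [i, j] compares location i with location j.
cross_product = np.outer(z, z)                      # z_i * z_j
squared_diff = (x[:, None] - x[None, :]) ** 2       # (x_i - x_j)^2
absolute_diff = np.abs(x[:, None] - x[None, :])     # |x_i - x_j|

# Locations 0 and 1 hold similar values (2 and 3):
#   cross product of deviations = 6, squared difference = 1.
# Locations 0 and 2 hold dissimilar values (2 and 10):
#   cross product of deviations = -15, squared difference = 64.
print(cross_product[0, 1], squared_diff[0, 1])
print(cross_product[0, 2], squared_diff[0, 2], absolute_diff[0, 2])
```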
13.4.2 Locational similarity
There are two main ways to incorporate locational similarity in spatial autocorrelation statistics. One approach uses the spatial weights matrix, discussed at length in Chapters 10 through 12. The corresponding spatial autocorrelation statistics take the generic form: \[SA = \sum_i \sum_j f(x_i,x_j).w_{ij},\] i.e., the sum of a measure of attribute similarity over all pairs of neighbors (pairs where \(w_{ij} \neq 0\)). This approach is most appropriate when the observations pertain to a set of discrete locations, a so-called lattice data structure.
A different perspective is taken when the observations are viewed as a sample from a continuous surface, although this approach can also be applied to discrete locations. Instead of subsuming the neighbor relations in the spatial autocorrelation statistic, a measure of attribute similarity is expressed as a function of the distance that separates pairs of observations: \[f(x_i,x_j) = g(d_{ij}),\] where \(g\) is a function of distance. The function can take on a specific form, as in a variogram or semi-variogram, or be left unspecified, in a non-parametric approach. This form of modeling is representative of geostatistical analysis, which concerns itself with modeling and interpolation on spatial surfaces (Isaaks and Srivastava 1989; Cressie 1993; Chilès and Delfiner 1999; Stein 1999). The discussion of this approach is deferred to Chapter 15.
13.4.3 Examples of spatial autocorrelation statistics
Each combination of a measure of attribute similarity with a spatial weights matrix yields a different spatial autocorrelation statistic.
An early example of a very generic spatial autocorrelation statistic is the Mantel test (Mantel 1967), popularized in spatial analysis through the work of Hubert and Golledge (Hubert, Golledge, and Costanzo 1981; Hubert et al. 1985). It combines the elements of a pairwise (dis)similarity matrix \(\mathbf{A}\) with a spatial weights matrix \(\mathbf{W}\), yielding the Gamma statistic: \[\Gamma = \sum_i \sum_j \mathbf{A}_{ij}.\mathbf{W}_{ij}.\] Some commonly used statistics can be viewed as special cases of this statistic, using a particular expression for \(\mathbf{A}_{ij}\).
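A minimal sketch of the Gamma statistic follows, assuming dense NumPy arrays for \(\mathbf{A}\) and \(\mathbf{W}\). The function name `gamma_statistic`, the data, and the binary weights for four locations on a line are hypothetical. The two choices of \(\mathbf{A}\) shown are the cross product of deviations from the mean and the squared difference, which produce unscaled sums only, without any normalizing constants.

```python
import numpy as np


def gamma_statistic(a, w):
    """Gamma = sum over i, j of A_ij * W_ij (element-wise product, then sum)."""
    a = np.asarray(a, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(a * w))


# Hypothetical data: four locations on a line, binary adjacency weights.
x = np.array([1.0, 2.0, 4.0, 7.0])
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

z = x - x.mean()

# A as cross product of deviations from the mean.
gamma_cross = gamma_statistic(np.outer(z, z), w)

# A as squared difference between the raw values.
gamma_sq_diff = gamma_statistic((x[:, None] - x[None, :]) ** 2, w)

print(gamma_cross, gamma_sq_diff)
```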
13.4.3.1 Cross-product
Arguably the most commonly used spatial autocorrelation statistic is Moran’s I (Moran 1948), based on a pairwise cross-product as the measure of similarity. Moran’s I is typically expressed in deviations from the mean, using \(z_i = x_i - \bar{x}\), with \(\bar{x}\) as the mean. The full expression is: \[\begin{equation} I = \frac{\sum_i \sum_j z_i.z_j.w_{ij} / S_0}{\sum_i z_i^2 / n}, \tag{13.1} \end{equation}\] with \(S_0 = \sum_i \sum_j w_{ij}\) as the sum of the weights, and the other notation as before.
This statistic is examined more closely in Section 13.5.1.
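Equation (13.1) translates almost directly into NumPy. The sketch below also approximates the null distribution by randomly permuting the observed values over the locations. That is one common way to operationalize spatial randomness, but it is an assumption here rather than something prescribed in this section. The function names, the data, the line-adjacency weights, the number of permutations and the seed are all hypothetical.

```python
import numpy as np


def morans_i(x, w):
    """Moran's I as in Equation (13.1), for a dense weights matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()                 # deviations from the mean
    s0 = w.sum()                     # sum of all weights
    n = x.size
    return (z @ w @ z / s0) / (z @ z / n)


def permutation_pseudo_p(x, w, n_perm=999, seed=12345):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, w)
    # Reference distribution: recompute I after shuffling values over locations.
    sims = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    p = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    return observed, p


def line_weights(n):
    """Binary adjacency weights for n locations arranged on a line."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return w


# Small example that can be checked by hand: I = 9.5 / 6 / 5.25 = 19/63.
i_small = morans_i([1.0, 2.0, 4.0, 7.0], line_weights(4))

# A steadily increasing sequence on ten locations: neighbors are alike.
i_trend, p_trend = permutation_pseudo_p(np.arange(10.0), line_weights(10))

print(i_small, i_trend, p_trend)
```

The pseudo p-value adds one to both numerator and denominator so that the observed arrangement counts as one of the possible outcomes, which keeps the value strictly positive.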
13.4.3.2 Squared difference
Whereas the squared difference is most commonly used as an attribute dissimilarity measure in geostatistics, it is combined with spatial weights in Geary’s c statistic (Geary 1954): \[c = \frac{\sum_i \sum_j (x_i - x_j)^2 w_{ij} / 2S_0}{\sum_i z_i^2 / (n-1)},\] with \(z_i = x_i - \bar{x}\) and \(S_0\) as before. Since the numerator consists of a squared difference, it is not necessary to take the deviations from the mean (the two means would cancel each other out in the difference operation).
In contrast to Moran’s I, which takes values between roughly (not exactly) -1 and +1, Geary’s c is never negative, with values typically between 0 and 2. The theoretical mean for Geary’s c is 1. Because Geary’s c is based on a dissimilarity measure, positive spatial autocorrelation corresponds with small values for the statistic (less than 1), and negative spatial autocorrelation with large values (larger than 1).
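The interpretation of values below and above 1 can be checked with a small sketch of Geary’s c. The function name `gearys_c`, the two data vectors and the binary adjacency weights for four locations on a line are hypothetical choices that keep the arithmetic verifiable by hand.

```python
import numpy as np


def gearys_c(x, w):
    """Geary's c for a dense weights matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    sq_diff = (x[:, None] - x[None, :]) ** 2   # (x_i - x_j)^2 for all pairs
    s0 = w.sum()                               # sum of all weights
    n = x.size
    return (np.sum(sq_diff * w) / (2.0 * s0)) / (z @ z / (n - 1))


# Four locations on a line, each adjacent to its immediate neighbors.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Smoothly increasing values: neighbors are alike, so c falls below 1.
c_smooth = gearys_c([1.0, 2.0, 4.0, 7.0], w)        # (28 / 12) / 7 = 1/3

# Alternating values: neighbors are unlike, so c exceeds 1.
c_alternating = gearys_c([1.0, 7.0, 1.0, 7.0], w)   # (216 / 12) / 12 = 1.5

print(c_smooth, c_alternating)
```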
13.4.3.3 Absolute difference
Although used less frequently in practice, attribute dissimilarity can also be based on absolute difference. This yields a slightly more robust measure in the sense that the influence of outliers is lessened. An early discussion of this criterion was contained in Sokal (1979). It will not be considered further.