11.2 Distance metrics

Pairs of observations are considered to be neighbors based on how close they are to each other. To make this precise, there is a need for a formal definition of distance. Two important concepts are considered: distance between points in a Cartesian coordinate system, and distance between points located on a sphere. This can be further generalized to distance between points in multivariate attribute space, i.e., to a non-geographical notion of distance.

11.2.1 Distance in a Cartesian coordinate system

A general concept for the distance between two points \(i\) and \(j\), with respective coordinates \((x_{i},y_{i})\) and \((x_{j},y_{j})\) in a Cartesian coordinate system, is the so-called Minkowski metric: \[ d_{ij}^{p} = \left( | x_{i} - x_{j} |^{p} + | y_{i} - y_{j} |^{p} \right)^{(1/p)}, \] with \(p\) as a general exponent. The Minkowski metric itself is not often used in spatial analysis, but there are two special cases of great interest.

Arguably the most familiar case is for \(p=2\), i.e., Euclidean or straight line distance, \(d_{ij}\), as the crow flies:⁷³ \[ d_{ij} = \left( | x_{i} - x_{j} |^{2} + | y_{i} - y_{j} |^{2} \right)^{(1/2)}, \] or, in its more familiar form: \[ d_{ij} = \sqrt{(x_{i} - x_{j})^{2} + (y_{i} - y_{j})^{2}}. \]

An alternative to Euclidean distance that is sometimes preferred because it lessens the influence of outliers is the so-called Manhattan block distance. This metric is obtained by setting \(p=1\) in the Minkowski expression. This notion only considers movement along the east-west and north-south directions, as in the city blocks of Manhattan. Formally, it is expressed as: \[ d_{ij}^m = | x_{i} - x_{j}| + | y_{i} - y_{j}|. \]
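As a minimal sketch of the expressions above, the following Python function computes the Minkowski distance for a general exponent \(p\), from which the Euclidean and Manhattan cases follow by setting \(p=2\) and \(p=1\). The function name `minkowski_distance` and the sample coordinates are illustrative assumptions and are not part of GeoDa.

```python
def minkowski_distance(pt_i, pt_j, p=2.0):
    """Minkowski distance between two (x, y) points for exponent p."""
    (x_i, y_i), (x_j, y_j) = pt_i, pt_j
    return (abs(x_i - x_j) ** p + abs(y_i - y_j) ** p) ** (1.0 / p)


# Hypothetical projected coordinates (e.g., in meters).
a = (0.0, 0.0)
b = (3.0, 4.0)

euclidean = minkowski_distance(a, b, p=2)  # straight line distance
manhattan = minkowski_distance(a, b, p=1)  # city block distance
print(euclidean, manhattan)
```

For these two points, the Euclidean distance is 5 and the Manhattan distance is 7. The Manhattan distance is never smaller than the Euclidean distance between the same pair of points.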

11.2.2 Great circle distance

Euclidean inter-point distances are only meaningful when the coordinates are recorded on a plane with a Cartesian coordinate system. This implies that any point layer must be projected for Euclidean distances to be appropriate.

In practice, one often works with unprojected points, expressed as decimal degrees of latitude and longitude. In this case, using a straight line distance measure is inappropriate, since it ignores the curvature of the earth. This is especially the case for longer distances, such as when observations span a continent. Also, decimal degrees are not Cartesian coordinates, and consequently a notion of distance between degrees is meaningless.

The proper distance measure in this case is the so-called arc distance or great circle distance. This approach takes the latitude and longitude in decimal degrees as input into a conversion formula.⁷⁴ Decimal degrees are obtained from the degree-minute-second value as degrees + minutes/60 + seconds/3600.
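The conversion from degree-minute-second values can be sketched in a few lines of Python. The function name `dms_to_decimal` and the `negative` flag (used here to mark southern latitudes and western longitudes) are assumptions made for illustration.

```python
def dms_to_decimal(degrees, minutes, seconds, negative=False):
    """Convert degree-minute-second values to decimal degrees.

    `negative` flags southern latitudes or western longitudes
    (a sign convention assumed for this sketch).
    """
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -value if negative else value


# Example: 41 degrees, 52 minutes, 30 seconds north.
lat = dms_to_decimal(41, 52, 30)
print(lat)
```

For this example, the result is approximately 41.875 decimal degrees.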

The latitude and longitude in decimal degrees are converted into radians as: \[\begin{eqnarray*} \mbox{Lat}_r &=& (\mbox{Lat}_d - 90) * \pi/180\\ \mbox{Lon}_r &=& \mbox{Lon}_d * \pi/180, \end{eqnarray*}\] where the subscripts \(d\) and \(r\) refer respectively to decimal degrees and radians, and \(\pi = 3.14159 \dots\). With \(\Delta \mbox{Lon} = \mbox{Lon}_{r(j)} - \mbox{Lon}_{r(i)}\), the expression for the arc distance is: \[\begin{eqnarray*} d_{ij} &=& \mbox{R} * \arccos [ \cos ( \Delta \mbox{Lon} ) * \sin \mbox{Lat}_{r(i)} * \sin \mbox{Lat}_{r(j)} \\ &&+ \cos \mbox{Lat}_{r(i)} * \cos \mbox{Lat}_{r(j)} ], \end{eqnarray*}\] or, equivalently, when the latitude is converted without the 90 degree shift, i.e., as \(\mbox{Lat}_r = \mbox{Lat}_d * \pi/180\): \[\begin{eqnarray*} d_{ij} &=& \mbox{R} * \arccos [ \cos ( \Delta \mbox{Lon} ) * \cos \mbox{Lat}_{r(i)} * \cos \mbox{Lat}_{r(j)} \\ &&+ \sin \mbox{Lat}_{r(i)} * \sin \mbox{Lat}_{r(j)} ], \end{eqnarray*}\] where R is the radius of the earth. In GeoDa, the arc distance is obtained in miles with R = 3959, and in kilometers with R = 6371.
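The arc distance computation can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not GeoDa's actual implementation. The function names are hypothetical, and the arguments are given in (lat, lon) order in decimal degrees. The first function uses the shifted latitude \((\mbox{Lat}_d - 90) * \pi/180\) with the sine terms multiplying \(\cos(\Delta \mbox{Lon})\). The second uses the latitude converted directly as \(\mbox{Lat}_d * \pi/180\) with the cosine terms multiplying \(\cos(\Delta \mbox{Lon})\), which is the convention under which that form holds. The two give the same distance.

```python
import math

R_KM = 6371.0     # earth radius in kilometers, as given in the text
R_MILES = 3959.0  # earth radius in miles, as given in the text


def _clamp(value):
    """Keep the arccos argument within [-1, 1] despite rounding error."""
    return max(-1.0, min(1.0, value))


def arc_distance_shifted(lat_i, lon_i, lat_j, lon_j, radius=R_KM):
    """Arc distance with latitudes shifted by 90 degrees before conversion."""
    lat_ri = math.radians(lat_i - 90.0)
    lat_rj = math.radians(lat_j - 90.0)
    d_lon = math.radians(lon_j - lon_i)
    c = (math.cos(d_lon) * math.sin(lat_ri) * math.sin(lat_rj)
         + math.cos(lat_ri) * math.cos(lat_rj))
    return radius * math.acos(_clamp(c))


def arc_distance(lat_i, lon_i, lat_j, lon_j, radius=R_KM):
    """Arc distance with latitudes converted to radians without the shift."""
    lat_ri = math.radians(lat_i)
    lat_rj = math.radians(lat_j)
    d_lon = math.radians(lon_j - lon_i)
    c = (math.cos(d_lon) * math.cos(lat_ri) * math.cos(lat_rj)
         + math.sin(lat_ri) * math.sin(lat_rj))
    return radius * math.acos(_clamp(c))


# Two points on the equator, 90 degrees of longitude apart: the arc is
# one quarter of the circumference, i.e., R * pi / 2.
quarter = arc_distance(0.0, 0.0, 0.0, 90.0)
print(quarter, R_KM * math.pi / 2)
```

Clamping the argument of `math.acos` is a defensive choice: for identical or nearly identical points, rounding error can push the value slightly above one, which would otherwise raise an error.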

However, it should be noted that these calculated distance values are only approximate, since a single value is used for the radius of the earth. A more precise measure would take into account the actual latitude at which the distance is measured, because the radius varies with latitude. In addition, the earth’s shape is much more complex than a sphere. In most practical applications, the approximation works fine.

11.2.3 General distance

The distance metric used to construct weights as discussed in Section 11.3 can be readily extended to a notion of general distance, familiar in regional science theory (Isard 1969). In addition, multidimensional scaling can be applied to create coordinates in a multivariate variable space (considered in Part II of Volume 2). These artificial locations can then be used to derive spatial weights based on the distance between them.

In the empirical literature, such weights are often referred to as economic weights, based on a notion of economic distance. Simply put, the distance between observations \(i\) and \(j\) expressed in terms of an economic (or social) variable \(z\) is: \[ d_{ij} = | z_{i} - z_{j} |^{\alpha}, \]

where \(\alpha\) is a scaling factor, often simply set equal to one.
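A minimal Python sketch of this one-dimensional economic distance follows. The function name `economic_distance` and the income values are hypothetical and serve only as an illustration.

```python
def economic_distance(z_i, z_j, alpha=1.0):
    """Distance between two observations on a single variable z."""
    return abs(z_i - z_j) ** alpha


# Hypothetical per capita income (in thousands) for three regions.
income = [42.0, 55.0, 48.0]

# Full pairwise distance matrix; the diagonal is zero by construction.
n = len(income)
dist = [[economic_distance(income[i], income[j]) for j in range(n)]
        for i in range(n)]
print(dist)
```

With \(\alpha = 1\), the matrix is symmetric and contains the absolute differences, e.g., 13 between the first and second region.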

A general distance measure readily serves as a proxy for the extent and strength of interactions between locations (e.g., a social network among neighborhoods) by extending the one-dimensional approach to a multivariate setting. Such a distance measure summarizes the differences between two observations along several dimensions as a single value. Formally, with \(h = 1, \dots, H\) indexing the \(H\) variables \(z_{h}\) considered to characterize the socio-economic makeup of each observation, the Euclidean distance between \(i\) and \(j\) is: \[ d_{ij} = \sqrt{\sum_{h=1}^{H} (z_{ih} - z_{jh})^{2}}. \]
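The multivariate expression can be sketched in Python as follows. The function name `attribute_distance` and the observations are hypothetical, and the variables are used as given, without any rescaling.

```python
import math


def attribute_distance(z_i, z_j):
    """Euclidean distance between two observations over H variables."""
    if len(z_i) != len(z_j):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_i, z_j)))


# Hypothetical observations, each described by H = 3 variables.
obs = {
    "A": [1.0, 2.0, 2.0],
    "B": [0.0, 0.0, 0.0],
    "C": [1.0, 2.0, 2.0],
}

d_ab = attribute_distance(obs["A"], obs["B"])
d_ac = attribute_distance(obs["A"], obs["C"])
print(d_ab, d_ac)
```

Here, observations A and C have identical attribute values, so their distance is zero, whereas the distance between A and B is 3.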

This approach is often used in economic applications.

An alternative way to derive distance-based weights in a multivariate setting is to employ multidimensional scaling, or MDS (see Volume 2). This yields a point map of locations in two-dimensional space, where the location and configuration of the points approximate their distance relationships in multivariate dimensions. Loosely put, the two dimensions in the MDS map correspond to the largest eigenvalues and matching eigenvectors of the correlation matrix for the variables considered. This yields a best fit solution in the sense that the relative distances between the points on the MDS map are the closest to the actual relative distances between observations. The distances in MDS space can then be used to define neighbors in multivariate attribute space.

⁷³ Since \(|x_i - x_j |^2\) is equivalent to \((x_i - x_j )^2\), the latter expression is typically used for Euclidean distance.
⁷⁴ The latitude is the \(y\) dimension, and the longitude the \(x\) dimension, so that the traditional reference to the pair (lat, lon) actually pertains to the coordinates as (y,x) and not (x,y).