Chapter 20 Density-Based Clustering Methods

In this last chapter dealing with local patterns, density-based clustering methods are considered. These approaches search the data for high-density subregions of arbitrary shape, separated by low-density regions. Alternatively, the elevated regions can be interpreted as modes in the spatial distribution over the support of the observations.

The methods covered form a transition between the local spatial autocorrelation statistics and the regionalization methods considered in Volume 2. They pertain primarily to point patterns, but can also be extended to a full multivariate setting. At first sight, density-based clustering methods may seem similar to spatially constrained clustering, but they are not quite the same. In contrast to the regionalization methods, density-based clustering does not necessarily yield a complete partitioning of the data: some observations may remain unassigned to any cluster. In this sense, density-based clustering methods are similar in spirit to the identification of clusters by means of local spatial autocorrelation statistics, although they are not formulated as hypothesis tests. For this reason, they are included in the discussion of local spatial autocorrelation, rather than with the regionalization methods.

Attempts to discover high-density regions in the data distribution go back to the classic paper on mode analysis by Wishart (1969), and its refinement in Hartigan (1975). In the literature, these methods are also referred to as bump hunting, i.e., looking for bumps (high regions) in the data distribution.

In this chapter, the focus is on the application of density-based clustering methods to the geographic location of points, but the methods can be generalized to locations in high-dimensional attribute space as well.

Four approaches are considered. The first is a simple heat map, constructed by centering a uniform density kernel on each location. The logic behind this graph is similar to that of Openshaw’s Geographical Analysis Machine (Openshaw et al. 1987) and the approach taken in spatial scan statistics (Kulldorff 1997), i.e., a simple count of the points within a given radius. This is also the main idea behind the Getis-Ord local statistics considered in Chapter 17.
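The count of points within a given radius can be illustrated with a minimal sketch in Python. This is not the implementation behind the heat map discussed in this chapter. The function name radius_counts, the coordinates, and the radius are hypothetical, and the sketch assumes projected coordinates (so that Euclidean distance is meaningful) and that each point is included in its own count.

```python
from math import hypot

def radius_counts(points, radius):
    """For each point, count the points within `radius` of it.

    Assumption: the point itself is included in its own count,
    and distances are Euclidean in projected coordinate units.
    """
    counts = []
    for xi, yi in points:
        n = sum(1 for xj, yj in points if hypot(xi - xj, yi - yj) <= radius)
        counts.append(n)
    return counts

# Hypothetical coordinates, not taken from the sample data set:
# three points close together and one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
counts = radius_counts(points, radius=1.5)
print(counts)
```

Higher counts correspond to the elevated (dense) areas of the heat map. The double loop compares every pair of points, which is adequate for a few hundred locations; larger data sets would typically rely on a spatial index instead.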

The remaining methods are all related to DBSCAN (Ester et al. 1996), i.e., Density Based Spatial Clustering of Applications with Noise. The original DBSCAN is outlined first, followed by its improved version, referred to as DBSCAN*, and its hierarchical version, referred to as HDBSCAN or, sometimes, HDBSCAN* (Campello, Moulavi, and Sander 2013; Campello et al. 2015).

The methods are illustrated with the Italy Community Banks sample data set, which contains the locations of 261 community banks.