Chapter 9 Spatializing Classic Clustering Methods

In Part III, the focus shifts to how spatial aspects of the data can be incorporated explicitly into cluster analysis. Foremost among these aspects are location and contiguity, which serve to formalize locational similarity. The treatment in the current chapter is largely pedagogical, aimed at illustrating the tension between attribute similarity and locational similarity. This is carried out through methods that spatialize classic clustering techniques. The clustering methods are the same as those covered in the preceding chapters, namely Hierarchical Clustering, K-Means, K-Medians, K-Medoids, and Spectral Clustering. All of them group the data on the basis of non-spatial considerations only. In this chapter, the spatial dimension is introduced in four different ways.

First, classic methods can be applied to geographical coordinates to create regions that are based purely on location in geographical space, without considering any other attributes. An illustration of this approach was already given in the discussion of K-Medoids in Chapter 7.
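As a minimal sketch of this first approach, the following runs plain K-Means (Lloyd's algorithm) on x-y coordinates alone, with a deterministic farthest-first choice of starting centers. Note that K-Means is swapped in here for the K-Medoids example of Chapter 7, and the function name `kmeans_xy` and the toy points are hypothetical. The sketch assumes projected (planar) coordinates, so that Euclidean distance is meaningful; unprojected latitude-longitude pairs would first need to be projected. It is an illustration, not the implementation used in the text.

```python
import math


def kmeans_xy(points, k, n_iter=100):
    """Plain K-Means (Lloyd's algorithm) on (x, y) coordinates only.

    Assumes projected coordinates, so Euclidean distance is meaningful.
    No attribute variables are used: the clusters reflect location alone.
    """
    # Farthest-first initialization: start from the first point, then
    # repeatedly add the point farthest from all centers chosen so far.
    centers = [tuple(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(tuple(far))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        converged = new_labels == labels and new_centers == centers
        labels, centers = new_labels, new_centers
        if converged:
            break
    return labels, centers
```

For example, `kmeans_xy([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)` separates the three points near the origin from the three points near (10, 10), purely on the basis of where they are located.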

The next two sets of methods attempt to construct a compromise between attribute similarity and locational similarity, differing in how strongly they push the solution toward spatial contiguity. Neither imposes contiguity as a hard constraint. Both approaches use standard clustering techniques. In the first, the feature set (i.e., the variables under consideration) is expanded with the coordinates of each location. This gives some weight to the locational pull (spatial compactness) of the observations, although the resulting clusters are by no means guaranteed to be spatially contiguous. In the second approach, the problem is turned into a form of multi-objective optimization, in which the objective of attribute similarity and the objective of geographical similarity (co-location) are each assigned a weight, so that different compromises between the two can be evaluated.
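Both compromise strategies can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the chapter's implementation. The hypothetical `augment_with_coordinates` appends (projected) x-y coordinates to each observation's attribute vector and then z-standardizes every column, so that location enters the distance computation as just another pair of variables. The hypothetical `weighted_dissimilarity` forms a convex combination w * D_attr + (1 - w) * D_geo of an attribute distance matrix and a geographic distance matrix. The z-score standardization and the rescaling of each matrix by its maximum (so that neither term dominates merely because of its units) are assumptions made for the example.

```python
import math
import statistics


def standardize(columns):
    """Z-score each column (a sequence of values); assumes non-constant columns."""
    out = []
    for col in columns:
        mu = statistics.mean(col)
        sd = statistics.pstdev(col)
        out.append([(v - mu) / sd for v in col])
    return out


def augment_with_coordinates(attributes, coords):
    """Approach 1: append x, y to each attribute vector, then standardize
    every column so the coordinates are on the same scale as the attributes."""
    rows = [list(a) + list(xy) for a, xy in zip(attributes, coords)]
    columns = standardize(list(zip(*rows)))
    return [list(r) for r in zip(*columns)]


def distance_matrix(rows):
    """Full matrix of Euclidean distances between all pairs of rows."""
    return [[math.dist(p, q) for q in rows] for p in rows]


def weighted_dissimilarity(attributes, coords, w):
    """Approach 2: w * D_attr + (1 - w) * D_geo, each matrix scaled to max 1.

    w = 1 ignores location entirely; w = 0 uses location only.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    attr_cols = standardize(list(zip(*attributes)))
    d_attr = distance_matrix([list(r) for r in zip(*attr_cols)])
    d_geo = distance_matrix(coords)

    def scaled(d):
        m = max(max(row) for row in d) or 1.0  # guard against an all-zero matrix
        return [[v / m for v in row] for row in d]

    a, g = scaled(d_attr), scaled(d_geo)
    n = len(coords)
    return [[w * a[i][j] + (1 - w) * g[i][j] for j in range(n)] for i in range(n)]
```

The augmented features can be handed to any of the classic methods unchanged, while the combined matrix suits methods that accept a precomputed dissimilarity, such as hierarchical clustering or K-Medoids. Sweeping w from 1 toward 0 then traces out the compromise between attribute and locational similarity.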

A final method aims to turn the non-spatial solution of a classic clustering method into a set of spatially contiguous groupings, i.e., groupings in which the members of each cluster are also spatially connected. The heuristic outlined here is primarily intended to illustrate the trade-offs between attribute similarity and locational similarity. It usually yields a less satisfactory solution than the spatially explicit methods covered in the next two chapters.
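The heuristic itself is not spelled out in this overview. One natural first step, sketched here under the assumption that contiguity is available as a neighbor list (for example, derived from queen or rook contiguity), is to check whether each non-spatial cluster is spatially connected and, if not, to split it into its connected components. The function name `split_into_contiguous` and the input format are hypothetical, and any further steps a complete heuristic may need (such as merging small fragments into adjacent clusters) are not shown.

```python
from collections import deque


def split_into_contiguous(labels, neighbors):
    """Relabel observations so that every group is spatially connected.

    labels:    cluster label per observation (from a non-spatial method)
    neighbors: dict mapping observation index -> iterable of contiguous indices
    Each original cluster is split into its connected components: two members
    stay together only if a chain of contiguous members of the same cluster
    links them.
    """
    n = len(labels)
    new_labels = [-1] * n
    next_label = 0
    for start in range(n):
        if new_labels[start] != -1:
            continue  # already assigned to a component
        # Breadth-first search restricted to members of the same cluster.
        new_labels[start] = next_label
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in neighbors.get(i, ()):
                if new_labels[j] == -1 and labels[j] == labels[start]:
                    new_labels[j] = next_label
                    queue.append(j)
        next_label += 1
    return new_labels
```

For five units in a row (0-1-2-3-4) with non-spatial labels `[0, 0, 1, 0, 0]`, cluster 0 is not contiguous, so the function returns three groups, `[0, 0, 1, 2, 2]`. The increase in the number of groups is one simple indication of how far a non-spatial solution departs from spatial contiguity.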

The spatial clustering methods are illustrated by means of the Ceará Zika sample data set, which contains 184 municipalities. The specific variables are the five urban dimensions included in the Brazilian index of urban structure (IBEU): mobility, environmental conditions, housing conditions, sanitation, and infrastructure. The indices are composites of other variables and are scaled between zero and one, with larger values indicating better conditions. Following the analysis in Amaral et al. (2019), GDP per capita is included as well. These indicators are illustrative of the variables typically used in a regional clustering exercise.