Chapter 10 Spatially Constrained Clustering - Hierarchical Methods

In this chapter, the focus shifts to methods that impose contiguity as a hard constraint in a clustering procedure. Such methods are known under a number of different terms, including zonation, districting, regionalization, spatially constrained clustering, the p-region problem and the p-compact regions problem. They are concerned with dividing an original set of n spatial units into p internally connected regions that maximize within similarity (for recent reviews, see, e.g., Murray and Grubesic 2002; Duque, Ramos, and Suriñach 2007; Duque, Church, and Middleton 2011; Li, Church, and Goodchild 2014).⁴⁵

In the previous chapter, approaches were considered that impose a soft spatial constraint, in the form of a trade-off between attribute similarity and spatial similarity. In the methods considered in the current and next chapter, the contiguity is a strict constraint, in that clusters can only consist of entities that are geographically connected. As a result, in some cases the resulting attribute similarity can be of poor quality, when dissimilar units are grouped together primarily due to the contiguity constraint.

Three sets of methods are considered, all based on the principle of hierarchical clustering.

The first approach employs agglomerative hierarchical clustering, to which a spatial contiguity constraint is introduced. Early descriptions of the idea of spatially constrained hierarchical clustering can be found in Murtagh (1985) and Gordon (1996), among others. Next, an algorithm is considered that takes a divisive hierarchical approach. The SKATER algorithm (Assunção et al. 2006; Teixeira, Assunção, and Loschi 2015; Aydin et al. 2018), or, Spatial `K’luster Analysis by Tree Edge Removal, obtains regionalization through a graph partitioning approach. Finally, a collection of methods that combine aspects of both agglomerative and divisive clustering is outlined, referred to as REDCAP (Guo 2008; Guo and Wang 2011). The acronym stands for REgionalization with Dynamically Constrained Agglomerative clustering and Partitioning, and refers to a family of six (later extended to eight) hierarchical regionalization methods.

As before, the methods considered here share many of the same options with previously discussed techniques covered in earlier chapters. Common options will not be considered, but the focus will be on aspects that are specific to the spatially constrained methods.

The Ceará Zika sample data set is again used to illustrate the methods, with the same variables as in the previous chapter.

The terminology is again a bit confusing, since in this literature \(p\) is often used for the number of clusters, instead of \(k\), standard for classic clustering techniques. In this chapter, both are employed. The context will make clear how many clusters are considered.↩︎