Chapter 12 Cluster Validation

The journey through the range of clustering methods may come across as a bit bewildering, since the choice of algorithm, tuning parameter settings and other decisions sometimes lead to very different results. These so-called researcher degrees of freedom (Gelman and Loken 2014), considered in the Epilogue of Volume 1, are part and parcel of the unsupervised aspect of spatial data science.

Unless a true classification is known, to which different solutions can be compared, it is nearly impossible to select a best approach. Different criteria will favor some solutions over others. It is therefore important to assess which criteria are most appropriate in a given empirical situation. In this volume, a lot of attention has focused on the tension between attribute similarity and locational similarity. Depending on the context, one will prevail over the other. Even when internal similarity is the dominant objective, it remains important to put the results in a spatial context, to assess the spatial distributional aspects of the grouping of observations.

In this final chapter, attention shifts to assessing the performance of a given cluster result relative to a given standard, and to comparing results obtained through different methods to each other. This is referred to in the literature as, respectively, external validity and internal validity (Akhanli and Hennig 2020).

External validity is typically used to compare the outcome of various clustering methods to a known truth. Unless one is carrying out an experimental design, this is not really that practical in actual empirical situations. However, it remains a very useful approach when studying the trade-offs created by different tuning parameter and other design choices. An extensive review of external validity indices is contained in Meila (2015). In a recent application, Aydin et al. (2021) computed some thirteen different metrics to compare the performance of six spatially constrained clustering methods in an experimental setting (true cluster categories known).
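As an illustration of what an external validity index computes, the following is a minimal sketch of the adjusted Rand index, a commonly used measure of agreement between a cluster result and a known classification. It is chosen here purely as an example and is not necessarily among the metrics used in the works cited above. The function name `adjusted_rand_index` and the toy labels are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, labels):
    """Adjusted Rand index between two labelings of the same observations.

    Illustrative sketch only: 1.0 means identical partitions (up to a
    renaming of the labels), values near 0 mean chance-level agreement.
    """
    if len(truth) != len(labels):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    if n < 2:
        raise ValueError("at least two observations are required")

    # Contingency counts: observations sharing each (truth, label) pair.
    pair_counts = Counter(zip(truth, labels))
    truth_counts = Counter(truth)
    label_counts = Counter(labels)

    # Number of observation pairs placed together under each view.
    together_both = sum(comb(c, 2) for c in pair_counts.values())
    together_truth = sum(comb(c, 2) for c in truth_counts.values())
    together_labels = sum(comb(c, 2) for c in label_counts.values())

    expected = together_truth * together_labels / comb(n, 2)
    maximum = (together_truth + together_labels) / 2
    if maximum == expected:
        # Degenerate case, e.g., both labelings put everything in one cluster.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Hypothetical toy data: the cluster labels differ in name only.
truth = [0, 0, 1, 1, 2, 2]
relabeled = [5, 5, 3, 3, 9, 9]
score = adjusted_rand_index(truth, relabeled)
```

Because the index only considers which pairs of observations are grouped together, it is unaffected by how the clusters happen to be numbered.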

In an actual empirical application, measures of internal validity of a cluster are much more useful. As mentioned, such measures tend to favor one objective over another, or implement a given compromise between conflicting goals. Typical properties considered are within cluster homogeneity, between cluster separation and stability (for recent overviews of internal validity indices, see Halkidi, Vazirgiannis, and Hennig 2015; Akhanli and Hennig 2020). Such measures are then often included in a search for the optimal number of clusters, or \(k\).
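To make the notions of within-cluster homogeneity and between-cluster separation concrete, the sketch below computes the ratio of the between-cluster sum of squares to the total sum of squares for a given set of cluster labels. This is one simple internal measure among many, shown under the assumption that the observations are numeric vectors on a comparable scale. The function name `bss_tss_ratio` and the toy data are hypothetical.

```python
def bss_tss_ratio(data, labels):
    """Ratio of between-cluster to total sum of squares (illustrative sketch).

    data   : list of equal-length numeric lists, one per observation
    labels : list of cluster labels, one per observation
    Values closer to 1 indicate tighter, better separated clusters.
    """
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")
    n = len(data)
    p = len(data[0])

    # Total sum of squares around the overall mean.
    overall = [sum(row[j] for row in data) / n for j in range(p)]
    tss = sum((row[j] - overall[j]) ** 2 for row in data for j in range(p))
    if tss == 0:
        raise ValueError("all observations are identical")

    # Group observations by cluster label.
    groups = {}
    for row, lab in zip(data, labels):
        groups.setdefault(lab, []).append(row)

    # Within-cluster sum of squares around each cluster mean.
    wss = 0.0
    for members in groups.values():
        center = [sum(r[j] for r in members) / len(members) for j in range(p)]
        wss += sum((r[j] - center[j]) ** 2 for r in members for j in range(p))

    # The between-cluster sum of squares is the remainder.
    return (tss - wss) / tss


# Hypothetical toy data: two well separated groups on a single variable.
data = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]
ratio = bss_tss_ratio(data, labels)
```

The ratio never decreases when clusters are split further, so it is best used to compare solutions with the same number of clusters, as is done in this chapter.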

In this chapter, an overview is presented of a range of validity measures, but in the limited context where the value of \(k\) is taken as given. This is illustrated for the various results obtained in previous chapters using the Ceará Zika sample data set. For consistency, \(k\) was always set to 12.

In addition to classic indicators from the literature, some novel spatial aspects are introduced as well. These include a join count ratio to assess cluster contiguity structure, and a cluster match map to compare the spatial alignment of observations between clusters.
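One way to think about a join count ratio is sketched below. It assumes a definition in which, for each cluster, the number of neighbor links that connect two members of the same cluster is divided by the total number of neighbor links of its members. This definition, the function name `join_count_ratio` and the toy contiguity structure are assumptions for illustration, not a statement of the exact measure developed later in the chapter.

```python
def join_count_ratio(neighbors, labels):
    """Per-cluster share of neighbor links that stay inside the cluster.

    neighbors : dict mapping each observation id to a list of neighbor ids
    labels    : dict mapping each observation id to its cluster label
    Illustrative sketch under an assumed definition: a ratio of 1 means
    every neighbor of every cluster member belongs to the same cluster.
    """
    same = {}
    total = {}
    for i, nbrs in neighbors.items():
        cluster = labels[i]
        total[cluster] = total.get(cluster, 0) + len(nbrs)
        same[cluster] = same.get(cluster, 0) + sum(
            1 for j in nbrs if labels[j] == cluster
        )
    # Clusters whose members have no neighbors at all get a ratio of 0.
    return {c: (same[c] / total[c] if total[c] else 0.0) for c in total}


# Hypothetical toy layout: four units in a row, split into two clusters.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
ratios = join_count_ratio(neighbors, labels)
```

In this toy layout, each cluster has three neighbor links in total, two of which stay within the cluster, so both ratios equal two thirds.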

The chapter closes with some brief concluding remarks.