1.1 Overview of Volume 1

The first volume is organized into five main Parts and an Epilogue, offering a progression from basic data manipulation, through description and exploration, to the identification of clusters and outliers by means of spatial autocorrelation analysis. It closes with some reflections on the limits of exploration and its role in scientific discovery. As mentioned, spatial clustering methods are covered in Volume 2.

The five Parts and the Epilogue are:

  • Spatial data wrangling

  • EDA and ESDA

  • Spatial weights

  • Global spatial autocorrelation

  • Local spatial autocorrelation

  • Epilogue

The first part deals with basic data operations for both tabular and spatial data, covered in two chapters. The material includes a review of the distinctive characteristics of spatial data, how to create spatial layers inside GeoDa, as well as essential transformations and data queries. There is also a rudimentary discussion of a range of basic GIS operations, such as projections, converting between points and polygons, and spatial joins. Even though GeoDa is not (and is not intended to be) a GIS, this functionality has been included over the years in response to user demand.

Part II covers the principles behind exploratory data analysis (EDA) and its spatial counterpart, exploratory spatial data analysis (ESDA). This includes six chapters. Three of these are devoted to map use in various degrees of complexity, starting with basic mapping concepts, and moving to statistical maps and maps for rates. The other three chapters deal with conventional (non-spatial) EDA, in the form of univariate and bivariate data exploration, multivariate data exploration, and space-time exploration. The core idea here is to leverage linking and brushing between various graphical representations (views of the data), which is central to the architecture of GeoDa.

The remaining three main parts deal with the topic of spatial autocorrelation. First, in Part III, three chapters are devoted to spatial weights, covering contiguity-based weights, distance-based weights, and various spatial weights operations. These are essential prerequisites for the computation of the global and local spatial autocorrelation indices covered in Parts IV and V.

Part IV contains three chapters on global spatial autocorrelation, centered around the Moran scatter plot as a visualization device. The basic concepts are covered, as well as more advanced applications and extensions to a bivariate setting. A third chapter provides an overview of some non-parametric techniques, such as a spatial correlogram.
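As a rough illustration of how spatial weights feed into a global spatial autocorrelation statistic, the following Python sketch builds rook contiguity neighbors for a regular grid and computes Moran's I with row-standardized weights. It is not taken from GeoDa: the function names rook_neighbors and morans_i are hypothetical, and the sketch assumes that every observation has at least one neighbor and that the variable is not constant.

```python
# Illustrative sketch only: rook contiguity weights and Moran's I on a
# regular grid. The function names are hypothetical, not part of GeoDa.

def rook_neighbors(n_rows, n_cols):
    """Map each cell index to the cells that share an edge with it."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cell = r * n_cols + c
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    adjacent.append(rr * n_cols + cc)
            neighbors[cell] = adjacent
    return neighbors


def morans_i(values, neighbors):
    """Moran's I using row-standardized weights.

    Assumes every observation has at least one neighbor and that
    the values are not all identical.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    # With row-standardized weights the sum of all weights equals n,
    # so the statistic reduces to sum(z_i * lag_i) / sum(z_i ** 2).
    cross = 0.0
    for i, adj in neighbors.items():
        lag = sum(z[j] for j in adj) / len(adj)  # spatial lag of z at i
        cross += z[i] * lag
    return cross / sum(zi * zi for zi in z)


if __name__ == "__main__":
    w = rook_neighbors(4, 4)
    gradient = [c for r in range(4) for c in range(4)]
    checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]
    print(morans_i(gradient, w))      # positive: similar values are adjacent
    print(morans_i(checkerboard, w))  # negative: neighbors always differ
```

The numerator pairs each observation with the average of its neighbors (the spatial lag), which is the same relationship that the Moran scatter plot displays: the statistic corresponds to the slope of the regression of the spatial lag on the variable itself.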

Part V includes an in-depth treatment of local spatial autocorrelation, spread over five chapters. It starts with the introduction of the concept of a LISA and the Local Moran statistic. A second chapter deals with other local spatial autocorrelation statistics, such as the Local Geary and the Getis-Ord statistics. The next two chapters outline extensions to the multivariate domain and to discrete variables. These chapters contain material that was only fairly recently developed. The last chapter of Part V reviews density-based clustering methods applied to point locations, such as DBSCAN and HDBSCAN.

The Epilogue offers some thoughts on the limits of the exploratory perspective. This includes an assessment of the role of data exploration in aiding scientific discovery and scientific reasoning, the limits of spatial analysis, and reproducibility in the exploratory framework as implemented in the GeoDa software.

An Appendix includes detailed preference settings for the software and an outline of the complete menu structure. To close, a brief discussion is offered of the new scripting possibilities through the geodalib library.

The division of the material into two volumes follows my own teaching practice. The first volume corresponds to what I cover in an Introduction to Spatial Data Science course, whereas the second volume matches the content of a Spatial Cluster Analysis course. The volumes are also designed to constitute a self-study guide. In fact, a previous version was used as such for remote teaching during the COVID-19 pandemic (in the form of laboratory workbooks, available at https://geodacenter.github.io/documentation.html).

In addition to the material covered in the two volumes, the GeoDaCenter GitHub site (https://geodacenter.github.io) contains an extensive support infrastructure. This includes detailed documentation and illustrations, as well as a large collection of sample data sets, cookbook examples and links to a YouTube channel containing lectures and tutorials. Specific software support is provided by means of a list of frequently asked questions and answers to common technical questions, as well as by the community through the Google Groups Openspace list.