An Introduction to Spatial Data Science with GeoDa
Volume 1: Exploring Spatial Data
This two-volume set is the long overdue successor to the GeoDa Workbook that I wrote almost twenty years ago (Anselin 2005a). It was intended to facilitate instruction in spatial analysis and spatial regression by means of the GeoDa software (Anselin, Syabri, and Kho 2006). In spite of its age, the workbook is still widely used and much cited, but it is due for a major update.
The update is twofold. First, many new methods have been developed and original measures refined. This pertains not only to the spatial autocorrelation indices covered in the original Workbook, but also to a collection of newer methods that have come to define spatial data science. Second, the GeoDa software has seen substantial changes, becoming an open source and cross-platform ecosystem that encompasses a much wider range of methods than its legacy predecessor.
The two volumes outline my vision for an Introduction to Spatial Data Science. They include a collection of methods that I view as the core of what is special about spatial data science, as distinct from applying data science to spatial data. They are not intended to be a comprehensive overview, but constitute my personal selection of materials that I see as central to promoting spatial thinking through teaching spatial data science.
The level of the current volume is introductory, aimed at my typical audience, which is largely composed of researchers and students (both undergraduate and graduate) who have not been exposed to geographic or spatial concepts, or who have only limited familiarity with the subject. So, by design, some of the treatment is rudimentary, covering basic concepts in GIS and spatial data manipulation, as well as elementary statistical graphs. I have included this material to keep the books accessible to a larger audience. Readers already familiar with these topics can easily skip to the core techniques.
I believe the two volumes offer a unique perspective, in that they approach the identification of spatial patterns from a number of different standpoints. The first volume includes an in-depth treatment of local indicators of spatial association, whereas Volume 2 focuses on spatial clustering techniques. A main objective is to indicate where a spatial perspective contributes to the broader field of data science and what is unique about it. In addition, the aim is to create an intuition for the type of method that should be applied in different empirical situations. In that sense, the volumes serve both as the complete user guide to the GeoDa software and as a primer on spatial data science. However, in contrast with the original Workbook, spatial regression methods are not included. Those are covered in Anselin and Rey (2014) and are not discussed here.
Most methods contained in the two volumes are treated in more technical detail in the various references provided. With respect to my own work, these include Anselin (1994, 1995, 1996, 1998, 1999, 2005b), Anselin, Syabri, and Smirnov (2002), Anselin, Kim, and Syabri (2004), Anselin, Syabri, and Kho (2006), and, more recently, Anselin (2019a, 2019b, 2020), Anselin and Li (2019, 2020), and Anselin, Li, and Koschinsky (2022). However, a few methods are new and have not been reported elsewhere, or are discussed here in greater depth than in previous publications. In this volume, these include the co-location map and the local neighbor match test.
The methods are illustrated with a completely new collection of seven sample data sets that deal with topics ranging from crime, socio-economic determinants of health, and disease spread, to poverty, food insecurity, and bank performance. The data pertain not only to the U.S. (Chicago), but also include municipalities in Brazil (the State of Ceará) and in Mexico (the State of Oaxaca), and community banks in Italy. Many of these data sets were used in previous empirical analyses. They are included as built-in Sample Data in the latest version of the GeoDa software.
The empirical illustrations are based on Version 1.22 of the software, available in Summer 2023. Later versions may include slight changes as well as additional features, but the treatment provided here should remain valid. The software is free, cross-platform and open source, and can be downloaded from https://geodacenter.github.io/download.html.