1.3 Sample Data Sets

As mentioned in the Preface, the methods and software are illustrated by means of empirical examples that use seven new sample data sets. They are available directly from inside the GeoDa software through the Sample Data tab of the input/output interface (see Figure 2.2).

The specific data sets are:

Chicago Carjackings (n=1,412)
- point locations of carjackings in 2020 (Chicago Open Data Portal)
- see Chapters 2 and 3
Ceará Zika, municipalities in the State of Ceará, Brazil (n=184)
- Zika and microcephaly infections and socio-economic profiles for 2013-2016 (adapted from Amaral et al. 2019)
- see Chapters 4-6, 10, and Part III of Volume 2 (Spatial Clustering)
Oaxaca Development, municipalities in the State of Oaxaca, Mexico (n=570)
- poverty and food insecurity indicators and census variables for 2010 and 2020 (CONEVAL and INEGI) (based on the same original sources as Farah Rivadeneyra 2017)
- see Chapters 7-9, 12, 14, and 16-17
Italy Community Banks (n=261)
- bank performance indicators for 2011-17 (used by Algeri et al. 2022)
- see Chapters 11-12, 15, and 20, as well as Part I of Volume 2 (Dimension Reduction)
Chicago Community Areas, CCA Profiles (n=77)
- socio-economic snapshot for Chicago Community Areas in 2020 (American Community Survey from the Chicago Metropolitan Agency for Planning - CMAP - data portal)
- see Chapter 13 and Chapter 5 of Volume 2 (Hierarchical Clustering Methods)
Chicago SDOH, Census Tracts (n=791)
- socio-economic determinants of health in 2014 (a subset of the data used in Kolak et al. 2020)
- see Chapters 18-19, and Chapters 6 and 7 of Volume 2 (Partitioning Clustering Methods and Advanced Clustering Methods)
Spirals (n=300)
- canonical data set to test spectral clustering
- only used in Volume 2 (Chapter 8, Spectral Clustering)

In addition, a few auxiliary files are used to illustrate basic data handling operations in Chapters 2 and 3, such as a boundary layer for Chicago community areas and input data files in comma-separated text format. These files are available from the GeoDaCenter sample data site at https://geodacenter.github.io/data-and-lab/.

Further details are provided in the context of specific methods.