1.3 Sample Data Sets
As mentioned in the Preface, the methods and software are illustrated by means of empirical examples that use seven new sample data sets. They are available directly from inside the GeoDa software through the Sample Data tab of the input/output interface (see Figure 2.2).
The specific data sets are:
- Chicago Carjackings (n=1,412)
- Ceará Zika, municipalities in the State of Ceará, Brazil (n=184)
  - Zika infections, microcephaly cases, and socio-economic profiles for 2013-2016 (adapted from Amaral et al. 2019)
  - see Chapters 4-6, 10, and Part III of Volume 2 (Spatial Clustering)
- Oaxaca Development, municipalities in the State of Oaxaca, Mexico (n=570)
- Italy Community Banks (n=261)
  - bank performance indicators for 2011-2017 (used by Algeri et al. 2022)
  - see Chapters 11-12, 15, and 20, as well as Part I of Volume 2 (Dimension Reduction)
- Chicago Community Areas, CCA Profiles (n=77)
  - socio-economic snapshot for Chicago Community Areas in 2020 (American Community Survey data from the Chicago Metropolitan Agency for Planning (CMAP) data portal)
  - see Chapter 13 and Chapter 5 of Volume 2 (Hierarchical Clustering Methods)
- Chicago SDOH, census tracts (n=791)
  - socio-economic determinants of health in 2014 (a subset of the data used in Kolak et al. 2020)
  - see Chapters 18-19 and Chapters 6 and 7 of Volume 2 (Partitioning Clustering Methods and Advanced Clustering Methods)
- Spirals (n=300)
  - a canonical data set for testing spectral clustering
  - only used in Volume 2 (Chapter 8, Spectral Clustering)
In addition, a few auxiliary files are used to illustrate basic data handling operations in Chapters 2 and 3, such as a boundary layer for the Chicago community areas and input data files in comma-separated text format. These files are available from the GeoDaCenter sample data site at https://geodacenter.github.io/data-and-lab/.
Further details are provided in the context of specific methods.