Chapter 2 Basic Data Operations

In this and the following chapter, I introduce the topic of data wrangling, i.e., the process of getting data from its raw input into a form that is amenable to analysis. This is often considered to be the most time-consuming part of a data science project, taking as much as 80% of the effort (Dasu and Johnson 2003). Even though the focus in this book is on analysis and not on data manipulation per se, I provide a quick overview of the functionality contained in GeoDa to assist with these operations. Increasingly, data wrangling has evolved into a field of its own, with a growing number of operations turning into automatic procedures embedded into software (Rattenbury et al. 2017). A detailed discussion of this topic is beyond the scope of the book.

The coverage in this chapter is aimed at novices who are not very familiar with spatial data manipulations. Most of the features illustrated can be readily accomplished by means of dedicated GIS software or by exploiting the spatial data functionality available in the R and Python worlds. Readers knowledgeable in such operations may want to just skim the materials in order to become familiar with the way they are implemented in GeoDa. Alternatively, these operations can be performed outside GeoDa, with the end result loaded as a spatial data layer.

In the current chapter, I focus on essential input operations and data manipulations contained in the Table functionality. In the next chapter, I consider a range of basic GIS operations pertaining to spatial data wrangling.

To illustrate these features, I will use a data set with point locations of carjackings in Chicago in 2020. The Chicago Carjackings data layer is available from the Sample Data tab in the GeoDa file dialog (Figure 2.2).

In addition, replicating the detailed steps used in the illustrations requires three original input files. These are available from the GeoDaCenter sample data site. They include a simple outline of the community areas, Chicago_community_areas.shp, as well as comma-delimited (csv) text files with the socio-economic characteristics (Chicago_CCA_Profiles.csv) and the coordinates of the carjackings (Chicago_2020_carjackings.csv). The sample data site also contains the detailed listing of the variable names.
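
For readers who prefer to inspect such a comma-delimited point file outside GeoDa, the following is a minimal Python sketch, not a description of how GeoDa itself reads the data. The function name read_points, the column names Longitude and Latitude, and the sample rows are assumptions made up for illustration; the actual variable names are given in the listing on the sample data site.

```python
import csv
import io


def read_points(handle, x_field="Longitude", y_field="Latitude"):
    """Return (x, y) tuples from a csv file handle.

    The default field names are hypothetical placeholders. Rows with a
    missing or non-numeric coordinate are skipped.
    """
    points = []
    for row in csv.DictReader(handle):
        try:
            points.append((float(row[x_field]), float(row[y_field])))
        except (KeyError, ValueError, TypeError):
            # Field absent, value empty or non-numeric, or row too short.
            continue
    return points


# Made-up rows standing in for a file such as Chicago_2020_carjackings.csv.
sample = io.StringIO(
    "ID,Longitude,Latitude\n"
    "1,-87.65,41.85\n"
    "2,,41.90\n"
    "3,-87.70,41.78\n"
)
points = read_points(sample)
print(len(points))
```

With a real file, the same function would be passed a handle opened with open(..., newline=""), after substituting the actual coordinate column names.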