Appendix C – Scripting with GeoDa via the libgeoda Library

The architecture of the GeoDa software is based on a very tight integration of the graphical user interface and the actual computations. While effective in a traditional desktop environment, this approach becomes less efficient when moving the functionality to a different computing platform, such as a browser in a web-GIS, or a cyberGIS environment (Anselin, Kim, and Syabri 2004; Wang et al. 2013).

An alternative to the GUI-driven approach to spatial data science taken in desktop GeoDa is to leverage open source software development environments, such as R (Pebesma and Bivand 2023) or Python (Rey, Arribas-Bel, and Wolf 2023). Such environments facilitate scripting and stress reproducibility, which has become increasingly important in spatial data science (see the discussion in Chapter 21).

The GeoDa desktop environment does not lend itself to scripting. It is also ill-suited for repeated execution of the same application, such as in a simulation experiment. Finally, apart from the limited record in the project file, there is no explicit way to ensure reproducibility.

In light of these limitations of the original design, a major refactoring effort was undertaken to separate the user interaction in the software from the core computational functionality, and to collect the latter in a library named libgeoda. The library contains the same C++ code that underlies the computations in desktop GeoDa, but covers a more limited range of functionality. The focus has been on methods that are (still) unique to GeoDa, such as some of the recent LISA statistics. In addition, applications are included where the reliance on C++ yields large performance improvements in terms of speed and scalability. Examples include weights creation and permutation tests from this Volume, as well as regionalization methods covered in Volume 2.
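To give a sense of why permutation tests benefit from compiled code, the following is a minimal pure-Python sketch of a conditional permutation test for a local Moran statistic with row-standardized weights. It is not the libgeoda implementation: the function name `local_moran_permutation`, its parameters, the neighbor-list input format, and the folded pseudo p-value convention are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def local_moran_permutation(values, neighbors, permutations=99, seed=12345):
    """Toy conditional permutation test for a local Moran statistic.

    values    : list of observations, one per location
    neighbors : list of lists; neighbors[i] holds the indices of i's neighbors
    Returns a list of (statistic, pseudo_p_value) tuples, one per location.
    All names here are hypothetical and chosen for illustration only.
    """
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("values must not all be identical")
    z = [(v - mu) / sd for v in values]  # standardize the variable
    rng = random.Random(seed)            # fixed seed for replicability

    results = []
    for i in range(len(z)):
        k = len(neighbors[i])
        if k == 0:
            results.append((0.0, 1.0))   # isolate: no neighbors, no test
            continue
        # Observed statistic: z_i times the average of its neighbors' z-values.
        observed = z[i] * sum(z[j] for j in neighbors[i]) / k
        # Conditional permutation: hold z_i fixed, draw k values from the rest.
        others = z[:i] + z[i + 1:]
        at_least = 0
        for _ in range(permutations):
            draw = rng.sample(others, k)
            if z[i] * sum(draw) / k >= observed:
                at_least += 1
        # Fold the count so that either tail counts as extreme (an assumption).
        extreme = min(at_least, permutations - at_least)
        results.append((observed, (extreme + 1) / (permutations + 1)))
    return results


# Six locations along a line, each neighboring the adjacent location(s).
toy_values = [1, 2, 3, 10, 11, 12]
toy_neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
toy_results = local_moran_permutation(toy_values, toy_neighbors)
```

The work grows with the number of locations times the number of permutations, and each step is a small random draw and a sum. An interpreted loop of this kind becomes slow for large data sets, which is consistent with the motivation for relying on C++ stated above.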

The libgeoda library has a clearly defined Application Programming Interface (API), which allows other C++ code to access its functionality directly. In fact, this is what currently happens under the hood for part of desktop GeoDa, and in the experimental web-GeoDa (jsgeoda, implemented in JavaScript). In addition to achieving a more flexible interaction with different graphical user interface implementations, the API also allows other software, such as R or Python programs, to access the functionality through well-defined wrapper code. The overall architecture is illustrated in Figure C.1.

Figure C.1: libgeoda Architecture

The primary focus in this effort so far has been to create an R package, rgeoda, and a Python module, pygeoda. These provide easy access to the functionality in libgeoda through a native interface and designated middleware. The interaction between R or Python and the C++ library is implemented under the hood, so that from a user’s perspective, everything works natively as in any other R package or Python module.

As shown in Figure C.1, the core of the libgeoda library consists of three broad categories of functionality: spatial weights, LISA, and spatial clustering (regionalization). In addition, there are a number of helper functions, such as support for different map classifications (to facilitate visualization) and variable standardization (for use in the cluster routines). The functionality in pygeoda and rgeoda is the same, with only minor differences to reflect the particular characteristics of each software environment. For example, in Python, methods are attributes of a class (e.g., a spatial weights class) and invoked as such. In contrast, in R, the typical approach is to apply a function to an object to extract the relevant information (e.g., spatial weights characteristics).
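The difference between the two calling styles can be sketched in plain Python. The example below does not use pygeoda or rgeoda. `ToyWeights`, `rook_grid_neighbors`, and the free function `mean_neighbors` are hypothetical names introduced only to contrast invoking a method on a weights object with applying a function to that object.

```python
from statistics import mean


def rook_grid_neighbors(nrows, ncols):
    """Rook contiguity for a regular grid, with cells numbered row by row."""
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            cell = r * ncols + c
            nbrs = []
            if r > 0:
                nbrs.append(cell - ncols)   # cell above
            if r < nrows - 1:
                nbrs.append(cell + ncols)   # cell below
            if c > 0:
                nbrs.append(cell - 1)       # cell to the left
            if c < ncols - 1:
                nbrs.append(cell + 1)       # cell to the right
            neighbors[cell] = nbrs
    return neighbors


class ToyWeights:
    """Hypothetical stand-in for a spatial weights class (not the pygeoda one)."""

    def __init__(self, neighbors):
        self.neighbors = neighbors

    def num_obs(self):
        return len(self.neighbors)

    def mean_neighbors(self):
        return mean(len(nbrs) for nbrs in self.neighbors.values())

    def get_neighbors(self, i):
        return list(self.neighbors[i])


def mean_neighbors(w):
    """Function style: the weights object is passed in as an argument."""
    return mean(len(nbrs) for nbrs in w.neighbors.values())


w = ToyWeights(rook_grid_neighbors(3, 3))
method_result = w.mean_neighbors()    # Python style: method invoked on the object
function_result = mean_neighbors(w)   # R style: function applied to the object
```

Both calls return the same summary of the weights. Only the way the functionality is reached differs, which is the kind of minor difference between the two environments described above.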

Extensive details and specific examples can be found in Anselin, Li, and Koschinsky (2022) and in the documentation on the GitHub site. Since the functionality of libgeoda mimics what has been covered in the book, it is not further considered here. The software development effort is ongoing.