## 8.4 Conditional Plots

Conditional plots, also known as small multiples (Tufte 1983), Trellis graphs (Becker, Cleveland, and Shyu 1996), or facet graphs (Wickham 2016), provide a means to assess interactions among up to four variables. Multiple graphs are constructed for different subsets of the observations, obtained as a result of *conditioning* on the value of one or two variables,
different from the variable(s) of interest. The graphs can be any kind of statistical graph, such as a histogram, box plot, or scatter plot. The same
principle can be applied to choropleth maps, resulting in so-called *micromaps*
(Carr and Pickle 2010).

With one conditioning variable, the observations are grouped into subsets according to the values taken by the conditioning variable, organized along the x-axis from low to high, e.g., below or above the median. For each of these subsets, a separate graph or map is drawn for the variable(s) of interest. The same principle is applied when there are two conditioning variables, resulting in a matrix of graphs, each corresponding to a subset of the observations that fall in the specified intervals for the conditioning variables on the x and y-axis. Of course, for this to be meaningful, one has to make sure that each of the subsets contains a sufficient number of observations.

The point of departure in the conditional plots is that the subsetting of observations should have no impact on the statistic in question. In other words, all the graphs or maps should look more or less the same, irrespective of the subset. Systematic variation across subsets indicates the presence of heterogeneity, either in the form of structural breaks (discrete categories), or suggesting an interaction effect between the conditioning variable(s) and the statistic under scrutiny.

In `GeoDa`

, conditional graphs are implemented for the histogram, box plot, scatter plot and thematic map.

### 8.4.1 Implementation

A conditional plot is invoked from the menu by selecting **Explore > Conditional Plot**.
This brings up a list of four options: **Map**, **Histogram**, **Scatter Plot**, and
**Box Plot**. The same list of four options is also created after selecting the
**Conditional Plot** icon, the right-most in the multivariate EDA subset in the
toolbar shown in Figure 8.1. In addition, the conditional map can
also be started as the third item from the bottom in the **Map** menu.

Next follows a dialog to select the conditioning variables and the variable(s) of interest.
The conditioning variables are referred to as **Horizontal Cells** for the x-axis, and
**Vertical Cells** for the y-axis. They do not need to be chosen both. Conditioning can also be
carried out for a single conditioning variable, on either the horizontal or on the vertical axis alone.

The remaining columns in the dialog are either for
a single variable (histogram, box plot, map), or for both the **Independent Var (x-axis)**
and the **Dependent Var (y-axis)** in the scatter plot option.

#### 8.4.1.1 Conditional plot options

Two important options, special to the conditional plot, are **Vertical Bins Breaks** and
**Horizontal Bins Breaks**. These options provide a way to select the observation subsets for
each conditioning variable. They include all the same options
used in map classification (see Figure 4.2 in
Chapter 4). Of special interest are the **Unique Values** option
and the **Custom Breaks** options. **Unique Values** is particularly appropriate when the
conditioning variable takes on discrete values, in which case other classifications
may result in meaningless subsets (see the discussion of Figure 8.15).
**Custom Breaks** is useful when subclasses for the conditioning variable were
determined previously, preferably contained in a project file (see Section 4.6.3).

Each graph also has its usual range of options available, with the exception
of **Regimes Regression** for the conditional scatter plot. Selection of observations
is implemented, but it does not result in a re-computation of the linear regression fit.
On the other hand, the conditional scatter plot does include the **LOWESS Smoother** option.

### 8.4.2 Conditional statistical graphs

To illustrate these concepts, first, a set of conditional scatter plots is created for
the relationship between **peduc_20** (x-axis) and **pfood_20** (y-axis). This is conditioned
by three subcategories (the default number) for **pch12** in the horizontal dimension
(an indicator variable for whether the population increased, =1, or decreased, =0, between
2010 and 2020). The conditioning variable in the vertical dimension is **ALTID**. The
default classification is to use **Quantile** with three categories for both conditioning
variables.

This yields the plot shown in Figure 8.15. Something clearly is wrong, since there turns out to be only one subcategory for the horizontal conditioning variable. This is because a 3-quantile classification of the indicator variable does not provide meaningful categories. This will be dealt with next.

However, first consider the substantive interpretation of this graph. The three plots show a strong positive and significant relationship in each case (significance indicated by **). In other words less access to education is linearly related to less food security, more or less in line with our prior expectations. The lack of difference between the graphs would suggest that the relationship is stable across all three ranges for altitude, the conditioning variable.

Figure 8.16 results after changing the **Horizontal Bins Breaks** option
to **Unique Values**. At this point, the two categories for the horizontal conditioning
variable correspond to the values of 0 and 1 for the population change indicator
variable.

Conditioning on this variable does provide some indication of *interaction*. For
**pch12 = 1**, the linear relationship between **peduc_20** and **pfood_20** is strong
and positive, and not affected by altitude. However, for **pch12 = 0**, there does not
seem to be a significant slope for any of the subgraphs, suggesting a lack of relationship
between the two variables in those municipalities suffering from population loss. The
change in sign of the slope along altitude is not meaningful since the slopes are
not significant.

If only the horizontal classification had been used as a condition, the result
would be the same as a **Regimes Regression** in a standard scatter plot. However,
in order to consider the simultaneous conditioning on two variable,six separate selection sets would be required to create six individual scatter plots with regimes, which is not very practical. In addition, the double conditioning allows for the investigation of
more complex interactions among the variables.

An alternative as well as generalization of the consideration of spatial heterogeneity
by means of the **Averages Chart** (see Section 7.5.1) is the
application of a **Conditional Box Plot** with a conditioning variable that
corresponds to *spatial subsets* of the data. In
Figure 8.17, this is illustrated for **pheal_20** (percent population
that lacks access to health care), using the **Region** classification as the
conditioning variable along the horizontal axis. In the graph, **Valles Centrales** (8)
shows a median value that is higher than the other regions, with **Canada** (1) having
the best median health outcome (lowest value for lack of access). While the conditional
box plot is not an alternative to a formal difference in means (or medians) test,
it provides a visual overview of the extent to which heterogeneity may be present.

### 8.4.3 Conditional Maps

A final example is the conditional map or micromap matrix shown in
Figure 8.18. Four box maps are included for **ppov_20**, conditioned
by **peduc_20** on the horizontal axis and **pheal_20** on the vertical axis. The graph
is obtained after changing the bin breaks to two quantiles, i.e., below and above
the median for each of the conditioning variable.

As before, the interest lies in the extent to which the maps represent similar
spatial patterns in each of the subcategories. The maps suggest an interaction with
**peduc**, where more of the brown values (more poverty) are found for above median
values for lack of education, and more blue values (less poverty) in the lower median
(better education). A potential interaction with health access is less pronounced.

As in any EDA exercise, considerable experimentation may be needed before any meaningful patterns are found for the right categories of conditioning variables. Of course, this runs the danger than one ends up finding what one wants to find, an issue which revisited in Chapter 21.