4.4 Implementation
The t-SNE functionality is invoked from the drop-down list created by the toolbar Clusters icon (Figure 4.1), as the third item in the dimension reduction category. Alternatively, from the main menu, Clusters > t-SNE starts the process.
To illustrate these methods, the same ten bank indicators are used as in the previous chapter (for a full list, see Section 3.2.2).
The main dialog is the t-SNE Settings panel, shown in Figure 4.2. The left-hand side contains the list of available variables, as well as several parameters that can be specified to fine-tune the algorithm. Both Euclidean and Manhattan distance metrics are available for the Distance Function (used to compute the probability functions), with Euclidean as the default. As before, Standardize (Z) is the default Transformation.
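Although GeoDa handles these settings internally, the effect of the Transformation and Distance Function choices can be mimicked outside the program. The following minimal sketch in Python standardizes a data matrix and computes both distance metrics; the matrix contents and dimensions are purely illustrative placeholders.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical data matrix: n observations on 10 indicators (illustrative sizes)
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))

# Standardize (Z): subtract the column mean, divide by the standard deviation
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Distance Function: Euclidean (the default) or Manhattan ("cityblock" in scipy)
d_euclidean = squareform(pdist(Z, metric="euclidean"))
d_manhattan = squareform(pdist(Z, metric="cityblock"))
```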
On the right-hand side of the interface are two major panels. The top consists of a space that is used to track the changes in the two-dimensional scatter plot as the iterations proceed. The bottom panel gives further quantitative information on the evolution of the quality of the result, reported every 50 iterations. This lists a value for the error, i.e., the cost function evaluated at that stage. The table provides an indication of how the process is converging to a stable state.
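For reference, the cost function minimized by t-SNE is the Kullback-Leibler divergence between the neighbor probabilities \(p_{ij}\) in the original high-dimensional space and their counterparts \(q_{ij}\) in the embedded space:
\[
C = KL(P \| Q) = \sum_i \sum_{j \neq i} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\]
so that a lower error corresponds to a closer match between the two neighbor structures.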
In between the two result panes on the right is a small control panel to adjust the speed of the animation, as well as to pause and resume the process as the iterations proceed (see Section 4.4.1).
With all the parameters left at their default settings, the Run button populates the panels on the right-hand side. Figure 4.3 shows the result after the default 5000 iterations, with a final cost of 0.714386. The top panel contains the familiar scatter plot of the t-SNE coordinates in two dimensions. The bottom pane (only partly shown) lists the final cost as well as the error for every 50 iterations. It shows that the error has been hovering around a similar value for (at least) the last 500 iterations.
In the implementation in GeoDa, the final cost value listed (0.714386) does not exactly equal the error at the last iteration (0.713598). This is due to the use of the Barnes-Hut approximation (Section 4.3.2.2).19
With the final result in hand, selecting the Save button brings up the familiar dialog to specify the variable names to be saved to the data table (defaults are V1 and V2, but here set to V1tsne and V2tsne). Once these coordinate variables are specified, the final scatter plot is generated in a separate window, as illustrated in Figure 4.4.
At the bottom of the graph is a listing of the variables used in the analysis, as well as the final cost (0.714), the rank correlation (0.640), and the number of iterations (the default 5000). Since the t-SNE optimization algorithm is not based on a specific stopping criterion, the number of iterations will always correspond to what is specified in the options panel, unless the process is paused (see Section 4.4.1). However, this does not mean that a stable optimum has necessarily been reached. In practice, one therefore needs to pay close attention to the point pattern as the iterations proceed.
The final cost is not comparable to the objective function for MDS, since it is not based on a stress function. On the other hand, the rank correlation is a rough global measure of the correspondence between the relative distances and remains a valid measure. However, it should be kept in mind that t-SNE is geared to optimizing local distances, so a global measure is not necessarily the best gauge of performance.
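The precise computation is not spelled out here; a plausible reading, used in the sketch below, is the Spearman rank correlation between the pairwise distances in the original (standardized) variable space and those between the saved t-SNE coordinates. The array contents are placeholders.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 10))   # placeholder for the standardized indicators
Y = rng.normal(size=(100, 2))    # placeholder for the saved t-SNE coordinates

# Spearman rank correlation between the two vectors of pairwise distances
rho, _ = spearmanr(pdist(Z), pdist(Y))
print(f"rank correlation: {rho:.3f}")
```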
4.4.0.1 Inspecting the iterations
The bottom panel allows for an inspection of the error at each reported iteration. A more visual representation of the changing point configuration is obtained by manipulating the Iteration button in the top panel. As the button is moved to the left, the graph shows how the configuration evolved as the optimization proceeded.
This is particularly illuminating in the beginning stages of the algorithm, where substantial jumps can be observed. For example, Figure 4.5 shows the configuration for two early results, one at iteration 250 (error 54.7166) and one shortly afterwards, at iteration 300 (error 0.852884). This is the point in the solution process where the initial, highly chaotic solutions (jumping from one side of the graph to the other) begin to stabilize into a more circular cloud shape.
This process continues until a certain degree of stability is obtained, after which the changes become marginal. For example, Figure 4.6 shows the results at 2500 iterations (error 0.71491) and at 4000 iterations (error 0.715028), which portray a very similar pattern with almost identical error measures, nearly impossible to distinguish from the final solution in Figure 4.3.
4.4.1 Animation
As mentioned, with the default settings, the optimization runs its full course for the number of iterations specified in the options. The Speed button below the graph on the right-hand side allows the process to be slowed down for visualization purposes, by moving the button to the left. This provides a very detailed view of the way the optimization proceeds, gets trapped in local optima and then manages to move out of them.
In addition, the pause button (||) temporarily halts the process and the resume button (>) continues the optimization. The iteration count is given below the graph, and the progression of the cost minimization can be followed in the bottom panel.
In this implementation, the pause button is different from the Stop button that is situated below the options in the left-hand panel. The latter ends the iterations and freezes the result, without the option to resume. However, once frozen, one can still move back through the previous sequence of iterations by moving the Iteration button to the left.
4.4.2 Tuning the Optimization
The number of options to tune the optimization algorithm may seem a bit overwhelming. However, in many instances, the default settings will be fine, although some experimenting is always a good idea.
To illustrate the effect of some of the options, changes in three central parameters are illustrated in turn: Theta, Perplexity, and the Iteration Switch for the Momentum.
4.4.2.1 Theta
The Barnes-Hut algorithm implemented for t-SNE sets the Theta (\(\theta\)) parameter to 0.5 by default. This is the criterion that determines the extent of simplification for the \(q_{ij}\) measures in the gradient equation. The parameter is primarily intended to allow the algorithm to scale up to large data sets.
In smaller data sets, such as the example considered here, Theta may be set to zero, which can speed up the solution since the actual distances are used rather than a proxy. The result is shown in Figure 4.7. The final cost is 0.737581, with a rank correlation of 0.635, both slightly worse than for the default settings. In this case, convergence is reached after fewer than 1000 iterations, somewhat faster than in the default case. Also, in this instance, there is no difference between the error at the last iteration and the final cost, since no Barnes-Hut simplification is carried out.
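GeoDa relies on its own implementation, but the same contrast can be explored with scikit-learn's TSNE as a stand-in, where Theta is called angle and the exact computation (Theta of zero) is exposed as a separate method. The data matrix below is a placeholder.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 10))   # placeholder for the standardized indicators

# Barnes-Hut gradient with theta = 0.5 (scikit-learn calls the parameter "angle")
bh = TSNE(n_components=2, angle=0.5, method="barnes_hut", random_state=0)

# Theta = 0: the exact gradient, a separate method rather than angle=0.0
exact = TSNE(n_components=2, method="exact", random_state=0)

Y_bh = bh.fit_transform(Z)
Y_exact = exact.fit_transform(Z)
```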
4.4.2.2 Perplexity
The second parameter to adjust is the Perplexity (with Theta set back to its default value). This is a rough proxy for the number of neighbors and is set to 30 by default. The results for a value of 50 (the highest suggested by van der Maaten and Hinton 2008) are visualized in Figure 4.8. Compared to the previous results, the cost is much lower, at 0.540604, and the rank correlation is higher, at 0.689. Convergence is reached after about 1600 iterations.
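Formally, following van der Maaten and Hinton (2008), the perplexity is defined through the Shannon entropy of the conditional distribution \(P_i\) over the neighbors of observation \(i\):
\[
\mbox{Perp}(P_i) = 2^{H(P_i)}, \quad H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i},
\]
so that the specified value can be interpreted as a smooth measure of the effective number of neighbors, with the bandwidth of each Gaussian kernel calibrated to attain it.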
4.4.2.3 Iteration momentum switch
The final adjustment is to the Iteration Switch for the Momentum parameter, which determines the point in the iterations at which the Momentum moves from 0.5 to 0.8. The default value is 250. With Perplexity left at 50, changing the Iteration Switch to 100 yields less chaotic solutions much sooner than in the previous cases; the switch happens between 100 and 150 iterations. The end result, shown in Figure 4.9, is the best achieved so far: the final cost is 0.525096, with a rank correlation of 0.700.
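The role of the switch is easiest to see in the gradient descent update itself. The following toy sketch applies the standard t-SNE momentum schedule to a placeholder gradient; in the actual algorithm, the gradient of the cost function would take the place of dY, and the learning rate is illustrative.

```python
import numpy as np

def momentum(t, switch_iter=250, initial=0.5, final=0.8):
    """Momentum schedule: move from `initial` to `final` at `switch_iter`."""
    return initial if t < switch_iter else final

rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 2))    # embedded coordinates
uY = np.zeros_like(Y)            # accumulated update ("velocity")
eta = 1.0                        # learning rate (illustrative value)

for t in range(1000):
    dY = 0.1 * Y                 # placeholder for the t-SNE gradient dC/dY
    uY = momentum(t, switch_iter=100) * uY - eta * dY
    Y = Y + uY
```

Switching to the higher momentum earlier means that more of each previous update is carried forward sooner, which is consistent with the faster stabilization observed here.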
Further experimentation with the various parameters can provide deep insight into the dynamics of this complex algorithm, although the specifics will vary from case to case. Also, as argued in van der Maaten (2014), the default settings are an excellent starting point.
4.4.3 Interpretation and Spatialization
All the visualization options reviewed for classic MDS in Chapter 3 can be applied to the t-SNE solutions, in the same way as described in Sections 3.4 and 3.5. This includes the incorporation of a categorical variable that is visualized in the coordinate plot (see Section 3.4.2). In addition, for t-SNE, those categories are also shown during the iterations and animations in the top right panel of Figure 4.2. Otherwise, everything operates as discussed previously.
One aspect that remains to be investigated is the extent to which the t-SNE approach yields solutions that match the geographic neighbor structure. This is reflected in a nearest neighbor match test based on the t-SNE nearest neighbors.
4.4.3.1 Nearest neighbor match test
The nearest neighbor match test is illustrated for the coordinates of the latest t-SNE results, V1mom and V2mom. The k-nearest neighbors with k=6 are matched to the geographical k-nearest neighbors. The corresponding unique values cardinality map is depicted in Figure 4.10. Compared to the MDS results in Figure 3.17 and the standard results in Figure 3.18, the largest number of common neighbors is four, obtained for one observation (the Cassa Raiffeisen in Latsch in the Trentino-Alto Adige Region, observation 67). The associated probability is 0.000001, suggesting very strong evidence of a multivariate local cluster. In addition, there are 6 observations with three common neighbors (p=0.000133), and 23 with two common neighbors (p=0.006275). As shown in Section 3.5.3, the result with one common neighbor cannot be deemed to be significant (p=0.125506).
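The counts of common neighbors underlying this map can be replicated along the lines of the sketch below, where both coordinate arrays are placeholders; in the actual analysis, they would hold the t-SNE coordinates (V1mom, V2mom) and the geographic locations.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_indices(coords, k):
    """Indices of the k nearest neighbors of each observation (itself excluded)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(coords)
    _, idx = nn.kneighbors(coords)
    return idx[:, 1:]  # drop the first column, i.e., the observation itself

rng = np.random.default_rng(0)
tsne_xy = rng.normal(size=(100, 2))  # placeholder for V1mom, V2mom
geo_xy = rng.normal(size=(100, 2))   # placeholder for geographic coordinates

k = 6
knn_tsne = knn_indices(tsne_xy, k)
knn_geo = knn_indices(geo_xy, k)

# Number of neighbors each observation has in common between the two spaces
common = np.array([len(set(a) & set(b)) for a, b in zip(knn_tsne, knn_geo)])
print(np.bincount(common, minlength=k + 1))  # counts for 0, 1, ..., k matches
```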
The cluster analysis based on the t-SNE computations yields the most significant clusters in the current example. This is somewhat to be expected, since the method stresses local similarities between the higher-dimensional space and the lower embedded dimension. Many of the identified locations match the standard cluster results, but several are in different locations. The trade-offs involved remain to be further investigated.
19. As detailed in Section 4.3.2.2, when Theta is not zero, several values of \(q_{ij}\) are replaced by the center of the corresponding quadtree block. The final cost reported is computed with the values for all \(i\)-\(j\) pairs included in the calculation, and does not use the simplified \(q_{ij}\), whereas the error reported for each iteration uses the simplified version.