Chapter 4 Stochastic Neighbor Embedding (SNE)
As mentioned, a drawback of the MDS approach is that it employs squared distances in the objective function, and thus emphasizes the impact of points that are far apart. As a consequence, MDS tends to be less effective at keeping data points that are close together in high-dimensional space also close together in the embedded space. A different approach is offered by stochastic neighbor embedding (SNE) (Hinton and Roweis 2003). Instead of basing the alignment between the high- and low-dimensional representations on a stress function that includes the actual distances, the problem is reformulated as one of matching two probability distributions. These distributions replace the explicit distance measure by the probability that any point \(j\) is a neighbor of a given point \(i\). This probability decreases with increasing distance between \(i\) and \(j\).
In other words, the distance metric of MDS is replaced by a probability measure. The essence of SNE then boils down to matching the probability distribution in high-dimensional space to a corresponding distribution in the low-dimensional embedded space, similar to how distances were matched in MDS.
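The conversion of distances into neighbor probabilities can be sketched as follows. In SNE, the probability that point \(j\) is a neighbor of point \(i\) is obtained from a Gaussian kernel centered on \(i\), normalized so that the probabilities over all \(j \neq i\) sum to one. The minimal sketch below assumes a single fixed bandwidth \(\sigma\) for all points for simplicity; in the actual SNE algorithm, a point-specific \(\sigma_i\) is calibrated from a user-set perplexity.

```python
import numpy as np

def conditional_probabilities(X, sigma=1.0):
    """Gaussian conditional probabilities p(j|i): the probability that
    point j is a neighbor of point i, decreasing with distance.
    Assumes one fixed bandwidth sigma (SNE tunes sigma_i per point)."""
    # pairwise squared Euclidean distances via broadcasting
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)           # a point is not its own neighbor
    P /= P.sum(axis=1, keepdims=True)  # each row sums to one
    return P

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))            # 5 points in 3 dimensions
P = conditional_probabilities(X)
print(np.allclose(P.sum(axis=1), 1.0))
```

Each row of `P` is thus a probability distribution over the possible neighbors of one point; SNE constructs an analogous distribution in the embedded space and minimizes the divergence between the two.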
In this chapter, the basic principles behind SNE and its most recent implementation, t-SNE (van der Maaten and Hinton 2008), are outlined. This material is quite technical. The formal discussion can be readily skipped if the main interest is in application and interpretation.
The chapter starts with a brief introduction to some information-theoretic concepts. This is followed by a description of t-SNE and its implementation. A brief section discusses interpretation and spatialization, most of which was covered in detail in Chapter 3 and will not be repeated here. The chapter closes with a comparison of t-SNE to MDS using the common coverage percentage introduced in the previous chapter.
The Italy Community Banks sample data set is again used to illustrate these techniques.