Locally Linear Embedding#
The idea of locally linear embedding (LLE) appeared at the same time (and in the same journal) as Isomap. Like Isomap, it uses Euclidean distances only locally and leads to an eigenvalue problem. But details and motivation differ.
The basic assumption is that the data set lies on a low dimensional manifold which can be decomposed into (approximately) linear snippets. LLE tries to arrange those snippets in low (often 2) dimensions without modifying their neighborhood relations.
Finding an LLE requires two steps:
Represent each sample as an affine combination of its neighbors.
Arrange low dimensional points such that they can be represented by the same affine combination of neighbors as in high dimensions.
Local Affine Combinations#
For each sample \(x_l\) the \(k\) nearest neighbors \(x_{\lambda_1},\ldots,x_{\lambda_k}\) are determined. These neighbors span an affine manifold (that is, a translated subspace)
\[\left\{\sum_{i=1}^k a_i\,x_{\lambda_i}:\ a_1+\cdots+a_k=1\right\}.\]
Projecting \(x_l\) orthogonally onto the affine manifold spanned by its neighbors yields the representation
\[x_l\approx\sum_{i=1}^k w_{l,\lambda_i}\,x_{\lambda_i},\qquad\sum_{i=1}^k w_{l,\lambda_i}=1.\]
The coefficients \(w_{l,\lambda_1},\ldots,w_{l,\lambda_k}\) may be regarded as weights, because their sum is 1 and often they are positive if \(x_l\) is surrounded by its neighbors (more precisely, if \(x_l\) lies in the convex hull of its neighbors).
If the data set is ‘locally linear’, then \(x_l\) equals the weighted sum and the error induced by representing samples as weighted sums of their neighbors is zero. The more nonlinear a data set behaves locally, the less reliable the low dimensional representation obtained via LLE.
This first step of LLE yields at most \(n^2\) weights \(w_{l,\lambda}\). Weights \(w_{l,\lambda}\) for which corresponding samples aren’t neighbors are set to zero. Weights are not symmetric, that is, \(w_{l,\lambda}\neq w_{\lambda,l}\) in general.
To get the weights one first determines the indices \(\lambda_1(l),\ldots,\lambda_k(l)\) of the \(k\) nearest neighbors for all samples \(x_l\). Then one solves the constrained minimization problem
\[\min_{w}\ \sum_{l=1}^n\left\|x_l-\sum_{i=1}^k w_{l,\lambda_i(l)}\,x_{\lambda_i(l)}\right\|^2\qquad\text{subject to}\qquad\sum_{i=1}^k w_{l,\lambda_i(l)}=1\ \text{for all }l.\]
This minimization problem is quadratic and can be solved numerically in different efficient ways, for instance, by solving a system of linear equations.
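A minimal sketch of the linear-system route for one sample (the function name, regularization constant, and toy data are made up for illustration): shifting the neighbors so that \(x_l\) sits at the origin, the constrained problem reduces to solving \(Gw=\mathbf{1}\) with the local Gram matrix \(G\), then normalizing \(w\) to sum one.

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """Reconstruction weights of one sample x given its k neighbors.

    Solves min ||x - sum_i w_i neighbors[i]||^2 s.t. sum_i w_i = 1
    by solving a linear system with the local Gram matrix.
    """
    Z = neighbors - x                         # shift so x is the origin
    G = Z @ Z.T                               # local Gram matrix (k x k)
    G += reg * np.trace(G) * np.eye(len(G))   # regularize (G may be singular)
    w = np.linalg.solve(G, np.ones(len(G)))
    return w / w.sum()                        # enforce sum-to-one constraint

# toy sample lying in the convex hull of its three neighbors
x = np.array([0.0, 0.0])
N = np.array([[1.0, 0.0], [-1.0, 1.0], [-1.0, -1.0]])
w = lle_weights(x, N)
print(w.sum())   # sums to 1 by construction
print(N.T @ w)   # weighted sum approximately reconstructs x
```

Since the sample lies in the convex hull of its neighbors here, all weights come out positive, matching the remark above.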
Low Dimensional Fitting#
Given weights \(w_{l,\lambda}\) the second step of LLE is to find points \(u_1,\ldots,u_n\in\mathbb{R}^p\) in low dimensions which solve
\[\min_{u_1,\ldots,u_n}\ \sum_{l=1}^n\left\|u_l-\sum_{\lambda=1}^n w_{l,\lambda}\,u_\lambda\right\|^2\qquad\text{subject to}\qquad\frac{1}{n}\sum_{l=1}^n u_l=0,\quad\frac{1}{n}\sum_{l=1}^n u_l\,u_l^{\mathrm{T}}=I.\]
The objective is the same as in the first step, but now in low dimensions and with fixed weights. Thus, LLE tries to reconstruct the local linear structure from high dimensions in low dimensions. Without constraints, choosing all low dimensional points to be zero would solve the minimization problem. The covariance constraint excludes such trivial solutions by requiring that the featurewise variance is \(1\). Thus, solutions have to be scattered in space to some extent. Zero covariance between features prevents some other trivial solutions and avoids non-uniqueness due to rotations. Non-uniqueness due to translations is avoided by the mean zero constraint.
The solution to the minimization problem can be obtained from an eigenvalue problem similar to kernel PCA.
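As a sketch (assuming the full \(n\times n\) weight matrix \(W\) from the first step, with zeros for non-neighbors): the minimizer consists of the eigenvectors of \(M=(I-W)^{\mathrm{T}}(I-W)\) belonging to the smallest eigenvalues; the very smallest eigenvalue is zero with a constant eigenvector, which is discarded. The function name below is made up for illustration.

```python
import numpy as np

def lle_embedding(W, p):
    """Low dimensional points from the weight matrix W (rows sum to 1)."""
    n = W.shape[0]
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    U = eigvecs[:, 1:p + 1]                # skip the constant eigenvector
    return U * np.sqrt(n)                  # featurewise variance becomes 1

# toy weight matrix: random, zero diagonal, rows normalized to sum 1
rng = np.random.default_rng(0)
W = rng.random((8, 8))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

U = lle_embedding(W, 2)
print(U.mean(axis=0))        # ~ 0 (mean zero constraint)
print(U.T @ U / len(U))      # ~ identity (covariance constraint)
```

Because the rows of \(W\) sum to one, the constant vector lies in the nullspace of \(I-W\), which is why the smallest eigenvalue is zero and the remaining eigenvectors automatically have mean zero.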
LLE with Scikit-Learn#
Scikit-Learn has the LocallyLinearEmbedding class in the manifold module.
import numpy as np
import sklearn.manifold as manifold
import plotly.graph_objects as go
from plotly.subplots import make_subplots

data_files = ['omega.npz', 'sphere.npz', 'cube.npz', 'clouds.npz']

for file in data_files:

    loaded = np.load(file)
    x = loaded['x']
    y = loaded['y']
    z = loaded['z']
    red = loaded['red']
    green = loaded['green']
    blue = loaded['blue']

    lle = manifold.LocallyLinearEmbedding(n_components=2, n_neighbors=30)
    U = lle.fit_transform(np.stack((x, y, z), axis=1))

    fig = make_subplots(rows=1, cols=2, specs=[[{'type': 'scatter3d'}, {'type': 'xy'}]])
    fig.update_layout(width=1000, height=600, scene_aspectmode='cube')
    fig.add_trace(go.Scatter3d(
        x=x, y=y, z=z,
        mode='markers',
        marker={'size': 1.5, 'color': [f'rgb({r},{g},{b})' for r, g, b in zip(red, green, blue)]},
        hoverinfo='none',
        showlegend=False
    ), row=1, col=1)
    fig.add_trace(go.Scatter(
        x=U[:, 0], y=U[:, 1],
        mode='markers',
        marker={'size': 5, 'color': [f'rgb({r},{g},{b})' for r, g, b in zip(red, green, blue)]},
        hoverinfo='none',
        showlegend=False
    ), row=1, col=2)
    fig.show()