Locally Linear Embedding#

The idea of locally linear embedding (LLE) appeared at the same time (and in the same journal) as Isomap. Like Isomap it uses Euclidean distances only locally and leads to an eigenvalue problem, but details and motivation differ.

The basic assumption is that the data set lies on a low dimensional manifold which can be decomposed into (approximately) linear snippets. LLE tries to arrange those snippets in low (often 2) dimensions without modifying their neighborhood relations.

Finding an LLE requires two steps:

  • Represent each sample as an affine combination of its neighbors.

  • Arrange low dimensional points such that they can be represented by the same affine combination of neighbors as in high dimensions.

Local Affine Combinations#

For each sample \(x_l\) the \(k\) nearest neighbors \(x_{\lambda_1},\ldots,x_{\lambda_k}\) are determined. These neighbors span an affine manifold (that is, a translated subspace)

\[\begin{equation*} \{a_1\,x_{\lambda_1}+\cdots+a_k\,x_{\lambda_k}:\;a_1,\ldots,a_k\in\mathbb{R},\;a_1+\ldots+a_k=1\}. \end{equation*}\]

Fig. 71 Two vectors may span a two-dimensional subspace or a one-dimensional affine manifold depending on the set of coefficients considered.#
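
For instance, take \(x_{\lambda_1}=(1,0)\) and \(x_{\lambda_2}=(0,1)\). Arbitrary linear combinations fill the whole plane, whereas affine combinations only trace out the line through the two points:

\[\begin{equation*} a_1\,(1,0)+a_2\,(0,1)=(a_1,a_2) \qquad\text{with }a_1+a_2=1 \quad\Longrightarrow\quad (a_1,\,1-a_1). \end{equation*}\]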

Projecting \(x_l\) orthogonally onto the affine manifold spanned by its neighbors yields the representation

\[\begin{equation*} x_l\approx w_{l,\lambda_1}\,x_{\lambda_1}+\cdots+w_{l,\lambda_k}\,x_{\lambda_k}. \end{equation*}\]

The coefficients \(w_{l,\lambda_1},\ldots,w_{l,\lambda_k}\) may be regarded as weights, because they sum to 1, and often they are positive if \(x_l\) is surrounded by its neighbors (more precisely, if \(x_l\) lies in the convex hull of its neighbors).


Fig. 72 All coefficients in an affine combination are nonnegative if and only if the resulting point belongs to the convex hull of the points being combined.#

If the data set is ‘locally linear’, then \(x_l\) equals the weighted sum of its neighbors and the error induced by representing samples as weighted sums of their neighbors is zero. The more nonlinearly a data set behaves locally, the less reliable the low dimensional representation obtained via LLE becomes.
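
To make the projection step concrete, here is a minimal NumPy sketch for a single sample with three made-up neighbor points in the plane (points and variable names are illustrative only). Because the sample lies in the convex hull of its neighbors, all weights come out nonnegative.

import numpy as np

# made-up neighbors of x_l (one per row) and the sample itself
X_neighbors = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
x_l = np.array([0.5, 0.5])    # lies inside the convex hull of the three neighbors

# parametrize the affine manifold as x_1 + a_2 (x_2 - x_1) + a_3 (x_3 - x_1)
B = (X_neighbors[1:] - X_neighbors[0]).T              # direction vectors as columns
a, *_ = np.linalg.lstsq(B, x_l - X_neighbors[0], rcond=None)

w = np.concatenate(([1.0 - a.sum()], a))              # weights sum to 1 by construction
print(w)                  # [0.5  0.25 0.25], all nonnegative
print(w @ X_neighbors)    # [0.5 0.5], reproduces x_l exactly in this toy example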

This first step of LLE yields an \(n\times n\) matrix of weights \(w_{l,\lambda}\), of which at most \(n\,k\) are nonzero: weights \(w_{l,\lambda}\) for which \(x_\lambda\) is not among the neighbors of \(x_l\) are set to zero. Weights are not symmetric, that is, \(w_{l,\lambda}\neq w_{\lambda,l}\) in general.

To get the weights, one first determines the indices \(\lambda_1(l),\ldots,\lambda_k(l)\) of the \(k\) nearest neighbors for all samples \(x_l\). Then one solves the constrained minimization problem

\[\begin{align*} &\sum_{l=1}^n\bigl\|w_{l,\lambda_1(l)}\,x_{\lambda_1(l)}+\cdots+w_{l,\lambda_k(l)}\,x_{\lambda_k(l)}-x_l\bigr\|^2\to\min_{w_{l,\lambda}}\\ &\text{with constraints}\quad w_{l,\lambda_1(l)}+\cdots+w_{l,\lambda_k(l)}=1\quad\text{for $l=1,\ldots,n$}\\ &\text{(unused $w_{l,\lambda}$ are set to zero).} \end{align*}\]

This minimization problem is quadratic and can be solved numerically in several efficient ways, for instance, by solving a system of linear equations.
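
As a sketch of the linear-system approach for a single sample: using the sum-to-one constraint, the reconstruction error becomes a quadratic form with the local Gram matrix of displacement vectors, and the minimizer is obtained by solving one small linear system and rescaling. The helper name and the ridge term (which guards against a singular Gram matrix when \(k\) exceeds the data dimension) are illustrative choices.

import numpy as np

def local_weights(x_l, neighbors, reg=1e-3):
    # neighbors: (k, d) array holding the k nearest neighbors of x_l as rows
    D = neighbors - x_l                     # displacement vectors to the neighbors
    C = D @ D.T                             # Gram matrix C_jk = <x_j - x_l, x_k - x_l>
    C = C + reg * np.trace(C) / len(C) * np.eye(len(C))   # small ridge term for stability
    w = np.linalg.solve(C, np.ones(len(C)))
    return w / w.sum()                      # rescale to satisfy the sum-to-one constraint

Looping over all samples and scattering the resulting weights into an \(n\times n\) matrix (with zeros for non-neighbors) completes the first step.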

Low Dimensional Fitting#

Given the weights \(w_{l,\lambda}\), the second step of LLE is to find points \(u_1,\ldots,u_n\in\mathbb{R}^p\) in low dimensions which solve

\[\begin{align*} &\sum_{l=1}^n\bigl\|w_{l,\lambda_1(l)}\,u_{\lambda_1(l)}+\cdots+w_{l,\lambda_k(l)}\,u_{\lambda_k(l)}-u_l\bigr\|^2\to\min_{u_1,\ldots,u_n}\\ &\text{with constraints}\quad u_1+\cdots+u_n=0\quad\text{and}\quad\text{covariance matrix is identity}. \end{align*}\]

The objective is the same as in the first step, but now in low dimensions and with fixed weights. Thus, LLE tries to reconstruct in low dimensions the local linear structure found in high dimensions. Without constraints, choosing all low dimensional points to be zero would solve the minimization problem. The covariance constraint excludes such trivial solutions by requiring that each feature has variance \(1\), so solutions have to be scattered in space to some extent. Requiring zero covariance between different features prevents further trivial solutions and reduces non-uniqueness due to rotations. Non-uniqueness due to translations is avoided by the zero-mean constraint.

The solution to the minimization problem can be obtained from an eigenvalue problem similar to kernel PCA.
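
A minimal sketch of that eigenvalue formulation, assuming the full \(n\times n\) weight matrix \(W\) from the first step (rows summing to one, zeros for non-neighbors): the objective equals a quadratic form with \(M=(I-W)^{\mathrm{T}}(I-W)\), and an embedding satisfying the constraints is given, up to scaling, by the eigenvectors of \(M\) for the smallest eigenvalues, skipping the constant eigenvector with eigenvalue zero. The function name is an illustrative choice.

import numpy as np

def lle_step_two(W, p):
    # W: (n, n) weight matrix from step one, p: target dimension
    n = W.shape[0]
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)     # eigenvalues and eigenvectors in ascending order
    U = vecs[:, 1:p + 1]               # skip the constant eigenvector (translation freedom)
    return np.sqrt(n) * U              # rescale so the sample covariance (1/n convention) is the identity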

LLE with Scikit-Learn#

Scikit-Learn has the LocallyLinearEmbedding class in the manifold module.

import numpy as np
import sklearn.manifold as manifold
import plotly.graph_objects as go
from plotly.subplots import make_subplots
data_files = ['omega.npz', 'sphere.npz', 'cube.npz', 'clouds.npz']

for file in data_files:

    # load 3d coordinates and per-point colors of the example data set
    loaded = np.load(file)
    x = loaded['x']
    y = loaded['y']
    z = loaded['z']
    red = loaded['red']
    green = loaded['green']
    blue = loaded['blue']

    # compute a two-dimensional LLE embedding of the three-dimensional points
    lle = manifold.LocallyLinearEmbedding(n_components=2, n_neighbors=30)
    U = lle.fit_transform(np.stack((x, y, z), axis=1))
    
    fig = make_subplots(rows=1, cols=2, specs=[[{'type': 'scatter3d'}, {'type': 'xy'}]])
    fig.update_layout(width=1000, height=600, scene_aspectmode='cube')
    # left: original data in 3d, colored by position
    fig.add_trace(go.Scatter3d(
        x=x, y=y, z=z,
        mode='markers',
        marker={'size': 1.5, 'color': [f'rgb({r},{g},{b})' for r, g, b in zip(red, green, blue)]},
        hoverinfo='none',
        showlegend=False
    ), row=1, col=1)
    # right: two-dimensional LLE embedding with the same coloring
    fig.add_trace(go.Scatter(
        x=U[:, 0], y=U[:, 1],
        mode='markers',
        marker={'size': 5, 'color': [f'rgb({r},{g},{b})' for r, g, b in zip(red, green, blue)]},
        hoverinfo='none',
        showlegend=False
    ), row=1, col=2)
    fig.show()