Typical Tasks#

Unsupervised machine learning algorithms extract information from data sets without requiring examples of the information to be extracted. The algorithm determines what to extract, and humans have to interpret the extracted information to gain insights into the data. Thus, data sets suitable for unsupervised learning are easy to collect, but interpreting the results may be difficult.

Unsupervised learning is an extremely wide field. Here we focus on clustering and also consider dimensionality reduction. Further, we'll have a first glance at generative models. Other subfields exist, too, such as anomaly detection and association analysis. Most techniques can be applied to several different tasks.

As for supervised learning, we denote the data space by \(X\) and the items of the data set under consideration by \(x_1,\ldots,x_n\). Almost always we will have \(X=\mathbb{R}^m\). Again we need a training data set for fitting a model; validation and test sets are used for hyperparameter optimization and model evaluation as before.

Clustering#

Clustering aims at finding subsets of similar items in large data sets.

[figure: data set with clusters]

Fig. 56 How to define the term cluster is not as straightforward as it seems.#

Details depend on the application in mind.

  • Do we want to find hard clusters (each sample either belongs to a cluster or not) or soft clusters (a score or probability for each combination of sample and cluster)? See the sketch after this list.

  • Shall all samples belong to some cluster or do we allow for outliers?

  • May clusters overlap (some samples belong to more than one cluster)?
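To make the hard/soft distinction concrete, here is a minimal sketch with scikit-learn; the synthetic data and all parameter choices are illustrative. K-means yields hard assignments, while a Gaussian mixture yields a probability for each combination of sample and cluster.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# synthetic data set with three groups
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# hard clustering: exactly one cluster label per sample
hard_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(hard_labels[:5])            # one label from {0, 1, 2} per sample

# soft clustering: one probability per sample-cluster combination
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
print(gmm.predict_proba(X)[:5])   # each row sums to 1
```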

There exist many different approaches to clustering. The main classes of algorithms are:

  • Centroid-based algorithms represent each cluster by one point (midpoint or centroid). Which samples belong to which clusters is determined by some rule involving those midpoints.

  • Density-based algorithms look at the distances between samples and define clusters to be subsets of closely spaced samples (see the sketch after this list).

  • Distribution-based algorithms represent each cluster by a probability distribution. A sample belongs to the cluster for which the sample’s probability is highest.

  • Hierarchical algorithms generate a sequence of clusterings, either starting with as many clusters as there are samples (and then coarsening the clustering) or starting with one cluster (and then refining it).
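As an example of the density-based class, here is a minimal sketch using scikit-learn's DBSCAN; the crescent-shaped data and the parameter values are illustrative. Note that density-based clustering naturally allows for outliers.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# two crescent-shaped clusters, hard to capture with centroids alone
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# density-based clustering: clusters are subsets of closely spaced samples
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

# the label -1 marks outliers, i.e., samples not assigned to any cluster
print(set(db.labels_))
```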

Dimensionality Reduction#

Dimensionality reduction tries to reduce the number of features without losing too much information. We already know principal component analysis as a linear dimensionality reduction technique. Here, ‘linear’ means that the mapping from the high-dimensional data space to the lower-dimensional space is linear (a matrix).
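As a reminder, here is a minimal principal component analysis sketch with scikit-learn; the random data is purely illustrative. The fitted model is just a matrix applied to the (centered) data.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # 200 samples, 10 features

# linear dimensionality reduction to 2 features
pca = PCA(n_components=2)
X_low = pca.fit_transform(X)

# the linear mapping is given by a matrix of principal components
print(pca.components_.shape)     # (2, 10)
print(X_low.shape)               # (200, 2)
```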

Nonlinear dimensionality reduction techniques are more powerful, but also much more computationally expensive.

[figure: nonlinear dimensionality reduction from 2d to 1d]

Fig. 57 Often data is not scattered over the whole space but lives close to a lower-dimensional nonlinear manifold.#
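A minimal nonlinear example, again a sketch with illustrative parameter choices: scikit-learn's Isomap unrolls data lying close to a two-dimensional manifold embedded in three-dimensional space.

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

# data close to a 2d manifold (an S-shaped surface) embedded in 3d space
X, _ = make_s_curve(n_samples=500, random_state=0)

# nonlinear dimensionality reduction: unroll the manifold into 2d
iso = Isomap(n_neighbors=10, n_components=2)
X_low = iso.fit_transform(X)
print(X_low.shape)   # (500, 2)
```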

Generative Models#

Unsupervised learning algorithms may not only learn to distinguish between similar and dissimilar samples. Some algorithms yield so-called generative models. A generative model contains all the information necessary to automatically create new samples similar to the samples in the training data set.

Generative models can be used for generating natural-looking artificial images or for generating art, for instance.
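The deep generative models behind such applications are beyond a short sketch, but the idea can be illustrated with simple tools: a Gaussian mixture fitted to unlabeled data is already a generative model, because it can be sampled to produce new, similar data points. As before, data and parameters are illustrative.

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# fit a probability distribution to unlabeled training data
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# a generative model can create new samples similar to the training data
X_new, _ = gmm.sample(20)
print(X_new.shape)   # (20, 2)
```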