Support-Vector Regression (SVR)#

Support vector regression is the little brother of support vector machines, which are used for classification.

Self-study task

Get an overview of SVR from the paper "A tutorial on support vector regression". Read section 1 (skip the formulas in 1.3 and 1.4), subsections 2.1 and 2.2 (again skipping the formulas), subsections 3.1 and 3.2, and section 4. Then try to answer the following questions:

  • Which loss function is used when SVR is formulated as a minimization problem?

  • What type of minimization problem has to be solved (linear, quadratic, cubic, general nonlinear)?

  • Is the term feature space used to denote the same thing in the paper and in our book?

  • Can SVR, like SVM, cope with high-dimensional feature spaces (in our terminology)?

SVR with Scikit-Learn#

Scikit-Learn implements SVR in the SVR class.

import numpy as np
import matplotlib.pyplot as plt

import sklearn.svm as svm

rng = np.random.default_rng(0)    # reproducible random number generator

# ground truth to be recovered from noisy data
def truth(x):
    return x + np.cos(2 * np.pi * x)

xmin = 0
xmax = 1
x = np.linspace(xmin, xmax, 100)    # grid for plotting

n = 100    # number of data points to generate
noise_level = 0.3    # standard deviation of artificial noise

# simulate data
X = (xmax - xmin) * rng.random((n, 1)) + xmin
y = truth(X).reshape(-1) + noise_level * rng.standard_normal(n)

# plot truth and data
fig, ax = plt.subplots()
ax.plot(x, truth(x), '-b', label='truth')
ax.plot(X.reshape(-1), y, 'or', markersize=3, label='data')
ax.legend()
plt.show()
(Figure: the truth function and the simulated noisy data points.)

When creating the SVR object we may provide the width of the \(\varepsilon\)-tube; data points inside the tube have no influence on the loss function. We may also select one of several predefined kernels. The regularization parameter C can be specified, too. Here we have to take care, because the higher C is, the less regularization is applied.
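
For orientation, one common way to write the optimization problem behind SVR (equivalent to the slack-variable formulation in the tutorial cited above; the notation, in particular the feature map \(\phi\), may differ slightly from our book) is

\[
\min_{w,b}\;\frac{1}{2}\lVert w\rVert^2
 + C\sum_{i=1}^n \max\bigl(0,\lvert y_i - f(x_i)\rvert - \varepsilon\bigr),
\qquad f(x)=\langle w,\phi(x)\rangle + b.
\]

Residuals of size at most \(\varepsilon\) do not contribute to the sum, which is why points inside the tube have no influence on the fit, and a larger C gives more weight to the data term, that is, less regularization.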

epsilon = 0.4
alpha = 1e-1

# regression
svr = svm.SVR(epsilon=epsilon, kernel='rbf', C=1/alpha)
svr.fit(X, y)

# get hypothesis for plotting
y_svr = svr.predict(x.reshape(-1, 1))

# plot truth, data, hypothesis
fig, ax = plt.subplots()
ax.plot(x, truth(x), '-b', label='truth')
ax.plot(x, y_svr+epsilon, '-c', label='tube')
ax.plot(x, y_svr-epsilon, '-c')
ax.plot(X.reshape(-1), y, 'or', markersize=3, label='data')
ax.plot(x, y_svr, '-g', label='model')
ax.legend()
plt.show()
(Figure: truth, data, the fitted SVR model, and the \(\varepsilon\)-tube around it.)

If regularization is very weak (large C), then the tube contains almost all data points. With stronger regularization (smaller C) the fitted hypothesis is smoother, but the tube does not contain all data points.
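
We can observe this effect numerically by refitting the model for several regularization strengths and counting how many data points end up inside the tube. The following is a minimal sketch; it assumes that X, y, epsilon and the imports from the code above are still available, and the chosen alpha values are arbitrary.

import numpy as np
import sklearn.svm as svm

for alpha in [1e-3, 1e-1, 10]:    # weak, moderate, strong regularization
    svr = svm.SVR(epsilon=epsilon, kernel='rbf', C=1/alpha)
    svr.fit(X, y)

    # fraction of training points whose residual is at most epsilon,
    # that is, points lying inside the epsilon-tube
    inside = np.mean(np.abs(svr.predict(X) - y) <= epsilon)

    print(f'alpha = {alpha:g}: '
          f'{svr.support_vectors_.shape[0]} support vectors, '
          f'{100 * inside:.0f} % of data inside the tube')

Only points on or outside the tube become support vectors, so the number of support vectors also indicates how well the tube covers the data.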