What is Reinforcement Learning?#

Reinforcement learning denotes methods for creating computer programs that learn to perform a desired task by trial and error. To fully understand the difference between reinforcement learning and other modeling paradigms, we first recap the main approaches to creating models of processes and data.

Approaches to Modeling#

Classical Modeling#

In (classical) physics, especially, one looks at the outcomes of a handful of experiments and tries to create a model of the underlying processes from the collected data. Here modeling is a combination of very few observations (data) and lots of thinking. The derived models usually cover a wide range of applications, much wider than covered by the available data.

Newton’s law \(F=m\cdot a\) is a typical example. We expect that it holds throughout the universe (at least up to some precision) although we only have observations of forces and accelerations from a tiny fraction of the universe.

Another example is CT imaging. By taking X-ray images of an object from different directions we may compute a 3d representation of the object’s insides. The model describing the transformation from a series of 2d X-ray images to 3d is known as the Radon transform. This model originated in 1917 from human thinking, without any measurements at hand, and proved to be correct later on with the advent of computers able to perform the transformation on real X-ray data.

Models obtained via classical modeling may be used for understanding real-world processes, for reconstructing situations in the past, and for predicting future developments.

Statistical Modeling#

Supervised and unsupervised machine learning use statistics to create computational models from large data sets. These models are fully data-based; (almost) no human thinking is involved. They only cover situations represented directly or indirectly (via interpolation) by the underlying data.

A typical example of supervised learning is image classification. The transformation from a collection of colored pixels to a label is too complex to derive from a few observations by simply thinking about the problem. But having lots of images and corresponding labels at hand, we may train a machine learning model on the classification task. Of course, this model will only work on images similar to those in the training data set. A model trained on cats and dogs won’t recognize cars.

Models obtained from statistical modeling do not describe real-world phenomena but the training data set. We may use statistical modeling to find structures in data sets (unsupervised learning) or to describe mappings between data sets (supervised learning).

Reinforcement Learning#

There are tasks that cannot be modeled by classical or statistical methods. They are too complex for classical modeling and require too much training data for successful statistical modeling. An example is autonomous driving, that is, mapping lots of sensor data to car control commands. A car’s environment is too complex to be described accurately by a finite, fixed data set.

For tasks like autonomous driving we need models that, on the one hand, solve the task without relying on human understanding of all relevant processes. On the other hand, the model has to incorporate data collection mechanisms, because we cannot provide sufficiently rich training data in advance. This is what reinforcement learning is about: creating models that collect (relevant) data and solve the desired task based on that data.

If we provide neither training data nor preimplemented models of the relevant processes, how do we tell the reinforcement learning system which task to solve? Detailed answers to this question will be given later on, but the principal idea is to provide the model with a fixed set of actions it may choose from and to answer each action with numerical feedback. If the sequence of actions chosen by the model tends to solve the task, feedback values will be high. If the model chooses actions resulting in useless behavior, feedback values will be low.
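This principle can be sketched in a few lines of Python. The task, the action names, and the reward values below are all hypothetical; the point is only that the system sees a fixed set of actions and a numerical feedback for each, nothing else.

```python
import random

# Hypothetical toy task: a fixed set of actions the model may choose from.
ACTIONS = ["left", "right", "forward"]

def feedback(action):
    # Assumed reward scheme: actions that tend to solve the task
    # get high feedback, useless actions get low feedback.
    rewards = {"left": 0.0, "right": 0.1, "forward": 1.0}
    return rewards[action]

# The learning system only observes pairs (action, feedback):
action = random.choice(ACTIONS)
print(action, feedback(action))
```

Note that no description of the task itself is handed to the learner; the reward values are the only channel through which the task is communicated.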

Reinforcement learning models describe how to collect relevant training data and how to map data to useful actions. Reinforcement learning is used to make computers solve complex control tasks.

The Model#

What kinds of tasks can be solved with reinforcement learning? The answer is simple: all tasks fitting into the standard reinforcement learning model known as a Markov decision process (MDP). We will give a precise definition later on. Here we only provide basic notions and ideas.

The final computer program solving the desired task is called an agent. The agent performs actions in an environment. The environment provides information about its state to the agent and answers the agent’s actions with numerical feedback (reward). Which action the agent chooses next is based on the environment’s current state and on the feedback received in the past. The mapping from states to actions is called the policy.
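A minimal sketch of this interface may help. The environment below is purely hypothetical (the agent walks on positions 0 to 4 and the goal is position 4); the essential structure is that the environment exposes a state and answers each action with a new state and a reward, while the policy is just a mapping from states to actions.

```python
class Environment:
    """Hypothetical environment: positions 0..4, goal at position 4."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (move left) or +1 (move right);
        # the environment updates its state and returns a reward.
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        return self.state, reward

def policy(state):
    # A policy maps states to actions; this trivial one always moves right.
    return +1

env = Environment()
state, total = env.state, 0.0
for _ in range(6):
    action = policy(state)          # agent chooses an action
    state, reward = env.step(action)  # environment reacts with state and reward
    total += reward
```

Real reinforcement learning libraries use essentially this `step` interface; here it is hand-rolled to keep the example self-contained.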

The aim of reinforcement learning is to find a good (maybe, in some sense, optimal) policy. A good policy should, in the long run, yield a high accumulated reward (called the return). Training starts with a random or default policy. Action by action (and, thus, reward by reward) the policy is modified to yield better actions, maximizing the return.
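The return is simply the sum of the rewards collected over time, often with each reward weighted by a discount factor so that near-term rewards count more than distant ones. A minimal sketch (the reward sequence is made up for illustration):

```python
def discounted_return(rewards, gamma=1.0):
    # Return = sum of rewards, reward at time step t weighted by gamma**t.
    # gamma = 1.0 gives the plain accumulated reward.
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Undiscounted return of an example reward sequence: 1 + 0 + 2 = 3.
print(discounted_return([1.0, 0.0, 2.0]))
# With gamma = 0.5: 1 + 0.5*0 + 0.25*2 = 1.5.
print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))
```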

scheme of information flow between agent and environment

Fig. 75 The agent chooses an action according to its policy. The environment changes its state and sends a reward signal to the agent.#

The basic algorithm a typical reinforcement learning model performs is:

  1. Choose an initial policy.

  2. Take some actions and get corresponding rewards/return.

  3. Update/improve the policy.

  4. Go to 2.
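The four steps above can be sketched as a very simple learner. The setup is hypothetical: two actions with noisy rewards, where action 1 is better on average. The "policy" is a table of estimated action values with mostly-greedy action selection; this is one of many possible instances of the loop, not the only way to implement it.

```python
import random

random.seed(0)

# Hypothetical two-action task: action 1 yields higher reward on average.
def reward(action):
    return random.gauss(0.0, 0.1) if action == 0 else random.gauss(1.0, 0.1)

values = [0.0, 0.0]   # 1. initial (default) policy: estimated value per action
counts = [0, 0]

for step in range(200):
    # 2. take an action (greedy with occasional random exploration)
    if random.random() < 0.1:
        a = random.randrange(2)
    else:
        a = max(range(2), key=lambda i: values[i])
    r = reward(a)
    # 3. update/improve the policy via an incremental average of rewards
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]
    # 4. the loop continues; there is no separate training phase

best = max(range(2), key=lambda i: values[i])
```

After enough steps the value estimates single out action 1, illustrating how acting and learning are interleaved in one loop.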

There’s no dedicated training phase. Training and application of the model are one and the same. Of course, one may stop updating the policy once it is optimal or at least good enough for the task to be solved (convergence). But the more common situation is that the model keeps improving its policy as long as it is used. This way the model may adapt to changing behavior of the environment.

Every task that can be formulated in such a framework, based on an agent, actions, an environment, and rewards, can be tackled by reinforcement learning techniques.