Cloud Computing#

Almost all modern computers have several CPU cores and one or more GPUs. CPUs are designed for general-purpose computations; GPUs are specialized chips for fast floating-point arithmetic and matrix operations. Initially, GPUs were used for rendering 3D graphics, but they turned out to be very useful for training machine learning models. Training on a GPU is usually much faster than training on a CPU.
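As a quick check, the following sketch (assuming the PyTorch library is installed) reports whether a CUDA-capable GPU is available for training:

```python
import torch  # assumes PyTorch is installed

# Select the GPU if one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Training will run on: {device}")
```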

Powerful GPUs are very expensive, and having several of them is even more expensive. Thus, complex machine learning models are best trained on remote computers operated by companies or research institutions. Compute time can be rented from Amazon, Google, Microsoft, and many others.

Different Approaches for Remote Execution#

There exist several techniques for executing programs on remote computers. Here we discuss three of them. To understand their differences and implications, we first take a closer look at the relationship between a program and the operating system.

Libraries#

Each program uses some libraries (called ‘modules’ or ‘packages’ in Python). When shipping a program to the end user or to the cloud, we have to decide whether and how to ship these libraries. We could include all libraries in our program (known as static linkage), but this would yield a very large program. Alternatively, we could ask the user or the cloud provider to install the required libraries before running our program (dynamic linkage). This way we have a smaller program, and libraries can be shared by many different programs.

Static linkage is relatively rare nowadays; dynamic linkage is the standard approach. Its major drawback is that we cannot be sure that our program can be executed on the destination system. Even if we provide a list of all required libraries to install in advance, some of them may conflict with libraries already installed on the destination system. Depending on how such library problems are handled, there are three major techniques for executing programs on remote computers.
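A dynamically linked Python program might check at start-up whether its dependencies are present. The following is a minimal sketch; the package names and versions are made up for illustration:

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list of libraries our program expects on the destination system.
REQUIRED = {"numpy": "1.24", "pandas": "2.0"}

for name, wanted in REQUIRED.items():
    try:
        print(f"{name}: found version {version(name)} (need at least {wanted})")
    except PackageNotFoundError:
        print(f"{name}: missing on this system")
```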

Physical Machine with Full Control#

If we have full control over the remote computer, then we may install everything we need to run our programs. On modern multi-user systems like Linux and macOS (and to some extent also Windows), different users may work in parallel without influencing each other. Some libraries are available to all users, but each user may also install user-specific libraries.

scheme of programs and libraries on a physical machine

Fig. 91 Programs and libraries on a physical machine.#

The advantages are relatively simple administration and efficient program execution. A disadvantage is that users are forced to use the operating system installed on the remote computer.
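To see the distinction between shared and user-specific libraries, we can ask Python where it looks for packages. This is a minimal sketch; the printed paths depend on the system:

```python
import site
import sys

# System-wide package locations, shared by all users of the machine.
print("Shared library paths:", site.getsitepackages())

# Per-user package location; 'pip install --user' places packages here.
print("User library path:", site.getusersitepackages())

# The Python interpreter these paths belong to.
print("Interpreter:", sys.executable)
```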

Virtual Machine#

The cloud provider may install virtual machines. A virtual machine is a program that simulates a whole computer. On such a simulated computer one may install an operating system different from the one installed on the underlying real computer. Several virtual machines may run in parallel, isolated from each other. Each user gets access to a separate virtual machine and can do whatever they like with it.

scheme of programs and libraries on two virtual machines

Fig. 92 Programs and libraries on two virtual machines.#

This technique is rarely seen in practice because virtual machines run relatively slowly and require a lot of hardware resources. A typical use case is running Linux software on a Windows machine, or vice versa.

Containers (Docker)#

Non-Windows operating systems allow for a third technique that combines the advantages of direct access to a physical machine with those of virtual machines. A program and all required libraries can be packaged into a container for shipment. This container is then executed by the operating system in an isolated environment, very similar to a virtual machine but much more efficient.

scheme of programs and libraries in two containers

Fig. 93 Programs and libraries in two containers.#

Docker is widely used containerization software. Containers are created from so-called Docker images (an image is a blueprint for a container). All major cloud providers accept Docker images. There are several other containerization tools, Podman for instance.
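As an illustration, a Docker image for a Python program could be described by a Dockerfile like the one below. This is a minimal sketch; the file names and the requirements file are hypothetical:

```dockerfile
# Start from an official Python base image.
FROM python:3.11-slim

WORKDIR /app

# Install the required libraries inside the image.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the program itself.
COPY train.py .

# Command executed when a container is started from this image.
CMD ["python", "train.py"]
```

The image would then be built with `docker build -t my-training-job .`, and a container started from it with `docker run my-training-job`.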

Containerization uses features specific to non-Windows systems. Docker is available for Windows, too, but the installer sets up a virtual machine containing a Linux system and then installs Docker inside that virtual Linux.