Python and Jupyter

In this book we use the Python programming language for talking to the computer. Tools from the Jupyter ecosystem allow for Python programming in a very comfortable graphical environment.

Data Science Tools

There are lots of software tools for data science and artificial intelligence. They can be divided into two groups:

Tailor-made GUI tools

For common tasks in data science and AI, like clustering data or classifying images, there exist (mostly commercial) tools with a graphical user interface (GUI). Such tools are easy to use, but they have a very limited scope of application. Each task requires a different tool. Available methods are restricted to well-known ones. Implementing new problem-specific methods is not possible.

General-Purpose Tools

To enjoy maximum freedom in the choice of methods, one has to leave the world of GUI tools. Creating data science models (that is, computer programs) without any restrictions requires a high-level programming language. Examples are R and Python. Both languages are very common in the data science community because lots of extensions make them easy to use for data science and AI.

Tailor-made tools come and go as time moves on. Programming languages are much more long-lasting. In this book we stick to the Python programming language and its ecosystem. The R programming language would be a good alternative, but it focuses more on statistics than on general-purpose programming.

Tip

Some people feel frightened when they hear the words ‘programming language’. Think of programming languages as ordinary software tools. The only difference is that they provide much more functionality than GUI tools, and there’s not enough space on screen for a button for each function. So we write text commands instead.
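As a small illustration of what a text command looks like, the following sketch computes an average with a single command instead of a button click. The temperature values are made-up example data, and `statistics.mean` from Python's standard library is just one of several ways to do this.

```python
import statistics

# Hypothetical example data: four temperature readings in degrees Celsius
temperatures = [18.5, 21.0, 19.5, 23.0]

# One text command replaces a "compute average" button
average = statistics.mean(temperatures)
print(average)  # prints 20.5
```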

Why Python?

Python is a modern, free and open-source programming language. It dates back to the early 1990s; version 1.0 was released in 1994. Its creator is Guido van Rossum, who served as the language’s BDFL (benevolent dictator for life) until he stepped down from that role in 2018.

line drawing showing a person on ground wondering why the other is flying

Fig. 14 I wrote 20 short programs in Python yesterday. It was wonderful. Perl, I’m leaving you. Source: Randall Munroe, xkcd.com/353

Python code is very readable and straightforward, without the many cumbersome symbols found in most other programming languages. Many technical aspects of computer programming, such as memory management, are handled by Python instead of by the programmer. With Python one can develop the full range of software, from simple scripts to fully featured web or desktop applications. Thousands of extensions allow for rapid development.
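To give a feeling for this readability, here is a minimal sketch that counts word frequencies in a made-up sentence using `collections.Counter` from the standard library. Note that there are no type declarations and no manual memory management; Python takes care of both.

```python
from collections import Counter

# Example sentence, made up for illustration
text = "the quick brown fox jumps over the lazy dog and the cat"

# Split the text into words and count how often each one occurs
word_counts = Counter(text.split())

# Show the three most frequent words with their counts
for word, count in word_counts.most_common(3):
    print(word, count)
```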

There’s a large online community discussing Python topics. Most problems you’ll encounter, especially as a beginner, have already been discussed and solved online. Simply use a search engine to find the answer.

line chart showing the popularity of programming languages from 2008 till 2022

Fig. 15 Popularity of programming languages on Stack Overflow. Source: Stack Overflow Trends (modified by the author)

Some rules followed by Python and its community are collected in the Zen of Python. Here are some of them:

  • Beautiful is better than ugly.

  • Explicit is better than implicit.

  • Simple is better than complex.

  • Complex is better than complicated.

  • Readability counts.

  • There should be one – and preferably only one – obvious way to do it.

  • Although that way may not be obvious at first unless you’re Dutch.

  • If the implementation is hard to explain, it’s a bad idea.

  • If the implementation is easy to explain, it may be a good idea.
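The complete list is built into Python itself: typing `import this` in a Python session prints all aphorisms. The sketch below additionally decodes the text into a string. It relies on `this.s`, an undocumented implementation detail of CPython's `this` module, so treat that part as an assumption that may not hold for other Python implementations.

```python
import codecs
import this  # importing the module prints the full Zen of Python as a side effect

# Assumption: CPython stores the text ROT13-encoded in the module attribute
# `this.s` (an implementation detail, not a documented API); decode it here.
zen = codecs.decode(this.s, "rot13")

# Print only the first line, which is the title
print(zen.splitlines()[0])
```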

Last but not least, Python is available on all major platforms: Linux, macOS, Windows, and many more. YouTube relies heavily on Python, and many other tech giants use Python, too. But a Python script might just as well control your washing machine.

Hint

There are two major versions of Python: Python 2 and Python 3. They are not fully compatible: many programs written in Python 2 cannot be executed by a Python 3 interpreter. In this book we stick to Python 3. Python 2 reached its end of life in January 2020 and no longer receives updates.
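Two well-known differences illustrate the incompatibility: `print` changed from a statement to a function, and dividing two integers with `/` no longer truncates the result. The following sketch runs under Python 3; the comments describe how Python 2 behaved.

```python
import sys

# This book assumes Python 3; stop early on anything older
assert sys.version_info.major >= 3, "Python 3 is required"

# Python 3: print is a function, so parentheses are required.
# Python 2 also accepted the statement form: print "Hello"
print("Hello")

# Python 3: / is true division even for integers (Python 2 returned 3 here)
quotient = 7 / 2
# // is floor division in both versions
floor_quotient = 7 // 2
print(quotient, floor_quotient)  # prints 3.5 3
```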

Jupyter

The Jupyter ecosystem is a collection of tools for Python programming with an emphasis on data science. Jupyter allows for Python programming in a web browser. Outputs, including complex and interactive visualizations, appear right below the code that produces them. Everything is in one document: code, outputs, text, images,…
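To see how code, outputs, and text end up in one document, it helps to know that a notebook file (extension `.ipynb`) is plain JSON. The sketch below writes a tiny notebook by hand with Python's `json` module. The field names are assumed to follow the nbformat 4 layout, and the file name `hello.ipynb` is only an example; in practice JupyterLab creates and updates these files for you.

```python
import json

# A minimal notebook (assumed nbformat 4 layout): one Markdown cell with text
# and one code cell together with the output it produced.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My first notebook\n", "Some explanatory text."],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print('Hello, Jupyter!')"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["Hello, Jupyter!\n"]}
            ],
        },
    ],
}

# Notebook files are plain JSON text with the extension .ipynb
with open("hello.ipynb", "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)

# Read the file back and list the cell types it contains
with open("hello.ipynb", encoding="utf-8") as f:
    loaded = json.load(f)
print([cell["cell_type"] for cell in loaded["cells"]])
```

Opening `hello.ipynb` in JupyterLab should show the text cell followed by the code cell and its stored output.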

screenshot of JupyterLab with running notebook

Fig. 16 JupyterLab is the most widely used member of the Jupyter ecosystem. It brings Python to the web browser.

In this book you’ll meet at least four members of the Jupyter ecosystem.

JupyterLab is a web application bringing Python programming to the browser. It’s the everyday tool for data science. JupyterLab may run on a remote server (cloud) or on your local machine.

An alternative to JupyterLab is Jupyter Notebook. It is JupyterLab’s predecessor and provides almost identical functionality, but with a different look and feel.

Running JupyterLab in the cloud requires user authentication and user management. JupyterHub provides everything needed to run several JupyterLab instances in parallel on one server. Almost all JupyterLab providers (e.g., Gauss at Zwickau University, Binder) rely on JupyterHub.

This book is published using Jupyter Book. Each page is a Jupyter notebook file. Jupyter Book automatically generates a table of contents, handles bibliographies, and renders to different output formats.

Install and Use

Work through the following projects to get up and running with Python and Jupyter: