Pandas Vectorization#

Before solving these exercises you should have read High-Level Data Management with Pandas.

import pandas as pd

For these exercises we use the dataset obtained in the Cafeteria project.

data = pd.read_csv('meals.csv', names=['date', 'category', 'name', 'students', 'staff', 'guests'])

data

Preprocessing#

Data Types#

Convert the 'date' and 'category' columns to the Timestamp and category data types, respectively (see the Advanced Pandas exercises).

Solution:

# your solution
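One possible approach, sketched on a small made-up frame because meals.csv is not reproduced here. The frame `sample` and its values are assumptions for illustration only, and the date format in the real file may differ.

```python
import pandas as pd

# Hypothetical stand-in for the cafeteria data (values are invented).
sample = pd.DataFrame({
    'date': ['2018-10-01', '2018-10-01', '2018-10-02'],
    'category': ['Mensa', 'Salat Bar', 'Mensa'],
    'name': ['Kartoffelsuppe (1, 3)', 'Salat', 'Nudeln mit Tomatensauce (2)'],
})

# Parse the date strings into Timestamps.
sample['date'] = pd.to_datetime(sample['date'])

# Store the repeated category labels as a categorical column.
sample['category'] = sample['category'].astype('category')

print(sample.dtypes)
```

The same two calls should carry over to `data`, provided the date strings there are in a format `pd.to_datetime` can parse.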

Count Categories#

For each category, give the number of meals.

Solution:

# your solution
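A minimal sketch of one way to count rows per category, again on an invented frame (`sample` and its labels are assumptions, not taken from the real data).

```python
import pandas as pd

# Hypothetical category column (labels are invented).
sample = pd.DataFrame({
    'category': ['Mensa', 'Salat Bar', 'Mensa', 'Bistro'],
})

# value_counts returns one row per distinct label with its frequency.
counts = sample['category'].value_counts()
print(counts)
```

`sample.groupby('category').size()` would be an alternative that yields the same numbers.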

Remove Categories#

Remove categories that do not contain full-fledged meal descriptions (e.g., 'Salat Bar').

Solution:

# your solution
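The filtering step could look roughly like this. The list `unwanted` is a hypothetical choice; which categories to drop in the real data has to be decided by inspecting it.

```python
import pandas as pd

# Hypothetical frame with a categorical 'category' column (values are invented).
sample = pd.DataFrame({
    'category': pd.Categorical(['Mensa', 'Salat Bar', 'Mensa', 'Bistro']),
    'name': ['Kartoffelsuppe', 'Salat', 'Nudeln', 'Schnitzel'],
})

# Assumed list of categories without full meal descriptions.
unwanted = ['Salat Bar']

# Keep only rows whose category is not in the unwanted list.
filtered = sample[~sample['category'].isin(unwanted)].copy()

# Dropping rows does not drop the labels themselves; remove the now unused ones.
filtered['category'] = filtered['category'].cat.remove_unused_categories()

print(filtered)
```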

Remove Allergens and Additives#

Most meals contain information on allergens and additives (numbers in parentheses). Remove this information to get more readable meal descriptions. Implement the removal procedure twice: once without and once with vectorized string operations. Measure and compare the execution times.

Solution:

data['name_backup'] = data['name']

%%timeit
# your solution without vectorized string operations

data['name'] = data['name_backup']
data = data.drop(columns=['name_backup'])

%%timeit
# your solution with vectorized string operations
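As a rough sketch of the comparison outside a notebook, the standard-library `timeit` module can stand in for the `%%timeit` magic. The sample strings and the regular expression are assumptions: the pattern only matches parentheses that contain digits, commas and spaces, which may not cover every annotation in the real data.

```python
import re
import timeit

import pandas as pd

# Hypothetical meal names with numeric annotations in parentheses.
names = pd.Series(
    ['Kartoffelsuppe (1, 3) mit Brot (7)', 'Nudeln (2)', 'Salat'] * 200
)

# Optional leading whitespace, then parentheses holding digits, commas, spaces.
pattern = r'\s*\([\d,\s]*\)'


def clean_loop(s):
    """Non-vectorized variant: process one element at a time."""
    out = s.copy()
    for i in range(len(out)):
        out.iloc[i] = re.sub(pattern, '', out.iloc[i])
    return out


def clean_vectorized(s):
    """Vectorized variant: a single call to the .str accessor."""
    return s.str.replace(pattern, '', regex=True)


t_loop = timeit.timeit(lambda: clean_loop(names), number=3)
t_vec = timeit.timeit(lambda: clean_vectorized(names), number=3)
print(f'loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s')
```

Both functions return the same result, so only the measured times differ. The actual numbers depend on the machine and on the size of the data.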

Simplify Meal Descriptions#

Create a new column 'simple' from the 'name' column by removing all lower-case words, punctuation marks, and any other characters, so that only words starting with an upper-case letter remain.

Solution:

# your solution
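One way to approach this is to extract the wanted words rather than remove the unwanted ones. The sketch below assumes German text, so the umlauts Ä, Ö and Ü are listed explicitly as upper-case starts; the sample names are invented.

```python
import pandas as pd

# Hypothetical meal descriptions (invented).
sample = pd.DataFrame({
    'name': [
        'Nudeln mit Tomatensauce, dazu Salat',
        'Hähnchen-Curry mit Reis',
        'Äpfel und Birnen',
    ],
})

# A word boundary followed by an upper-case letter and any word characters.
upper_word = r'\b[A-ZÄÖÜ]\w*'

# findall yields a list of matches per row; join glues each list back together.
sample['simple'] = sample['name'].str.findall(upper_word).str.join(' ')

print(sample['simple'])
```

Extracting with `str.findall` discards punctuation and lower-case words in one step, since anything that does not match the pattern is simply not kept.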

All Meals with…#

Given a keyword (e.g., 'Kartoffel'), get the number of meals containing the keyword and print all matching meal descriptions.

Solution:

# your solution
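A minimal sketch of keyword matching with a boolean mask, on invented names. The match here is case-sensitive and treats the keyword as a literal string; whether that is the desired behavior for the real data is an assumption.

```python
import pandas as pd

# Hypothetical meal descriptions (invented).
sample = pd.DataFrame({
    'name': [
        'Kartoffelsuppe',
        'Nudeln mit Tomatensauce',
        'Schnitzel mit Kartoffelsalat',
    ],
})

keyword = 'Kartoffel'

# regex=False treats the keyword literally; na=False counts missing names as no match.
mask = sample['name'].str.contains(keyword, regex=False, na=False)
matches = sample.loc[mask, 'name']

print(mask.sum(), 'meals contain', keyword)
for description in matches:
    print(description)
```

Passing `case=False` as well would also find names in which the keyword appears in lower case inside a compound word.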

Meal Plot#

For each day, get the number of meals containing some keyword (e.g., 'Kartoffel'). Call Series.plot to visualize the result.

Solution:

# your solution
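The per-day counts can be sketched by grouping the boolean keyword mask by date and summing it. The frame below is invented, and the plotting call is left as a comment because it needs matplotlib to be installed.

```python
import pandas as pd

# Hypothetical frame with parsed dates and meal names (invented).
sample = pd.DataFrame({
    'date': pd.to_datetime(
        ['2018-10-01', '2018-10-01', '2018-10-02', '2018-10-03']
    ),
    'name': [
        'Kartoffelsuppe',
        'Schnitzel mit Kartoffelsalat',
        'Nudeln',
        'Kartoffelgratin',
    ],
})

# True where the name contains the keyword (literal, case-sensitive match).
has_keyword = sample['name'].str.contains('Kartoffel', regex=False)

# Summing booleans per day counts the matching meals; days without a match give 0.
per_day = has_keyword.groupby(sample['date']).sum()
print(per_day)

# In a notebook with matplotlib available, the result could be drawn with:
# per_day.plot()
```

Grouping the mask instead of first filtering the rows keeps days without any match in the result, so gaps show up as zeros in the plot.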