Pandas Vectorization
Before solving these exercises you should have read High-Level Data Management with Pandas.
import pandas as pd
For these exercises we use the dataset obtained in the Cafeteria project.
data = pd.read_csv('meals.csv', names=['date', 'category', 'name', 'students', 'staff', 'guests'])
data
Preprocessing#
Data Types#
Convert the 'date' and 'category' columns to Timestamp and category, respectively (see Advanced Pandas exercises).
Solution:
# your solution
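One possible approach, sketched on a tiny made-up frame rather than the real `meals.csv`: `pd.to_datetime` parses the date strings and `astype('category')` converts the category column. The rows below and the ISO date format are assumptions for illustration; the real file may need an explicit `format=` argument.

```python
import pandas as pd

# Toy stand-in for the meals data; these rows are invented for illustration.
toy = pd.DataFrame({
    'date': ['2018-10-01', '2018-10-01', '2018-10-02'],
    'category': ['Essen 1', 'Salat Bar', 'Essen 1'],
})

# Parse strings into Timestamps (dtype datetime64).
toy['date'] = pd.to_datetime(toy['date'])

# Store repeated labels as a categorical column.
toy['category'] = toy['category'].astype('category')

print(toy.dtypes)
```

The same two assignments should carry over to `data` once the date format of the real file is known.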
Remove Categories#
Remove categories that do not contain full-fledged meal descriptions (e.g., 'Salat Bar').
Solution:
# your solution
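A minimal sketch of one way to do this, again on invented rows: build a boolean mask with `Series.isin` and keep only the rows outside the unwanted categories. The list `unwanted` is hypothetical and only contains the example from the task; which categories to drop in the real data has to be decided by inspecting it.

```python
import pandas as pd

# Toy stand-in for the meals data; these rows are invented for illustration.
toy = pd.DataFrame({
    'category': pd.Categorical(['Essen 1', 'Salat Bar', 'Essen 2', 'Salat Bar']),
    'name': ['Kartoffelsuppe', 'Salat', 'Nudeln mit Tomatensauce', 'Salat'],
})

# Hypothetical list of categories without full meal descriptions.
unwanted = ['Salat Bar']

# Keep rows whose category is not in the unwanted list.
mask = ~toy['category'].isin(unwanted)
toy = toy[mask].copy()

# Dropping rows does not drop the category labels themselves.
toy['category'] = toy['category'].cat.remove_unused_categories()

print(toy)
```

Without the call to `remove_unused_categories`, the removed label would still be listed in `toy['category'].cat.categories`.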
Remove Allergens and Additives#
Most meals contain information on allergens and additives (numbers in parentheses). Remove this information to get more readable meal descriptions. Implement the removal procedure twice: without and with vectorized string operations. Measure and compare execution times.
Solution:
data['name_backup'] = data['name']
%%timeit
# your solution without vectorized string operations
data['name'] = data['name_backup']
data = data.drop(columns=['name_backup'])
%%timeit
# your solution with vectorized string operations
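The two variants could look roughly like the following sketch, which uses the standard `timeit` module instead of the `%%timeit` cell magic so that it also runs outside a notebook. The meal names are invented, and the regular expression assumes that the parentheses contain only digits, commas and spaces; if the real data uses other codes, the pattern needs widening.

```python
import re
import timeit

import pandas as pd

# Invented meal names, repeated to get measurable run times.
names = pd.Series([
    'Kartoffelsuppe (1, 3, 9)',
    'Nudeln (1, 3) mit Tomatensauce (9)',
    'Reis',
] * 1000)

# Assumption: a group is "(digits, commas, spaces)", plus leading whitespace.
pattern = r'\s*\([\d,\s]+\)'


def strip_loop(series):
    """Remove the groups with an explicit Python loop."""
    compiled = re.compile(pattern)
    cleaned = []
    for text in series:
        cleaned.append(compiled.sub('', text))
    return pd.Series(cleaned, index=series.index)


def strip_vectorized(series):
    """Remove the groups with pandas' vectorized string methods."""
    return series.str.replace(pattern, '', regex=True)


result_loop = strip_loop(names)
result_vec = strip_vectorized(names)

# Time a few repetitions of each variant; numbers depend on the machine.
t_loop = timeit.timeit(lambda: strip_loop(names), number=3)
t_vec = timeit.timeit(lambda: strip_vectorized(names), number=3)
print(f'loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s')
```

Both functions return the same cleaned names, so only the execution time differs. How large the difference is depends on the pandas version and the data size.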
Simplify Meal Descriptions#
Create a new column 'simple' from the 'name' column by removing all lower-case words, all punctuation marks and so on. Only words starting with an upper-case letter are allowed.
Solution:
# your solution
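One way to express "keep only capitalized words" with vectorized string methods is to extract the matching words with `Series.str.findall` and join them again, instead of deleting everything else. The sketch below uses invented names and assumes that an upper-case initial is one of A-Z or Ä, Ö, Ü.

```python
import pandas as pd

# Invented meal names for illustration.
names = pd.Series([
    'Nudeln mit Tomatensauce, dazu Salat',
    'Kartoffelsuppe mit Würstchen',
])

# A word boundary, one upper-case letter, then any further word characters.
capitalized = r'\b[A-ZÄÖÜ]\w*'

# findall yields a list of words per row; join glues them with spaces.
simple = names.str.findall(capitalized).str.join(' ')

print(simple.tolist())
```

Because punctuation never matches the pattern, it disappears without a separate removal step. Hyphenated compounds are split into their capitalized parts under this pattern.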
All Meals with…#
Given a keyword (e.g., 'Kartoffel'), get the number of meals containing the keyword and print all matching meal descriptions.
Solution:
# your solution
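A possible sketch with `Series.str.contains`, using invented names: the boolean mask it returns can be summed to count matches and used to select the matching descriptions. The match here is case-sensitive and treats the keyword as a plain substring (`regex=False`), which is an assumption about what "containing" should mean.

```python
import pandas as pd

# Invented meal names for illustration.
names = pd.Series([
    'Kartoffelsuppe',
    'Nudeln mit Tomatensauce',
    'Schnitzel mit Kartoffelsalat',
])

keyword = 'Kartoffel'

# True where the keyword occurs as a substring; missing names count as False.
mask = names.str.contains(keyword, regex=False, na=False)

n_meals = int(mask.sum())
print(n_meals)

for name in names[mask]:
    print(name)
```

Passing `case=False` would also catch compounds in which the keyword appears in lower case.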
Meal Plot#
For each day, get the number of meals containing some keyword (e.g., 'Kartoffel'). Call Series.plot to visualize the result.
Solution:
# your solution
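The per-day counts can be obtained by grouping the keyword mask by date, as in this sketch on invented rows. The column names follow the `read_csv` call above; everything else is an assumption for illustration.

```python
import pandas as pd

# Toy stand-in for the meals data; these rows are invented for illustration.
toy = pd.DataFrame({
    'date': pd.to_datetime(
        ['2018-10-01', '2018-10-01', '2018-10-02', '2018-10-03']
    ),
    'name': [
        'Kartoffelsuppe',
        'Schnitzel mit Kartoffelsalat',
        'Nudeln mit Tomatensauce',
        'Kartoffelpuffer',
    ],
})

keyword = 'Kartoffel'

# Boolean mask: does the meal name contain the keyword?
contains = toy['name'].str.contains(keyword, regex=False, na=False)

# Summing booleans per date counts the matching meals of each day.
per_day = contains.groupby(toy['date']).sum()

print(per_day)
```

Calling `per_day.plot()` then draws the counts over time (this needs matplotlib to be installed). Days that do not occur in the data at all are missing from the index rather than shown as zero.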