Advanced Pandas#

Before solving these exercises you should have read Advanced Indexing, Dates and Times, Categorical Data, and Restructuring Data.

import pandas as pd
import numpy as np

Grades#

Use the following code to create a series containing student IDs as index and points in exam as data:

rng = np.random.default_rng(123)

n_students = 20
max_points = 40

student_ids = rng.integers(20000, 25000, n_students)
points = np.floor(rng.normal(0.6 * max_points, 0.2 * max_points, n_students))
points = points.clip(0, max_points).astype(np.int8)
exam_points = pd.Series(points, index=student_ids)

exam_points

Understand the Code#

What do the two points = ... lines in the above code do in detail?

Solution:

# your answer

Points to Grades#

Add a column to the series (resulting in a data frame) containing corresponding grades. Conversion from points to grade is as follows:

percent of points

grade

less than 40

5.0

at least 40

4.0

at least 54

3.7

at least 60

3.3

at least 68

3.0

at least 74

2.7

at least 80

2.3

at least 84

2.0

at least 88

1.7

at least 92

1.3

at least 96

1.0

Use pd.cut to get the grades. Result should look as follows:

       points grade
id                 
20077      24   3.3
23411      11   5.0
22964      33   2.3
...

The 'grade' column should be of categorical type.

Solution:

# your solution

Mean Grade#

Get the mean grade for all students who passed the exam (grade better than 5).

Solution:

# your solution

Cafeteria#

For these exercises we use the dataset obtained in the Cafeteria project.

data = pd.read_csv('meals.csv', names=['date', 'category', 'name', 'students', 'staff', 'guests'])

data

Dates#

Convert 'date' column to Timestamp. Hint: the pd.to_datetime function is very flexible.

Solution:

# your solution

Categories#

Set type of 'category' column to categorical. Print all categories.

Solution:

# your solution

Mean Price per Category#

Get mean students/staff/guests prices per category. Sort results by students price.

Solution:

# your solution

Prices over Time#

Drop rows with nan or 0.0 prices. Then get minimum, average, maxmium students prices per day. Create a data frame with three columns 'min', 'mean' 'max' and DatetimeIndex. Call the data frame’s plot method (works without arguments) to visualize the results.

Solution:

# your solution

Restructuring#

Create a data frame showing prices only, no meal names. Use dates for row index. Column index shall be multi-level with first level showing the category and second level showing the price level (students/staff/guests).

Solution:

# your solution