Advanced Pandas
Contents
Advanced Pandas#
Before solving these exercises you should have read Advanced Indexing, Dates and Times, Categorical Data, and Restructuring Data.
import pandas as pd
import numpy as np
Grades#
Use the following code to create a series containing student IDs as index and points in exam as data:
rng = np.random.default_rng(123)
n_students = 20
max_points = 40
student_ids = rng.integers(20000, 25000, n_students)
points = np.floor(rng.normal(0.6 * max_points, 0.2 * max_points, n_students))
points = points.clip(0, max_points).astype(np.int8)
exam_points = pd.Series(points, index=student_ids)
exam_points
Understand the Code#
What do the two points = ...
lines in the above code do in detail?
Solution:
# your answer
Points to Grades#
Add a column to the series (resulting in a data frame) containing corresponding grades. Conversion from points to grade is as follows:
percent of points |
grade |
---|---|
less than 40 |
5.0 |
at least 40 |
4.0 |
at least 54 |
3.7 |
at least 60 |
3.3 |
at least 68 |
3.0 |
at least 74 |
2.7 |
at least 80 |
2.3 |
at least 84 |
2.0 |
at least 88 |
1.7 |
at least 92 |
1.3 |
at least 96 |
1.0 |
Use pd.cut
to get the grades. Result should look as follows:
points grade
id
20077 24 3.3
23411 11 5.0
22964 33 2.3
...
The 'grade'
column should be of categorical type.
Solution:
# your solution
Mean Grade#
Get the mean grade for all students who passed the exam (grade better than 5).
Solution:
# your solution
Cafeteria#
For these exercises we use the dataset obtained in the Cafeteria project.
data = pd.read_csv('meals.csv', names=['date', 'category', 'name', 'students', 'staff', 'guests'])
data
Dates#
Convert 'date'
column to Timestamp
. Hint: the pd.to_datetime
function is very flexible.
Solution:
# your solution
Categories#
Set type of 'category'
column to categorical. Print all categories.
Solution:
# your solution
Mean Price per Category#
Get mean students/staff/guests prices per category. Sort results by students price.
Solution:
# your solution
Prices over Time#
Drop rows with nan
or 0.0
prices. Then get minimum, average, maxmium students prices per day. Create a data frame with three columns 'min'
, 'mean'
'max'
and DatetimeIndex
. Call the data frame’s plot
method (works without arguments) to visualize the results.
Solution:
# your solution
Restructuring#
Create a data frame showing prices only, no meal names. Use dates for row index. Column index shall be multi-level with first level showing the category and second level showing the price level (students/staff/guests).
Solution:
# your solution