Pandas Indexing#
Before solving these exercises you should have read Advanced Indexing and Dates and Times.
import pandas as pd
Cars#
For these exercises we use a dataset describing used cars, obtained from kaggle.com. Licences: Open Data Commons Database Contents License (DbCL) v1.0 and Open Data Commons Open Database License (ODbL).
data = pd.read_csv('cars.csv')
Create Multi-Level Index#
Create a multi-level index for the data frame from columns 'name' and 'year'.
Solution:
# your solution
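As a hint rather than a reference solution, the idea can be sketched on a small made-up frame. The frame `toy` and its 'price' column are hypothetical stand-ins and are not taken from cars.csv.

```python
import pandas as pd

# Hypothetical toy data; only the column names 'name' and 'year' come from the task.
toy = pd.DataFrame({
    'name': ['Car A', 'Car A', 'Car B'],
    'year': [2017, 2018, 2018],
    'price': [5000, 6500, 7200],
})

# Move both columns into the index; sorting keeps later lookups efficient.
indexed = toy.set_index(['name', 'year']).sort_index()

print(indexed.index.names)
print(indexed.loc[('Car A', 2018), 'price'])
```

Passing a list of column names to `set_index` yields one index level per column, in the order given.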
Diesel#
Select all 2018 cars and use value_counts to get the percentage of Diesel cars.
Solution:
# your solution
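One possible approach, sketched on made-up data: the column name 'fuel' and its values are assumptions here, so check the actual column names in cars.csv first. The sketch also assumes 'year' is an ordinary column, not an index level.

```python
import pandas as pd

# Hypothetical toy data; 'fuel' is an assumed column name.
toy = pd.DataFrame({
    'year': [2018, 2018, 2018, 2018, 2017],
    'fuel': ['Diesel', 'Petrol', 'Diesel', 'Diesel', 'Petrol'],
})

# Restrict to 2018, then let value_counts compute relative frequencies.
shares = toy.loc[toy['year'] == 2018, 'fuel'].value_counts(normalize=True)
diesel_percent = 100 * shares['Diesel']

print(diesel_percent)
```

With `normalize=True`, `value_counts` returns fractions that sum to 1, so no manual division by the row count is needed.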
Old Cars#
Print all cars with more than 100000 kilometers driven and manufactured before 2000.
Solution:
# your solution
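A minimal sketch of combining two conditions with a boolean mask, again on made-up data. The column name 'km_driven' is an assumption and may differ in cars.csv.

```python
import pandas as pd

# Hypothetical toy data; 'km_driven' is an assumed column name.
toy = pd.DataFrame({
    'name': ['Car A', 'Car B', 'Car C', 'Car D'],
    'year': [1998, 1999, 2005, 1995],
    'km_driven': [150000, 80000, 200000, 100000],
})

# Each comparison yields a boolean series; '&' combines them element-wise.
# The parentheses are required because '&' binds tighter than '>' and '<'.
mask = (toy['km_driven'] > 100000) & (toy['year'] < 2000)
old = toy[mask]

print(old)
```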
E-Mails#
Consider an email account receiving emails every day. Use the following code to generate a list times of time stamps representing arrival times of emails.
import numpy as np
rng = np.random.default_rng(0)
n_mails = 1000
start_time = pd.Timestamp('2019-01-01 00:00:00')
end_time = pd.Timestamp('2020-01-01 00:00:00')
total_seconds = int((end_time - start_time).total_seconds())
seconds = rng.integers(0, total_seconds, n_mails)
times = [start_time + pd.Timedelta(sec, unit='s') for sec in seconds]
del seconds
Mails per Day#
Given the list of time stamps of incoming mails, create a series with daily mail counts.
Solution:
# your solution
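One way to approach this, sketched on three hand-picked time stamps (the list `toy_times` is hypothetical and stands in for `times`): turn the time stamps into the index of a series of ones, then resample by day.

```python
import pandas as pd

# Hypothetical stand-in for the generated list of arrival times.
toy_times = [
    pd.Timestamp('2019-01-01 06:30:00'),
    pd.Timestamp('2019-01-01 23:10:00'),
    pd.Timestamp('2019-01-03 08:00:00'),
]

# One entry per mail, indexed by arrival time.
mails = pd.Series(1, index=pd.DatetimeIndex(toy_times)).sort_index()

# Summing the ones per calendar day gives the daily counts.
daily = mails.resample('D').sum()

print(daily)
```

Unlike grouping by date, resampling also produces entries for days without any mail, here 2019-01-02 with count 0.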
Mails per Morning#
Each day, the user answers only mails received no later than 7:00am that day. From the list of time stamps, create a series with the number of mails to answer at 7:00am each day. Hint: Have a look at the offset argument of Series.resample; label might be of interest, too.
Solution:
# your solution
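A sketch of how the hinted arguments could interact, on hypothetical time stamps. Using `closed='right'` is an interpretation of "no later than 7:00am" (a mail arriving at exactly 7:00 is counted that morning), not something the task prescribes.

```python
import pandas as pd

# Hypothetical stand-in for the generated list of arrival times.
toy_times = [
    pd.Timestamp('2019-01-01 06:30:00'),
    pd.Timestamp('2019-01-01 07:00:00'),
    pd.Timestamp('2019-01-01 23:10:00'),
    pd.Timestamp('2019-01-03 08:00:00'),
]
mails = pd.Series(1, index=pd.DatetimeIndex(toy_times)).sort_index()

# Daily bins shifted by 7 hours: each bin runs from 7:00 of one day
# (exclusive) to 7:00 of the next day (inclusive) and is labelled
# with its right edge, that is, the morning the mails are answered.
morning = mails.resample('D', offset='7h', closed='right', label='right').sum()

print(morning)
```

In this toy example the two mails before or at 7:00 on 2019-01-01 are labelled 2019-01-01 07:00, while the late-evening mail falls to the next morning.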
Mails per Business Day Morning#
Assume the user reads and answers emails on business days only (again, at 7:00am). Create a series containing the number of mails to process on each business day.
Solution:
# your solution
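One possible route, sketched on a hand-made series of morning counts (the series `morning` and its values are hypothetical): map every 7:00am label to the next business day and sum the counts per mapped label. Other approaches, such as resampling with a business-day frequency, may work as well.

```python
import pandas as pd

# Hypothetical morning counts from Friday 2019-01-04 to Tuesday 2019-01-08.
morning = pd.Series(
    [3, 1, 2, 4, 5],
    index=pd.date_range('2019-01-04 07:00', periods=5, freq='D'),
)

# rollforward leaves business days untouched and moves Saturday and
# Sunday to the following Monday, keeping the time of day.
bday = pd.offsets.BDay()
business = morning.groupby(morning.index.map(bday.rollforward)).sum()

print(business)
```

In this toy example Monday collects the weekend mornings plus its own morning, 1 + 2 + 4 = 7. Public holidays are not handled by the plain business-day offset.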
Vacation#
From the results of the previous task get the number of mails arriving during winter vacation in January and February. Use a variable for the year of interest:
year = 2019
Write code which works for all years (leap year or not).
Solution:
# your solution
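A sketch of one way to avoid hard-coding the last day of February: slicing a DatetimeIndex with partial date strings includes the whole end month, whatever its length. The helper `vacation_total` and the series `toy` are hypothetical names used only for illustration.

```python
import pandas as pd

def vacation_total(counts, year):
    """Sum counts from 1 January to the end of February of the given year."""
    # The end label 'YYYY-02' covers all of February, leap year or not.
    return counts.loc[f'{year}-01':f'{year}-02'].sum()

# Hypothetical series with exactly one mail per day over two years.
toy = pd.Series(
    1,
    index=pd.date_range('2019-01-01 07:00', '2020-12-31 07:00', freq='D'),
)

print(vacation_total(toy, 2019))
print(vacation_total(toy, 2020))
```

With one mail per day, the totals equal the number of days in January and February, which makes the leap-year behaviour easy to check.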