Pandas Indexing#

Before solving these exercises you should have read Advanced Indexing and Dates and Times.

import pandas as pd

Cars#

For these exercises we use a dataset describing used cars obtained from kaggle.com. Licences: Open Data Commons Database Contents License (DbCL) v1.0 and Open Data Commons Open Database License (ODbL).

data = pd.read_csv('cars.csv')

Create Multi-Level Index#

Create a multi-level index for the data frame from columns 'name' and 'year'.
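Hint: the following is a minimal sketch of the general technique on a small made-up data frame, not on the cars data. The frame `toy` and its columns `city`, `year` and `visitors` are hypothetical names chosen only for illustration.

```python
import pandas as pd

# Toy data (hypothetical, unrelated to cars.csv)
toy = pd.DataFrame({
    'city': ['Berlin', 'Berlin', 'Paris', 'Paris'],
    'year': [2019, 2020, 2019, 2020],
    'visitors': [10, 12, 20, 18],
})

# set_index with a list of column names moves those columns
# into a multi-level row index (one level per column)
toy_indexed = toy.set_index(['city', 'year'])
print(toy_indexed)
```

The indexed columns no longer appear among the regular columns; they are available as index levels instead.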

Solution:

# your solution

Select Model#

Print all rows for the 'Maruti Swift Dzire VDI' 2018 model.
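Hint: with a two-level index, a tuple holding one label per level addresses rows. The sketch below uses a made-up frame (`toy`, `city`, `year`, `visitors` are hypothetical names) in which one label combination occurs twice.

```python
import pandas as pd

# Toy data (hypothetical); ('Paris', 2020) appears twice on purpose
toy = pd.DataFrame({
    'city': ['Berlin', 'Berlin', 'Paris', 'Paris', 'Paris'],
    'year': [2019, 2020, 2019, 2020, 2020],
    'visitors': [10, 12, 20, 18, 7],
}).set_index(['city', 'year']).sort_index()

# The tuple selects rows by both index levels, ':' keeps all columns
paris_2020 = toy.loc[('Paris', 2020), :]
print(paris_2020)
```

Sorting the index first is optional here, but it avoids performance warnings when selecting from larger multi-level indexes.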

Solution:

# your solution

Diesel#

Select all 2018 cars and use value_counts to get the percentage of Diesel cars.
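Hint: `value_counts` accepts `normalize=True` to return relative frequencies instead of absolute counts. A small sketch on made-up values (the series `fuel` is hypothetical):

```python
import pandas as pd

# Toy series (hypothetical values)
fuel = pd.Series(['Diesel', 'Petrol', 'Diesel', 'Diesel'])

# normalize=True yields fractions that sum to 1
shares = fuel.value_counts(normalize=True)

# Convert the fraction for one category to a percentage
diesel_percent = 100 * shares['Diesel']
print(diesel_percent)
```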

Solution:

# your solution

Old Cars#

Print all cars with more than 100000 kilometers driven and manufactured before 2000.
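Hint: a condition on a column and a condition on an index level can be combined into one boolean mask. The sketch assumes that the year lives in an index level, as in a frame indexed by two columns; `toy`, `model`, `year` and `km` are hypothetical names.

```python
import pandas as pd

# Toy data (hypothetical) with a two-level index
toy = pd.DataFrame({
    'model': ['A', 'A', 'B', 'C'],
    'year': [1995, 2005, 1998, 1999],
    'km': [150, 200, 50, 120],
}).set_index(['model', 'year'])

# get_level_values returns the labels of one index level for every row
years = toy.index.get_level_values('year')

# Elementwise AND of a column condition and an index condition
mask = (toy['km'] > 100) & (years < 2000)
old_and_used = toy[mask]
print(old_and_used)
```

If the year is still a regular column, the same mask can be written with `toy['year'] < 2000` instead.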

Solution:

# your solution

E-Mails#

Consider an email account receiving emails every day. Use the following code to generate a list, times, of time stamps representing the arrival times of emails.

import numpy as np
rng = np.random.default_rng(0)

n_mails = 1000
start_time = pd.Timestamp('2019-01-01 00:00:00')
end_time = pd.Timestamp('2020-01-01 00:00:00')

total_seconds = int((end_time - start_time).total_seconds())
seconds = rng.integers(0, total_seconds, n_mails)
times = [start_time + pd.Timedelta(sec, unit='s') for sec in seconds]
del seconds

Mails per Day#

Given the list of time stamps of incoming mails, create a series with daily mail counts.
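Hint: one possible approach is to turn the time stamps into the index of a series of ones and to resample by day. The sketch uses three made-up time stamps; `stamps`, `events` and `daily` are hypothetical names.

```python
import pandas as pd

# Toy time stamps (hypothetical)
stamps = [
    pd.Timestamp('2021-03-01 08:00'),
    pd.Timestamp('2021-03-01 17:30'),
    pd.Timestamp('2021-03-03 06:15'),
]

# One entry of value 1 per event, indexed by arrival time
events = pd.Series(1, index=pd.DatetimeIndex(stamps)).sort_index()

# Summing per calendar day gives daily counts; days without events get 0
daily = events.resample('D').sum()
print(daily)
```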

Solution:

# your solution

Mails per Morning#

Each day the user answers only the mails received up to 7:00am that day. From the list of time stamps, create a series with the daily mail counts at 7:00am. Hint: Have a look at the offset argument of Series.resample; label might be of interest, too.
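To illustrate how `offset`, `closed` and `label` interact, here is a sketch on four made-up time stamps. It assumes that a mail arriving exactly at 7:00am still belongs to that morning, which is one reading of the task; `stamps`, `events` and `counts` are hypothetical names.

```python
import pandas as pd

# Toy time stamps (hypothetical)
stamps = [
    pd.Timestamp('2021-03-01 06:00'),
    pd.Timestamp('2021-03-01 07:00'),
    pd.Timestamp('2021-03-01 09:00'),
    pd.Timestamp('2021-03-02 06:59'),
]
events = pd.Series(1, index=pd.DatetimeIndex(stamps)).sort_index()

# offset='7h' shifts the daily bin edges from midnight to 7:00am,
# closed='right' puts an event at exactly 7:00am into the bin ending then,
# label='right' names each bin by its end (the moment mails are read)
counts = events.resample('D', offset='7h', closed='right', label='right').sum()
print(counts)
```

With these settings each label is a 7:00am time stamp and the value is the number of events in the 24 hours ending at that moment.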

Solution:

# your solution

Mails per Business Day Morning#

Assume the user reads and answers emails on business days only (again, at 7:00am). Create a series containing the number of mails to process on each business day.
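One possible approach, sketched on made-up daily counts, is to move every date to the next business day and then sum per target day. This swaps in `pd.offsets.BDay().rollforward` for illustration; other routes (for example resampling with a business-day frequency) may be closer to what the referenced chapters intend. The names `daily`, `targets` and `per_bday` are hypothetical.

```python
import pandas as pd

# Toy daily counts (hypothetical): Friday to Monday, 5-8 March 2021
daily = pd.Series(
    [1, 2, 3, 4],
    index=pd.date_range('2021-03-05', periods=4, freq='D'),
)

# rollforward leaves business days unchanged and moves
# Saturday and Sunday to the following Monday
bday = pd.offsets.BDay()
targets = pd.DatetimeIndex([bday.rollforward(t) for t in daily.index])

# Sum all counts that end up on the same business day
per_bday = daily.groupby(targets).sum()
print(per_bday)
```

Note that the default business-day offset only skips weekends; public holidays are not taken into account.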

Solution:

# your solution

Vacation#

From the results of the previous task, get the number of mails arriving during the winter vacation in January and February. Use a variable for the year of interest:

year = 2019

Write code that works for all years (leap year or not).
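Hint: slicing a series with a DatetimeIndex by month strings avoids hard-coding the last day of February. The sketch below uses a made-up series with one entry per day; the helper `jan_feb_total` and the series `ones` are hypothetical names.

```python
import pandas as pd

# Toy series (hypothetical): value 1 for every day of 2019 and 2020
ones = pd.Series(1, index=pd.date_range('2019-01-01', '2020-12-31', freq='D'))

def jan_feb_total(series, year):
    """Sum all entries from 1 January to the end of February of the given year."""
    # Partial string slicing includes the whole end month,
    # so 29 February is covered automatically in leap years
    return series.loc[f'{year}-01':f'{year}-02'].sum()

print(jan_feb_total(ones, 2019))  # 31 + 28 days
print(jan_feb_total(ones, 2020))  # 31 + 29 days
```

This kind of slicing requires a sorted DatetimeIndex.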

Solution:

# your solution