Pandas Basics

Before solving these basic Pandas exercises, you should have read Series and Data Frames.

For these exercises, we use a dataset describing used cars, obtained from kaggle.com. Licences: Open Data Commons Database Contents License (DbCL) v1.0 and Open Data Commons Open Database License (ODbL).

import pandas as pd

data = pd.read_csv('cars.csv')

First Look

Basic Information

Print the following information about the data frame data:

  • first 10 rows,

  • number of rows,

  • basic statistical information,

  • column labels, data types, memory usage.

Solution:

# your solution
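One possible approach, sketched on a tiny made-up frame that reuses some of the exercise's column names (the values are invented for illustration and are not taken from cars.csv). head, len, describe and info are standard pandas calls; note that info prints its report directly and returns None.

```python
import pandas as pd

# Hypothetical toy frame; values are invented, NOT from cars.csv.
data = pd.DataFrame({
    'name': ['Maruti Swift Dzire VDI', 'Model A', 'Maruti Swift Dzire VDI'],
    'year': [2014, 2012, 2017],
    'selling_price': [450000, 350000, 600000],
    'km_driven': [70000, 100000, 40000],
})

print(data.head(10))    # first 10 rows (fewer if the frame is shorter)
print(len(data))        # number of rows; data.shape[0] works as well
print(data.describe())  # count, mean, std, min, quartiles, max of numeric columns
data.info()             # column labels, dtypes, non-null counts, memory usage
```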

Missing Values

Are there missing values in data?

Solution:

# your answer
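A sketch of one way to check, assuming a small invented frame with deliberately missing entries: DataFrame.isna marks each missing cell, so summing per column counts them and any().any() reduces everything to a single yes/no answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two missing cells (invented values).
data = pd.DataFrame({
    'name': ['Maruti Swift Dzire VDI', 'Model A', None],
    'year': [2014, np.nan, 2017],
})

missing_per_column = data.isna().sum()  # number of NaN/None per column
has_missing = data.isna().any().any()   # True if any cell at all is missing
print(missing_per_column)
print(has_missing)
```

The non-null counts printed by DataFrame.info give the same information in a different form.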

Value Counts

Use DataFrame.nunique to get the number of different values per column.

Solution:

# your solution
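A minimal illustration on invented data (not cars.csv): nunique returns a Series that maps each column label to its number of distinct values.

```python
import pandas as pd

# Hypothetical toy frame; values are made up for illustration.
data = pd.DataFrame({
    'name': ['Maruti Swift Dzire VDI', 'Model A', 'Maruti Swift Dzire VDI'],
    'year': [2014, 2012, 2014],
    'transmission': ['Manual', 'Manual', 'Manual'],
})

# Distinct values per column; NaN is not counted unless dropna=False is passed.
counts = data.nunique()
print(counts)
```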

Unique Car Models

Use DataFrame.value_counts to get the number of unique 'name'-'year' combinations.

Solution:

# your solution
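One possible approach, shown on a toy frame with invented rows: DataFrame.value_counts with the subset parameter (available since pandas 1.1) counts how often each ('name', 'year') pair occurs. The result has one entry per distinct pair, so its length is the number of combinations.

```python
import pandas as pd

# Hypothetical toy frame; values are invented.
data = pd.DataFrame({
    'name': ['Maruti Swift Dzire VDI', 'Maruti Swift Dzire VDI',
             'Maruti Swift Dzire VDI', 'Model A'],
    'year': [2014, 2014, 2017, 2014],
})

# Series indexed by (name, year) pairs, sorted by frequency.
combos = data.value_counts(subset=['name', 'year'])
print(combos)
print(len(combos))  # number of distinct name-year combinations
```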

Restructure Columns

New Columns

Append a column 'manual_trans' containing True where column 'transmission' shows 'Manual', else False.

Append a column 'age' showing a car’s age (current year minus 'year').

Solution:

# your solution
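A sketch of both columns on an invented toy frame. Comparing a column with a string yields a boolean Series directly. For "now", this sketch assumes the current calendar year from datetime.date.today(); subtracting a Series from an int broadcasts over all rows.

```python
import datetime
import pandas as pd

# Hypothetical toy frame; values are invented.
data = pd.DataFrame({
    'name': ['Maruti Swift Dzire VDI', 'Model A'],
    'year': [2014, 2012],
    'transmission': ['Manual', 'Automatic'],
})

# Element-wise comparison: True where 'transmission' is 'Manual', else False.
data['manual_trans'] = data['transmission'] == 'Manual'

# Age in years, assuming "now" means the current calendar year.
current_year = datetime.date.today().year
data['age'] = current_year - data['year']
print(data)
```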

Remove Columns

Remove columns 'seller_type', 'transmission', and 'owner'.

Solution:

# your solution
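One way to do it, sketched on a one-row toy frame with invented values: DataFrame.drop with the columns parameter returns a new frame by default, so the result has to be assigned back (or inplace=True passed).

```python
import pandas as pd

# Hypothetical one-row toy frame; values are invented.
data = pd.DataFrame({
    'name': ['Maruti Swift Dzire VDI'],
    'year': [2014],
    'seller_type': ['Individual'],
    'transmission': ['Manual'],
    'owner': ['First Owner'],
})

# drop returns a copy without the listed columns; reassign to keep it.
data = data.drop(columns=['seller_type', 'transmission', 'owner'])
print(data.columns.tolist())
```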

Mean Price

Series with String Index

Create a Pandas series price with column 'name' as index and column 'selling_price' as data.

Solution:

# your solution
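A possible construction on invented data. One pitfall worth knowing: pd.Series(data['selling_price'], index=data['name']) would align the new index labels with the column's existing integer index and produce NaN, so this sketch passes the raw values instead. data.set_index('name')['selling_price'] is an equivalent alternative.

```python
import pandas as pd

# Hypothetical toy frame; values are invented.
data = pd.DataFrame({
    'name': ['Maruti Swift Dzire VDI', 'Model A', 'Maruti Swift Dzire VDI'],
    'selling_price': [450000, 350000, 600000],
})

# Use .values so pandas does not try to align on the old RangeIndex.
price = pd.Series(data['selling_price'].values, index=data['name'])
print(price)
```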

Mean

Calculate the mean price for the model 'Maruti Swift Dzire VDI'.

Solution:

# your solution
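A sketch assuming a price series like the one from the previous exercise, rebuilt here from invented values. Because the index contains duplicate labels, selecting with a one-element list keeps the result a Series even if the model appears only once.

```python
import pandas as pd

# Hypothetical price series with a duplicated label (invented values).
price = pd.Series(
    [450000, 350000, 600000],
    index=['Maruti Swift Dzire VDI', 'Model A', 'Maruti Swift Dzire VDI'],
)

# A list of labels always returns a Series; a single label could return a scalar.
swift_prices = price.loc[['Maruti Swift Dzire VDI']]
print(swift_prices.mean())
```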

Kilometers per Year

Boolean Indexing

Use boolean row indexing to get a data frame one_model with columns 'km_driven' and 'age' containing only rows with 'name' equal to 'Maruti Swift Dzire VDI'.

Solution:

# your solution
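One possible sketch on invented data, with 'age' computed as in the New Columns exercise. The boolean mask selects rows and the list selects columns in a single loc call. The trailing .copy() is a design choice: it makes one_model independent of data, so adding a column to it later does not raise a SettingWithCopyWarning.

```python
import datetime
import pandas as pd

# Hypothetical toy frame; values are invented.
data = pd.DataFrame({
    'name': ['Maruti Swift Dzire VDI', 'Model A', 'Maruti Swift Dzire VDI'],
    'year': [2014, 2012, 2017],
    'km_driven': [70000, 100000, 40000],
})
data['age'] = datetime.date.today().year - data['year']

# Rows where the mask is True, restricted to the two requested columns.
mask = data['name'] == 'Maruti Swift Dzire VDI'
one_model = data.loc[mask, ['km_driven', 'age']].copy()
print(one_model)
```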

New Column

Add a column 'km_per_year' to the one_model data frame containing the kilometers driven per year ('km_driven' divided by 'age').

Solution:

# your solution
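A minimal sketch on a hypothetical one_model frame with invented values: dividing one column by another works element-wise. A car built in the current year has age 0, which would yield inf here, so real data may need such rows filtered out first.

```python
import datetime
import pandas as pd

# Hypothetical one_model frame; values are invented.
current_year = datetime.date.today().year
one_model = pd.DataFrame({
    'km_driven': [70000, 40000],
    'age': [current_year - 2014, current_year - 2017],
})

# Element-wise division of two columns; age 0 would give inf.
one_model['km_per_year'] = one_model['km_driven'] / one_model['age']
print(one_model)
```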

Mean

Get the mean of column 'km_per_year' in one_model.

Solution:

# your solution
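For completeness, a short sketch with invented kilometer values: Series.mean skips NaN by default (skipna=True), but it does not skip inf, so any age-0 rows from the previous step would distort the result.

```python
import pandas as pd

# Hypothetical one_model frame with invented km_per_year values.
one_model = pd.DataFrame({'km_per_year': [7000.0, 5000.0, 6000.0]})

mean_km = one_model['km_per_year'].mean()
print(mean_km)
```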

Oldest Car

Find the oldest car in data and print its name and manufacturing year. Have a look at Pandas’ documentation for suitable functions.

Solution:

# your solution
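One possible approach, sketched on invented data: Series.idxmin returns the index label of the smallest 'year', and loc then fetches that row's name and year. DataFrame.nsmallest(1, 'year') would work as well. If several cars share the oldest year, idxmin reports only the first one; data[data['year'] == data['year'].min()] would list them all.

```python
import pandas as pd

# Hypothetical toy frame; values are invented.
data = pd.DataFrame({
    'name': ['Maruti Swift Dzire VDI', 'Model A', 'Model B'],
    'year': [2014, 2012, 1995],
})

# Label of the first row with the minimal year, then select two columns.
oldest = data.loc[data['year'].idxmin(), ['name', 'year']]
print(oldest['name'], oldest['year'])
```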