Load QMNIST

In this project we develop a Python module for loading and preprocessing QMNIST images and metadata. Prerequisites:

Reading Data

Task: Get the QMNIST training and test data from the QMNIST GitHub repository (4 files ending with ...idx3-ubyte.gz or ...idx2-int.gz) and find information on the file format.

Task: Write a function load which reads images and metadata from the QMNIST files. Parameters:

  • path: defaulting to '', path of directory with data files.

  • subset: defaulting to 'train' (load training data), passing 'test' loads test data.

  • as_list: defaulting to False (return one large array), passing True returns a list of images.

Return values:

  • NumPy array of shape (60000, 28, 28) or list of 60000 NumPy arrays of shape (28, 28) (range 0…1, type float16), depending on parameter as_list.

  • NumPy array of shape (60000, ) containing classes (type uint8).

  • NumPy array of shape (60000, ) containing series IDs (type uint8).

  • NumPy array of shape (60000, ) containing writer IDs (type uint16).
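The requested data types can be produced with astype. The following is a minimal sketch on made-up data, not the solution. It assumes that pixel values are stored as unsigned bytes (0…255) and that the label file is a table with one row per image whose first three columns hold class, series ID and writer ID. Verify the column order against the format information from the repository. All values below are invented.

```python
import numpy as np

# made-up stand-ins for raw data as it might come out of the files
raw_images = np.array([[[0, 255], [51, 102]]], dtype=np.uint8)      # shape (1, 2, 2)
raw_labels = np.array([[5, 4, 1234, 0, 0, 0, 0, 0]], dtype='>i4')   # one row per image

# scale bytes to the range 0...1 and store as float16
images = (raw_images / 255).astype(np.float16)

# pick metadata columns (column order is an assumption, see text above)
classes = raw_labels[:, 0].astype(np.uint8)
series = raw_labels[:, 1].astype(np.uint8)
writers = raw_labels[:, 2].astype(np.uint16)

print(images.dtype, classes.dtype, series.dtype, writers.dtype)
```

Writer IDs need uint16 because values such as the invented 1234 do not fit into a single byte.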

Test your function and show the first and last images of the training and test data. Print the corresponding metainformation. You may use the code from Image Processing with NumPy to show images.

Hint

Taking the obvious route via the zipfile module and np.fromfile fails for two reasons:

  1. The QMNIST files are gzip-compressed files, not ZIP archives, so Python’s zipfile module cannot read them. Use the gzip module from Python’s standard library instead.

  2. NumPy’s fromfile is not compatible with file objects created by the gzip module. The fromfile function will read compressed instead of uncompressed data (for some very knotty technical reasons). Thus, read with the file object’s read method and use np.frombuffer.
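The second point of the hint can be tried out without the real files. The following is a minimal sketch, not the solution. It builds a tiny gzip-compressed buffer that mimics an IDX layout (two zero bytes, a type code, the number of dimensions, big-endian 4-byte dimension sizes, then raw data) and reads it back with gzip and np.frombuffer. The helper name read_idx_images and the synthetic data are assumptions for illustration. Check the header layout against the format description you found.

```python
import gzip
import io
import struct

import numpy as np


def read_idx_images(fileobj):
    """Read an unsigned-byte IDX array from a gzip-compressed file object (hypothetical helper)."""
    with gzip.open(fileobj, 'rb') as f:
        raw = f.read()  # bytes object holding the uncompressed data

    # header: two zero bytes, type code, number of dimensions
    zero, type_code, ndim = struct.unpack('>HBB', raw[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError('not an unsigned-byte IDX file')

    # one big-endian 4-byte integer per dimension
    shape = struct.unpack('>' + 'I' * ndim, raw[4:4 + 4 * ndim])

    # interpret the remaining bytes as pixel values, skipping the header
    data = np.frombuffer(raw, dtype=np.uint8, offset=4 + 4 * ndim)
    return data.reshape(shape)


# synthetic stand-in for an image file: 2 images of size 3 x 3
pixels = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)
header = struct.pack('>HBB', 0, 0x08, 3) + struct.pack('>III', 2, 3, 3)
compressed = io.BytesIO(gzip.compress(header + pixels.tobytes()))

images = read_idx_images(compressed)
print(images.shape, images.dtype)
```

Note that np.frombuffer returns a read-only view on the bytes object. Converting to another type, for instance with astype, creates a writable copy.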

Solution:

# your solution

Preprocessing

Before images can be used, some preprocessing steps might be appropriate. Given a list of preprocessing steps, we would like to have a function which applies all the steps to all images.

Task: Write a function preprocess which applies a list of preprocessing steps to all images. Parameters:

  • images: large NumPy array or list of arrays (images to be processed).

  • steps: list of functions; each function takes an image and returns an image.

  • as_list: False (default) returns images in large array (and fails if image sizes differ after applying preprocessing steps); True returns list of images.

Return values:

  • list of processed images or large array of images, depending on parameter as_list.

Test your code with two preprocessing steps:

  1. horizontal mirroring,

  2. color inversion (black to white, white to black).
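Each test step can be written as a small function that takes an image and returns an image. The following is a minimal sketch, assuming images are 2D arrays with values in the range 0…1 and that horizontal mirroring means swapping left and right. The names mirror, invert and apply_steps are hypothetical, and apply_steps only shows the chaining idea for a single image, not the full preprocess function.

```python
import numpy as np


def mirror(image):
    """Swap left and right (reverse the column order)."""
    return np.fliplr(image)


def invert(image):
    """Map 0 to 1 and 1 to 0 (assumes values in the range 0...1)."""
    return 1 - image


def apply_steps(image, steps):
    """Apply all steps to one image, in the given order."""
    for step in steps:
        image = step(image)
    return image


image = np.array([[0.0, 0.25], [0.5, 1.0]], dtype=np.float16)
result = apply_steps(image, [mirror, invert])
print(result)
```

Because every step has the same signature, further steps can be added to the list without changing the code that applies them.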

Solution:

# your solution

Python Module

Task: Create a Python module qmnist.py providing both functions load and preprocess.

Solution:

# your solution