Load QMNIST
In this project we develop a Python module for loading and preprocessing QMNIST images and metadata. Prerequisites:
Reading Data
Task: Get the QMNIST training and test data from the QMNIST GitHub repository (four files ending with ...idx3-ubyte.gz or ...idx2-int.gz) and find information on the file format.
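The following is a short sketch of how the header of such a file can be decoded. It relies on our understanding of the IDX format as documented for the original MNIST files. A file starts with a four-byte magic number: two zero bytes, a type code (for example 0x08 for unsigned bytes or 0x0C for 32-bit integers) and the number of dimensions. Next comes the size of each dimension as a big-endian 32-bit integer, and then the data. The names IDX_TYPES and parse_idx_header are our own and are not part of any library.

```python
import struct

# IDX type codes mapped to big-endian NumPy dtype strings (as we understand the format).
IDX_TYPES = {
    0x08: ">u1",  # unsigned byte
    0x09: ">i1",  # signed byte
    0x0B: ">i2",  # 16-bit signed integer
    0x0C: ">i4",  # 32-bit signed integer
    0x0D: ">f4",  # 32-bit float
    0x0E: ">f8",  # 64-bit float
}


def parse_idx_header(data: bytes):
    """Return (dtype string, shape, header length in bytes) of an IDX file's content."""
    zero1, zero2, type_code, ndim = struct.unpack(">BBBB", data[:4])
    if zero1 != 0 or zero2 != 0 or type_code not in IDX_TYPES:
        raise ValueError("not an IDX file")
    # One big-endian unsigned 32-bit integer per dimension follows the magic number.
    shape = struct.unpack(f">{ndim}I", data[4:4 + 4 * ndim])
    return IDX_TYPES[type_code], shape, 4 + 4 * ndim
```

Consider an image file whose header consists of the bytes 00 00 08 03 followed by 60000, 28 and 28. For this header the function returns ('>u1', (60000, 28, 28), 16), so the pixel data start at byte 16.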
Task: Write a function load which reads images and metadata from the QMNIST files. Parameters:
path: defaulting to '', path of the directory containing the data files.
subset: defaulting to 'train' (load training data); passing 'test' loads test data.
as_list: defaulting to False (return one large array); passing True returns a list of images.
Return values:
NumPy array of shape (60000, 28, 28) or list of 60000 NumPy arrays of shape (28, 28) (range 0…1, type float16), depending on parameter as_list.
NumPy array of shape (60000,) containing classes (type uint8).
NumPy array of shape (60000,) containing series IDs (type uint8).
NumPy array of shape (60000,) containing writer IDs (type uint16).
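Once the raw arrays are in memory, producing the return values listed above is mostly a matter of slicing and type conversion. The sketch below rests on several assumptions:
the image data arrive as a uint8 array of shape (n, 28, 28) with gray values 0…255;
the metadata arrive as an integer array of shape (n, 8);
columns 0, 1 and 2 of the metadata hold class, series ID and writer ID. This is our reading of the QMNIST README and should be checked against the repository.
The helper name to_return_values is hypothetical.

```python
import numpy as np


def to_return_values(raw_images, raw_labels, as_list=False):
    """Convert raw QMNIST arrays into the four return values of load (sketch)."""
    # Scale gray values 0...255 to the range 0...1 and store them as float16.
    images = (raw_images / 255).astype(np.float16)
    if as_list:
        # One (28, 28) array per image instead of one large array.
        images = list(images)
    classes = raw_labels[:, 0].astype(np.uint8)   # assumed: column 0 = class
    series = raw_labels[:, 1].astype(np.uint8)    # assumed: column 1 = series ID
    writers = raw_labels[:, 2].astype(np.uint16)  # assumed: column 2 = writer ID
    return images, classes, series, writers
```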
Test your function and show the first and last images of the training and test data. Print the corresponding metadata. You may use the code from Image Processing with NumPy to show images.
Hint
The obvious approach via the zipfile module and np.fromfile fails for two reasons:
Python's zipfile module cannot read the QMNIST files, because they are gzip-compressed files, not ZIP archives. Use the gzip module from Python's standard library instead.
NumPy's fromfile is not compatible with file objects created by the gzip module. For some very knotty technical reasons, fromfile reads the compressed instead of the uncompressed data. Thus, read the data with the file object's read method and use np.frombuffer.
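Here is a minimal sketch of the approach suggested in the hint. It assumes the IDX layout of the original MNIST files: byte 2 of the magic number is the type code, byte 3 is the number of dimensions, and one big-endian 32-bit size per dimension follows. It handles only the two type codes the QMNIST files should need (0x08 for images, 0x0C for metadata). The function name read_idx_gz is hypothetical.

```python
import gzip

import numpy as np


def read_idx_gz(filename):
    """Read a gzip-compressed IDX file into a NumPy array (sketch)."""
    # gzip.open decompresses transparently; read() returns the uncompressed bytes.
    with gzip.open(filename, "rb") as f:
        data = f.read()
    # Bytes 2 and 3 of the magic number: type code and number of dimensions.
    type_code, ndim = data[2], data[3]
    # Assumption: only the two type codes needed for QMNIST are supported.
    dtypes = {0x08: np.dtype(">u1"), 0x0C: np.dtype(">i4")}
    if data[0] != 0 or data[1] != 0 or type_code not in dtypes:
        raise ValueError(f"{filename}: unsupported or invalid IDX header")
    # One big-endian 32-bit size per dimension follows the magic number.
    shape = tuple(int(n) for n in np.frombuffer(data, dtype=">u4", count=ndim, offset=4))
    # The rest of the buffer is the data; frombuffer interprets it without copying.
    array = np.frombuffer(data, dtype=dtypes[type_code], offset=4 + 4 * ndim)
    return array.reshape(shape)
```

Note that np.frombuffer on a bytes object returns a read-only array. Converting it with astype (for example to float16 or uint8) yields a writable copy.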
Solution:
# your solution
Preprocessing
Before the images are used, some preprocessing steps may be appropriate. Given a list of preprocessing steps, we would like to have a function that applies all steps to all images.
Task: Write a function preprocess which applies a list of preprocessing steps to all images. Parameters:
images: large NumPy array or list of arrays (images to be processed).
steps: list of functions; each function takes an image and returns an image.
as_list: False (default) returns the images in one large array (and fails if image sizes differ after applying the preprocessing steps); True returns a list of images.
Return value:
List of processed images or large array of images, depending on parameter as_list.
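One possible sketch of preprocess passes each image through the steps in the given order and builds the large array with np.stack. Iterating over a NumPy array runs along its first axis, so the same loop handles both a large array and a list of arrays.

```python
import numpy as np


def preprocess(images, steps, as_list=False):
    """Apply every function in steps, in order, to every image (sketch)."""
    processed = []
    # Works for a large array (iterates over axis 0) and for a list of arrays.
    for image in images:
        for step in steps:
            image = step(image)
        processed.append(image)
    if as_list:
        return processed
    # np.stack raises ValueError if the processed images differ in shape.
    return np.stack(processed)
```

With as_list=False, np.stack raises a ValueError if the processed images do not all have the same shape. This matches the failure behavior required by the task.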
Test your code with two preprocessing steps:
horizontal mirroring,
color inversion (black to white, white to black).
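The two test steps could look like this. The sketch assumes images with gray values in the range 0…1, as returned by load, so inversion is simply 1 - image. The function names are our own.

```python
import numpy as np


def mirror_horizontally(image):
    """Flip an image left to right (reverse the column order)."""
    return np.fliplr(image)


def invert_colors(image):
    """Swap black and white, assuming gray values in the range 0...1."""
    return 1 - image
```

Both functions take and return a single image. They can therefore be passed to preprocess as steps=[mirror_horizontally, invert_colors].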
Solution:
# your solution
Python Module
Task: Create a Python module qmnist.py providing both functions load and preprocess.
Solution:
# your solution