The xMNIST Family of Data Sets
In this project we take a first look at the MNIST data set and related data sets. In subsequent projects we’ll use these data sets to train machine learning models.
A major benefit of this project is seeing how difficult data preparation can be. As we’ll learn later on, obtaining unbiased data is extremely important for training machine learning algorithms.
NIST Special Database 19
Task: Learn about NIST Special Database 19 from
NIST Special Database 19, Handprinted Forms and Characters Database (sections 1 and 2)
Answer the following questions:
Who collected the data?
What are the conditions for using the data set?
Who wrote the characters and digits?
How many images are in the data set?
How much disk space is needed?
Solution:
# your answers
MNIST
Task: Learn about the MNIST data set from
Answer the following questions:
Who collected the data?
What are the conditions for using the data set?
How many images are in the data set?
What subset of symbols is shown on the images?
What’s the size of the images?
How much disk space is needed?
What preprocessing steps were done?
What’s the best error rate achieved so far for digit recognition on MNIST?
Solution:
# your answers
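To check answers about image count and image size against the actual files, the header of an MNIST file can be inspected directly. MNIST files use the IDX format: a 4-byte magic number (two zero bytes, a data type code, the number of dimensions) followed by one 4-byte big-endian integer per dimension. The following is a minimal sketch; the function name `read_idx_header` and the file name in the usage comment are assumptions made for illustration, not part of the project.

```python
import gzip
import struct


def read_idx_header(path):
    """Return (data type code, dimensions) from the header of an IDX file."""
    # Transparently handle gzip-compressed files.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as f:
        # Magic number: two zero bytes, a type code, and the number of dimensions.
        zero1, zero2, dtype_code, ndim = struct.unpack(">BBBB", f.read(4))
        if zero1 != 0 or zero2 != 0:
            raise ValueError("not an IDX file")
        # Each dimension is a 4-byte big-endian unsigned integer.
        dims = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
    return dtype_code, dims


# Hypothetical usage, assuming the file has already been downloaded:
# dtype_code, dims = read_idx_header("train-images-idx3-ubyte.gz")
# print(dims)  # (number of images, rows, columns) for an image file
```

For an image file the dimensions are the number of images followed by the image height and width; a type code of 8 denotes unsigned bytes.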
QMNIST
Task: Learn about the QMNIST data set from
Answer the following questions:
Who collected the data?
What are the conditions for using the data set?
How many images are in the data set?
What’s the size of the images?
How much disk space is needed?
What preprocessing steps were done?
Is QMNIST a superset of MNIST?
Solution:
# your answers
Task: Download the QMNIST data set.
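One possible way to script the download is a small helper built on Python's standard library. This is only a sketch: the helper name `download_file` and the target directory `data` are assumptions, and the actual file URLs have to be taken from the QMNIST documentation.

```python
import urllib.request
from pathlib import Path


def download_file(url, dest_dir="data"):
    """Download url into dest_dir unless the file is already there."""
    target_dir = Path(dest_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Use the last component of the URL as the local file name.
    target = target_dir / url.rstrip("/").split("/")[-1]
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
    return target


# Hypothetical usage; insert the URLs listed in the QMNIST documentation:
# for url in qmnist_urls:
#     print(download_file(url))
```

Skipping files that already exist avoids repeated downloads when the notebook is re-run.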
EMNIST
Task: Learn about the EMNIST data set from
EMNIST: an extension of MNIST to handwritten letters (section I and subsections A, B, C of section II)
Answer the following questions:
Who collected the data?
What are the conditions for using the data set?
How many images are in the data set?
What’s the size of the images?
How much disk space is needed?
What preprocessing steps were done?
Why was EMNIST created?
Solution:
# your answers