The xMNIST Family of Data Sets

In this project we take a first look at the MNIST data set and related data sets. In subsequent projects we’ll use these data sets to train machine learning models.

A major benefit of this project is that we see how difficult data preparation can be. As we’ll learn later on, obtaining unbiased data is extremely important for training machine learning algorithms.

NIST Special Database 19

Task: Learn about NIST Special Database 19 from

Answer the following questions:

  • Who collected the data?

  • What are the conditions for using the data set?

  • Who wrote the characters and digits?

  • How many images are in the data set?

  • How much disk space is needed?

Solution:

# your answers

MNIST

Task: Learn about the MNIST data set from

Answer the following questions:

  • Who collected the data?

  • What are the conditions for using the data set?

  • How many images are in the data set?

  • What subset of symbols is shown on the images?

  • What’s the size of the images?

  • How much disk space is needed?

  • What preprocessing steps were done?

  • What’s the best error rate achieved so far for digit recognition based on MNIST?

Solution:

# your answers
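To cross-check the disk space answer, the raw size of an uncompressed image collection can be estimated from the number of images and their dimensions. The following is a minimal sketch; the function name `raw_size_bytes` is hypothetical, and the numbers in the usage line are placeholders, not the MNIST values.

```python
def raw_size_bytes(n_images: int, height: int, width: int, bytes_per_pixel: int = 1) -> int:
    """Uncompressed size of n_images images, ignoring file headers and labels."""
    return n_images * height * width * bytes_per_pixel


# Placeholder numbers for illustration only; insert the values you found.
size = raw_size_bytes(n_images=1000, height=32, width=32)
print(f"{size / 1e6:.1f} MB")
```

Compressed downloads are typically smaller than this estimate, so compare it with the file sizes stated by the data set's provider.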

QMNIST

Task: Learn about the QMNIST data set from

Answer the following questions:

  • Who collected the data?

  • What are the conditions for using the data set?

  • How many images are in the data set?

  • What’s the size of the images?

  • How much disk space is needed?

  • What preprocessing steps were done?

  • Is QMNIST a superset of MNIST?

Solution:

# your answers

Task: Download the QMNIST data set.
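One possible way to script the download and take a first look at a file is sketched below. It assumes the image files are gzip-compressed IDX files, as the classic MNIST files are; check the QMNIST documentation for the actual formats. The helper names `download` and `read_idx_header` are hypothetical, and the URL in the usage comment is a placeholder that has to be replaced by the real file URL.

```python
import gzip
import struct
import urllib.request
from pathlib import Path


def download(url: str, dest: Path) -> Path:
    """Download url to dest unless dest already exists; return dest."""
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
    return dest


def read_idx_header(path: Path) -> tuple[int, tuple[int, ...]]:
    """Return (type code, dimensions) from the header of a gzipped IDX file."""
    with gzip.open(path, "rb") as f:
        # Magic number: two zero bytes, one type-code byte,
        # one byte holding the number of dimensions.
        zero, type_code, n_dims = struct.unpack(">HBB", f.read(4))
        if zero != 0:
            raise ValueError(f"{path} does not look like an IDX file")
        # Each dimension is a 4-byte big-endian unsigned integer.
        dims = struct.unpack(f">{n_dims}I", f.read(4 * n_dims))
    return type_code, dims


# Usage (placeholder URL, take the real one from the QMNIST page):
# path = download("https://example.org/images-idx3-ubyte.gz", Path("data/images.gz"))
# print(read_idx_header(path))                 # type code and array shape
# print(path.stat().st_size / 1e6, "MB")       # compressed size on disk
```

For an image file the dimensions returned by the header are the number of images followed by the image height and width, which helps to verify the answers given above.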

EMNIST

Task: Learn about the EMNIST data set from

Answer the following questions:

  • Who collected the data?

  • What are the conditions for using the data set?

  • How many images are in the data set?

  • What’s the size of the images?

  • How much disk space is needed?

  • What preprocessing steps were done?

  • Why was EMNIST created?

Solution:

# your answers