IBAN recognition

We aim to recognize handwritten IBANs. We first train a CNN on QMNIST to classify single handwritten digits. Then we use the CNN to recognize handwritten IBANs.

Task: Train and evaluate a CNN with log loss and softmax activation in the output layer for classifying handwritten digits. Use the QMNIST data set for training and testing. Use Keras and Keras Tuner.

Solution:

# your solution

Simple IBAN recognition

We have a data set containing 10000 images of IBANs together with corresponding correct IBANs (strings). The data set only contains German IBANs of the form

DExxyyyyyyyyyyyyyyyyyy

with xx being a checksum (see below) and yyyyyyyyyyyyyyyyyy being 18 digits (0-9).

For our first attempt we ignore the checksum and try to recognize the IBAN digit by digit.

Each image has size 28x560 (20 boxes of size 28x28 placed side by side) and does not contain the letters DE. Each 28x28 box contains exactly one 20x20 digit from the QMNIST test set, placed at a random position within the box.
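Because every box is exactly 28 pixels wide, an IBAN image can be cut into its 20 digit boxes with a plain NumPy split. A minimal sketch, assuming an IBAN image is a NumPy array of shape `(28, 560)`; the function name `split_boxes` is hypothetical.

```python
import numpy as np


def split_boxes(image, n_boxes=20):
    """Split a (28, 28 * n_boxes) IBAN image into an array of shape (n_boxes, 28, 28)."""
    image = np.asarray(image)
    if image.shape != (28, 28 * n_boxes):
        raise ValueError(f"expected shape (28, {28 * n_boxes}), got {image.shape}")
    # np.hsplit cuts along the columns into n_boxes equally wide pieces
    return np.stack(np.hsplit(image, n_boxes))
```

`split_boxes(image)[0]` is then the box holding the first check digit, which directly follows the (omitted) letters DE.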

Task: Load the IBAN data set. Show an IBAN image and the corresponding correct IBAN.

# your solution

Task: Write a function get_iban_simple which takes an IBAN image and returns the IBAN as a string (including DE).

# your solution
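A minimal sketch, assuming `model` is the trained Keras CNN from the first task, that it expects inputs of shape `(28, 28, 1)` scaled from 0-255 to [0, 1], and that its `predict` returns one row of 10 softmax probabilities per box. If your model was trained on a different input format, adjust the preprocessing line.

```python
import numpy as np


def get_iban_simple(image, model):
    """Recognize an IBAN image digit by digit and return it as a string including 'DE'."""
    # cut the (28, 560) image into 20 boxes of 28x28 pixels
    boxes = np.stack(np.hsplit(np.asarray(image, dtype="float32"), 20))
    # add the channel axis and scale pixels like the training data (assumed 0-255 -> 0-1)
    x = boxes[..., np.newaxis] / 255.0
    probs = model.predict(x, verbose=0)  # shape (20, 10)
    digits = probs.argmax(axis=1)  # most probable class per box
    return "DE" + "".join(str(d) for d in digits)
```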

Task: Convert all IBAN images to strings and calculate the correct classification rate.

# your solution
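Here an IBAN counts as correctly classified only if the recognized string equals the true IBAN exactly. A minimal sketch; `recognized` and `true_ibans` are hypothetical names for two equally long lists of strings.

```python
import numpy as np


def classification_rate(recognized, true_ibans):
    """Fraction of IBANs recognized exactly (all 20 digits correct)."""
    recognized = np.asarray(recognized)
    true_ibans = np.asarray(true_ibans)
    if recognized.shape != true_ibans.shape:
        raise ValueError("both lists must have the same length")
    # element-wise string comparison, then mean of the boolean array
    return float(np.mean(recognized == true_ibans))
```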

Task: Based on the correct classification rate on the QMNIST test set, calculate the probability that an IBAN is correctly recognized.

# your solution
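Assume that the 20 digit boxes are classified independently and that each one is classified correctly with probability $p$, the correct classification rate on the QMNIST test set. The IBAN is recognized correctly only if all 20 digits are, so

```latex
P(\text{IBAN correct}) = p^{20}
```

For instance, a hypothetical digit rate of $p = 0.99$ gives only $0.99^{20} \approx 0.818$.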

Task: Based on the correct classification rate on the QMNIST test set, calculate the probability that a recognized IBAN has at most one wrong digit.

# your solution
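Under the same independence assumption, exactly one wrong digit can occur at any of the 20 positions, each with probability $p^{19}(1-p)$. Adding the case of no wrong digit gives

```latex
P(\text{at most one wrong digit}) = p^{20} + \binom{20}{1} p^{19} (1-p) = p^{20} + 20\, p^{19} (1-p)
```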

Task: Calculate the probability that a recognized IBAN has at most two wrong digits.

# your solution
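Still assuming independent digits, the number of wrong digits follows a binomial distribution with $n = 20$ and error probability $1 - p$, so "at most $k$ wrong digits" is a binomial CDF. A minimal sketch using only the standard library; `prob_at_most_k_wrong` is a hypothetical helper name, and `p` must be your measured QMNIST test rate.

```python
from math import comb


def prob_at_most_k_wrong(p, k, n=20):
    """P(at most k of n independently classified digits are wrong), digit accuracy p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    q = 1.0 - p  # probability that a single digit is misclassified
    # sum of binomial probabilities for j = 0, ..., k wrong digits
    return sum(comb(n, j) * q**j * p**(n - j) for j in range(k + 1))
```

`prob_at_most_k_wrong(p, 2)` answers this task; `k = 0` and `k = 1` reproduce the two previous results.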

IBAN recognition with checksum check

The third and fourth characters of an IBAN are check digits (the checksum). The checksum makes it possible to detect common typos (missing digits, interchanged digits, and others).

Task: Find out how to validate IBANs, for instance by reading the Wikipedia article on IBANs. Then write a function is_iban which takes an IBAN string and returns True or False depending on the validity of the IBAN.

# your solution
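The validation described on Wikipedia (ISO 7064 MOD 97-10) moves the first four characters to the end, replaces each letter by a two-digit number (A = 10, ..., Z = 35) and accepts the IBAN if the resulting integer leaves remainder 1 when divided by 97. A minimal sketch; the regular expression is a simplified generic format check, not the per-country length table of the standard.

```python
import re


def is_iban(iban):
    """Return True if iban passes a basic format check and the mod-97 checksum."""
    iban = iban.replace(" ", "").upper()
    # generic format: country code, two check digits, 11 to 30 alphanumeric characters
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", iban):
        return False
    rearranged = iban[4:] + iban[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

For the widely used example IBAN `DE89 3704 0044 0532 0130 00`, `is_iban` returns True; changing its last digit to 1 makes it return False.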

If exactly one digit of an IBAN is incorrect, the checksum check is guaranteed to fail. If two digits are incorrect, the check fails in almost all cases.
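The single-digit guarantee holds because changing one digit $d$ to $d'$ changes the rearranged number by $(d' - d) \cdot 10^m$ with $0 < |d' - d| \le 9$, and neither factor is divisible by the prime 97. The sketch below checks both statements empirically on one valid example IBAN by enumerating all single and double digit substitutions. The example IBAN and the helper names `mod97_ok` and `substitutions` are assumptions for illustration.

```python
from itertools import combinations, product


def mod97_ok(iban):
    """ISO 7064 MOD 97-10 check for an IBAN without spaces."""
    rearranged = iban[4:] + iban[:4]
    return int("".join(str(int(ch, 36)) for ch in rearranged)) % 97 == 1


def substitutions(iban, k):
    """All strings obtained by replacing exactly k of the digits after 'DE' by other digits."""
    digits = iban[2:]
    for positions in combinations(range(len(digits)), k):
        choices = [[d for d in "0123456789" if d != digits[i]] for i in positions]
        for replacement in product(*choices):
            new = list(digits)
            for i, d in zip(positions, replacement):
                new[i] = d
            yield "DE" + "".join(new)


example = "DE89370400440532013000"
assert mod97_ok(example)
singles = list(substitutions(example, 1))
doubles = list(substitutions(example, 2))
print(len(singles), sum(map(mod97_ok, singles)))  # 180 variants, none passes
print(len(doubles), sum(map(mod97_ok, doubles)) / len(doubles))  # share of undetected double errors
```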

Task: Write a function get_iban which takes an IBAN image and returns the IBAN as a string (including DE). The returned IBAN should be valid. If the first attempt yields an invalid IBAN, use the probabilities returned by the model to determine other IBANs. Proceed as follows:

  • Generate a list of all IBANs which can be derived from the original one by replacing one or two digits.

  • Calculate probabilities for all generated IBANs.

  • Sort IBANs by probability.

  • Check IBANs starting with the most probable one.

Provide the IBAN’s probability as the second return value of get_iban. Before you start: how many alternative IBANs will be generated in case of an invalid first attempt?
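One way to count: the image contains 20 digits (the two check digits and the 18 account digits), and each digit can be replaced by 9 other digits. Choosing one position or an unordered pair of positions gives

```latex
20 \cdot 9 + \binom{20}{2} \cdot 9^2 = 180 + 190 \cdot 81 = 180 + 15390 = 15570
```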

# your solution
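A minimal sketch of the procedure above. It assumes that the model output for one image is available as an array `probs` of shape `(20, 10)`, for instance from `model.predict` on the 20 boxes. It also assumes that the probability of an IBAN is the product of the softmax probabilities of its 20 digits, which treats digit errors as independent. The function name `get_iban_from_probs` is hypothetical; wrapping it with the box splitting and `model.predict` gives `get_iban(image)`. If no candidate is valid, the sketch falls back to the first attempt, a case the task leaves open.

```python
from itertools import combinations, product

import numpy as np


def is_valid(iban):
    """ISO 7064 MOD 97-10 checksum test (same idea as is_iban)."""
    rearranged = iban[4:] + iban[:4]
    return int("".join(str(int(ch, 36)) for ch in rearranged)) % 97 == 1


def get_iban_from_probs(probs):
    """Return (iban, probability) using digit probabilities of shape (20, 10)."""
    probs = np.asarray(probs, dtype=float)
    n = probs.shape[0]
    rows = np.arange(n)
    best = probs.argmax(axis=1)  # first attempt: most probable digit per box

    def as_iban(digits):
        return "DE" + "".join(str(d) for d in digits)

    def probability(digits):
        # product of the softmax probabilities of the chosen digits
        return float(np.prod(probs[rows, digits]))

    first = as_iban(best)
    if is_valid(first):
        return first, probability(best)

    # all IBANs differing from the first attempt in one or two digits
    candidates = []
    for k in (1, 2):
        for positions in combinations(range(n), k):
            choices = [[d for d in range(10) if d != best[i]] for i in positions]
            for replacement in product(*choices):
                digits = best.copy()
                digits[list(positions)] = replacement
                candidates.append((probability(digits), as_iban(digits)))

    # check candidates from most to least probable
    candidates.sort(key=lambda c: c[0], reverse=True)
    for prob, iban in candidates:
        if is_valid(iban):
            return iban, prob
    return first, probability(best)  # no valid candidate found
```

Computing all 15570 products before sorting is simple but not the fastest option; working with log probabilities or stopping early would also be possible.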

Task: Recognize all IBANs and calculate the correct classification rate.

# your solution

Task: Plot histograms of probabilities for correctly classified and for incorrectly classified IBANs. Use logarithmic binning.

# your solution
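With logarithmic binning, the bin edges are equally spaced in log(probability). A minimal sketch, assuming `probs_correct` and `probs_wrong` (hypothetical names) hold the second return values of get_iban for correctly and incorrectly classified IBANs. Zero probabilities cannot be placed on a log axis and are dropped when computing the edges. Matplotlib is imported only inside the plotting function, so the bin helper works without it.

```python
import numpy as np


def log_bins(values, n_bins=30):
    """Bin edges equally spaced on a log scale, covering all positive values."""
    values = np.asarray(values, dtype=float)
    values = values[values > 0]  # log scale: drop zero probabilities
    return np.geomspace(values.min(), values.max(), n_bins + 1)


def plot_probability_histograms(probs_correct, probs_wrong, n_bins=30):
    import matplotlib.pyplot as plt  # only needed for plotting

    # shared edges so both histograms are directly comparable
    bins = log_bins(np.concatenate([probs_correct, probs_wrong]), n_bins)
    fig, ax = plt.subplots()
    ax.hist(probs_correct, bins=bins, alpha=0.6, label="correctly classified")
    ax.hist(probs_wrong, bins=bins, alpha=0.6, label="incorrectly classified")
    ax.set_xscale("log")
    ax.set_xlabel("IBAN probability")
    ax.set_ylabel("number of IBANs")
    ax.legend()
    plt.show()
```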

From the histograms we see that it’s not (!) a good idea to use the probabilities for deciding whether an IBAN is correctly recognized or not. There are correct IBANs with very small probability and incorrect IBANs with probability close to 1.

To further improve IBAN recognition, one could use other checksums described in national IBAN specifications. For German IBANs, there are separate checksums for the routing number and the account number.

Task: Visualize all incorrectly classified IBANs together with the true and the recognized IBANs.

# your solution
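To make the errors easy to spot, the sketch below shows each misclassified image with both strings in its title. It also prints the strings with a marker line under the digits where they differ. It assumes `images`, `true_ibans` and `recognized` (hypothetical names) are aligned sequences of equal length. Matplotlib is imported only inside the plotting function.

```python
def diff_marker(true_iban, recognized_iban):
    """'^' below every position where the two IBAN strings differ, blanks elsewhere."""
    return "".join(" " if t == r else "^" for t, r in zip(true_iban, recognized_iban))


def show_misclassified(images, true_ibans, recognized):
    import matplotlib.pyplot as plt  # only needed for plotting

    wrong = [i for i, (t, r) in enumerate(zip(true_ibans, recognized)) if t != r]
    if not wrong:
        print("all IBANs recognized correctly")
        return
    fig, axes = plt.subplots(len(wrong), 1, figsize=(10, 1.2 * len(wrong)), squeeze=False)
    for ax, i in zip(axes[:, 0], wrong):
        ax.imshow(images[i], cmap="gray_r")
        ax.set_title(f"true: {true_ibans[i]}   recognized: {recognized[i]}", fontsize=9)
        ax.axis("off")
        # printed in monospace, so the marker lines up with the wrong digits
        print(f"{true_ibans[i]}\n{recognized[i]}\n{diff_marker(true_ibans[i], recognized[i])}\n")
    fig.tight_layout()
    plt.show()
```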