ANNs with Keras
Contents
ANNs with Keras#
To represent complex hypotheses with ANNs we need thousands or even millions of artificial neurons. ANNs with few large layers turned out be less effective than ANNs with many smaller layers. The latter are referred to as deep networks and their usage is known as deep learning.
Training ANNs requires lots of computation time for large data sets and large networks. Thus, we need efficient implementations of learning procedures and powerful hardware. Scikit-Learn aims at educational projects and offers a wide scale of machine learning methods. Implementation is less optimized for execution speed than for providing insight into the algorithms and providing access to all the parameters and intermediate results. Further, Scikit-Learn does not use all features of modern hardware.
Libraries for high-performance machine learning have to be more specialized on specific tasks to allow for optimizing efficiency. Keras is an open source library for implementing and training ANNs. Like Scikit-Learn it provides preprocessing routines as well as postprocessing (optimizing hyperparameters). But Keras utilizes full computation power of modern computers for training ANNs, leading to much shorter training times.
Modern computers have multi-core CPUs. So they can process several programs in parallel. In addition, almost all computers have a powerful GPU (graphics processing unit). It’s like a second CPU specialized at doing floating point computations for rendering 3d graphics. GPUs are much more suited for training ANNs than CPUs, because they are designed to work with many large matrices of floating point numbers in parallel. Nowadays GPUs can be accessed by software developers relatively easily. Thus, we may run programs on the GPU instead of the CPU.
Keras seamlessly integrates GPU power for ANN training into Python. We do not have to care about the details. Keras piggybacks on an open source library called TensorFlow developed by Google. Keras does much of the work for us, but from time to time TensorFlow will show up, too. Keras started independently from TensorFlow, then integrated support for TensorFlow, and now is distributed as a module in the TensorFlow Python package.
import tensorflow.keras as keras
2023-04-25 10:26:31.847362: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
An ANN for Handwritten Digit Recognition#
To demonstrate usage of Keras we implement and train a layered feedforward ANN to classify handwritten digits from the QMNIST data set. Inputs to the ANN are images of size 28x28. Thus, the feature space has dimension 784. Outputs are 10 real numbers in \([0,1]\). Each number represents the probability that the image shows the corresponding digit.
We could also use an ANN with only one output and require that this output is the digit, that is, it has range \([0,9]\). But how to interpret an output of 3.5? It suggests that the ANN cannot decide between 3 and 4. Or it might waffle on 2 and 5. Using only one output we would introduce artificial order and, thus, wrong similarity assumptions. From the view of similarity of shape (and only that matters in digit recognition), 3 and 8 are more close to each other than 7 and 8 are. Using one output per figure we avoid artificial assumptions and get more precise information on possible missclassifications. Images with high outputs for both 1 and 7 could be marked for subsequent review by a human, for example
Loading Data#
For loading QMNIST data we may reuse code from Load QMNIST project. We have to load training data and test data, both consisting of 60000 images and corresponding labels.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import qmnist
train_images, train_labels, _, _ = qmnist.load('../../../../datasets/qmnist/', subset='train')
test_images, test_labels, _, _ = qmnist.load('../../../../datasets/qmnist/', subset='test')
train_images.shape, test_images.shape, train_labels.shape, test_labels.shape
((60000, 28, 28), (60000, 28, 28), (60000,), (60000,))
train_images[0, :, :].min(), train_images[0, :, :].max()
(0.0, 1.0)
For visualization of data we use a gray scale with black at smallest value and white at highest value.
def show_image(img):
fig, ax = plt.subplots(figsize=(2, 2))
ax.imshow(img, vmin=0, vmax=1, cmap='gray')
ax.axis('off')
plt.show()
idx = 123
show_image(train_images[idx, :, :])
print('label:', train_labels[idx])
label: 7
Preprocessing#
Input to an ANN should be standardized or normalized. QMNIST images have range \([0, 1]\). That’s okay.
Optionally, we may center the images with respect to a figure’s bounding box. Without this step the center of mass is identical to the image center (we may reuse code from Image Processing with NumPy). As a by-product of centering bounding boxes each image will have 4 unused pixels at the boundary. Thus, we may crop images to 20x20 pixels without loss of information (resulting in 400 instead of 784 features).
def auto_crop(img):
# binarize image
mask = img > 0
# whole image black?
if not mask.any():
return np.array([])
# get top and bottom index of bounding box
row_mask = mask.any(axis=1)
top = np.argmax(row_mask)
bottom = row_mask.size - np.argmax(row_mask[::-1]) # bottom index + 1
# get left and right index of bounding box
col_mask = mask[top:bottom, :].any(axis=0) # [top:bottom, :] for efficiency only
left = np.argmax(col_mask)
right = col_mask.size - np.argmax(col_mask[::-1]) # right index + 1
# crop
return img[top:bottom, left:right].copy()
def center(img, n):
# check image size
if np.max(img.shape) > n:
print('n too small! Cropping image.')
img = img[0:np.minimum(n, img.shape[0]), 0:np.minimum(n, img.shape[1])]
# calculate margin width
top_margin = (n - img.shape[0]) // 2
left_margin = (n - img.shape[1]) // 2
# create image
img_new = np.zeros((n, n), dtype=img.dtype)
img_new[top_margin:(top_margin + img.shape[0]),
left_margin:(left_margin + img.shape[1])] = img
return img_new
train_images = qmnist.preprocess(train_images, [auto_crop, lambda img: center(img, 20)])
test_images = qmnist.preprocess(test_images, [auto_crop, lambda img: center(img, 20)])
idx = 123
show_image(train_images[idx, :, :])
print('label:', train_labels[idx])
label: 7
Training labels have to be one-hot encoded. This can be done manually with NumPy or automatically with Pandas or Scikit-Learn. Also Keras provides a function for one-hot encoding: to_categorical
.
train_labels = keras.utils.to_categorical(train_labels)
test_labels = keras.utils.to_categorical(test_labels)
train_labels.shape, test_labels.shape
((60000, 10), (60000, 10))
Defining the ANN#
Keras has a Model
class representing a directed graph of layers of neurons. At the moment we content ourselves with simple network structures, that is, we have a sequence of layers. For such simple structures Keras has the Sequential
class. That class represents a stack of layers of neurons. It’s a subclass of Model
.
A layer is represented by one of several layer classes in Keras. For a fully connected feedforward ANN we need an Input
layer and several Dense
layers. Layers can be added one by one with Sequential.add
.
Input
layers accept multi-dimensional inputs. Thus, we do not have to convert the 20x20 images to vectors with 400 components. But Dense
layers want to have one-dimensional input. Thus, we use a Flatten layer. Like the Input
layer that’s not a layer of neurons. Layers in Keras have to be understood as transformations taking some input and yielding some output. For Dense
layers we need to specify the number of neurons and the activation function to use. There are several pre-defined activation functions in Keras.
Layers may have a name, which will help accessing single layers for analysis of a trained model. If we do not specify layer names, Keras generates them automatically.
model = keras.Sequential()
model.add(keras.Input(shape=(20, 20)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(10, activation='relu', name='dense1'))
model.add(keras.layers.Dense(10, activation='relu', name='dense2'))
2023-04-25 10:26:40.248000: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2023-04-25 10:26:40.248032: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: WHZ-46349
2023-04-25 10:26:40.248039: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: WHZ-46349
2023-04-25 10:26:40.248184: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 470.161.3
2023-04-25 10:26:40.248204: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 470.161.3
2023-04-25 10:26:40.248209: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 470.161.3
2023-04-25 10:26:40.248864: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
The output layer is a Dense
layer with 10 neurons. Because all outputs shall have range [0,1] we use the sigmoid function.
model.add(keras.layers.Dense(10, activation='sigmoid', name='out'))
Output shape can be accessed via corresponding member variable:
model.output_shape
(None, 10)
Almost always the first dimension of input or output shapes is the batch size for mini-batch training in Keras. None
is used, if there is no fixed batch size.
More detailed information about the constructed ANN is provided by Sequential.summary
:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 400) 0
dense1 (Dense) (None, 10) 4010
dense2 (Dense) (None, 10) 110
out (Dense) (None, 10) 110
=================================================================
Total params: 4,230
Trainable params: 4,230
Non-trainable params: 0
_________________________________________________________________
Training the ANN#
Parameters for training are set with Model.compile
. Here we may define an optimization routine. Next to gradient descent there are several other optimizers available. The optimizer can be passed by name (as string) or we may create a Python object of the respective optimizer class. The latter allows for custom parameter choice.
Next to the optimizer we have to provide a loss function. Again we may pass a string or an object. Because we have a classification problem we may use log loss.
If we want to validate the model during training, we may pass validation metrics to compile
. Then output during training includes updated values for the validation metrics on training and validation data. For classification we may use accuracy score. Again, metrics can be passsed by name or as an object. Since we might wish to compute several different metrics, the metrics
argument expects a list.
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['categorical_accuracy'])
Now the model is ready for training. In Keras training is done by calling Model.fit
. We may specify the batch size for mini-batch training and the number of epochs. An epoch is a sequence of iterations required to cycle through the training data once. Small batch sizes require more iterations per epoch, large batch sizes require fewer iterations. In full-batch training epochs and iterations are equivalent. We may also specify validation data (directly or as fraction of the training data) to get validation metrics after each epoch. Thus, we can see wether the model overfits the data during training and abort training if necessary. The return value of fit
will be discussed below.
history = model.fit(train_images, train_labels, batch_size=100, epochs=5, validation_split=0.2)
Epoch 1/5
480/480 [==============================] - 1s 2ms/step - loss: 2.0348 - categorical_accuracy: 0.3197 - val_loss: 1.6531 - val_categorical_accuracy: 0.4880
Epoch 2/5
480/480 [==============================] - 1s 1ms/step - loss: 1.2421 - categorical_accuracy: 0.6128 - val_loss: 0.8575 - val_categorical_accuracy: 0.7488
Epoch 3/5
480/480 [==============================] - 1s 1ms/step - loss: 0.7235 - categorical_accuracy: 0.7898 - val_loss: 0.5639 - val_categorical_accuracy: 0.8444
Epoch 4/5
480/480 [==============================] - 1s 1ms/step - loss: 0.5463 - categorical_accuracy: 0.8456 - val_loss: 0.4627 - val_categorical_accuracy: 0.8720
Epoch 5/5
480/480 [==============================] - 1s 1ms/step - loss: 0.4769 - categorical_accuracy: 0.8661 - val_loss: 0.4172 - val_categorical_accuracy: 0.8847
Hint
Note that validation accuracy displayed by Keras sometimes is higher than training accuracy. The reason is that train accuracy is the mean over all iterations of an epoche, but validation accuracy is calulated only at the end of an epoche. Thus, training accuracy includes poorer accuracy values from beginning of an epoche.
Incremental Training#
The Model.fit
method returns a History
object containing information about loss and metrics for each training epoch. The object has a dict member history
containing losses and metrics. Loss keys are loss
and val_loss
for training and validation, respectively. Metrics keys depend an the chosen metrics.
fig, ax = plt.subplots()
ax.plot(history.history['loss'], '-b', label='training loss')
ax.plot(history.history['val_loss'], '-r', label='validation loss')
ax.legend()
plt.show()
fig, ax = plt.subplots()
ax.plot(history.history['categorical_accuracy'], '-b', label='training accuracy')
ax.plot(history.history['val_categorical_accuracy'], '-r', label='validation accuracy')
ax.legend()
plt.show()
We see that further training could improve the model. Thus, we call fit
again. Training proceeds from where it has been stopped. We may execute corresponding code cell as often as we like to continue training. To keep the losses and metrics we append them to lists.
loss = history.history['loss']
val_loss = history.history['val_loss']
acc = history.history['categorical_accuracy']
val_acc = history.history['val_categorical_accuracy']
history = model.fit(train_images, train_labels, batch_size=100, epochs=5, validation_split=0.2)
loss.extend(history.history['loss'])
val_loss.extend(history.history['val_loss'])
acc.extend(history.history['categorical_accuracy'])
val_acc.extend(history.history['val_categorical_accuracy'])
Epoch 1/5
480/480 [==============================] - 1s 1ms/step - loss: 0.4419 - categorical_accuracy: 0.8773 - val_loss: 0.3943 - val_categorical_accuracy: 0.8908
Epoch 2/5
480/480 [==============================] - 1s 1ms/step - loss: 0.4203 - categorical_accuracy: 0.8845 - val_loss: 0.3797 - val_categorical_accuracy: 0.8942
Epoch 3/5
480/480 [==============================] - 1s 1ms/step - loss: 0.4045 - categorical_accuracy: 0.8891 - val_loss: 0.3674 - val_categorical_accuracy: 0.8992
Epoch 4/5
480/480 [==============================] - 1s 1ms/step - loss: 0.3920 - categorical_accuracy: 0.8930 - val_loss: 0.3564 - val_categorical_accuracy: 0.8999
Epoch 5/5
480/480 [==============================] - 1s 1ms/step - loss: 0.3813 - categorical_accuracy: 0.8954 - val_loss: 0.3486 - val_categorical_accuracy: 0.9016
fig, ax = plt.subplots()
ax.plot(loss, '-b', label='training loss')
ax.plot(val_loss, '-r', label='validation loss')
ax.legend()
plt.show()
fig, ax = plt.subplots()
ax.plot(acc, '-b', label='training accuracy')
ax.plot(val_acc, '-r', label='validation accuracy')
ax.legend()
plt.show()
Evaluation and Prediction#
To get loss and metrics on the test set call Model.evaluate
.
test_loss, test_metric = model.evaluate(test_images, test_labels)
test_loss, test_metric
1875/1875 [==============================] - 2s 968us/step - loss: 0.3572 - categorical_accuracy: 0.9006
(0.3571886718273163, 0.9005666375160217)
For predictions call Model.predict
.
test_pred = model.predict(test_images)
1875/1875 [==============================] - 1s 710us/step
Predictions are vectors of values from \([0, 1]\). A one indicates that the image shows the corresponding digit, a zero indicates that the digits is not shown in the image.
idx = 2
print('truth: ', test_labels[idx, :])
print('prediction:', test_pred[idx, :])
fig, ax = plt.subplots()
ax.plot(test_labels[idx, :], 'ob', label='truth')
ax.plot(test_pred[idx, :], 'or', markersize=4, label='prediction')
ax.legend()
plt.show()
truth: [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
prediction: [0.01495721 0.9989413 0.81571907 0.9306395 0.24000315 0.67344236
0.7693644 0.39637503 0.88925314 0.50644916]
To get more insight into the prediction accuracy we reverse one-hot encoding.
true_digits = test_labels.argmax(axis=1)
pred_digits = test_pred.argmax(axis=1)
# indices with wrong predictions
wrong_predictions = np.arange(0, true_digits.size)[true_digits != pred_digits]
print(wrong_predictions.size)
print(wrong_predictions)
5987
[ 8 18 33 ... 59974 59998 59999]
idx = 7
show_image(test_images[idx])
print('truth: {}, prediction: {}'.format(true_digits[idx], pred_digits[idx]))
truth: 9, prediction: 9
A confusion matrix depicts which digits are hard to separate for the ANN. The matrix is 10x10. The entry at row \(i\) and column \(j\) gives the number of images which show digit \(i\) (truth), but corresponding prediction is \(j\). Several Python modules provide functions for building a confusion matrix. Next to Scikit-Learn we may use Pandas for getting the matrix and Seaborn for plotting (pd.crosstab
, sns.heatmap
).
conf_matrix = pd.crosstab(pd.Series(true_digits, name='truth'),
pd.Series(pred_digits, name='prediction'))
print(conf_matrix)
prediction 0 1 2 3 4 5 6 7 8 9
truth
0 5591 3 35 26 41 71 74 8 97 6
1 1 6448 26 54 9 91 31 13 94 24
2 66 58 5374 94 54 29 136 66 139 10
3 23 22 147 5405 6 218 6 90 120 47
4 21 57 24 2 5183 8 132 2 33 318
5 46 38 43 253 125 4637 51 34 184 43
6 72 37 73 1 35 80 5618 0 41 0
7 6 41 41 28 31 16 1 5654 28 385
8 40 214 98 105 106 276 19 66 4883 83
9 20 33 3 79 150 43 1 200 86 5220
# scale values nonlinearly to get different colors at small values
scaled_conf_matrix = conf_matrix.apply(lambda x: x ** (1/2))
fig = plt.figure(figsize=(10, 8))
sns.heatmap(scaled_conf_matrix,
annot=conf_matrix, # use original matrix for labels
fmt='d', # format numbers as integers
cmap='hot', # color map
cbar_kws={'ticks': []}) # no ticks for colorbar (would have scaled labels)
plt.show()
Hyperparameter Optimization#
Keras itself offers no hyperparameter optimization routines. But there is the keras-tuner
module (import as keras_tuner
).
import keras_tuner
We first have to create a function which builds the model and returns a Model
instance. This function takes a HyperParameters
object as argument containing information about hyperparameters to optimize. The build function calls methods of the HyperParameters
object to get values from the current set of hyperparameters.
def build_model(hp):
model = keras.Sequential()
model.add(keras.Input(shape=(20, 20)))
model.add(keras.layers.Flatten())
layers = hp.Int('layers', 1, 3)
neurons_per_layer = hp.Int('neurons_per_layer', 10, 40, step=10)
for l in range(0, layers):
model.add(keras.layers.Dense(neurons_per_layer, activation='relu'))
model.add(keras.layers.Dense(10, activation='sigmoid'))
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['categorical_accuracy'])
return model
Now we create a Tuner
object and call its search
method. Several subclasses are available. RandomSearch
randomly selects sets of hyperparameters and trains the model for each set of hyperparameters. The constructor takes the model building function, an objective (string with objective name of one of the model’s metrics), and the maximum number of parameter sets to test. The search
function takes training and validation data in full analogy to fit
.
tuner = keras_tuner.tuners.randomsearch.RandomSearch(build_model, 'val_categorical_accuracy', 10)
tuner.search(train_images, train_labels, validation_split=0.2, epochs=10)
INFO:tensorflow:Reloading Tuner from ./untitled_project/tuner0.json
INFO:tensorflow:Oracle triggered exit
Here is a summary of all models considered during hyperparameter optimization:
tuner.results_summary()
Results summary
Results in ./untitled_project
Showing 10 best trials
Objective(name="val_categorical_accuracy", direction="max")
Trial 09 summary
Hyperparameters:
layers: 3
neurons_per_layer: 30
Score: 0.9556666612625122
Trial 02 summary
Hyperparameters:
layers: 2
neurons_per_layer: 30
Score: 0.9505000114440918
Trial 08 summary
Hyperparameters:
layers: 2
neurons_per_layer: 40
Score: 0.9503333568572998
Trial 05 summary
Hyperparameters:
layers: 2
neurons_per_layer: 20
Score: 0.9451666474342346
Trial 04 summary
Hyperparameters:
layers: 3
neurons_per_layer: 20
Score: 0.9445000290870667
Trial 01 summary
Hyperparameters:
layers: 1
neurons_per_layer: 40
Score: 0.9440000057220459
Trial 06 summary
Hyperparameters:
layers: 1
neurons_per_layer: 30
Score: 0.9401666522026062
Trial 07 summary
Hyperparameters:
layers: 1
neurons_per_layer: 20
Score: 0.9340833425521851
Trial 03 summary
Hyperparameters:
layers: 1
neurons_per_layer: 10
Score: 0.922249972820282
Trial 00 summary
Hyperparameters:
layers: 3
neurons_per_layer: 10
Score: 0.9195833206176758
To get the best model we may call Tuner.get_best_models
, which returns a sorted (best first) list of trained Model
instances. Alternatively, we may call Tuner.get_best_hyperparameters
returning a list of HyperParameter
objects of the best models. Based on the best hyperparameters we may train corresponding model on the full data set to improve results. Both methods take an argument specifying the number of models to return and defaulting to 1.
best_hp = tuner.get_best_hyperparameters()[0]
best_model = build_model(best_hp)
best_model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_1 (Flatten) (None, 400) 0
dense (Dense) (None, 30) 12030
dense_1 (Dense) (None, 30) 930
dense_2 (Dense) (None, 30) 930
dense_3 (Dense) (None, 10) 310
=================================================================
Total params: 14,200
Trainable params: 14,200
Non-trainable params: 0
_________________________________________________________________
best_model.fit(train_images, train_labels, epochs=10)
Epoch 1/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.7758 - categorical_accuracy: 0.7655
Epoch 2/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.3384 - categorical_accuracy: 0.9027
Epoch 3/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2765 - categorical_accuracy: 0.9202
Epoch 4/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2381 - categorical_accuracy: 0.9314
Epoch 5/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2127 - categorical_accuracy: 0.9384
Epoch 6/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1937 - categorical_accuracy: 0.9430
Epoch 7/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1786 - categorical_accuracy: 0.9478
Epoch 8/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1668 - categorical_accuracy: 0.9514
Epoch 9/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1571 - categorical_accuracy: 0.9545
Epoch 10/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1478 - categorical_accuracy: 0.9569
<keras.callbacks.History at 0x7f86bc2a9960>
test_loss, test_metric = best_model.evaluate(test_images, test_labels)
test_loss, test_metric
1875/1875 [==============================] - 2s 1000us/step - loss: 0.1729 - categorical_accuracy: 0.9482
(0.17285792529582977, 0.948199987411499)
Stopping Criteria#
So far we stopped training after a fixed number of epochs. But Keras also implements a mechanism for stopping training if loss or metrics stop improving. That mechanism is denoted as callbacks
. We simply have to create a suitable Callback
object and pass it to the fit method. Stopping criteria can be implemented with EarlyStopping
objects (it’s a subclass of Callback
). If we want to stop training if the validation loss starts to increase for at least 3 consecutive epochs, we have to pass monitor='val_loss'
, mode='min'
, patience=3
, restore_best_weights=True
. The last argument tells the fit
method to not return the final model, but the best model.
es = keras.callbacks.EarlyStopping(monitor='val_loss', mode='min', patience=3, restore_best_weights=True)
best_model.fit(train_images, train_labels, validation_split=0.2, epochs=1000, callbacks=[es])
Epoch 1/1000
1500/1500 [==============================] - 2s 2ms/step - loss: 0.1421 - categorical_accuracy: 0.9586 - val_loss: 0.1316 - val_categorical_accuracy: 0.9625
Epoch 2/1000
1500/1500 [==============================] - 2s 1ms/step - loss: 0.1361 - categorical_accuracy: 0.9603 - val_loss: 0.1337 - val_categorical_accuracy: 0.9603
Epoch 3/1000
1500/1500 [==============================] - 2s 1ms/step - loss: 0.1307 - categorical_accuracy: 0.9615 - val_loss: 0.1365 - val_categorical_accuracy: 0.9598
Epoch 4/1000
1500/1500 [==============================] - 2s 1ms/step - loss: 0.1250 - categorical_accuracy: 0.9633 - val_loss: 0.1283 - val_categorical_accuracy: 0.9613
Epoch 5/1000
1500/1500 [==============================] - 2s 1ms/step - loss: 0.1196 - categorical_accuracy: 0.9649 - val_loss: 0.1277 - val_categorical_accuracy: 0.9620
Epoch 6/1000
1500/1500 [==============================] - 2s 1ms/step - loss: 0.1153 - categorical_accuracy: 0.9659 - val_loss: 0.1260 - val_categorical_accuracy: 0.9629
Epoch 7/1000
1500/1500 [==============================] - 2s 1ms/step - loss: 0.1112 - categorical_accuracy: 0.9673 - val_loss: 0.1286 - val_categorical_accuracy: 0.9628
Epoch 8/1000
1500/1500 [==============================] - 2s 1ms/step - loss: 0.1070 - categorical_accuracy: 0.9683 - val_loss: 0.1286 - val_categorical_accuracy: 0.9614
Epoch 9/1000
1500/1500 [==============================] - 2s 2ms/step - loss: 0.1032 - categorical_accuracy: 0.9697 - val_loss: 0.1299 - val_categorical_accuracy: 0.9618
<keras.callbacks.History at 0x7f86bc0224a0>
Resulting accuracy on test set:
test_loss, test_metric = best_model.evaluate(test_images, test_labels)
test_loss, test_metric
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1479 - categorical_accuracy: 0.9562
(0.14791476726531982, 0.9561833143234253)
Saving and Loading Models#
Keras models provide a save
to a model to a file. To load a model use load_model
.
best_model.save('keras_save_best_model')
WARNING:absl:Found untraced functions such as _update_step_xla while saving (showing 1 of 1). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: keras_save_best_model/assets
INFO:tensorflow:Assets written to: keras_save_best_model/assets
model = keras.models.load_model('keras_save_best_model')
Visualization of Training Progress#
TensorFlow comes with a visualization tool called TensorBoard. It uses a web interface for visualizing training dynamics and it can be integrated into Jupyter notebooks.
To use TensorBoard we have to pass a TensorBoard
callback to fit
. Corresponding constructor takes a path to a directory for storing temporary training data. Running
tensorboard --logdir=path/to/directory
in the terminal will show an URL to access the TensorBoard interface within a web browser.
To use TensorBoard inside a Jupyter Notebook, execute the magic commands
%load_ext tensorboard
%tensorboard --logdir path/to/directory
es = keras.callbacks.EarlyStopping(monitor='val_loss', mode='min', patience=3, restore_best_weights=True)
tb = keras.callbacks.TensorBoard('tensorboard_data')
best_model.fit(train_images, train_labels, validation_split=0.2, epochs=1000, callbacks=[es, tb])
Epoch 1/1000
1500/1500 [==============================] - 2s 2ms/step - loss: 0.1106 - categorical_accuracy: 0.9669 - val_loss: 0.1313 - val_categorical_accuracy: 0.9596
Epoch 2/1000
1500/1500 [==============================] - 2s 2ms/step - loss: 0.1069 - categorical_accuracy: 0.9683 - val_loss: 0.1273 - val_categorical_accuracy: 0.9618
Epoch 3/1000
1500/1500 [==============================] - 2s 1ms/step - loss: 0.1036 - categorical_accuracy: 0.9689 - val_loss: 0.1255 - val_categorical_accuracy: 0.9626
Epoch 4/1000
1500/1500 [==============================] - 2s 1ms/step - loss: 0.0994 - categorical_accuracy: 0.9700 - val_loss: 0.1300 - val_categorical_accuracy: 0.9615
Epoch 5/1000
1500/1500 [==============================] - 2s 1ms/step - loss: 0.0970 - categorical_accuracy: 0.9715 - val_loss: 0.1274 - val_categorical_accuracy: 0.9628
Epoch 6/1000
1500/1500 [==============================] - 2s 1ms/step - loss: 0.0940 - categorical_accuracy: 0.9726 - val_loss: 0.1297 - val_categorical_accuracy: 0.9611
<keras.callbacks.History at 0x7f869ae5ae60>
%load_ext tensorboard
%tensorboard --logdir tensorboard_data