This article will outline the process of developing and training a Convolutional Neural Network (ConvNet) using TensorFlow for the purpose of addressing a multiclass classification task.
We will use Keras' versatile Functional API (https://www.tensorflow.org/guide/keras/functional) to construct a ConvNet capable of distinguishing between six sign language digits.
The Functional API is designed to accommodate models with non-linear structures, shared layers, and configurations with multiple inputs or outputs. In contrast to the Sequential API, which requires a linear progression through the layers, the Functional API offers significantly greater flexibility. While the Sequential model resembles a straight line, a Functional model behaves like a graph, allowing connections between the layers in numerous configurations.
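To make the "graph rather than straight line" idea concrete, here is a minimal sketch of a model that the Sequential API cannot express: two convolutional branches read the same input and are then merged. The branch layout, filter counts, and variable names are assumptions chosen only for illustration; this is not the architecture built later in the article.

```python
import tensorflow as tf
import tensorflow.keras.layers as tfl

# One input tensor feeds two parallel branches (a non-linear topology).
inputs = tf.keras.Input(shape=(64, 64, 3))
branch_a = tfl.Conv2D(4, (3, 3), padding="same", activation="relu")(inputs)
branch_b = tfl.Conv2D(4, (5, 5), padding="same", activation="relu")(inputs)

# The branches are joined along the channel axis: 4 + 4 = 8 channels.
merged = tfl.Concatenate()([branch_a, branch_b])
pooled = tfl.GlobalAveragePooling2D()(merged)
outputs = tfl.Dense(6, activation="softmax")(pooled)

# Illustrative model only; the name is hypothetical.
branched_model = tf.keras.Model(inputs=inputs, outputs=outputs)
```

Each layer is called on the tensor it should consume, so the wiring between layers is explicit in the code.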
In this implementation we will build a convolutional neural network with the architecture shown below.
\(figure1\)
Let's load the required libraries.
import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
from matplotlib.pyplot import imread
import scipy
from PIL import Image
import pandas as pd
import tensorflow as tf
import tensorflow.keras.layers as tfl
from tensorflow.python.framework import ops
%matplotlib inline
np.random.seed(1)
Load the signs dataset from the local HDF5 files using h5py.
def load_signs_dataset():
    # loading training examples
    train_dataset = h5py.File('datasets/train_signs.h5', "r")
    # train set features
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])
    # train set labels
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])
    # loading test examples
    test_dataset = h5py.File('datasets/test_signs.h5', "r")
    # test set features
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])
    # test set labels
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])
    classes = np.array(test_dataset["list_classes"][:])  # the list of classes
    # reshape the label vectors from (m,) to (1, m)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_signs_dataset()
The shapes of X_train_orig (features) and Y_train_orig (labels) are:
(1080, 64, 64, 3)
(1, 1080)
We will one-hot encode the Y labels so that each of the 6 sign digits is represented by a vector of six entries with a single 1 at the position given by the label.
Because the encoding indexes the rows of a 6 x 6 identity matrix, the labels run from 0 to 5. A label of sign digit 2 is converted to \(\begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}\), and a label of sign digit 4 is converted to \(\begin{bmatrix} 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}\).
def convert_to_one_hot(Y, C):
    Y = np.eye(C)[Y.reshape(-1)].T
    return Y
X_train = X_train_orig/255.
X_test = X_test_orig/255.
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T
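As a quick check of the encoding, the sketch below runs the same helper on three hypothetical labels (2, 4 and 0, chosen only for illustration) and prints the resulting rows.

```python
import numpy as np

def convert_to_one_hot(Y, C):
    # Row i of the C x C identity matrix is the one-hot vector for class i.
    return np.eye(C)[Y.reshape(-1)].T

# Hypothetical labels with the same (1, m) layout as Y_train_orig.
labels = np.array([[2, 4, 0]])
one_hot = convert_to_one_hot(labels, 6).T  # shape (3, 6): one row per example

print(one_hot)
# [[0. 0. 1. 0. 0. 0.]
#  [0. 0. 0. 0. 1. 0.]
#  [1. 0. 0. 0. 0. 0.]]
```

Taking np.argmax along axis 1 of the encoded matrix recovers the original labels.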
The shape of Y_train after one-hot encoding is (1080, 6): there are 1080 samples (rows), and each label is encoded as 6 columns of 0s with a single 1, as explained above.
Now we create a convolutional neural network model with the following architecture, as mentioned at the beginning of this article in \(figure1\):
CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> DENSE
Conv2D: This layer creates a convolution kernel that is convolved with the layer input over two spatial dimensions (height and width) to produce a tensor of outputs.
MaxPool2D: Downsamples your input using a window of size (f, f) and strides of size (s, s) to carry out max pooling over each window. For max pooling, we usually operate on a single example at a time and a single channel at a time.
Flatten: Given a tensor "P", this function takes each training (or test) example in the batch and flattens it into a 1D vector.
Dense: Given the flattened input F, it returns the output computed using a fully connected layer
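The effect of these layers on the tensor shape can be worked out by hand. The sketch below assumes the settings used in this article's model (64 x 64 x 3 inputs, padding='same' everywhere, the default convolution stride of 1, pooling windows of 8 and 4 with matching strides, and 16 filters in the second convolution). With 'same' padding the spatial output size is ceil(input size / stride). The helper name is hypothetical.

```python
import math

def same_padding_output(size, stride):
    """Spatial output size of a layer with padding='same': ceil(size / stride)."""
    return math.ceil(size / stride)

size = 64                                      # input images are 64 x 64 x 3
after_conv1 = same_padding_output(size, 1)     # Conv2D, 8 filters, stride 1 -> 64 x 64 x 8
after_pool1 = same_padding_output(after_conv1, 8)  # MaxPool2D 8x8, stride 8 -> 8 x 8 x 8
after_conv2 = same_padding_output(after_pool1, 1)  # Conv2D, 16 filters, stride 1 -> 8 x 8 x 16
after_pool2 = same_padding_output(after_conv2, 4)  # MaxPool2D 4x4, stride 4 -> 2 x 2 x 16

# Flatten turns the 2 x 2 x 16 volume into a single vector per example.
flattened_units = after_pool2 * after_pool2 * 16

print(after_conv1, after_pool1, after_conv2, after_pool2, flattened_units)
# 64 8 8 2 64
```

The Dense layer therefore receives 64 features per example and maps them to 6 class probabilities.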
def convolutional_model(input_shape):
    input_img = tf.keras.Input(shape=input_shape)
    Z1 = tfl.Conv2D(filters=8, kernel_size=(4, 4), padding='same')(input_img)
    A1 = tfl.ReLU()(Z1)
    P1 = tfl.MaxPool2D(pool_size=(8, 8), strides=(8, 8), padding='same')(A1)
    Z2 = tfl.Conv2D(filters=16, kernel_size=(2, 2), padding='same')(P1)
    A2 = tfl.ReLU()(Z2)
    P2 = tfl.MaxPool2D(pool_size=(4, 4), strides=(4, 4), padding='same')(A2)
    F = tfl.Flatten()(P2)
    outputs = tfl.Dense(units=6, activation='softmax')(F)
    model = tf.keras.Model(inputs=input_img, outputs=outputs)
    return model
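Before training, the model has to be instantiated and compiled. The training output reports accuracy, so that metric is requested here. The optimizer and loss are not stated in this article, so Adam and categorical cross-entropy (a common pairing for one-hot labels with a softmax output) are assumed.
conv_model = convolutional_model((64, 64, 3))
# Assumption: optimizer and loss are not given in the article; these are common defaults.
conv_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
conv_model.summary()
The summary of the model is: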
Model: "model"
_________________________________________________________________
 Layer (type)                     Output Shape           Param #
=================================================================
 input_1 (InputLayer)             [(None, 64, 64, 3)]    0
 conv2d (Conv2D)                  (None, 64, 64, 8)      392
 re_lu (ReLU)                     (None, 64, 64, 8)      0
 max_pooling2d (MaxPooling2D)     (None, 8, 8, 8)        0
 conv2d_1 (Conv2D)                (None, 8, 8, 16)       528
 re_lu_1 (ReLU)                   (None, 8, 8, 16)       0
 max_pooling2d_1 (MaxPooling2D)   (None, 2, 2, 16)       0
 flatten (Flatten)                (None, 64)             0
 dense (Dense)                    (None, 6)              390
=================================================================
Total params: 1310 (5.12 KB)
Trainable params: 1310 (5.12 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
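The parameter counts in the summary can be reproduced by hand. A convolution layer has one weight per kernel position and input channel plus one bias, for each filter. A dense layer has one weight per input-output pair plus one bias per output. The helper names below are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # (weights per filter + 1 bias) * number of filters
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    # weight matrix + one bias per output unit
    return in_units * out_units + out_units

conv1 = conv2d_params(4, 4, 3, 8)    # first Conv2D: 4x4 kernel, 3 input channels, 8 filters
conv2 = conv2d_params(2, 2, 8, 16)   # second Conv2D: 2x2 kernel, 8 input channels, 16 filters
dense = dense_params(64, 6)          # Dense: 64 flattened features -> 6 classes
total = conv1 + conv2 + dense

print(conv1, conv2, dense, total)
# 392 528 390 1310
```

The ReLU, pooling and flatten layers have no trainable parameters, which is why they show 0 in the summary.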
Let's train the model in mini-batches of 64.
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, Y_train)).batch(64)
test_dataset = tf.data.Dataset.from_tensor_slices((X_test, Y_test)).batch(64)
history = conv_model.fit(train_dataset, epochs=100, validation_data=test_dataset)
Epoch 93/100
17/17 [==============================] - 0s 12ms/step - loss: 0.3921 - accuracy: 0.8907 - val_loss: 0.5238 - val_accuracy: 0.8250
Epoch 94/100
17/17 [==============================] - 0s 12ms/step - loss: 0.3883 - accuracy: 0.8917 - val_loss: 0.5210 - val_accuracy: 0.8333
Epoch 95/100
17/17 [==============================] - 0s 10ms/step - loss: 0.3845 - accuracy: 0.8917 - val_loss: 0.5181 - val_accuracy: 0.8333
Epoch 96/100
17/17 [==============================] - 0s 10ms/step - loss: 0.3807 - accuracy: 0.8944 - val_loss: 0.5153 - val_accuracy: 0.8250
Epoch 97/100
17/17 [==============================] - 0s 10ms/step - loss: 0.3775 - accuracy: 0.8944 - val_loss: 0.5129 - val_accuracy: 0.8333
Epoch 98/100
17/17 [==============================] - 0s 10ms/step - loss: 0.3739 - accuracy: 0.8954 - val_loss: 0.5099 - val_accuracy: 0.8333
Epoch 99/100
17/17 [==============================] - 0s 10ms/step - loss: 0.3708 - accuracy: 0.8972 - val_loss: 0.5076 - val_accuracy: 0.8333
Epoch 100/100
17/17 [==============================] - 0s 10ms/step - loss: 0.3672 - accuracy: 0.8981 - val_loss: 0.5047 - val_accuracy: 0.8250
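The 17/17 shown for each epoch is the number of mini-batches per epoch. A short sketch of the arithmetic, using the 1080 training examples and the batch size of 64 from this article:

```python
import math

num_examples = 1080   # training set size reported earlier
batch_size = 64       # value passed to .batch()

steps_per_epoch = math.ceil(num_examples / batch_size)
last_batch_size = num_examples - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, last_batch_size)
# 17 56
```

Sixteen batches contain 64 examples each, and the final batch holds the remaining 56.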
The final validation accuracy is 82.5%. The loss and accuracy for the training and validation sets can be visualized through the history object returned by fit().
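One possible way to draw these curves is sketched below. It assumes the dictionary stored in history.history, with the keys loss, accuracy, val_loss and val_accuracy that appear in the training log. To keep the example self-contained it plots only the values of epochs 93 to 100 copied from that log; in practice history.history would be passed instead. The function name and the first_epoch parameter are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; remove this line in a notebook
import matplotlib.pyplot as plt

def plot_history(history_dict, first_epoch=1):
    """Plot loss and accuracy curves from a Keras-style history dictionary."""
    epochs = range(first_epoch, first_epoch + len(history_dict["loss"]))
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(epochs, history_dict["loss"], label="train")
    ax_loss.plot(epochs, history_dict["val_loss"], label="validation")
    ax_loss.set(title="Loss", xlabel="Epoch", ylabel="Loss")
    ax_loss.legend()
    ax_acc.plot(epochs, history_dict["accuracy"], label="train")
    ax_acc.plot(epochs, history_dict["val_accuracy"], label="validation")
    ax_acc.set(title="Accuracy", xlabel="Epoch", ylabel="Accuracy")
    ax_acc.legend()
    return fig

# Values copied from epochs 93-100 of the training log (an excerpt, not the full run).
excerpt = {
    "loss": [0.3921, 0.3883, 0.3845, 0.3807, 0.3775, 0.3739, 0.3708, 0.3672],
    "accuracy": [0.8907, 0.8917, 0.8917, 0.8944, 0.8944, 0.8954, 0.8972, 0.8981],
    "val_loss": [0.5238, 0.5210, 0.5181, 0.5153, 0.5129, 0.5099, 0.5076, 0.5047],
    "val_accuracy": [0.8250, 0.8333, 0.8333, 0.8250, 0.8333, 0.8333, 0.8333, 0.8250],
}
fig = plot_history(excerpt, first_epoch=93)
```

With the real training run, the call would be plot_history(history.history).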
Let's evaluate the model on the test data.
conv_model.evaluate(X_test,Y_test)
4/4 [==============================] - 0s 2ms/step - loss: 0.5047 - accuracy: 0.8250
[0.5047249794006348, 0.824999988079071]
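We get an accuracy of 82.5% on the test data. This matches the final validation accuracy because the test set was also used as the validation data during training.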
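The accuracy figure is the fraction of examples whose highest-probability class matches the true label. The sketch below shows that calculation with NumPy on four hypothetical examples. The labels and probabilities are made up for illustration and are not outputs of the trained model.

```python
import numpy as np

def categorical_accuracy(y_true_one_hot, y_prob):
    """Fraction of rows where the predicted class equals the labeled class."""
    true_classes = np.argmax(y_true_one_hot, axis=1)
    predicted_classes = np.argmax(y_prob, axis=1)
    return float(np.mean(true_classes == predicted_classes))

# Hypothetical one-hot labels for classes 0, 2, 5 and 3.
y_true = np.eye(6)[[0, 2, 5, 3]]

# Hypothetical softmax outputs; each row sums to 1.
y_prob = np.array([
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],  # predicts 0 (correct)
    [0.10, 0.10, 0.60, 0.10, 0.05, 0.05],  # predicts 2 (correct)
    [0.05, 0.05, 0.10, 0.10, 0.50, 0.20],  # predicts 4 (wrong, label is 5)
    [0.10, 0.10, 0.10, 0.50, 0.10, 0.10],  # predicts 3 (correct)
])

acc = categorical_accuracy(y_true, y_prob)
print(acc)
# 0.75
```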