Implementing a Convolutional Neural Network Using TensorFlow

This article will outline the process of developing and training a Convolutional Neural Network (ConvNet) using TensorFlow for the purpose of addressing a multiclass classification task.

We will use Keras' Functional API (https://www.tensorflow.org/guide/keras/functional) to build a ConvNet that distinguishes between six sign language digits.

The Functional API is designed to accommodate models with non-linear structures, shared layers, and configurations with multiple inputs or outputs. In contrast to the Sequential API, which requires a linear progression through the layers, the Functional API offers significantly greater flexibility. While the Sequential model resembles a straight line, a Functional model behaves like a graph, allowing connections between the layers in numerous configurations.
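To make the "graph" idea concrete, here is a minimal sketch of a Functional model whose input feeds two parallel branches that are merged again, something a Sequential model cannot express. The function name `build_branching_model` and all layer sizes are illustrative assumptions and are not the architecture used in this article.

```python
import tensorflow as tf
import tensorflow.keras.layers as tfl

def build_branching_model(input_shape=(64, 64, 3), num_classes=6):
    """Toy two-branch model; layer sizes are illustrative, not the article's."""
    inputs = tf.keras.Input(shape=input_shape)
    # Two parallel branches read the same input tensor.
    branch_a = tfl.Conv2D(4, (3, 3), padding="same", activation="relu")(inputs)
    branch_b = tfl.Conv2D(4, (5, 5), padding="same", activation="relu")(inputs)
    # Merge the branches along the channel axis, then classify.
    merged = tfl.Concatenate(axis=-1)([branch_a, branch_b])
    pooled = tfl.GlobalAveragePooling2D()(merged)
    outputs = tfl.Dense(num_classes, activation="softmax")(pooled)
    return tf.keras.Model(inputs=inputs, outputs=outputs)

branching_model = build_branching_model()
print(branching_model.output_shape)  # (None, 6)
```

Each layer is called on a tensor and returns a tensor, so any tensor can be reused as the input to several layers. The model built later in this article happens to be a straight chain, but it is written in the same style.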

In this implementation, we will build a convolutional neural network with the architecture shown below.

(Figure 1)

Let's load the required libraries.

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
from matplotlib.pyplot import imread
import scipy
from PIL import Image
import pandas as pd
import tensorflow as tf
import tensorflow.keras.layers as tfl
from tensorflow.python.framework import ops
# Jupyter magic: render plots inline (remove this line when running as a plain script)
%matplotlib inline
np.random.seed(1)

Next, load the signs dataset from the local HDF5 files using h5py.

def load_signs_dataset():
    # Load the training examples
    train_dataset = h5py.File('datasets/train_signs.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # training features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # training labels
    # Load the test examples
    test_dataset = h5py.File('datasets/test_signs.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # test features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # test labels
    classes = np.array(test_dataset["list_classes"][:])  # the list of classes
    # Reshape the labels from (m,) to (1, m)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_signs_dataset()

The shapes of X (features) and Y (labels) for the training set are:

(1080, 64, 64, 3)
(1, 1080)

We will one-hot encode the Y labels so that each of the 6 sign digits is represented by a vector containing a single 1.

If the Y label is sign digit 2, it is converted to \(\begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}\); if the label is sign digit 4, it is converted to \(\begin{bmatrix} 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}\). The position of the 1 is the zero-based index of the class.

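def convert_to_one_hot(Y, C):
    # Row i of np.eye(C) is the one-hot vector for class i; the result has shape (C, m)
    Y = np.eye(C)[Y.reshape(-1)].T
    return Y

# Scale pixel values from [0, 255] to [0, 1]
X_train = X_train_orig/255.
X_test = X_test_orig/255.
# Transpose so that each row is one example: shape (m, 6)
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T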

After one-hot encoding, Y_train has shape (1080, 6): there are 1080 rows of samples, and each label is encoded as 6 columns holding a single 1 and five 0s, as explained above.
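The indexing trick behind the encoding can be checked on a few toy labels. This is a minimal sketch using NumPy only; the helper name `one_hot` and the label values are hypothetical and chosen for illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return an (m, num_classes) one-hot matrix for integer labels in [0, num_classes)."""
    labels = np.asarray(labels).reshape(-1)
    # Indexing the identity matrix with the labels picks one row per example.
    return np.eye(num_classes)[labels]

toy_labels = np.array([[2, 4, 0]])  # hypothetical labels, shape (1, 3) like Y_train_orig
encoded = one_hot(toy_labels, 6)

print(encoded.shape)  # (3, 6)
print(encoded[0])     # [0. 0. 1. 0. 0. 0.]
print(encoded[1])     # [0. 0. 0. 0. 1. 0.]
# argmax along the class axis recovers the original labels
print(np.argmax(encoded, axis=1))  # [2 4 0]
```

Because `np.eye(C)` is indexed directly with the labels, the labels must be integers from 0 to C - 1.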

Now we create a convolutional neural network model with the following architecture, as shown in Figure 1 at the beginning of this article:

`CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> DENSE`

Conv2D: Creates convolution kernels that are convolved with the layer input over two spatial dimensions (height and width) to produce a tensor of outputs.
MaxPool2D: Downsamples the input using a window of size (f, f) and strides of size (s, s), keeping the maximum value in each window. Max pooling is applied to each example and each channel independently.
Flatten: Given a tensor "P", takes each training (or test) example in the batch and flattens it into a 1D vector.
Dense: Given the flattened input F, returns the output of a fully connected layer.
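As an illustration of what max pooling and flattening do to a single channel, here is a small NumPy sketch. The helper `max_pool_2d` is hypothetical, uses no padding, and is written for clarity; it is not how Keras implements the layer.

```python
import numpy as np

def max_pool_2d(x, f, s):
    """Max pooling over one (H, W) channel with window f and stride s, no padding."""
    h, w = x.shape
    out_h = (h - f) // s + 1
    out_w = (w - f) // s + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Keep the largest value inside the current f x f window.
            out[i, j] = x[i * s:i * s + f, j * s:j * s + f].max()
    return out

channel = np.arange(16).reshape(4, 4)    # values 0..15 laid out row by row
pooled = max_pool_2d(channel, f=2, s=2)  # 2x2 result with rows [5, 7] and [13, 15]
flat = pooled.reshape(-1)                # flattened to [5, 7, 13, 15]
print(pooled)
print(flat)
```

With f = s, the windows do not overlap, so each spatial dimension shrinks by a factor of s. The model in this article uses this setting for both pooling layers.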

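def convolutional_model(input_shape):
    """Build CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> DENSE."""
    input_img = tf.keras.Input(shape=input_shape)
    # Block 1: 8 filters of size 4x4, then 8x8 max pooling with stride 8
    Z1 = tfl.Conv2D(filters=8, kernel_size=(4, 4), padding='same')(input_img)
    A1 = tfl.ReLU()(Z1)
    P1 = tfl.MaxPool2D(pool_size=(8, 8), strides=(8, 8), padding='same')(A1)
    # Block 2: 16 filters of size 2x2, then 4x4 max pooling with stride 4
    Z2 = tfl.Conv2D(filters=16, kernel_size=(2, 2), padding='same')(P1)
    A2 = tfl.ReLU()(Z2)
    P2 = tfl.MaxPool2D(pool_size=(4, 4), strides=(4, 4), padding='same')(A2)
    # Classifier: flatten, then a 6-unit softmax layer (one unit per sign digit)
    F = tfl.Flatten()(P2)
    outputs = tfl.Dense(units=6, activation='softmax')(F)
    model = tf.keras.Model(inputs=input_img, outputs=outputs)
    return model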
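The training step later in this article calls `conv_model.fit`, but the article does not show how `conv_model` is created and compiled. The following is a minimal sketch of one way to do it. The Adam optimizer and the categorical cross-entropy loss are assumptions that the article does not state; the `accuracy` metric matches the training log, and the input shape (64, 64, 3) matches the model summary.

```python
import tensorflow as tf
import tensorflow.keras.layers as tfl

def convolutional_model(input_shape):
    # Same layers as the function above, repeated so this block runs on its own.
    input_img = tf.keras.Input(shape=input_shape)
    Z1 = tfl.Conv2D(filters=8, kernel_size=(4, 4), padding='same')(input_img)
    A1 = tfl.ReLU()(Z1)
    P1 = tfl.MaxPool2D(pool_size=(8, 8), strides=(8, 8), padding='same')(A1)
    Z2 = tfl.Conv2D(filters=16, kernel_size=(2, 2), padding='same')(P1)
    A2 = tfl.ReLU()(Z2)
    P2 = tfl.MaxPool2D(pool_size=(4, 4), strides=(4, 4), padding='same')(A2)
    F = tfl.Flatten()(P2)
    outputs = tfl.Dense(units=6, activation='softmax')(F)
    return tf.keras.Model(inputs=input_img, outputs=outputs)

conv_model = convolutional_model((64, 64, 3))
# Assumed settings: the optimizer and loss are not given in the article.
conv_model.compile(optimizer='adam',
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])
conv_model.summary()
```

Categorical cross-entropy is a natural fit here because the labels are one-hot encoded and the output layer is a softmax, but other choices are possible.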

The model summary is:

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 64, 64, 3)]       0         

 conv2d (Conv2D)             (None, 64, 64, 8)         392       

 re_lu (ReLU)                (None, 64, 64, 8)         0         

 max_pooling2d (MaxPooling2  (None, 8, 8, 8)           0         
 D)                                                              

 conv2d_1 (Conv2D)           (None, 8, 8, 16)          528       

 re_lu_1 (ReLU)              (None, 8, 8, 16)          0         

 max_pooling2d_1 (MaxPoolin  (None, 2, 2, 16)          0         
 g2D)                                                            

 flatten (Flatten)           (None, 64)                0         

 dense (Dense)               (None, 6)                 390       

=================================================================
Total params: 1310 (5.12 KB)
Trainable params: 1310 (5.12 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
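The output shapes and parameter counts in the summary can be reproduced by hand. The sketch below uses only plain Python; the helper names are hypothetical. It relies on two facts: with `padding='same'`, the spatial output size is the input size divided by the stride, rounded up, and each convolution filter has one weight per kernel position and input channel, plus one bias.

```python
import math

def same_padding_size(size, stride):
    """Spatial output size for padding='same': ceil(size / stride)."""
    return math.ceil(size / stride)

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter plus one bias, times the number of filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_features, units):
    """One weight per input feature plus one bias, times the number of units."""
    return (in_features + 1) * units

side = 64
side = same_padding_size(side, 1)  # conv2d, stride 1      -> 64
side = same_padding_size(side, 8)  # max_pooling2d         -> 8
side = same_padding_size(side, 1)  # conv2d_1, stride 1    -> 8
side = same_padding_size(side, 4)  # max_pooling2d_1       -> 2
flat = side * side * 16            # flatten: 2 * 2 * 16   -> 64

params = {
    "conv2d": conv2d_params(4, 4, 3, 8),     # 392
    "conv2d_1": conv2d_params(2, 2, 8, 16),  # 528
    "dense": dense_params(flat, 6),          # 390
}
total = sum(params.values())   # 1310
size_kb = total * 4 / 1024     # 4 bytes per float32 weight, about 5.12 KB
print(params, total, round(size_kb, 2))
```

The ReLU, pooling, and flatten layers have no trainable weights, which is why they show 0 parameters in the summary.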

Let's train the model using mini-batches of 64.

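# conv_model is a compiled model built with convolutional_model((64, 64, 3))
# Group the examples into mini-batches of 64
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, Y_train)).batch(64)
test_dataset = tf.data.Dataset.from_tensor_slices((X_test, Y_test)).batch(64)
# The test set doubles as the validation set here
history = conv_model.fit(train_dataset, epochs=100, validation_data=test_dataset)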

Epoch 93/100
17/17 [==============================] - 0s 12ms/step - loss: 0.3921 - accuracy: 0.8907 - val_loss: 0.5238 - val_accuracy: 0.8250
Epoch 94/100
17/17 [==============================] - 0s 12ms/step - loss: 0.3883 - accuracy: 0.8917 - val_loss: 0.5210 - val_accuracy: 0.8333
Epoch 95/100
17/17 [==============================] - 0s 10ms/step - loss: 0.3845 - accuracy: 0.8917 - val_loss: 0.5181 - val_accuracy: 0.8333
Epoch 96/100
17/17 [==============================] - 0s 10ms/step - loss: 0.3807 - accuracy: 0.8944 - val_loss: 0.5153 - val_accuracy: 0.8250
Epoch 97/100
17/17 [==============================] - 0s 10ms/step - loss: 0.3775 - accuracy: 0.8944 - val_loss: 0.5129 - val_accuracy: 0.8333
Epoch 98/100
17/17 [==============================] - 0s 10ms/step - loss: 0.3739 - accuracy: 0.8954 - val_loss: 0.5099 - val_accuracy: 0.8333
Epoch 99/100
17/17 [==============================] - 0s 10ms/step - loss: 0.3708 - accuracy: 0.8972 - val_loss: 0.5076 - val_accuracy: 0.8333
Epoch 100/100
17/17 [==============================] - 0s 10ms/step - loss: 0.3672 - accuracy: 0.8981 - val_loss: 0.5047 - val_accuracy: 0.8250
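The `17/17` in the training log is the number of mini-batches per epoch. A short sketch of the arithmetic follows; the helper name is hypothetical. The training set size of 1080 comes from the shapes printed earlier. The test set size of 120 is an assumption inferred from the accuracy values in the log and is not stated in the article.

```python
import math

def steps_per_epoch(num_examples, batch_size):
    """Number of batches needed to cover the data; the last batch may be smaller."""
    return math.ceil(num_examples / batch_size)

train_steps = steps_per_epoch(1080, 64)     # 17, matches "17/17" in the log
last_batch = 1080 - (train_steps - 1) * 64  # 56 examples in the final batch

# Assumption: the test set holds 120 examples (inferred, not stated in the article).
# evaluate() uses a default batch_size of 32 when given NumPy arrays.
eval_steps = steps_per_epoch(120, 32)       # 4
print(train_steps, last_batch, eval_steps)
```

Under that assumption, the 4 steps agree with the `4/4` shown by `evaluate()` later in the article.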

We get a validation accuracy of 82.5% after the final epoch. The loss and accuracy for the training and validation sets can be visualized through the history object returned by the fit() function.
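One possible way to plot these curves is sketched below. The function `plot_history` is a hypothetical helper, not part of Keras. It expects a dictionary with the keys `loss`, `val_loss`, `accuracy`, and `val_accuracy`, which are the metric names that appear in the training log. To keep the block self-contained, the example data holds only the last three epochs copied from the log above.

```python
import matplotlib.pyplot as plt

def plot_history(history_dict):
    """Plot loss and accuracy curves from a Keras-style history dictionary."""
    epochs = range(1, len(history_dict["loss"]) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(epochs, history_dict["loss"], label="train")
    ax_loss.plot(epochs, history_dict["val_loss"], label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    ax_acc.plot(epochs, history_dict["accuracy"], label="train")
    ax_acc.plot(epochs, history_dict["val_accuracy"], label="validation")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()
    return fig

# Values for epochs 98 to 100, taken from the training log.
last_epochs = {
    "loss": [0.3739, 0.3708, 0.3672],
    "val_loss": [0.5099, 0.5076, 0.5047],
    "accuracy": [0.8954, 0.8972, 0.8981],
    "val_accuracy": [0.8333, 0.8333, 0.8250],
}
fig = plot_history(last_epochs)
```

After training, the full curves would be drawn with `plot_history(history.history)`, since `history.history` is a dictionary of per-epoch metric lists.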

Let's evaluate the model on the test data.

conv_model.evaluate(X_test, Y_test)

4/4 [==============================] - 0s 2ms/step - loss: 0.5047 - accuracy: 0.8250
[0.5047249794006348, 0.824999988079071]

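We get an accuracy of 82.5% on the test data. This matches the final validation accuracy because the test set was also used as the validation set during training.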
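For one-hot labels, the accuracy metric is the fraction of examples whose most probable predicted class equals the true class. Here is a minimal NumPy sketch of that computation; the function name and the toy probabilities are hypothetical and are not model outputs.

```python
import numpy as np

def categorical_accuracy(y_true_one_hot, y_prob):
    """Fraction of rows where the argmax of the prediction equals the true class."""
    true_class = np.argmax(y_true_one_hot, axis=1)
    predicted_class = np.argmax(y_prob, axis=1)
    return float(np.mean(true_class == predicted_class))

# Four toy examples with true classes 0, 1, 2, 3
y_true = np.eye(6)[[0, 1, 2, 3]]
y_prob = np.array([
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],  # predicts 0 (correct)
    [0.10, 0.60, 0.10, 0.10, 0.05, 0.05],  # predicts 1 (correct)
    [0.10, 0.10, 0.20, 0.50, 0.05, 0.05],  # predicts 3 (true class is 2)
    [0.05, 0.05, 0.10, 0.70, 0.05, 0.05],  # predicts 3 (correct)
])
toy_accuracy = categorical_accuracy(y_true, y_prob)
print(toy_accuracy)  # 0.75
```

The same function could be applied to `conv_model.predict(X_test)` and `Y_test`. If the test set holds 120 examples, which is an inference and not stated in the article, an accuracy of 0.825 corresponds to 99 correct predictions.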