Implementing ResNet CNN

A residual neural network, also referred to as a ResNet, is a deep learning architecture in which the layers learn residual functions with respect to the inputs of earlier layers. It was introduced in 2015 for image recognition and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) that same year.

Why do we need ResNet?

In recent years, neural networks have grown significantly deeper, with state-of-the-art architectures evolving from only a few layers to more than a hundred.

The primary advantage of employing a very deep network lies in its capacity to represent highly complex functions. Such networks are capable of learning features across multiple levels of abstraction. The shallower layers, situated closer to the input, are responsible for capturing fundamental elements such as edges. Conversely, the deeper layers, positioned nearer to the output, are adept at identifying intricate and sophisticated features.

The implementation of deeper neural networks does not invariably result in superior outcomes. A significant obstacle in training such architectures is the phenomenon known as vanishing gradients. In deep networks, the gradient signal can diminish rapidly, rendering gradient descent slow and less effective.

During gradient descent, backpropagation proceeds from the final layer back to the initial layer. At each step, multiplication by the weight matrix can cause the gradient to shrink exponentially toward zero. In rarer cases, the gradient can instead grow rapidly, "exploding" into excessively large values.
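As a rough, illustrative aside (a toy NumPy example of my own, not part of the original notebook), repeatedly multiplying a gradient by the same scaled identity matrix shows both effects: a factor below 1 drives the norm toward zero, while a factor above 1 makes it blow up.

import numpy as np

depth = 50
grad_small = np.ones(4)
grad_large = np.ones(4)
for _ in range(depth):
    grad_small = (0.5 * np.eye(4)) @ grad_small   # weights with scale < 1: gradient vanishes
    grad_large = (1.5 * np.eye(4)) @ grad_large   # weights with scale > 1: gradient explodes

print(np.linalg.norm(grad_small))   # ~ 1.8e-15, effectively zero
print(np.linalg.norm(grad_large))   # ~ 1.3e+09, numerically huge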

Residual Network

In Residual Networks (ResNets), a "shortcut" or "skip connection" allows the signal to bypass one or more layers of the model, as shown in the sketch below.

A Residual Network uses two main types of block, chosen according to whether the input and output dimensions are the same or differ:

  • Identity block

  • Convolutional block
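
Before building the full blocks, here is a minimal, illustrative sketch (not from the original notebook) of what a skip connection looks like in Keras: the input tensor is added back to the output of a small convolutional path before the final activation. The layer sizes are arbitrary placeholders.

from tensorflow.keras.layers import Input, Conv2D, Activation, Add

X_input = Input((32, 32, 64))                 # arbitrary example shape
X = Conv2D(64, 3, padding='same')(X_input)    # main path: two convolutions
X = Activation('relu')(X)
X = Conv2D(64, 3, padding='same')(X)
X = Add()([X, X_input])                       # skip connection: add the input back
X = Activation('relu')(X)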

Let's import the libraries

import tensorflow as tf
import numpy as np
import scipy.misc
from tensorflow.keras.applications.resnet_v2 import ResNet50V2
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet_v2 import preprocess_input, decode_predictions
from tensorflow.keras import layers
from tensorflow.keras.layers import Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D, AveragePooling2D, MaxPooling2D, GlobalMaxPooling2D
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.initializers import random_uniform, glorot_uniform, constant, identity
from tensorflow.python.framework.ops import EagerTensor
from matplotlib.pyplot import imshow
import h5py
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.pyplot import imread
%matplotlib inline
np.random.seed(1)
tf.random.set_seed(2)

Identity Block

Let's now create an identity block; it is made up of the following stages:

Block 1:

  • CONV2D

  • BatchNorm

  • ReLU activation function.

Block 2:

  • CONV2D

  • BatchNorm

  • ReLU activation function.

Block 3:

  • CONV2D

  • BatchNorm

Final block:

  • The shortcut (the saved input) and the output of Block 3 are added together

  • ReLU activation function

def residual_identity_block(X, f, filters, initializer=random_uniform):

    # Retrieve Filters
    F1, F2, F3 = filters
    # Save the input value. You'll need this later to add back to the main path. 
    X_shortcut = X
    #  Block 1
    X = Conv2D(filters = F1, kernel_size = 1, strides = (1,1), padding = 'valid', kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X) # Default axis
    X = Activation('relu')(X)
    # Block 2
    X = Conv2D(filters = F2, kernel_size = (f, f), strides = (1,1), padding = 'same', kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X)
    X = Activation('relu')(X) 
    ## Block 3
    X = Conv2D(filters = F3, kernel_size = (1, 1), strides = (1,1), padding = 'valid', kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X)
    ## Final Block
    X = Add()([X_shortcut,X])
    X = Activation('relu')(X)

    return X
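
As a quick, optional sanity check (my own illustration, not part of the original notebook), the identity block should leave both the spatial dimensions and the channel count unchanged, so its output shape matches its input shape:

X_demo = tf.random.normal((1, 8, 8, 256))                          # dummy input
out = residual_identity_block(X_demo, f=3, filters=[64, 64, 256])
print(X_demo.shape, out.shape)                                     # both (1, 8, 8, 256)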

Convolutional Block

Let's now create a convolutional block; it is made up of the following stages. The block on the shortcut path consists of a Conv2D and a BatchNorm layer so that the input and output dimensions match.

Block 1:

  • CONV2D

  • BatchNorm

  • ReLU activation function.

Block 2:

  • CONV2D

  • BatchNorm

  • ReLU activation function.

Block 3:

  • CONV2D

  • BatchNorm

Shortcut path: when the input and output dimensions don't match, we add a CONV2D layer on the shortcut path to adjust the shape.

  • CONV2D

  • BatchNorm

Final block:

  • The shortcut and the output of Block 3 are added together

  • ReLU activation function


def convolutional_block(X, f, filters, s = 2, initializer=glorot_uniform):

    # Retrieve Filters
    F1, F2, F3 = filters
    # Save the input value
    residual = X
    # Block 1
    X = Conv2D(filters = F1, kernel_size = 1, strides = (s, s), padding='valid', kernel_initializer = initializer(seed=0))(X)
    X =layers.BatchNormalization(axis=3)(X)
    X = Activation('relu')(X)
    ## Block 2
    X = Conv2D(filters = F2, kernel_size = f,strides = (1, 1),padding='same',kernel_initializer = initializer(seed=0))(X)
    X =layers.BatchNormalization(axis=3)(X)
    X = Activation('relu')(X)
    ## Block 3  
    X = Conv2D(filters = F3, kernel_size = 1, strides = (1, 1), padding='valid', kernel_initializer = initializer(seed=0))(X)
    X =layers.BatchNormalization(axis=3)(X)
    # shortcut 
    residual = Conv2D(filters = F3, kernel_size = 1, strides = (s, s), padding='valid', kernel_initializer = initializer(seed=0))(residual)
    residual =layers.BatchNormalization(axis=3)(residual)
    # Final Block
    X = Add()([X, residual])
    X = Activation('relu')(X)

    return X
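
Again as an optional, illustrative check (not from the original notebook): with s=2 the convolutional block halves the spatial dimensions and projects the channels to F3 on both the main path and the shortcut, so the two tensors can be added:

X_demo = tf.random.normal((1, 8, 8, 64))                            # dummy input
out = convolutional_block(X_demo, f=3, filters=[64, 64, 256], s=2)
print(X_demo.shape, out.shape)                                      # (1, 8, 8, 64) -> (1, 4, 4, 256)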

Now we will build a ResNet with the following architecture:

Conv2D -> BatchNorm -> ReLU -> MaxPool -> Conv block -> Identity block * 2 -> Conv block -> Identity block * 3 -> Conv block -> Identity block * 5 -> Conv block -> Identity block * 2 -> AvgPool -> Flatten -> Dense


def ResNet(input_shape = (64, 64, 3), classes = 6, training=False):

    X_input = Input(input_shape)
    # Zero-Padding
    X = ZeroPadding2D((3, 3))(X_input)

    # Stage 1
    X = Conv2D(64, (7, 7), strides = (2, 2), kernel_initializer = glorot_uniform(seed=0))(X)
    X =layers.BatchNormalization(axis=3)(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)

    # Stage 2
    X = convolutional_block(X, f = 3, filters = [64, 64, 256], s = 1)
    X = residual_identity_block(X, 3, [64, 64, 256])
    X = residual_identity_block(X, 3, [64, 64, 256])

    ## Stage 3 
    X = convolutional_block(X, f = 3, filters = [128,128,512], s = 2)
    X = residual_identity_block(X, 3,  [128,128,512])
    X = residual_identity_block(X, 3,  [128,128,512])
    X = residual_identity_block(X, 3,  [128,128,512])

    # Stage 4 
    X = convolutional_block(X, f = 3, filters = [256, 256, 1024], s = 2)
    X = residual_identity_block(X, 3, [256, 256, 1024])
    X = residual_identity_block(X, 3, [256, 256, 1024])
    X = residual_identity_block(X, 3, [256, 256, 1024])
    X = residual_identity_block(X, 3, [256, 256, 1024])
    X = residual_identity_block(X, 3, [256, 256, 1024])

    # Stage 5 
    X = convolutional_block(X, f = 3, filters = [512, 512, 2048], s = 2)
    X = residual_identity_block(X, 3, [512, 512, 2048])
    X = residual_identity_block(X, 3, [512, 512, 2048])
    # AVGPOOL 
    X = AveragePooling2D((2, 2))(X)
    # output layer
    X = Flatten()(X)
    X = Dense(classes, activation='softmax', kernel_initializer = glorot_uniform(seed=0))(X)
    # Create model
    model = Model(inputs = X_input, outputs = X)

    return model

Now we create and compile the model

np.random.seed(1)
tf.random.set_seed(2)
model = ResNet(input_shape = (64, 64, 3), classes = 6)
opt = tf.keras.optimizers.Adam(learning_rate=0.00015)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

Let's define a function to load the dataset

def load_signs_dataset():
    # loading training examples 
    train_dataset = h5py.File('datasets/train_signs.h5', "r")
    # your train set features
    train_set_x_orig = np.array(train_dataset["train_set_x"][:]) 
    # your train set labels
    train_set_y_orig = np.array(train_dataset["train_set_y"][:]) 
     # loading test examples 
    test_dataset = h5py.File('datasets/test_signs.h5', "r")
    # your test set features
    test_set_x_orig = np.array(test_dataset["test_set_x"][:]) 
    # your test set labels
    test_set_y_orig = np.array(test_dataset["test_set_y"][:]) 
    classes = np.array(test_dataset["list_classes"][:]) # the list of classes
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

Implement one-hot encoding for the Y labels

def convert_to_one_hot(Y, C):
    Y = np.eye(C)[Y.reshape(-1)].T
    return Y
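
For illustration (example values of my own, not from the original notebook), a row vector of labels becomes a matrix whose rows are the corresponding rows of the identity matrix:

Y_demo = np.array([[1, 3, 0]])              # shape (1, 3), like the loader's output
print(convert_to_one_hot(Y_demo, 6).T)
# [[0. 1. 0. 0. 0. 0.]
#  [0. 0. 0. 1. 0. 0.]
#  [1. 0. 0. 0. 0. 0.]]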

Loading Dataset

X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_signs_dataset()
# Normalize image vectors
X_train = X_train_orig / 255.
X_test = X_test_orig / 255.
# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T
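
Optionally, it can be worth printing the resulting array shapes before training (the exact sizes depend on the SIGNS dataset files):

print("X_train shape:", X_train.shape)
print("Y_train shape:", Y_train.shape)
print("X_test shape:", X_test.shape)
print("Y_test shape:", Y_test.shape)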

Train the model for 100 epochs with a batch size of 32 (only the last few epochs of the training log are shown below)

history = model.fit(X_train, Y_train, epochs = 100, batch_size = 32, validation_split = 0.2)
Epoch 94/100
27/27 [==============================] - 8s 313ms/step - loss: 1.8774e-05 - accuracy: 1.0000 - val_loss: 0.2275 - val_accuracy: 0.9537
Epoch 95/100
27/27 [==============================] - 9s 320ms/step - loss: 1.5955e-05 - accuracy: 1.0000 - val_loss: 0.2276 - val_accuracy: 0.9537
Epoch 96/100
27/27 [==============================] - 8s 315ms/step - loss: 9.4384e-06 - accuracy: 1.0000 - val_loss: 0.2269 - val_accuracy: 0.9537
Epoch 97/100
27/27 [==============================] - 8s 311ms/step - loss: 1.9851e-05 - accuracy: 1.0000 - val_loss: 0.2266 - val_accuracy: 0.9537
Epoch 98/100
27/27 [==============================] - 8s 314ms/step - loss: 2.3506e-05 - accuracy: 1.0000 - val_loss: 0.2283 - val_accuracy: 0.9537
Epoch 99/100
27/27 [==============================] - 8s 303ms/step - loss: 1.6656e-05 - accuracy: 1.0000 - val_loss: 0.2291 - val_accuracy: 0.9537
Epoch 100/100
27/27 [==============================] - 8s 310ms/step - loss: 9.8548e-06 - accuracy: 1.0000 - val_loss: 0.2282 - val_accuracy: 0.9537

We can see that with the ResNet CNN we get a validation accuracy of about 95%.

Let's plot the loss and accuracy

# Plot training and validation accuracy against each epoch
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['training data', 'validation data'], loc = 'lower right')
plt.show()

# Plot training and validation loss against each epoch
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['training data', 'validation data'], loc = 'upper right')
plt.show()

Let's evaluate the model on the test data

model.evaluate(X_test,Y_test)
4/4 [==============================] - 0s 67ms/step - loss: 0.2286 - accuracy: 0.9500
[0.22860988974571228, 0.949999988079071]

We get a test accuracy of 95%, which is good.
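
Finally, as an illustrative sketch (the file name 'my_sign.jpg' is a hypothetical placeholder, not part of the original notebook), the trained model could be used to classify a single new image, applying the same resizing and normalization as the training data:

img = image.load_img('my_sign.jpg', target_size=(64, 64))   # hypothetical image file
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0) / 255.                         # same normalization as training
preds = model.predict(x)
print("Predicted class:", np.argmax(preds, axis=1))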