A residual neural network, also referred to as a ResNet, is a deep learning architecture in which the layers learn residual functions with reference to the inputs of earlier layers. It was introduced in 2015 for image recognition and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) that same year.
Why do we need ResNet?
In recent years, neural networks have grown significantly deeper, with state-of-the-art architectures evolving from only a few layers to more than a hundred.
The primary advantage of employing a very deep network lies in its capacity to represent highly complex functions. Such networks are capable of learning features across multiple levels of abstraction. The shallower layers, situated closer to the input, are responsible for capturing fundamental elements such as edges. Conversely, the deeper layers, positioned nearer to the output, are adept at identifying intricate and sophisticated features.
The implementation of deeper neural networks does not invariably result in superior outcomes. A significant obstacle in training such architectures is the phenomenon known as vanishing gradients. In deep networks, the gradient signal can diminish rapidly, rendering gradient descent slow and less effective.
During backpropagation, the gradient signal travels from the final layer back to the first. At each step, multiplication by the weight matrix can cause the gradient to shrink exponentially toward zero. In rarer cases the opposite happens: the gradient grows exponentially, "exploding" into excessively large values.
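To get a feel for the scale of the problem, here is a toy illustration, assuming each layer simply scales the gradient by a constant factor (a deliberate simplification, not how real networks behave exactly):

factor, depth = 0.9, 100
print(factor ** depth)   # ~2.7e-05: a factor slightly below 1 all but erases the gradient
print(1.1 ** depth)      # ~1.4e+04: a factor slightly above 1 makes it explode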
Residual Network
In Residual Networks (ResNets), a "shortcut" or "skip connection" lets the input bypass some layers of the model, so each block only has to learn a residual on top of its input, as sketched below.
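A minimal sketch of the idea, assuming F stands for any stack of layers that preserves the input shape (illustrative only, not the full ResNet block):

import tensorflow as tf

def skip_connection(x, F):
    # The layers only have to learn the residual F(x);
    # the shortcut carries x through unchanged.
    return tf.nn.relu(F(x) + x)

If the layers learn nothing useful (F(x) close to 0), the block simply passes its input through, which makes very deep stacks of such blocks easier to optimize.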
In a Residual Network (ResNet), two primary types of blocks are utilized. The selection between these blocks is based on whether the input and output dimensions are identical or vary:
Identity block
Convolutional block
Let's import the libraries
import tensorflow as tf
import numpy as np
from tensorflow.keras.applications.resnet_v2 import ResNet50V2
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet_v2 import preprocess_input, decode_predictions
from tensorflow.keras import layers
from tensorflow.keras.layers import Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D, AveragePooling2D, MaxPooling2D, GlobalMaxPooling2D
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.initializers import random_uniform, glorot_uniform, constant, identity
from matplotlib.pyplot import imshow
import h5py
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.pyplot import imread
%matplotlib inline
np.random.seed(1)
tf.random.set_seed(2)
Identity Block
Let's now create an identity block. It is composed of the following blocks:
Block 1:
CONV2D
BatchNorm
ReLU activation function
Block 2:
CONV2D
BatchNorm
ReLU activation function
Block 3:
CONV2D
BatchNorm
Final block:
The shortcut (the saved input) and the output of block 3 are added together
ReLU activation function
def residual_identity_block(X, f, filters, initializer=random_uniform):
    """
    Identity block: the shortcut is a plain pass-through, so the input and
    output must have the same shape.
    f -- kernel size of the middle conv, filters -- list [F1, F2, F3]
    """
    # Retrieve filters
    F1, F2, F3 = filters
    # Save the input value. You'll need this later to add back to the main path.
    X_shortcut = X
    # Block 1
    X = Conv2D(filters = F1, kernel_size = 1, strides = (1, 1), padding = 'valid', kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X)  # Default axis
    X = Activation('relu')(X)
    # Block 2
    X = Conv2D(filters = F2, kernel_size = (f, f), strides = (1, 1), padding = 'same', kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X)
    X = Activation('relu')(X)
    # Block 3
    X = Conv2D(filters = F3, kernel_size = (1, 1), strides = (1, 1), padding = 'valid', kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X)
    # Final block: add the shortcut back to the main path, then apply ReLU
    X = Add()([X_shortcut, X])
    X = Activation('relu')(X)
    return X
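As a quick sanity check (a hypothetical smoke test, not part of the original pipeline), the identity block must preserve its input shape, and the input's channel count must equal F3 for the addition to work:

tmp = tf.random.normal((1, 4, 4, 256))
print(residual_identity_block(tmp, f=3, filters=[64, 64, 256]).shape)  # (1, 4, 4, 256)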
Convolutional Block
Let's now create a convolutional block. It is composed of the following blocks; the extra block on the shortcut path consists of a Conv2D and a BatchNorm to match the input and output dimensions.
Block 1:
CONV2D
BatchNorm
ReLU activation function
Block 2:
CONV2D
BatchNorm
ReLU activation function
Block 3:
CONV2D
BatchNorm
Shortcut path: when the input and output dimensions don't match, we add a CONV2D layer in the shortcut path to match the shapes.
CONV2D
BatchNorm
Final block:
The shortcut and the output of block 3 are added together
ReLU activation function
def convolutional_block(X, f, filters, s = 2, initializer=glorot_uniform):
    """
    Convolutional block: the shortcut path gets its own Conv2D + BatchNorm so
    that its shape matches the main path before the addition.
    s -- stride used to downsample in block 1 and on the shortcut
    """
    # Retrieve filters
    F1, F2, F3 = filters
    # Save the input value
    residual = X
    # Block 1
    X = Conv2D(filters = F1, kernel_size = 1, strides = (s, s), padding = 'valid', kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X)
    X = Activation('relu')(X)
    # Block 2
    X = Conv2D(filters = F2, kernel_size = f, strides = (1, 1), padding = 'same', kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X)
    X = Activation('relu')(X)
    # Block 3
    X = Conv2D(filters = F3, kernel_size = 1, strides = (1, 1), padding = 'valid', kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X)
    # Shortcut path: project the saved input to the same shape as the main path
    residual = Conv2D(filters = F3, kernel_size = 1, strides = (s, s), padding = 'valid', kernel_initializer = initializer(seed=0))(residual)
    residual = BatchNormalization(axis = 3)(residual)
    # Final block: add the two paths, then apply ReLU
    X = Add()([X, residual])
    X = Activation('relu')(X)
    return X
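Again as a hypothetical check: with s = 2 the block halves the spatial dimensions and projects the channels to F3, and the shortcut convolution lets the two differently-shaped paths be added:

tmp = tf.random.normal((1, 4, 4, 3))
print(convolutional_block(tmp, f=3, filters=[64, 64, 256], s=2).shape)  # (1, 2, 2, 256)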
Now we will create a ResNet (the 50-layer variant) with the following architecture:
Conv2D -> BatchNorm -> ReLU -> MaxPool -> Conv block -> Identity block * 2 -> Conv block -> Identity block * 3 -> Conv block -> Identity block * 5 -> Conv block -> Identity block * 2 -> AvgPool -> Flatten -> Dense
def ResNet(input_shape = (64, 64, 3), classes = 6):
    X_input = Input(input_shape)
    # Zero-padding
    X = ZeroPadding2D((3, 3))(X_input)
    # Stage 1
    X = Conv2D(64, (7, 7), strides = (2, 2), kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3)(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)
    # Stage 2
    X = convolutional_block(X, f = 3, filters = [64, 64, 256], s = 1)
    X = residual_identity_block(X, 3, [64, 64, 256])
    X = residual_identity_block(X, 3, [64, 64, 256])
    # Stage 3
    X = convolutional_block(X, f = 3, filters = [128, 128, 512], s = 2)
    X = residual_identity_block(X, 3, [128, 128, 512])
    X = residual_identity_block(X, 3, [128, 128, 512])
    X = residual_identity_block(X, 3, [128, 128, 512])
    # Stage 4
    X = convolutional_block(X, f = 3, filters = [256, 256, 1024], s = 2)
    X = residual_identity_block(X, 3, [256, 256, 1024])
    X = residual_identity_block(X, 3, [256, 256, 1024])
    X = residual_identity_block(X, 3, [256, 256, 1024])
    X = residual_identity_block(X, 3, [256, 256, 1024])
    X = residual_identity_block(X, 3, [256, 256, 1024])
    # Stage 5
    X = convolutional_block(X, f = 3, filters = [512, 512, 2048], s = 2)
    X = residual_identity_block(X, 3, [512, 512, 2048])
    X = residual_identity_block(X, 3, [512, 512, 2048])
    # Average pooling
    X = AveragePooling2D((2, 2))(X)
    # Output layer
    X = Flatten()(X)
    X = Dense(classes, activation='softmax', kernel_initializer = glorot_uniform(seed=0))(X)
    # Create model
    model = Model(inputs = X_input, outputs = X)
    return model
Now we instantiate and compile the model
np.random.seed(1)
tf.random.set_seed(2)
model = ResNet(input_shape = (64, 64, 3), classes = 6)
opt = tf.keras.optimizers.Adam(learning_rate=0.00015)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
Let's define a function to load the dataset
def load_signs_dataset():
    # Load training examples
    train_dataset = h5py.File('datasets/train_signs.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # train set labels
    # Load test examples
    test_dataset = h5py.File('datasets/test_signs.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # test set labels
    classes = np.array(test_dataset["list_classes"][:])  # the list of classes
    # Reshape the labels to row vectors
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
Let's implement one-hot encoding for the Y labels
def convert_to_one_hot(Y, C):
    # Row i of np.eye(C) is the one-hot vector for class i; indexing with the
    # flattened labels builds all encodings at once, then transpose to columns.
    Y = np.eye(C)[Y.reshape(-1)].T
    return Y
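For instance (a quick check, not part of the original pipeline), a row of labels becomes a matrix whose columns are the one-hot encodings:

print(convert_to_one_hot(np.array([[0, 2, 5]]), 6))
# column j is the one-hot encoding of the j-th label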
Loading Dataset
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_signs_dataset()
# Normalize image vectors
X_train = X_train_orig / 255.
X_test = X_test_orig / 255.
# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T
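A quick sanity check on the shapes (the exact sizes depend on the dataset files):

print("X_train:", X_train.shape, "Y_train:", Y_train.shape)
print("X_test:", X_test.shape, "Y_test:", Y_test.shape)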
Train the model for 100 epochs with a batch size of 32
history = model.fit(X_train, Y_train, epochs = 100, batch_size = 32, validation_split=0.2)
Epoch 94/100
27/27 [==============================] - 8s 313ms/step - loss: 1.8774e-05 - accuracy: 1.0000 - val_loss: 0.2275 - val_accuracy: 0.9537
Epoch 95/100
27/27 [==============================] - 9s 320ms/step - loss: 1.5955e-05 - accuracy: 1.0000 - val_loss: 0.2276 - val_accuracy: 0.9537
Epoch 96/100
27/27 [==============================] - 8s 315ms/step - loss: 9.4384e-06 - accuracy: 1.0000 - val_loss: 0.2269 - val_accuracy: 0.9537
Epoch 97/100
27/27 [==============================] - 8s 311ms/step - loss: 1.9851e-05 - accuracy: 1.0000 - val_loss: 0.2266 - val_accuracy: 0.9537
Epoch 98/100
27/27 [==============================] - 8s 314ms/step - loss: 2.3506e-05 - accuracy: 1.0000 - val_loss: 0.2283 - val_accuracy: 0.9537
Epoch 99/100
27/27 [==============================] - 8s 303ms/step - loss: 1.6656e-05 - accuracy: 1.0000 - val_loss: 0.2291 - val_accuracy: 0.9537
Epoch 100/100
27/27 [==============================] - 8s 310ms/step - loss: 9.8548e-06 - accuracy: 1.0000 - val_loss: 0.2282 - val_accuracy: 0.9537
We can see that with this ResNet we reach a validation accuracy of about 95%.
Let's plot the loss and accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['training data', 'validation data'], loc = 'lower right')
plt.show()
# Plotting the loss for training and validation data against each epoch
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['training data', 'validation data'], loc = 'upper right')
plt.show()
Let's evaluate the model on the test data
model.evaluate(X_test,Y_test)
4/4 [==============================] - 0s 67ms/step - loss: 0.2286 - accuracy: 0.9500
[0.22860988974571228, 0.949999988079071]
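Optionally, we can run the trained model on a single image of our own (a hypothetical example; 'my_image.jpg' is a placeholder path):

img = image.load_img('my_image.jpg', target_size=(64, 64))
x = np.expand_dims(np.array(img) / 255., axis=0)  # match the training normalization
print("predicted class:", np.argmax(model.predict(x)))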