Batch normalisation can be used to address the vanishing and exploding gradients problems within a neural network, and it also has a mild regularising effect that helps with overfitting.

First, note the distinction between normalisation (rescaling values into a fixed range such as [0, 1]) and standardisation (rescaling to zero mean and unit variance).
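A minimal NumPy sketch of the difference (the array values are purely illustrative):

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
normalised = (x - x.min()) / (x.max() - x.min())   # min-max normalisation: values rescaled into [0, 1]
standardised = (x - x.mean()) / x.std()            # standardisation: zero mean, unit variance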

How does Batch normalisation work?

Batch normalisation works by first standardising a layer's inputs over the current mini-batch, then applying a linear scale and shift whose coefficients are learned during training. A BN layer is inserted between the layers of the network.
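As a rough sketch of what one BN layer computes on a training batch (NumPy, names assumed; Keras additionally keeps moving averages of the mean and variance for use at inference time, omitted here):

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-3):
    # x: activations for one mini-batch, shape (batch_size, features)
    mean = x.mean(axis=0)                      # per-feature mean over the batch
    var = x.var(axis=0)                        # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)    # standardise: zero mean, unit variance
    return gamma * x_hat + beta                # learned linear scale (gamma) and shift (beta)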

Outcomes of this process:

  • Each epoch takes longer, but fewer epochs are required for the network to converge.

Benefits:

  • A BN layer placed right after the input normalises the input data, so a separate normalisation step for the input data is not needed.
  • What about bias? Dense layers followed by BN do not need a bias term (use_bias=False): BN subtracts the batch mean, which would cancel any bias, and BN's own shift parameter (beta) plays that role instead.

Example:

import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
 
mnist = keras.datasets.mnist
(X_train_full, y_train_full), (X_test, y_test) = mnist.load_data()
 
plt.imshow(X_train_full[12], cmap=plt.get_cmap('gray'))  # look at one example digit
X_valid, X_train = X_train_full[:5000] / 255, X_train_full[5000:] / 255  # scale pixels to [0, 1]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255
 
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")])
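
To train this baseline, a typical compile/fit step might look like the following (a sketch; the optimiser, loss and epoch count are assumptions, not from the notes above):

model.compile(loss="sparse_categorical_crossentropy",  # integer labels with a softmax output
              optimizer="sgd",
              metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_valid, y_valid))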
 

Introducing BN into this model.

Do you put BN before or after an activation function? The authors of the original BN paper suggest before the activation, which is what the model below does.

# Don't need these steps now that the model has a BN layer to normalise the inputs
# X_valid, X_train = X_train_full[:5000] / 255, X_train_full[5000:] / 255
# y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
# X_test = X_test / 255
 
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.BatchNormalization(),          # normalisation layer for the raw inputs
    keras.layers.Dense(300, use_bias=False),    # no bias needed: BN provides the shift
    keras.layers.BatchNormalization(),
    keras.layers.Activation('relu'),            # BN placed before the activation
    keras.layers.Dense(100, use_bias=False),
    keras.layers.BatchNormalization(),
    keras.layers.Activation('relu'),
    keras.layers.Dense(10, activation="softmax")
])
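
The BN model is compiled and trained in the same way (again a sketch with assumed settings). model.summary() shows that each BatchNormalization layer carries four weights per feature: the trainable scale (gamma) and shift (beta), plus the non-trainable moving mean and variance used at inference time.

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])
model.summary()   # BN layers report both trainable and non-trainable parameters
history = model.fit(X_train_full, y_train_full, epochs=10,
                    validation_split=0.1)   # raw pixel values: the first BN layer normalises them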