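Batch normalisation (BN) first standardises the inputs to a layer using the mean and variance of the current mini-batch. It then scales and shifts them linearly, and the scale and shift coefficients are learned during training. BN is applied between layers.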
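Below is a minimal NumPy sketch of the training-time forward pass. It assumes a 2-D input of shape (batch, features). The names `gamma` and `beta` (learned scale and shift) and `eps` (a small constant that avoids division by zero) are our own labels. Inference is not shown: at that stage BN uses running averages of the mean and variance rather than the batch statistics.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalise a mini-batch x of shape (batch_size, num_features)."""
    # Standardise each feature using the mini-batch mean and variance.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Scale and shift linearly; gamma and beta would be learned during training.
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # toy mini-batch, far from zero mean
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(6), y.std(axis=0).round(3))
```

With `gamma = 1` and `beta = 0`, each output feature has mean close to 0 and standard deviation close to 1. Training can then learn other values of `gamma` and `beta` if a different scale suits the next layer.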
Outcomes of this process:
Each epoch takes longer, but fewer epochs are required.
Because BN is applied at each layer, the input data does not need a separate normalisation step, provided a BN layer is also applied to the inputs.
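What about bias? A layer followed by BN does not need its own bias term. Subtracting the batch mean cancels any constant offset, and BN's learned shift takes over the role of the bias.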
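The following NumPy check illustrates why the bias is redundant. The toy weights `W` and bias `b` are arbitrary values chosen for illustration. For simplicity, `gamma` and `beta` are fixed at 1 and 0.

```python
import numpy as np

def batch_norm(z, eps=1e-5):
    # Standardise each feature over the batch (gamma = 1, beta = 0 for simplicity).
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

rng = np.random.default_rng(1)
x = rng.normal(size=(32, 3))   # toy inputs
W = rng.normal(size=(3, 2))    # toy weights
b = np.array([10.0, -4.0])     # a bias term we expect BN to cancel

with_bias = batch_norm(x @ W + b)
without_bias = batch_norm(x @ W)
print(np.allclose(with_bias, without_bias))  # prints True
```

Adding `b` shifts each feature's batch mean by exactly `b` and leaves its variance unchanged. The two outputs therefore match up to floating-point rounding.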
Introducing BN into this model.
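Should BN go before or after the activation function? The authors of the original BN paper (Ioffe and Szegedy, 2015) suggest placing it before, so that BN normalises the pre-activations.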
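Here is a sketch of one hidden layer that uses the "before" ordering: a linear step without bias, then BN, then ReLU. The layer is written in NumPy. The sizes, the ReLU activation and the function names are illustrative assumptions, not the model referred to above.

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    # Standardise over the batch, then apply the learned scale and shift.
    z_hat = (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)
    return gamma * z_hat + beta

def relu(z):
    return np.maximum(z, 0.0)

def dense_bn_relu(x, W, gamma, beta):
    """One hidden layer with BN placed before the activation."""
    z = x @ W                          # linear step; no bias, since beta takes its place
    z_bn = batch_norm(z, gamma, beta)  # normalise the pre-activations
    return relu(z_bn)                  # activation applied last

rng = np.random.default_rng(2)
x = rng.normal(size=(16, 5))   # toy batch of 16 examples, 5 features
W = rng.normal(size=(5, 8))    # toy weights for 8 hidden units
h = dense_bn_relu(x, W, gamma=np.ones(8), beta=np.zeros(8))
print(h.shape)  # (16, 8)
```

Placing BN after the activation would simply swap the last two steps. The ordering shown follows the paper's suggestion.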