Adam Optimizer

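Adaptive learning rate methods adjust the learning rate for each parameter individually rather than using one global step size. Adam (short for Adaptive Moment Estimation) combines ideas from Momentum and adaptive learning rates: it keeps running estimates of the first moment (the mean) and the second moment (the uncentered variance) of the gradients, and uses them to scale the update applied to each parameter. Parameters with consistently large gradients take relatively smaller steps, while the momentum-like first-moment term smooths out noisy gradient directions.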
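
To make the per-parameter update concrete, here is a minimal sketch of a single Adam step in plain Python. The function names (adam_step, minimize_quadratic) and the toy objective are illustrative assumptions, not part of the text above. The bias-correction terms and the default hyperparameters (lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8) follow the commonly used formulation of Adam; real libraries may differ in details.

```python
import math


def adam_step(params, grads, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return updated (params, m, v) after one Adam step.

    params, grads, m, v are equal-length lists of floats.
    t is the 1-based step counter, needed for bias correction.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # First moment: exponential moving average of the gradient.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        # Second moment: exponential moving average of the squared gradient.
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction: both averages start at zero, so early
        # estimates are scaled up to compensate.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Per-parameter step: the effective learning rate shrinks
        # where the squared gradients have been large.
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_quadratic(steps=3000, lr=0.05):
    """Toy example (assumed): minimize f(x, y) = (x - 3)^2 + 10 * (y + 1)^2."""
    params = [0.0, 0.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for t in range(1, steps + 1):
        grads = [2.0 * (params[0] - 3.0), 20.0 * (params[1] + 1.0)]
        params, m, v = adam_step(params, grads, m, v, t, lr=lr)
    return params
```

Calling minimize_quadratic() should drive the two parameters toward the minimum at (3, -1). Note that on the very first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly lr regardless of how large its gradient is. This is one way to see the per-parameter scaling at work.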