Activation functions play a key role in neural networks by introducing non-linearity, allowing models to learn complex patterns and relationships in the data.
How Do We Choose the Right Activation Function?
Key Uses of Activation Functions:
- Non-linearity: Without activation functions, neural networks would behave as linear models, unable to capture complex, non-linear patterns in the data (see the sketch after this list).
- Data transformation: Activation functions transform the signals passed from one layer to the next, helping the model emphasize important information and suppress irrelevant data.
- Backpropagation: They enable gradient-based optimization by keeping the network differentiable, which is essential for efficient learning.
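To see why non-linearity matters, here is a minimal NumPy sketch (the weight shapes and random values are purely illustrative) showing that stacking linear layers without activation functions collapses into a single linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))  # first "layer" weights
W2 = rng.standard_normal((2, 4))  # second "layer" weights
x = rng.standard_normal(3)        # an input vector

# Two linear layers applied in sequence, with no activation in between...
two_layers = W2 @ (W1 @ x)

# ...are equivalent to one linear layer with the combined weights W2 @ W1.
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layers, one_layer))  # True: the model is still linear
```

No matter how many such layers are stacked, the result is a single matrix multiplication, which is why a non-linear activation between layers is needed.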
Purpose of Typical Activation Functions
Linear: The identity function $f(x) = x$. Outputs a continuous, unbounded value, suitable for regression outputs.
ReLU (Rectified Linear Unit):
- Purpose: ReLU is used to introduce non-linearity by turning neurons “on” or “off.” It outputs the input directly if it is positive; otherwise, it outputs zero. This helps in efficiently training deep networks by mitigating the vanishing gradient problem.
- Function: $f(x) = \max(0, x)$
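A minimal NumPy sketch of ReLU (the function name and sample values are illustrative):

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x): negative inputs become 0, positive pass through.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
# -> [0.  0.  0.  1.5 3. ]
```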
Sigmoid:
- Purpose: Sigmoid is used primarily in binary classification tasks. It squashes input values to a range between 0 and 1, making it suitable for representing probabilities.
- Function: $\sigma(x) = \frac{1}{1 + e^{-x}}$
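A matching sketch of the sigmoid (the sample inputs are illustrative):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real-valued input into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-4.0, 0.0, 4.0])))
# -> approximately [0.018 0.5 0.982]
```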
Tanh:
- Purpose: Tanh is similar to the sigmoid function but outputs values in the range of -1 to 1. This zero-centered output can be beneficial for optimization in certain scenarios.
- Function: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
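A sketch of tanh, checked against NumPy's built-in implementation:

```python
import numpy as np

def tanh(x):
    # Zero-centered squashing into (-1, 1); equivalent to np.tanh.
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(np.allclose(tanh(x), np.tanh(x)))  # -> True
```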
Softmax:
- Purpose: Softmax is used in multi-class classification tasks. It converts a vector of raw scores (logits) into a probability distribution, where each value is between 0 and 1, and the sum of all values is 1. This allows the outputs to be interpreted as probabilities, with larger inputs corresponding to larger output probabilities.
- Application: In both softmax regression and neural networks with softmax outputs, a vector is generated by a linear function and then passed through the softmax function to produce a probability distribution. This enables the selection of one output as the predicted category.
The softmax function converts a vector of raw scores (logits) $\mathbf{z} = (z_1, \dots, z_K)$ into a probability distribution. The formula for the softmax function is:

$$\text{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \dots, K$$
This ensures that the output values are between 0 and 1 and that they sum to 1, making them interpretable as probabilities.
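A minimal NumPy sketch of softmax. Subtracting the maximum logit before exponentiating is a standard numerical-stability trick (not mentioned above); it does not change the result, since softmax is invariant to adding a constant to every logit:

```python
import numpy as np

def softmax(z):
    # Shift by the max logit so np.exp never overflows; the output is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw scores for 3 classes
probs = softmax(logits)
print(probs)        # -> approximately [0.659 0.242 0.099]
print(probs.sum())  # -> 1.0
```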