SARSA stands for State-Action-Reward-State-Action.
SARSA is another value-based reinforcement learning algorithm; it differs from Q-learning in that it updates Q-values using the action actually taken by the policy rather than the maximizing action.
SARSA update rule:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$$
Explanation:
- $Q(s_t, a_t)$: The Q-value of the current state $s_t$ and action $a_t$.
- $\alpha$: The learning rate, determining how much new information overrides old information.
- $r_{t+1}$: The reward received after taking action $a_t$ in state $s_t$.
- $\gamma$: The discount factor, balancing immediate and future rewards.
- $Q(s_{t+1}, a_{t+1})$: The Q-value of the next state $s_{t+1}$ and the action $a_{t+1}$ actually taken according to the policy.
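To make the update concrete, below is a minimal tabular SARSA sketch in Python. The toy chain environment, the `epsilon_greedy` and `step` helpers, and the hyperparameter values are illustrative assumptions, not part of the original text.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (assumed values, not from the text).
ALPHA = 0.1    # learning rate (alpha in the update rule)
GAMMA = 0.99   # discount factor (gamma in the update rule)
EPSILON = 0.1  # exploration rate for the epsilon-greedy policy

N_STATES = 5        # toy chain: states 0..4, reward 1 at the right end
ACTIONS = [-1, +1]  # move left or right

Q = defaultdict(float)  # tabular Q-values, keyed by (state, action)

def epsilon_greedy(state):
    """Pick a random action with probability EPSILON, else the greedy one."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def step(state, action):
    """Toy dynamics: walk the chain; reaching the last state gives reward 1."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

for episode in range(500):
    state = 0
    action = epsilon_greedy(state)  # choose a_t from the behavior policy
    done = False
    while not done:
        next_state, reward, done = step(state, action)
        # On-policy: sample a_{t+1} from the SAME policy and use its Q-value
        # in the target (Q-learning would take a max over actions instead).
        next_action = epsilon_greedy(next_state)
        td_target = reward + (0.0 if done else GAMMA * Q[(next_state, next_action)])
        Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
        state, action = next_state, next_action

print({s: max(Q[(s, a)] for a in ACTIONS) for s in range(N_STATES)})
```

The on-policy detail lives in the two lines around `next_action`: it is sampled from the same epsilon-greedy policy that chose `action`, and its Q-value, not a maximum over actions, appears in the TD target.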
Notes:
- SARSA’s on-policy nature means it learns a policy consistent with its own exploration strategy, which tends to produce more stable behavior in stochastic or noisy environments.
- Learning may be slower than with Q-learning, but SARSA can be more robust when exploratory mistakes are costly, because the values it learns account for the policy the agent actually follows (the cliff-walking task is the classic example: SARSA learns a safer path away from the cliff edge).
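For a concrete view of the difference noted above, compare the targets: Q-learning's off-policy update replaces the Q-value of the action actually taken with a maximum over actions in the next state:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$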