Summary
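- On-Policy vs. Off-Policy: Q-Learning is off-policy, so its update uses the value of the best available action in the next state, whether or not the agent goes on to take it. SARSA is on-policy, so its update uses the value of the action the agent actually takes in the next state under its current (typically exploratory) policy.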
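- Conservatism: Because SARSA’s update targets include the effect of exploratory actions, it tends to learn more conservative policies than Q-Learning. This makes it a good fit for environments where a mistaken exploratory step is costly and the agent must account for the uncertainty in its own behavior.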
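
The difference between the two updates comes down to a single term in the target, which a short sketch can make concrete. The code below is a minimal illustration, not a full agent: it assumes a tabular Q-function stored as a NumPy array indexed as `Q[state, action]`, and the function names (`q_learning_target`, `sarsa_target`, `td_update`) are hypothetical names chosen for this example.

```python
import numpy as np


def q_learning_target(Q, reward, next_state, gamma):
    """Off-policy target: bootstrap from the best action in the next state."""
    return reward + gamma * np.max(Q[next_state])


def sarsa_target(Q, reward, next_state, next_action, gamma):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * Q[next_state, next_action]


def td_update(Q, state, action, target, alpha):
    """Move Q[state, action] a fraction alpha of the way toward the target."""
    Q[state, action] += alpha * (target - Q[state, action])
    return Q


# Assumed toy setup: 2 states, 2 actions; in state 1, action 1 looks best.
Q = np.zeros((2, 2))
Q[1] = [1.0, 5.0]

# Suppose the agent explores and picks action 0 in state 1.
off_policy = q_learning_target(Q, reward=0.0, next_state=1, gamma=0.5)
on_policy = sarsa_target(Q, reward=0.0, next_state=1, next_action=0, gamma=0.5)
print(off_policy, on_policy)
```

In this toy setup the Q-Learning target is 2.5 (it assumes the best action will be taken), while the SARSA target is 0.5 (it reflects the exploratory action actually chosen). That lower target is where SARSA's more conservative value estimates come from.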