Q-learning is a value-based, model-free RL algorithm in which the agent learns optimal Q-values from the rewards it receives and derives its policy by acting greedily on them. It is particularly useful in discrete environments such as grid worlds.
Q-learning update rule:

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$
Explanation:
- $Q(s, a)$: The Q-value of the current state $s$ and action $a$.
- $\alpha$: The learning rate, determining how much new information overrides old information.
- $r$: The reward received after taking action $a$ from state $s$.
- $\gamma$: The discount factor, balancing immediate and future rewards.
- $\max_{a'} Q(s', a')$: The maximum Q-value for the next state $s'$ across all possible actions $a'$.
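As a concrete illustration, here is a minimal sketch of this update as a single function operating on a NumPy Q-table. The function name `q_update` and the default values of `alpha` and `gamma` are illustrative assumptions, not part of any specific library.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to a tabular Q (shape: [n_states, n_actions])."""
    td_target = r + gamma * np.max(Q[s_next])  # r + gamma * max_a' Q(s', a')
    td_error = td_target - Q[s, a]             # gap between target and current estimate
    Q[s, a] += alpha * td_error                # move Q(s, a) a step toward the target
    return Q
```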
Notes:
- Q-learning is well-suited for environments where the state and action spaces are discrete and manageable in size.
- The algorithm converges to the optimal Q-values, and hence the optimal policy, even in stochastic (non-deterministic) environments, provided every state-action pair is visited sufficiently often and the learning rate is decayed appropriately.
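To make the notes above concrete, here is a small self-contained sketch that trains a tabular Q-learner with epsilon-greedy exploration on a hypothetical five-cell chain. The environment, reward scheme, and hyperparameters are illustrative assumptions, not a standard benchmark.

```python
import numpy as np

n_states, n_actions = 5, 2           # states 0..4; actions: 0 = left, 1 = right
goal = n_states - 1                  # reaching the rightmost cell yields reward 1

def step(s, a):
    """Move left or right along the chain; the episode ends at the goal cell."""
    s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
    reward = 1.0 if s_next == goal else 0.0
    return s_next, reward, s_next == goal

Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

for episode in range(300):
    s, done = 0, False
    while not done:
        # Epsilon-greedy exploration: random action with probability epsilon.
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Q-learning update; no bootstrapping from terminal states.
        target = r + (0.0 if done else gamma * np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print(Q)  # the greedy action in each non-goal state should come out as 1 (move right)
```

Because every state-action pair keeps being tried through the epsilon-greedy randomness, the learned Q-values approach their optimal values and the greedy policy recovers the shortest path to the goal.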