What are the Bellman equations that are used in RL?
Bellman Equations
In reinforcement learning, the Bellman equations are fundamental to understanding how agents make decisions that maximize cumulative reward over time. They describe the relationship between the value of a state and the values of its successor states. There are two main Bellman equations:
- Bellman Equation for the State Value Function (V):
  - This equation expresses the value of a state as the expected return obtained by starting in that state and following a particular policy (see the numerical sketch after this list). It is defined as:

    \[
    V(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a) \big[ R(s, a, s') + \gamma \, V(s') \big]
    \]

  - Here, \(V(s)\) is the value of state \(s\), \(\pi(a \mid s)\) is the policy (the probability of taking action \(a\) in state \(s\)), \(P(s' \mid s, a)\) is the probability of transitioning to state \(s'\) from state \(s\) after taking action \(a\), \(R(s, a, s')\) is the reward received on that transition, and \(\gamma\) is the discount factor.
- Bellman Equation for the Action Value Function (Q):
  - This equation expresses the value of taking an action in a given state and thereafter following a particular policy. It is defined as:

    \[
    Q(s, a) = \sum_{s'} P(s' \mid s, a) \Big[ R(s, a, s') + \gamma \sum_{a'} \pi(a' \mid s') \, Q(s', a') \Big]
    \]

  - Here, \(Q(s, a)\) is the value of taking action \(a\) in state \(s\), and the remaining terms are the same as in the state value equation.
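To make the two backups concrete, here is a minimal iterative policy evaluation sketch in Python/NumPy. The 2-state, 2-action MDP (the arrays `P`, `R`, `pi`) and the discount factor `gamma` are made-up illustrative values, not taken from any particular problem; the loops simply apply the two Bellman equations above as fixed-point updates.

```python
import numpy as np

# Tiny hypothetical MDP: 2 states, 2 actions (illustrative values only).
n_states, n_actions = 2, 2
gamma = 0.9

# P[s, a, s'] = transition probability, R[s, a, s'] = reward on that transition
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.5, 0.5], [1.0, 0.0]]])
# pi[s, a] = probability of taking action a in state s
pi = np.array([[0.6, 0.4],
               [0.3, 0.7]])

# Evaluate V by repeatedly applying the Bellman backup for the state value:
# V(s) = sum_a pi(a|s) sum_s' P(s'|s,a) [R(s,a,s') + gamma * V(s')]
V = np.zeros(n_states)
for _ in range(1000):
    V_new = np.einsum('sa,sat,sat->s', pi, P, R + gamma * V[None, None, :])
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

# Evaluate Q by repeatedly applying the Bellman backup for the action value:
# Q(s,a) = sum_s' P(s'|s,a) [R(s,a,s') + gamma * sum_a' pi(a'|s') Q(s',a')]
Q = np.zeros((n_states, n_actions))
for _ in range(1000):
    next_v = (pi * Q).sum(axis=1)          # sum_a' pi(a'|s') Q(s', a')
    Q_new = np.einsum('sat,sat->sa', P, R + gamma * next_v[None, None, :])
    if np.max(np.abs(Q_new - Q)) < 1e-8:
        Q = Q_new
        break
    Q = Q_new

print("V under pi:", V)
print("Q under pi:", Q)
# Consistency check implied by the definitions: V(s) == sum_a pi(a|s) Q(s,a)
print("sum_a pi(a|s) Q(s,a):", (pi * Q).sum(axis=1))
```

Because both recursions are contractions for \(\gamma < 1\), the repeated backups converge to the unique \(V\) and \(Q\) that satisfy the equations; the final print verifies the identity \(V(s) = \sum_a \pi(a \mid s)\, Q(s, a)\).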
The Bellman equations are used in dynamic programming methods such as Value Iteration and Policy Iteration to find optimal value functions and policies. They provide a recursive decomposition of the value functions: the value of a state (or state-action pair) is expressed in terms of the immediate reward and the discounted value of what follows.
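As an illustration of how this recursion drives dynamic programming, below is a minimal Value Iteration sketch on the same made-up MDP (it reuses `P`, `R`, `gamma`, and `n_states` from the previous sketch). Each sweep applies the Bellman optimality backup \(V(s) \leftarrow \max_a \sum_{s'} P(s' \mid s, a)\,[R(s, a, s') + \gamma V(s')]\), and a greedy policy is read off at the end.

```python
# Value Iteration on the same hypothetical MDP (reuses P, R, gamma, n_states
# from the previous sketch). Each sweep applies the Bellman optimality backup.
V = np.zeros(n_states)
for _ in range(1000):
    # Q_star[s, a] = sum_s' P(s'|s,a) [R(s,a,s') + gamma * V(s')]
    Q_star = np.einsum('sat,sat->sa', P, R + gamma * V[None, None, :])
    V_new = Q_star.max(axis=1)             # V(s) = max_a Q_star(s, a)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

greedy_policy = Q_star.argmax(axis=1)      # greedy action index in each state
print("Optimal V*:", V)
print("Greedy policy:", greedy_policy)
```

Policy Iteration, by contrast, alternates full policy evaluation (as in the earlier sketch) with greedy policy improvement, rather than folding the max into every sweep.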