

mdp_bellman_operator

Applies the Bellman operator to a value function Vprev and returns a new value function and a Vprev-improving policy.

Calling Sequence

[V, policy] = mdp_bellman_operator(P, PR, discount, Vprev)

Description

mdp_bellman_operator applies the Bellman operator to the value function Vprev: for each state, it takes the maximum over all actions a of PR(:,a) + discount*P(:,:,a)*Vprev.

Returns a new value function and a Vprev-improving policy.
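
As a rough illustration (a minimal sketch, not the toolbox implementation), the Scilab function below computes, for each action a, the vector Q(:,a) = PR(:,a) + discount*P(a)*Vprev and then takes the row-wise maximum; the name bellman_sketch is hypothetical, and P is assumed to be given in list form, as in the example further below.

// Minimal sketch of the Bellman operator (hypothetical helper, not the
// toolbox implementation). Assumes P is a list of A (SxS) matrices,
// dense or sparse, and PR is a (SxA) reward matrix.
function [V, policy] = bellman_sketch(P, PR, discount, Vprev)
    [S, A] = size(PR);
    Q = zeros(S, A);               // Q(s,a): value of taking action a in state s
    for a = 1:A
        Q(:, a) = full(PR(:, a) + discount * P(a) * Vprev);
    end
    [V, policy] = max(Q, "c");     // row-wise maximum and maximizing action
endfunction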

Arguments

P

transition probability array.

P can be a 3-dimensional array (SxSxA) or a list (1xA), each list element containing a (SxS) matrix, possibly sparse.

PR

reward array.

PR can be a 2D array (SxA), possibly sparse.

discount

discount factor.

discount is a real number belonging to ]0; 1].

Vprev

value function.

Vprev is a (Sx1) vector.

Evaluation

V

new value function.

V is a (Sx1) vector.

policy

a policy.

policy is a (Sx1) vector. Each element is an integer corresponding to an action.

Examples

-> P = list()
-> P(1) = [ 0.5 0.5;   0.8 0.2 ];
-> P(2) = [ 0 1;   0.1 0.9 ];
-> R = [ 5 10;   -1 2 ];

-> [V, policy] = mdp_bellman_operator(P, R, 0.9, [0;0])
policy =
   2
   2
V =
   10
   2
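
Because Vprev is the zero vector here, the discounted term vanishes and the operator reduces to the row-wise maximum of R: in state 1, max(5, 10) = 10 is reached by action 2, and in state 2, max(-1, 2) = 2 is also reached by action 2, hence V = [10; 2] and policy = [2; 2].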

In the above example, the elements of the list P can also be sparse matrices:
-> P(1) = sparse([ 0.5 0.5;  0.8 0.2 ]);
-> P(2) = sparse([ 0 1;  0.1 0.9 ]);
The function call is unchanged.

Authors

