mdp_finite_horizon

Solves a finite-horizon MDP using the backward induction algorithm.

Calling Sequence

[V, policy, cpu_time] = mdp_finite_horizon (P, R, discount, N)
[V, policy, cpu_time] = mdp_finite_horizon (P, R, discount, N, h)

Description

mdp_finite_horizon applies the backward induction algorithm to a finite-horizon MDP. The optimality equations are used to evaluate the value function recursively, starting from the terminal stage.

This function can run in verbose or silent mode. In verbose mode, it displays the current stage and the corresponding optimal policy.
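
The backward recursion described above can be sketched as follows (an illustration only, not the toolbox implementation; it assumes P is given as a (1xA) list of (SxS) matrices and R as a (SxA) matrix, with the data of the Examples section):

// Backward induction sketch (illustrative variable names)
P = list([0.5 0.5; 0.8 0.2], [0 1; 0.1 0.9]);   // one (SxS) transition matrix per action
R = [5 10; -1 2];                                // (SxA) reward matrix
discount = 0.9;  N = 3;  S = 2;  A = 2;
V = zeros(S, N+1);                               // column N+1 holds the terminal reward
policy = zeros(S, N);
for n = N:-1:1
    Q = zeros(S, A);
    for a = 1:A
        // expected total reward when action a is taken at stage n
        Q(:,a) = R(:,a) + discount * P(a) * V(:,n+1);
    end
    [Vn, an] = max(Q, 'c');                      // best value and best action per state
    V(:,n) = Vn;
    policy(:,n) = an;
end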

Arguments

P

transition probability array.

P can be a 3-dimensional array (SxSxA) or a list (1xA), each list element containing an (SxS) matrix, possibly sparse.

R

reward array.

R can be a 3-dimensional array (SxSxA), a list (1xA) whose elements are (SxS) matrices (possibly sparse), or a 2D array (SxA), possibly sparse (an illustration of these formats is given after the argument descriptions).

discount

discount factor.

discount is a real number in ]0; 1].

N

number of stages.

N is an integer greater than 0.

h (optional)

terminal reward.

h is a (Sx1) vector giving the terminal reward of each state. When h is not provided, the terminal reward is zero for every state (as in the examples below, where V(:,N+1) = 0).
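
These formats can be illustrated as follows for S = 2 states and A = 2 actions (a sketch only; the list form is also the one used in the Examples section below):

// P as a (SxSxA) hypermatrix
P = hypermat([2 2 2]);
P(:,:,1) = [0.5 0.5; 0.8 0.2];
P(:,:,2) = [0 1; 0.1 0.9];
// P as a (1xA) list of (SxS) matrices, possibly sparse
P = list(sparse([0.5 0.5; 0.8 0.2]), sparse([0 1; 0.1 0.9]));
// R as a (SxA) array: one reward per state and action
R = [5 10; -1 2];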

Evaluation

V

value function.

V is a (Sx(N+1)) matrix. Each column n is the optimal value function at stage n, with n = 1, ..., N.

V(:,N+1) is the terminal reward.

policy

optimal policy.

policy is a (SxN) matrix. Each element is an integer corresponding to an action, and each column n is the optimal policy at stage n.

cpu_time

CPU time used to run the program.
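
For instance, the returned matrices can be indexed as follows (a sketch; the names a_first and v_first are illustrative):

a_first = policy(:,1);   // optimal action for each state at the first stage
v_first = V(:,1);        // corresponding optimal expected total reward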

Examples

-> P = list();
-> P(1) = [ 0.5 0.5;   0.8 0.2 ];
-> P(2) = [ 0 1;   0.1 0.9 ];
-> R = [ 5 10;   -1 2 ];

-> [V, policy, cpu_time] = mdp_finite_horizon(P, R, 0.9, 3)
cpu_time =
   0.0400
policy =
   2 2 2
   1 1 2
V =
   15.9040   11.8000   10.0000   0
    8.6768    6.5600    2.0000   0

-> mdp_verbose()  // set verbose mode

-> [V, policy, cpu_time] = mdp_finite_horizon(P, R, 0.9, 3)
stage:3 policy transpose : 2 2
stage:2 policy transpose : 2 1
stage:1 policy transpose : 2 1
cpu_time =
   0.0400
policy =
   2 2 2
   1 1 2
V =
   15.9040   11.8000   10.0000   0
    8.6768    6.5600    2.0000   0

In the above example, P can be a list containing sparse matrices:

-> P(1) = sparse([ 0.5 0.5;  0.8 0.2 ]);
-> P(2) = sparse([ 0 1;  0.1 0.9 ]);

The function call and its results are unchanged.
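
The terminal reward can be set through the optional fifth argument h (a sketch of the call; the output is not reproduced here):

-> h = [1; 0.5];
-> [V, policy, cpu_time] = mdp_finite_horizon(P, R, 0.9, 3, h);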

Authors

