Solves a finite-horizon MDP with the backward induction algorithm.
[V, policy, cpu_time] = mdp_finite_horizon (P, R, discount, N)
[V, policy, cpu_time] = mdp_finite_horizon (P, R, discount, N, h)
mdp_finite_horizon applies the backward induction algorithm to a finite-horizon MDP. The optimality equations allow the value function to be evaluated recursively, stage by stage, starting from the terminal stage.
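A sketch of this recursion in standard notation (the discount factor is written $\gamma$; $R(s,a)$ denotes the expected immediate reward of choosing action $a$ in state $s$; $V_{N+1}$ is the terminal reward, zero when h is omitted):

$$
V_n(s) = \max_{a} \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_{n+1}(s') \Big],
\qquad n = N, N-1, \ldots, 1.
$$

The maximizing action at each state s and stage n gives policy(s, n).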
This function supports verbose and silent modes. In verbose mode, it displays the current stage and the corresponding optimal policy.
P : transition probability array.
P can be a 3-dimensional array (SxSxA) or a list (1xA), each list element containing a sparse matrix (SxS); see the sketch after the argument descriptions.
R : reward array.
R can be a 3-dimensional array (SxSxA), a list (1xA) with each element containing a sparse matrix (SxS), or a 2D array (SxA), possibly sparse.
discount : discount factor.
discount is a real number in the interval ]0; 1].
N : number of stages.
N is an integer greater than 0.
h (optional) : terminal reward.
h is an (Sx1) vector; when it is omitted, the terminal reward is zero (as in V(:,N+1) of the example below).
V : value function.
V is an (Sx(N+1)) matrix. Each column n is the optimal value function at stage n, with n = 1, ..., N.
V(:,N+1) is the terminal reward.
policy : optimal policy.
policy is an (SxN) matrix. Each element is an integer corresponding to an action, and each column n is the optimal policy at stage n.
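As a minimal sketch of the 3-dimensional array form of P mentioned above (assuming S = 2 states and A = 2 actions, the same model as the worked example that follows):

-> P = zeros(2, 2, 2);                    // hypermatrix of size SxSxA
-> P(:, :, 1) = [ 0.5 0.5; 0.8 0.2 ];     // transition matrix of action 1
-> P(:, :, 2) = [ 0 1; 0.1 0.9 ];         // transition matrix of action 2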
-> P = list();
-> P(1) = [ 0.5 0.5; 0.8 0.2 ];
-> P(2) = [ 0 1; 0.1 0.9 ];
-> R = [ 5 10; -1 2 ];
-> [V, policy, cpu_time] = mdp_finite_horizon(P, R, 0.9, 3)
cpu_time = 0.0400
policy =
   2   2   2
   1   1   2
V =
   15.9040   11.8000   10.0000    0
    8.6768    6.5600    2.0000    0

-> mdp_verbose()   // set verbose mode
-> [V, policy, cpu_time] = mdp_finite_horizon(P, R, 0.9, 3)
stage:3   policy transpose : 2 2
stage:2   policy transpose : 2 1
stage:1   policy transpose : 2 1
cpu_time = 0.0400
policy =
   2   2   2
   1   1   2
V =
   15.9040   11.8000   10.0000    0
    8.6768    6.5600    2.0000    0

In the above example, P can be a list containing sparse matrices:

-> P(1) = sparse([ 0.5 0.5; 0.8 0.2 ]);
-> P(2) = sparse([ 0 1; 0.1 0.9 ]);

The function is unchanged.
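The optional fifth argument sets a non-zero terminal reward. A minimal sketch, assuming h is passed as an (Sx1) column vector as described above:

-> h = [ 1; 0.5 ];   // hypothetical terminal reward for each state
-> [V, policy, cpu_time] = mdp_finite_horizon(P, R, 0.9, 3, h);
-> V(:, 4)           // the last column of V now holds the terminal reward h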