

mdp_value_iteration_bound_iter

Computes a bound on the number of iterations for the value iteration algorithm.

Calling Sequence

[max_iter, cpu_time] = mdp_value_iteration_bound_iter(P, R, discount)
[max_iter, cpu_time] = mdp_value_iteration_bound_iter(P, R, discount, epsilon)
[max_iter, cpu_time] = mdp_value_iteration_bound_iter(P, R, discount, epsilon, V0)

Description

mdp_value_iteration_bound_iter computes a bound on the number of iterations needed by the value iteration algorithm to find an epsilon-optimal policy, when the span of the value function is used as the stopping criterion.
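
For intuition, a plausible form of such a bound, following Puterman's classical span-based analysis of value iteration, is sketched below (notation assumed here, not taken from the toolbox: gamma is the discount factor, V^0 the starting value function, V^1 one Bellman backup of V^0; the exact expression implemented by the toolbox may differ):

\[
\texttt{max\_iter} \;\ge\; \frac{\log\!\left(\dfrac{\epsilon\,(1-\gamma)}{\gamma\;\operatorname{sp}(V^{1}-V^{0})}\right)}{\log(\gamma k)},
\qquad
k \;=\; 1 - \sum_{s'} \min_{s,\,a} P(s' \mid s, a),
\]

where \(\operatorname{sp}(V) = \max_s V(s) - \min_s V(s)\) is the span semi-norm.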

Arguments

P

transition probability array.

P can be a 3-dimensional array (SxSxA) or a list (1xA), each list element containing a sparse matrix (SxS). The 3-dimensional form is illustrated in the sketch after this argument list; the list form appears in the Examples section.

R

reward array.

R can be a 3-dimensional array (SxSxA), a list (1xA) with each list element containing a sparse matrix (SxS), or a 2D array (SxA), possibly sparse.

discount

discount factor.

discount is a real number in [0; 1[.

epsilon (optional)

search for an epsilon-optimal policy.

epsilon is a real number in ]0; 1].

By default, epsilon = 0.01.

V0 (optional)

starting value function.

V0 is a (Sx1) vector.

By default, V0 contains only zeros.
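
As an illustration of the 3-dimensional form of P and R (a minimal sketch, assuming S = 2 states, A = 2 actions and the same numerical values as the example below; the list form is shown in the Examples section):

-> // 3-dimensional form: an SxSxA hypermatrix, one SxS transition matrix per action
-> P = zeros(2, 2, 2);
-> P(:, :, 1) = [ 0.5 0.5;  0.8 0.2 ];
-> P(:, :, 2) = [ 0 1;  0.1 0.9 ];
-> // a 2D (SxA) reward array; an SxSxA reward array would also be accepted
-> R = [ 5 10;  -1 2 ];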

Evaluation

max_iter

bound on the number of iterations required by value iteration to find an epsilon-optimal policy.

cpu_time

CPU time used to run the program.

Examples

-> P = list();
-> P(1) = [ 0.5 0.5;   0.8 0.2 ];
-> P(2) = [ 0 1;   0.1 0.9 ];
-> R = [ 5 10;   -1 2 ];

-> max_iter = mdp_value_iteration_bound_iter(P, R, 0.9)
max_iter =
   28

In the above example, the elements of the list P can also be sparse matrices:
-> P(1) = sparse([ 0.5 0.5;  0.8 0.2 ]);
-> P(2) = sparse([ 0 1;  0.1 0.9 ]);
The calling sequence is unchanged.
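
The optional epsilon and V0 arguments follow the same calling pattern. A minimal sketch (no output is shown, since the returned bound depends on the chosen epsilon and V0):

-> epsilon = 0.001;
-> V0 = zeros(2, 1);
-> [max_iter, cpu_time] = mdp_value_iteration_bound_iter(P, R, 0.9, epsilon, V0);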

Authors

