Computes a bound on the number of iterations for the value iteration algorithm.
[max_iter, cpu_time] = mdp_value_iteration_bound_iter(P, R, discount)
[max_iter, cpu_time] = mdp_value_iteration_bound_iter(P, R, discount, epsilon)
[max_iter, cpu_time] = mdp_value_iteration_bound_iter(P, R, discount, epsilon, V0)
mdp_value_iteration_bound_iter computes a bound on the number of iterations the value iteration algorithm needs to find an epsilon-optimal policy, when the span of successive value functions is used as the stopping criterion.
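As a sketch of the underlying argument (a classical derivation for discounted MDPs, see e.g. Puterman, 1994; the toolbox may use a sharper, model-dependent contraction factor), write $\gamma$ for the discount and define the span $\mathrm{sp}(V) = \max_s V(s) - \min_s V(s)$. The Bellman operator contracts the span by a factor $\gamma$, so

$$\mathrm{sp}(V^{n+1} - V^{n}) \;\le\; \gamma^{n}\,\mathrm{sp}(V^{1} - V^{0}),$$

and iteration may stop as soon as $\mathrm{sp}(V^{n+1} - V^{n}) < \epsilon(1-\gamma)/\gamma$, which guarantees that the greedy policy is epsilon-optimal. Combining the two, the stopping criterion is certainly met once

$$n \;>\; \frac{\log\!\left(\epsilon(1-\gamma)\,/\,\big(\gamma\,\mathrm{sp}(V^{1}-V^{0})\big)\right)}{\log \gamma},$$

where $V^{1}$ is the result of one Bellman backup applied to $V^{0}$. This is why the bound depends on V0, epsilon and the discount.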
P : transition probability array.
P can be a 3-dimensional array (SxSxA) or a list (1xA), each list element containing a sparse matrix (SxS); see the sketch after the parameter list.
R : reward array.
R can be a 3-dimensional array (SxSxA) or a list (1xA), each list element containing a sparse matrix (SxS) or a 2D array (SxA), possibly sparse; see the sketch after the parameter list.
discount : discount factor.
discount is a real number in [0; 1[.
epsilon : search for an epsilon-optimal policy.
epsilon is a real number in ]0; 1].
By default, epsilon = 0.01.
V0 : starting value function.
V0 is an (Sx1) vector.
By default, V0 is a vector of zeros.
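As mentioned in the P and R entries above, both inputs accept several equivalent forms. A minimal sketch for a model with S = 2 states and A = 2 actions (the reward values in the list form of R are made up for illustration):

// P as a 3-dimensional (SxSxA) hypermatrix.
P = zeros(2, 2, 2);
P(:, :, 1) = [ 0.5 0.5; 0.8 0.2 ];   // transitions under action 1
P(:, :, 2) = [ 0 1; 0.1 0.9 ];       // transitions under action 2

// P as a (1xA) list of (possibly sparse) SxS matrices.
P = list();
P(1) = sparse([ 0.5 0.5; 0.8 0.2 ]);
P(2) = sparse([ 0 1; 0.1 0.9 ]);

// R as a 2D (SxA) array: R(s,a) is the reward for action a in state s.
R = [ 5 10; -1 2 ];

// R as a (1xA) list of SxS matrices: R(a)(s,t) rewards the transition
// from s to t under action a (illustrative values).
R = list();
R(1) = sparse([ 5 10; -1 2 ]);
R(2) = sparse([ 3 0; 0 1 ]);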
max_iter : maximum number of iterations to be done (the computed bound).
cpu_time : CPU time used to run the program.
-> P = list();
-> P(1) = [ 0.5 0.5; 0.8 0.2 ];
-> P(2) = [ 0 1; 0.1 0.9 ];
-> R = [ 5 10; -1 2 ];
-> max_iter = mdp_value_iteration_bound_iter(P, R, 0.9)
max_iter = 28

In the above example, P can be a list containing sparse matrices:

-> P(1) = sparse([ 0.5 0.5; 0.8 0.2 ]);
-> P(2) = sparse([ 0 1; 0.1 0.9 ]);

The function call is unchanged.
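The optional arguments can also be passed explicitly; a sketch continuing the example above (the epsilon and V0 values are illustrative):

-> epsilon = 0.001;
-> V0 = [ 0; 0 ];
-> [max_iter, cpu_time] = mdp_value_iteration_bound_iter(P, R, 0.9, epsilon, V0);

A smaller epsilon demands a better policy, and therefore yields a larger bound max_iter.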