Determines sets of 'near optimal' actions for all states.
[multiple, optimal_actions] = mdp_eval_policy_optimali(P, R, discount, Vpolicy)
For some states, the evaluation of the value function may give close results for different actions. It is therefore interesting to identify the states for which several actions have a value function very close to the optimal one (i.e. less than 0.01 away). We call this the search for near optimal actions in each state.
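The following is a minimal sketch of this test (not the toolbox source), assuming R is given as an (SxA) matrix and P as a list of A matrices of size (SxS); the function name near_optimal_sketch is only illustrative:

function [multiple, optimal_actions] = near_optimal_sketch(P, R, discount, Vpolicy)
    S = size(R, 1); A = size(R, 2);
    Q = zeros(S, A);
    for a = 1:A
        // value of choosing action a in each state, then behaving optimally afterwards
        Q(:, a) = R(:, a) + discount * P(a) * Vpolicy;
    end
    // an action is 'nearly' optimal when its value is less than 0.01 below Vpolicy
    optimal_actions = (Vpolicy * ones(1, A) - Q) < 0.01;
    // %T when at least one state has two or more such actions
    multiple = or(sum(bool2s(optimal_actions), 'c') > 1);
endfunction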
P : transition probability array.
P can be a 3-dimensional array (SxSxA) or a list (1xA), each list element containing a sparse matrix (SxS); both forms are illustrated in the sketch after the argument descriptions.
R : reward array.
R can be a 3-dimensional array (SxSxA) or a list (1xA), each list element containing a sparse matrix (SxS), or a 2D array (SxA), possibly sparse.
discount : discount factor.
discount is a real number in [0; 1[.
Vpolicy : value function of the optimal policy.
Vpolicy is an (Sx1) vector.
multiple : existence of at least two 'nearly' optimal actions for a state.
multiple is equal to %T when at least one state has several epsilon-optimal actions, %F otherwise.
optimal_actions : actions 'nearly' optimal for each state.
optimal_actions is an (SxA) boolean matrix whose element optimal_actions(s, a) is %T if action a is 'nearly' optimal in state s and %F otherwise.
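As an illustration of the two accepted forms of P (a small sketch using the transition matrices of the example below; the pre-allocation with zeros is only one possible way to build the 3-dimensional array):

// 3-dimensional array form (SxSxA)
-> P = zeros(2, 2, 2);
-> P(:, :, 1) = [ 0.5 0.5; 0.8 0.2 ];
-> P(:, :, 2) = [ 0 1; 0.1 0.9 ];
// list form (1xA), one (SxS) matrix per action
-> P = list([ 0.5 0.5; 0.8 0.2 ], [ 0 1; 0.1 0.9 ]);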
-> P = list();
-> P(1) = [ 0.5 0.5; 0.8 0.2 ];
-> P(2) = [ 0 1; 0.1 0.9 ];
-> R = [ 5 10; -1 2 ];
-> Vpolicy = [ 42.4419; 36.0465 ];
-> [multiple, optimal_actions] = mdp_eval_policy_optimali(P, R, 0.9, Vpolicy)
 optimal_actions  =
  F T
  T F
 multiple  =
  F

In the above example, P can be a list containing sparse matrices:

-> P(1) = sparse([ 0.5 0.5; 0.8 0.2 ]);
-> P(2) = sparse([ 0 1; 0.1 0.9 ]);

The function is unchanged.
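As a follow-up to the example (using only the standard find function), the actions flagged for a given state can be listed directly; for state 1 of the example above:

-> find(optimal_actions(1, :))   // indices of the 'nearly' optimal actions of state 1
 ans  =
    2.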