mdp_eval_policy_optimali

Determines sets of 'near optimal' actions for all states.

Calling Sequence

[multiple, optimal_actions] = mdp_eval_policy_optimali(P, R, discount, Vpolicy)

Description

For some states, the evaluation of the value function may give close results for different actions. It is interesting to identify those states for which several actions have a value function very close to the optimal one (i.e. differing by less than 0.01). We call this the search for near-optimal actions in each state.
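
As a rough illustration of this criterion (a sketch only, not the toolbox source), assuming P is given as a full (SxSxA) array and R as an (SxA) matrix, the search could look as follows; the function name near_optimal_sketch is hypothetical:

function [multiple, optimal_actions] = near_optimal_sketch(P, R, discount, Vpolicy)
    // Illustrative sketch only, under the assumptions stated above.
    epsilon = 0.01;              // tolerance on the value function
    [S, A] = size(R);
    Q = zeros(S, A);             // Q(s, a): reward of a in s plus discounted optimal value afterwards
    for a = 1:A
        Q(:, a) = R(:, a) + discount * P(:, :, a) * Vpolicy;
    end
    optimal_actions = (Vpolicy * ones(1, A) - Q) < epsilon;   // %T where action a is near optimal in state s
    multiple = or(sum(bool2s(optimal_actions), 'c') > 1);     // %T if some state has several such actions
endfunction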

Arguments

P

transition probability array.

P can be a 3-dimensional array (SxSxA) or a list (1xA) whose elements each contain a sparse matrix (SxS).

R

reward array.

R can be a 3-dimensional array (SxSxA), a list (1xA) whose elements each contain a sparse matrix (SxS), or a 2D array (SxA), possibly sparse.

discount

discount factor.

discount is a real number in [0; 1[.

Vpolicy

value function of the optimal policy.

Vpolicy is a (Sx1) vector.
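
In practice, Vpolicy is obtained beforehand from one of the toolbox's resolution functions, for instance mdp_policy_iteration; the calling sequence below is assumed here, see that function's own help page:
-> [Vpolicy, policy] = mdp_policy_iteration(P, R, discount);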

Evaluation

multiple

existence of at least two 'nearly' optimal actions for a state.

multiple is equal to %T when at least one state has several epsilon-optimal actions, %F otherwise.

optimal_actions

actions 'nearly' optimal for each state.

optimal_actions is an (SxA) boolean matrix whose element optimal_actions(s, a) is %T if action a is 'nearly' optimal in state s, and %F if not.

Examples

-> P = list();
-> P(1) = [ 0.5 0.5;   0.8 0.2 ];
-> P(2) = [ 0 1;   0.1 0.9 ];
-> R = [ 5 10;   -1 2 ];
-> Vpolicy = [ 42.4419;   36.0465 ];
-> [multiple, optimal_actions] = mdp_eval_policy_optimali(P, R, 0.9, Vpolicy)
optimal_actions =
   F   T
   T   F
multiple =
   F
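
The near-optimal actions of a given state can then be listed with find, e.g. for state 1 (whose only near-optimal action here is action 2):
-> find(optimal_actions(1, :))
 ans  =
    2.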

In the above example, P can be a list containing sparse matrices:
-> P(1) = sparse([ 0.5 0.5;  0.8 0.2 ]);
-> P(2) = sparse([ 0 1;  0.1 0.9 ]);
The call and its results are unchanged.
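
As described in the Arguments section, P could also be given as a full 3-dimensional array (SxSxA); for this example that would read, for instance:
-> P = zeros(2, 2, 2);
-> P(:, :, 1) = [ 0.5 0.5;  0.8 0.2 ];
-> P(:, :, 2) = [ 0 1;  0.1 0.9 ];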

Authors

