

mdp_computePpolicyPRpoli

Computes the transition matrix and the reward matrix for a given policy.

Calling Sequence

[Ppolicy, PRpolicy] = mdp_computePpolicyPRpoli(P, R, policy)

Description

mdp_computePpolicyPRpoli computes the state transition matrix and the reward matrix of a policy, given a transition probability array P and a reward array R.
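
For intuition, the following is a minimal sketch of the equivalent computation, assuming P and R are both given as lists of (SxS) matrices, one per action (the list form described under Arguments below); it illustrates the formulas only and is not the toolbox implementation.

// Sketch only: P and R are assumed to be lists of (SxS) matrices, one per action.
S = size(policy, "*");            // number of states
Ppolicy  = zeros(S, S);
PRpolicy = zeros(S, 1);
for s = 1:S
    a  = policy(s);                              // action prescribed by the policy in state s
    Pa = P(a);  Ra = R(a);                       // (SxS) matrices for that action
    Ppolicy(s, :) = Pa(s, :);                    // transition probabilities under the policy
    PRpolicy(s)   = sum(Pa(s, :) .* Ra(s, :));   // expected immediate reward in state s
end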

Arguments

P

transition probability array.

P can be a 3-dimensional array (SxSxA) or a list (1xA), each list element containing an (SxS) matrix, possibly sparse.

R

reward array.

R can be a 3-dimensional array (SxSxA), a list (1xA) with each element an (SxS) matrix (possibly sparse), or a 2D array (SxA), possibly sparse.

policy

a policy.

policy is a (Sx1) vector of integers representing actions.

Evaluation

Ppolicy

transition probability array of the policy.

Ppolicy is a (SxS) matrix.

PRpolicy

reward matrix of the policy.

PRpolicy is a (Sx1) vector.

Examples

-> P = list();
-> P(1) = [0.6116 0.3884;  0 1.0000];
-> P(2) = [0.6674 0.3326;  0 1.0000];
-> R = list();
-> R(1) = [-0.2433 0.7073;  0 0.1871];
-> R(2) = [-0.0069 0.6433;  0 0.2898];
-> policy = [2; 2];
-> [Ppolicy, PRpolicy] = mdp_computePpolicyPRpoli(P, R, policy)
PRpolicy =
   0.2093565
   0.2898
Ppolicy =
   0.6674    0.3326
   0         1

In the above example, P could instead be given as a list of sparse matrices:
-> P(1) = sparse([0.6116 0.3884;  0 1.0000]);
-> P(2) = sparse([0.6674 0.3326;  0 1.0000]);
The calling sequence and the results are unchanged.
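
R may also be given as a 2D (SxA) array (see Arguments), assuming R(s,a) holds the expected reward of action a in state s. As an illustration, the values below are the expected rewards computed by hand from R(1) and R(2) above:
-> R = [0.1259130 0.2093565; 0.1871 0.2898];
-> [Ppolicy, PRpolicy] = mdp_computePpolicyPRpoli(P, R, policy);
Under that reading, PRpolicy(s) is simply R(s, policy(s)), so the result should match the example above.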

Authors

