Computes the transition matrix and the reward matrix for a given policy.
[Ppolicy, PRpolicy] = mdp_computePpolicyPRpoli(P, R, policy)
mdp_computePpolicyPRpoli computes the state transition matrix and the reward matrix of a policy, given a probability matrix P and a reward matrix R.
P : transition probability array.
P can be a 3-dimensional array (SxSxA) or a list (1xA), each list element containing a sparse matrix (SxS).
R : reward array.
R can be a 3-dimensional array (SxSxA) or a list (1xA), each list element containing a sparse matrix (SxS) or a 2D array (SxA), possibly sparse.
policy : a policy.
policy is a (Sx1) vector of integers representing actions.
Ppolicy : transition probability array of the policy.
Ppolicy is a (SxS) matrix.
PRpolicy : reward matrix of the policy.
PRpolicy is a (Sx1) vector.
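In effect, for each state the function keeps the row of P corresponding to the chosen action, and the expected immediate reward under that action. The sketch below is a hypothetical illustration of that computation for the list-of-(SxS)-matrices case with full matrices (build_policy_matrices is not part of the toolbox, and this is not the toolbox source):

// Hypothetical sketch (not the toolbox source): assembling Ppolicy and
// PRpolicy when P and R are lists of full (SxS) matrices.
function [Ppol, PRpol] = build_policy_matrices(P, R, policy)
    S = size(P(1), 1);
    Ppol  = zeros(S, S);
    PRpol = zeros(S, 1);
    for s = 1:S
        a = policy(s);
        Ppol(s, :) = P(a)(s, :);                      // row of the chosen action
        PRpol(s)   = sum(P(a)(s, :) .* R(a)(s, :));   // expected immediate reward
    end
endfunction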
-> P = list();
-> P(1) = [0.6116 0.3884; 0 1.0000];
-> P(2) = [0.6674 0.3326; 0 1.0000];
-> R = list();
-> R(1) = [-0.2433 0.7073; 0 0.1871];
-> R(2) = [-0.0069 0.6433; 0 0.2898];
-> policy = [2; 2];
-> [Ppolicy, PRpolicy] = mdp_computePpolicyPRpoli(P, R, policy)
 PRpolicy =
    0.2093565
    0.2898
 Ppolicy =
    0.6674    0.3326
    0         1

In the above example, P can be a list containing sparse matrices:
-> P(1) = sparse([0.6116 0.3884; 0 1.0000]);
-> P(2) = sparse([0.6674 0.3326; 0 1.0000]);
The function is unchanged.
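When R is given as a 2D (SxA) array instead, the reward of the policy in each state is simply the entry for the chosen action. A minimal hypothetical illustration (the values and variable names below are not from the toolbox):

// Hypothetical illustration, assuming R_sa is a full (SxA) reward matrix:
// PRpolicy(s) is then the reward of state s under its chosen action.
R_sa = [-0.1 0.2; 0.3 0.4];   // R_sa(s,a) with S = 2 states, A = 2 actions
policy = [2; 2];
PRpolicy = zeros(2, 1);
for s = 1:2
    PRpolicy(s) = R_sa(s, policy(s));
end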