Especially in complex problems with multiple policy steps, it can be useful to return a history of the state transitions induced by the B matrices. Such a history could be used, for example, to validate that state (factor) transitions are occurring as expected. One way to accomplish this is to alter compute_G_policy_inductive() as follows (the additions are the lines involving transition_history):
# jnp and lax, along with the compute_*/calc_* helpers used below, are already
# imported in the module where compute_G_policy_inductive is defined
import jax.numpy as jnp
from jax import lax

def compute_G_policy_inductive(qs_init, A, B, C, pA, pB, A_dependencies, B_dependencies, I, policy_i,
                               inductive_epsilon=1e-3, use_utility=True, use_states_info_gain=True,
                               use_param_info_gain=False, use_inductive=False):

    def scan_body(carry, t):

        qs, neg_G, transition_history = carry

        qs_next = compute_expected_state(qs, B, policy_i[t], B_dependencies)

        qo = compute_expected_obs(qs_next, A, A_dependencies)

        info_gain = compute_info_gain(qs_next, qo, A, A_dependencies) if use_states_info_gain else 0.

        utility = compute_expected_utility(t, qo, C) if use_utility else 0.

        inductive_value = calc_inductive_value_t(qs_init, qs_next, I, epsilon=inductive_epsilon) if use_inductive else 0.

        param_info_gain = 0.
        if pA is not None:
            param_info_gain += calc_pA_info_gain(pA, qo, qs_next, A_dependencies) if use_param_info_gain else 0.
        if pB is not None:
            param_info_gain += calc_pB_info_gain(pB, qs_next, qs, B_dependencies, policy_i[t]) if use_param_info_gain else 0.

        neg_G += info_gain + utility - param_info_gain + inductive_value

        # record the most probable state of each factor at step t
        # (jnp.argmax rather than np.argmax, since qs_next is traced inside lax.scan)
        transition_history = transition_history.at[t].set(jnp.array([jnp.argmax(x) for x in qs_next]))

        return (qs_next, neg_G, transition_history), None

    qs = qs_init
    neg_G = 0.
    # one row per policy step, one column per state factor
    # (assumes policy_i has shape (num_steps, num_factors))
    transition_history = jnp.zeros(policy_i.shape)

    final_state, _ = lax.scan(scan_body, (qs, neg_G, transition_history), jnp.arange(policy_i.shape[0]))

    _, neg_G, transition_history = final_state

    return neg_G, transition_history
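For instance (purely illustrative; qs_init, the A/B/C/pA/pB arrays, the dependency lists, I, and policy_i are whatever the model already defines), the extra output can be inspected directly:

    neg_G, transition_history = compute_G_policy_inductive(qs_init, A, B, C, pA, pB,
                                                           A_dependencies, B_dependencies,
                                                           I, policy_i)
    # row t holds, for each state factor, the index of the most probable state
    # expected after applying step t of the policy
    print(transition_history)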
Given the above, the functions agent.infer_policies() and control.update_posterior_policies_inductive() would also need to be changed to propagate and return the transition_history tensor, as sketched below.
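A rough sketch of that change, assuming update_posterior_policies_inductive works by vmapping compute_G_policy_inductive over the policy matrix and softmaxing the result (the argument list here is illustrative rather than the exact pymdp signature, and the log of the policy prior is handled with a simple epsilon):

    from functools import partial
    from jax import vmap, nn
    import jax.numpy as jnp

    def update_posterior_policies_inductive(policy_matrix, qs_init, A, B, C, E, pA, pB,
                                            A_dependencies, B_dependencies, I,
                                            gamma=16.0, **kwargs):
        # evaluate the modified compute_G_policy_inductive for every policy in parallel
        compute_G = partial(compute_G_policy_inductive, qs_init, A, B, C, pA, pB,
                            A_dependencies, B_dependencies, I, **kwargs)

        # each per-policy call now returns (neg_G, transition_history), so the vmap yields
        # a vector of neg_G values plus a stacked (num_policies, num_steps, num_factors) tensor
        neg_G_all, transition_history_all = vmap(compute_G)(policy_matrix)

        # posterior over policies, as before (E is the prior over policies)
        q_pi = nn.softmax(gamma * neg_G_all + jnp.log(E + 1e-16))

        # thread the extra output upward; agent.infer_policies() would do the same
        return q_pi, neg_G_all, transition_history_all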
Going one step further, the reducing function applied to each qs_next could itself be made configurable. In the code above, argmax is used, but a function like entropy() could be substituted; alternatively, the complete qs_next could be stored (at some memory cost), rather than a reduction of it. A sketch of this parameterization follows.
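A minimal sketch of that parameterization, keeping the same scan structure as above (reduce_fn is a hypothetical extra argument, not part of the pymdp API):

    import jax.numpy as jnp

    def argmax_reduce(qs_next):
        # index of the most probable state per factor -> shape (num_factors,)
        return jnp.array([jnp.argmax(q) for q in qs_next])

    def entropy_reduce(qs_next):
        # Shannon entropy of each factor's posterior -> shape (num_factors,)
        return jnp.array([-jnp.sum(q * jnp.log(q + 1e-16)) for q in qs_next])

    # Inside compute_G_policy_inductive, the history update would become
    #     transition_history = transition_history.at[t].set(reduce_fn(qs_next))
    # with reduce_fn passed in as an extra keyword argument, e.g.
    #     compute_G_policy_inductive(..., reduce_fn=entropy_reduce)
    # Storing the complete qs_next instead would need a buffer of shape
    # (num_steps, num_factors, num_states), or one buffer per factor when
    # factors have different numbers of states.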