
Alter agent.infer_policies() and other functions to return an optional transition history #166

Open · JohnBoik opened this issue Jan 23, 2025 · 0 comments
Especially in complex problems with multiple policy steps, it can be useful to return a history of the state transitions produced by the B matrices. Such a history could be used, for example, to validate that state (factor) transitions are occurring as expected. One way to accomplish this is to alter compute_G_policy_inductive() as follows (the lines containing transition_history are the additions):

def compute_G_policy_inductive(qs_init, A, B, C, pA, pB, A_dependencies, B_dependencies, I, policy_i, inductive_epsilon=1e-3, use_utility=True, use_states_info_gain=True, use_param_info_gain=False, use_inductive=False):

    def scan_body(carry, t):
        qs, neg_G, transition_history = carry
        qs_next = compute_expected_state(qs, B, policy_i[t], B_dependencies)
        qo = compute_expected_obs(qs_next, A, A_dependencies)
        info_gain = compute_info_gain(qs_next, qo, A, A_dependencies) if use_states_info_gain else 0.
        utility = compute_expected_utility(t, qo, C) if use_utility else 0.
        inductive_value = calc_inductive_value_t(qs_init, qs_next, I, epsilon=inductive_epsilon) if use_inductive else 0.
        param_info_gain = 0.
        if pA is not None:
            param_info_gain += calc_pA_info_gain(pA, qo, qs_next, A_dependencies) if use_param_info_gain else 0.
        if pB is not None:
            param_info_gain += calc_pB_info_gain(pB, qs_next, qs, B_dependencies, policy_i[t]) if use_param_info_gain else 0.
        neg_G += info_gain + utility - param_info_gain + inductive_value
        # record the most likely state of each factor at this policy step
        transition_history = transition_history.at[t].set(jnp.array([jnp.argmax(x) for x in qs_next]))
        return (qs_next, neg_G, transition_history), None

    qs = qs_init
    neg_G = 0.
    # one row per policy step; assumes one column per hidden state factor
    transition_history = jnp.zeros(policy_i.shape)
    final_state, _ = lax.scan(scan_body, (qs, neg_G, transition_history), jnp.arange(policy_i.shape[0]))
    _, neg_G, transition_history = final_state
    return neg_G, transition_history
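
As a rough illustration of the validation use case mentioned above, the returned history could be compared against a hand-specified expected trajectory. This is only a sketch: expected_states is a made-up array, and the call assumes the model variables (A, B, C, etc.) are already defined in scope.

import jax.numpy as jnp

neg_G, transition_history = compute_G_policy_inductive(
    qs_init, A, B, C, pA, pB, A_dependencies, B_dependencies, I, policy_i,
    use_utility=True, use_states_info_gain=True
)

# hypothetical expected trajectory: (num_steps, num_factors) indices of the
# states the policy is expected to move through
expected_states = jnp.array([[2, 0], [3, 0], [4, 1]])

assert jnp.array_equal(transition_history.astype(int), expected_states), \
    "state (factor) transitions deviate from the expected trajectory"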

Given the above, the functions agent.infer_policies() and control.update_posterior_policies_inductive() would also need to be changed to propagate and return the transition_history array.
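
For reference, here is a minimal sketch of how control.update_posterior_policies_inductive() might thread the new output through, assuming it computes the expected free energy of each policy by vmapping compute_G_policy_inductive over the policy matrix. The signature and the handling of the policy prior E below are approximations for illustration, not the library's exact code.

from functools import partial
import jax.numpy as jnp
from jax import vmap, nn

def update_posterior_policies_inductive(policy_matrix, qs_init, A, B, C, E, pA, pB,
                                        A_dependencies, B_dependencies, I,
                                        gamma=16.0, **kwargs):
    # fix everything except the policy, then map over policies
    compute_G_fixed_states = partial(
        compute_G_policy_inductive, qs_init, A, B, C, pA, pB,
        A_dependencies, B_dependencies, I, **kwargs
    )
    # vmapping the two-output version yields one transition history per policy,
    # stacked along a new leading (policy) axis
    neg_efe_all_policies, transition_histories = vmap(compute_G_fixed_states)(policy_matrix)
    q_pi = nn.softmax(gamma * neg_efe_all_policies + jnp.log(E))
    return q_pi, neg_efe_all_policies, transition_histories

agent.infer_policies() would then simply hand the extra transition_histories array back to the caller alongside its existing outputs.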

Going one step further, one could specify what kind of reducing function is applied to each qs_next. In the above, jnp.argmax is used, but a function such as an entropy could be used instead. Alternatively, the complete qs_next could be returned (at some memory cost), rather than a function of qs_next.
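
To sketch that idea (reduce_fn and the helper names below are hypothetical, not part of pymdp):

import jax.numpy as jnp

def argmax_reduce(qs_next):
    # most likely state index for each factor
    return jnp.array([jnp.argmax(q) for q in qs_next])

def entropy_reduce(qs_next):
    # Shannon entropy of the posterior over each factor
    return jnp.array([-jnp.sum(q * jnp.log(q + 1e-16)) for q in qs_next])

# compute_G_policy_inductive(..., reduce_fn=argmax_reduce) would then replace the
# hard-coded argmax line in scan_body with:
#     transition_history = transition_history.at[t].set(reduce_fn(qs_next))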
