
Alter agent.infer_policies() and other functions to return an optional transition history #166

Open · JohnBoik opened this issue Jan 23, 2025 · 0 comments
Especially in complex problems with multiple policy steps, it can be useful to return a history of the state transitions produced by the B matrices. Such a history could be used, for example, to validate that state (factor) transitions are occurring as expected. One way to accomplish this is to alter compute_G_policy_inductive() as follows (the lines containing transition_history are the additions):

def compute_G_policy_inductive(qs_init, A, B, C, pA, pB, A_dependencies, B_dependencies, I, policy_i, inductive_epsilon=1e-3, use_utility=True, use_states_info_gain=True, use_param_info_gain=False, use_inductive=False):

    def scan_body(carry, t):
        qs, neg_G, transition_history = carry
        qs_next = compute_expected_state(qs, B, policy_i[t], B_dependencies)
        qo = compute_expected_obs(qs_next, A, A_dependencies)
        info_gain = compute_info_gain(qs_next, qo, A, A_dependencies) if use_states_info_gain else 0.
        utility = compute_expected_utility(t, qo, C) if use_utility else 0.
        inductive_value = calc_inductive_value_t(qs_init, qs_next, I, epsilon=inductive_epsilon) if use_inductive else 0.
        param_info_gain = 0.
        if pA is not None:
            param_info_gain += calc_pA_info_gain(pA, qo, qs_next, A_dependencies) if use_param_info_gain else 0.
        if pB is not None:
            param_info_gain += calc_pB_info_gain(pB, qs_next, qs, B_dependencies, policy_i[t]) if use_param_info_gain else 0.
        neg_G += info_gain + utility - param_info_gain + inductive_value
        # record the most likely state of each factor at this policy step
        transition_history = transition_history.at[t].set(jnp.array([jnp.argmax(x) for x in qs_next]))
        return (qs_next, neg_G, transition_history), None

    qs = qs_init
    neg_G = 0.
    # one row per policy step; assumes one column per hidden state factor
    transition_history = jnp.zeros(policy_i.shape)
    final_state, _ = lax.scan(scan_body, (qs, neg_G, transition_history), jnp.arange(policy_i.shape[0]))
    _, neg_G, transition_history = final_state
    return neg_G, transition_history
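
As a rough illustration of the validation use case mentioned above, the returned history could be compared against a hand-specified expected trajectory. This is only a sketch: expected_states is a made-up array, and the call assumes the model variables (A, B, C, etc.) are already defined in scope.

import jax.numpy as jnp

neg_G, transition_history = compute_G_policy_inductive(
    qs_init, A, B, C, pA, pB, A_dependencies, B_dependencies, I, policy_i,
    use_utility=True, use_states_info_gain=True
)

# hypothetical expected trajectory: (num_steps, num_factors) indices of the
# states the policy is expected to move through
expected_states = jnp.array([[2, 0], [3, 0], [4, 1]])

assert jnp.array_equal(transition_history.astype(int), expected_states), \
    "state (factor) transitions deviate from the expected trajectory"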

Given the above, the functions agent.infer_policies() and control.update_posterior_policies_inductive() would also need to be changed to propagate and return the transition_history array.
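
For reference, here is a minimal sketch of how control.update_posterior_policies_inductive() might thread the new output through, assuming it computes the expected free energy of each policy by vmapping compute_G_policy_inductive over the policy matrix. The signature and the handling of the policy prior E below are approximations for illustration, not the library's exact code.

from functools import partial
import jax.numpy as jnp
from jax import vmap, nn

def update_posterior_policies_inductive(policy_matrix, qs_init, A, B, C, E, pA, pB,
                                        A_dependencies, B_dependencies, I,
                                        gamma=16.0, **kwargs):
    # fix everything except the policy, then map over policies
    compute_G_fixed_states = partial(
        compute_G_policy_inductive, qs_init, A, B, C, pA, pB,
        A_dependencies, B_dependencies, I, **kwargs
    )
    # vmapping the two-output version yields one transition history per policy,
    # stacked along a new leading (policy) axis
    neg_efe_all_policies, transition_histories = vmap(compute_G_fixed_states)(policy_matrix)
    q_pi = nn.softmax(gamma * neg_efe_all_policies + jnp.log(E))
    return q_pi, neg_efe_all_policies, transition_histories

agent.infer_policies() would then simply hand the extra transition_histories array back to the caller alongside its existing outputs.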

Going one step further, one could specify what kind of reducing function is applied to each qs_next. In the above, jnp.argmax is used, but a function such as an entropy could be used instead. Alternatively, the complete qs_next could be returned (at some memory cost), rather than a function of qs_next.
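
To sketch that idea (reduce_fn and the helper names below are hypothetical, not part of pymdp):

import jax.numpy as jnp

def argmax_reduce(qs_next):
    # most likely state index for each factor
    return jnp.array([jnp.argmax(q) for q in qs_next])

def entropy_reduce(qs_next):
    # Shannon entropy of the posterior over each factor
    return jnp.array([-jnp.sum(q * jnp.log(q + 1e-16)) for q in qs_next])

# compute_G_policy_inductive(..., reduce_fn=argmax_reduce) would then replace the
# hard-coded argmax line in scan_body with:
#     transition_history = transition_history.at[t].set(reduce_fn(qs_next))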
