
Initial ExplorationWorker commit #383

Merged: 4 commits merged into google:main on Oct 17, 2024

Conversation

tvmarino (Collaborator):

Initial ExplorationWorker commit. ExplorationWorker implements the
interactive clang compilation together with exploration for a
given module. This commit adds the class with its compile_module
function which will compile the module with a given compiler policy.

@mtrofin (Collaborator) left a comment:

ptal at your commit message, probably formatting split it.

from compiler_opt.rl import env


def add_int_feature(
mtrofin (Collaborator):

should this take a type parameter defaulted to np.int64? the type of the value should match the type of the element value in the feature list, right?

same q for the add_float_feature

tvmarino (Collaborator, Author):

The type of the value should match the type of the element value in the feature list. All of these are helper functions for add_feature_list. I can definitely make this take a type parameter, but I don't see how that would help: at the end of the day I would have to cast whatever type is specified there to an int64 so that we create an int64 feature list. My understanding is that only a few types are supported for feature lists, and the three different add functions correspond to exactly those types.

mtrofin (Collaborator):

ah, I see. sgtm

mtrofin (Collaborator):

aaah! one thing I was missing: SequenceExample really only knows int64 and float32 IIRC, right? So the constraint here is what SequenceExample knows about.

May be worth stressing that somehow. add_float_feature_to_seq_example may be a mouthful; maybe just documenting it in the docstring is sufficient, i.e. that "int" is really int64 and always will be, because of SequenceExample. Then the reader gets that it's not a point-in-time choice we made; the choice was made for us.

tvmarino (Collaborator, Author):

Yeah exactly, I will add something to the documentation to emphasize this.
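
The constraint discussed above comes from the tf.train.SequenceExample proto itself, which only stores int64, float, and bytes feature lists. A minimal sketch of such typed helpers, assuming signatures that may differ from the actual patch:

import numpy as np
import tensorflow as tf


def add_int_feature(seq_example: tf.train.SequenceExample,
                    feature_value: np.int64, feature_name: str) -> None:
  # "int" here is really int64 and always will be: int64_list is the only
  # integer container a SequenceExample feature can hold.
  f = seq_example.feature_lists.feature_list[feature_name].feature.add()
  lst = f.int64_list.value
  lst.extend([feature_value])


def add_float_feature(seq_example: tf.train.SequenceExample,
                      feature_value: np.float32, feature_name: str) -> None:
  # Likewise, float features are stored as float32 in float_list.
  f = seq_example.feature_lists.feature_list[feature_name].feature.add()
  lst = f.float_list.value
  lst.extend([feature_value])


def add_string_feature(seq_example: tf.train.SequenceExample,
                       feature_value: str, feature_name: str) -> None:
  # Strings go into bytes_list, so they are encoded before being appended.
  f = seq_example.feature_lists.feature_list[feature_name].feature.add()
  lst = f.bytes_list.value
  lst.extend([feature_value.encode('utf-8')])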

lst.extend([feature_value])


def add_string_feature(
mtrofin (Collaborator):

why, do we ever think we'd use this?

tvmarino (Collaborator, Author):

Imagine that during training we want specific weights only for specific modules; then we can read off the module name from here and compute the relevant weight. It's something we experimented with; however, it did not significantly improve on our final method. I can imagine this being useful in similar contexts though.

mtrofin (Collaborator):

if we don't currently (as in, part of this patch or known upcoming ones) use it, let's not add it. It's easy to add later.

tvmarino (Collaborator, Author):

I have used it in part of the training for some runs, just not the ones that ended up giving the best policy. It would be more effort to remove this and the module_name feature than to keep them at this point, but it's really up to you.

mtrofin (Collaborator):

ok! just wanted to make sure it has an envisioned use (other than test)
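
A purely hypothetical illustration of the envisioned use (not part of this patch): the module_name feature could drive a per-module example weight when the sequence examples are consumed, along these lines:

import tensorflow as tf

# Assumed weighting table; the experiment mentioned above may have used a
# different scheme entirely.
_MODULE_WEIGHTS = {'some_hot_module': 4.0}


def example_weight(seq_example: tf.train.SequenceExample) -> float:
  # Read the module name back from the 'module_name' string feature and map
  # it to a training weight, defaulting to 1.0 for unlisted modules.
  feature_lists = seq_example.feature_lists.feature_list
  if 'module_name' not in feature_lists:
    return 1.0
  name_bytes = feature_lists['module_name'].feature[0].bytes_list.value[0]
  return _MODULE_WEIGHTS.get(name_bytes.decode('utf-8'), 1.0)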

compiler_opt/rl/generate_bc_trajectories.py: two further review threads (outdated, resolved).
raise ValueError(
    'Compilation loop exited at step type {0} before last step'.format(
        curr_obs_dict.step_type))
native_size = curr_obs_dict.score_policy[self.reward_key]
mtrofin (Collaborator):

native_size is tied to inlining, perhaps score? same then below

tvmarino (Collaborator, Author):

changed it to reward

curr_obs_dict.step_type))
native_size = curr_obs_dict.score_policy[self.reward_key]
native_size_list = np.float32(native_size) * np.float32(np.ones(horizon))
add_feature_list(sequence_example, native_size_list, 'reward')
mtrofin (Collaborator):

so I understand: the reward key could be hi_mom, but we rename it here to reward?

tvmarino (Collaborator, Author):

The sequence examples are what is going to be used for training. reward_key is the key used in the environment to determine what we use as a reward; the two are somewhat independent.

mtrofin (Collaborator):

may be worth commenting that, easier for a reader to get that "it's intentional"

tvmarino (Collaborator, Author):

See if the docstring for the function makes this more clear now.
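
To spell out the intent in code as suggested, here is a sketch with explanatory comments; the helper name is hypothetical, and only add_feature_list is reused from the patch, so the merged code may phrase this differently:

import numpy as np
import tensorflow as tf


def add_reward_feature(sequence_example: tf.train.SequenceExample,
                       curr_obs_dict, reward_key: str, horizon: int) -> None:
  # The env reports its score under `reward_key` (which could be any metric
  # name), but the SequenceExample consumed by training always stores it
  # under the fixed feature name 'reward'. The rename is intentional.
  reward = curr_obs_dict.score_policy[reward_key]
  # Broadcast the final scalar reward across all `horizon` steps.
  reward_list = np.float32(reward) * np.float32(np.ones(horizon))
  add_feature_list(sequence_example, reward_list, 'reward')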

native_size_list = np.float32(native_size) * np.float32(np.ones(horizon))
add_feature_list(sequence_example, native_size_list, 'reward')
module_name_list = [self.loaded_module_spec.name for _ in range(horizon)]
add_feature_list(sequence_example, module_name_list, 'module_name')
mtrofin (Collaborator):

you may want to pull the feature names that basically form an API up into a global variable: IIUC module_name (1) can't be a feature in the original policy, and (2) is used as part of the implementation here.

You can then also validate that the given trace doesn't have this feature.

tvmarino (Collaborator, Author):

Check if this looks good, otherwise I haven't understood what you mean here.

tvmarino (Collaborator, Author):

Checking whether the feature names already appear in what is returned by the env is done in _process_obs now.
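
A hypothetical sketch of that validation (the real check in _process_obs is likely structured differently):

# Feature names reserved by the SequenceExample writer; the env must not
# emit observations under these names.
_RESERVED_FEATURE_NAMES = ('module_name', 'reward')


def check_no_reserved_features(obs_dict: dict) -> None:
  # Raise early if an environment observation would clash with a feature
  # the trajectory writer adds itself.
  clashes = [k for k in _RESERVED_FEATURE_NAMES if k in obs_dict]
  if clashes:
    raise ValueError(f'Observation uses reserved feature names: {clashes}')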

self,
policy: Callable[[Optional[time_step.TimeStep]], np.ndarray],
) -> tf.train.SequenceExample:
"""Compiles the module with the given policy and outputs a seq. example.
mtrofin (Collaborator):

the documentation should capture that the sequence example also has some special, specifically-named features (module_name...) - IIUC

tvmarino (Collaborator, Author):

done.

dtype=tf.int32,
name='step_type'),
reward=tf.convert_to_tensor([reward], dtype=tf.float32, name='reward'),
discount=tf.convert_to_tensor([0.0], dtype=tf.float32, name='discount'),
mtrofin (Collaborator):

reward and discount here are features required by TimeStep, right? One thing that could clarify which feature names mean what is to group them as static members into a couple of classes, just so there's a namespacing that helps readability (i.e. here it would be name=TFAgentsTimeStep.DiscountFeatureName).

Or, since there don't seem to be many names, just having globals and grouping them with comments also works.

Either way the reader should get sufficient hinting about where these names come from or what the intent is with them.

tvmarino (Collaborator, Author):

I am not sure what this means. The names should be documented by tf-agents already; why do we need to do anything here? Or do you mean the names of the tensors in tf.convert_to_tensor? If that's the case, the names just match whatever is expected by TimeStep.

mtrofin (Collaborator):

just to make things easier for a reader. Rather than realizing that these may be tf-agents things (not the most discoverable thing) and then reading up on that, 1-2 sentences lets me "get it" and move forward.
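
A sketch of the suggested namespacing; the class and constant names here are illustrative rather than what landed in the patch, and assume the tf-agents TimeStep structure:

import tensorflow as tf
from tf_agents.trajectories import time_step


class TFAgentsTimeStepNames:
  """Tensor names mandated by the tf-agents TimeStep structure, not by us."""
  STEP_TYPE = 'step_type'
  REWARD = 'reward'
  DISCOUNT = 'discount'


def make_last_time_step(reward: float, observation) -> time_step.TimeStep:
  # TimeStep is a plain namedtuple of (step_type, reward, discount,
  # observation); the named tensors simply mirror what tf-agents expects.
  return time_step.TimeStep(
      step_type=tf.convert_to_tensor([int(time_step.StepType.LAST)],
                                     dtype=tf.int32,
                                     name=TFAgentsTimeStepNames.STEP_TYPE),
      reward=tf.convert_to_tensor([reward],
                                  dtype=tf.float32,
                                  name=TFAgentsTimeStepNames.REWARD),
      discount=tf.convert_to_tensor([0.0],
                                    dtype=tf.float32,
                                    name=TFAgentsTimeStepNames.DISCOUNT),
      observation=observation)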

@@ -111,6 +114,13 @@ def add_feature_list(seq_example: tf.train.SequenceExample,
add_function(seq_example, feature, feature_name)


@dataclasses.dataclass
class SequenceExampleFeatureNames:
mtrofin (Collaborator):

nit: add a comment or docstring saying what this is for ("feature names that we depend on", for example).

Also use these in the test.

tvmarino (Collaborator, Author):

done.
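
A sketch of what the dataclass from the hunk above might look like after this comment; the field set is taken from the features discussed in this review and may not match the merged code exactly:

import dataclasses


@dataclasses.dataclass
class SequenceExampleFeatureNames:
  """Feature names in the generated SequenceExamples that we depend on.

  These names form the implicit contract between trajectory generation and
  training: 'reward' is always the training-side reward feature regardless of
  the env's reward_key, and 'module_name' identifies the module a trajectory
  came from, so it must not clash with any feature of the original policy.
  """
  module_name: str = 'module_name'
  reward: str = 'reward'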

@tvmarino changed the title from "Initial ExplorationWorker commit. ExplorationWorker implements the" to "Initial ExplorationWorker commit" on Oct 17, 2024.
@tvmarino merged commit f9dee45 into google:main on Oct 17, 2024; 15 checks passed.
@tvmarino deleted the exploration_worker branch on October 17, 2024 at 14:43.