Investigate AskTellOptimizer that doesn't track datasets #834
Conversation
`state` property, and can be used to initialise a new instance of the optimizer.
"""

record: Record[StateType, ProbabilisticModelType]
Will this then contain the min on each trust region? This is being added in this PR too?
Once the acquisition rule in automotive has been updated, these will be accessible in `record.acquisition_state`.
What do we need to do to the `FeasibleSetTrustRegionAcquisitionRule` in order for the record to contain the correct acquisition state?
new_data: Mapping[Tag, Dataset] | Dataset,
new_data_ixs: Optional[Sequence[TensorType]] = None,
Wouldn't it be cleaner and easier to aggregate these two into a more general concept of `new_data`, i.e. data and local indices?
In the mainline use case we don't bother providing indices, and instead have the `AskTellOptimizer` infer them from the change in the data size and the number of regions. Also, this way preserves backwards compatibility in a widely used interface.
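
A hedged sketch of the two calling conventions being discussed; `ask_tell`, `updated_datasets` and `per_region_indices` are placeholder names for illustration, not names from this PR.

# Mainline use: pass only the complete updated datasets; the optimizer infers
# which points are new from the growth in dataset size and the number of regions.
ask_tell.tell(updated_datasets)

# Explicit form: additionally pass, for each region, the indices of its points
# within the corresponding global dataset.
ask_tell.tell(updated_datasets, new_data_ixs=per_region_indices)
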
Looks good. Left a few comments/suggestions.
@@ -999,6 +999,9 @@ def write_summary_observations(
) -> None:
    """Write TensorBoard summary for the current step observations."""
    for tag in models:
        if tag not in tagged_output:
Very minor point... feel free to ignore. It is probably more intuitive to check against `datasets`, since those concepts exist in multiple places. `datasets` and `tagged_output` keys should match anyway.
Suggested change:
- if tag not in tagged_output:
+ if tag not in datasets:
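
A minimal sketch of that guard in context, with simplified types; the function name and full signature here are illustrative stand-ins, not the real write_summary_observations signature.

from typing import Any, Mapping

from trieste.data import Dataset
from trieste.types import Tag


def summarise_observations_sketch(
    datasets: Mapping[Tag, Dataset],
    models: Mapping[Tag, Any],
    tagged_output: Mapping[Tag, Any],
) -> None:
    """Illustrative stand-in for the summary-writing loop under review."""
    for tag in models:
        # `datasets` and `tagged_output` should have matching keys; checking
        # against `datasets` keeps the guard aligned with the concept used
        # elsewhere in the logging code.
        if tag not in datasets:
            continue
        ...  # write the TensorBoard summaries for this tag's observations
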
raise ValueError(
    f"new_data global keys {global_new} doesn't "
    f"match dataset global keys {global_old}"
)
I think we should only allow this exception when `new_data_ixs` is `None`. Otherwise we would have both existing local datasets in `new_data` and indices for them, in which case the `with_local_datasets` call below would ignore those indices (it only uses them if the local tags do not exist). This could be confusing for the user.
@@ -483,7 +603,7 @@ def tell(self, new_data: Mapping[Tag, Dataset] | Dataset) -> None:
        if summary_writer:
            with summary_writer.as_default(step=logging.get_step_number()):
                write_summary_observations(
-                   self._datasets,
+                   datasets,
Similar to this change, the `write_summary_initial_model_fit` call in `__init__` above should probably use `datasets` instead of `self._datasets`, as datasets are updated for rules with local datasets. Also, the `write_summary_query_points` call in `ask` should probably also track `_dataset_ixs` for completeness. Otherwise it is missing some key information when `track_data==False`.
trieste/ask_tell_optimization.py (Outdated)
# infer dataset indices from change in dataset sizes
new_dataset_len = self.dataset_len(new_data)
num_new_points = new_dataset_len - self._dataset_len
if num_new_points < 0 or num_new_points % len(self._dataset_ixs) != 0:
Maybe create a local variable `num_local_datasets = len(self._dataset_ixs)` and use it in the various places in this `else` section (8?) for clarity? Even though `self._dataset_ixs` gets updated part way through, its length wouldn't change.
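
A standalone sketch of that suggestion applied to the snippet above. The helper name is hypothetical, and the layout of new points (one contiguous block per region, in region order) is an assumption made for illustration, not necessarily the actual ordering used by the rule.

from typing import Sequence

import tensorflow as tf


def infer_new_point_indices(
    old_len: int, new_len: int, dataset_ixs: Sequence[tf.Tensor]
) -> list[tf.Tensor]:
    """Extend each region's index list with its share of the newly added points."""
    # bind the region count once: the index lists get rebuilt below, but their
    # number does not change
    num_local_datasets = len(dataset_ixs)
    num_new_points = new_len - old_len
    if num_new_points < 0 or num_new_points % num_local_datasets != 0:
        raise ValueError(
            f"dataset grew by {num_new_points} points, which cannot be split "
            f"evenly across {num_local_datasets} local datasets"
        )
    per_region = num_new_points // num_local_datasets
    return [
        tf.concat(
            [
                ixs,
                tf.range(
                    old_len + i * per_region,
                    old_len + (i + 1) * per_region,
                    dtype=ixs.dtype,
                ),
            ],
            axis=0,
        )
        for i, ixs in enumerate(dataset_ixs)
    ]

For example, with two regions and four new points, region 0 gets indices [old_len, old_len + 1] appended and region 1 gets [old_len + 2, old_len + 3].
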
tests/unit/acquisition/test_utils.py (Outdated)
datasets = with_local_datasets(datasets, num_local_datasets, indices)
for d in datasets.items():
    print(d)
print("\n\n")
Prints left in by accident?
Related issue(s)/PRs:
Summary
Support an `AskTellOptimizer` that doesn't track datasets: instead you use `tell` to tell it that the datasets have been updated (passing in the complete updated datasets, not just the new points). A hedged usage sketch follows below.
Also see: https://github.com/Prowler-io/automotive/pull/5598
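
A hedged usage sketch of the feature described in the summary. The search space, datasets, models, acquisition rule, observer and step count are placeholders standing in for a normal Trieste setup, and the exact keyword arguments are inferred from the snippets in this review rather than taken from the final API.

from trieste.ask_tell_optimization import AskTellOptimizer

# search_space, initial_datasets, models, rule, observer and num_steps are
# placeholders for a standard Trieste setup.
ask_tell = AskTellOptimizer(
    search_space,
    initial_datasets,
    models,
    acquisition_rule=rule,
    track_data=False,  # the optimizer keeps no copy of the data itself
)

datasets = initial_datasets
for _ in range(num_steps):
    query_points = ask_tell.ask()
    new_observations = observer(query_points)  # assumed Mapping[Tag, Dataset]
    # The caller owns the data: update it externally and pass the *complete*
    # updated datasets back, not just the new points.
    datasets = {tag: datasets[tag] + new_observations[tag] for tag in datasets}
    ask_tell.tell(datasets)
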
Fully backwards compatible: yes / no
PR checklist