Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate AskTellOptimizer that doesn't track datasets #834

Merged
merged 24 commits into from
May 6, 2024

Conversation

uri-granta
Copy link
Collaborator

@uri-granta uri-granta commented Mar 26, 2024

Related issue(s)/PRs:

Summary

Support an AskTellOptimizer that doesn't track datasets: instead you use tell to tell it that the datasets have been updated (passing in the complete updated datasets, not just the new points).

Also see: https://github.com/Prowler-io/automotive/pull/5598

Fully backwards compatible: yes / no

PR checklist

  • The quality checks are all passing
  • The bug case / new feature is covered by tests
  • Any new features are well-documented (in docstrings or notebooks)

`state` property, and can be used to initialise a new instance of the optimizer.
"""

record: Record[StateType, ProbabilisticModelType]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this then contain the min on each trust region? This is being added in this PR too?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the acquisition rule in automotive has been updated, these will be accessible in record.acquisition_state.

Copy link
Collaborator

@avullo avullo May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we need to do to the FeasibleSetTrustRegionAcquisitionRule in order for the record to contain the correct acquisition state?

Comment on lines +504 to +505
new_data: Mapping[Tag, Dataset] | Dataset,
new_data_ixs: Optional[Sequence[TensorType]] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be cleaner, easier to aggregate these two into a more general concept of new_data, i.e. data and local indices?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the mainline use case we don't bother providing indices, and instead have the AskTellOptimizer infer them from the change of the data size and number of regions. Also, this ways preserves backwards compatibility in a widely used interface.

@uri-granta uri-granta marked this pull request as ready for review May 3, 2024 13:33
Copy link
Collaborator

@khurram-ghani khurram-ghani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Left a few comments/suggestions.

@@ -999,6 +999,9 @@ def write_summary_observations(
) -> None:
"""Write TensorBoard summary for the current step observations."""
for tag in models:
if tag not in tagged_output:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor point... feel free to ignore. It is probably more intuitive to check against datasets, since those concepts exist in multiple places. datasets and tagged_output keys should match anyway.

Suggested change
if tag not in tagged_output:
if tag not in datasets:

trieste/ask_tell_optimization.py Outdated Show resolved Hide resolved
trieste/ask_tell_optimization.py Outdated Show resolved Hide resolved
raise ValueError(
f"new_data global keys {global_new} doesn't "
f"match dataset global keys {global_old}"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should only allow this exception when new_data_ixs is None. Otherwise we would have both existing local datasets in new_data and indices for them. In which case the with_local_datasets call below would ignore those indices (it only uses them if the local tags do not exist). This could be confusing for the user.

@@ -483,7 +603,7 @@ def tell(self, new_data: Mapping[Tag, Dataset] | Dataset) -> None:
if summary_writer:
with summary_writer.as_default(step=logging.get_step_number()):
write_summary_observations(
self._datasets,
datasets,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to this change, write_summary_initial_model_fit call in __init__ above should probably use datasets instead of self._datasets, as datasets are updated for rules with local dataset.

Also, the write_summary_query_points call in ask should probably also track _dataset_ixs for completeness. Otherwise it is missing some key information when track_data==False.

# infer dataset indices from change in dataset sizes
new_dataset_len = self.dataset_len(new_data)
num_new_points = new_dataset_len - self._dataset_len
if num_new_points < 0 or num_new_points % len(self._dataset_ixs) != 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe create a local variable num_local_datasets = len(self._dataset_ixs) and use it in the various places in this else section (8?) for clarity? Even though self._dataset_ixs gets updated part way through, its length wouldn't change.

datasets = with_local_datasets(datasets, num_local_datasets, indices)
for d in datasets.items():
print(d)
print("\n\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prints left in by accident?

@uri-granta uri-granta merged commit aabd293 into develop May 6, 2024
12 checks passed
@uri-granta uri-granta deleted the uri/generalise_ask_tell branch May 6, 2024 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants