Investigate AskTellOptimizer that doesn't track datasets #834

uri-granta · 2024-03-26T09:50:05Z

Related issue(s)/PRs:

Summary

Support an AskTellOptimizer that doesn't track datasets. Instead, you call tell to inform it that the datasets have been updated, passing in the complete updated datasets rather than just the new points.

Also see: https://github.com/Prowler-io/automotive/pull/5598
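To illustrate the difference between the two ways of using tell, here is a toy sketch. It is not trieste's AskTellOptimizer. `ToyDataset` and `ToyAskTell` are hypothetical, and only the `track_data` flag name comes from the review discussion below. In tracking mode, tell receives only the new points and appends them. In non-tracking mode, tell receives the complete, up-to-date datasets, which replace the old ones.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset: parallel lists of inputs and outputs."""

    query_points: List[float] = field(default_factory=list)
    observations: List[float] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.query_points)


def _copy(datasets: Dict[str, ToyDataset]) -> Dict[str, ToyDataset]:
    # Copy so later mutation by the caller doesn't leak into the optimizer.
    return {
        tag: ToyDataset(list(d.query_points), list(d.observations))
        for tag, d in datasets.items()
    }


class ToyAskTell:
    """Toy sketch of the two tell modes (hypothetical, not trieste's API)."""

    def __init__(self, datasets: Dict[str, ToyDataset], track_data: bool = True):
        self.track_data = track_data
        self.datasets = _copy(datasets)

    def tell(self, new_data: Dict[str, ToyDataset]) -> None:
        if self.track_data:
            # Tracking mode: new_data holds only the new points, which are appended.
            for tag, d in new_data.items():
                self.datasets[tag].query_points.extend(d.query_points)
                self.datasets[tag].observations.extend(d.observations)
        else:
            # Non-tracking mode: new_data is the complete dataset, so it replaces ours.
            self.datasets = _copy(new_data)
        # A real optimizer would refit its models on self.datasets here.


# Non-tracking usage: the caller owns the full dataset and passes all of it back.
full = {"OBJECTIVE": ToyDataset([0.0, 1.0], [3.0, 2.0])}
opt = ToyAskTell(full, track_data=False)
full["OBJECTIVE"].query_points.append(0.5)
full["OBJECTIVE"].observations.append(1.5)
opt.tell(full)
```

The non-tracking mode suits callers that already store the authoritative dataset elsewhere, since the optimizer never has to reconcile incremental updates with that external copy.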

Fully backwards compatible: yes / no

PR checklist

The quality checks are all passing
The bug case / new feature is covered by tests
Any new features are well-documented (in docstrings or notebooks)


avullo · 2024-04-30T16:08:50Z

trieste/ask_tell_optimization.py

+    `state` property, and can be used to initialise a new instance of the optimizer.
+    """
+
+    record: Record[StateType, ProbabilisticModelType]


Will this then contain the minimum for each trust region? Is this also being added in this PR?

Once the acquisition rule in automotive has been updated, these will be accessible in record.acquisition_state.

What do we need to do to the FeasibleSetTrustRegionAcquisitionRule in order for the record to contain the correct acquisition state?

avullo · 2024-04-30T16:27:59Z

trieste/ask_tell_optimization.py

+        new_data: Mapping[Tag, Dataset] | Dataset,
+        new_data_ixs: Optional[Sequence[TensorType]] = None,


Wouldn't it be cleaner and easier to combine these two into a more general notion of new_data, i.e. data plus local indices?

In the mainline use case we don't bother providing indices, and instead have the AskTellOptimizer infer them from the change in data size and the number of regions. This way also preserves backwards compatibility in a widely used interface.
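Inferring the indices from the change in data size can be sketched as follows. `infer_dataset_ixs` is a hypothetical helper, not trieste's code. It assumes the new points were appended to the end of the global dataset in interleaved order, so that new point j belongs to region j mod R for R regions. That ordering is an assumption made for illustration. The validity check mirrors the `num_new_points < 0 or num_new_points % len(self._dataset_ixs) != 0` condition quoted later in this review.

```python
from typing import List


def infer_dataset_ixs(
    dataset_ixs: List[List[int]], old_len: int, new_len: int
) -> List[List[int]]:
    """Extend each region's index list with the points added since the last tell.

    Hypothetical helper. Assumes new points were appended in interleaved order,
    i.e. new point j belongs to region j % num_local_datasets.
    """
    num_local_datasets = len(dataset_ixs)
    if num_local_datasets == 0:
        raise ValueError("expected at least one local dataset")
    num_new_points = new_len - old_len
    # The new points must split evenly across the regions.
    if num_new_points < 0 or num_new_points % num_local_datasets != 0:
        raise ValueError(
            f"cannot split {num_new_points} new points across "
            f"{num_local_datasets} local datasets"
        )
    return [
        ixs + list(range(old_len + i, new_len, num_local_datasets))
        for i, ixs in enumerate(dataset_ixs)
    ]


# Three regions; the global dataset grows from 4 to 10 points.
print(infer_dataset_ixs([[0, 1], [2], [3]], old_len=4, new_len=10))
# [[0, 1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

Because the caller only passes data, the existing tell signature keeps working unchanged; explicit indices are needed only when the ordering assumption doesn't hold.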

khurram-ghani

Looks good. Left a few comments/suggestions.

khurram-ghani · 2024-05-03T13:54:41Z

trieste/bayesian_optimizer.py

@@ -999,6 +999,9 @@ def write_summary_observations(
 ) -> None:
    """Write TensorBoard summary for the current step observations."""
    for tag in models:
+        if tag not in tagged_output:


Very minor point, feel free to ignore. It is probably more intuitive to check against datasets, since that concept is used in multiple places. The keys of datasets and tagged_output should match anyway.

Suggested change

if tag not in tagged_output:

if tag not in datasets:

khurram-ghani · 2024-05-03T14:47:11Z

trieste/ask_tell_optimization.py

+                raise ValueError(
+                    f"new_data global keys {global_new} doesn't "
+                    f"match dataset global keys {global_old}"
+                )


I think we should only allow this exception when new_data_ixs is None. Otherwise new_data could contain existing local datasets while we also have indices for them, in which case the with_local_datasets call below would ignore those indices (it only uses them if the local tags do not exist). This could be confusing for the user.
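One way to act on this suggestion is to reject the ambiguous combination up front. The sketch below uses a hypothetical `LocalTag` type and `validate_tell_keys` helper; neither is trieste's API. Local tags in new_data are tolerated only when new_data_ixs is None, and the global keys must still match.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Union


@dataclass(frozen=True)
class LocalTag:
    """Hypothetical stand-in for a tag identifying one region's local dataset."""

    global_tag: str
    local_index: int


Tag = Union[str, LocalTag]


def validate_tell_keys(
    new_keys: Set[Tag],
    old_global_keys: Set[str],
    new_data_ixs: Optional[Sequence[Sequence[int]]] = None,
) -> None:
    """Reject ambiguous tell arguments (hypothetical validation sketch)."""
    local_new = {t for t in new_keys if isinstance(t, LocalTag)}
    global_new = new_keys - local_new
    if local_new and new_data_ixs is not None:
        # Local datasets and explicit indices were both supplied; the indices
        # would be silently ignored, so fail loudly instead.
        raise ValueError("pass either local datasets in new_data or new_data_ixs, not both")
    if global_new != old_global_keys:
        raise ValueError(
            f"new_data global keys {global_new} doesn't "
            f"match dataset global keys {old_global_keys}"
        )
```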

khurram-ghani · 2024-05-03T15:13:27Z

trieste/ask_tell_optimization.py

@@ -483,7 +603,7 @@ def tell(self, new_data: Mapping[Tag, Dataset] | Dataset) -> None:
        if summary_writer:
            with summary_writer.as_default(step=logging.get_step_number()):
                write_summary_observations(
-                    self._datasets,
+                    datasets,


Similarly, the write_summary_initial_model_fit call in __init__ above should probably use datasets rather than self._datasets, since datasets is updated for rules with local datasets.

The write_summary_query_points call in ask should probably also take _dataset_ixs into account, for completeness; otherwise it misses key information when track_data==False.

khurram-ghani · 2024-05-03T15:21:01Z

trieste/ask_tell_optimization.py

+                # infer dataset indices from change in dataset sizes
+                new_dataset_len = self.dataset_len(new_data)
+                num_new_points = new_dataset_len - self._dataset_len
+                if num_new_points < 0 or num_new_points % len(self._dataset_ixs) != 0:


Maybe create a local variable num_local_datasets = len(self._dataset_ixs) and use it in the various places (8?) in this else section, for clarity? Even though self._dataset_ixs gets updated partway through, its length doesn't change.

khurram-ghani · 2024-05-03T15:55:41Z

tests/unit/acquisition/test_utils.py

+    datasets = with_local_datasets(datasets, num_local_datasets, indices)
+    for d in datasets.items():
+        print(d)
+        print("\n\n")


Prints left in by accident?

Uri Granta added 17 commits March 26, 2024 09:49

Investigate AskTellOptimizer that doesn't track datasets

94b845c

Typos

0b86589

Merge remote-tracking branch 'origin/develop' into uri/generalise_ask_tell

d2f7828

Validation test

71b4b9c

Temporarily avoid isinstance(..., ProbabilisticModel)

7496525

Merge remote-tracking branch 'origin/develop' into uri/generalise_ask_tell

6196f7b

Parameters aren't Tensors

0340b64

Unit test

97036da

Support initialising with local datasets

b4fe18c

Merge remote-tracking branch 'origin/develop' into uri/generalise_ask_tell

95c683d

Typo

2616841

Rename parameter

c8ffaad

Handle None case

8dfb9e3

Plural

c6035c7

AskTellOptimizerState

be29017

Add integration test and see what breaks

f2533b7

Fix one of the failures

fbcefbb

avullo reviewed May 1, 2024

Uri Granta added 5 commits May 1, 2024 13:03

Fix another failure

b4626bb

More robust handling of dataset length calculation

2e494f6

Add more unit tests

33eb77c

Test local data idx handling with tell

30ea3aa

Test badly shaped new_data_idxs

ddf50cd

uri-granta marked this pull request as ready for review May 3, 2024 13:33

uri-granta requested a review from khurram-ghani May 3, 2024 13:34

khurram-ghani approved these changes May 3, 2024

Uri Granta added 2 commits May 5, 2024 18:48

Review comments

8647b65

Fix

2e1cd3e

uri-granta merged commit aabd293 into develop May 6, 2024
12 checks passed

uri-granta deleted the uri/generalise_ask_tell branch May 6, 2024 18:18
