-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Local models and datasets #788
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments on the first couple of files. Will hopefully have more time to continue looking at this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to look at the tests but looking good so far! Various small comments, suggestions and questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(a few more comments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't gone through it in details, there is a lot going on here...
my biggest comment is on adding additional complexity to ask-tell and bayesian_optimizer - do we really need to take care of the local models there?
very few acq rules will make use of local models, why not taking care of datasets and model training in the acquisition rule itself, in those few that make use of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok after a discussion
Related issue(s)/PRs: #782
Summary
This PR adds support for local models and datasets. The following scenarios are supported:
The initial dataset can be a global dataset sampled from the whole search space. This data will be replicated to each of the regions on the first iteration and subsequently each region will have an associated local dataset. For batch-TR algorithm, the dataset for each region are filtered after each iteration to only contain the points in the region (but
TREGO
doesn't do this).Note: this replication of initial data can potentially cause an issue when a global model is being used, as the points may be repeated. This will only be an issue if regions overlap and both contain that initial data-point (as filtering would otherwise remove duplicates). The main way to avoid this issue is to provide local initial datasets, instead of a global initial dataset.
Thetrust_region
notebook contains a new temporaryTEST
section, just to show how local models can be used in the notebook. It is worth noting in thegif
that the query-points are filtered to be only inside the boxes for each iteration. This section is only for demonstration and will be removed before merging this PR. A follow-on PR will update theTURBO
section to use local models and demonstrate this functionality.Fully backwards compatible: no
The
BatchTrustRegion
rule acquisition returns rank 3 points, instead of rank 2 as for other rules (and previous trust-region rules). This means the users should use the new batched observer with this rule. That is already taken care of inBayesianOptimizer
. However, withAskTellOptimizer
the users should use the batched observer as follows:PR checklist