-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use ERT storage as simulator cache #9767
base: main
Are you sure you want to change the base?
Use ERT storage as simulator cache #9767
Conversation
CodSpeed Performance ReportMerging #9767 will not alter performanceComparing Summary
|
4e70ffd
to
be7341e
Compare
d4d8a92
to
2202375
Compare
2202375
to
3990831
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments but basically solid.
else None | ||
), | ||
) | ||
for idx, real in enumerate(evaluator_context.realizations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This maps the indices of the control vectors to realizations/perturbations. Is that the intention? Because if the intention is to map simulation ID (= ert realization?) then this will fail if some control vectors are not evaluated due to being cached or being marked as inactive. You would need to do the same as I did in my recent PR to get the simulation ID's for the simulations that actually did run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be correct & simplified now
if constraints is not None: | ||
assert cached_constraints is not None | ||
constraints[control_idx, ...] = cached_constraints | ||
for control_idx, ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think caching is now always assumed to be on? We do have a flag for in in the configuration, maybe we should kill it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions and comments
39226fe
to
af05d6a
Compare
07bd18e
to
633a9b1
Compare
25e74a6
to
a0a6006
Compare
73303ab
to
757939d
Compare
757939d
to
fe83890
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really great job :)!! I just have a few minor comments! And thanks for the clear separate commits, helps a lot in the reviewing! PS: Some comments might be outdated since I reviewed per commit, sorry about that :")
@@ -360,6 +352,9 @@ def _init_domain_transforms( | |||
) | |||
) | |||
|
|||
print("sampling...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this was for debugging purpose?
} | ||
) | ||
|
||
EPS = float(np.finfo(np.float32).eps) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a silly question, just wondering, this will be on the order of 1e-7. When we compare controls, they are scaled controls right (roughly on ~(-1, 1))? Then I guess this is relatively safe to do. If not, do we ever have controls that have really small values and therefore this could be a problem?
@@ -1085,3 +1085,90 @@ def save_everest_realization_info( | |||
self._storage._write_transaction( | |||
self._path / "index.json", self._index.model_dump_json().encode("utf-8") | |||
) | |||
|
|||
@property | |||
def all_parameters_and_gen_data(self) -> pl.DataFrame | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering what is best practice: should we consider using Optional[pl.DataFrame]
instead of pl.DataFrame | None
(and other situations)?
for param_group in self.experiment.parameter_configuration: | ||
params_pd = self.load_parameters(param_group)["values"].to_pandas() | ||
|
||
assert isinstance(params_pd, pd.DataFrame) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing I was wondering, should we use assertions in these cases or is it better to raise TypeError
s?
**{c: pl.Float64 for c in param_df.columns if c != "realization"}, | ||
} | ||
) | ||
param_dfs += [param_df] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question, does it matter if we use .append()
or +=
(e.g., performance, readability)?
Replaces simulator cache with that of ERT storage by introducing a dataframe of simulation controls & results per experiment, which we can search for results.
Issue
Resolves #9751
Resolves #9674
Approach
Short description of the approach
(Screenshot of new behavior in GUI if applicable)
git rebase -i main --exec 'pytest tests/ert/unit_tests -n logical -m "not integration_test"'
)When applicable