Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ERT storage as simulator cache #9767

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

yngve-sk
Copy link
Contributor

@yngve-sk yngve-sk commented Jan 16, 2025

Replaces simulator cache with that of ERT storage by introducing a dataframe of simulation controls & results per experiment, which we can search for results.

Issue
Resolves #9751
Resolves #9674

Approach
Short description of the approach

(Screenshot of new behavior in GUI if applicable)

  • PR title captures the intent of the changes, and is fitting for release notes.
  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.
  • Make sure unit tests pass locally after every commit (git rebase -i main --exec 'pytest tests/ert/unit_tests -n logical -m "not integration_test"')

When applicable

  • When there are user facing changes: Updated documentation
  • New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • Large PR: Prepare changes in small commits for more convenient review
  • Bug fix: Add regression test for the bug
  • Bug fix: Create Backport PR to latest release

@yngve-sk yngve-sk self-assigned this Jan 16, 2025
Copy link

codspeed-hq bot commented Jan 16, 2025

CodSpeed Performance Report

Merging #9767 will not alter performance

Comparing yngve-sk:25.01.16.add-everest-realization-info-to-ert (fe83890) with main (f92f048)

Summary

✅ 25 untouched benchmarks

@yngve-sk yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch 8 times, most recently from 4e70ffd to be7341e Compare January 27, 2025 08:20
Copy link
Contributor

@verveerpj verveerpj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments but basically solid.

else None
),
)
for idx, real in enumerate(evaluator_context.realizations)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maps the indices of the control vectors to realizations/perturbations. Is that the intention? Because if the intention is to map simulation ID (= ert realization?) then this will fail if some control vectors are not evaluated due to being cached or being marked as inactive. You would need to do the same as I did in my recent PR to get the simulation ID's for the simulations that actually did run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be correct & simplified now

if constraints is not None:
assert cached_constraints is not None
constraints[control_idx, ...] = cached_constraints
for control_idx, (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think caching is now always assumed to be on? We do have a flag for in in the configuration, maybe we should kill it.

Copy link
Collaborator

@oyvindeide oyvindeide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions and comments

@yngve-sk yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch 3 times, most recently from 39226fe to af05d6a Compare February 12, 2025 11:54
@yngve-sk yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch 4 times, most recently from 07bd18e to 633a9b1 Compare February 25, 2025 11:47
@yngve-sk yngve-sk changed the title Store ert<->everest realization mapping Use ERT storage as simulator cache Feb 25, 2025
@yngve-sk yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch 2 times, most recently from 25e74a6 to a0a6006 Compare February 25, 2025 11:59
@yngve-sk yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch 2 times, most recently from 73303ab to 757939d Compare February 26, 2025 08:04
@yngve-sk yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch from 757939d to fe83890 Compare February 26, 2025 08:12
Copy link
Contributor

@StephanDeHoop StephanDeHoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great job :)!! I just have a few minor comments! And thanks for the clear separate commits, helps a lot in the reviewing! PS: Some comments might be outdated since I reviewed per commit, sorry about that :")

@@ -360,6 +352,9 @@ def _init_domain_transforms(
)
)

print("sampling...")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this was for debugging purpose?

}
)

EPS = float(np.finfo(np.float32).eps)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a silly question, just wondering, this will be on the order of 1e-7. When we compare controls, they are scaled controls right (roughly on ~(-1, 1))? Then I guess this is relatively safe to do. If not, do we ever have controls that have really small values and therefore this could be a problem?

@@ -1085,3 +1085,90 @@ def save_everest_realization_info(
self._storage._write_transaction(
self._path / "index.json", self._index.model_dump_json().encode("utf-8")
)

@property
def all_parameters_and_gen_data(self) -> pl.DataFrame | None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering what is best practice: should we consider using Optional[pl.DataFrame] instead of pl.DataFrame | None (and other situations)?

for param_group in self.experiment.parameter_configuration:
params_pd = self.load_parameters(param_group)["values"].to_pandas()

assert isinstance(params_pd, pd.DataFrame)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing I was wondering, should we use assertions in these cases or is it better to raise TypeErrors?

**{c: pl.Float64 for c in param_df.columns if c != "realization"},
}
)
param_dfs += [param_df]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question, does it matter if we use .append() or += (e.g., performance, readability)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Reviewed
Development

Successfully merging this pull request may close these issues.

Add Everest realization metadata to ERT storage Use ERT storage for Everest simulator cache
4 participants