New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Use ERT storage as simulator cache #9767

Open

yngve-sk wants to merge 5 commits into equinor:main from yngve-sk:25.01.16.add-everest-realization-info-to-ert

Contributor

yngve-sk commented Jan 16, 2025 •

edited

Loading

Replaces simulator cache with that of ERT storage by introducing a dataframe of simulation controls & results per experiment, which we can search for results.

Issue
Resolves #9751
Resolves #9674

Approach
Short description of the approach

(Screenshot of new behavior in GUI if applicable)

PR title captures the intent of the changes, and is fitting for release notes.
Added appropriate release note label
Commit history is consistent and clean, in line with the contribution guidelines.
Make sure unit tests pass locally after every commit (git rebase -i main --exec 'pytest tests/ert/unit_tests -n logical -m "not integration_test"')

When applicable

When there are user facing changes: Updated documentation
New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
Large PR: Prepare changes in small commits for more convenient review
Bug fix: Add regression test for the bug
Bug fix: Create Backport PR to latest release

yngve-sk self-assigned this

codspeed-hq bot commented Jan 16, 2025 •

edited

Loading

CodSpeed Performance Report

Merging #9767 will not alter performance

_{Comparing yngve-sk:25.01.16.add-everest-realization-info-to-ert (fe83890) with main (f92f048)}

Summary

✅ 25 untouched benchmarks

yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch 8 times, most recently from 4e70ffd to be7341e Compare

January 27, 2025 08:20

verveerpj reviewed

View reviewed changes

src/ert/run_models/everest_run_model.py Outdated Show resolved Hide resolved

yngve-sk mentioned this pull request

Add dev documentation on everest vs ert data models #9820

Merged

9 tasks

yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch 4 times, most recently from d4d8a92 to 2202375 Compare

January 29, 2025 12:54

This was referenced Feb 3, 2025

Everest Storage: Load parameters by pointing to ERT storage #9936

Open

Everest Storage: Load objectives|constraints as responses from ERT storage #9937

Open

Everest Storage: Load batch_ aggregations from ERT storage #9938

Open

yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch from 2202375 to 3990831 Compare

February 3, 2025 07:55

verveerpj reviewed

View reviewed changes

Contributor

verveerpj left a comment

Some comments but basically solid.

src/ert/run_models/everest_run_model.py Outdated Show resolved Hide resolved

src/ert/storage/everest_ensemble.py Outdated Show resolved Hide resolved

src/ert/storage/everest_experiment.py Outdated Show resolved Hide resolved

src/ert/run_models/everest_run_model.py Outdated

+                                  else None
+                              ),
+                          )
+                          for idx, real in enumerate(evaluator_context.realizations)

Contributor

verveerpj Feb 7, 2025

This maps the indices of the control vectors to realizations/perturbations. Is that the intention? Because if the intention is to map simulation ID (= ert realization?) then this will fail if some control vectors are not evaluated due to being cached or being marked as inactive. You would need to do the same as I did in my recent PR to get the simulation ID's for the simulations that actually did run.

Contributor Author

yngve-sk Feb 25, 2025

Should be correct & simplified now

src/ert/run_models/everest_run_model.py Outdated Show resolved Hide resolved

src/ert/run_models/everest_run_model.py Outdated

-                              if constraints is not None:
-                                  assert cached_constraints is not None
-                                  constraints[control_idx, ...] = cached_constraints
+                      for control_idx, (

Contributor

verveerpj Feb 7, 2025

I think caching is now always assumed to be on? We do have a flag for in in the configuration, maybe we should kill it.

oyvindeide reviewed

View reviewed changes

Collaborator

oyvindeide left a comment

Some questions and comments

src/ert/run_models/everest_run_model.py Outdated Show resolved Hide resolved

src/ert/run_models/everest_run_model.py Outdated Show resolved Hide resolved

src/ert/storage/everest_ensemble.py Outdated Show resolved Hide resolved

src/ert/storage/local_storage.py Outdated Show resolved Hide resolved

yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch 3 times, most recently from 39226fe to af05d6a Compare

February 12, 2025 11:54

yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch 4 times, most recently from 07bd18e to 633a9b1 Compare

February 25, 2025 11:47

yngve-sk changed the title ~~Store ert<->everest realization mapping~~ Use ERT storage as simulator cache

yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch 2 times, most recently from 25e74a6 to a0a6006 Compare

February 25, 2025 11:59

yngve-sk added 3 commits

February 26, 2025 08:56


          Store everest realization info in ERT ensemble

fc2e57b


          Keep only one experiment per EverestRunModel instance

d31200a


          Use pl.Int32 as perturbation dtype to account for -1

e528bba

yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch 2 times, most recently from 73303ab to 757939d Compare

February 26, 2025 08:04

yngve-sk added 2 commits

February 26, 2025 09:11


          Use ERT storage as simulator cache

9a16c9f


          Add test for parameters and gen data consolidation

fe83890

yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch from 757939d to fe83890 Compare

February 26, 2025 08:12

StephanDeHoop approved these changes

View reviewed changes

Contributor

StephanDeHoop left a comment •

edited

Loading

Really great job :)!! I just have a few minor comments! And thanks for the clear separate commits, helps a lot in the reviewing! PS: Some comments might be outdated since I reviewed per commit, sorry about that :")

src/ert/run_models/everest_run_model.py Outdated

@@ @@ -360,6 +352,9 @@ def _init_domain_transforms( @@
                               )
                           )
+                          print("sampling...")

Contributor

StephanDeHoop Feb 26, 2025

I guess this was for debugging purpose?

src/ert/run_models/everest_run_model.py

+                              }
+                          )
+                          EPS = float(np.finfo(np.float32).eps)

Contributor

StephanDeHoop Feb 26, 2025

Maybe a silly question, just wondering, this will be on the order of 1e-7. When we compare controls, they are scaled controls right (roughly on ~(-1, 1))? Then I guess this is relatively safe to do. If not, do we ever have controls that have really small values and therefore this could be a problem?

src/ert/storage/local_ensemble.py

@@ @@ -1085,3 +1085,90 @@ def save_everest_realization_info( @@
                       self._storage._write_transaction(
                           self._path / "index.json", self._index.model_dump_json().encode("utf-8")
                       )
+                  @property
+                  def all_parameters_and_gen_data(self) -> pl.DataFrame | None:

Contributor

StephanDeHoop Feb 26, 2025

Just wondering what is best practice: should we consider using Optional[pl.DataFrame] instead of pl.DataFrame | None (and other situations)?

src/ert/storage/local_ensemble.py

+                      for param_group in self.experiment.parameter_configuration:
+                          params_pd = self.load_parameters(param_group)["values"].to_pandas()
+                          assert isinstance(params_pd, pd.DataFrame)

Contributor

StephanDeHoop Feb 26, 2025

Another thing I was wondering, should we use assertions in these cases or is it better to raise TypeErrors?

src/ert/storage/local_ensemble.py

+                                  **{c: pl.Float64 for c in param_df.columns if c != "realization"},
+                              }
+                          )
+                          param_dfs += [param_df]

Contributor

StephanDeHoop Feb 26, 2025

Just a question, does it matter if we use .append() or += (e.g., performance, readability)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet