Created a new unified flow module for RSS. #203

YuanbinLiu · 2024-11-07T17:44:23Z

A new flow, RssMaker, has been created to make it easy to set up and run RSS. It can be found in autoplex/auto/rss/flows.py.

Update pyproject

naik-aakash · 2024-11-11T15:46:00Z

autoplex/data/common/utils.py

@@ -1003,7 +1004,8 @@ def cur_select(

    Notes
    -----
-    This function calculates the descriptor vector for each atom, then performs CUR selection on the resulting vectors.
+    This function calculates the descriptor vector for each atom,
+    then performs CUR selection on the resulting vectors.

    Adapted from:


Use "References" as a header here?

We have adapted some code from the reference. So, would using "adapted" be better?

See this previous comment on why "Adapted" won't work:
#203 (comment)

Ah, I see. I have changed accordingly.

naik-aakash · 2024-11-11T15:56:14Z

autoplex/data/rss/jobs.py


        with open(bc_file, "w") as f:
            f.writelines(contents)

-    def _is_metal(self, element_symbol):
+    def _is_metal(self, element_symbol: str) -> bool:


Wasn't it agreed to use pymatgen species for this part in your previous PR to reduce code duplication? Did something change between PR #114 and this one so that we explicitly need this?

Thank you for the reminder. I have applied the pymatgen function in the new code.

naik-aakash · 2024-11-11T16:01:42Z

autoplex/data/rss/utils.py

+
+    Returns
+    -------
+    str | None


Maybe keep only str as the return type here.

And add another header, Raises, and under it just mention that it will raise RuntimeError when optimization fails?

No, here we should use str | None. Since we've set a maximum number of relaxation steps, structures far from equilibrium may not relax successfully within that limit; we can still accept this and keep RSS running. When sampling, we discard all results marked as None. Note that we often relax 10,000 structures simultaneously, so discarding some outliers won't impact the results.
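A minimal sketch of the pattern described in this reply, not the actual autoplex code: a routine that returns None when the step limit is reached without convergence, and a caller that drops those entries. The names `relax_structure` and `collect_relaxed`, the toy "energy halving" update, and the default limits are all hypothetical and only illustrate the `str | None` contract.

```python
from __future__ import annotations


def relax_structure(
    start_energy: float, max_steps: int = 5, tol: float = 0.1
) -> str | None:
    """Toy stand-in for a relaxation routine (hypothetical, not autoplex code).

    Returns
    -------
    str | None
        A label for the relaxed structure, or None if the maximum number
        of steps is reached before convergence.
    """
    energy = start_energy
    for _ in range(max_steps):
        if abs(energy) < tol:
            # Converged within the step limit.
            return f"relaxed(E={energy:.3f})"
        # Assumption for illustration: each step halves the distance
        # to equilibrium.
        energy *= 0.5
    # Step limit reached: signal failure without raising, so that a
    # large batch keeps running.
    return None


def collect_relaxed(start_energies: list[float]) -> list[str]:
    """Relax every structure and discard the ones marked as None."""
    results = [relax_structure(e) for e in start_energies]
    return [r for r in results if r is not None]
```

Returning None instead of raising means a single far-from-equilibrium structure cannot abort a batch of thousands; the trade-off is that every caller has to filter the results before sampling, as `collect_relaxed` does here.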

naik-aakash · 2024-11-11T17:21:14Z

Hi @YuanbinLiu, you can pull the changes from the main branch and enable the jace_rss test now.

JaGeo · 2024-11-11T23:01:47Z

@YuanbinLiu you might have to pull the changes on main again. I had to create a new docker image.

YuanbinLiu · 2024-11-12T00:04:15Z

Dear @YuanbinLiu, I will give this pull request another, more in-depth review, but for now I would just like to suggest that you go through my comments on this PR here, #114, again, because I am not sure whether you have already addressed all of them. Another thing I would like you to reconsider is the structure of the unit test files: for example, you added an "rss" subfolder, which does not fit into the current code structure (auto, data, fitting, benchmark). Additionally, the rss folder contains test data (h2o.xyz), and the unit test file names also do not comply with the current code structure and file naming (e.g. test_auto_flows.py). I will give you more feedback soon.

The unit test files have been categorized and organized now.

but they are still in a separate "rss" folder

Ah yes. Fixed now. Thank you!

QuantumChemist · 2024-11-12T10:56:20Z

*removed 😅

QuantumChemist

Hey @YuanbinLiu, I have some suggestions to improve the code. I will have a look at the unit test files next.

autoplex/data/common/flows.py

autoplex/auto/rss/flows.py

autoplex/auto/rss/jobs.py

autoplex/auto/rss/flows.py

autoplex/data/common/utils.py

autoplex/data/rss/utils.py

autoplex/fitting/common/regularization.py

QuantumChemist

some rather minor comments and suggestions for the test files

QuantumChemist · 2024-11-12T13:17:09Z

tests/data/test_datagen_flows.py

+
+    mock_vasp(ref_paths, fake_run_vasp_kwargs)
+
+    job1 = DFTStaticLabelling(isolated_atom=True, 


More descriptive job name, please.

QuantumChemist · 2024-11-12T13:17:24Z

tests/data/test_datagen_flows.py

+                    },
+                    ).make(structures=test_structures)
+
+    job2 = collect_dft_data(vasp_dirs=job1.output)


here as well please

QuantumChemist · 2024-11-12T13:18:25Z