Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Created a new unified flow module for RSS. #203

Merged
merged 74 commits into from
Nov 12, 2024
Merged

Conversation

YuanbinLiu
Copy link
Collaborator

@YuanbinLiu YuanbinLiu commented Nov 7, 2024

A new flow (named RssMaker) has been created to make it easy to set up and run RSS, which can be found in autoplex/auto/rss/flows.py.

@@ -1003,7 +1004,8 @@ def cur_select(

Notes
-----
This function calculates the descriptor vector for each atom, then performs CUR selection on the resulting vectors.
This function calculates the descriptor vector for each atom,
then performs CUR selection on the resulting vectors.

Adapted from:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use References as a header here ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have adapted some code from the reference. So, would using "adapted" be better?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this previous comment here why Adapted won't work :
#203 (comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I have changed accordingly.


with open(bc_file, "w") as f:
f.writelines(contents)

def _is_metal(self, element_symbol):
def _is_metal(self, element_symbol: str) -> bool:
Copy link
Collaborator

@naik-aakash naik-aakash Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't it agreed to use pymatgen species for this part in your previous PR to reduce code duplication? Did something change in from #114 PR and here that we explicitly need this ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the reminder. I have applied the pymatgen function in the new code.


Returns
-------
str | None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe keep here only str as returns

And add another header Raises, and under that just mention It will raise RuntimeError when optimization fails?

Copy link
Collaborator Author

@YuanbinLiu YuanbinLiu Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, here we should use str | None. Since we've set a maximum number of relaxation steps, for structures far from equilibrium, even if they haven't relaxed successfully within the max steps, we can still accept them and keep rss running. When sampling, we will discard all those marked as None. Note that we often relax 10,000 structures simultaneously, so discarding some outliers won't impact the results.

@naik-aakash
Copy link
Collaborator

Hi @YuanbinLiu , you can pull changes from the main branch and enable the jace_rss test now

@JaGeo
Copy link
Collaborator

JaGeo commented Nov 11, 2024

@YuanbinLiu you might have to pull the changes on main again. I had to create a new docker image.

@YuanbinLiu
Copy link
Collaborator Author

YuanbinLiu commented Nov 12, 2024

Dear @YuanbinLiu , I will give this pull-request another more in-depth review again, but for now, I just would like to suggest that you go through my comments of this PR here #114 again, because I am not sure if you already addressed al of them. Then another thing I would like you to reconsider is the structure of the unit test files as e.g. you added an "rss" subfolder which does not fit into the current code structure (auto, data, fitting, benchmark): image Additionally the rss folder contains test data (h2o.xyz), and the unit test file names also do not comply to the current code structure and file naming (e.g. test_auto_flows.py). I will give you more feedback soon.

The unit test files have been categorized and organized now.

but they are still in a separate "rss" folder image

Ah yes. Fixed now. Thank you!

@QuantumChemist
Copy link
Collaborator

*removed 😅

Copy link
Collaborator

@QuantumChemist QuantumChemist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @YuanbinLiu , I have some suggestions to improve the code. I will have a look at the unit test files next.

autoplex/data/common/flows.py Show resolved Hide resolved
autoplex/data/common/flows.py Show resolved Hide resolved
autoplex/auto/rss/flows.py Outdated Show resolved Hide resolved
autoplex/auto/rss/jobs.py Outdated Show resolved Hide resolved
autoplex/auto/rss/flows.py Outdated Show resolved Hide resolved
autoplex/data/common/utils.py Show resolved Hide resolved
autoplex/data/common/utils.py Show resolved Hide resolved
autoplex/data/common/utils.py Show resolved Hide resolved
autoplex/data/rss/utils.py Outdated Show resolved Hide resolved
autoplex/fitting/common/regularization.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@QuantumChemist QuantumChemist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some rather minor comments and suggestions for the test files


mock_vasp(ref_paths, fake_run_vasp_kwargs)

job1 = DFTStaticLabelling(isolated_atom=True,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more desciptive job name please

},
).make(structures=test_structures)

job2 = collect_dft_data(vasp_dirs=job1.output)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here as well please

from ase.io import read
from pathlib import Path
import shutil
job = RandomizedStructure(struct_number=3,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
job = RandomizedStructure(struct_number=3,
rs_job = RandomizedStructure(struct_number=3,

'SYMMOPS':'1-2'},
num_processes=4).make()

responses = run_locally(job, ensure_success=True, create_folders=True, store=memory_jobstore)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
responses = run_locally(job, ensure_success=True, create_folders=True, store=memory_jobstore)
responses = run_locally(rs_job, ensure_success=True, create_folders=True, store=memory_jobstore)

h2o.cell = np.ones(3)*20
write(f'{test_dir}/data/h2o.xyz', h2o)

job = RandomizedStructure(struct_number=4,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
job = RandomizedStructure(struct_number=4,
rs_job = RandomizedStructure(struct_number=4,

remove_tmp_files=True,
num_processes=4).make()

_ = run_locally(job, ensure_success=True, create_folders=True, store=memory_jobstore)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ = run_locally(job, ensure_success=True, create_folders=True, store=memory_jobstore)
run_locally(rs_job, ensure_success=True, create_folders=True, store=memory_jobstore)

num_processes=4).make()

responses = run_locally(job, ensure_success=True, create_folders=True, store=memory_jobstore)
assert len(read(job.output.resolve(memory_jobstore), index=":")) == 3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert len(read(job.output.resolve(memory_jobstore), index=":")) == 3
assert len(read(rs_job.output.resolve(memory_jobstore), index=":")) == 3

num_processes=4).make()

_ = run_locally(job, ensure_success=True, create_folders=True, store=memory_jobstore)
ats = read(job.output.resolve(memory_jobstore), index=":")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ats = read(job.output.resolve(memory_jobstore), index=":")
ats = read(rs_job.output.resolve(memory_jobstore), index=":")

bcur_params=bcur_params,
random_seed=None).make()

job = Flow(generate_structure, output=generate_structure.output)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same suggestion for more descriptive job names for all of the other job names as well

@@ -409,11 +411,10 @@ def test_mlip_fit_maker_with_automated_separated_dataset(
# Test if gap fit runs with pre_database_dir
gapfit = MLIPFitMaker().make(
species_list=["Li", "Cl"],
isolated_atoms_energy=[-0.28649227, -0.25638457],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gapfit takes the isolated atoms energy from the separate IsoAtom calculation entry in the xyz file directly.

@YuanbinLiu
Copy link
Collaborator Author

some rather minor comments and suggestions for the test files

The unit tests have been renamed.

@QuantumChemist
Copy link
Collaborator

some rather minor comments and suggestions for the test files

The unit tests have been renamed.

I think you didn't push it ?

@YuanbinLiu
Copy link
Collaborator Author

some rather minor comments and suggestions for the test files

The unit tests have been renamed.

I think you didn't push it ?

just pushed. I also fixed an issue with a unit test for j-ace.

@naik-aakash
Copy link
Collaborator

Hi @JaGeo , @QuantumChemist , please wait untill test durations file is updated

@JaGeo JaGeo merged commit 28c5e79 into autoatml:main Nov 12, 2024
15 checks passed
@QuantumChemist
Copy link
Collaborator

Hi @JaGeo , @QuantumChemist , please wait untill test durations file is updated

I don't plan to merge anything without @JaGeo explicitly asking me to do so 😶‍🌫️

@QuantumChemist
Copy link
Collaborator

QuantumChemist commented Nov 12, 2024

The only last general issue left now (that has to be addressed in a new PR) is the coverage for this particular file here

autoplex/data/rss/utils.py                    266    189    29%

@JaGeo
Copy link
Collaborator

JaGeo commented Nov 12, 2024

Yes, @QuantumChemist . Please see #223 😉

@QuantumChemist
Copy link
Collaborator

Yes, @QuantumChemist . Please see #223 😉

Okidoki 👍🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants