[WIP] Integration for CP2K #383

Open
wants to merge 55 commits into
base: main
c856382
Fixed import.
nwinner Apr 22, 2019
8ff38d1
A lot of clean-up. Had not been addressed in a while by the devs it l…
nwinner Apr 25, 2019
87ea810
Added lammps_input_set so that atom_style can be retained as a variable
nwinner Apr 25, 2019
29d7e94
Playing with the NEB workflow. Might try to figure out a way to
nwinner May 7, 2019
6c845e6
Currently thinking it would be best to use env_chk with vasp_neb_cmd …
nwinner May 7, 2019
02776ff
fix
nwinner May 8, 2019
68e5465
fix
nwinner May 8, 2019
2600dca
Minor correction to glue_tasks. If reading in the WAVECAR (which coul…
nwinner May 16, 2019
f294cf3
Lammps should support env_chk
nwinner Jun 5, 2019
2665de6
Lammps should support env_chk
nwinner Jun 5, 2019
c2a17fa
Exploring a new firetask.
nwinner Jun 5, 2019
6c3872d
Debug.
nwinner Jun 5, 2019
5108d19
Lammps should support env_chk
nwinner Jun 6, 2019
21b9fb2
Trying something different. Adding a separate file for my user task.
nwinner Jun 6, 2019
17a466a
Added a brief function to transmute after relaxing.
nwinner Jun 6, 2019
9b1cd17
Working on this new task.
nwinner Jun 6, 2019
914b9cd
Working on Lammps run_calc
nwinner Jun 6, 2019
7005625
Debug
nwinner Jun 6, 2019
1d2e29e
Lammps Run
nwinner Jun 6, 2019
e3d670f
Working on it.
nwinner Jun 6, 2019
748e0c9
debug
nwinner Jun 6, 2019
9bc9563
Lammps should support env_chk
nwinner Jun 6, 2019
e46e77a
Lammps should support env_chk
nwinner Jun 6, 2019
31830ba
Debug
nwinner Jun 6, 2019
39d04a7
debug
nwinner Jun 6, 2019
33e6987
debug
nwinner Jun 6, 2019
e298e49
debug
nwinner Jun 6, 2019
d71f00c
debug
nwinner Jun 6, 2019
f76d774
Debug.
nwinner Jun 12, 2019
b087ea2
First design of cp2k module for atomate.
nwinner Feb 10, 2020
6262805
First commit for the cp2k module in atomate. Pretty rough, but to start
nwinner Feb 23, 2020
d22d72d
Continuing testing and refinement. Added database and drone functions…
nwinner Mar 10, 2020
b137ee9
File copying is a little messy right now. The change of file names ac…
nwinner Mar 12, 2020
9168197
Wavefunction files are bytes files. File copying needs to handle that.
nwinner Mar 12, 2020
ab1bd74
File copying is now to the point where things will at least run
nwinner Mar 13, 2020
7051dd7
Merge with master
nwinner Mar 15, 2020
10f846e
Merge https://github.com/hackingmaterials/atomate into cp2k
nwinner Mar 15, 2020
229a187
user_tasks.py not needed, now in MPMorph
nwinner Mar 15, 2020
808cb60
Database was missing kwargs, led to errors.
nwinner Mar 18, 2020
58f2152
Cleanup.
nwinner Mar 21, 2020
361cad5
Beginning to integrate the ability to do defect workflows by building
nwinner Mar 31, 2020
c0e7acd
Debugging last commit on the cluster.
nwinner Apr 1, 2020
226eabe
Minor refinements
nwinner Apr 19, 2020
0eaded2
Minor refinements
nwinner Apr 20, 2020
b8867c5
Typo.
nwinner Apr 20, 2020
2dd859a
Testing from very simple cell_opt wfs on the cluster.
nwinner May 12, 2020
b801820
Database manipulation for CalcDb task
nwinner May 18, 2020
0bac0d9
Missing import statement.
nwinner May 22, 2020
f0cc95a
Drones: fft can only apply to array, not array of arrays
nwinner May 26, 2020
6edb5db
Incorrect hartree file parsing. Works now, but might change again later.
nwinner Jun 9, 2020
c67165e
Bug.
nwinner Jun 15, 2020
6d9c4e1
Corrections to the drone assimilation.
nwinner Jul 1, 2020
2b3b579
Written incorrectly. It was working because the underlying sets were not
nwinner Jul 29, 2020
60c15a9
Fixing write inputs.py.
nwinner Jul 29, 2020
4201338
Quick commit. Saving state.
nwinner Aug 9, 2020
Empty file added atomate/cp2k/__init__.py
Empty file.
281 changes: 281 additions & 0 deletions atomate/cp2k/database.py
@@ -0,0 +1,281 @@
# coding: utf-8

"""
This module defines the database classes.
"""

from monty.json import MontyEncoder
from monty.serialization import loadfn

import zlib
import json
from bson import ObjectId

from pymatgen.electronic_structure.bandstructure import (
    BandStructure,
    BandStructureSymmLine,
)
from pymatgen.electronic_structure.dos import CompleteDos

import gridfs
from pymongo import ASCENDING, DESCENDING

from atomate.utils.database import CalcDb
from atomate.utils.utils import get_logger

__author__ = "Nicholas Winner"

logger = get_logger(__name__)


class Cp2kCalcDb(CalcDb):
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this different to VASPCalcDb at all? If not, we should probably rename VASPCalcDb something more general and then it can be used in the CP2K module also.

Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not different at all, I think. A more general VASPCalcDb would be a good idea.

    """
    Class to help manage database insertions of cp2k drones
    """

    def __init__(
        self,
        host="localhost",
        port=27017,
        database="cp2k",
        collection="tasks",
        user=None,
        password=None,
        **kwargs
    ):
        super(Cp2kCalcDb, self).__init__(
            host, port, database, collection, user, password, **kwargs
        )

    def build_indexes(self, indexes=None, background=True):
        """
        Build the indexes.

        Args:
            indexes (list): list of single field indexes to be built.
            background (bool): Run in the background or not.

        TODO: make sure that the index building is sensible and check for
            existing indexes.
        """
        _indices = (
            indexes
            if indexes
            else [
                "formula_pretty",
                "formula_anonymous",
                "output.energy",
                "output.energy_per_atom",
                "dir_name",
            ]
        )
        self.collection.create_index(
            "task_id", unique=True, background=background
        )
        # build single field indexes
        for i in _indices:
            self.collection.create_index(i, background=background)
        # build compound indexes
        for formula in ("formula_pretty", "formula_anonymous"):
            self.collection.create_index(
                [
                    (formula, ASCENDING),
                    ("output.energy", DESCENDING),
                    ("completed_at", DESCENDING),
                ],
                background=background,
            )
            self.collection.create_index(
                [
                    (formula, ASCENDING),
                    ("output.energy_per_atom", DESCENDING),
                    ("completed_at", DESCENDING),
                ],
                background=background,
            )

    def insert_task(self, task_doc, use_gridfs=False):
        """
        Inserts a task document (e.g., as returned by Drone.assimilate()) into the database.
        If use_gridfs is True, the DOS of the last calculation (calcs_reversed[0])
        is moved out of the task document and stored in GridFS.

        Args:
            task_doc (dict): the task document
            use_gridfs (bool): store the DOS in GridFS
        Returns:
            (int) - task_id of inserted document
        """
        dos = None

        # move the DOS from the task document to GridFS
        if use_gridfs and "calcs_reversed" in task_doc:

            if (
                "dos" in task_doc["calcs_reversed"][0]
            ):  # only store idx=0 (last step)
                dos = json.dumps(
                    task_doc["calcs_reversed"][0]["dos"], cls=MontyEncoder
                )
                del task_doc["calcs_reversed"][0]["dos"]

        # insert the task document
        t_id = self.insert(task_doc)

        # insert the dos into gridfs and update the task document
        if dos:
            dos_gfs_id, compression_type = self.insert_gridfs(
                dos, "dos_fs", task_id=t_id
            )
            self.collection.update_one(
                {"task_id": t_id},
                {
                    "$set": {
                        "calcs_reversed.0.dos_compression": compression_type
                    }
                },
            )
            self.collection.update_one(
                {"task_id": t_id},
                {"$set": {"calcs_reversed.0.dos_fs_id": dos_gfs_id}},
            )

        return t_id

    def retrieve_task(self, task_id):
        """
        Retrieves a task document and unpacks the DOS as a dict

        Args:
            task_id: (int) task_id to retrieve

        Returns:
            (dict) complete task document with the DOS included

        """
        task_doc = self.collection.find_one({"task_id": task_id})
        calc = task_doc["calcs_reversed"][0]
        if "dos_fs_id" in calc:
            dos = self.get_dos(task_id)
            calc["dos"] = dos.as_dict()
        return task_doc

    def insert_gridfs(
        self, d, collection="fs", compress=True, oid=None, task_id=None
    ):
        """
        Insert the given document into GridFS.

        Args:
            d (str): the document, already serialized to a JSON string
            collection (string): the GridFS collection name
            compress (bool): Whether to compress the data or not
            oid (ObjectId()): the _id of the file; if specified, it must not already exist in GridFS
            task_id (int or str): the task_id to store into the gridfs metadata
        Returns:
            file id, the type of compression used.
        """
        oid = oid or ObjectId()
        compression_type = None

        if compress:
            # `compress` doubles as the zlib level here, so True means level 1
            d = zlib.compress(d.encode(), compress)
            compression_type = "zlib"

        fs = gridfs.GridFS(self.db, collection)
        if task_id:
            # Putting task id in the metadata subdocument as per mongo specs:
            # https://github.com/mongodb/specifications/blob/master/source/gridfs/gridfs-spec.rst#terms
            fs_id = fs.put(
                d,
                _id=oid,
                metadata={"task_id": task_id, "compression": compression_type},
            )
        else:
            fs_id = fs.put(
                d, _id=oid, metadata={"compression": compression_type}
            )

        return fs_id, compression_type

    def get_band_structure(self, task_id):
        m_task = self.collection.find_one(
            {"task_id": task_id}, {"calcs_reversed": 1}
        )
        fs_id = m_task["calcs_reversed"][0]["bandstructure_fs_id"]
        fs = gridfs.GridFS(self.db, "bandstructure_fs")
        bs_json = zlib.decompress(fs.get(fs_id).read())
        bs_dict = json.loads(bs_json.decode())
        if bs_dict["@class"] == "BandStructure":
            return BandStructure.from_dict(bs_dict)
        elif bs_dict["@class"] == "BandStructureSymmLine":
            return BandStructureSymmLine.from_dict(bs_dict)
        else:
            raise ValueError(
                "Unknown class for band structure! {}".format(bs_dict["@class"])
            )

    def get_dos(self, task_id):
        m_task = self.collection.find_one(
            {"task_id": task_id}, {"calcs_reversed": 1}
        )
        fs_id = m_task["calcs_reversed"][0]["dos_fs_id"]
        fs = gridfs.GridFS(self.db, "dos_fs")
        dos_json = zlib.decompress(fs.get(fs_id).read())
        dos_dict = json.loads(dos_json.decode())
        return CompleteDos.from_dict(dos_dict)

    def reset(self):
        self.collection.delete_many({})
        self.db.counter.delete_one({"_id": "taskid"})
        self.db.counter.insert_one({"_id": "taskid", "c": 0})
        self.db.dos_fs.files.delete_many({})
        self.db.dos_fs.chunks.delete_many({})
        self.build_indexes()

    # TODO: This should become part of CalcDb; VASP/CP2K-specific Db methods don't make sense anyway
    @classmethod
    def from_db_file(cls, db_file, admin=True, user_settings=None):
        """
        Create a Cp2kCalcDb from a database file. The file requires host, port,
        database, collection, and optionally admin_user/readonly_user and
        admin_password/readonly_password

        Args:
            db_file (str): path to the file containing the credentials
            admin (bool): whether to use the admin user
            user_settings (dict): User settings to overwrite those in the db file.
                Example: db_file is used to acquire all credentials, but
                {'collection': 'test'} is used to overwrite the default DB insertion
                collection to something else.

        Returns:
            Cp2kCalcDb object
        """
        creds = loadfn(db_file)
        if user_settings:
            creds.update(user_settings)

        if admin and "admin_user" not in creds and "readonly_user" in creds:
            raise ValueError("Trying to use admin credentials, "
                             "but no admin credentials are defined. "
                             "Use admin=False if only read_only "
                             "credentials are available.")

        if admin:
            user = creds.get("admin_user")
            password = creds.get("admin_password")
        else:
            user = creds.get("readonly_user")
            password = creds.get("readonly_password")

        kwargs = creds.get("mongoclient_kwargs", {})  # any other MongoClient kwargs can go here ...

        if "authsource" in creds:
            kwargs["authsource"] = creds["authsource"]
        else:
            kwargs["authsource"] = creds["database"]

        return cls(creds["host"], int(creds.get("port", 27017)), creds["database"], creds["collection"],
                   user, password, **kwargs)
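
For readers following the credential handling in `from_db_file`, the sketch below replays the same resolution steps on a plain dictionary, without touching MongoDB. The helper name `resolve_connection_args` and the example credential values are hypothetical and not part of this PR. The key names (`admin_user`, `readonly_user`, `mongoclient_kwargs`, `authsource`) are taken from the diff above.

```python
# Hypothetical helper (not part of the PR): mirrors the credential
# resolution in from_db_file, but returns the arguments instead of
# opening a database connection.
def resolve_connection_args(creds, admin=True, user_settings=None):
    creds = dict(creds)  # copy so the caller's dict is left untouched
    if user_settings:
        creds.update(user_settings)  # user settings override the file

    if admin and "admin_user" not in creds and "readonly_user" in creds:
        raise ValueError(
            "Trying to use admin credentials, "
            "but no admin credentials are defined."
        )

    prefix = "admin" if admin else "readonly"
    user = creds.get(prefix + "_user")
    password = creds.get(prefix + "_password")

    # Extra MongoClient keyword arguments; authsource falls back to the database name.
    kwargs = dict(creds.get("mongoclient_kwargs", {}))
    kwargs["authsource"] = creds.get("authsource", creds["database"])

    args = (
        creds["host"],
        int(creds.get("port", 27017)),
        creds["database"],
        creds["collection"],
        user,
        password,
    )
    return args, kwargs


# Made-up credentials, shaped like the contents of a db file.
example_creds = {
    "host": "localhost",
    "port": "27017",
    "database": "cp2k",
    "collection": "tasks",
    "admin_user": "admin",
    "admin_password": "secret",
    "readonly_user": "reader",
    "readonly_password": "readonly-secret",
}

args, kwargs = resolve_connection_args(
    example_creds, admin=False, user_settings={"collection": "test"}
)
print(args)
print(kwargs)
```

With `admin=False`, the read-only pair is selected, and the `{"collection": "test"}` override replaces the collection from the file, as the docstring describes. Because no `authsource` key is given, the database name is used for authentication.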