While revamping the post-processing, we classified properties (and operations) as per_atom and per_molecule. At the time, we discussed that we should probably adopt the same terminology within the hdf5 datasets, rather than the prior scheme we developed (series_mol, series_atom, single_mol, single_atom, single_rec).
The original scheme was meant to provide a lot of flexibility and reduce file size for properties which do not depend on the conformation (e.g., atomic number, total_charge, etc.). The labels we defined basically tell us what the dimensions in the various arrays mean and how to handle them. I think at this point we can settle on essentially 3 categories: "per_atom", "per_molecule", and everything else (which doesn't really need a label). We can simply enforce that, for all data, dim=0 has length n_configs, so we only need to specify whether a property is per_molecule or per_atom. This of course is slightly less data efficient for something like total_charge, but at this point, I can't think of too many other properties where there would not be a conformer dependence (and changing from a single value to an array of values in the spice datasets resulted in a negligible change in file size).
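To make the proposed convention concrete, here is a minimal sketch of a shape check. The function name `validate_shape`, its arguments, and the assumption that dim=1 of a per_atom array has length n_atoms are all hypothetical illustrations, not existing code in the project.

```python
from typing import Tuple

# The two labeled categories proposed above; anything else carries no label.
CATEGORIES = ("per_atom", "per_molecule")


def validate_shape(
    category: str, shape: Tuple[int, ...], n_configs: int, n_atoms: int
) -> None:
    """Raise ValueError if `shape` does not follow the proposed convention.

    Assumed convention (hypothetical):
      per_molecule -> dim 0 has length n_configs
      per_atom     -> dim 0 has length n_configs, dim 1 has length n_atoms
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if len(shape) < 1 or shape[0] != n_configs:
        raise ValueError(
            f"dim 0 must have length n_configs={n_configs}, got shape {shape}"
        )
    if category == "per_atom" and (len(shape) < 2 or shape[1] != n_atoms):
        raise ValueError(
            f"dim 1 must have length n_atoms={n_atoms}, got shape {shape}"
        )


# Example: 5 conformers of a 3-atom molecule.
validate_shape("per_molecule", (5, 1), n_configs=5, n_atoms=3)  # e.g. total_charge
validate_shape("per_atom", (5, 3, 3), n_configs=5, n_atoms=3)   # e.g. positions
```

With a single rule for dim=0, the check no longer needs a branch for each of the five older labels.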
By defining per_atom and per_molecule, we could define our own types that would potentially make it clearer what dimensions a given tensor in the code should have (at least in the first 2 dimensions).
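One possible way to express such types is sketched below. The class names `PerMolecule` and `PerAtom` and their fields are hypothetical, and for simplicity they describe only the shape of a property rather than wrapping an actual tensor.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PerMolecule:
    """Hypothetical descriptor: dim 0 is n_configs."""

    name: str
    shape: Tuple[int, ...]

    def __post_init__(self) -> None:
        if len(self.shape) < 1:
            raise ValueError(f"{self.name}: need at least 1 dimension")

    @property
    def n_configs(self) -> int:
        return self.shape[0]


@dataclass(frozen=True)
class PerAtom:
    """Hypothetical descriptor: dim 0 is n_configs, dim 1 is n_atoms."""

    name: str
    shape: Tuple[int, ...]

    def __post_init__(self) -> None:
        if len(self.shape) < 2:
            raise ValueError(f"{self.name}: need at least 2 dimensions")

    @property
    def n_configs(self) -> int:
        return self.shape[0]

    @property
    def n_atoms(self) -> int:
        return self.shape[1]


# Example: 5 conformers of a 3-atom molecule.
charge = PerMolecule("total_charge", (5, 1))
positions = PerAtom("positions", (5, 3, 3))
```

A function annotated to accept `PerAtom` then documents, in its signature, what the first two dimensions of its input mean.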
This should simplify the hdf5 loader and make it easier to validate the inputs as well.