Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InChIKey property #1

Merged
merged 4 commits into from
Jan 16, 2024
Merged

InChIKey property #1

merged 4 commits into from
Jan 16, 2024

Conversation

merkys
Copy link
Member

@merkys merkys commented Jan 15, 2024

This PR introduces the InChIKey property. I tried to keep the definition as close to Materials-Consortia/OPTIMADE#466 as possible.

@merkys merkys requested review from rartino, vaitkus and ml-evs January 15, 2024 12:23
@rartino
Copy link
Contributor

rartino commented Jan 15, 2024

@ml-evs @merkys I guess this is a first point to discuss regarding these namespace providers: should now the field name, filenames, etc. be prefixed? I.e. _cheminformatics _inchikey? I would think so (although, cheminformatics is a bit lengthy for a prefix...)

@ml-evs
Copy link
Member

ml-evs commented Jan 15, 2024

Hmm, I was imagining that the fields are unprefixed in the definitions and then a global prefix is applied at the schema build level (_cheminfo?)

@merkys
Copy link
Member Author

merkys commented Jan 15, 2024

Good point. There are advantages/disadvantages of having both explicit and implicit (as @ml-evs assumed) prefixes, but I lean to the side of excluding them.

@rartino
Copy link
Contributor

rartino commented Jan 15, 2024

Hmm, I was imagining that the fields are unprefixed in the definitions and then a global prefix is applied at the schema build level (_cheminfo?)

Do you mean that the process_schemas tool should insert these during build time everywhere they belong? That isn't something the existing tool can do. But we can of course amend it to do that if it is important. At least, in the end we want JSON files with prefixes to be able to use them as JSON schemas.

What are the benefits of implicit? (I'm thinking in the spirit of" Zen of Python": 'explicit is better than implicit', if all other things are equal?)

@merkys
Copy link
Member Author

merkys commented Jan 15, 2024

What are the benefits of implicit? (I'm thinking in the spirit of" Zen of Python": 'explicit is better than implicit', if all other things are equal?)

I agree with "explicit is better than implicit", but repetition sometimes might lead to mistakes. My main concerns for prefixes are typos and accidental omissions. If we stick to explicit prefixes then some sort of checker would be beneficial to ensure that the prefix is present and correct.

@rartino
Copy link
Contributor

rartino commented Jan 15, 2024

My main concerns for prefixes are typos and accidental omissions. If we stick to explicit prefixes then some sort of checker would be beneficial to ensure that the prefix is present and correct.

I think most of the errors that are possible would have to be repeated in multiple places, e.g., both filenames and declarations, to pass the current sanity checking.

Implicit has the cost of a increasing the learning curve to understand when, to what, and how the implicit changes apply. Right now (despite the 1500 lines of processing script...) the modifications we do to the source files are fairly lightweight and somewhat well encapsulated with '$$' field names. I'm not quite sure where to start with somewhat generic code for mapping e.g. $id fields to inject a prefix that isn't there. Especially since some fields (e.g., compatibility, defining-relation and even inside the free-text of description) may contain both internal and external $id:s.

@merkys
Copy link
Member Author

merkys commented Jan 15, 2024

Implicit has the cost of a increasing the learning curve to understand when, to what, and how the implicit changes apply. Right now (despite the 1500 lines of processing script...) the modifications we do to the source files are fairly lightweight and somewhat well encapsulated with '$$' field names. I'm not quite sure where to start with somewhat generic code for mapping e.g. $id fields to inject a prefix that isn't there. Especially since some fields (e.g., compatibility, defining-relation and even inside the free-text of description) may contain both internal and external $id:s.

This all makes sense to me. Nevertheless I would add a validation step if we want to restrict properties to a namespace in question.

As for this PR, I should add _cheminformatics_ prefix to the property I have introduced, right?

@rartino
Copy link
Contributor

rartino commented Jan 15, 2024

This all makes sense to me. Nevertheless I would add a validation step if we want to restrict properties to a namespace in question.

This is more clear how it would work in the existing code - process_schemas can take a command line argument to restrict the namespace, and the sanity check that is done as part of the validation can check that everywhere it is aware it should check it.

As for this PR, I should add cheminformatics_ prefix to the property I have introduced, right?

I read @ml-evs comment as suggesting _cheminfo_ as the prefix, which I think is better.

@merkys
Copy link
Member Author

merkys commented Jan 15, 2024

I read @ml-evs comment as suggesting _cheminfo_ as the prefix, which I think is better.

Agree. Should the prefix be defined somewhere, i.e. in cheminformatics.yaml?

@rartino
Copy link
Contributor

rartino commented Jan 15, 2024

Agree. Should the prefix be defined somewhere, i.e. in cheminformatics.yaml?

No standard field exist for it yet. But I think you are right that it should.

But you can at least put it in the description field of the standard.

@rartino
Copy link
Contributor

rartino commented Jan 16, 2024

Looks good to me - test compile works and output looks reasonable.

@rartino rartino merged commit 7af8ea9 into main Jan 16, 2024
@merkys
Copy link
Member Author

merkys commented Jan 16, 2024

Thanks for review & merge!

@ml-evs
Copy link
Member

ml-evs commented Jan 16, 2024

This all makes sense to me. Nevertheless I would add a validation step if we want to restrict properties to a namespace in question.

This was basically my motivation too -- eventually we could go a step further and separate our existing OPTIMADE fields (just structures?) as a namespace (with some mechanism for implicit namespaces in the API, e.g., if an API is part of the OPTIMADE federation, assume the implicit OPTIMADE namespace for structures...)

Nice work here though!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants