Addition of a SEARCH operator #533
Personally, I do not see many use cases for a search operator for which every implementation is free to choose the search method. When replying in #398 (comment) I was thinking more about When speaking about SMILES and SMARTS, I think users need to know what the underlying implementation does. As the PR is written now, an implementation A is free to choose to implement
Yes, this kicks the can down the road a bit... But I was imagining the cheminfo namespace would define the search semantics on the relevant fields/types, if we don't get to the point of standardizing it in the core of OPTIMADE. In that sense, this PR basically just reserves the keyword in the filter language so that we can drill down on any more specific semantics we need to define.
It is RECOMMENDED that providers do not allow chaining together multiple :filter-fragment:`SEARCH` operations, but MAY allow e.g., a list value for the :filter-fragment:`SEARCH` which can be considered the equivalent :filter-fragment:`<property> SEARCH [x, y] === <property> SEARCH x AND property SEARCH y`

**Examples**:
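The list-value equivalence in the draft text can be illustrated with a small sketch. This is not taken from the PR: the helper name ``expand_list_search`` is hypothetical, and it assumes string values that are quoted and backslash-escaped in the usual filter-string style.

```python
def expand_list_search(prop, values):
    """Rewrite `<prop> SEARCH [x, y]` as `<prop> SEARCH x AND <prop> SEARCH y`.

    Hypothetical helper: assumes every value is a string.
    """
    if not values:
        raise ValueError("SEARCH needs at least one value")

    def quote(value):
        # Escape backslashes first, then double quotes, and wrap in quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    return " AND ".join(f"{prop} SEARCH {quote(v)}" for v in values)


print(expand_list_search("chemical_formula_reduced", ["H2O", "O2"]))
```

A provider that accepts list values could apply such a rewrite before evaluating the filter, so that list support does not need a separate code path.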
This operator can act on any field and value type, and can be interpreted by the database provider in any way they desire.
The cutoff for 'relevance' can be entirely decided by the database; it is not necessary to rank and return all entries in the database according to the search criteria.
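One possible reading of the relevance cutoff, as a minimal sketch: the database scores each entry and returns only those above a threshold it chooses itself. The function name ``search``, the default ``cutoff`` of 0.5, and the use of ``difflib.SequenceMatcher`` as a stand-in similarity measure are all assumptions for illustration; the draft deliberately leaves the scoring method open.

```python
from difflib import SequenceMatcher


def search(entries, prop, query, cutoff=0.5):
    """Return entries whose `prop` value is similar enough to `query`.

    The cutoff is chosen by the database; entries below it are dropped
    rather than ranked and returned.
    """
    scored = []
    for entry in entries:
        # Stand-in similarity in [0, 1]; a real database would use its own.
        score = SequenceMatcher(None, query, entry[prop]).ratio()
        if score >= cutoff:
            scored.append((score, entry))
    # Most relevant entries first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]


entries = [
    {"id": "1", "name": "H2O"},
    {"id": "2", "name": "H2O2"},
    {"id": "3", "name": "NaCl"},
]
print([e["id"] for e in search(entries, "name", "H2O")])
```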
Where implemented, it MAY be used in conjunction with other filters; in cases where this is not supported, the API should respond with a clear error message.
I think having a reserved operator for type-specific queries is a nice idea. Having such an operator would shift the responsibility of query standardization from the main specification to type-governing namespaces. However, I feel that the current draft should be rewritten to convey this precise meaning. I like the idea of introducing the

Edit: I noticed now you have been talking about property-specific search semantics while I was talking about type-specific. Yours (property-specific) would allow greater granularity, which I like. However, my concern about the need for standardization (in namespaces) still stands.
Maybe I'm missing a bigger picture behind suggesting this feature, but I prefer the related feature that has been suggested previously and I think overlaps with this idea: custom data types that are allowed to provide their own definitions of all operators included in the OPTIMADE grammar. So, for example for SMILES, if not already standardized, a SMILES data type could be defined by an implementation or prefix organization with some chosen filter semantics. But then, what operator should one define for searching, e.g., SMILES? I think the answer is the not yet implemented/merged string regex operator "MATCH" (or possibly "MATCHES"), which IMO fits the current grammatical structure of the OPTIMADE filter language better than "SEARCH". The danger with defining a "SEARCH" operator as "implementation-specific search" is that it will easily lead to confusion about things working differently without it being clear why.
I'm going to close this to avoid polluting the discussions. I think, in summary, the preference was to overload our current filter operations (and allow them to break semantics on custom data types). The only bit that this leaves out is the kind of search that returns a ranking of matches, rather than a simple enumeration of boolean matches.
Following discussion with @merkys and others in the thread at #398 (comment), I thought I'd attempt to draft something for a loosely defined SEARCH operator. I think this would be pretty useful for quite a few applications, where a database wants to enable queries that may not fall directly under the strict semantics we expect for e.g., string matching or arithmetic. I've tried to keep it necessarily vague here, but motivate it via examples.
Outstanding issues:
- ENDS WITH -- you just try the query and wait for the error. It will be much harder to discover if a database supports SEARCH on a field, or how it is interpreted. Do we need to a) add a specific info metadata field for this, along the lines of sortable for the moment? searchable? or b) should we enforce that the database must describe in full the search semantics on a field at its given info endpoint? This is a bit tricky as this implementation definition would overlap with the field definition itself, especially in cases where a database wants to enable search on an already standardized OPTIMADE field (like the chemical_formula_reduced example in this draft).
- search_score indicating the amount to which an entry fits the SEARCH (writing this with compositional/structural/substructural similarity in mind) -- should we define a reserved keyword for this in the new entry-level metadata?
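To make the two open questions more concrete, here is a rough sketch of what option a) and a per-entry score could look like on the server side. Nothing here is agreed: the ``searchable`` flag (placed next to ``sortable``), the ``search_score`` key in the entry-level ``meta``, and the helper names are all hypothetical.

```python
# Hypothetical per-property info, with `searchable` modelled on `sortable`.
PROPERTY_INFO = {
    "chemical_formula_reduced": {"sortable": True, "searchable": True},
    "nelements": {"sortable": True, "searchable": False},
}


def check_searchable(prop):
    """Raise a clear error if SEARCH is not advertised for `prop`."""
    info = PROPERTY_INFO.get(prop)
    if info is None or not info.get("searchable", False):
        raise ValueError(f"SEARCH is not supported on property '{prop}'")


def attach_score(entry, score):
    """Return a copy of `entry` with `search_score` in its entry-level meta."""
    meta = dict(entry.get("meta", {}))
    meta["search_score"] = score
    return {**entry, "meta": meta}


check_searchable("chemical_formula_reduced")  # passes silently
print(attach_score({"id": "1"}, 0.9))
```

With a flag like this, a client could read the info endpoint to find out where SEARCH is available instead of probing with a query, although the flag alone still says nothing about how the search is interpreted.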