Incorrect pagination in SPARQL CONSTRUCT dataset #413
Comments
Indeed, isn't the problem here how to implement counting for a construct(ed) result? What exactly should be counted?
Well, the only count you can do is by executing the construct, and then implementing paging based on that result. So far, the only solution is to rely on the finiteness of the response of the construct (that is, that users do not request stupid things). Pagination can only be implemented by collecting the complete response in a temporary structure.
@bertvannuffelen how would you handle paging in a temporary structure? Am I correct in assuming that we cannot rely on a SPARQL endpoint returning the same order of triples for a construct query in consecutive calls? If that's the case, are you suggesting caching the result for a query, performing in-memory paging and returning the result, and, when a different page of the query is requested, getting the cached object, paging it in memory and returning it to the client instead of performing the SPARQL query again?
SPARQL construct queries always return the complete answer. However, it is up to the SPARQL endpoint implementation to handle any need for pagination, and that is where the problem sits: most endpoints do not support pagination for construct queries, so CONSTRUCT { ...} where {...} will return all information at once. This can be the whole database, e.g. use this query:

For small volumes there is no problem. For larger volumes, clients might stumble on it. For very large volumes, the supplying SPARQL endpoint will apply a strategy to reduce its chance of dying. Virtuoso does that by implementing a cut-off in the response (the magic 10000 number, part of the Virtuoso configuration). If you get 10K triples/response rows, you do not know whether there were exactly 10K triples/response rows or more.

I am indeed suggesting that for construct queries (for selects the current approach works fine) "caching the result for a query, perform in memory paging, returning the result and when a different page of the query is requested, get the cached object, page it in memory and return it to the client instead of performing the SPARQL query" is the approach. I see no other alternative for the moment (unless we select a SPARQL endpoint that implements pagination on all requests).

Constructs are actually used in the TDT setting for 2 cases:
L4.2 also has a swappable caching mechanism; file, memcached and a few others are supported out of the box, so I don't think storing the object in a file ourselves would be necessary.
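To make the cache-then-page idea from the comments above concrete, here is a minimal sketch in Python (not the project's own language or code). Everything named here is hypothetical: `ConstructPager`, the `run_construct` callable that stands in for the real endpoint call, and the in-process dictionary that stands in for whatever cache backend is configured. It assumes the full construct result fits in memory, and it sorts the triples once so that pages stay stable even if the endpoint returns them in a different order on each call.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


class ConstructPager:
    """Caches the full CONSTRUCT result per query and pages it in memory."""

    def __init__(self, run_construct: Callable[[str], List[Triple]]):
        # run_construct is a hypothetical callable that executes the query
        # against the SPARQL endpoint and returns all resulting triples.
        self._run_construct = run_construct
        # Plain dict as a stand-in for a real cache backend (no expiry here).
        self._cache: Dict[str, List[Triple]] = {}

    @staticmethod
    def _key(query: str) -> str:
        # Hash the query text so it can serve as a compact cache key.
        return hashlib.sha256(query.encode("utf-8")).hexdigest()

    def page(self, query: str, offset: int, limit: int) -> Tuple[List[Triple], int]:
        """Return (triples for this page, total number of triples)."""
        key = self._key(query)
        if key not in self._cache:
            # Execute the query once, deduplicate, and fix an order so that
            # consecutive page requests see the same sequence of triples.
            self._cache[key] = sorted(set(self._run_construct(query)))
        triples = self._cache[key]
        return triples[offset:offset + limit], len(triples)
```

A caller would request `pager.page(query, 0, 50)` for the first page and `pager.page(query, 50, 50)` for the second; only the first request reaches the endpoint. The sketch does not handle cache expiry or an endpoint-side cut-off such as the 10000-row limit mentioned above, so the reported total is only as complete as the response that was cached.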
If the limit/offset are not specified in the SPARQL query of the dataset definition, the SPARQLController calculates the pagination itself. This seems to be done incorrectly in the case of a CONSTRUCT query.
If the query has a structure like:
the number of results is calculated as follows:
This doesn't yield a correct result in case of a CONSTRUCT query.
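As a rough illustration of why a count can go wrong for CONSTRUCT, the sketch below (Python, with made-up data) assumes the count is taken over the solutions of the WHERE clause; the actual calculation is not reproduced here, so treat this as one possible cause rather than a diagnosis. It relies only on the standard CONSTRUCT semantics: the template is instantiated once per solution, triples with unbound variables are dropped, and the result is a set, so duplicates collapse. The helper `instantiate` and the `ex:` terms are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]


def instantiate(template: List[Triple], solutions: List[Solution]) -> Set[Triple]:
    """Build the CONSTRUCT result graph from a template and WHERE solutions."""
    graph: Set[Triple] = set()
    for solution in solutions:
        for pattern in template:
            # Replace each ?variable with its binding; constants stay as-is.
            terms = [
                solution.get(term[1:]) if term.startswith("?") else term
                for term in pattern
            ]
            if None in terms:
                # A triple with an unbound variable is left out of the result.
                continue
            graph.add((terms[0], terms[1], terms[2]))
    return graph


# Hypothetical template with two triple patterns per solution.
template = [("?s", "rdf:type", "ex:Thing"), ("?s", "ex:label", "?label")]

# Three WHERE solutions; ex:a appears twice with different labels.
solutions = [
    {"s": "ex:a", "label": "A"},
    {"s": "ex:a", "label": "A2"},
    {"s": "ex:b", "label": "B"},
]

graph = instantiate(template, solutions)
solution_count = len(solutions)  # what a solution-based count would report
triple_count = len(graph)        # what the client actually receives
```

Here three solutions yield five distinct triples, because the `rdf:type` triple for `ex:a` is produced twice but stored once. A limit/offset derived from the solution count therefore does not line up with the number of triples in the response.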
Thanks to @bertvannuffelen for the catch.