Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grouping and aggregation #1135

Merged
merged 54 commits into from
Oct 6, 2023
Merged

Grouping and aggregation #1135

merged 54 commits into from
Oct 6, 2023

Conversation

wricciot
Copy link
Member

@wricciot wricciot commented Apr 25, 2022

This PR provides experimental support for SQL queries with grouping and aggregation. These require the mixing normaliser (mixing_norm=on in the configuration file).

The result of grouping over a relation is represented as a finite map, which in Links is treated as a list of (grouping key, associated subrelation) pairs. Aggregation can then be applied groupwise to a finite map to obtain again a relation. Such Links queries are translated to SQL queries using group by and aggregates.

This is used in map collections, employed by grouping.

The change propagates to parts of the code that deal with serialization and potentially with lenses.
I changed those parts of the code in the way that appeared most natural, but everything is, of course, experimental.
Fixes normalization within MapEntry elements.
Adds pseudo-typing of grouping queries.
Implements some related FIXMEs and TODOs.
@wricciot
Copy link
Member Author

I'm not sure why this needs to be unsafe. Under the assumption that the second argument is a finite map, isn't

fun lookupG(x,l) {
  for (ab <- l) where (fst(a) == x) b
}

equivalent?

Ah yes. It can indeed be defined as a safe operation (modulo calls to other unsafe operations (concatMap)).

@wricciot
Copy link
Member Author

Ah yes. It can indeed be defined as a safe operation (modulo calls to other unsafe operations (concatMap)).

Actually, I think my reasoning was that the semantics of the two definitions is not the same for inputs that are not valid maps (more than one entry for some keys). But surely we don't care about ill-formed inputs.

… "concatMapKey").

Adds "groupByMap" to the prelude (to allow nested grouping).
Copy link

@vcgalpin vcgalpin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested this branch with two different Links applications with mixing_norm=false which will have exercised various database interactions:

Everything typechecked and showed no errors under execution.
I have not reviewed the code. I can do more specific testing if requested.

Copy link

@vcgalpin vcgalpin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested this branch with two different Links applications with mixing_norm=true which will have exercised various database interactions:

  • Guide to Pharmacology DB (nontemporal)

Everything typechecked and showed no errors under execution (surprisingly?)

Everything typechecked. There was one error :

jslib.js:1583 Uncaught Error: Fatal error: call to server returned an error. 
Details: ***: Error: File "core/query/mixingQuery.ml", 
                     line 276, characters 8-14: Assertion failed 
                     at XMLHttpRequest.<anonymous> (jslib.js:1583:17)

It seemed to relate to an unnested query of the form

query {for (x <-v- covid_data) for (y <-- subcategory) for (z <-- csv_file) 
    where ...
       orderby ...
           [(...)]  }

Removing the orderby statement had no effect on the error.
Modifying it to

query nested {for (x <-v- covid_data) for (y <-- subcategory) for (z <-- csv_file) where ... }

appears to have resolved the error. I hypothesise that the addition of "nested" means it is no longer using the new mixing code. Should such a query work under the mixing code?

I have not reviewed the code. I can do more specific testing if requested.

@vcgalpin
Copy link

Modifying it to

query nested {for (x <-v- covid_data) for (y <-- subcategory) for (z <-- csv_file) where ... }

appears to have resolved the error. I hypothesise that the addition of "nested" means it is no longer using the new mixing code. Should such a query work under the mixing code?

Further experimentation has shown that the interaction of the mixing code and temporal queries leads to the error. For example,

query { for ( x <-v- covid_data ) [vtData(x)]

gives a similar error to above and

query nested { for ( x <-v- covid_data ) [vtData(x)]

doesn't.

@vcgalpin
Copy link

In the gtopdb code, only query nested and query flat are used, which possibly explains the success of the tests with mixing_norm=true

…on correctly.

Improves comments/deletes obsolete ones.
…ort.

Improves comments. Fixes a syntax error in the grouping query tests.
…ng normaliser.

The code to handle temporal tables used in the standard normaliser has been ported to mixing.
Additionally, redundant eta expansion code in MixingQuery.xlate which was creating trouble
with the new temporal query code has been removed.
@wricciot
Copy link
Member Author

wricciot commented Oct 2, 2023

Modifying it to

query nested {for (x <-v- covid_data) for (y <-- subcategory) for (z <-- csv_file) where ... }

appears to have resolved the error. I hypothesise that the addition of "nested" means it is no longer using the new mixing code. Should such a query work under the mixing code?

Further experimentation has shown that the interaction of the mixing code and temporal queries leads to the error. For example,

query { for ( x <-v- covid_data ) [vtData(x)]

gives a similar error to above and

query nested { for ( x <-v- covid_data ) [vtData(x)]

doesn't.

Thanks! The mixing normaliser wasn't exactly handling temporal tables. I have ported Simon's code and now queries such as the ones you mentioned are handled correctly.

@wricciot
Copy link
Member Author

wricciot commented Oct 2, 2023

This should now be ready to be merged.

@wricciot wricciot merged commit 35ecc43 into links-lang:master Oct 6, 2023
9 checks passed
@wricciot wricciot deleted the grouping branch October 6, 2023 19:19
@wricciot wricciot restored the grouping branch October 6, 2023 19:24
@jamescheney jamescheney mentioned this pull request Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants