Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panache Hibernate adapting query count based on distinct keyword #38316

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

rysurd
Copy link
Contributor

@rysurd rysurd commented Jan 20, 2024

Fix #38033

Copy link
Contributor

@DavideD DavideD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems fine, but I wouldn't merge it without a test case.

@rysurd
Copy link
Contributor Author

rysurd commented Jan 23, 2024

It seems fine, but I wouldn't merge it without a test case.

I think there are errors in tests that I need to check and fix. And on top on that add a specific test indeed. This is still a work in progress I suppose :)

@rysurd rysurd marked this pull request as draft January 23, 2024 15:41
@DavideD
Copy link
Contributor

DavideD commented Jan 23, 2024

Thanks @rysurd, that's fine.

But it's usually better to send a PR when one also has a test case. It saves everybody's time.

@rysurd rysurd force-pushed the ISSUES-38033 branch 2 times, most recently from 87406db to 49aaeb6 Compare January 23, 2024 22:23
@rysurd
Copy link
Contributor Author

rysurd commented Jan 23, 2024

Hello @DavideD ! I updated my PR with two main changes ;

  1. Moved the query change from the select distinct to the from clause
  2. Slightly adjusted the integration tests to overcome the issue I had during previous tests (like here https://github.com/rysurd/quarkus/actions/runs/7631080457/job/20788624212#step:16:3215). This issue is the following :
java.lang.IllegalArgumentException: org.hibernate.query.SemanticException: Select item at position 1 in select list has no alias (aliases are required in CTEs and in subqueries occurring in from clause)

I also added one another test in TestEndpoint for projection ; we also check distinct query for projection on Long, just like in the original issue.

@rysurd rysurd marked this pull request as ready for review January 23, 2024 23:34
Copy link
Contributor

@DavideD DavideD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something, this solution doesn't seem correct.

The issue is about generating a query like:

select count(distinct entity.id) from ...

but you are creating the following query instead:

select count(*) from (select distinct entity.id from ... )

Also, if existing tests don't work any more after applying your fix, it means there's something wrong with your solution. You shouldn't change them to make your solution work.

If we apply this solution, what do you think it will happen to the apps running distinct queries without aliases after the upgrade?

@@ -1326,7 +1326,7 @@ public String testProjection() {

PanacheQuery<CatProjectionBean> projectionDistinctQuery = Cat
// The spaces at the beginning are intentional
.find(" SELECT disTINct c.name, cast(null as string), SUM(c.weight) from Cat c where name = :name group by name ",
.find(" SELECT disTINct c.name as foo, cast(null as string) as bar, SUM(c.weight) as baz from Cat c where name = :name group by name ",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is suspicious, why this test suddenly requires a change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this should keep working.

// We are checking that not everything gets lowercased
PanacheQuery<CatProjectionBean> letterCaseQuery = Cat
// The spaces at the beginning are intentional
.find(" SELECT disTINct 'GARFIELD', 'JoN ArBuCkLe' from Cat c where name = :NamE group by name ",
.find(" SELECT disTINct 'GARFIELD' as foo, 'JoN ArBuCkLe' as bar from Cat c where name = :NamE group by name ",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is suspicious, why this test suddenly requires a change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@rysurd
Copy link
Contributor Author

rysurd commented Jan 24, 2024

Unless I'm missing something, this solution doesn't seem correct.

The issue is about generating a query like:

select count(distinct entity.id) from ...

but you are creating the following query instead:

select count(*) from (select distinct entity.id from ... )

So indeed I wanted to implement a query like the first one you suggested. It work for cases where you want to distinct count on one column. However it fails when we have aggregate function (like SUM) or when we want to count distinct on multiple columns. If you plainly put the input select clauses in the distinct statement you can end up having a query like this for instance

select count(distinct entity.id, SUM(entity.weight)) from ...

It'll generate an error (This one https://github.com/rysurd/quarkus/actions/runs/7628145894/job/20778994932#step:16:3219). So this is why I went rather for moving the distinct statement in a subquery in the from statement. Maybe (probably) I'm the one missing out something here that would make this work 😄

Also, if existing tests don't work any more after applying your fix, it means there's something wrong with your solution.
You shouldn't change them to make your solution work.

If we apply this solution, what do you think it will happen to the apps running distinct queries without aliases after the upgrade?

I'd agree that something can be wrong with the solution, but this could also mean we didn't took some edge case into consideration when writing the tests, in can happen sometimes. In any case if you can help me understand the previous problem about aggregate function in subquery I'd appreciate ! I'll keep working on this issue and will try to find another solution.

Thanks a lot!

EDIT : Also I'm not sure we can have distinct count on multiple columns right from the select statement with Hibernate

@DavideD
Copy link
Contributor

DavideD commented Jan 24, 2024

... but this could also mean we didn't took some edge case into consideration when writing the tests,

It seems we did in this case, otherwise CI wouldn't fail.

So indeed I wanted to implement a query like the first one you suggested. It work for most case however it fails when we have aggregate function (like SUM) in the select statement

I see now.
I'm not a fan of this solution but I guess it's the simplest compromise ¯_(ツ)_/¯

Alternatives I can think of is to split by , and remove all the fields starting with an aggregate functions. This won't work without having a proper parser because people can still include a text string with anything in the select clause, or use escape chars.

Or, add an attribute to .count(), so that a user can specify the argument of the count function in any cirumnstance.

@gsmet, If we decide to go for this approach, this is a breaking change that will make queries not using an alias fail.
So, at the very least, it shouldn't be included in a micro.

In any case if you can help me understand the previous problem about aggregate function in subquery I'd appreciate

I'm not sure, but I'm assuming that when having subqueries, you have to specify which field of which entity you refer to because you could have a reference to the same entity multiple times.

@rysurd
Copy link
Contributor Author

rysurd commented Jan 24, 2024

Or, add an attribute to .count(), so that a user can specify the argument of the count function in any cirumnstance.

I think this could be a solution, I'd even like to try to implement it just to see. But yeah, I think unless we missed something there is no straightforward solution to the problem ... Let's see what @gsmet thinks.

@DavideD
Copy link
Contributor

DavideD commented Jan 24, 2024

Alternatives I can think of is to split by , and remove all the fields starting with an aggregate functions. This won't work without having a proper parser because people can still include a text string with anything in the select clause, or use escape chars.

This solution is not really different from what we did in other places, though

@FroMage
Copy link
Member

FroMage commented Jan 24, 2024

Hello.

While I agree this is a bug, I'm skeptical about what the behaviour of .project(Long.class) should be, because the docs state:

Defines a projection class. This will transform the returned values into instances of the given type using the following mapping rules:

  • If your query already selects some specific columns (starts with select distinct? a, b, c…) then we transform it into a query of the form: select distinct? new ProjectionClass(a, b, c)…. There must be a matching constructor that accepts the selected column types, in the right order.
  • If your query does not select any specific column (starts with from…) then we transform it into a query of the form: select new ProjectionClass(a, b, c…) from… where we fetch the list of selected columns from your projection class' single constructor, using its parameter names (or their ProjectedFieldName annotations), in the same order as the constructor.
    If this is already a project query of the form select distinct? new…, we throw a PanacheQueryException

At this point, it's already unclear what projections do in the case of a no-op. Should we really generate select distinct new java.lang.Long(entity.id)…?

I don't even know if this works.

In any case, there's definitely a bug because since this is a no-op, the count method should be the same as the original one, and it's not the case.

Now, given that the logic is duplicated in ORM, HR and Mongo, a fix should also include the other queries. Though in the case of Mongo, it doesn't appear to need a fix.

As to whether we need a real parser for this, I'm seriously starting to wonder if we should not start a discussion with @gavinking about the new shorter queries supported in HQL, and adding feature methods like project and count to any query types of ORM/HR if they're missing there. And let them deal with the parsing and all. This is bound to be better for both ORM and us.

@@ -1326,7 +1326,7 @@ public String testProjection() {

PanacheQuery<CatProjectionBean> projectionDistinctQuery = Cat
// The spaces at the beginning are intentional
.find(" SELECT disTINct c.name, cast(null as string), SUM(c.weight) from Cat c where name = :name group by name ",
.find(" SELECT disTINct c.name as foo, cast(null as string) as bar, SUM(c.weight) as baz from Cat c where name = :name group by name ",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this should keep working.

// We are checking that not everything gets lowercased
PanacheQuery<CatProjectionBean> letterCaseQuery = Cat
// The spaces at the beginning are intentional
.find(" SELECT disTINct 'GARFIELD', 'JoN ArBuCkLe' from Cat c where name = :NamE group by name ",
.find(" SELECT disTINct 'GARFIELD' as foo, 'JoN ArBuCkLe' as bar from Cat c where name = :NamE group by name ",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@DavideD
Copy link
Contributor

DavideD commented Jan 24, 2024

While I agree this is a bug, I'm skeptical about what the behaviour of .project(Long.class)

Ah, I missed this.
I'm not sure what's going on in this case.

As to whether we need a real parser for this, I'm seriously starting to wonder if we should not start a discussion with @gavinking about the new shorter queries supported in HQL, and adding feature methods like project and count to any query types of ORM/HR if they're missing there. And let them deal with the parsing and all.

If possible (I'm not familiar with the new shorter form), I think that would be the best approach.

IMHO, Panache should focus on providing a better programmatic DSL (maybe using Criteria and the JPA metamodel, or other libraries underneath) and leave strings parsing to the existing libraries that have been refined over the many years they've been around.

@rysurd
Copy link
Contributor Author

rysurd commented Jan 24, 2024

As to whether we need a real parser for this, I'm seriously starting to wonder if we should not start a discussion with @gavinking about the new shorter queries supported in HQL, and adding feature methods like project and count to any query types of ORM/HR if they're missing there. And let them deal with the parsing and all. This is bound to be better for both ORM and us.

Do you happen to have any documentation or guide on how to implement these new and shorter HQL queries? Is it done by using Criteria and JPA metamodel as @DavideD suggested before? If indeed it can allow to not deal with SQL parsing we could benefit from it.

@FroMage
Copy link
Member

FroMage commented Jan 24, 2024

This is a discussion I should have with @gavinking at some point.

@gsmet gsmet added the triage/on-ice Frozen until external concerns are resolved label Jun 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/panache triage/on-ice Frozen until external concerns are resolved
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PanacheQuery count method ignoring distinct after using project
4 participants