
BatchFetch IN corrupts cache by overwriting BatchFetchPolicy dataResults #1998

Open
andilem opened this issue Nov 14, 2023 · 8 comments · May be fixed by #2367
Comments

@andilem

andilem commented Nov 14, 2023

EclipseLink version 4.0.2

When loading more referenced entities via @BatchFetch(value = BatchFetchType.IN) than the batch-fetch size (which appears to be fixed at 500; the annotation's size attribute is not used), multiple queries are issued to retrieve all results. However, the IN (...) arguments of the last issued query are stored in the cache as the "source" query arguments for every retrieved entity, which means that all entities that were not fetched by the last query end up with corrupted @OneToMany references (loading them returns an empty collection because the query uses the wrong IDs).
The cause is BatchFetchPolicy.setDataResults, which overwrites the previous rows. I implemented a dirty workaround by adding the results in executeObjectLevelReadQuery() instead of overwriting them, but this could hurt performance because the number of rows can grow beyond the batch-fetch size, resulting in multiple queries when querying again.
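The overwriting can be illustrated with a self-contained sketch (plain Java; `Policy`, `Entity`, and `load` are hypothetical stand-ins for BatchFetchPolicy and the cached entities, not EclipseLink code). Every entity holds a reference to the same shared policy, and setDataResults replaces the rows on each batch, so after the last batch only the entities fetched by that batch can still resolve their rows:

```java
import java.util.ArrayList;
import java.util.List;

class BatchFetchSketch {
    // Hypothetical stand-in for BatchFetchPolicy: holds the IN (...)
    // arguments of the most recently executed batch query.
    static class Policy {
        List<Integer> dataResults = new ArrayList<>();
        void setDataResults(List<Integer> rows) {
            dataResults = rows; // overwrites the previous batch's rows
        }
    }

    // Each "entity" keeps a reference to the shared policy, as the cached
    // source query does in the bug report.
    record Entity(int id, Policy policy) {
        boolean childVisible() {
            // A child is found only if this entity's id is among the rows
            // the policy currently holds.
            return policy.dataResults.contains(id);
        }
    }

    static List<Entity> load(int count, int batchSize) {
        Policy shared = new Policy();
        List<Entity> entities = new ArrayList<>();
        for (int start = 1; start <= count; start += batchSize) {
            List<Integer> batch = new ArrayList<>();
            for (int id = start; id < start + batchSize && id <= count; id++) {
                batch.add(id);
                entities.add(new Entity(id, shared));
            }
            shared.setDataResults(batch); // last call wins
        }
        return entities;
    }
}
```

Loading 501 entities with a batch size of 500 leaves only entity 501 able to see its children, matching the reproduction described in this report.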

This is relatively complicated to describe but very easy to reproduce. There is a simple test case attached with

  • two entity types (Parent and Child)
  • bidirectional @OneToMany/@ManyToOne with @BatchFetch(value = BatchFetchType.IN)
  • persist 501 parents with 1 child each
  • query all parents -> OK
  • query all children
  • observe that all parents apart from the 501st seem to have no children anymore (enable FINE logging to see the wrong query)

eclipselink-batchfetch.zip

@andilem
Author

andilem commented Jan 9, 2025

@Sheikah45 you seem to have some understanding of the BatchFetchPolicy; do you have any idea how to fix this properly?

@impacsl

impacsl commented Feb 17, 2025

We have the same issue with @OneToOne references using BatchFetchType IN/JOIN/EXISTS, also with EclipseLink version 4.0.5.

When removing @BatchFetch or deactivating the cache, it works as expected, but with a performance penalty.

I first thought it was related to #2011, but the fix for that issue didn't solve the problem.

@Sheikah45
Contributor

@andilem I have taken a look at this and agree with your assessment that the cause is the overwriting of the dataResults when executing the batched queries.

Unfortunately, this seems like a difficult issue to fix when the relationship itself is lazy. When the relationship is eager, the lifetime of the dataResults is fine, since the objects are fully constructed before the dataResults are overwritten by the next batch. However, in the case of a lazy relationship, the query, and thus the BatchFetchPolicy and its dataResults, are stored inside the BatchValueHolder to be used later for instantiating the object. As a result, only the most recent batch's rows are seen when the relationships are later instantiated by the BatchValueHolder.

I believe a proper solution might entail creating clones of the query and BatchFetchPolicy for each batch to ensure that each BatchValueHolder has the correct set of rows. This does mean greater memory usage, though: currently the stored rows are limited to a single batch, whereas this change would keep all the needed rows in memory for as long as the BatchValueHolder exists.
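The cloning idea can be sketched as follows (plain Java; `Policy`, `ValueHolder`, and `holderFor` are hypothetical stand-ins, not the actual EclipseLink classes). Each value holder receives a copy of the policy taken at batch time, so mutating the shared policy for the next batch no longer affects holders created earlier:

```java
import java.util.List;

class ClonePerBatchSketch {
    // Hypothetical stand-in for BatchFetchPolicy.
    static class Policy {
        List<Integer> dataResults;

        Policy copy() {
            Policy c = new Policy();
            c.dataResults = dataResults == null ? null : List.copyOf(dataResults);
            return c;
        }
    }

    // Stand-in for BatchValueHolder: defers instantiation and keeps the
    // policy it was handed at creation time.
    record ValueHolder(Policy policy) {
        List<Integer> instantiate() {
            return policy.dataResults;
        }
    }

    // The shared policy is mutated for every batch (as setDataResults does),
    // but each value holder receives a copy, so earlier holders keep the
    // rows of their own batch.
    static ValueHolder holderFor(Policy shared, List<Integer> batchRows) {
        shared.dataResults = batchRows;
        return new ValueHolder(shared.copy());
    }
}
```

The memory trade-off mentioned above shows up here as well: each holder pins its copied row list until the holder itself is garbage-collected.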

@andilem
Author

andilem commented Feb 17, 2025

Thanks for looking into this! Unfortunately, the current code is wrong even for eager relationships, and, much worse, it corrupts the cache: the query originally used to fetch a row appears to be stored in the cache together with the latest batch arguments (not the ones actually used to fetch that row), so every future cache hit yields parents with no children.

@Sheikah45
Contributor

Sheikah45 commented Feb 17, 2025

I have opened a PR that clones the batch query so that it can maintain its own BatchFetchPolicy independent of further batches. As for the eager cache corruption: do you have an example? The test cases I used all operated correctly when set to eager fetching, or they resulted in a StackOverflowError.

@andilem
Author

andilem commented Feb 17, 2025

I currently cannot reproduce the issue when the parent-child relationship is eager because of the StackOverflowError. The primary (owning) child-parent relationship doesn't seem to matter.
But in many cases it is simply not possible to eagerly load all children; that would take too long and load too much data.
That StackOverflowError is probably a separate issue.

@Sheikah45
Contributor

Sheikah45 commented Feb 17, 2025

Understood. The stack overflow is due to the cache not being populated until the object is fully constructed, which causes an issue with the early batch fetching. I am not aware of any easy solution there.

For some reason, the eager fetch in the PR's test did not result in a stack overflow, but I haven't dug into why.

If you are able to test the source in the PR against your use case, that would be good confirmation that the lazy case is resolved.

@andilem
Author

andilem commented Feb 17, 2025

I just tested it and I can confirm that the issue is resolved for my case.

In the test, there is no StackOverflowError because you're only testing three objects.
In my case, there are simply too many calls for the stack.
When loading 501 parents whose children have to be fetched eagerly, it loads all 501 children, but each child requires its parent to be loaded, so it loads the remaining 500 parents, which load the remaining 500 children, which load the remaining 499 parents, and so on.
There is no infinite recursion; the recursion is just too deep.
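The alternating parent/child loads described above can be sketched as mutual recursion whose depth grows linearly with the number of parents (plain Java; loadParents/loadChildren are hypothetical stand-ins, not EclipseLink methods):

```java
class EagerDepthSketch {
    // loadParents and loadChildren call each other: loading parents eagerly
    // loads their children, and each child in turn needs its not-yet-built
    // parent, pulling in the remaining parents before anything can unwind.
    static int loadParents(int remaining, int depth) {
        if (remaining == 0) return depth; // everything loaded; start unwinding
        return loadChildren(remaining, depth + 1);
    }

    static int loadChildren(int remaining, int depth) {
        // Each child requires its parent, so one fewer parent remains
        // to be loaded before the recursion can bottom out.
        return loadParents(remaining - 1, depth + 1);
    }
}
```

With 501 parents this sketch reaches a depth of 1002 frames. That alone would not overflow, but in the real object-building code each load involves many stack frames, so the default JVM stack is exhausted well before the recursion bottoms out.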
