BatchFetch IN corrupts cache by overwriting BatchFetchPolicy dataResults #1998
@Sheikah45 you seem to have some understanding of the BatchFetchPolicy; do you have any idea how to fix this properly?
We have the same issue with @OneToOne references using BatchFetchType IN/JOIN/EXISTS, with EclipseLink version 4.0.5 as well. When @BatchFetch is removed or the cache is deactivated it works as expected, but with a performance penalty. I first thought it was related to #2011.
@andilem I have taken a look at this and agree with your assessment that the cause is the overwriting of the dataResults when executing the batched queries. Unfortunately this seems like a difficult issue to fix when the relationship is itself lazy. When the relationship is eager, the lifetime of the dataResults is fine, since the objects are fully constructed before the dataResults are overwritten by the next batch. In the case of a lazy relationship, however, the query, and thus the BatchFetchPolicy and its dataResults, are stored inside the BatchValueHolder to be used later for instantiating the object. As a result, only the most recent batch's rows are visible when the relationships are created upon later instantiation by the BatchValueHolder. I believe a proper solution might entail creating clones of the query and BatchFetchPolicy for each batch to ensure that each BatchValueHolder keeps the correct set of rows. This does mean greater memory usage, though: currently the stored rows are limited to the batch size, whereas this change would keep all the needed rows in memory for as long as the BatchValueHolder exists.
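The mechanism described above can be sketched in plain Java. This is a toy model, not EclipseLink's real classes: `Policy` stands in for BatchFetchPolicy, `LazyHolder` for BatchValueHolder, and the class and method names are illustrative. All lazy holders share one policy whose dataResults map is overwritten by each executed batch, so a holder from the first batch later sees only the last batch's rows; giving each batch its own policy clone (the proposed fix) preserves them:

```java
import java.util.*;

public class BatchOverwriteSketch {

    static class Policy {                       // stands in for BatchFetchPolicy
        Map<Integer, String> dataResults = new HashMap<>();
    }

    static class LazyHolder {                   // stands in for BatchValueHolder
        final Policy policy;
        final int parentId;
        LazyHolder(Policy policy, int parentId) {
            this.policy = policy;
            this.parentId = parentId;
        }
        String instantiate() {                  // triggered on first access
            return policy.dataResults.get(parentId);
        }
    }

    static String[] demo() {
        Policy shared = new Policy();
        List<LazyHolder> buggy = new ArrayList<>();
        List<LazyHolder> fixed = new ArrayList<>();

        int[][] batches = { {1, 2}, {3, 4} };   // two batches of parent ids
        for (int[] batch : batches) {
            Map<Integer, String> rows = new HashMap<>();
            for (int id : batch) rows.put(id, "child-of-" + id);

            shared.dataResults = rows;          // buggy: overwrites shared state

            Policy perBatch = new Policy();     // fix: one policy clone per batch
            perBatch.dataResults = rows;

            for (int id : batch) {
                buggy.add(new LazyHolder(shared, id));
                fixed.add(new LazyHolder(perBatch, id));
            }
        }
        // Lazily instantiate a parent from the FIRST batch after the
        // second batch has run: its rows are gone in the buggy version.
        return new String[] { buggy.get(0).instantiate(),
                              fixed.get(0).instantiate() };
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println(result[0]);          // null  (rows were overwritten)
        System.out.println(result[1]);          // child-of-1
    }
}
```

The trade-off mentioned above is visible here too: the per-batch clones keep every batch's rows reachable for as long as their holders live, instead of only the current batch.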
Thanks for looking into this! Unfortunately, the current code is wrong even for eager relationships, and, much worse, it even corrupts the cache: the query originally used to fetch the row seems to be stored in the cache together with the latest batch arguments (not the correct ones to fetch that specific row), so every future cache hit results in parents with no children.
I have opened a PR that clones the batch query so that it can maintain its own BatchFetchPolicy independent of further batches. As for the eager cache corruption: do you have an example? The test cases I used all either operated correctly when set to eager fetch or resulted in a StackOverflowError.
I currently cannot reproduce the issue when the parent-child relationship is eager because of the StackOverflowError. The primary (owning) child-parent relationship doesn't seem to matter.
For some reason the eager fetch in the PR's test did not result in a stack overflow, but I haven't dug into why. If you are able to test the source in the PR against your use case, that would be good confirmation that the lazy case is resolved.
I just tested it and can confirm that the issue is resolved for my case. In the test there is no StackOverflowError because you're only testing three objects.
EclipseLink version 4.0.2
When loading more referenced entities by `@BatchFetch(value = BatchFetchType.IN)` than the batch fetch size (this seems to be fixed at 500; the annotation's size value is not used), multiple queries are issued to retrieve all results. But the `IN (...)` arguments of the last issued query are stored in the cache for every retrieved entity as "source" query arguments, which means that all entities that were not fetched by the last query have corrupted `@OneToMany` references (when loading them, an empty collection is returned because the query uses the wrong IDs).

The cause is `BatchFetchPolicy.setDataResults`, which overwrites the previous rows. I implemented a dirty workaround by adding the results in `executeObjectLevelReadQuery()` instead of overwriting them, but this could cause bad performance because the number of rows can grow larger than the batch fetch size, resulting in multiple queries when querying again.
This is relatively complicated to describe but very easy to reproduce. A simple test case is attached, with `Parent` and `Child` entities (`@OneToMany`/`@ManyToOne` with `@BatchFetch(value = BatchFetchType.IN)`):
eclipselink-batchfetch.zip
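For reference, a mapping along these lines reproduces the setup described above. This is a sketch of the declarative configuration only; the entity and field names are illustrative, not taken from the attached reproducer:

```java
import jakarta.persistence.*;
import java.util.List;
import org.eclipse.persistence.annotations.BatchFetch;
import org.eclipse.persistence.annotations.BatchFetchType;

@Entity
public class Parent {
    @Id
    private Long id;

    // Loading more parents than the internal batch size and then
    // traversing this collection triggers the described corruption.
    @OneToMany(mappedBy = "parent")
    @BatchFetch(value = BatchFetchType.IN)
    private List<Child> children;
}

@Entity
class Child {
    @Id
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    private Parent parent;
}
```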