Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does RetryOnConflict suppose to work with Bulk updates #2182

Open
csabavirag opened this issue Dec 1, 2023 · 6 comments
Open

Does RetryOnConflict suppose to work with Bulk updates #2182

csabavirag opened this issue Dec 1, 2023 · 6 comments

Comments

@csabavirag
Copy link
Contributor

I have an application which updates documents in bulk through multiple workers. I'd like to use the retry_on_conflict feature, but even if I have the setting at the ClientConfiguration, Bulk::addDocuments() does not seem to add the necessary piece to the Document metadata in order to get the proper update action as it is described at https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-bulk.html#bulk-update

They say, every update should contain the retry_on_conflict attribute like this:

POST _bulk
{ "update" : {"_id" : "1", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_index" : "index1", "retry_on_conflict" : 3} }
{ "script" : { "source": "ctx._source.counter += params.param1", "lang" : "painless", "params" : {"param1" : 1}}, "upsert" : {"counter" : 1}}
{ "update" : {"_id" : "2", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
{ "update" : {"_id" : "3", "_index" : "index1", "_source" : true} }
{ "doc" : {"field" : "value"} }
{ "update" : {"_id" : "4", "_index" : "index1"} }
{ "doc" : {"field" : "value"}, "_source": true}

I only get the _id and _index values added to the document metadata. What do I do wrong?

@ruflin
Copy link
Owner

ruflin commented Dec 13, 2023

Can you share the exact code snippet you are using and the version of Elastica? There was in one version a change from _retry_on_conflict to retry_on_conflict. A quick search, i found there is a for a similar scenario you describe here:

$this->assertEquals(5, $metadata['retry_on_conflict']);

@csabavirag
Copy link
Contributor Author

Actually I'm using FOSElasticaBundle and they have added support for RetryOnConflict parameter per connection (FriendsOfSymfony/FOSElasticaBundle#965) long time ago.

My config looks like:

fos_elastica:
    clients:
        default: { url: '%env(ELASTICSEARCH_URL)%', retryOnConflict: 5 }
    indexes:

which sets

image

However I can't see if the Elastica\Bulk class would take the parameter into consideration at all and would apply the same like in the test

https://github.com/ruflin/Elastica/blob/0f6b04b5ebe6df51015fb54b4c455bd7af5b093f/tests/BulkTest.php#L686C9

@ruflin
Copy link
Owner

ruflin commented Dec 20, 2023

Having a quick look at the code, this might be a bug / missing feature in Elastica. I couldn't find a place where the retry_on_conflict from the agent it taken into account in bulk :-( When adding a document, to bulk, it should be checked if it is an update and if retry on conflict is set on the client side. If yes, it should add it: https://github.com/ruflin/Elastica/tree/8.x/src#L134 Could you by chance make this modification to your version of Elastica and see if it solves the problem?

@csabavirag
Copy link
Contributor Author

Sure, I will give a try. In the mean time, could you check the link (https://github.com/ruflin/Elastica/tree/8.x/src#L134). You might want to point to a file in the repo, but it got somehow removed.

@ruflin
Copy link
Owner

ruflin commented Dec 21, 2023

This is the file and line I wanted to link to: https://github.com/ruflin/Elastica/blob/8.x/src/Bulk.php#L134 Not sure where the file name was lost :-(

@csabavirag
Copy link
Contributor Author

Please see my PR. It adds the RetryOnConflict value (if exists) from the Client's connection configuration. With this change, I was able to generate the similar output (with this attribute) which was pasted in the opening message.

ruflin pushed a commit that referenced this issue Jan 18, 2024
This PR is created to solve the issue on the missing RetryOnConflict (retry_on_conflict) metadata attribute when adding documents in bulk (detailed at #2182)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants