Adding retry_on_conflict parameter? #2
Comments
@StoryStar glad it's been of help to you! It would appear that yes, editing the Sources:
I've added a different branch Please let me know if that does, in fact, fix your issue, and I will merge it with the |
Hey, sorry it took me a while to get back to you. Appreciate the help, that looks like it should work -- I'd love to confirm, but unfortunately I have a new issue and am unable to get back to that point so I can test. Not sure if this is worth opening a new issue for, let me know if you would like me to.
My DB is quite large now, and the size seems to be causing me some issues. When first running the JavaScript, it indexes a bunch of the documents, occasionally running into a quota exceeded error (where it backs off and reconnects just fine) -- but eventually gets stuck and stops indexing. It seems to be an issue with reading Firebase, as I get this: 'Error 4: The datastore operation timed out, or the data was temporarily unavailable.'
Not exactly sure what's causing the issue, besides slamming Firestore with a lot of requests for the number of documents I'm trying to index. I found possibly related bugs in the node admin sdk, however they are a little old: firebase/firebase-admin-node#119 and firebase/firebase-admin-node#271
After it runs into the error, it appears to completely stop indexing, and Elasticsearch also does not have any index for the documents that did successfully load -- if I make a test search, it returns index not found. Any ideas?
Detailed log follows, including detailed Firestore logging
|
I am able to reproduce the issue with a simple function, so it seems to be unrelated to this project and is just a Firestore issue.
Unfortunately I seem to be stuck here for now 👎 |
@StoryStar really appreciate that you were able to find a way to reproduce it! I’ve seen that error before, but only because I was looping through a gigantic collection. Which it seems you also are doing. I’d thought about distributing listeners or limiting listeners to a specific time. I’m not sure. Alternatively, there may be a way to coordinate a Firestore trigger to tell an external script to update elasticstore? Just thinking out loud here. If you ended up opening an issue on firestore-admin, link it here please! I’ll go ahead and merge the pull request. |
(reopening as the initial issue was fixed, but a separate one is included in this issue) |
Exactly, seems to be an issue with very large collections. Good ideas, I had a similar idea to paginate through the entire collection and update elasticstore with the data, and pair that with a cloud function that is triggered whenever a document is added to the collection (adding the new data to elasticstore). I'll see about making a proof of concept for that. I've submitted a bug report and reached out to Firebase support in a few places, waiting to hear back from them now -- will keep this updated. I think the issue is with the backend and not the node firestore-admin implementation, as it seems to be an issue across other languages as well (e.g. firebase/firebase-js-sdk#359), but I may still open an issue on firestore-admin and will certainly link it here if I do. |
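For readers following along, the "paginate through the entire collection" idea can be sketched roughly as below. This is only an illustration under assumptions, not the importer later written in this thread: `importInPages`, `FetchPage`, and `handleBatch` are hypothetical names, and the page fetcher is left abstract. With Firestore it would presumably wrap an ordered query using `startAfter(cursor)` and `limit(pageSize)`, and `handleBatch` would push the documents to Elasticsearch.

```typescript
// One page of results plus the cursor needed to request the next page.
interface Page<T, C> {
  items: T[];
  nextCursor: C | undefined; // undefined means there is nothing after this page
}

// Hypothetical fetcher signature. For Firestore this would likely wrap
// query.orderBy(...).startAfter(cursor).limit(limit).get() (assumption).
type FetchPage<T, C> = (cursor: C | undefined, limit: number) => Promise<Page<T, C>>;

// Walks the collection one page at a time so no single read or listener
// has to cover the whole collection. Returns the number of items handled.
async function importInPages<T, C>(
  fetchPage: FetchPage<T, C>,
  handleBatch: (items: T[]) => Promise<void>,
  pageSize = 500,
): Promise<number> {
  let cursor: C | undefined = undefined;
  let total = 0;
  while (true) {
    const page: Page<T, C> = await fetchPage(cursor, pageSize);
    if (page.items.length === 0) break;
    await handleBatch(page.items); // e.g. index this batch in Elasticsearch
    total += page.items.length;
    if (page.nextCursor === undefined) break;
    cursor = page.nextCursor;
  }
  return total;
}
```

Keeping the fetcher abstract means the loop can be exercised against an in-memory array before pointing it at a large collection. New documents arriving after the import would still need the separate trigger described above.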
Hey, here's an update on our situation. As expected, Firebase support suggested the same idea of paginating through the entire collection, and said that while this is a bug with their setup, it would not be possible or appropriate to listen on an entire collection of this size. Firebase support:
So I've written the pagination importer, along with a cloud function that is triggered on document change (added, modified, deleted), paired with a simple express.js server that listens for document change requests and sends them to Elasticsearch -- and everything seems to be working just fine in this setup. Really appreciate all your help! I hate to bring up more issues again, but it seems the SearchHandler now runs into an issue after a few days of uptime |
@StoryStar thanks for continuing to update! If you want to add your implementation to Github, I'd be happy to link to it on the readme of this repo. As for restarting listeners, I too have seen this issue if left up for too long. I thought about adding a restart into the default implementation, but it means that you'll end up doing a full query of the documents each time the listener is restarted, yes? So feasible to restart it, sure, but also a (potentially) expensive operation. I'm also unsure if the problem lies in listeners of any size or if it's spanning listeners that cause the problem. |
@StoryStar In my implementation, I make sure to filter based off of an Are you doing something similar? |
@ACupaJoe no problem! I'll see about adding our implementation, it will need to be cleaned up first at the very least. Will update here when I get to it. Regarding restarting listeners, I've actually fully separated the document import script from the search handler, so restarting the listener is what I'm doing right now, just manually. I'd like to get it set up to automatically restart when the listener dies, though. As for the cause, I am also unsure. I don't believe it's due to size, as the search collection is usually empty, so I suppose it has to do with just the length of time the listener is kept alive -- I see a similar issue here, though for the .NET library instead - googleapis/google-cloud-dotnet#2721 - and they're having the same 'Aborted' error we have. In that issue they discuss adding the aborted error code to the automatic retry status codes list, so maybe it is possible to have it added to the node library as well. Opened an issue here: firebase/firebase-admin-node#478. I'm thinking of just adding a function to handle listen errors and restart the listener if it dies due to this 'aborted' error. Something like this:
Thoughts? As an aside, this is what my current setup looks like:
|
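The restart-on-'aborted' idea from the comment above could look roughly like the sketch below. It is not the snippet originally posted, and every name in it (`listenWithRestart`, `isAborted`, `Subscribe`, `schedule`) is hypothetical. It assumes the listener error carries a `code` field that is either the gRPC status number 10 (ABORTED) or the string "aborted", and that the subscribe function has the same shape as Firestore's `onSnapshot(onNext, onError)`, which returns an unsubscribe function.

```typescript
// gRPC status code 10 is ABORTED.
const GRPC_ABORTED = 10;

type Unsubscribe = () => void;
// Shape modeled on onSnapshot(onNext, onError) returning an unsubscribe function.
type Subscribe<T> = (
  onNext: (snapshot: T) => void,
  onError: (err: unknown) => void,
) => Unsubscribe;

// Assumption: the error exposes a `code` that is 10 or "aborted".
function isAborted(err: unknown): boolean {
  const code = (err as { code?: unknown } | null)?.code;
  return code === GRPC_ABORTED || code === "aborted";
}

// Attaches a listener and re-attaches it (up to maxRestarts times) when it
// dies with an aborted error. Other errors are logged and not retried.
function listenWithRestart<T>(
  subscribe: Subscribe<T>,
  onNext: (snapshot: T) => void,
  maxRestarts = 5,
  schedule: (fn: () => void) => void = (fn) => { setTimeout(fn, 1000); },
): Unsubscribe {
  let restarts = 0;
  let stopped = false;
  let current: Unsubscribe = () => {};

  const start = (): void => {
    current = subscribe(onNext, (err) => {
      if (stopped || !isAborted(err) || restarts >= maxRestarts) {
        console.error("listener stopped:", err);
        return;
      }
      restarts += 1;
      current(); // detach the dead listener before attaching a new one
      schedule(start);
    });
  };

  start();
  return () => { stopped = true; current(); };
}
```

As noted earlier in the thread, each restart re-runs the underlying query, so the restart cap and delay are there to keep a persistently failing listener from turning into an expensive loop.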
@all-contributors add @StoryStar for bugs |
I've put up a pull request to add @StoryStar! 🎉 |
Hey, thanks for the cool project, got me up and running very quickly.
I'm using the project to allow search of some fields in our document collection, which is constantly being updated with new documents. Occasionally it's possible we modify some of the documents, but that is rare.
I'm running into this issue occasionally:
Error in FS_ADDED handler [doc@1098949404239818800]: [version_conflict_engine_exception] [tweets][1098949404239818800]: version conflict, current version [2] is different than the one provided [1], with { index_uuid="qj6ftGLfSYqNEut4ShXbeg" & shard="2" & index="tweets" }
I've not been able to confirm, but some testers have reported that not all search results that should show up are actually showing up. I believe this is the cause, as those documents simply have failed to add?
From my research, looks like adding a retry_on_conflict value of one or two should handle these issues. Looks like it would go in handleAdded in FirestoreHandler.ts
This look right to you?
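As a rough illustration of the proposal, the sketch below builds an update request that carries `retry_on_conflict`. It rests on several assumptions: `buildUpsertRequest` and `UpdateRequest` are hypothetical names, not part of this project; `retry_on_conflict` is a parameter of Elasticsearch's Update API, so the add handler would need to issue an upsert-style update rather than a plain index call; and the field names follow the REST API, so the client library in use may spell them differently.

```typescript
// Request shape following the Elasticsearch REST Update API field names.
// The JS client in use may expect a different spelling (assumption).
interface UpdateRequest {
  index: string;
  id: string;
  retry_on_conflict: number;
  body: { doc: Record<string, unknown>; doc_as_upsert: boolean };
}

// Builds an upsert that asks Elasticsearch to retry the update a few times
// when it hits a version conflict, instead of failing immediately.
function buildUpsertRequest(
  index: string,
  id: string,
  doc: Record<string, unknown>,
  retries = 2,
): UpdateRequest {
  if (!Number.isInteger(retries) || retries < 0) {
    throw new RangeError("retries must be a non-negative integer");
  }
  return {
    index,
    id,
    retry_on_conflict: retries,
    // doc_as_upsert creates the document if it does not exist yet.
    body: { doc, doc_as_upsert: true },
  };
}
```

The resulting object would then be handed to whatever update call the handler already uses. Whether a value of one or two retries is enough depends on how often the same document is written concurrently.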