Kinesis plugin eventually stops collecting messages, halting pipelines #35
Comments
Hm, this looks like Amazon's KCL library getting an HTTP error response to a Kinesis request, and then hitting a second error while trying to parse that error response. It's probably worth reporting to them as well. However, in theory the worker should recover after an error like this, so there's likely another problem on our end that needs to be fixed.
Actually, while I'm hindered here by not having much Java experience, it looks like this is purely an issue with the KCL library. The KCL only catches RuntimeException, and JsonParseException is not a RuntimeException, so the entire thread dies instead of backing off and retrying as it was designed to. I think you'll need to report this to the KCL project (or AWS premium support if you have that) so that they can debug with you.
Java has checked exceptions, and JsonParseException is one of them: a method must declare any checked exception it can throw. Because JsonParseException is a checked exception, there isn't a way for it to actually reach the method you linked to. In this case the AWS Java SDK converts the JsonParseException into an AmazonServiceException, which is a runtime exception. The stack trace above doesn't seem to show the points where the exception is wrapped and rethrown as a different exception. This exception should be handled by the KCL, and the request would be retried after the normal backoff period. It seems something else could be causing the problem. One thing I would recommend is setting KinesisClientLibConfiguration#withCallProcessRecordsEvenForEmptyRecordList to true. This causes the KCL to call the processRecords method every time it gets a response from Kinesis, regardless of whether it got any records. That lets you log information about every request instead of only those that contain records.
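To make the checked versus unchecked point concrete, here is a minimal sketch using only the JDK. Every class and method name in it is hypothetical and stands in for the real layers; it is not KCL or AWS SDK code. It assumes a parser that throws a checked IOException (Jackson's JsonParseException is a subclass of IOException), a caller that wraps it in an unchecked exception, and a loop that catches only RuntimeException.

```java
import java.io.IOException;

// Hypothetical names throughout; this only illustrates the language rule.
class CheckedExceptionSketch {

    // Stand-in for a parser. IOException is checked, so it must appear
    // in the throws clause or the method will not compile.
    static String parseErrorBody(String body) throws IOException {
        if (!body.startsWith("{")) {
            throw new IOException("unparseable error body: " + body);
        }
        return body;
    }

    // Stand-in for an SDK layer. It has no throws clause, so it is forced
    // to catch the checked exception; here it rethrows it wrapped in an
    // unchecked one, keeping the original as the cause.
    static String callService(String body) {
        try {
            return parseErrorBody(body);
        } catch (IOException e) {
            throw new IllegalStateException("service call failed", e);
        }
    }

    // Stand-in for a worker loop that catches only RuntimeException.
    // The wrapped failure is caught here, so the loop keeps going.
    static int runWithRetries(String[] responses) {
        int failures = 0;
        for (String response : responses) {
            try {
                callService(response);
            } catch (RuntimeException e) {
                failures++; // a real worker would back off and retry here
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        String[] responses = {"{\"ok\":true}", "<html>502</html>", "{}"};
        System.out.println("recovered from " + runWithRetries(responses) + " failure(s)");
    }
}
```

Removing the try/catch from callService without adding a throws clause is a compile error, which is why a checked exception cannot silently slip past a catch of RuntimeException further up the stack.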
The specific error you're seeing is related to aws/aws-sdk-java#1106. The error should be handled by the KCL and automatically retried. It's possible you're running into awslabs/amazon-kinesis-client#185. We have some mitigations that can be applied to handle it, and we are still investigating the issue.
Thanks for the analysis. I figured my lack of Java experience likely meant I was misunderstanding that exception issue.
This bug is still occurring. What's the best way to fix it? Thanks in advance!
2477946696968059424 in hex is 0x22636F6465223A20, which is ASCII for
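The conversion can be checked with a few lines of plain Java. This is only a sketch: the class and method names are made up for illustration, and it assumes the number should be read as eight big-endian bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for checking the arithmetic; not part of any library.
class HexDecodeSketch {

    // Writes the long as 8 big-endian bytes (ByteBuffer's default order)
    // and interprets each byte as an ASCII character.
    static String decodeLongAsAscii(long value) {
        byte[] bytes = ByteBuffer.allocate(Long.BYTES).putLong(value).array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        long value = 2477946696968059424L;
        // prints 22636f6465223a20
        System.out.println(Long.toHexString(value));
        // prints ["code": ] (brackets added so the trailing space is visible)
        System.out.println("[" + decodeLongAsAscii(value) + "]");
    }
}
```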
I've been using this plugin to process a few Kinesis streams, and it has generally worked very well. However, I've noticed that every few days, my nodes stop sending data to ES. When I check on containers that should be forwarding data from these streams, I see the following in the logs:
Resource usage is minimal on affected nodes when they are in this state, and restarting the node resolves the issue. Left alone, affected nodes never resume processing messages, no matter how long they are left running.