Offset storage mechanism for new kafka reader #5

Open

jsnoble opened this issue Sep 20, 2018 · 0 comments

@jsnoble
Member
jsnoble commented Sep 20, 2018

Open · erik-stephens opened this issue on Jan 11 · 1 comment

@erik-stephens
Member
erik-stephens commented on Jan 11
The new kafka slicer will need to know the offsets that the worker has consumed up to, so there needs to be a way for the worker to communicate that back to the slicer. Does such a mechanism already exist? I chatted with @jsnoble and he didn't think so.
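
For illustration, here is a minimal sketch of the kind of data that would need to flow from the worker back to the slicer. All of the names below are hypothetical and not existing teraslice APIs; they only show the shape of the offset information involved.

```typescript
// Hypothetical sketch only: none of these types or functions exist in
// teraslice today. They illustrate a worker -> slicer offset report.

interface ConsumedOffsets {
    // keyed by `${topic}:${partition}`, value is the last consumed offset
    [topicPartition: string]: number;
}

// Worker side: track the highest offset seen per partition while
// processing a slice, so it can be reported back when the slice completes.
function trackOffsets(
    records: Array<{ topic: string; partition: number; offset: number }>
): ConsumedOffsets {
    const offsets: ConsumedOffsets = {};
    for (const { topic, partition, offset } of records) {
        const key = `${topic}:${partition}`;
        offsets[key] = Math.max(offsets[key] ?? -1, offset);
    }
    return offsets;
}

// Slicer side: merge offsets reported by workers so the next slice can
// start where the previous one left off.
function mergeOffsets(current: ConsumedOffsets, reported: ConsumedOffsets): ConsumedOffsets {
    const merged: ConsumedOffsets = { ...current };
    for (const [key, offset] of Object.entries(reported)) {
        merged[key] = Math.max(merged[key] ?? -1, offset);
    }
    return merged;
}
```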

@kstaken kstaken added enhancement priority:high labels on May 30
@kstaken
Member
kstaken commented on May 30
We need a solution for storing offsets for the new kafka reader.

I think there are a few possible solutions:

1. Store the committed offsets in the state record. On the surface this is good, however state records are associated with a particular execution, so stopping / restarting a job would result in the offsets being lost. Restoring them would require a job _recover, and that is not really viable operationally. On the positive side, this does bring the ability to run once jobs and to recover offsets.
2. Use a general state storage mechanism to store the offsets. This would build on the general state storage mechanism that we've been discussing to store the offsets in external storage (likely ES); a rough sketch of this approach follows below the list. Using this approach the job could recover automatically, but you lose the recovery benefits of storing explicit offsets.
3. Maybe a combination of both mechanisms. The downside is that these will not be atomic update operations, so the risk of inconsistency will be high.
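
As a rough sketch of what the second option could look like: the `StateStore` interface and key scheme below are assumptions, not an existing teraslice API, since the general state storage mechanism is still under discussion.

```typescript
// Hypothetical sketch of option 2: persist committed offsets through a
// general state storage backend (likely ES). StateStore is a stand-in for
// whatever the general state storage mechanism ends up providing.

interface StateStore {
    get(key: string): Promise<Record<string, number> | undefined>;
    set(key: string, value: Record<string, number>): Promise<void>;
}

const OFFSETS_KEY_PREFIX = 'kafka-reader-offsets';

// Keyed by job id rather than execution id, so a stopped and restarted
// job could pick up from where it left off without a _recover.
async function saveOffsets(
    store: StateStore,
    jobId: string,
    offsets: Record<string, number>
): Promise<void> {
    await store.set(`${OFFSETS_KEY_PREFIX}:${jobId}`, offsets);
}

// Returns an empty object when there is no prior state, meaning the
// reader would fall back to its configured default offsets.
async function loadOffsets(store: StateStore, jobId: string): Promise<Record<string, number>> {
    return (await store.get(`${OFFSETS_KEY_PREFIX}:${jobId}`)) ?? {};
}
```

Keying on the job id rather than the execution id is what would let a stopped / restarted job recover automatically, which is the main operational problem with keeping the offsets only in execution-scoped state records.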
@kstaken kstaken changed the title from "Communicate slice metadata to slicer" to "Offset storage mechanism for new kafka reader" on May 30