Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dekaf: Expire Reads after 5m #1736

Merged
merged 1 commit into from
Oct 25, 2024
Merged

dekaf: Expire Reads after 5m #1736

merged 1 commit into from
Oct 25, 2024

Conversation

jshearer
Copy link
Contributor

@jshearer jshearer commented Oct 24, 2024

Description:

We saw problems before where something about a journal::Client would stop responding after some amount of time in use. We were seeing issues where consumers would stop seeing updates to journals after a certain amount of time, and I just realized that it's possible for a Read (which contains a journal::Client) to live for a long time. This times out a Read after 5 minutes, so it get a new journal::Client, and hopefully bypasses the issue.

Tested locally with kcat, will test in dekaf-dev when build passes


This change is Reviewable

Copy link
Member

@jgraettinger jgraettinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the mechanism by which this session will grab an entirely new gazette::Router (e.g. new underlying connection) for the next started read?

Copy link
Contributor Author

@jshearer jshearer Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that the call to authenticated_client() always created a new gazette client, but it turns out it only does when refreshing the access token, reducing the effective Read timeout to however long access tokens live for (an hour I think).

I updated the fetch() handler to explicitly call client.with_fresh_gazette_client() upon Read timeout. We could also go nuclear and have authenticated_client() always return a client with fresh router, but that seemed like it might be unnecessary. Thoughts?

We saw problems before where something about a `journal::Client` would stop responding after some amount of time in use. We were seeing issues where consumers would stop seeing updates to journals after a certain amount of time, and I just realized that it's possible for a `Read` (which contains a `journal::Client`) to live for a long time. This times out a `Read` after 5 minutes, so it get a new `journal::Client`, and hopefully bypasses the issue.
@jshearer jshearer merged commit 8356f99 into master Oct 25, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants