Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source-klaviyo: fix incremental pagination #2037

Merged
merged 2 commits into from
Oct 10, 2024
Merged

Conversation

Alex-Bair
Copy link
Contributor

@Alex-Bair Alex-Bair commented Oct 9, 2024

Description:

Incremental streams were using > instead of >= when filtering results with a specific datetime. Although subsequent requests use cursors to paginate through responses, it's possible to miss results within the same second on connector restarts (like for schema inference updates).

Snapshot names have also been updated to align with what's generated automatically by pytest.

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

Tested on a local stack. Confirmed more data is captured using greater-or-equal instead of greater-than in the filter query param.


This change is Reviewable

@Alex-Bair Alex-Bair added the change:unplanned Unplanned change, useful for things like doc updates label Oct 9, 2024
Incremental streams were using `>` instead of `>=` when filtering
results with a specific datetime. Although subsequent requests use
cursors to paginate through responses, it's possible to miss results
within the same second on connector restarts (like for schema inference
updates).
Copy link
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

There's the obvious consideration that rather than missing records, we'll end up with some duplicates, but that seems better than missing some. And there may be scenarios where the capture could get "stuck" if a huge number of records all have the same timestamp? But I'm not sure how likely that is, and it's probably not something that we can practically address right now.

@Alex-Bair
Copy link
Contributor Author

Alex-Bair commented Oct 10, 2024

Agreed, I figured for a quick fix, it'd be better to have some duplicates rather than miss records. And the way the IncrementalKlaviyoStream is set up, I don't think it'll get stuck in a loop. When the connector starts up, it uses the checkpointed datetime for the initial request, then it paginates through the remaining records up to the present with page cursors. Then the datetime of the last emitted record is checkpointed to be used when the connector starts up again. I'm not sure why the connector was built to checkpoint datetime values when Klaviyo does support page cursors.

Changing the checkpointed values to be those page cursors would help reduce duplicates, but threading that through & ensuring nothing breaks is a larger effort that'd take a bit longer than just changing the filter query param.

@Alex-Bair Alex-Bair merged commit f799669 into main Oct 10, 2024
71 of 76 checks passed
@Alex-Bair Alex-Bair deleted the bair/source-klaviyo-gte branch October 10, 2024 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change:unplanned Unplanned change, useful for things like doc updates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants