-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
source-zendesk-support: fix Groups and Organizations streams #2007
Conversation
""" | ||
Converts a date-time formatted string to a millisecond UNIX timestamp. | ||
""" | ||
return int(datetime.fromisoformat(dt_str).timestamp()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is actually going to return seconds - the float result has the integer part as seconds, and the fractional part as microseconds. We have similar helpers elsewhere that account for this to get millisecond epoch times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, thanks for catching that. I was wrong in that function description. Zendesk's timestamps are in seconds, not milliseconds. I'll update the function description to say "...to a UNIX timestamp in seconds".
Also, the reason I stored the state as a date-time formatted string instead of an actual datetime
objects was because when I retrieved the state after restarting the task, the retrieved state wasn't a datetime
object but was the ISO formatted string. I'll update the helper function names to be clearer that they're accepting/returning a string and not a datetime
object.
""" | ||
API docs: https://developer.zendesk.com/api-reference/ticketing/ticket-management/incremental_exports/#incremental-organization-export | ||
|
||
Note: This stream theoretically can get stuck in a loop if 1000+ organizations are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker, but per the API docs it kind of implies that this won't actually be possible, since it sounds like it'll keep returning results until one of them has a different timestamp?
... The 1000-item limit may be exceeded if items share the same timestamp ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the API docs do make it seem like it wouldn't be an issue. But we have seen other streams that used this time-based incremental export method get stuck in a loop if 1000+ resources are updated at the same time (ex: we had to move the Tickets
stream to a different pagination method because it was stuck requesting the same timestamp - PR was #1789).
I don't remember the exact number of tickets that were updated at the same time in that instance, but I think it was multiple thousands. I suspect there's a limit to how much higher that 1000-item limit can go. So if there are more resources updated in the same second than 1000 + whatever the flex amount is, the stream will get stuck.
58ce1d6
to
221b14a
Compare
…l time-based export The Organizations stream was previously using page numbers to retrieve results from Zendesk. This is an older pagination method for Zendesk Support, and we can only retrieve 10,000 documents using this endpoint. After 10k documents, Zendesk cuts us off. This meant that we weren't retrieving all organizations for users with more than 10k organizations. We can switch the Organizations stream to use "time-based" incremental exports. Instead of a page number, we use a specific start time to request all organizations that have been updated at or after that start time. Using this pagination strategy does have the potential downside of getting stuck in a loop if 1000+ organizations are updated at the exact same time, but there are not currently better ways to retrieve incremental updates for organizations. The new `SourceZendeskSupportIncrementalTimeExportStream` is an improvement on the existing `SourceZendeskSupportIncrementalExportStream` in almost every way. At a later date, we could consider refactoring out `SourceZendeskSupportIncrementalExportStream` since the new class should serve the same purpose even better. Doing so would require a lot of time intensive refactoring & inheritance un-tangling that are so I've left that for future work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
221b14a
to
387feb6
Compare
Description:
The
Groups
andOrganizations
streams were missing data. This PR addresses the issues causing these streams to miss data.Groups
Deleted groups were not being retrieved previously. By including the
exclude_deleted
query param and setting it tofalse
, the Zendesk Support API is now returning deleted groups.Organizations
Only 10k organizations were being retrieved previously. The
Organizations
stream was using page numbers to paginate through results. This is an older, but still supported, pagination method for Zendesk Support, and they only allow you to paginate through 10k results even if there are more. Using the page number based pagination method was causing us to only retrieve 10k documents and miss any remaining ones.I've updated the
Organizations
stream to use Zendesk's time-based incremental export endpoint. This endpoint paginates by providing a start time, and all resources created/updated at or after that start time are returned. This lets us retrieve all resources instead of just a subset. This pagination strategy does have the potential downside of getting stuck in a loop if 1000+ (the max number of results per page) resources are updated at the exact same time, but there aren't currently better ways to retrieve incremental updates forOrganizations
.Capture snapshot changes are expected. The new time-based incremental export endpoint adds a
deleted_at
property to each organization resource.Workflow steps:
(How does one use this feature, and how has it changed)
Documentation links affected:
(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)
Notes for reviewers:
Tested on a local stack with a Zendesk account that had deleted groups and more than 10k organizations. Confirmed:
All
Organizations
bindings for existing tasks need to be backfilled after merging to clear out previous state.This change is