-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Initialize max_replication_key_value
via SELECT max(<replication_key>) ...
before starting a native BATCH
sync
#976
Comments
@edgarrmondragon I guess incremental replication with |
_sync_batch
does not handle STATE
Yeah, so the implementation of
They should be called at some point, but the latter relies on record-by-record sync so it's not useful really... |
Makes sense 🙂 In terms of the mechanism to retrieve the bookmark (because it cannot be automatically inferred from the batch itself without decompressing and reading the last line of the last file 😱), what do you think of allowing the user to return it? Something like: for encoding, manifest, state in self.get_batches(batch_config, context):
... where |
@kgpayne - thanks for raising this! @edgarrmondragon and I spoke about it today in our regular 1:1. What about this approach?
Other notes:
|
_sync_batch
does not handle STATE_sync_batch
does not handle STATE
_sync_batch
does not handle STATE
STATE
support for BATCH
messages in SQLStreams
@aaronsteers approach sounds good 👍 What are the next steps? Would you like me to work on this, or will we ship |
@kgpayne - Let's ship batch messaging as soon as it's available on tap- and target-snowflake. We need the example implementation as soon as possible for community members to reference. If we can put placeholders, at discussed, that would light up based on upstream SDK capability getting implemented, that would be ideal. Code comments and a note in the readme, with cross-links to the logged issue(s), would be much appreciated. |
@aaronsteers based on your comment above, how does:
square with #1011? In my mind, they are the same feature - i.e. min, max and limit values are passed in the It would be good to get to a complete spec for @edgarrmondragon FYI |
@kgpayne - Following from our past conversations, I logged a new proposal that would suggests a This would eliminate the need for the developer directly calling This also bakes in the |
@kgpayne and @edgarrmondragon - The above discussion regarding how to communicate constraints to The critical path item here, I think, is to something like the following to capture the
Does this sound right? @kgpayne - If this sounds reasonable, feel free to pick up as you have availability. Delivery sooner will mean less rework for those who build on the |
STATE
support for BATCH
messages in SQLStreamsmax_replication_key_value
via SELECT max(<replication_key>) ...
before starting BATCH
sync
max_replication_key_value
via SELECT max(<replication_key>) ...
before starting BATCH
syncmax_replication_key_value
via SELECT max(<replication_key>) ...
before starting a native BATCH
sync
@kgpayne - Do I remember correctly that you had this fixed, either in SDK or in the Snowflake implementation? |
@aaronsteers I think you might be thinking of this PR 🙂: |
@kgpayne Thanks. I didn't realize until now that you'd added the commits into the existing PR. Definitely we should try to merge that and get into the SDK. I'll see if I can get is merged while you are out next week, and maybe Derek can assist with adapting to the SDK. |
As the fix (linked above) has now merged into |
What's the progress of this? |
ping @kgpayne since this is assigned to you. I've added it to the SDK v1 issue as well. |
@tayloramurphy @luisvicenteatprima this is implemented in @edgarrmondragon FYI this was one of my next up which would be good to hand over 🐣 |
This is on my TODO list for the week. Update: PR in #1894 |
Any progress on this? There haven't been any updates in the PR for almost 2 months. |
Bumping to see if there's any progress on this? Thank you! |
UPDATE (from @aaronsteers) per #976 (comment):
Since this is high urgency, I took an initial pass over the break here on tap-snowflake:
That work can be migrated here but I wanted to start with a 'real' implementation for testing and to prove the design approach.
Details
Singer SDK Version
0.10.0
Python Version
3.8
Bug scope
Taps (catalog, state, stream maps, etc.)
Operating System
macOS
Description
When passing previous state to a tap with batch mode enabled, that state is not made available via the documented
Stream.get_starting_timestamp()
andStream.get_starting_replication_key_value()
methods.I believe this is because of missing setup of a
starting_replication_value
, which the methods above depend on to retrieve state. This is easiest to see when comparing theStream._sync_records()
andStream._sync_batches()
methods onsinger_sdk.streams.core.Stream
(snippets below). I think the critical missing call isself._write_starting_replication_value(current_context)
.https://github.com/meltano/sdk/blob/main/singer_sdk/streams/core.py#L1034
Code
The text was updated successfully, but these errors were encountered: