Should the SDK accommodate for differing start_date
input formats
#855
Replies: 1 comment 1 reply
-
@aaronsteers thanks for starting this discussion! I'm sold on a global However, we do allow arbitrary data types in replication keys and that can lead to unexpected behavior when using The solution seems to be to limit use of Note that we will need to communicate this as a breaking change. Finally, we can't really limit divergence because, in my opinion, it's ultimately caused by the myriad API "standards" in the wild. The Singer SDK should at least offer the reusable building blocks that make it easier for the community to debug and contribute to any tap. I think of frameworks like Django: if a developer is familiar with the standard project layout and framework APIs, they can jump from one project to another and hit the ground running. This facilitates adoption, forking, etc. |
Beta Was this translation helpful? Give feedback.
-
I read "should" as a hard requirement and I have some strong opinions as of now about allowing integer datetime values for the standard
start_date
config input. I did not know we had any cases in the wild already where start_date was epoch/unix timestamp based and my preference is to unwind these or implement the custom fixes/overrides there in those taps.Again, because
start_date
format is actually dictated by the spec (according to my reading, at least), orchestrators and users would reasonably expect to be able to send a consistent stringified date value to all taps in a project. If we diverge on this point and encourage tap developers to diverge, then we break assumptions in a way that will create silent failures and buggy behavior for our end users. For instance, trying to do an alpha-based greater-than-or-less-than comparison between a stringified integer and a stringified datetime will not fail but it will more likely just produce a wrong result. Many non-SDK taps which we cannot control are simply running a naive string comparison, which would work for the suggested standard and fail silently when mixed and matched with an epoch/integer format.I'm not going to say tap developers can't go against the standard, but I think the SDK should keep with the safe prescriptive guidance here, along a path that provides the most consistent experience for users.
Sidebar: in a past life my own team's CI/CD environments calculated dynamic date values of "yesterday" or "48 hours ago" for faster EL tests and we sent the same input to all taps in our project. I also have return a stack overflow answer prescribing the same and giving sample code to create an environment variable containing the correct string format using Linux or Mac. In theory, this is 100% safe according to spec but our solution would have broken for any cases where the expected input format differed across taps.
I think there's value in preserving ability to have a "project-wide" and Singer-wide filtering capability using a consistent convention - again for users and also for orchestrators.
Last point, there may be intuitions to tightly couple the tap config interface to the upstream API's behavior and I would strongly discourage us from that approach. For one, it implies that the API behaviors are monolithic, where in fact each stream in a tap (or even future-added streams to the tap) is allowed to have a completely different internal data type expectation. By keeping with a standard recommendation on the config/UX side, we keep the external user interface the same regardless of internals of the upstream APIs.
Secondly, the tap users (and downstream targets!) should be reasonably isolated from internal storage preferences of the distinct quirks and idioms of every upstream API. If something should arrive and be stored as a datetime, we should send it to the target as such.
Lastly, to be clear, I'm not saying the standard stringified date format is better or more preferable to epoch or Unix time - but given that the choice is already made, and given there's a high cost of supporting different taps on different standards, I do not think we should accommodate or support a divergence in this area.
Beta Was this translation helpful? Give feedback.
All reactions