-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Connector Registration and Dynamic Dispatch #1375
Conversation
d4f5a7e
to
b92c85b
Compare
I like this a lot. I had taken a slightly different approach in #1282 and while there are things I like about my approach, as written it still tethers too closely to ingest. There are things I like about keeping it more object oriented (going for a method where we parse a |
So then this also aligns with where ingest should try to move towards - mainly, getting all the "template"/"config" pydantic models into |
I like this too. Is it possible to define required/optional args at a connector level rather than unpacking kwargs and receiving an error when pushing/pulling? Or would it conflict with the underlying ABC class? Thinking of having some level of type safety |
b92c85b
to
027e905
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1375 +/- ##
==========================================
- Coverage 70.74% 70.50% -0.25%
==========================================
Files 119 124 +5
Lines 6041 6093 +52
Branches 706 702 -4
==========================================
+ Hits 4274 4296 +22
- Misses 1617 1650 +33
+ Partials 150 147 -3 ☔ View full report in Codecov by Sentry. |
be2cfcb
to
24e1a06
Compare
@fvankrieken @sf-dcp I've changed things to find a nice potential middle-ground where we can type-safety, by just binding/configuring connectors at the lifecycle level. The idea being that for |
24e1a06
to
2a05958
Compare
e402142
to
d86b864
Compare
4099b97
to
72c281f
Compare
@damonmcc FYI, my big refactor of the I also need to hook this up to GHA. But... as of now, there's not a single reference to |
d880a61
to
f7fe1da
Compare
@@ -87,10 +87,10 @@ def get_datasets_by_id(self) -> dict[str, DatasetMetadata]: | |||
def query_destinations( | |||
self, | |||
*, | |||
datasets: set[str] | None = None, | |||
datasets: frozenset[str] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
immutability helps the ol' typechecker.
f7fe1da
to
30206f8
Compare
@damonmcc I think I'm pretty much done here. Feel free to go ahead and implement that FTP connector. Or we can just review this and merge. Another real connector might give us a better sense of what we actually need though. |
30206f8
to
fb38653
Compare
a lotta changes here so let's get it merged and I'll add FTP in a follow-up. I'll review and let you know if a walkthrough would be helpful shall I rebase and |
fb38653
to
7c41cb6
Compare
7c41cb6
to
23bc075
Compare
also add mock dispatcher for FTP to illustrate the concept
23bc075
to
305f3f8
Compare
@damonmcc cool, marked as ready for review. Happy to walk through in person, if anything is murky. It's definitely a shotgun surgery PR. |
@@ -43,9 +52,17 @@ jobs: | |||
image: nycplanning/build-base:latest | |||
env: | |||
PUBLISHING_BUCKET: edm-publishing | |||
RECIPES_BUCKET: edm-recipes | |||
PRODUCT_METADATA_REPO_PATH: product-metadata | |||
_TYPER_STANDARD_TRACEBACK: 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this disables the use of rich
for error messages. are "pretty" tracebacks in github action logs hard to read when distributing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the context variables from typer are humongous in this case, so it's a real pain to track back in the logs. Honestly, I really dislike having the full context all the time. It might be nice to tie this to the to log-level. (ie enable rich for debug level logs, but not for info)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! separate from this, would love to set our test coverage to ignore CLI code and then revisit coverage of stuff like package_and_distribute()
Implements dynamic (ie non-static / hardcoded) dispatcher for our connectors, at the lifecycle level. The idea is that if you define an destination type as, say,
ginger_ftp
as belowthen the distribution code should delegate the push to a connector registered as
ginger_ftp
.Impetus
This approach would enable others to register custom connectors. E.g. if another agency needed a
snowflake
connector, they could just register one. (example in the test case). We can implement something similar for ingest and package.Changes
lifecycle.distribute
to be a generic distributor (ie there should be nothing specific to Socrata or BYTES). The distributor should delegate push/pull requests to whatever connector is registered by the dispatcher.lifecycle.scripts
.Other Notes
distribute.to_dataset_destination
feels really nice. However, there's some work to be done here, as some of the methods I added have a stutter, e.g.package.validate_package(...)
. It would be nice to just callpackage.validate(...)
BUT there's also apackage.validate
module so we'd have a clash. There are a few ways around this, e.g. renaming the file topackage/_validate.py
. Open to thoughts!Unpack
for kwargs, but I'm not convinced it's actually all that helpful here. I might be nicer to just use a class.dcpy/test/models/test_connectors.py
is mostly for my sake to understand what variance/covariance mean for typing in the context of protocols vs regular functions. I'm happy to just delete it before merging.TODOS
ConnectorDispatcher
for result types.lifecycle.scripts.package_and_distribute
has a little more logic about distribution than it ought to. I started moving this intolifecycle.distribute
, but it got a little messy.Minor Changes
distribute
and other timesdistribution
. Standardize ondistribute
.Successful Job showing what a failure looks like:
https://github.com/NYCPlanning/data-engineering/actions/runs/13018821848