-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lebovits/issu1015 cleanup new pipeline #1037
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
hey @rmartinsen if you have the bandwidth, it'd be a huge help to get a PR review from you, specifically looking at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall there are lots of improvements here. Adding ruff, docstrings and config are all nice changes to make to the code. I left some comments about structuring the code. Adding some helper functions and consolidating error handling could really improve readability and maintainability going forward.
I don't have enough context to really speak on the logic, but none of the comments I left as dealbreakers, so take them or leave them as you see appropriate.
data/src/main.py
Outdated
END IF; | ||
END $$; | ||
""") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the situation where a table exists without having this column? Ideally table creation would be isolated from data processing. We could consider adding the column in opa_properties
, using a "add_create_date" flag if there cases where we don't want to create the date.
For sure! I left some comments. None of them are that major, but might help structure the code a little better. Please let me know if any of it doesn't make sense. I'm happy to clarify. I also want to emphasize that the code is perfectly fine without these changes so feel free to push back or ignore them. |
Made a bunch of progress in these latest commits--I've modularized the data loading and database connections that were previously in |
Q to consider before finalizing: how will tsdb handle a new column added to an existing table? Will it gracefully add the column and impute NAs in previous timestamps of the table? Or will it recreate the entire table? |
Major cleanup of new ETL pipeline:
pip
dependenciesruff
priority_level
calculation to use z-scores instead of percentilespg_stat
reporting to slack channel to monitor table sizesA couple of outstanding items that need to be done:
data-diff
and diff reporting to Slack- [ ] Make sure to notify the FE team that the new tiles will have a new name
- [ ] If possible, make sure that this and other relevant data are written to subdirs in GCS, per that old ticket from Nico