Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply naming convention to straggler assets #3052

Closed
21 of 22 tasks
Tracked by #3061
bendnorman opened this issue Nov 15, 2023 · 11 comments · Fixed by #3294
Closed
21 of 22 tasks
Tracked by #3061

Apply naming convention to straggler assets #3052

bendnorman opened this issue Nov 15, 2023 · 11 comments · Fixed by #3294
Assignees
Labels
dagster Issues related to our use of the Dagster orchestrator
Milestone

Comments

@bendnorman
Copy link
Member

bendnorman commented Nov 15, 2023

Oofta! I thought #3029 was the last renaming PR but it turns out there are a few straggler assets that haven't been renamed.

Tasks

Preview Give feedback

None of these assets are currently saved to the database. Given no one currently expects these to be in the database, I propose we hold off on renaming these assets so we can merge #2818 and start doing data-only releases ASAP.

Questions

  • What is the frequency of the state_average_fuel_costs_eia asset?
  • Should the exploded assets be a part of the core of our output layer?
  • Should the exploded assets be written to the database?
  • What should the naming convention be for these exploded assets?

For @bendnorman: I did some exploded asset renaming in #2914

@bendnorman bendnorman moved this from New to Icebox in Catalyst Megaproject Nov 16, 2023
@bendnorman bendnorman changed the title Rename straggler assets Apply naming convention to straggler assets Nov 16, 2023
@jdangerx jdangerx moved this from Icebox to Backlog in Catalyst Megaproject Jan 2, 2024
@zaneselvans zaneselvans added the dagster Issues related to our use of the Dagster orchestrator label Jan 2, 2024
@zaneselvans
Copy link
Member

I don't know if we want to deal with it now, but there are also a bunch of columns that do not conform to our column naming conventions that I think we should rename at some point. E.g. the rolling averages for CapEx etc in the FERC 1.

I think the place to start would be going through fields.py to ensure that there aren't any obvious wierdos. Identifying any fields that appear in that module which aren't part of any resource and purging them might be a good idea.

@zaneselvans zaneselvans added this to the v2024.01 milestone Jan 12, 2024
@bendnorman bendnorman self-assigned this Jan 16, 2024
@zaneselvans
Copy link
Member

The Census DP1 naming isn't great. It really contains state, county, and tract data (in 3 different tables / assets). If we're renaming things it seems like now would be a good time to shift it to just census dp1?

@cmgosnell
Copy link
Member

Should the exploded assets be a part of the core of our output layer?

definitely output! they are so weird and modified.

Should the exploded assets be written to the database?

yes (probably)! but we've been kinda waiting to publish them. Right now the main remaining step in the way of publishing them is actually finalizing a fourth rate base table that is much more explainable and user facing. I want to wait to actually publish the current three exploded tables until we have this rate base table because really most users should never use the exploded tables and we can point them to the rate base tables. I believe rmi-ers will want access to the three exploded tables.

What should the naming convention be for these exploded assets?

out_ferc1__yearly_exploded_{core seed table base name}:

  • out_ferc1__yearly_exploded_balance_sheet_assets
  • out_ferc1__yearly_exploded_balance_sheet_liabilities
  • out_ferc1__yearly_exploded_income_statement

Notes: i think the "core seed table base name" here should not include the schedule number because these tables include multiple tables with multiple schedule numbers.

@cmgosnell
Copy link
Member

hm but maybe exploded is too... idk not informative.. maybe it should be _detailed or _granular_break_down (omigosh that's long) or something? the gist is it takes a super high level data point like total assets and converts that into as much detail as is possible via the calculations/connections between records that is embedded in the FERC data. We've been calling it explosion bc it explodes a small set of info into a large, more granular view. But that might be too colloquial.

@bendnorman
Copy link
Member Author

I like detailed or granular! IDK if those are too vague but probably more informative than exploded.

@bendnorman bendnorman moved this from Backlog to In progress in Catalyst Megaproject Jan 24, 2024
@bendnorman bendnorman linked a pull request Jan 25, 2024 that will close this issue
@bendnorman bendnorman moved this from In progress to In review in Catalyst Megaproject Jan 25, 2024
@cmgosnell
Copy link
Member

I think i like detailed!

  • out_ferc1__yearly_detailed_balance_sheet_assets
  • out_ferc1__yearly_detailed_balance_sheet_liabilities
  • out_ferc1__yearly_detailed_income_statement

@cmgosnell
Copy link
Member

cmgosnell commented Jan 25, 2024

also i don't love the bare dp1 for the census... mostly bc idk what that means. i can look it up but it doesn't tell me anything as is. how about raw_census__demographics or raw_census__demographics_dp1 in the same spirit of adding FERC1 schedules to the end of those asset names.

DP-1. Profiles of General Demographic Characteristics

also does it matter that this asset is not a table? the sql part of it does communicate something that i personally find helpful bc without that it looks like it is just a table like the majority of the assets.

@bendnorman
Copy link
Member Author

I agree a more descriptive name would be helpful. How about raw_census__demographic_profiles_dp1?

I'm not sure I understand your question. Are you asking why raw_census__dp1 is not written to the database?

@bendnorman
Copy link
Member Author

I also like detailed! One issue is that there are ~130 references to "exploded" in the code. Should we create a separate PR for renaming all instances of exploded to detailed?

@cmgosnell
Copy link
Member

I like raw_census__demographic_profiles_dp1!

I'm not sure I understand your question. Are you asking why raw_census__dp1 is not written to the database?

I mean this isn't actually a table its some weird reference to a sqlite db right?

and i personally think its fine if we keep calling it explode or explosion in the code. especially as a verb i think it makes sense. but very open to converting away from the explosion language

@bendnorman
Copy link
Member Author

Oh I see. The raw_census__demographic_profiles_dp1 asset returns a path to the database. We have some other assets that don't return dataframes. I think to be consistent we should leave the storage method out of the asset name for now.

Would you be ok with having the tables use detailed but the code use exploded for a bit?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dagster Issues related to our use of the Dagster orchestrator
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants