Transform eia861 short form #3565

Closed
Changes from 9 commits
38 commits
e37f59e
updated short_form_eia861.csv
Nancy9ice Apr 13, 2024
2ac0190
updated eia861.py
Nancy9ice Apr 13, 2024
6ce3705
updated pudltabl.py
Nancy9ice Apr 13, 2024
9f84516
updated eia861.py
Nancy9ice Apr 13, 2024
574dbbc
Migration: added core_eia861_short_form
Nancy9ice Apr 14, 2024
a33045a
Revert "Migration: added core_eia861_short_form"
Nancy9ice Apr 16, 2024
dde8ccd
updated fields
Nancy9ice Apr 16, 2024
40831bc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 16, 2024
6a2e5fc
Merge branch 'main' into transform-eia861-short-form
Nancy9ice Apr 16, 2024
ebda8ca
updated fields
Nancy9ice Apr 18, 2024
8fa1ac9
Merge branch 'catalyst-cooperative:main' into transform-eia861-short-…
Nancy9ice Apr 18, 2024
f9011bd
update boolean columns associated with short-form table
Nancy9ice Apr 25, 2024
6992a23
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 25, 2024
da5606c
updated
Nancy9ice Apr 26, 2024
4a8ab46
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 26, 2024
0123d39
updated short_form transformations
Nancy9ice Apr 26, 2024
9c07b29
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 26, 2024
6ec2e2e
updated boolean column associated with short_form table
Nancy9ice Apr 26, 2024
1e2427c
updated boolean columns associated with short_form table
Nancy9ice Apr 26, 2024
fe20029
Merge branch 'catalyst-cooperative:main' into transform-eia861-short-…
Nancy9ice Apr 26, 2024
ec145d1
Migration: added short_form table
Nancy9ice Apr 26, 2024
39428e9
Updated exclusions in core_eia860__scd_utilities
Nancy9ice Apr 26, 2024
d555a80
Updated release_notes.rst
Nancy9ice Apr 26, 2024
55acf8e
Update eia861.py
Nancy9ice May 4, 2024
3972cd4
Made suggested changes
Nancy9ice May 19, 2024
cf51b0e
Merge with main
aesharpe May 22, 2024
1f00d77
Merge branch 'main' into transform-eia861-short-form
aesharpe May 23, 2024
289aaae
Fix alembic
aesharpe May 23, 2024
e444626
fix alembic again
aesharpe May 23, 2024
91a3571
Move changes to EIA860 schema to the newest migration instead of the …
aesharpe May 23, 2024
db8b823
Merge branch 'main' into transform-eia861-short-form
aesharpe May 23, 2024
3aefd2e
Merge branch 'main' into transform-eia861-short-form
zaneselvans May 23, 2024
5a522ca
Fix docstring formatting.
zaneselvans May 23, 2024
c1f79be
Merge branch 'main' into transform-eia861-short-form
zaneselvans May 23, 2024
11dea5d
Merge branch 'main' into transform-eia861-short-form
aesharpe May 27, 2024
8068520
Merge branch 'transform-eia861-short-form' of https://github.com/Nanc…
aesharpe May 27, 2024
6153600
Update pre-commit and dependencies to push as Zane
zaneselvans Jun 2, 2024
c89149d
Add default request timeout.
zaneselvans Jun 2, 2024
8 changes: 8 additions & 0 deletions src/pudl/metadata/fields.py
@@ -892,6 +892,10 @@
        "description": "Annual demand per km2 of a given service territory.",
        "unit": "MWh/km2",
    },
    "demand_side_management": {
        "type": "boolean",
        "description": "Were there strategies or measures used to control electricity demand by customers?",
    },
    "depreciation_type": {
        "type": "string",
        "description": (
@@ -1715,6 +1719,10 @@
        "type": "string",
        "description": "Category of geographic aggregation in EIA bulk electricity data.",
    },
    "green_pricing": {
        "type": "boolean",
        "description": "Was there a green pricing program associated with this utility during the reporting year?",
    },
    "green_pricing_revenue": {
        "type": "number",
        "description": (
31 changes: 31 additions & 0 deletions src/pudl/metadata/resources/eia861.py
@@ -694,6 +694,37 @@
        "sources": ["eia861"],
        "etl_group": "eia861",
    },
    "core_eia861__yearly_short_form": {
        "description": "Abbreviated version of the EIA-861 data.",
        "schema": {
            "fields": [
                "report_date",
                "utility_id_eia",
                "utility_name_eia",
                "entity_type",
                "state",
                "balancing_authority_code_eia",
                "sales_revenue",
                "sales_mwh",
                "customers",
                "water_heater",
                "net_metering",
                "demand_side_management",
                "time_responsive_programs",
                "green_pricing",
                "data_maturity",
            ],
            "primary_key": [
                "utility_id_eia",
                "state",
Member

Is state definitely a primary key?

Contributor Author

> Is state definitely a primary key?

'state' is not a primary key on its own. It is one of the columns that make up the compound primary key: all the fields listed under 'primary_key' together form that compound key. Am I wrong about this?

Member

You're right, I just wanted to check! I looked at it in a notebook and it seems like balancing_authority_code_eia doesn't need to be part of the primary key (the tables are the same length if you drop duplicates without balancing_authority_code_eia). However, it's possible that in the future this column could be part of the primary key. @cmgosnell what do you think here?

                "report_date",
                "balancing_authority_code_eia",
            ],
        },
        "field_namespace": "eia",
        "sources": ["eia861"],
        "etl_group": "eia861",
    },
    "core_eia861__yearly_service_territory": {
        "description": "County FIPS codes for counties composing utility service territories.",
        "schema": {
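The check described in the review thread on primary_key (comparing table length after dropping duplicates with and without balancing_authority_code_eia) can be sketched in pandas roughly as follows. The sample rows and the is_unique_key helper are hypothetical and not part of PUDL; only the column names come from the diff.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the PR.
df = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 2, 2],
        "state": ["CO", "WY", "TX", "TX"],
        "report_date": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2020-01-01", "2021-01-01"]
        ),
        "balancing_authority_code_eia": ["PSCO", "PSCO", "ERCO", "ERCO"],
    }
)

full_key = ["utility_id_eia", "state", "report_date", "balancing_authority_code_eia"]
reduced_key = [col for col in full_key if col != "balancing_authority_code_eia"]


def is_unique_key(frame: pd.DataFrame, cols: list[str]) -> bool:
    """Return True if no two rows share the same values in ``cols``."""
    return not frame.duplicated(subset=cols).any()


# If dropping duplicates on the reduced key keeps the same number of rows
# as the full key, the BA code adds no uniqueness for this particular data.
n_full = len(df.drop_duplicates(subset=full_key))
n_reduced = len(df.drop_duplicates(subset=reduced_key))
ba_is_redundant = n_full == n_reduced
```

A result like this only says the column is redundant for the data at hand; as the reviewer notes, later years could still need it to tell rows apart.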
1 change: 1 addition & 0 deletions src/pudl/output/pudltabl.py
@@ -181,6 +181,7 @@ def _register_output_methods(self: Self):
# eia861 (clean)
"core_eia861__yearly_service_territory": "service_territory_eia861",
"core_eia861__yearly_sales": "sales_eia861",
"core_eia861__yearly_short_form": "short_form_eia861",
"core_eia861__yearly_advanced_metering_infrastructure": "advanced_metering_infrastructure_eia861",
"core_eia861__yearly_demand_response": "demand_response_eia861",
"core_eia861__yearly_demand_response_water_heater": "demand_response_water_heater_eia861",
10 changes: 5 additions & 5 deletions src/pudl/package_data/eia861/column_maps/short_form_eia861.csv
@@ -4,13 +4,13 @@ utility_id_eia,1,1,1,1,1,1,1,-1,1,1,1
utility_name_eia,2,2,2,2,2,2,2,-1,2,2,2
entity_type,-1,-1,-1,3,3,3,3,-1,3,3,3
state,3,3,3,4,4,4,4,-1,4,4,4
ba_code,-1,4,4,5,5,5,5,-1,5,5,5
total_revenue,4,5,5,6,6,6,6,-1,6,6,6
total_sales,5,6,6,7,7,7,7,-1,7,7,7
total_customers,6,7,7,8,8,8,8,-1,8,8,8
balancing_authority_code_eia,-1,4,4,5,5,5,5,-1,5,5,5
sales_revenue,4,5,5,6,6,6,6,-1,6,6,6
sales_mwh,5,6,6,7,7,7,7,-1,7,7,7
customers,6,7,7,8,8,8,8,-1,8,8,8
Member

Maybe this would have consequences for other tables, so we may not want to deal with it right now, but I think this should probably be num_customers.

Contributor Author

In src.pudl.metadata.fields, there's a customers key among the field definitions. There's no num_customers or total_customers. Please confirm that this is correct so I know whether I still need to change this.

Member

What do you mean by "customers key"?

Member

@Nancy9ice what I meant here is that I think we originally did a poor job of naming this column, or that it was named before we had thought about our naming conventions. I was suggesting that we take this opportunity to rename customers, which is pretty vague, to something that provides a bit more context about the column, like customers_num or num_customers. That said, searching through fields.py for other instances of _num and num_, it doesn't seem as if we have established a strong convention right now, so maybe we should just ignore this for now and deal with it in a future round of renaming / deeper EIA-861 integration.

water_heater,-1,8,8,9,9,9,9,-1,9,9,9
net_metering,8,9,9,10,10,10,10,-1,10,10,10
demand_side_management,9,10,10,11,11,11,11,-1,11,11,11
time_based_programs,10,11,11,12,12,12,12,-1,12,12,12
time_responsive_programs,10,11,11,12,12,12,12,-1,12,12,12
green_pricing,7,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
early_release,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
43 changes: 43 additions & 0 deletions src/pudl/transform/eia861.py
@@ -1216,6 +1216,48 @@ def core_eia861__yearly_sales(raw_eia861__sales: pd.DataFrame) -> pd.DataFrame:
    return _post_process(transformed_sales)


@asset(io_manager_key="pudl_io_manager")
def core_eia861__yearly_short_form(
    raw_eia861__short_form: pd.DataFrame,
) -> pd.DataFrame:
    """Transform the EIA 861 Short Form table.
    Transformations include:
    * Drop primary key duplicates.
    * Convert N/Y values to boolean
    * Change NA BA codes to 'UNK'
    """
    idx_cols = [
        "utility_id_eia",
        "state",
        "report_date",
        "balancing_authority_code_eia",
    ]

    bool_cols = [
        "water_heater",
        "net_metering",
        "demand_side_management",
        "time_responsive_programs",
        "green_pricing",
    ]

    raw_sf = _pre_process(raw_eia861__short_form)
    # * fill NA BA values with 'UNK'
    raw_sf["balancing_authority_code_eia"] = raw_sf[
        "balancing_authority_code_eia"
    ].fillna("UNK")

    # * Drop Duplicates based on primary keys
    deduped_sf = _drop_dupes(df=raw_sf, df_name="Short Form", subset=idx_cols)

    # * Make Y/N's into booleans
    logger.info("Performing value transformations on EIA 861 Short Form table.")
    for col in bool_cols:
        deduped_sf[col] = _make_yn_bool(deduped_sf[col])

    return _post_process(deduped_sf)
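
For readers without the PUDL helpers at hand, the three steps in core_eia861__yearly_short_form can be approximated in plain pandas. This is only a sketch under stated assumptions: yn_to_bool is a hypothetical stand-in for _make_yn_bool, DataFrame.drop_duplicates stands in for _drop_dupes, the sample rows are made up, and _pre_process / _post_process are left out because their behavior is not shown in this diff.

```python
import pandas as pd


def yn_to_bool(ser: pd.Series) -> pd.Series:
    """Hypothetical stand-in: map 'Y'/'N' to a nullable boolean Series."""
    # Values other than 'Y' or 'N' become missing (pd.NA).
    return ser.map({"Y": True, "N": False}).astype("boolean")


# Made-up sample rows; the first two are duplicates on the primary key.
raw = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 2],
        "state": ["CO", "CO", "TX"],
        "report_date": pd.to_datetime(["2020-01-01"] * 3),
        "balancing_authority_code_eia": [None, None, "ERCO"],
        "net_metering": ["Y", "Y", "N"],
    }
)

idx_cols = ["utility_id_eia", "state", "report_date", "balancing_authority_code_eia"]

# 1. Fill missing BA codes first, so the key column has no nulls.
raw["balancing_authority_code_eia"] = raw["balancing_authority_code_eia"].fillna("UNK")

# 2. Drop rows that repeat the same primary-key values.
deduped = raw.drop_duplicates(subset=idx_cols).copy()

# 3. Convert the Y/N flag column to booleans.
deduped["net_metering"] = yn_to_bool(deduped["net_metering"])
```

Filling the BA code before deduplicating matters because the column is part of the primary key; presumably the 'UNK' placeholder is there so that key never contains a null.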


@asset(io_manager_key="pudl_io_manager")
def core_eia861__yearly_advanced_metering_infrastructure(
    raw_eia861__advanced_metering_infrastructure: pd.DataFrame,
@@ -2440,6 +2482,7 @@ def core_utility_data_eia861(raw_eia861__utility_data: pd.DataFrame):
"core_eia861__yearly_operational_data_revenue": AssetIn(),
"core_eia861__yearly_reliability": AssetIn(),
"core_eia861__yearly_sales": AssetIn(),
"core_eia861__yearly_short_form": AssetIn(),
"core_eia861__yearly_utility_data_misc": AssetIn(),
"core_eia861__yearly_utility_data_nerc": AssetIn(),
"core_eia861__yearly_utility_data_rto": AssetIn(),