Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Normalize VCE RARE spellings of great lakes. #4029

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

krivard
Copy link

@krivard krivard commented Jan 27, 2025

Overview

Closes #4007 .

What problem does this address?

The names of a couple of the great lakes in the VCE RARE dataset had unusual or creative spellings. This PR normalizes the spelling of Lake Huron and Lake Saint Clair in VCE RARE.

What did you change?

  • New function _spot_fix_great_lakes_values called by _prep_lat_long_fips_df to catch the cell values
  • New function _spot_fix_great_lakes_columns called by one_year_hourly_available_capacity_factor to catch the column names

Documentation

Make sure to update relevant aspects of the documentation.

Tasks

Preview Give feedback

Testing

How did you make sure this worked? How can a reviewer verify this?

  1. Materialize the core_vcerare asset group in Dagster
  2. Examine parquet file out_vcerare__hourly_available_capacity_factor and observe:
    • lake_hurron does not appear in column county_id_fips; lake_huron does appear
    • lake_st_clair does not appear in column county_id_fips; lake_saint_clair does appear

e.g.

import duckdb
from pudl.workspace.setup import PudlPaths

ddb = duckdb.read_parquet(str(PudlPaths().parquet_path("out_vcerare__hourly_available_capacity_factor")))
duckdb.query("""
SELECT county_or_lake_name, COUNT(*)
FROM ddb 
WHERE county_or_lake_name in ('lake_hurron', 'lake_st_clair')
GROUP BY county_or_lake_name""")
duckdb.query("""
SELECT county_or_lake_name, COUNT(*)
FROM ddb 
WHERE county_or_lake_name in ('lake_huron', 'lake_saint_clair')
GROUP BY county_or_lake_name""")

To-do list

Preview Give feedback

@krivard krivard requested a review from jdangerx January 27, 2025 19:13
@krivard krivard self-assigned this Jan 27, 2025
@krivard
Copy link
Author

krivard commented Jan 27, 2025

RTD build is failing due to Dagster removing their intersphinx .inv file; see also dagster-io/dagster#27328

@zaneselvans
Copy link
Member

I've commented out the Dagster intersphinx line on main so if you merge that in the docs build should work again.

Copy link
Member

@jdangerx jdangerx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I think we're probably good to merge without this, but also we might consider adding lakes Hurron and St. Clair to the check_unexpected_counties asset check.

@@ -369,6 +391,7 @@ def check_rows(context: AssetCheckExecutionContext) -> AssetCheckResult:
row_counts = {
"etl_full": 136437000,
"etl_fast": 27287400,
"__ASSET_JOB": 136437000,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FMI: Is this is the default job name when you materialize an asset outside of a job?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could swap this to the annual checks like we do for ferc714

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: New
Development

Successfully merging this pull request may close these issues.

Fix some place names for VCE RARE data
4 participants