Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transform eia861 short form #3565

Conversation

Nancy9ice
Copy link
Contributor

@Nancy9ice Nancy9ice commented Apr 14, 2024

Closes #3540.

What problem does this address?
This addresses the lack of the eia861 short_form table

What did you change?

To-do list

  • The column mapping that lets us extract this data from the spreadsheets should be updated to use the same naming conventions as the rest of PUDL. E.g. ba_code should be balancing_authority_code_eia. Most of these columns should already be defined in pudl.metadata.fields. The column mappings are stored in src/pudl/package_data/eia861/column_maps/short_form_eia861.csv
  • A new core_eia861__yearly_short_form() asset will need to be defined in pudl.transform.eia861
  • Columns will need to be cleaned up to have appropriate NA values and to conform to the appropriate data types. See the other transform functions for examples. It looks like there are a number of boolean columns which currently contain Y or N. There may also be columns where only certain values are valid, which need to be cleaned up (e.g. do all state abbreviations actually correspond to a real state?)
  • The new short_form table should be integrated into the processes downstream which compile the table of balancing authorities
  • Either the 2019 short form data will need to be extracted from the other tables where it was reported and added to this one, or the contents of this table will need to be added to all those other tables. EDIT-- this now lives here: Add 2019 data to EIA861 Short Form table #3654
  • If we're adding a new table to the PUDL DB, it will need to have a schema defined in pudl.metadata.resources.eia861

Testing

How did you make sure this worked? How can a reviewer verify this?

I confirmed that this worked after I ran the 'make pytest-coverage' command. @zaneselvans please could you run the build-deploy-pudl GitHub Action so that I'll see if there's any other thing to do in the code?

@zaneselvans
Copy link
Member

Note that 150+ auto-generated RST files that document the API have been commited in this PR. They're not supposed to be part of the repository, and are built as an intermediate step between the Python modules and the HTML output of the docs.

@Nancy9ice
Copy link
Contributor Author

Note that 150+ auto-generated RST files that document the API have been commited in this PR. They're not supposed to be part of the repository, and are built as an intermediate step between the Python modules and the HTML output of the docs.

Okay, I'll backtrack the changes and update the PR

@Nancy9ice Nancy9ice force-pushed the transform-eia861-short-form branch from 5c83ffe to 574dbbc Compare April 16, 2024 07:01
This reverts commit 574dbbc.

remove index files
@Nancy9ice Nancy9ice force-pushed the transform-eia861-short-form branch 2 times, most recently from 6d95595 to a33045a Compare April 16, 2024 07:37
@Nancy9ice Nancy9ice force-pushed the transform-eia861-short-form branch from e7e542b to dde8ccd Compare April 16, 2024 07:41
@Nancy9ice
Copy link
Contributor Author

@zaneselvans I have backtracked the changes and removed the index files that were added before

Copy link
Member

@aesharpe aesharpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @Nancy9ice !

See my comments about column names etc. (Many of which are old things we did poorly and would love for you to help fix).

Other Updates:

  • Add a note to our release_notes.rst file explaining the addition of this table
  • Add this table to the list of foreign_key_rule exceptions in the schema for core_eia860__scd_utilities in metadata/resources/eia860

src/pudl/metadata/fields.py Outdated Show resolved Hide resolved
src/pudl/metadata/fields.py Outdated Show resolved Hide resolved
src/pudl/metadata/resources/eia861.py Outdated Show resolved Hide resolved
src/pudl/metadata/resources/eia861.py Outdated Show resolved Hide resolved
src/pudl/metadata/resources/eia861.py Outdated Show resolved Hide resolved
],
"primary_key": [
"utility_id_eia",
"state",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is state definitely a primary key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is state definitely a primary key?

'state' is not the primary key independently. It is one of the variables that contribute to the compound primary key. All the fields stated under the 'primary_key' all come together to form the compound primary key. Am I wrong about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I just wanted to check! (I looked at it in a notebook and it seems like balancing_authority_code_eia doesn't need to be part of the primary key (the tables are the same length if you drop duplicates without balancing_authority_code_eia) however, it's possible that in the future this column could be part of the primary key? @cmgosnell what do you think here?

src/pudl/transform/eia861.py Outdated Show resolved Hide resolved
@aesharpe
Copy link
Member

Also @Nancy9ice take a look at #3540 if you haven't already and make sure to address the issues there too.

@aesharpe aesharpe added new-data Requests for integration of new data. eia861 Anything having to do with EIA Form 861 labels Apr 17, 2024
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@aesharpe
Copy link
Member

aesharpe commented Apr 23, 2024

Hi @Nancy9ice, thanks for these edits (and sorry it's taken a minute to respond)--

Generally speaking we like to keep our PRs pretty small. When I was suggesting changing the boolean column names, I meant those pertaining to the short_form table not the whole of PUDL. But it's awesome that you gave that a shot!! If they all work nicely it will be a great addition to PUDL. However, it's harder to test for failures when there are lots of different types of changes in one PR. I would recommend just changing the column names in the short_form table and then splitting out the other boolean column name changes into a separate PR. Once you have the column changes in a separate PR, see if you can spin up Dagster and run the ETL. That will tell you where the errors are!

That way we can test run the whole ETL on the column name changes and not have to worry about short form table errors getting in the way (or vice versa).

@Nancy9ice Nancy9ice force-pushed the transform-eia861-short-form branch from bfdd24d to f9011bd Compare April 25, 2024 18:48
@Nancy9ice
Copy link
Contributor Author

Hi @Nancy9ice, thanks for these edits (and sorry it's taken a minute to respond)--

Generally speaking we like to keep our PRs pretty small. When I was suggesting changing the boolean column names, I meant those pertaining to the short_form table not the whole of PUDL. But it's awesome that you gave that a shot!! If they all work nicely it will be a great addition to PUDL. However, it's harder to test for failures when there are lots of different types of changes in one PR. I would recommend just changing the column names in the short_form table and then splitting out the other boolean column name changes into a separate PR. Once you have the column changes in a separate PR, see if you can spin up Dagster and run the ETL. That will tell you where the errors are!

That way we can test run the whole ETL on the column name changes and not have to worry about short form table errors getting in the way (or vice versa).

Okay. I have done the edits. This time, I touched only the columns associated with the short-form table. That is, green_pricing, demand_side_management, water_heater, net_metering, and time_responsive_programs. I also ensured that these column changes were reflected in the other tables that bore these columns too. I'll await your feedback as regards these changes while working on the other changes you suggested under this pull request.

@aesharpe
Copy link
Member

Hey @Nancy9ice I added one little comment, merged in main, and did some alembic gymnastics. What do you think about @zaneselvans's comment above regarding the 2019 data?

@zaneselvans
Copy link
Member

The fact that the 2019 short form data is interwoven with all of the other data is unfortunate, and seems like it'll take a bit of work to untangle. Probably just getting a table with all the other short form data available in the database is a good incremental step forward, and we should write up a separate issue about removing all the 2019 short form data from the other tables and integrating it into the dedicated short-form table to be addressed in the future.

@aesharpe
Copy link
Member

aesharpe commented May 22, 2024

I can write up an issue for the last legs here and move forward with merging this. Might be a good_first_issue for folks.

@Nancy9ice
Copy link
Contributor Author

Hey @Nancy9ice I added one little comment, merged in main, and did some alembic gymnastics. What do you think about @zaneselvans's comment above regarding the 2019 data?

It's a good idea to have a separate issue for that because it would require series of steps just like this issue

@zaneselvans
Copy link
Member

It looks like there's some alembic migration / database schema issues in here that need to be fixed. I tried running the tests locally and got the same multiple heads problem that's causing the unit tests to fail in CI.

@Nancy9ice
Copy link
Contributor Author

It looks like there's some alembic migration / database schema issues in here that need to be fixed. I tried running the tests locally and got the same multiple heads problem that's causing the unit tests to fail in CI.

Can I get the full error so I'll see what's wrong?

@zaneselvans
Copy link
Member

@Nancy9ice I think Austen has fixed the alembic issue, but for future reference all of the checks that run on a PR provide "Details" links to the logs generated by that check, and it's a good idea to click through and take a look at what happened when one of them fails.

I'm wondering if you've installed our git pre-commit hooks since it looks like the Ruff linter is failing within the pre-commit.ci check, and that (along with the unit tests) should be getting run by the pre-commit hooks automatically whenever you try to make a commit locally.

@zaneselvans
Copy link
Member

zaneselvans commented May 23, 2024

EDIT sorry, this was my bad. I hadn't rematerialized the furthest upstream extracted EIA-861 assets and there were some column name changes.

Now I'm getting a KeyError when attempting to materialize the new core_eia861__yearly_short_form table.

dagster._core.errors.DagsterExecutionStepExecutionError: Error occurred while executing op "core_eia861__yearly_short_form":

...

KeyError: 'balancing_authority_code_eia'

  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/dagster/_core/execution/plan/utils.py", line 54, in op_execution_error_boundary
    yield
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/dagster/_utils/__init__.py", line 468, in iterate_with_context
    next_output = next(iterator)
                  ^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/dagster/_core/execution/plan/compute_generator.py", line 141, in _coerce_op_compute_fn_to_iterator
    result = invoke_compute_fn(
             ^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/dagster/_core/execution/plan/compute_generator.py", line 129, in invoke_compute_fn
    return fn(context, **args_to_pass) if context_arg_provided else fn(**args_to_pass)
                                                                    ^^^^^^^^^^^^^^^^^^
  File "/Users/zane/code/catalyst/pudl/src/pudl/transform/eia861.py", line 1247, in core_eia861__yearly_short_form
    raw_sf["balancing_authority_code_eia"] = raw_sf[
                                             ^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/pandas/core/frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 3812, in get_loc
    raise KeyError(key) from err

...

@Nancy9ice
Copy link
Contributor Author

@Nancy9ice I think Austen has fixed the alembic issue, but for future reference all of the checks that run on a PR provide "Details" links to the logs generated by that check, and it's a good idea to click through and take a look at what happened when one of them fails.

I'm wondering if you've installed our git pre-commit hooks since it looks like the Ruff linter is failing within the pre-commit.ci check, and that (along with the unit tests) should be getting run by the pre-commit hooks automatically whenever you try to make a commit locally.

Oh, I think I skipped that. I'll add that now.

@aesharpe
Copy link
Member

Here's the issue for the 2019 data! #3654

@aesharpe
Copy link
Member

Now I'm getting a KeyError when attempting to materialize the new core_eia861__yearly_short_form table.

I was able to successfully materialize the core_eia861__yearly_short_form table and all it's predecessors. Not sure what might be going on here.

@zaneselvans
Copy link
Member

These changes have been merged in via #3660

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
eia861 Anything having to do with EIA Form 861 new-data Requests for integration of new data.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Transform the EIA-861 short form table
3 participants