Fixes #3005 and #3592: Hide null ORCID fields in contributor representation and correct flipped starting/ending balance for FERC1 liabilities #3969

yolandazzz13 · 2024-11-19T02:27:38Z

This PR addresses two distinct issues:

Issue #3005: Ensures that null ORCID fields in contributor do not appear in the datapackage.json file or in class representations.

I added a repr method to the Contributor class that excludes None values in its string representation. This ensures that fields like ORCID are omitted when not present.
Verified that this change applies consistently across all cases where Contributor class is called. This is done by checking the output values for DataSource.from_id("eia860").contributors, in which eia860 is replaced with other datasets in src/pudl/metadata/sources.py.

Issue #3592: Corrected the flipped starting and ending balances in out_ferc1_yearly_other_regulatory_liabilities_sched278

Changed the FERC1 transformation logic to:
Assign "beg_yr_bal" to "starting_balance"
Assign "end_yr_bal" to "ending_balance"
whereas previously the fields are flipped, which caused the bug.

Verified the change actually worked by running the ETL pipeline and materialized out_ferc1_yearly_other_regulatory_liabilities_sched278, then accessed multiple data rows to confirm that the starting balance of one year is equal to the ending balance of the previous year.

Testing:
I ran make pytest-coverage and got 84 passed, 4 skipped, 6 xfailed, 2 xpassed, 888 warnings.

For more information, see https://pre-commit.ci

zaneselvans · 2024-11-21T04:17:14Z

migrations/versions/a56fa0739cd7_add_my_cool_table.py

Since there are no changes to the database schema, we don't need to have an alembic migration. This file can be removed.

zaneselvans · 2024-11-21T04:17:44Z

src/pudl/transform/params/ferc1.py

-                    "beg_yr_bal": "ending_balance",
-                    "end_yr_bal": "starting_balance",
+                    "beg_yr_bal": "starting_balance",
+                    "end_yr_bal": "ending_balance",


omg we literally just switched them 🤦🏼 Thank you for fixing!

zaneselvans · 2024-11-21T04:29:35Z

src/pudl/metadata/classes.py

@@ -897,11 +897,19 @@ class Contributor(PudlMeta):
        "work package leader",
    ] = "project member"
    organization: String | None = None
-    orcid: String | None = None
+    orcid: String | None | None = None


I think maybe there's an extra | None in this attribute definition.

zaneselvans · 2024-11-21T04:41:38Z

src/pudl/metadata/classes.py

+    def __repr__(self):
+        """Representation excluding None values."""
+        attrs = {k: v for k, v in self.__dict__.items() if v is not None}
+        attr_str = ", ".join(f"{k}={repr(v)}" for k, v in attrs.items())
+        return f"{self.__class__.__name__}({attr_str})"


I'm not sure what the consequences of having omitting the attribute in all cases of None values will be. Under the datapackage spec it's allowed, but I could also see the existence of those fields being expected on the other end where this is being used in the pudl-archiver repo -- potentially with the None value being used to check for completeness of the fields we want to have.

I'm also not sure if __repr__() is the method being used to output the attributes of the class. This is an immutable Pydandic BaseModel, so getting the JSON representation is probably being done via Contributor.model_dump().

If you instantiate a Contributor and use model_dump() does it behave as expected?

I tested this and found that redefining __repr__ doesn't affect the output of model_dump() and it seems like the right place for this to happen is probably over in the pudl-archiver repository since the serialization behavior we're trying to fix is specific to when we're creating a new datapackage.json for use on Zenodo.

I went ahead and tried to make a PR that does this

zaneselvans

Given that there's no change to the database schema, and the null ORCID issue is (hopefully) fixed in catalyst-cooperative/pudl-archiver#486 I think we should just merge in the fix to the swapped start/end date columns.

zaneselvans · 2024-11-30T18:46:43Z

I went ahead and merged in the null ORCID fix in the pudl-archiver repo.

zaneselvans · 2024-11-30T20:57:07Z

I brought the flipped start/end balance fix into #3979 and the null ORCID fix is made in the pudl-archiver repo so I'm going to close this. Thank you @yolandazzz13 for moving this fixes forward!

yufeng zhou and others added 5 commits November 16, 2024 20:46

Migration: added my cool table

b42a9de

solved issues catalyst-cooperative#3592 and catalyst-cooperative#3005

6ec10e3

Merge branch 'main' into main

d9ea1b2

[pre-commit.ci] auto fixes from pre-commit.com hooks

ab2d144

For more information, see https://pre-commit.ci

Merge branch 'main' into main

923d843

zaneselvans reviewed Nov 21, 2024

View reviewed changes

zaneselvans assigned yolandazzz13 Nov 21, 2024

This was linked to issues Nov 21, 2024

If ORCID null, don't include in datapackage #3005

Closed

Flipped starting/ending balance in out_ferc1_yearly_other_regulatory_liabilities_sched278 #3592

Closed

zaneselvans mentioned this pull request Nov 27, 2024

Exclude null values from contributor serialization catalyst-cooperative/pudl-archiver#486

Merged

Merge branch 'main' into main

d6951ab

zaneselvans requested changes Nov 27, 2024

View reviewed changes

zaneselvans mentioned this pull request Nov 30, 2024

Correct flipped starting/ending balance column rename. #3979

Merged

zaneselvans closed this Nov 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes #3005 and #3592: Hide null ORCID fields in contributor representation and correct flipped starting/ending balance for FERC1 liabilities #3969

Fixes #3005 and #3592: Hide null ORCID fields in contributor representation and correct flipped starting/ending balance for FERC1 liabilities #3969

yolandazzz13 commented Nov 19, 2024

zaneselvans Nov 21, 2024

zaneselvans Nov 21, 2024

zaneselvans Nov 21, 2024

zaneselvans Nov 21, 2024

zaneselvans Nov 27, 2024

zaneselvans left a comment

zaneselvans commented Nov 30, 2024

zaneselvans commented Nov 30, 2024

Fixes #3005 and #3592: Hide null ORCID fields in contributor representation and correct flipped starting/ending balance for FERC1 liabilities #3969

Fixes #3005 and #3592: Hide null ORCID fields in contributor representation and correct flipped starting/ending balance for FERC1 liabilities #3969

Conversation

yolandazzz13 commented Nov 19, 2024

zaneselvans Nov 21, 2024

Choose a reason for hiding this comment

zaneselvans Nov 21, 2024

Choose a reason for hiding this comment

zaneselvans Nov 21, 2024

Choose a reason for hiding this comment

zaneselvans Nov 21, 2024

Choose a reason for hiding this comment

zaneselvans Nov 27, 2024

Choose a reason for hiding this comment

zaneselvans left a comment

Choose a reason for hiding this comment

zaneselvans commented Nov 30, 2024

zaneselvans commented Nov 30, 2024