-
-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #3005 and #3592: Hide null ORCID fields in contributor representation and correct flipped starting/ending balance for FERC1 liabilities #3969
Conversation
For more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there are no changes to the database schema, we don't need to have an alembic migration. This file can be removed.
"beg_yr_bal": "ending_balance", | ||
"end_yr_bal": "starting_balance", | ||
"beg_yr_bal": "starting_balance", | ||
"end_yr_bal": "ending_balance", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
omg we literally just switched them 🤦🏼 Thank you for fixing!
@@ -897,11 +897,19 @@ class Contributor(PudlMeta): | |||
"work package leader", | |||
] = "project member" | |||
organization: String | None = None | |||
orcid: String | None = None | |||
orcid: String | None | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe there's an extra | None
in this attribute definition.
def __repr__(self): | ||
"""Representation excluding None values.""" | ||
attrs = {k: v for k, v in self.__dict__.items() if v is not None} | ||
attr_str = ", ".join(f"{k}={repr(v)}" for k, v in attrs.items()) | ||
return f"{self.__class__.__name__}({attr_str})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the consequences of having omitting the attribute in all cases of None
values will be. Under the datapackage spec it's allowed, but I could also see the existence of those fields being expected on the other end where this is being used in the pudl-archiver
repo -- potentially with the None
value being used to check for completeness of the fields we want to have.
I'm also not sure if __repr__()
is the method being used to output the attributes of the class. This is an immutable Pydandic BaseModel, so getting the JSON representation is probably being done via Contributor.model_dump()
.
If you instantiate a Contributor and use model_dump()
does it behave as expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this and found that redefining __repr__
doesn't affect the output of model_dump()
and it seems like the right place for this to happen is probably over in the pudl-archiver
repository since the serialization behavior we're trying to fix is specific to when we're creating a new datapackage.json
for use on Zenodo.
I went ahead and tried to make a PR that does this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that there's no change to the database schema, and the null ORCID issue is (hopefully) fixed in catalyst-cooperative/pudl-archiver#486 I think we should just merge in the fix to the swapped start/end date columns.
I went ahead and merged in the null ORCID fix in the |
I brought the flipped start/end balance fix into #3979 and the null ORCID fix is made in the |
This PR addresses two distinct issues:
Issue #3005: Ensures that null ORCID fields in contributor do not appear in the datapackage.json file or in class representations.
I added a repr method to the Contributor class that excludes None values in its string representation. This ensures that fields like ORCID are omitted when not present.
Verified that this change applies consistently across all cases where Contributor class is called. This is done by checking the output values for
DataSource.from_id("eia860").contributors
, in which eia860 is replaced with other datasets in src/pudl/metadata/sources.py.Issue #3592: Corrected the flipped starting and ending balances in out_ferc1_yearly_other_regulatory_liabilities_sched278
Changed the FERC1 transformation logic to:
Assign "beg_yr_bal" to "starting_balance"
Assign "end_yr_bal" to "ending_balance"
whereas previously the fields are flipped, which caused the bug.
Verified the change actually worked by running the ETL pipeline and materialized out_ferc1_yearly_other_regulatory_liabilities_sched278, then accessed multiple data rows to confirm that the starting balance of one year is equal to the ending balance of the previous year.
Testing:
I ran make pytest-coverage and got 84 passed, 4 skipped, 6 xfailed, 2 xpassed, 888 warnings.