Fix validation `test_fbp_ferc1_mismatched_fuels` error #3025

cmgosnell · 2023-11-07T22:03:42Z

PR Overview

We were getting the following error:

FAILED test/validate/fbp_ferc1_test.py::test_fbp_ferc1_mismatched_fuels[ferc1_annual] - AssertionError: Too many records (9.69%) have mismatched primary fuel types.

I determined that this was happening bc the nulls in the fuel table were somehow getting converted to "nan":

first solution:

A part of the pudl.analysis.classify_plants_ferc1.fuel_by_plant_ferc1 step does convert nulls to "" and there is a post-processing step that we were already applying (pudl.analysis.classify_plants_ferc1.revert_filled_in_string_nulls). I simply added "nan" in here.

Still tbh do no know why these "nan"s were all of a sudden appearing but hey.. they are gone now 👀

current solution:

it looooks as if the "nan" is added in right here...

for fuel_str in fuel_categories:
    try:
        mmbtu_mask = df[f"{fuel_str}_fraction_mmbtu"] > thresh
        df.loc[mmbtu_mask, "primary_fuel_by_mmbtu"] = fuel_str
    except KeyError:
        pass

    try:
        cost_mask = df[f"{fuel_str}_fraction_cost"] > thresh
        df.loc[cost_mask, "primary_fuel_by_cost"] = fuel_str
    except KeyError:
        pass

So I stopped em in their tracks 🚥 . these columns don't exist before this loop and not all of the records get labeled with a fuel_str bc not all of the records meet the thresh so I added the whole column before the loop w/ values of pd.NA

PR Checklist

Merge the most recent version of the branch you are merging into (probably dev).
All CI checks are passing. Run tests locally to debug failures
Make sure you've included good docstrings.
For major data coverage & analysis changes, run data validation tests
Include unit tests for new functions and classes.
Defensive data quality/sanity checks in analyses & data processing functions.
Update the release notes and reference reference the PR and related issues.
Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.

…ight clean up

zaneselvans · 2023-11-07T23:07:51Z

This is disconcerting. Previously the ID assignment process was expecting empty strings in there. Rather than turning the mysterious "nan” strings into np.nan at the end did you look at the place where the np.nan values are turned into empty strings on the way in?

zaneselvans

In the code snipped you cite as the source of the "nan" values, how are they getting there?

for fuel_str in fuel_categories:
    try:
        mmbtu_mask = df2[f"{fuel_str}_fraction_mmbtu"] > thresh
        df2.loc[mmbtu_mask, "primary_fuel_by_mmbtu"] = fuel_str
    except KeyError:
        pass

    try:
        cost_mask = df2[f"{fuel_str}_fraction_cost"] > thresh
        df2.loc[cost_mask, "primary_fuel_by_cost"] = fuel_str
    except KeyError:
        pass

Is it that "nan" is somehow showing up the fuel_categories? That doesn't seem right, since it's just the keys of the pudl.transform.params.ferc1.FUEL_CATEGORIES["categories"] dictionary.

src/pudl/analysis/classify_plants_ferc1.py

codecov · 2023-11-08T00:30:40Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (f25e921) 88.7% compared to head (b7533cf) 88.7%.
Report is 1 commits behind head on dev.

Additional details and impacted files

@@          Coverage Diff          @@
##             dev   #3025   +/-   ##
=====================================
  Coverage   88.7%   88.7%           
=====================================
  Files         91      91           
  Lines      11010   11011    +1     
=====================================
+ Hits        9768    9769    +1     
  Misses      1242    1242

Files	Coverage Δ
src/pudl/analysis/classify_plants_ferc1.py	`92.5% <100.0%> (+<0.1%)`	⬆️
src/pudl/output/ferc1.py	`88.2% <ø> (-0.1%)`	⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

cmgosnell · 2023-11-08T14:25:32Z

Is it that "nan" is somehow showing up the fuel_categories? That doesn't seem right, since it's just the keys of the pudl.transform.params.ferc1.FUEL_CATEGORIES["categories"] dictionary.

no, the fuel cats have a specific na_category but that definitely isn't the way the "nan"s are coming from. see my comment above

cmgosnell added 2 commits November 7, 2023 16:55

idk exactly why the "nan"s began existing but this fixes it

095d31b

revert the replace of "nan" by stopping introducing them! plus some l…

14e6bc8

…ight clean up

zaneselvans reviewed Nov 7, 2023

View reviewed changes

src/pudl/analysis/classify_plants_ferc1.py Show resolved Hide resolved

src/pudl/analysis/classify_plants_ferc1.py Show resolved Hide resolved

zaneselvans approved these changes Nov 8, 2023

View reviewed changes

REALLY REALLY its a nullable string

b7533cf

cmgosnell merged commit a2bdffa into dev Nov 8, 2023
9 checks passed

cmgosnell deleted the fix_mystery_fuel_cat_nan branch November 8, 2023 18:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix validation `test_fbp_ferc1_mismatched_fuels` error #3025

Fix validation `test_fbp_ferc1_mismatched_fuels` error #3025

cmgosnell commented Nov 7, 2023 •

edited

Loading

zaneselvans commented Nov 7, 2023

zaneselvans left a comment

codecov bot commented Nov 8, 2023 •

edited

Loading

cmgosnell commented Nov 8, 2023

Fix validation test_fbp_ferc1_mismatched_fuels error #3025

Fix validation test_fbp_ferc1_mismatched_fuels error #3025

Conversation

cmgosnell commented Nov 7, 2023 • edited Loading

PR Overview

first solution:

current solution:

PR Checklist

zaneselvans commented Nov 7, 2023

zaneselvans left a comment

Choose a reason for hiding this comment

codecov bot commented Nov 8, 2023 • edited Loading

Codecov Report

cmgosnell commented Nov 8, 2023

Fix validation `test_fbp_ferc1_mismatched_fuels` error #3025

Fix validation `test_fbp_ferc1_mismatched_fuels` error #3025

cmgosnell commented Nov 7, 2023 •

edited

Loading

codecov bot commented Nov 8, 2023 •

edited

Loading