QA/QC checks for mid-am deprish/ferc/eia data #212

Closed
4 of 6 tasks
cmgosnell opened this issue Apr 25, 2022 · 5 comments
Labels
deprish_ferc Connecting Depreciation to FERC rmi use on all issues in this repo (for project management tools) tests

Comments

@cmgosnell
Member

cmgosnell commented Apr 25, 2022

  • Check the data integrity of the deprish-ferc-eia data through OUR processing steps, i.e. whether any of our data processing steps alter the data. If the original plant_balance was $100, the output's plant balance after being connected, aggregated and/or allocated to the generator level should still be $100. Processing errors usually come from 1) missing or wrong connections (between FERC and EIA or between Deprish and EIA) OR 2) some bug in the code! Finding the affected plants or utilities involves running the checks (effectively running the tests in test/integration/rmi_out_test.test_consistency_of_data_stages):
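import sqlalchemy as sa
import pudl.output.pudltabl
import pudl.workspace.setup
import pudl_rmi.coordinate
import pudl_rmi.validate

pudl_out = pudl.output.pudltabl.PudlTabl(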
    sa.create_engine(pudl.workspace.setup.get_defaults()["pudl_db"]),
    freq='AS',
    fill_fuel_cost=False,
    roll_fuel_cost=True,
    fill_net_gen=True,
)
rmi_out = pudl_rmi.coordinate.Output(
    pudl_out,
)
ppl, deprish, deprish_eia, ferc_eia, ferc_deprish_eia = rmi_out.run_all(clobber_all=False)
compare = pudl_rmi.validate.agg_test_data(
    df1=deprish,
    df2=ferc_deprish_eia,
    data_cols=[
        "plant_balance_w_common",
        "book_reserve_w_common",
        "unaccrued_balance_w_common",
    ],
    by=["report_year", "data_source", "utility_id_pudl", "ferc_acct_name"],
    # if looking at the individual plants
    # by=["report_year", "data_source", "utility_id_pudl", "plant_id_eia"],
)
compare.loc[[2020, 2019, 2018], ["PUC", "FERC"], 185].filter(like='book_re')
  • Compare the deprish-ferc-eia data to the utility-ferc_acct level data reported in FERC Form 1
compare_npb = compare_df_vs_net_plant_balance(
    df=ferc_deprish_eia,
    net_plant_balance=net_plant_balance,
    data_cols=[
        "plant_balance_w_common",
        "book_reserve_w_common",
        "unaccrued_balance_w_common",
    ],
)
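
The consistency idea behind both checks above (totals should survive connecting, aggregating and allocating) can be sketched without PUDL at all. This is a minimal, dependency-free illustration, not the `pudl_rmi.validate` implementation: the helper names `total_by` and `compare_stages`, the tolerance, and all numbers are made up for the example.

```python
from collections import defaultdict


def total_by(records, by, data_col):
    """Sum data_col over records, grouped by the key columns in `by`."""
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[col] for col in by)
        totals[key] += rec[data_col]
    return dict(totals)


def compare_stages(before, after, by, data_col, tol=0.01):
    """Return {key: (before_total, after_total)} for groups whose totals differ."""
    t_before = total_by(before, by, data_col)
    t_after = total_by(after, by, data_col)
    mismatches = {}
    for key in sorted(set(t_before) | set(t_after)):
        b = t_before.get(key, 0.0)
        a = t_after.get(key, 0.0)
        if abs(b - a) > tol:
            mismatches[key] = (b, a)
    return mismatches


# Hypothetical data: one $100 study record allocated to two generators.
deprish_rows = [
    {"report_year": 2020, "utility_id_pudl": 185, "plant_balance_w_common": 100.0},
]
allocated_rows = [
    {"report_year": 2020, "utility_id_pudl": 185, "generator_id": "1",
     "plant_balance_w_common": 60.0},
    {"report_year": 2020, "utility_id_pudl": 185, "generator_id": "2",
     "plant_balance_w_common": 40.0},
]

mismatches = compare_stages(
    deprish_rows,
    allocated_rows,
    by=["report_year", "utility_id_pudl"],
    data_col="plant_balance_w_common",
)
```

An empty `mismatches` dict means the $100 survived allocation. Any key that does show up points at the utility (or plant, if `by` includes a plant ID) to inspect for a missing connection or a bug.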
@cmgosnell cmgosnell added deprish_ferc Connecting Depreciation to FERC rmi use on all issues in this repo (for project management tools) tests labels Apr 25, 2022
@cmgosnell
Member Author

cmgosnell commented May 9, 2022

I've learned that the Mid-Am 2020 PUC data is ONLY IOWA.

Because of this, we should probably be using pudl_rmi.formatter_optimus.select_from_deprish_ferc1 with priority_data_source="PUC" and include_non_priority_data_source=True, so we can grab the Iowa PUC plants and then the FERC study plants.
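
For readers unfamiliar with the idea, priority selection can be sketched roughly as below. This is only an illustration of the concept and is NOT how `select_from_deprish_ferc1` is implemented; the function `select_by_priority`, its parameters, the plant IDs and the balances are all hypothetical.

```python
def select_by_priority(records, key_cols, priority_source, include_non_priority=True):
    """Keep records from the priority source; optionally fall back to other
    sources for keys the priority source does not cover."""
    priority = [r for r in records if r["data_source"] == priority_source]
    if not include_non_priority:
        return priority
    covered = {tuple(r[c] for c in key_cols) for r in priority}
    fallback = [
        r for r in records
        if r["data_source"] != priority_source
        and tuple(r[c] for c in key_cols) not in covered
    ]
    return priority + fallback


# Hypothetical records: plant 1 appears in both studies, plant 2 only in FERC.
records = [
    {"plant_id_eia": 1, "data_source": "PUC", "plant_balance": 100.0},
    {"plant_id_eia": 1, "data_source": "FERC", "plant_balance": 110.0},
    {"plant_id_eia": 2, "data_source": "FERC", "plant_balance": 50.0},
]

selected = select_by_priority(records, ["plant_id_eia"], priority_source="PUC")
```

Here plant 1 would come from the PUC study and plant 2 from the FERC study, which mirrors the "Iowa PUC plants first, FERC for the rest" intent.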

update: we are going to use FERC 2020 data only!!

@cmgosnell
Member Author

cmgosnell commented May 10, 2022

on the data processing integrity between deprish and ferc_deprish_eia

plant balance looks great except for the hydro plants
image.png
(This is because the deprish study includes Hydraulic Prod Plant records that don't obviously line up to a plant. The non-plant records are added back into ferc_deprish_eia.)
book_reserve and unaccrued_balance are all go's!
image.png
image.png

@cmgosnell
Member Author

cmgosnell commented May 10, 2022

on comparing ferc_deprish_eia to FERC1 utility-ferc_acct plant balance data

  • looking great: Transmission, Distribution, and Nuclear
  • looking weird: Other. My suspicion is that Other includes all of the Renewables. Removing the full npb renewables value from the depreciation data still has Other at 18% over the npb Other value, but that is in the right ballpark!
    image.png

Next steps:

  • go through and ensure the ferc account classification in the depreciation study is correct.

Update:
The hub team separates the reported other category into other_fossil and renewables based on the capex_total from the plant tables. Because of this, I'm going to re-aggregate the hubs' other categories back into one other category so it is directly comparable. With that change integrated... all the plant categories look good except for steam and hydraulic (this is smaller so we can probably ignore).
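
The re-aggregation described above can be sketched like this. It is a minimal illustration under assumptions: the helper names `collapse_other` and `pct_diff`, the category labels used as dict keys, and every number are invented for the example and are not taken from the hub data.

```python
def collapse_other(balances, parts=("other_fossil", "renewables"), into="other"):
    """Fold the split categories back into a single category."""
    out = {}
    for category, value in balances.items():
        target = into if category in parts else category
        out[target] = out.get(target, 0.0) + value
    return out


def pct_diff(ours, reported):
    """Percent by which `ours` is over (positive) or under (negative) `reported`."""
    return (ours - reported) / reported * 100.0


# Hypothetical hub balances with "other" split in two.
hub_balances = {"steam": 500.0, "other_fossil": 120.0, "renewables": 380.0}
collapsed = collapse_other(hub_balances)

# Hypothetical depreciation-study total for the same "other" category.
other_pct_over = pct_diff(ours=550.0, reported=collapsed["other"])
```

Once both sides use the same single `other` category, the percent difference per category is directly comparable.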

image.png

Next steps:

  • are there any duplicated or misclassified steam plants? ... Nope! It looks like there are duplicates

@cmgosnell
Member Author

cmgosnell commented May 12, 2022

on comparing deprish-ferc-eia to mid-am proprietary data.

There are eight plants that are showing up with negative book reserves.

They are all wind farms! 🌬️ ...? Nearly all of them have 0 for net salvage, which is in the original data but seems like it could be wrong.
image

5/17 update

The negative book reserves for the wind plants stem from the FERC account #344 lines (the generators).

Here are the original records. These are essentially the values reported directly to the edcfu table:
image

These are our filled-in values:
image

This is how we are filling in the book reserve:

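# Annual depreciation expense = annual depreciation rate * plant balance.
filled_df["depreciation_annual_epxns"] = (
    filled_df["depreciation_annual_rate"] * filled_df["plant_balance"]
)

# Book reserve = plant balance minus the depreciation still to be taken
# (annual expense * average remaining life).
filled_df["book_reserve"] = filled_df["plant_balance"] - (
    filled_df["depreciation_annual_epxns"] * filled_df["remaining_life_avg"]
)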

I've confirmed that the agreed-upon methodology for filling in these values is behaving as expected.
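
A quick worked example shows how this fill can legitimately produce a negative book reserve. Algebraically the fill is plant_balance * (1 - rate * remaining_life), so it goes negative whenever rate * remaining_life exceeds 1. The sketch below assumes the rate is expressed as a fraction (not a percent), and the function name and numbers are made up for illustration.

```python
def fill_book_reserve(plant_balance, depreciation_annual_rate, remaining_life_avg):
    """Scalar version of the fill: balance minus (annual expense * remaining life).

    Assumes depreciation_annual_rate is a fraction, e.g. 0.04 for 4%.
    """
    depreciation_annual_epxns = depreciation_annual_rate * plant_balance
    return plant_balance - depreciation_annual_epxns * remaining_life_avg


# 4% per year over 20 remaining years: 80% still to depreciate, reserve is positive.
reserve_ok = fill_book_reserve(1000.0, 0.04, 20.0)

# 5% per year over 25 remaining years: 125% "still to depreciate", reserve is negative.
reserve_negative = fill_book_reserve(1000.0, 0.05, 25.0)
```

So a negative filled-in reserve does not by itself mean the fill is buggy. It means the reported rate and remaining life imply more future depreciation than the current plant balance.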

@cmgosnell
Member Author

closing this for now. The few remaining oddities in the BHE data we will work out as needed in new issues.
