Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract AEO Table 54 into a dataframe #3368

Closed
2 tasks done
Tracked by #3464
jdangerx opened this issue Feb 6, 2024 · 0 comments · Fixed by #3538
Closed
2 tasks done
Tracked by #3464

Extract AEO Table 54 into a dataframe #3368

jdangerx opened this issue Feb 6, 2024 · 0 comments · Fixed by #3538
Assignees
Labels
eiaaeo EIA Annual Energy Outlook gridlab Work related to open modeling input data integration funded/coordinated by GridLab new-data Requests for integration of new data.

Comments

@jdangerx
Copy link
Member

jdangerx commented Feb 6, 2024

EIA archives a bulk JSON version of the Annual Energy Outlook (AEO), but does not link to these archives from their bulk data download page. The URLs take the form:

https://www.eia.gov/opendata/bulk/AEOYYY.zip

And they are available from 2014 to 2023. There will be no 2024 AEO, while EIA updates their energy modeling system.

So - each record in this pile is a data series column, basically. There's a "data" field which has indices + corresponding data. Plus some metadata.

{
  "series_id": "AEO.2023.LNG_HP.CAP_NA_ENUS_NA_TOT_NA_NA_GW.A",
  "name": "Renewable Energy : End-Use Generators : Net Summer Capacity : Total, High LNG Price, AEO2023",
  "units": "GW",
  "f": "A",
  "start": "2021",
  "end": "2050",
  "lastHistoricalPeriod": "2021",
  "last_updated": "2023-05-04T13:04:45-04:00",
  "data": [
    [
      "2050",
      230.576385
    ],
    ...
    [
      "2022",
      55.064106
    ]
  ]
}

And there is some taxonomy data as well - "categories" which don't have a 1:1 mapping to tables. There's over 14,000 of these!

{
  "category_id": "5048336",
  "parent_category_id": "5047484",
  "name": "Table 67.  Employment and Shipments by Industry, and Income and Employment by Region",
  "notes": "",
  "childseries": [
    "AEO.2023.LNG_LP.ECI_VOS_IDAL_NA_NA_NA_NA_BLNY09DLR.A",
    "AEO.2023.LNG_LP.ECI_VOS_MANF_NA_NA_NA_NA_BLNY09DLR.A",
    "AEO.2023.LNG_LP.ECI_VOS_NA_NA_NA_NA_NA_BLNY09DLR.A",
    "AEO.2023.LNG_LP.ECI_VOS_NIND_NA_NA_NA_NA_BLNY09DLR.A",
    "AEO.2023.LNG_LP.ECI_VOS_NMFG_NA_NA_NA_NA_BLNY09DLR.A"
  ]
}

If you dig in to the tree of categories, you find:

  • one category per case - "Fast Builds Plus High LNG Price", "High LNG Price", "High Oil Price", etc. There are 20 cases.
  • one category per subject, per case - "Electric Power Sector", "Macroeconomic", etc. 17 subjects * 20 cases = 340 subject/case combos.
  • one category per table per subject/case combo that is relevant to the table; some tables have extra dimensions that shows up in the table name. There are 68 tables. Some examples:
    • Table 21 doesn't have another dimension:
      • 17x "Table 21. Residential Sector Equipment Stock and Efficiency, and Distributed Generation"
    • Table 2 - there are two other dimensions: unit + geography
      • 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), East North Central"
      • 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), East South Central"
      • 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), Middle Atlantic"
      • 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), Mountain"
      • 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), New England"
      • 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), Pacific"
      • 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), South Atlantic"
      • 200x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), United States"
      • 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), West North Central"
      • 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), West South Central"

I think what we do is:

  • create a taxonomy that links the individual category records to:

    • the tables they belong to
    • the case, subject, and other dimensions that should be applied to all of its child series
  • for each category that has child series, turn the child series into a series which is indexed by case/subject/datetime/etc.:

    Something like...

    start_dt end_dt last_updated case subject factoid value unit geography
    2022-01-01T00:00:00 2023-01-01T00:00:00 2024-02-06T13:17:14 High LNG Price Coal Supply and Prices very_long_factoid_name 123.456 GW West South Central

    And add those rows into the table.

  • Write those tables out! Hooray.

First one seems like it is the tricky part. If we get the data structures representing the tables, categories, and additional dimensions right, the rest should follow. I think this would probably take 25h of work or so - a bunch of string mapping, some tree munging. Then turning the child series into data + writing it out should be ~10h. With some contingency for unforeseen weirdness, we could say the whole project takes 40-50h.

If we cut scope to only integrate a few of the tables, that could save some time - 10-20 hours. We will still need to write the code to turn the manual string mappings into a taxonomy, and something that can turn the taxonomy + data into a set of tables. We may be able to skip some of the more complicated additional dimension work, depending on the tables.

Tasks

Preview Give feedback
  1. eia930 eiaaeo gridlab metadata nrelatb
    e-belfer
  2. eiaaeo gridlab new-data zenodo
    e-belfer
@zaneselvans zaneselvans added the new-data Requests for integration of new data. label Feb 6, 2024
@zaneselvans zaneselvans added eiaaeo EIA Annual Energy Outlook gridlab Work related to open modeling input data integration funded/coordinated by GridLab labels Mar 14, 2024
@zaneselvans zaneselvans changed the title Integrating EIA Annual Energy Outlook into raw Integrate raw EIA Annual Energy Outlook data Mar 14, 2024
@zaneselvans zaneselvans moved this from New to Backlog in Catalyst Megaproject Mar 15, 2024
@jdangerx jdangerx changed the title Integrate raw EIA Annual Energy Outlook data Extract AEO Table 54 into a dataframe Mar 26, 2024
@jdangerx jdangerx moved this from Backlog to In progress in Catalyst Megaproject Mar 26, 2024
@github-project-automation github-project-automation bot moved this from In progress to Done in Catalyst Megaproject Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
eiaaeo EIA Annual Energy Outlook gridlab Work related to open modeling input data integration funded/coordinated by GridLab new-data Requests for integration of new data.
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants