Extract AEO Table 54 into a dataframe #3368

jdangerx · 2024-02-06T19:03:21Z

EIA archives a bulk JSON version of the Annual Energy Outlook (AEO), but does not link to these archives from their bulk data download page. The URLs take the form:

https://www.eia.gov/opendata/bulk/AEOYYY.zip

And they are available from 2014 to 2023. There will be no 2024 AEO, while EIA updates their energy modeling system.

So - each record in this pile is a data series column, basically. There's a "data" field which has indices + corresponding data. Plus some metadata.

{
  "series_id": "AEO.2023.LNG_HP.CAP_NA_ENUS_NA_TOT_NA_NA_GW.A",
  "name": "Renewable Energy : End-Use Generators : Net Summer Capacity : Total, High LNG Price, AEO2023",
  "units": "GW",
  "f": "A",
  "start": "2021",
  "end": "2050",
  "lastHistoricalPeriod": "2021",
  "last_updated": "2023-05-04T13:04:45-04:00",
  "data": [
    [
      "2050",
      230.576385
    ],
    ...
    [
      "2022",
      55.064106
    ]
  ]
}

And there is some taxonomy data as well - "categories" which don't have a 1:1 mapping to tables. There's over 14,000 of these!

{
  "category_id": "5048336",
  "parent_category_id": "5047484",
  "name": "Table 67.  Employment and Shipments by Industry, and Income and Employment by Region",
  "notes": "",
  "childseries": [
    "AEO.2023.LNG_LP.ECI_VOS_IDAL_NA_NA_NA_NA_BLNY09DLR.A",
    "AEO.2023.LNG_LP.ECI_VOS_MANF_NA_NA_NA_NA_BLNY09DLR.A",
    "AEO.2023.LNG_LP.ECI_VOS_NA_NA_NA_NA_NA_BLNY09DLR.A",
    "AEO.2023.LNG_LP.ECI_VOS_NIND_NA_NA_NA_NA_BLNY09DLR.A",
    "AEO.2023.LNG_LP.ECI_VOS_NMFG_NA_NA_NA_NA_BLNY09DLR.A"
  ]
}

If you dig in to the tree of categories, you find:

one category per case - "Fast Builds Plus High LNG Price", "High LNG Price", "High Oil Price", etc. There are 20 cases.
one category per subject, per case - "Electric Power Sector", "Macroeconomic", etc. 17 subjects * 20 cases = 340 subject/case combos.
one category per table per subject/case combo that is relevant to the table; some tables have extra dimensions that shows up in the table name. There are 68 tables. Some examples:
- Table 21 doesn't have another dimension:
  - 17x "Table 21. Residential Sector Equipment Stock and Efficiency, and Distributed Generation"
- Table 2 - there are two other dimensions: unit + geography
  - 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), East North Central"
  - 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), East South Central"
  - 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), Middle Atlantic"
  - 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), Mountain"
  - 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), New England"
  - 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), Pacific"
  - 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), South Atlantic"
  - 200x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), United States"
  - 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), West North Central"
  - 170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), West South Central"

I think what we do is:

create a taxonomy that links the individual category records to:
- the tables they belong to
- the case, subject, and other dimensions that should be applied to all of its child series

for each category that has child series, turn the child series into a series which is indexed by case/subject/datetime/etc.:

Something like...

start_dt	end_dt	last_updated	case	subject	factoid	value	unit	geography
2022-01-01T00:00:00	2023-01-01T00:00:00	2024-02-06T13:17:14	High LNG Price	Coal Supply and Prices	very_long_factoid_name	123.456	GW	West South Central

And add those rows into the table.

Write those tables out! Hooray.

First one seems like it is the tricky part. If we get the data structures representing the tables, categories, and additional dimensions right, the rest should follow. I think this would probably take 25h of work or so - a bunch of string mapping, some tree munging. Then turning the child series into data + writing it out should be ~10h. With some contingency for unforeseen weirdness, we could say the whole project takes 40-50h.

If we cut scope to only integrate a few of the tables, that could save some time - 10-20 hours. We will still need to write the code to turn the manual string mappings into a taxonomy, and something that can turn the taxonomy + data into a set of tables. We may be able to skip some of the more complicated additional dimension work, depending on the tables.

Tasks

Give feedback

Define eia_aeo, nrel_atb and eia930 data source metadata #3473

eia930 eiaaeo gridlab metadata nrelatb
Archive the EIA AEO bulk data pudl-archiver#293

eiaaeo gridlab new-data zenodo
Options

The text was updated successfully, but these errors were encountered:

github-project-automation bot added this to Catalyst Megaproject Feb 6, 2024

github-project-automation bot moved this to New in Catalyst Megaproject Feb 6, 2024

zaneselvans added the new-data Requests for integration of new data. label Feb 6, 2024

zaneselvans added eiaaeo EIA Annual Energy Outlook gridlab Work related to open modeling input data integration funded/coordinated by GridLab labels Mar 14, 2024

zaneselvans changed the title ~~Integrating EIA Annual Energy Outlook into raw~~ Integrate raw EIA Annual Energy Outlook data Mar 14, 2024

zaneselvans mentioned this issue Mar 14, 2024

Integrate bulk EIA Annual Energy Outlook (AEO) data #3464

Closed

e-belfer assigned jdangerx Mar 15, 2024

zaneselvans moved this from New to Backlog in Catalyst Megaproject Mar 15, 2024

jdangerx changed the title ~~Integrate raw EIA Annual Energy Outlook data~~ Extract AEO Table 54 into a dataframe Mar 26, 2024

jdangerx moved this from Backlog to In progress in Catalyst Megaproject Mar 26, 2024

jdangerx mentioned this issue Apr 3, 2024

Extract AEO Table 54, with bonus 13/15/20. #3538

Merged

jdangerx closed this as completed in #3538 Apr 16, 2024

github-project-automation bot moved this from In progress to Done in Catalyst Megaproject Apr 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extract AEO Table 54 into a dataframe #3368

Extract AEO Table 54 into a dataframe #3368

jdangerx commented Feb 6, 2024 •

edited by e-belfer

Loading

Tasks

Extract AEO Table 54 into a dataframe #3368

Extract AEO Table 54 into a dataframe #3368

Comments

jdangerx commented Feb 6, 2024 • edited by e-belfer Loading

Tasks

jdangerx commented Feb 6, 2024 •

edited by e-belfer

Loading