You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
jdangerx opened this issue
Feb 6, 2024
· 0 comments
· Fixed by #3538
Assignees
Labels
eiaaeoEIA Annual Energy OutlookgridlabWork related to open modeling input data integration funded/coordinated by GridLabnew-dataRequests for integration of new data.
EIA archives a bulk JSON version of the Annual Energy Outlook (AEO), but does not link to these archives from their bulk data download page. The URLs take the form:
https://www.eia.gov/opendata/bulk/AEOYYY.zip
And they are available from 2014 to 2023. There will be no 2024 AEO, while EIA updates their energy modeling system.
So - each record in this pile is a data series column, basically. There's a "data" field which has indices + corresponding data. Plus some metadata.
And there is some taxonomy data as well - "categories" which don't have a 1:1 mapping to tables. There's over 14,000 of these!
{
"category_id": "5048336",
"parent_category_id": "5047484",
"name": "Table 67. Employment and Shipments by Industry, and Income and Employment by Region",
"notes": "",
"childseries": [
"AEO.2023.LNG_LP.ECI_VOS_IDAL_NA_NA_NA_NA_BLNY09DLR.A",
"AEO.2023.LNG_LP.ECI_VOS_MANF_NA_NA_NA_NA_BLNY09DLR.A",
"AEO.2023.LNG_LP.ECI_VOS_NA_NA_NA_NA_NA_BLNY09DLR.A",
"AEO.2023.LNG_LP.ECI_VOS_NIND_NA_NA_NA_NA_BLNY09DLR.A",
"AEO.2023.LNG_LP.ECI_VOS_NMFG_NA_NA_NA_NA_BLNY09DLR.A"
]
}
If you dig in to the tree of categories, you find:
one category per case - "Fast Builds Plus High LNG Price", "High LNG Price", "High Oil Price", etc. There are 20 cases.
one category per subject, per case - "Electric Power Sector", "Macroeconomic", etc. 17 subjects * 20 cases = 340 subject/case combos.
one category per table per subject/case combo that is relevant to the table; some tables have extra dimensions that shows up in the table name. There are 68 tables. Some examples:
Table 21 doesn't have another dimension:
17x "Table 21. Residential Sector Equipment Stock and Efficiency, and Distributed Generation"
Table 2 - there are two other dimensions: unit + geography
170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), East North Central"
170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), East South Central"
170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), Middle Atlantic"
170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), Mountain"
170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), New England"
170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), Pacific"
170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), South Atlantic"
200x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), United States"
170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), West North Central"
170x "Table 2. Energy Consumption by Sector and Source (quadrillion Btu, unless otherwise noted), West South Central"
I think what we do is:
create a taxonomy that links the individual category records to:
the tables they belong to
the case, subject, and other dimensions that should be applied to all of its child series
for each category that has child series, turn the child series into a series which is indexed by case/subject/datetime/etc.:
Something like...
start_dt
end_dt
last_updated
case
subject
factoid
value
unit
geography
2022-01-01T00:00:00
2023-01-01T00:00:00
2024-02-06T13:17:14
High LNG Price
Coal Supply and Prices
very_long_factoid_name
123.456
GW
West South Central
And add those rows into the table.
Write those tables out! Hooray.
First one seems like it is the tricky part. If we get the data structures representing the tables, categories, and additional dimensions right, the rest should follow. I think this would probably take 25h of work or so - a bunch of string mapping, some tree munging. Then turning the child series into data + writing it out should be ~10h. With some contingency for unforeseen weirdness, we could say the whole project takes 40-50h.
If we cut scope to only integrate a few of the tables, that could save some time - 10-20 hours. We will still need to write the code to turn the manual string mappings into a taxonomy, and something that can turn the taxonomy + data into a set of tables. We may be able to skip some of the more complicated additional dimension work, depending on the tables.
The content you are editing has changed. Please copy your edits and refresh the page.
zaneselvans
added
eiaaeo
EIA Annual Energy Outlook
gridlab
Work related to open modeling input data integration funded/coordinated by GridLab
labels
Mar 14, 2024
zaneselvans
changed the title
Integrating EIA Annual Energy Outlook into raw
Integrate raw EIA Annual Energy Outlook data
Mar 14, 2024
eiaaeoEIA Annual Energy OutlookgridlabWork related to open modeling input data integration funded/coordinated by GridLabnew-dataRequests for integration of new data.
EIA archives a bulk JSON version of the Annual Energy Outlook (AEO), but does not link to these archives from their bulk data download page. The URLs take the form:
https://www.eia.gov/opendata/bulk/AEOYYY.zip
And they are available from 2014 to 2023. There will be no 2024 AEO, while EIA updates their energy modeling system.
So - each record in this pile is a data series column, basically. There's a "data" field which has indices + corresponding data. Plus some metadata.
And there is some taxonomy data as well - "categories" which don't have a 1:1 mapping to tables. There's over 14,000 of these!
If you dig in to the tree of categories, you find:
I think what we do is:
create a taxonomy that links the individual category records to:
for each category that has child series, turn the child series into a series which is indexed by case/subject/datetime/etc.:
Something like...
And add those rows into the table.
Write those tables out! Hooray.
First one seems like it is the tricky part. If we get the data structures representing the tables, categories, and additional dimensions right, the rest should follow. I think this would probably take 25h of work or so - a bunch of string mapping, some tree munging. Then turning the child series into data + writing it out should be ~10h. With some contingency for unforeseen weirdness, we could say the whole project takes 40-50h.
If we cut scope to only integrate a few of the tables, that could save some time - 10-20 hours. We will still need to write the code to turn the manual string mappings into a taxonomy, and something that can turn the taxonomy + data into a set of tables. We may be able to skip some of the more complicated additional dimension work, depending on the tables.
Tasks
eia_aeo
,nrel_atb
andeia930
data source metadata #3473The text was updated successfully, but these errors were encountered: