Add the new Zenodo archive DOI values to pudl/workspace/datastore.py.
Run the datastore script to download the new data: pudl_datastore --dataset eia923. The new raw data will appear in pudl_input/eia923/<ZENODO_DOI>/...
Update the information in pudl/package_data/eia923 if necessary:
file maps (remove _Early_Release suffix!)
column maps (probably the same)
page maps (probably the same)
skip footer (probably the same)
skip rows (the early release data has an extra row that we skip. The final release does not have it, so subtract 1 from the skip rows value for that year. The result should probably match the other years; if it's already 0, leave it as 0.)
Launch dagit and refresh the code location (run dagster-webserver -m pudl.etl in your terminal and then open http://127.0.0.1:3000/locations/pudl.etl/jobs/etl_full in a browser)
Materialize the raw_eia923 asset group. Look out for warnings in the logs about missing or extra columns. If they appear, check and update the package_data accordingly.
Materialize the _core_eia923 asset group. Look out for warnings and fix accordingly.
Materialize the norm_eia and then denorm_eia asset groups. You'll probably see some errors related to encoding. Take a look at which column it's talking about and look at metadata/resources/eia.py to see which encoder in CODE_METADATA to tweak.
Update the validation test test_minmax_rows in test/validate/eia_test.py. Sometimes it helps to just run the test (pytest test/validate/eia_test.py::test_minmax_rows) in the terminal because it will print out how many rows it found vs. how many it expected, and you can put the found row counts into the code so they become the expected row counts. Make sure none of the tables have fewer rows than before. Also make sure none of the row count changes are unexpectedly large.
Test table outputs in a notebook to make sure expected dates appear
Run tox and troubleshoot what else might be broken! Might include things like:
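For the test_minmax_rows step above, the "no fewer rows than before, no unexpectedly large changes" check can be sketched in plain Python. This is only an illustration and not code from PUDL: the function name review_row_counts, the table names, the row counts and the 25% threshold are all hypothetical.

```python
def review_row_counts(expected, found, max_rel_change=0.25):
    """Compare old expected row counts with newly found ones.

    Returns a list of human-readable problems. The threshold is an
    arbitrary assumption; pick whatever counts as "unexpectedly large".
    """
    problems = []
    for table, old in expected.items():
        new = found.get(table)
        if new is None:
            problems.append(f"{table}: no row count found")
        elif new < old:
            # Adding a year of data should not make a table shrink.
            problems.append(f"{table}: rows decreased from {old} to {new}")
        elif old > 0 and (new - old) / old > max_rel_change:
            problems.append(f"{table}: rows grew from {old} to {new}")
    return problems


# Hypothetical table names and counts, for illustration only.
expected_rows = {"table_a": 1000, "table_b": 500, "table_c": 200}
found_rows = {"table_a": 1080, "table_b": 450, "table_c": 400}

problems = review_row_counts(expected_rows, found_rows)
for problem in problems:
    print(problem)
```

Only once a list like this comes back empty (or every entry has an explanation) would the found counts be copied into the test as the new expected counts.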
aesharpe changed the title from "Integrate EIA 923 2022 final release data" to "Integrate EIA 923 2022 final release data and most recent 923 monthly data" on Nov 2, 2023
Last fix here is related to #2448 - Updated the file map to say Final instead of Early Release and actually extracted this raw table (it had been blocked due to issues with the 2018 archive). See #3100
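As a rough illustration of the file map and skip rows adjustments from the checklist, here is a minimal sketch that uses in-memory dicts as stand-ins for the maps in pudl/package_data/eia923. The function name finalize_year, the file names and the skip row values are hypothetical, and the real maps may be stored and edited differently (for example by hand).

```python
def finalize_year(file_map, skiprows, year):
    """Return updated copies of both maps for a year that moved from
    early release to final release. The inputs are left unmodified."""
    new_file_map = dict(file_map)
    new_skiprows = dict(skiprows)
    # Drop the early-release suffix from the file name.
    new_file_map[year] = file_map[year].replace("_Early_Release", "")
    # The early release has one extra row to skip; never go below 0.
    new_skiprows[year] = max(0, skiprows[year] - 1)
    return new_file_map, new_skiprows


# Hypothetical values, for illustration only.
file_map = {2021: "example_2021.xlsx", 2022: "example_2022_Early_Release.xlsx"}
skiprows = {2021: 5, 2022: 6}

new_file_map, new_skiprows = finalize_year(file_map, skiprows, 2022)
print(new_file_map[2022], new_skiprows[2022])
```

After the change, the updated year's skip rows value should look like the other years, which is an easy thing to check by eye.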
Annual Updates Docs: https://catalystcoop-pudl.readthedocs.io/en/dev/dev/annual_updates.html