Merge branch 'rename-core-assets' into create-naming-convention-docs
bendnorman committed Nov 6, 2023
2 parents 33fab91 + c2af359 commit 50e3eef
Showing 42 changed files with 686 additions and 423 deletions.
7 changes: 7 additions & 0 deletions .github/workflows/build-deploy-pudl.yml
@@ -85,6 +85,8 @@ jobs:

      # Deploy PUDL image to GCE
      - name: Deploy
        env:
          DAGSTER_PG_PASSWORD: ${{ secrets.DAGSTER_PG_PASSWORD }}
        run: |-
          gcloud compute instances add-metadata "$GCE_INSTANCE" \
            --zone "$GCE_INSTANCE_ZONE" \
@@ -110,6 +112,11 @@ jobs:
            --container-env AWS_ACCESS_KEY_ID=${{ secrets.AWS_ACCESS_KEY_ID }} \
            --container-env AWS_SECRET_ACCESS_KEY=${{ secrets.AWS_SECRET_ACCESS_KEY }} \
            --container-env AWS_DEFAULT_REGION=${{ secrets.AWS_DEFAULT_REGION }} \
            --container-env DAGSTER_PG_USERNAME="postgres" \
            --container-env DAGSTER_PG_PASSWORD="$DAGSTER_PG_PASSWORD" \
            --container-env DAGSTER_PG_HOST="104.154.182.24" \
            --container-env DAGSTER_PG_DB="dagster-storage" \
            --container-env PUDL_SETTINGS_YML="/home/catalyst/src/pudl/package_data/settings/etl_full.yml" \
      # Start the VM
      - name: Start the deploy-pudl-vm
3 changes: 3 additions & 0 deletions docker/Dockerfile
@@ -37,6 +37,9 @@ ENV DAGSTER_HOME=${CONTAINER_PUDL_WORKSPACE}/dagster_home
# Create data input/output directories
RUN mkdir -p ${PUDL_INPUT} ${PUDL_OUTPUT} ${DAGSTER_HOME}

# Copy dagster configuration file
COPY docker/dagster.yaml ${DAGSTER_HOME}/dagster.yaml

# Create a conda environment based on the specification in the repo
COPY test/test-environment.yml test/test-environment.yml
RUN mamba create --copy --prefix ${CONDA_PREFIX} --yes python=${PYTHON_VERSION} && \
12 changes: 12 additions & 0 deletions docker/dagster.yaml
@@ -0,0 +1,12 @@
storage:
  postgres:
    postgres_db:
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      hostname:
        env: DAGSTER_PG_HOST
      db_name:
        env: DAGSTER_PG_DB
      port: 5432
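To make the mapping concrete, the sketch below shows how the four environment variables referenced in this config could combine into a PostgreSQL connection URL. It is only an illustration, not Dagster's own code: Dagster resolves the `env:` entries itself, the helper name `build_pg_url` is hypothetical, and the password and host in the sample are made up.

```python
import os
from urllib.parse import quote


def build_pg_url(env=None, port=5432):
    """Assemble a postgresql:// URL from the DAGSTER_PG_* variables.

    Hypothetical helper for illustration only; Dagster performs its own
    resolution of the `env:` entries in dagster.yaml.
    """
    env = os.environ if env is None else env
    user = quote(env["DAGSTER_PG_USERNAME"], safe="")
    # Percent-encode the password so characters like "@" or "/" stay valid.
    password = quote(env["DAGSTER_PG_PASSWORD"], safe="")
    host = env["DAGSTER_PG_HOST"]
    db_name = env["DAGSTER_PG_DB"]
    return f"postgresql://{user}:{password}@{host}:{port}/{db_name}"


# Sample values: the password and host are invented for this example.
example_env = {
    "DAGSTER_PG_USERNAME": "postgres",
    "DAGSTER_PG_PASSWORD": "p@ss/word",
    "DAGSTER_PG_HOST": "db.example.internal",
    "DAGSTER_PG_DB": "dagster-storage",
}
print(build_pg_url(example_env))
```

A missing variable raises `KeyError` here, which is one simple way to fail fast when the container is started without one of the `--container-env` values.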
1 change: 0 additions & 1 deletion docker/gcp_pudl_etl.sh
@@ -30,7 +30,6 @@ function run_pudl_etl() {
$PUDL_SETTINGS_YML \
&& pudl_etl \
--loglevel DEBUG \
--max-concurrent 6 \
--gcs-cache-path gs://internal-zenodo-cache.catalyst.coop \
$PUDL_SETTINGS_YML \
&& pytest \
20 changes: 10 additions & 10 deletions docs/data_access.rst
@@ -88,42 +88,42 @@ AWS CLI, or programmatically via the S3 API. They can also be downloaded directl
HTTPS using the following links:

* `PUDL SQLite DB <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/pudl.sqlite>`__
* `EPA CEMS Hourly Emissions Parquet (1995-2021) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/core_epacems__hourly_emissions.parquet>`__
* `EPA CEMS Hourly Emissions Parquet (1995-2022) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/hourly_emissions_epacems.parquet>`__
* `Census DP1 SQLite DB (2010) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/censusdp1tract.sqlite>`__

* Raw FERC Form 1:

  * `FERC-1 SQLite derived from DBF (1994-2020) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc1.sqlite>`__
  * `FERC-1 SQLite derived from XBRL (2021) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc1_xbrl.sqlite>`__
  * `FERC-1 SQLite derived from XBRL (2021-2022) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc1_xbrl.sqlite>`__
  * `FERC-1 Datapackage (JSON) describing SQLite derived from XBRL <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc1_xbrl_datapackage.json>`__
  * `FERC-1 XBRL Taxonomy Metadata as JSON (2021) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc1_xbrl_taxonomy_metadata.json>`__
  * `FERC-1 XBRL Taxonomy Metadata as JSON (2021-2022) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc1_xbrl_taxonomy_metadata.json>`__

* Raw FERC Form 2:

  * `FERC-2 SQLite derived from DBF (1996-2020) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc2.sqlite>`__
  * `FERC-2 SQLite derived from XBRL (2021) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc2_xbrl.sqlite>`__
  * `FERC-2 SQLite derived from XBRL (2021-2022) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc2_xbrl.sqlite>`__
  * `FERC-2 Datapackage (JSON) describing SQLite derived from XBRL <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc2_xbrl_datapackage.json>`__
  * `FERC-2 XBRL Taxonomy Metadata as JSON (2021) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc2_xbrl_taxonomy_metadata.json>`__
  * `FERC-2 XBRL Taxonomy Metadata as JSON (2021-2022) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc2_xbrl_taxonomy_metadata.json>`__

* Raw FERC Form 6:

  * `FERC-6 SQLite derived from DBF (2000-2020) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc6.sqlite>`__
  * `FERC-6 SQLite derived from XBRL (2021) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc6_xbrl.sqlite>`__
  * `FERC-6 SQLite derived from XBRL (2021-2022) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc6_xbrl.sqlite>`__
  * `FERC-6 Datapackage (JSON) describing SQLite derived from XBRL <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc6_xbrl_datapackage.json>`__
  * `FERC-6 XBRL Taxonomy Metadata as JSON (2021) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc6_xbrl_taxonomy_metadata.json>`__
  * `FERC-6 XBRL Taxonomy Metadata as JSON (2021-2022) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc6_xbrl_taxonomy_metadata.json>`__

* Raw FERC Form 60:

  * `FERC-60 SQLite derived from DBF (2006-2020) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc60.sqlite>`__
  * `FERC-60 SQLite derived from XBRL (2021) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc60_xbrl.sqlite>`__
  * `FERC-60 SQLite derived from XBRL (2021-2022) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc60_xbrl.sqlite>`__
  * `FERC-60 Datapackage (JSON) describing SQLite derived from XBRL <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc60_xbrl_datapackage.json>`__
  * `FERC-60 XBRL Taxonomy Metadata as JSON (2021) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc60_xbrl_taxonomy_metadata.json>`__

* Raw FERC Form 714:

  * `FERC-714 SQLite derived from XBRL (2021) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc714_xbrl.sqlite>`__
  * `FERC-714 SQLite derived from XBRL (2021-2022) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc714_xbrl.sqlite>`__
  * `FERC-714 Datapackage (JSON) describing SQLite derived from XBRL <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc714_xbrl_datapackage.json>`__
  * `FERC-714 XBRL Taxonomy Metadata as JSON (2021) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc714_xbrl_taxonomy_metadata.json>`__
  * `FERC-714 XBRL Taxonomy Metadata as JSON (2021-2022) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev/ferc714_xbrl_taxonomy_metadata.json>`__
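
The links in this list share one base URL and a regular file-name pattern, so they can be generated rather than copied by hand. The sketch below assumes only what is visible in the list; the helper names `pudl_dev_url` and `xbrl_artifacts` are hypothetical and not part of PUDL.

```python
from urllib.parse import quote

# Base URL copied from the links in the list.
BASE_URL = "https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/dev"


def pudl_dev_url(filename):
    """Return the HTTPS URL for a file in the dev bucket (hypothetical helper)."""
    return f"{BASE_URL}/{quote(filename)}"


def xbrl_artifacts(form):
    """List the three XBRL-derived files published per FERC form, e.g. form="ferc1"."""
    suffixes = ("xbrl.sqlite", "xbrl_datapackage.json", "xbrl_taxonomy_metadata.json")
    return [pudl_dev_url(f"{form}_{suffix}") for suffix in suffixes]


for url in xbrl_artifacts("ferc714"):
    print(url)
```

An actual download could then pass each URL to `urllib.request.urlretrieve(url, local_path)`; the SQLite files are large, so that step is left out of the sketch.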


.. _access-zenodo:
3 changes: 2 additions & 1 deletion docs/release_notes.rst
@@ -71,7 +71,8 @@ Data Coverage
^^^^^^^^^^^^^

* Updated :doc:`data_sources/eia860` to include early release data from 2022.
* Updated :doc:`data_sources/eia923` to include early release data from 2022.
* Updated :doc:`data_sources/eia923` to include early release data from 2022 and
  monthly YTD data as of April 2023.
* Updated :doc:`data_sources/epacems` to switch from the old FTP server to the new
  CAMPD API, and to include 2022 data. Due to changes in the ETL, Alaska, Puerto Rico
  and Hawaii are now included in CEMS processing. See issue :issue:`1264` & PRs
@@ -0,0 +1,113 @@
"""add data_maturity to eia923m tables
Revision ID: 1ceb9897fd34
Revises: f11241c9292d
Create Date: 2023-10-26 16:30:33.771381
"""
import sqlalchemy as sa
from alembic import op

# revision identifiers, used by Alembic.
revision = '1ceb9897fd34'
down_revision = 'f11241c9292d'
branch_labels = None
depends_on = None


def upgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    with op.batch_alter_table('boiler_fuel_eia923', schema=None) as batch_op:
        batch_op.add_column(sa.Column('data_maturity', sa.Text(), nullable=True, comment='Level of maturity of the data record. Some data sources report less-than-final data. PUDL sometimes includes this data, but use at your own risk.'))
        batch_op.create_foreign_key(batch_op.f('fk_boiler_fuel_eia923_data_maturity_data_maturities'), 'data_maturities', ['data_maturity'], ['code'])

    with op.batch_alter_table('denorm_boiler_fuel_eia923', schema=None) as batch_op:
        batch_op.add_column(sa.Column('data_maturity', sa.Text(), nullable=True, comment='Level of maturity of the data record. Some data sources report less-than-final data. PUDL sometimes includes this data, but use at your own risk.'))
        batch_op.create_foreign_key(batch_op.f('fk_denorm_boiler_fuel_eia923_data_maturity_data_maturities'), 'data_maturities', ['data_maturity'], ['code'])

    with op.batch_alter_table('denorm_boiler_fuel_monthly_eia923', schema=None) as batch_op:
        batch_op.add_column(sa.Column('data_maturity', sa.Text(), nullable=True, comment='Level of maturity of the data record. Some data sources report less-than-final data. PUDL sometimes includes this data, but use at your own risk.'))
        batch_op.create_foreign_key(batch_op.f('fk_denorm_boiler_fuel_monthly_eia923_data_maturity_data_maturities'), 'data_maturities', ['data_maturity'], ['code'])

    with op.batch_alter_table('denorm_boiler_fuel_yearly_eia923', schema=None) as batch_op:
        batch_op.add_column(sa.Column('data_maturity', sa.Text(), nullable=True, comment='Level of maturity of the data record. Some data sources report less-than-final data. PUDL sometimes includes this data, but use at your own risk.'))
        batch_op.create_foreign_key(batch_op.f('fk_denorm_boiler_fuel_yearly_eia923_data_maturity_data_maturities'), 'data_maturities', ['data_maturity'], ['code'])

    with op.batch_alter_table('denorm_fuel_receipts_costs_monthly_eia923', schema=None) as batch_op:
        batch_op.add_column(sa.Column('data_maturity', sa.Text(), nullable=True, comment='Level of maturity of the data record. Some data sources report less-than-final data. PUDL sometimes includes this data, but use at your own risk.'))
        batch_op.create_foreign_key(batch_op.f('fk_denorm_fuel_receipts_costs_monthly_eia923_data_maturity_data_maturities'), 'data_maturities', ['data_maturity'], ['code'])

    with op.batch_alter_table('denorm_fuel_receipts_costs_yearly_eia923', schema=None) as batch_op:
        batch_op.add_column(sa.Column('data_maturity', sa.Text(), nullable=True, comment='Level of maturity of the data record. Some data sources report less-than-final data. PUDL sometimes includes this data, but use at your own risk.'))
        batch_op.create_foreign_key(batch_op.f('fk_denorm_fuel_receipts_costs_yearly_eia923_data_maturity_data_maturities'), 'data_maturities', ['data_maturity'], ['code'])

    with op.batch_alter_table('denorm_generation_fuel_combined_monthly_eia923', schema=None) as batch_op:
        batch_op.add_column(sa.Column('data_maturity', sa.Text(), nullable=True, comment='Level of maturity of the data record. Some data sources report less-than-final data. PUDL sometimes includes this data, but use at your own risk.'))
        batch_op.create_foreign_key(batch_op.f('fk_denorm_generation_fuel_combined_monthly_eia923_data_maturity_data_maturities'), 'data_maturities', ['data_maturity'], ['code'])

    with op.batch_alter_table('denorm_generation_fuel_combined_yearly_eia923', schema=None) as batch_op:
        batch_op.add_column(sa.Column('data_maturity', sa.Text(), nullable=True, comment='Level of maturity of the data record. Some data sources report less-than-final data. PUDL sometimes includes this data, but use at your own risk.'))
        batch_op.create_foreign_key(batch_op.f('fk_denorm_generation_fuel_combined_yearly_eia923_data_maturity_data_maturities'), 'data_maturities', ['data_maturity'], ['code'])

    with op.batch_alter_table('denorm_generation_monthly_eia923', schema=None) as batch_op:
        batch_op.add_column(sa.Column('data_maturity', sa.Text(), nullable=True, comment='Level of maturity of the data record. Some data sources report less-than-final data. PUDL sometimes includes this data, but use at your own risk.'))
        batch_op.create_foreign_key(batch_op.f('fk_denorm_generation_monthly_eia923_data_maturity_data_maturities'), 'data_maturities', ['data_maturity'], ['code'])

    with op.batch_alter_table('denorm_generation_yearly_eia923', schema=None) as batch_op:
        batch_op.add_column(sa.Column('data_maturity', sa.Text(), nullable=True, comment='Level of maturity of the data record. Some data sources report less-than-final data. PUDL sometimes includes this data, but use at your own risk.'))
        batch_op.create_foreign_key(batch_op.f('fk_denorm_generation_yearly_eia923_data_maturity_data_maturities'), 'data_maturities', ['data_maturity'], ['code'])

    with op.batch_alter_table('denorm_plants_utilities_eia', schema=None) as batch_op:
        batch_op.add_column(sa.Column('data_maturity', sa.Text(), nullable=True, comment='Level of maturity of the data record. Some data sources report less-than-final data. PUDL sometimes includes this data, but use at your own risk.'))
        batch_op.create_foreign_key(batch_op.f('fk_denorm_plants_utilities_eia_data_maturity_data_maturities'), 'data_maturities', ['data_maturity'], ['code'])

    # ### end Alembic commands ###


def downgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    with op.batch_alter_table('denorm_plants_utilities_eia', schema=None) as batch_op:
        batch_op.drop_constraint(batch_op.f('fk_denorm_plants_utilities_eia_data_maturity_data_maturities'), type_='foreignkey')
        batch_op.drop_column('data_maturity')

    with op.batch_alter_table('denorm_generation_yearly_eia923', schema=None) as batch_op:
        batch_op.drop_constraint(batch_op.f('fk_denorm_generation_yearly_eia923_data_maturity_data_maturities'), type_='foreignkey')
        batch_op.drop_column('data_maturity')

    with op.batch_alter_table('denorm_generation_monthly_eia923', schema=None) as batch_op:
        batch_op.drop_constraint(batch_op.f('fk_denorm_generation_monthly_eia923_data_maturity_data_maturities'), type_='foreignkey')
        batch_op.drop_column('data_maturity')

    with op.batch_alter_table('denorm_generation_fuel_combined_yearly_eia923', schema=None) as batch_op:
        batch_op.drop_constraint(batch_op.f('fk_denorm_generation_fuel_combined_yearly_eia923_data_maturity_data_maturities'), type_='foreignkey')
        batch_op.drop_column('data_maturity')

    with op.batch_alter_table('denorm_generation_fuel_combined_monthly_eia923', schema=None) as batch_op:
        batch_op.drop_constraint(batch_op.f('fk_denorm_generation_fuel_combined_monthly_eia923_data_maturity_data_maturities'), type_='foreignkey')
        batch_op.drop_column('data_maturity')

    with op.batch_alter_table('denorm_fuel_receipts_costs_yearly_eia923', schema=None) as batch_op:
        batch_op.drop_constraint(batch_op.f('fk_denorm_fuel_receipts_costs_yearly_eia923_data_maturity_data_maturities'), type_='foreignkey')
        batch_op.drop_column('data_maturity')

    with op.batch_alter_table('denorm_fuel_receipts_costs_monthly_eia923', schema=None) as batch_op:
        batch_op.drop_constraint(batch_op.f('fk_denorm_fuel_receipts_costs_monthly_eia923_data_maturity_data_maturities'), type_='foreignkey')
        batch_op.drop_column('data_maturity')

    with op.batch_alter_table('denorm_boiler_fuel_yearly_eia923', schema=None) as batch_op:
        batch_op.drop_constraint(batch_op.f('fk_denorm_boiler_fuel_yearly_eia923_data_maturity_data_maturities'), type_='foreignkey')
        batch_op.drop_column('data_maturity')

    with op.batch_alter_table('denorm_boiler_fuel_monthly_eia923', schema=None) as batch_op:
        batch_op.drop_constraint(batch_op.f('fk_denorm_boiler_fuel_monthly_eia923_data_maturity_data_maturities'), type_='foreignkey')
        batch_op.drop_column('data_maturity')

    with op.batch_alter_table('denorm_boiler_fuel_eia923', schema=None) as batch_op:
        batch_op.drop_constraint(batch_op.f('fk_denorm_boiler_fuel_eia923_data_maturity_data_maturities'), type_='foreignkey')
        batch_op.drop_column('data_maturity')

    with op.batch_alter_table('boiler_fuel_eia923', schema=None) as batch_op:
        batch_op.drop_constraint(batch_op.f('fk_boiler_fuel_eia923_data_maturity_data_maturities'), type_='foreignkey')
        batch_op.drop_column('data_maturity')

    # ### end Alembic commands ###
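
The constraint names in this migration appear to follow the pattern `fk_<table>_<column>_<referred table>`, which matches a commonly used SQLAlchemy naming convention. The sketch below reproduces that pattern under the assumption that it holds for every table; it is inferred from the names in the migration, not taken from PUDL's metadata code, and the helper `fk_name` is hypothetical.

```python
def fk_name(table, column, referred_table):
    """Build a foreign-key constraint name following the pattern seen in the migration."""
    return f"fk_{table}_{column}_{referred_table}"


# A few of the tables touched by this migration.
TABLES = [
    "boiler_fuel_eia923",
    "denorm_boiler_fuel_eia923",
    "denorm_plants_utilities_eia",
]

for table in TABLES:
    print(fk_name(table, "data_maturity", "data_maturities"))
```

Predictable names matter because `downgrade()` has to drop each constraint by name.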
@@ -0,0 +1,35 @@
"""demand_hourly_pa_ferc714.report_date can't be null
Revision ID: 3313ca078f4e
Revises: 1ceb9897fd34
Create Date: 2023-11-02 15:48:50.477585
"""
import sqlalchemy as sa
from alembic import op

# revision identifiers, used by Alembic.
revision = '3313ca078f4e'
down_revision = '1ceb9897fd34'
branch_labels = None
depends_on = None


def upgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    with op.batch_alter_table('demand_hourly_pa_ferc714', schema=None) as batch_op:
        batch_op.alter_column('report_date',
               existing_type=sa.DATE(),
               nullable=False)

    # ### end Alembic commands ###


def downgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    with op.batch_alter_table('demand_hourly_pa_ferc714', schema=None) as batch_op:
        batch_op.alter_column('report_date',
               existing_type=sa.DATE(),
               nullable=True)

    # ### end Alembic commands ###
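
SQLite cannot change a column's nullability with `ALTER TABLE`, which is why Alembic's batch mode rebuilds the table instead. The sketch below walks through that copy-and-rename idea with the standard-library `sqlite3` module. It is a simplified illustration, not what Alembic literally executes: the two-column schema, the `demand_mwh` column, and the helper name are assumptions made for the example.

```python
import sqlite3


def set_report_date_not_null(conn):
    """Rebuild the table so report_date becomes NOT NULL (simplified schema)."""
    nulls = conn.execute(
        "SELECT COUNT(*) FROM demand_hourly_pa_ferc714 WHERE report_date IS NULL"
    ).fetchone()[0]
    if nulls:
        # Existing nulls would make the copy fail, so report them up front.
        raise ValueError(f"{nulls} rows have a null report_date")
    conn.executescript(
        """
        CREATE TABLE _tmp_demand (report_date DATE NOT NULL, demand_mwh REAL);
        INSERT INTO _tmp_demand SELECT report_date, demand_mwh
            FROM demand_hourly_pa_ferc714;
        DROP TABLE demand_hourly_pa_ferc714;
        ALTER TABLE _tmp_demand RENAME TO demand_hourly_pa_ferc714;
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demand_hourly_pa_ferc714 (report_date DATE, demand_mwh REAL)")
conn.execute("INSERT INTO demand_hourly_pa_ferc714 VALUES ('2021-01-01', 100.0)")
set_report_date_not_null(conn)

try:
    conn.execute("INSERT INTO demand_hourly_pa_ferc714 VALUES (NULL, 1.0)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

Checking for existing nulls first mirrors the practical prerequisite of this migration: it can only succeed once no row has a null `report_date`.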
1 change: 1 addition & 0 deletions pyproject.toml
@@ -21,6 +21,7 @@ dependencies = [
"coloredlogs>=14.0,<15.1", # Dagster requires 14.0
"dagster-webserver>=1.4,<1.6",
"dagster>=1.4,<1.6",
"dagster-postgres>=0.21.5,<0.21.6",
"dask>=2022.5,<2023.10.2",
"datapackage>=1.11,<1.16", # Transition datastore to use frictionless.
"email-validator>=1.0.3", # pydantic[email]
3 changes: 2 additions & 1 deletion src/pudl/analysis/allocate_gen_fuel.py
@@ -360,7 +360,8 @@ def allocate_gen_fuel_by_generator_energy_source(
    # Add any startup energy source codes to the list of energy source codes
    gens_at_freq = adjust_msw_energy_source_codes(gens_at_freq, gf, bf)
    gens_at_freq = add_missing_energy_source_codes_to_gens(gens_at_freq, gf, bf)
    # do the association!
    # do the association! --> this step is where a small no. of plants are dropped for
    # an unknown reason. Investigate in issue #2978.
    gen_assoc = associate_generator_tables(
        gens=gens_at_freq, gf=gf, gen=gen, bf=bf, bga=bga
    )
4 changes: 4 additions & 0 deletions src/pudl/extract/eia923.py
@@ -49,6 +49,10 @@ def process_raw(self, df, page, **partition):
            if col in df.columns:
                df = remove_leading_zeros_from_numeric_strings(df=df, col_name=col)
        df = self.add_data_maturity(df, page, **partition)
        # Fill in blank reporting_frequency_code for monthly data
        df.loc[
            df["data_maturity"] == "incremental_ytd", "reporting_frequency_code"
        ] = "M"
        # the 2021 early release data had some ding dang "."'s and nulls in the year column
        if "report_year" in df.columns:
            mask = (df.report_year == ".") | df.report_year.isnull()
11 changes: 10 additions & 1 deletion src/pudl/extract/excel.py
@@ -5,6 +5,7 @@

import dbfread
import pandas as pd
import regex as re
from dagster import (
    AssetsDefinition,
    DynamicOut,
@@ -200,10 +201,18 @@ def add_data_maturity(self, df: pd.DataFrame, page, **partition) -> pd.DataFrame
        ``self.cols_added``.
        """
        maturity = "final"
        if "early_release" in self.excel_filename(page, **partition).lower():
        file_name = self.excel_filename(page, **partition)
        if "early_release" in file_name.lower():
            maturity = "provisional"
        elif self._dataset_name == "eia860m":
            maturity = "monthly_update"
        elif "EIA923_Schedules_2_3_4_5_M_" in file_name:
            release_month = re.search(
                r"EIA923_Schedules_2_3_4_5_M_(\d{2})",
                file_name,
            ).group(1)
            if release_month != "12":
                maturity = "incremental_ytd"
        df = df.assign(data_maturity=maturity)
        self.cols_added.append("data_maturity")
        return df
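
The classification logic in this hunk can be exercised on its own. The sketch below restates it as a standalone function under a few assumptions: the name `classify_maturity` and the sample file names are hypothetical, the standard-library `re` module stands in for the `regex` package imported in the diff, and a file name without a two-digit month falls through to "final" rather than raising.

```python
import re


def classify_maturity(file_name, dataset_name=""):
    """Return the data_maturity label implied by an Excel file name."""
    if "early_release" in file_name.lower():
        return "provisional"
    if dataset_name == "eia860m":
        return "monthly_update"
    match = re.search(r"EIA923_Schedules_2_3_4_5_M_(\d{2})", file_name)
    # Unlike the diff, a missing match falls through to "final" instead of raising.
    if match and match.group(1) != "12":
        return "incremental_ytd"
    return "final"


# Hypothetical file names, chosen only to hit each branch.
print(classify_maturity("EIA923_Schedules_2_3_4_5_M_04_2023_21JUN2023.xlsx"))
print(classify_maturity("EIA923_Schedules_2_3_4_5_M_12_2022_Final.xlsx"))
print(classify_maturity("EIA923_Schedules_2_3_4_5_2022_Early_Release.xlsx"))
```

A month-12 file is treated as a complete year, so it keeps the "final" label; any earlier month is marked as incremental year-to-date data.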