Clean up some nightly build infrastructure cruft (#3962)
* Bump Docker container to micromamba v2.0.3.

* Warn against using nightly build outputs directly for Zenodo.

* Remove obsolete refs to GCE_INSTANCE in nightly build workflow.

* Update RTD backend link to new interface.

* Update nightly build docs to reflect use of Google Batch.

* Bump a couple of non-Python dependency versions.

* Simplify nightly/stable/workflow_dispatch logic in nightly build script.

* Make VCE RARE row count asset check non-blocking for fast ETL testing.

* Add test distribution of parquet and other outputs.

* Use BUILD_ID as test distribution path to ensure uniqueness.

* Discontinue parquet distribution. Remove test distribution files.

* Remove AWS CLI commands and use gcloud storage instead.

* Add AWS credentials from envvars.

* Create ~/.aws directory before attempting to write credentials.

* Remove dangling && from now-separate commands in the build script.

* Remove AWS S3 access test.

* Don't && the removal of existing paths, in case they aren't there.

* Fix source path for AWS S3 distribution.

* Remove all testing shortcuts and revert to FULL ETL.

* Remove unnecessary copy_to_dist_path function.

* Use more specific version tag matching pattern.

* Remove unnecessary conditional in stable deployment.

* Use more generous timeouts/retries in Zenodo data release script.

* Relock dependencies.

* Switch to new Slack GitHub Action syntax.

* Switch to using Postgres 17 and fast ETL to run a quick test deployment.

* Use Postgres 16 since 17 isn't yet available in our Docker image sources.

* Update comment about Postgres version.

* Use Ubuntu 24.04 micromamba image.

* Go back to doing full ETL after Postgres 16 test.

* Relock dependencies.

* Remove jq, use envvar for PG_VERSION, test fast ETL.

* Add a little workflow to test pattern matching.

* Fix typo in regex-test workflow.

* Use a more restrictive tag matching pattern.

* Use a more specific tag pattern to trigger data releases.

* Revert to a simple version tag pattern v20*.

* Revert to running full ETL.

* Relock dependencies.
zaneselvans authored Nov 20, 2024
1 parent aaaabfc commit f3cdf14
Showing 11 changed files with 419 additions and 546 deletions.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/versioned_release.md
@@ -24,7 +24,7 @@ assignees: ""
- [ ] Verify [`catalystcoop.pudl` PyPI (software) release](https://pypi.org/project/catalystcoop.pudl/)
- [ ] Verify that [PUDL repo archive on Zenodo](https://zenodo.org/doi/10.5281/zenodo.3404014) has been updated w/ new version
- [ ] Wait 6-10 hours for a successful build to complete
-- [ ] Activate new version on the [RTD admin panel](https://readthedocs.org/projects/catalystcoop-pudl/versions/) and verify that it builds successfully.
+- [ ] Activate new version on the [RTD admin panel](https://app.readthedocs.org/projects/catalystcoop-pudl/) and verify that it builds successfully.
- [ ] Verify that `stable` and the version tag point at same git ref
- [ ] Verify that [`stable` docs on RTD](https://catalystcoop-pudl.readthedocs.io/en/stable/) have been updated
- [ ] Verify `gs://pudl.catalyst.coop/vYYYY.M.x` has the new expected data.
24 changes: 7 additions & 17 deletions .github/workflows/build-deploy-pudl.yml
@@ -1,3 +1,4 @@
+---
name: build-deploy-pudl
on:
workflow_dispatch:
@@ -11,8 +12,6 @@ on:

env:
GCP_BILLING_PROJECT: ${{ secrets.GCP_BILLING_PROJECT }}
-  GCE_INSTANCE: pudl-deployment-tag # This is changed to pudl-deployment-dev if running on a schedule
-  GCE_INSTANCE_ZONE: ${{ secrets.GCE_INSTANCE_ZONE }}
GCS_OUTPUT_BUCKET: gs://builds.catalyst.coop
BATCH_JOB_JSON: batch_job.json

@@ -24,12 +23,6 @@ jobs:
contents: write
id-token: write
steps:
-      - name: Use pudl-deployment-dev vm if running on a schedule
-        if: ${{ (github.event_name == 'schedule') }}
-        run: |
-          echo "This action was triggered by a schedule."
-          echo "GCE_INSTANCE=pudl-deployment-dev" >> $GITHUB_ENV
- name: Checkout Repository
uses: actions/checkout@v4
with:
@@ -56,7 +49,6 @@
- name: Show freshly set envvars
if: ${{ env.SKIP_BUILD != 'true' }}
run: |
echo "GCE_INSTANCE: $GCE_INSTANCE"
echo "NIGHTLY_TAG: $NIGHTLY_TAG"
echo "BUILD_ID: $BUILD_ID"
echo "BATCH_JOB_ID: $BATCH_JOB_ID"
@@ -140,8 +132,6 @@ jobs:
--container-env BUILD_ID=${{ env.BUILD_ID }} \
--container-env BUILD_REF=${{ github.ref_name }} \
--container-env FLY_ACCESS_TOKEN=${{ secrets.FLY_ACCESS_TOKEN }} \
-          --container-env GCE_INSTANCE=${{ env.GCE_INSTANCE }} \
-          --container-env GCE_INSTANCE_ZONE=${{ env.GCE_INSTANCE_ZONE }} \
--container-env GCP_BILLING_PROJECT=${{ secrets.GCP_BILLING_PROJECT }} \
--container-env GITHUB_ACTION_TRIGGER=${{ github.event_name }} \
--container-env NIGHTLY_TAG=${{ env.NIGHTLY_TAG }} \
@@ -160,13 +150,13 @@
if: ${{ env.SKIP_BUILD != 'true' }}
run: gcloud batch jobs submit run-etl-${{ env.BATCH_JOB_ID }} --config ${{ env.BATCH_JOB_JSON }} --location us-west1

-      - name: Post to a pudl-deployments channel
+      - name: Post to pudl-deployments channel
         if: always()
         id: slack
         uses: slackapi/slack-github-action@v2
         with:
-          channel-id: "C03FHB9N0PQ"
-          slack-message: "`${{ env.BUILD_ID }}` build-deploy-pudl status: ${{ (env.SKIP_BUILD == 'true') && 'skipped' || job.status }}\n${{ env.GCS_OUTPUT_BUCKET }}/${{ env.BUILD_ID }}"
-        env:
-          channel-id: "C03FHB9N0PQ"
-          SLACK_BOT_TOKEN: ${{ secrets.PUDL_DEPLOY_SLACK_TOKEN }}
+          method: chat.postMessage
+          token: ${{ secrets.PUDL_DEPLOY_SLACK_TOKEN }}
+          payload: |
+            text: "`${{ env.BUILD_ID }}` build-deploy-pudl status: ${{ (env.SKIP_BUILD == 'true') && 'skipped' || job.status }}\n${{ env.GCS_OUTPUT_BUCKET }}/${{ env.BUILD_ID }}"
+            channel: "C03FHB9N0PQ"
29 changes: 14 additions & 15 deletions devtools/zenodo/zenodo_data_release.py
@@ -87,24 +87,26 @@ def __init__(self, env: str):

logger.info(f"Using Zenodo token: {token[:4]}...{token[-4:]}")

-    def retry_request(self, *, method, url, max_tries=5, timeout=5, **kwargs):
+    def retry_request(self, *, method, url, max_tries=6, timeout=2, **kwargs):
         """Wrap requests.request in retry logic.

         Passes method, url, and **kwargs to requests.request.
         """
-        base_timeout = 2
         for try_num in range(1, max_tries):
             try:
                 return requests.request(
-                    method=method, url=url, timeout=timeout, **kwargs
+                    method=method, url=url, timeout=timeout**try_num, **kwargs
                 )
             except requests.RequestException as e:
-                timeout = base_timeout**try_num
-                logger.warning(f"Attempt #{try_num} Got {e}, retrying in {timeout} s")
-                time.sleep(timeout)
+                logger.warning(
+                    f"Attempt #{try_num} Got {e}, retrying in {timeout**try_num} s"
+                )
+                time.sleep(timeout**try_num)

         # don't catch errors on the last try.
-        return requests.request(method=method, url=url, timeout=timeout, **kwargs)
+        return requests.request(
+            method=method, url=url, timeout=timeout**max_tries, **kwargs
+        )
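For reference, the new defaults replace the flat 5-second timeout with exponential backoff. A minimal sketch of the resulting schedule (illustrative arithmetic only, not code from this diff):

```python
# Backoff schedule implied by the new defaults: max_tries=6, timeout=2.
# Attempts 1-5 use timeout**try_num seconds as both the request timeout and
# the sleep before retrying; the final attempt runs outside the loop so its
# exceptions propagate to the caller.
max_tries, timeout = 6, 2
schedule = [timeout**try_num for try_num in range(1, max_tries)]
print(schedule)            # [2, 4, 8, 16, 32]
print(timeout**max_tries)  # 64 -- timeout for the final, uncaught attempt
```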

def get_deposition(self, deposition_id: int) -> _LegacyDeposition:
"""LEGACY API: Get JSON describing a deposition.
@@ -115,7 +117,6 @@ def get_deposition(self, deposition_id: int) -> _LegacyDeposition:
method="GET",
url=f"{self.base_url}/deposit/depositions/{deposition_id}",
headers=self.auth_headers,
-            timeout=5,
)
logger.debug(
f"License from JSON for {deposition_id} is "
@@ -132,7 +133,6 @@ def get_record(self, record_id: int) -> _NewRecord:
method="GET",
url=f"{self.base_url}/records/{record_id}",
headers=self.auth_headers,
-            timeout=5,
)
return _NewRecord(**response.json())

@@ -146,7 +146,6 @@ def new_record_version(self, record_id: int) -> _NewRecord:
method="POST",
url=f"{self.base_url}/records/{record_id}/versions",
headers=self.auth_headers,
-            timeout=5,
)
return _NewRecord(**response.json())

@@ -162,7 +161,7 @@ def update_deposition_metadata(
data = {"metadata": metadata.model_dump()}
logger.debug(f"Setting metadata for {deposition_id} to {data}")
response = self.retry_request(
method="PUT", url=url, json=data, headers=self.auth_headers, timeout=5
method="PUT", url=url, json=data, headers=self.auth_headers
)
return _LegacyDeposition(**response.json())

@@ -175,7 +174,6 @@ def delete_deposition_file(self, deposition_id: int, file_id) -> requests.Response:
method="DELETE",
url=f"{self.base_url}/deposit/depositions/{deposition_id}/files/{file_id}",
headers=self.auth_headers,
-            timeout=5,
)

def create_bucket_file(
@@ -196,7 +194,6 @@ def create_bucket_file(
url=url,
headers=self.auth_headers,
data=file_content,
-            timeout=5,
)
return response

@@ -206,7 +203,6 @@ def publish_deposition(self, deposition_id: int) -> _LegacyDeposition:
method="POST",
url=f"{self.base_url}/deposit/depositions/{deposition_id}/actions/publish",
headers=self.auth_headers,
-            timeout=5,
)
return _LegacyDeposition(**response.json())

@@ -375,7 +371,10 @@ def get_html_url(self):
required=True,
help="Path to a directory whose contents will be uploaded to Zenodo. "
"Subdirectories are ignored. Can get files from GCS as well - just prefix "
"with gs://.",
"with gs://. NOTE: nightly build outputs are NOT suitable for creating a Zenodo "
"data release, as they include hundreds of individual Parquet files, which we "
"archive on Zenodo as a single zipfile. Check what files should actually be "
"distributed. E.g. it may be *.log *.zip *.json ",
)
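To make the NOTE concrete, here is a hypothetical sketch (the helper name and patterns are illustrative, not part of this PR) of narrowing a build directory down to the files worth archiving:

```python
# Hypothetical helper, not part of this PR: keep only log/zip/json artifacts
# and skip the hundreds of per-table Parquet files mentioned in the NOTE.
from fnmatch import fnmatch
from pathlib import Path

DISTRIBUTABLE_PATTERNS = ("*.log", "*.zip", "*.json")

def release_files(source_dir: str) -> list[Path]:
    """Return the top-level files in source_dir matching the allowed patterns."""
    return [
        path
        for path in Path(source_dir).iterdir()
        if path.is_file()
        and any(fnmatch(path.name, pattern) for pattern in DISTRIBUTABLE_PATTERNS)
    ]
```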
@click.option(
"--publish/--no-publish",
19 changes: 7 additions & 12 deletions docker/Dockerfile
@@ -1,4 +1,4 @@
-FROM mambaorg/micromamba:2.0.2
+FROM mambaorg/micromamba:2.0.3-ubuntu24.04

ENV CONTAINER_HOME=/home/$MAMBA_USER
ENV PGDATA=${CONTAINER_HOME}/pgdata
@@ -8,10 +8,9 @@ USER root
SHELL [ "/bin/bash", "-exo", "pipefail", "-c" ]

# Install some linux packages
-# awscli requires unzip, less, groff and mandoc
# hadolint ignore=DL3008
RUN apt-get update && \
-    apt-get install --no-install-recommends -y git jq unzip less groff mandoc postgresql && \
+    apt-get install --no-install-recommends -y git postgresql && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

@@ -23,10 +22,13 @@ RUN printf '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg
# hadolint ignore=DL3059
RUN usermod -aG postgres "$MAMBA_USER"

+# We use an environment variable to set the Postgres version because it is also
+# used in the nightly build script, making it easier to keep the two in sync.
+# Remember to bump the Postgres version: Postgres 17 was released in September 2024.
+ENV PG_VERSION=16
# Create new cluster for Dagster usage that's owned by $MAMBA_USER.
-# When the PG major version changes we'll have to update this from 15 to 16
# hadolint ignore=DL3059
-RUN pg_createcluster 15 dagster -u "$MAMBA_USER" -- -A trust
+RUN pg_createcluster ${PG_VERSION} dagster -u "$MAMBA_USER" -- -A trust
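Since the Postgres version now travels through PG_VERSION, a consumer such as the nightly build script can check it against the clusters that actually exist in the image. A hedged sketch (hypothetical check, not part of this commit) using pg_lsclusters, from the same postgresql-common tooling as pg_createcluster:

```python
# Hypothetical sanity check, not part of this commit: confirm that a Postgres
# cluster matching $PG_VERSION exists in the image.
import os
import subprocess

expected = os.environ.get("PG_VERSION", "16")
clusters = subprocess.run(
    ["pg_lsclusters", "--no-header"], capture_output=True, text=True, check=True
).stdout
# The first column of pg_lsclusters output is the cluster's major version.
versions = [line.split()[0] for line in clusters.splitlines() if line.strip()]
assert expected in versions, f"No Postgres {expected} cluster found: {versions}"
```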

# Switch back to being non-root user and get into the home directory
USER $MAMBA_USER
@@ -62,13 +64,6 @@ COPY --chown=${MAMBA_USER}:${MAMBA_USER} . ${PUDL_REPO}
ENV LD_LIBRARY_PATH=${CONDA_PREFIX}/lib
RUN ${CONDA_RUN} pip install --no-cache-dir --no-deps --editable ${PUDL_REPO}

-# Install awscli2
-# Change back to root because the install script needs access to /usr/local/aws-cli
-# curl commands run within conda environment because curl is installed by conda.
-USER root
-RUN ${CONDA_RUN} bash -c 'curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && unzip awscliv2.zip && ./aws/install'
-USER $MAMBA_USER

# Install flyctl
# hadolint ignore=DL3059
RUN ${CONDA_RUN} bash -c 'curl -L https://fly.io/install.sh | sh'