This repository has been archived by the owner on Aug 4, 2023. It is now read-only.
Tags: WordPress/openverse-catalog
RDS Snapshot rotation DAG (#904)
* Add first pass at a db snapshot rotation DAG
* Add unit tests
* Fix DAG documentation
* Add db snapshots DAG to parsing test
* Add missing attributes to DAG
* Fix DAG_ID
  Co-authored-by: Madison Swain-Bowden <bowdenm@spu.edu>
* Fix template variable
* Remove redundant parameter
* Update openverse_catalog/dags/maintenance/rotate_db_snapshots.py
  Co-authored-by: Madison Swain-Bowden <bowdenm@spu.edu>
* Use Airflow template strings to get variables
  Co-authored-by: Madison Swain-Bowden <bowdenm@spu.edu>
* Fix dag name
* Sort describe snapshots return value (just to make sure)
  Also fixes the usage of `describe_db_snapshots` to retrieve the actual list of snapshots on the pagination object.
* Lint generated DAG file
Co-authored-by: Madison Swain-Bowden <bowdenm@spu.edu>
Break load_from_s3 into separate tasks to fix duplicate reporting (#914)
* Separate load_from_s3 from upsert_data step
* Fix load_local_data_to_intermediate_table
* Remove clean step from load_local_data_to_intermediate_table
  As far as I can tell this method is tested but never used anywhere. This should mirror `load_s3_data_to_intermediate_table` and only handle the loading steps, not the cleaning steps. If this method *is* used elsewhere, it will need to be updated to call the cleaning steps separately.
* Fix tests
* Also separate out the clean data step
* Clarify 'load_timeout' to 'upsert_timeout'
* Extend Smithsonian upsert timeout
* Add types to the clean_intermediate_table_data function
* Add NMNHANTHRO Smithsonian subprovider
Bump apache-airflow[amazon,http,postgres] from 2.4.1 to 2.4.2 (#842)
* Bump apache-airflow[amazon,http,postgres] from 2.4.1 to 2.4.2
  Bumps [apache-airflow[amazon,http,postgres]](https://github.com/apache/airflow) from 2.4.1 to 2.4.2.
  - [Release notes](https://github.com/apache/airflow/releases)
  - [Changelog](https://github.com/apache/airflow/blob/main/RELEASE_NOTES.rst)
  - [Commits](apache/airflow@2.4.1...2.4.2)
  ---
  updated-dependencies:
  - dependency-name: apache-airflow[amazon,http,postgres]
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
* Add expose_stacktrace option to dockerfile
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Madison Swain-Bowden <bowdenm@spu.edu>
Co-authored-by: Olga Bulat <obulat@gmail.com>
Refactor Freesound to use ProviderDataIngester (#746)
* Initial refactor steps
* Move functions
* Refactor other functions
* Update tests
* Add class to workflow list
* Remove TODO note
* Update DAGs.md
* Fixes identified in PR review
  Co-authored-by: Staci Cooper <63313398+stacimc@users.noreply.github.com>
* Simplify some logic
* Remove per-license selection logic
* Simplify URL acquisition & tests
Co-authored-by: Staci Cooper <63313398+stacimc@users.noreply.github.com>