Commit
* cleaning and temp table in pg
* sketch of full dag NOT TESTED
* inaturalist dag without tests or reporting (yet)
* complete dag, 25 mill recs in 5.5 hours local test
* Add passwords for s3 testing with new docker
* make temp loading table UNLOGGED to load it faster
* inat with translation 75 million recs in 8 hrs
* using OUTPUT_DIR for API files
* clarify delayed requester vs requester
* DRYer approach to tags TO DO
* comments on taxa transformation
* scientific names not ids for manual translation
* TO DO comment clean-up
* fix name insert syntax
* Merge 'main' into feature/inaturalist-performance
* add clarity on batch limit override
* missing piece of merge from main
* limit to 20 tags per photo
* add option to use alternate dag creation for sql
* adjust tests see issue #898
* slightly faster way to pull medium test sample
* Note another data source for vernacular names
* remove unnecessary test code
* clean and upsert one batch at a time
* log parsing resource doc
* use common.constants.IMAGE instead of MEDIA_TYPE
* add explanation of ancestry joins and taxa tags
* use existing clean_intermediate_table_data
* remove unnecessary env vars from load_to_s3
* declarative doc string for file update check
* update iNaturalist description
* remove message to Staci :)
* use dynamically generated load subtasks
* clarify taxa comments and include languages
* consolidate consolidation code
* add testing for consolidated metrics
* separate ti_mock instances per test
* test get batches
* shorter titles to save space
* add better testing instructions
* dag parameter to manage post-ingestion deletions
* Add kwargs to get_response_json call
* get_media_type can be static method

Co-authored-by: Krystle Salazar <[email protected]>

* link to original inaturalist photo, rather than medium

Co-authored-by: Krystle Salazar <[email protected]>

* prefer creator name over login
* remove unused constants
* add to do for extension cleanup

Co-authored-by: Madison Swain-Bowden <[email protected]>
Co-authored-by: Krystle Salazar <[email protected]>
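
Three of the changes listed above ("clean and upsert one batch at a time", "limit to 20 tags per photo", "prefer creator name over login") describe small data-handling rules. The following is a minimal sketch of what such rules might look like, not the code from this commit. The function names `iter_batches`, `limit_tags`, and `pick_creator`, their signatures, and the choice of inclusive id ranges are all hypothetical; only the 20-tag cap and the name-over-login preference come from the commit message.

```python
from __future__ import annotations

from typing import Iterator, Optional, Sequence

# Taken from the commit message: "limit to 20 tags per photo".
MAX_TAGS_PER_PHOTO = 20


def limit_tags(tags: Sequence[str], max_tags: int = MAX_TAGS_PER_PHOTO) -> list[str]:
    """Keep at most max_tags tags, preserving their original order (hypothetical helper)."""
    return list(tags)[:max_tags]


def pick_creator(name: Optional[str], login: Optional[str]) -> Optional[str]:
    """Prefer the creator's display name; fall back to the login when the name is blank."""
    if name and name.strip():
        return name.strip()
    return login or None


def iter_batches(start: int, end: int, batch_size: int) -> Iterator[tuple[int, int]]:
    """Yield inclusive (low, high) id ranges covering start..end.

    Assumption: batches are contiguous id ranges, so each one can be
    cleaned and upserted on its own before the next one is processed.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    low = start
    while low <= end:
        high = min(low + batch_size - 1, end)
        yield (low, high)
        low = high + 1
```

Processing one range at a time keeps each clean-and-upsert step bounded in size, which is presumably the motivation for the per-batch approach when loading tens of millions of records.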