Genie DAG import scripts #1201
base: master
Conversation
Branch force-pushed: 50ceb72 → c2f5996
Branch force-pushed: c2f5996 → fbf5f4f
Review covers the first file only, but most comments apply to all scripts; the additional scripts will be covered in a second review.
Additional issues found in the remaining files (part 2 of review).
Overall this looks good for a first release. A couple of comments regarding logging messages, mostly just clarifying the source vs. destination DB.
For future work/improvements, we need to make sure the server is in line with what the scripts expect (especially in this SSHOperator mode of running).
I'd also suggest (maybe this is already in the DAG repo) adding a README.md for the DAG workflow and steps: what is "wrapper" logic versus tooling logic, and the expected frequency of updates/tweaks.
Thanks @averyniceday, responding below:
Can you elaborate on what you mean by this?
I'd prefer not to document the DAG steps because Airflow already generates a DAG diagram, which will automatically reflect any changes to the tasks or task order. The tasks are also documented in the DAG file and in the scripts themselves. What do you mean by tooling logic?
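As a rough illustration of the SSHOperator-style execution mentioned in the review: each DAG task effectively runs one of the import scripts on a remote pipelines server over ssh. A minimal stand-in sketch (the host and script path are hypothetical placeholders, not values from this PR):

```python
import subprocess

# Rough stand-in for what an Airflow SSHOperator task effectively does:
# run one of the genie-airflow-*.sh scripts on the pipelines server over
# ssh. The `runner` parameter exists only so the sketch can be exercised
# without a real ssh connection.
def run_remote(host: str, script: str, runner=subprocess.run) -> int:
    result = runner(["ssh", host, script], capture_output=True, text=True)
    # Airflow marks the task failed on a nonzero exit code.
    return result.returncode
```

This is why the review note about keeping the server "in line with what the scripts expect" matters: the DAG only sees the remote exit code, so any drift in script paths or environment on the server surfaces as an opaque task failure.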
This PR adds scripts for the Genie import DAG:
- `dags/import_genie_dag.py`: DAG which handles the genie import
- `genie-airflow-clone-db.sh`: Drop tables in non-prod SQL DB, clone prod DB
- `genie-airflow-setup.sh`: DB check, repo fetch, CDD/Oncotree cache refresh
- `genie-airflow-import-sql.sh`: Import cancer types into SQL, import from genie-portal column into SQL DB
- `genie-airflow-import-clickhouse.sh`: Import into Clickhouse DB
- updates to `refresh-cdd-oncotree-cache.sh` to use correct URLs for CDD and OncoTree

DISCUSS: Should these scripts be moved to the same repository as the genie import DAG? Should the genie import DAG be moved to this repo?
- Discussed with Avery 1/29: will store import_genie_dag in cmo-pipelines
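The script list above suggests a linear task order. A minimal sketch of that ordering using only the standard library; the task names here are hypothetical, inferred from the script names, and the real dependencies in `dags/import_genie_dag.py` may differ:

```python
from graphlib import TopologicalSorter

# Hypothetical task-dependency graph inferred from the script names in
# this PR; the authoritative ordering lives in dags/import_genie_dag.py.
deps = {
    "clone_db": set(),                    # genie-airflow-clone-db.sh
    "setup": {"clone_db"},                # genie-airflow-setup.sh
    "import_sql": {"setup"},              # genie-airflow-import-sql.sh
    "import_clickhouse": {"import_sql"},  # genie-airflow-import-clickhouse.sh
}

# For a linear chain like this, the topological order is unique.
order = list(TopologicalSorter(deps).static_order())
```

This mirrors how Airflow itself resolves task order from pairwise dependencies rather than from an explicit sequence.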