-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add composite_trip
Migration Scripts
#1009
Add composite_trip
Migration Scripts
#1009
Conversation
`run_on_all_deployments` can be used byother migration scripts, causing them to be run on all production deployments It does this by scanning the `nrel-openpath-deploy-configs` repo for a list of deployments and modifying the DB_HOST accordingly in between each time the function is called
""" Trim unnecessary fields from composite trips in the analysis_timeseries_db. The shape of the remaining fields is unchanged. """
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this was ready yesterday afternoon, and it is pressing, I am just going to make the changes now so we can run the script overnight
bin/historical/migrations/_common.py
Outdated
|
||
import emission.core.get_database as edb | ||
|
||
DB_HOST_TEMPLATE = "mongodb://localhost:27017/openpath_prod_REPLACEME" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DB_HOST_TEMPLATE = "mongodb://localhost:27017/openpath_prod_REPLACEME" | |
DB_HOST_TEMPLATE = os.environ["DB_HOST_TEMPLATE"] |
The template for DB_HOST on DocumentDB is very different. It has a different host (not localhost
), and includes user+password
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 947492d
bin/historical/migrations/_common.py
Outdated
'rm -rf nrel-openpath-deploy-configs && ' + | ||
'git clone --no-checkout https://github.com/e-mission/nrel-openpath-deploy-configs.git && ' + | ||
'cd nrel-openpath-deploy-configs && ' + | ||
'git ls-tree -r --name-only HEAD | grep configs/', | ||
shell=True, | ||
capture_output=True, | ||
text=True) | ||
filenames = proc.stdout.replace("configs/", "").split("\n") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this works, and I don't want the perfect to be the enemy of the good, but this seems overly convoluted. Why isn't this just
'rm -rf nrel-openpath-deploy-configs && ' + | |
'git clone --no-checkout https://github.com/e-mission/nrel-openpath-deploy-configs.git && ' + | |
'cd nrel-openpath-deploy-configs && ' + | |
'git ls-tree -r --name-only HEAD | grep configs/', | |
shell=True, | |
capture_output=True, | |
text=True) | |
filenames = proc.stdout.replace("configs/", "").split("\n") | |
'rm -rf nrel-openpath-deploy-configs && ' + | |
'git clone https://github.com/e-mission/nrel-openpath-deploy-configs.git)' | |
filenames = os.listdir('nrel-openpath-deploy-configs') |
It seems much cleaner than capturing stdout and mucking around with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'rm -rf nrel-openpath-deploy-configs && ' + | |
'git clone --no-checkout https://github.com/e-mission/nrel-openpath-deploy-configs.git && ' + | |
'cd nrel-openpath-deploy-configs && ' + | |
'git ls-tree -r --name-only HEAD | grep configs/', | |
shell=True, | |
capture_output=True, | |
text=True) | |
filenames = proc.stdout.replace("configs/", "").split("\n") | |
with tempfile.TemporaryDirectory() as tmpdirname: | |
print('created temporary directory', tmpdirname) | |
proc = subprocess.run('git clone --no-checkout https://github.com/e-mission/nrel-openpath-deploy-configs.git ' + tmpdirname) | |
filenames = os.listdir(tmpdirnme+"/configs") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this to avoid downloading the actual files since all we need are the filenames, which are viewable even when --no-checkout
is used
May have been overengineered
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it may be something like that (no-checkout) but yes, I thought it was overly complex, and having the full configs might be helpful for some functions (e.g. maybe the function will be sensitive to the variation in the configs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also fixed in 947492d
This commit builds on 1c16800 and: - simplifies the retrieval (via git) to dynamically identify the full list of deployments. Notably it no longer relies on munging command output - supports providing a subset of the deployments via an environment variable It also adds a new function that is lightweight and just prints out the configured URL as a simple template for testing and adapting Testing done: - ran the template function above ``` ./e-mission-py.bash bin/historical/migrations/all_deployments_template.py Connecting to database URLmongodb://localhost:27017/openpath_prod_nc_transit_equity_study Running print_connect_url for ebikegj on DB ebikegj Config file not found, returning a copy of the environment variables instead... Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_ebikegj', 'DB_RESULT_LIMIT': None} Connecting to database URL mongodb://localhost:27017/openpath_prod_ebikegj Connecting to database URLmongodb://localhost:27017/openpath_prod_ebikegj Running print_connect_url for cortezebikes on DB cortezebikes Config file not found, returning a copy of the environment variables instead... Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_cortezebikes', 'DB_RESULT_LIMIT': None} Connecting to database URL mongodb://localhost:27017/openpath_prod_cortezebikes Connecting to database URLmongodb://localhost:27017/openpath_prod_cortezebikes Running print_connect_url for dcebike on DB dcebike Config file not found, returning a copy of the environment variables instead... Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_dcebike', 'DB_RESULT_LIMIT': None} Connecting to database URL mongodb://localhost:27017/openpath_prod_dcebike Connecting to database URLmongodb://localhost:27017/openpath_prod_dcebike Running print_connect_url for washingtoncommons on DB washingtoncommons Config file not found, returning a copy of the environment variables instead... Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_washingtoncommons', 'DB_RESULT_LIMIT': None} ``` - ran the template function above with an overridden `PROD_LIST`. Note that the first two entries are from final values set to the DB_HOST variable before the program terminated previously. We may want to unset the variable before the program ends ``` $ export PROD_LIST='cortezebikes,fortmorgan' $ ./e-mission-py.bash bin/historical/migrations/all_deployments_template.py Config file not found, returning a copy of the environment variables instead... Retrieved config: {'DB_HOST': 'mongodb://localhost/openpath_prod_nrel_commute', 'DB_RESULT_LIMIT': None} Connecting to database URL mongodb://localhost/openpath_prod_nrel_commute PROD_LIST: ['cortezebikes', 'fortmorgan'] Running print_connect_url for cortezebikes on DB cortezebikes Config file not found, returning a copy of the environment variables instead... Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_cortezebikes', 'DB_RESULT_LIMIT': None} Connecting to database URL mongodb://localhost:27017/openpath_prod_cortezebikes Connecting to database URLmongodb://localhost:27017/openpath_prod_cortezebikes Running print_connect_url for fortmorgan on DB fortmorgan Config file not found, returning a copy of the environment variables instead... Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_fortmorgan', 'DB_RESULT_LIMIT': None} Connecting to database URL mongodb://localhost:27017/openpath_prod_fortmorgan Connecting to database URLmongodb://localhost:27017/openpath_prod_fortmorgan ```
I stayed up until 2am-ish last night so I will merge and run this tomorrow, when I will be more careful. |
add _common.py with run_on_all_deployments for migration scripts
run_on_all_deployments
can be used byother migration scripts, causing them to be run on all production deploymentsIt does this by scanning the
nrel-openpath-deploy-configs
repo for a list of deployments and modifying the DB_HOST accordingly in between each time the function is calledadd trim_fluff_from_composite_trips.py
"""
Trim unnecessary fields from composite trips in the analysis_timeseries_db.
The shape of the remaining fields is unchanged.
"""