Update documentation around migrating data #6214
Most importantly, make it clear that CommCare needs to be stopped prior to wiping/rebuilding data, and that the correct revision of CommCare needs to be used when rebuilding/loading in data.
.. code-block::

    $ cchq <env_name> service commcare stop
    $ cchq <env_name> downtime start
(What's the difference between `commcare stop` and `downtime start`? When would one use one versus the other?)
Good question. This may not actually be necessary to change. Both commands operate on COMMCARE_INVENTORY_GROUPS and stop those services using supervisorctl. The `downtime` command also creates a record of the downtime in Datadog, which is unnecessary in this case, so I can switch it back.
The `downtime` command does give the user the option to wait for all processes to be killed, whereas `service stop` might take a bit of time. But by the time they run through the steps to wipe BlobDB and ES, Celery should be complete.
Yeah, on second thought, I like `downtime` for that reason in particular, and I updated the instructions to select the kill option when prompted in b7b46e6.
@@ -147,6 +147,8 @@ before importing data from the old environment.
4. Import the data to the new environment
-----------------------------------------

* Ensure you are running the following steps from a release created using the CommCare version that you should
  have been provided.
Maybe tell the reader that Celery needs to be stopped before running `load_domain_data`.
Totally up for continuing to make it clear that Celery needs to be stopped, but it's tricky, since it isn't as simple as stopping Celery prior to running `load_domain_data`. If Celery was running at any point after resetting Postgres, the db has likely been "corrupted" and needs to be reset again.
That is tricky. So a deploy will restart Celery, but we need to run a deploy after the environment is reset. Hmmm.
Well, I think the framing should be that we need to run a deploy before resetting the environment (I think I made that change here too), since the "current" release also determines what state the database is migrated to when rebuilding the environment.
This is a critical point, so it may be good to rephrase to share why:
Ensure you are running the following steps from the same version of CommCareHQ as was used to create the data dump being imported into your environment. Request the CommCareHQ version/commit hash if it was not shared.
Additionally, I think we need to do this during the initial setup. CommCareHQ is already deployed by this point, and by default we deploy the latest code. So at this point we are actually asking them to revert to an older version, which isn't feasible due to migrations. Should we add a note about this when the setup is happening?
#. Stop CommCare
#. Stop CommCare services to prevent background processes from writing to databases.
👍
#. Add "wipe_environment_enabled: True" to `public.yml` file.
.. note::
   This is especially important if you are performing a migration of your data to a new instance. You should have
Isn't this only needed if performing a migration? Otherwise there should not be a need to deploy?
Yeah this is true. I'll make that distinction 👍🏻
#. Reset PostgreSQL and PgBouncer
#. Wipe BlobDB, Elasticsearch, and Couch using management commands.
Good to check no services are running before doing this. This could happen if
- someone jumps straight to this step in the documentation without reading the block above
- the services are restarted by monit or someone/something else unknowingly.
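The check suggested above could be sketched roughly as follows. This is a minimal sketch, not part of the PR: it assumes that `cchq <env_name> service commcare status` is available in your commcare-cloud version and that its output contains supervisorctl-style state words such as RUNNING. `any_running` and `ensure_stopped` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes the status output uses supervisorctl-style state words
# (e.g. RUNNING, STOPPED); adjust the pattern if your output differs.

# Succeeds (exit 0) if any line on stdin reports a RUNNING process.
any_running() {
    grep -q 'RUNNING'
}

# Hypothetical wrapper: refuse to continue while anything is still running.
# "$@" is the command that prints service status, supplied by the caller.
ensure_stopped() {
    if "$@" | any_running; then
        echo "Services are still running; stop them before wiping data." >&2
        return 1
    fi
    echo "No running services found."
}

# Usage sketch (not executed here; ENV_NAME is a placeholder):
#   ensure_stopped cchq "$ENV_NAME" service commcare status
```

Running the check immediately before each wipe step, rather than once at the start, would also catch services that monit or another operator restarted in the meantime.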
.. code-block::

    $ kafka-topics.sh --bootstrap-server localhost:9092 --list
we don't need this anymore?
My interpretation was that we never needed this, but it was just a nice additional check. However, given that we don't do any similar checks for the other steps that are run, I see no reason why we should treat Kafka any differently, and I had no issues when running `wipe_kafka` myself. So I removed it because I prioritized simplifying these steps.
@@ -56,41 +72,37 @@ in the sequence given below, so you shouldn't proceed to next steps until the pr

    $ cchq <env_name> ap wipe_kafka.yml

#. Remove the "wipe_environment_enabled: True" line in your `public.yml` file.
is this a new option that has been added? "wipe_environment_enabled"?
No it is used in ansible playbooks that wipe data as an extra safety feature to prevent people from accidentally running those playbooks. I just moved it to "wrap" all of the steps in the wiping data section even though it doesn't apply to django management commands because I thought that made more logical sense.
This option was added when we added the Ansible tasks for nuking environments' data, to ensure that the people with their fingers on the button know what they're doing.
Edit: I should have refreshed my browser before responding. I see gherceg replied hours ago. ... I'll leave this here though, cos the meme is funny. 😏
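As an illustration of the safety flag discussed in this thread, here is a minimal sketch of a pre-flight check an operator might run before and after wiping. It is not how the Ansible playbooks implement the guard. `wipe_enabled` is a hypothetical helper, and the path to `public.yml` is passed in by the caller because its location is not stated here.

```shell
#!/usr/bin/env bash
# Sketch only: checks a public.yml file for the wipe_environment_enabled flag.

# Succeeds (exit 0) if the file has a line setting wipe_environment_enabled to True.
wipe_enabled() {
    grep -Eq '^[[:space:]]*wipe_environment_enabled:[[:space:]]*[Tt]rue[[:space:]]*$' "$1"
}

# Usage sketch (PUBLIC_YML is a placeholder for the path to your public.yml):
#   wipe_enabled "$PUBLIC_YML" || echo "Flag not set; wipe playbooks are expected to refuse to run."
#   ...wipe steps...
#   wipe_enabled "$PUBLIC_YML" && echo "Remember to remove the flag now that wiping is done."
```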
thanks for these updates @gherceg, much needed improvement to our documentation

3. Prepare the new environment to be populated
----------------------------------------------

* Ensure you are running the following steps from a release created using the CommCare version/commit hash that you
  should have been provided in Step 1. This ensures the database will be migrated to the same state it was in when
  the data was dumped.
thanks for adding this @gherceg
There is one challenge though. The documentation for deploy does not say how this is to be done. So we would need to update it there as well.
Thanks Manish I'll make a new PR as there are some other changes I'd like to make too. Feel free to add any additional comments you may still have and I'll address them there! I admittedly rushed this out just to get something closer to up to date instructions out there.
everything else looks good!
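The version requirement in step 3 could be checked before resetting anything with something like the sketch below. This is an assumption-laden illustration, not part of the PR: `same_commit` is a hypothetical helper, and `EXPECTED_HASH` and `RELEASE_DIR` are placeholders for the commit hash provided with the data dump and the directory of the current release checkout.

```shell
#!/usr/bin/env bash
# Sketch only: compares the commit hash you were given with the deployed one.

# Succeeds (exit 0) when both hashes refer to the same commit,
# allowing either one to be an abbreviated prefix of the other.
same_commit() {
    local expected="$1" actual="$2"
    [ -n "$expected" ] && [ -n "$actual" ] || return 1
    case "$actual" in "$expected"*) return 0 ;; esac
    case "$expected" in "$actual"*) return 0 ;; esac
    return 1
}

# Usage sketch (not executed here; both variables are placeholders):
#   same_commit "$EXPECTED_HASH" "$(git -C "$RELEASE_DIR" rev-parse HEAD)" \
#       || echo "Deployed code does not match the version used to create the dump."
```

Comparing by prefix is a convenience for abbreviated hashes; very short prefixes can match more than one commit, so a full hash is the safer input.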
After going through this process a number of times, this is where I've landed. I've tried to simplify the steps (take out what seemed unnecessary), and added/updated anything that caused issues but was needed.
Environments Affected
None