Update documentation around migrating data #6214

Merged: 3 commits merged on Jan 30, 2024

@gherceg gherceg commented Jan 29, 2024

After going through this process a number of times, this is where I've landed. I've tried to simplify the steps (take out what seemed unnecessary), and added/updated anything that caused issues but was needed.

Environments Affected

None

Most importantly, make it clear that CommCare needs to be stopped
prior to wiping/rebuilding data, and that the correct revision of
CommCare needs to be used when rebuilding/loading in data.

.. code-block::

   $ cchq <env_name> service commcare stop
   $ cchq <env_name> downtime start

Contributor

(What's the difference between commcare stop and downtime start? When would one use one versus the other?)

Contributor Author

Good question. This may not actually be necessary to change. Both commands operate on COMMCARE_INVENTORY_GROUPS and stop those services using supervisorctl. The downtime command also creates a record of the downtime in Datadog, which is unnecessary in this case, so I can switch it back.

Contributor Author

The downtime command does give the user the option to wait for all processes to be killed, whereas service stop might take a bit of time. But by the time they run through the steps to wipe BlobDB and ES, Celery should have finished shutting down.

Contributor Author

Yeah, on second thought, I like downtime for that reason in particular, and I updated the instructions to select the kill option when prompted in b7b46e6.

@@ -147,6 +147,8 @@ before importing data from the old environment.
4. Import the data to the new environment
-----------------------------------------

* Ensure you are running the following steps from a release created using the CommCare version that you should
have been provided.

Contributor

Maybe tell the reader that Celery needs to be stopped before running load_domain_data.

Contributor Author

Totally up for continuing to make it clear that Celery needs to be stopped, but it's tricky, since it isn't as simple as stopping Celery prior to running load_domain_data. If Celery was running at any point after resetting Postgres, the db has likely been "corrupted" and needs to be reset again.

Contributor

That is tricky. So a deploy will restart Celery, but we need to run a deploy after the environment is reset. Hmmm.

Contributor Author

Well, I think the framing should be that we need to run a deploy before resetting the environment (I think I made that change here too), since the "current" release also determines what state the database is migrated to when rebuilding the environment.
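
The ordering constraint discussed in this thread can be sketched as a small check. This is only an illustration under stated assumptions: the step labels (`deploy`, `reset_postgres`, `celery_started`, `load_domain_data`) and the function name `needs_another_reset` are hypothetical and not part of commcare-cloud, and a deploy is treated as starting Celery, as noted above.

```shell
# Hypothetical illustration, not part of commcare-cloud.
# Reads one step label per line on stdin, in the order the steps were run.
# Succeeds (exit 0) if Celery may have run after the most recent PostgreSQL
# reset, meaning the reset should be repeated before loading data.
needs_another_reset() {
    awk '
        $1 == "reset_postgres" { celery_ran_since_reset = 0 }
        $1 == "deploy" || $1 == "celery_started" { celery_ran_since_reset = 1 }
        END { exit (celery_ran_since_reset ? 0 : 1) }
    '
}
```

With the order proposed here (deploy, stop services, reset, load), the function returns non-zero, which is the desired outcome. A deploy after the reset makes it return zero.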

Contributor

This is a critical point, so it may be good to rephrase it to explain why:

Ensure you are running the following steps from the same version of CommCareHQ that was used to create the data dump you are importing into your environment. Request the CommCareHQ version/commit hash if it was not shared.

Additionally, I think we need to do this during the initial setup. CommCareHQ is already deployed by this point, and by default we deploy the latest code. So at this point we are actually asking them to revert to an older version, which isn't feasible due to migrations. Should we add a note about this where the setup is described?
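
One way to make this version check explicit, sketched under assumptions: compare the commit hash provided with the data dump against the commit of the release you are about to run from. `same_commit` is a hypothetical helper, and how you obtain the deployed hash (for example `git rev-parse HEAD` inside a checkout of the release) depends on your setup.

```shell
# Hypothetical helper, not part of commcare-cloud.
# Succeeds if two commit hashes refer to the same commit, allowing one of
# them to be an abbreviated prefix of the other. Hashes shorter than 7
# characters are rejected to avoid accidental matches.
same_commit() {
    a=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    b=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    if [ "${#a}" -lt 7 ] || [ "${#b}" -lt 7 ]; then
        return 1
    fi
    case "$a" in "$b"*) return 0 ;; esac
    case "$b" in "$a"*) return 0 ;; esac
    return 1
}
```

For example, `same_commit "$(git rev-parse HEAD)" "<provided_hash>" || echo "version mismatch"` would flag a release that does not match the dump.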


#. Stop CommCare
#. Stop CommCare services to prevent background processes from writing to databases.

Contributor

👍


#. Add "wipe_environment_enabled: True" to `public.yml` file.
.. note::
This is especially important if you are performing a migration of your data to a new instance. You should have

Contributor

Isn't this only needed if performing a migration?
Otherwise there should not be a need to deploy.

Contributor Author

Yeah this is true. I'll make that distinction 👍🏻


#. Reset PostgreSQL and PgBouncer
#. Wipe BlobDB, Elasticsearch, and Couch using management commands.

Contributor

It would be good to check that no services are running before doing this. Services could still be running if:

  1. someone jumps straight to this step in the documentation without reading the block above
  2. the services are restarted by monit or someone/something else unknowingly.
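
A minimal sketch of such a pre-check, assuming the services are managed by supervisor (as mentioned earlier in this conversation) and that `supervisorctl status` prints one process per line with the state in the second column. The function names `running_processes` and `assert_all_stopped` are hypothetical and not part of commcare-cloud.

```shell
# Hypothetical helper: read `supervisorctl status` output on stdin and
# print the name of every process whose state column is RUNNING.
running_processes() {
    awk '$2 == "RUNNING" { print $1 }'
}

# Hypothetical guard: succeed only if no process is reported as RUNNING.
assert_all_stopped() {
    still_running=$(running_processes)
    if [ -n "$still_running" ]; then
        echo "Still running, do not wipe yet:" >&2
        echo "$still_running" >&2
        return 1
    fi
    echo "No running processes found."
}
```

For example, piping `supervisorctl status` into `assert_all_stopped` on a machine that runs CommCare services would stop the operator before any wipe step if something was restarted in the meantime.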


.. code-block::

   $ kafka-topics.sh --bootstrap-server localhost:9092 --list

Contributor

We don't need this anymore?

Contributor Author

My interpretation was that we never needed this, but it was a nice additional check. However, given that we don't do similar checks for the other steps, I see no reason to treat Kafka any differently, and I had no issues when running wipe_kafka myself. So I removed it, because I prioritized simplifying these steps.

@@ -56,41 +72,37 @@ in the sequence given below, so you shouldn't proceed to next steps until the pr

$ cchq <env_name> ap wipe_kafka.yml

#. Remove the "wipe_environment_enabled: True" line in your `public.yml` file.

Contributor

Is "wipe_environment_enabled" a new option that has been added?

Contributor Author

No, it is used in the Ansible playbooks that wipe data, as an extra safety feature to prevent people from accidentally running those playbooks. I just moved it to "wrap" all of the steps in the wiping data section, even though it doesn't apply to Django management commands, because I thought that made more logical sense.

@kaapstorm (Contributor), Jan 30, 2024

This option was added when we added the Ansible tasks for nuking environments' data, to ensure that the people with their fingers on the button know what they're doing.

[image: cat-north-korea]

Edit: I should have refreshed my browser before responding. I see gherceg replied hours ago. ... I'll leave this here though, cos the meme is funny. 😏
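
To illustrate the safety flag, here is a small sketch (not taken from commcare-cloud) that checks whether a `public.yml` file currently has the flag switched on. The function name `wipe_enabled` and the simple line-based match are assumptions; the playbooks themselves consume the setting, and this only inspects the file.

```shell
# Hypothetical check, not part of commcare-cloud.
# Succeeds if the given file contains an uncommented line of the form
# `wipe_environment_enabled: True` (leading indentation is allowed).
wipe_enabled() {
    grep -Eq '^[[:space:]]*wipe_environment_enabled:[[:space:]]*[Tt]rue[[:space:]]*$' "$1"
}
```

After the wipe steps are finished, something like `wipe_enabled path/to/public.yml && echo "Remember to remove wipe_environment_enabled"` could serve as a reminder. The path is a placeholder.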

@mkangia (Contributor) commented Jan 30, 2024

Thanks for these updates @gherceg, a much needed improvement to our documentation.

@gherceg gherceg merged commit f7b9777 into master Jan 30, 2024
2 checks passed
@gherceg gherceg deleted the gh/data-dump/update-wipe-data-docs branch January 30, 2024 19:06


3. Prepare the new environment to be populated
----------------------------------------------

* Ensure you are running the following steps from a release created using the CommCare version/commit hash that you
should have been provided in Step 1. This ensures the database will be migrated to the same state it was in when
the data was dumped.

Contributor

thanks for adding this @gherceg

There is one challenge though. The documentation for deploy does not say how this is to be done, so we would need to update it there as well.

Contributor Author

Thanks Manish, I'll make a new PR, as there are some other changes I'd like to make too. Feel free to add any additional comments you may still have and I'll address them there! I admittedly rushed this out just to get something closer to up-to-date instructions out there.

@minhaminha (Contributor) left a comment

everything else looks good!


4 participants