
Update documentation around migrating data #6214

Merged · 3 commits · Jan 30, 2024

Changes from 2 commits
4 changes: 3 additions & 1 deletion docs/source/installation/migration/1-migrating-project.rst
@@ -147,6 +147,8 @@ before importing data from the old environment.
4. Import the data to the new environment
-----------------------------------------

* Ensure you are running the following steps from a release created using the CommCare version that you should
have been provided.
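As a concrete illustration, a sketch reusing the deploy invocation that appears later in this PR (the commit hash placeholder stands for whatever revision you were provided):

.. code-block::

    $ cchq <env_name> deploy commcare --commcare-rev=<commit-hash>
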
Contributor:

Maybe tell the reader that Celery needs to be stopped before running load_domain_data.

Contributor Author:

Totally up for continuing to make it clear that celery needs to be stopped, but it's tricky since it isn't as simple as stopping celery prior to running load_domain_data. If celery was running at any point after resetting postgres, it is likely the case that the db has been "corrupted" and needs to be reset again.

Contributor:

That is tricky. So a deploy will restart Celery, but we need to run a deploy after the environment is reset. Hmmm.

Contributor Author:

Well I think the framing should be that we need to run a deploy before resetting the environment (I think I made that change here too), since the "current" release also impacts what state the database is migrated to when rebuilding the environment.

Contributor:

This is a critical point, so it may be good to rephrase to share why:

Ensure you are running the following steps from the same version of CommCareHQ as was used to create the data dump you are importing into your environment. Request the CommCareHQ version/commit hash if it has not been shared.

Additionally, I think we need to do this during the initial setup. CommCareHQ is already deployed by this point, and by default we deploy the latest code. So at this point we are actually asking them to revert to an older version, which isn't feasible due to migrations. Should we add a note about this where the setup happens?


* Import the dump files (each blob file will need to be imported individually)
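
A minimal sketch of what this step can look like, assuming the ``load_domain_data`` management command named in the review discussion above and an illustrative dump filename:

.. code-block::

    # main data dump (filename is illustrative)
    $ cchq <env_name> django-manage load_domain_data dump.zip
    # each blob dump file is then imported individually (blob import command not shown here)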

@@ -156,7 +158,7 @@ before importing data from the old environment.
* Rebuild elasticsearch indices

* Rebuild the indices with the new data
``./manage.py ptop_preindex``
``./manage.py ptop_preindex --reset``

* Print the database numbers and compare them to the values obtained previously

104 changes: 65 additions & 39 deletions docs/source/reference/howto/wipe_persistent_data.rst
@@ -1,47 +1,63 @@
How To Rebuild a CommCare HQ environment
========================================

This step deletes all of the CommCare data from your environment and resets to as if it's a new environment.
In practice, you will likely need this only to delete test environments and not production data. Please understand fully
before you proceed to perform this as it will permenantly delete all of your data.
These steps delete *all* CommCare data in your environment.

In practice, you will likely *only* need this to delete test environments. We strongly discourage using any of
these steps on production data. Please fully understand this before proceeding, as it will permanently
delete all of your data.

How To Wipe Persistent Data
---------------------------
Prior to Wiping Data
--------------------

#. Ensure CommCare services are in a healthy state. If you observe any issues, see the Troubleshooting section below.

.. code-block::

This step deletes all of the persistent data in BlobDB, Postgres, Couch and Elasticsearch. Note that this works only
in the sequence given below, so you shouldn't proceed to next steps until the prior steps are successful.
$ cchq <env_name> django-manage check_services


#. Wipe BlobDB, ES, Couch using management commands.
#. Deploy CommCare from a specific revision

.. code-block::

$ cchq <env_name> django-manage wipe_blobdb --commit
$ cchq <env_name> django-manage wipe_es --commit
$ cchq <env_name> django-manage delete_couch_dbs --commit
$ cchq <env_name> deploy commcare --commcare-rev=<commit-hash>

#. Add "wipe_environment_enabled: True" to `public.yml` file.
.. note::
This is especially important if you are performing a migration of your data to a new instance. You should have
Contributor:

Isn't this only needed when performing a migration? Otherwise there should not be a need to deploy.

Contributor Author:

Yeah this is true. I'll make that distinction 👍🏻

been given a commit hash that matches the revision of CommCare used to generate your exported data, and it is
critical that this same CommCare revision is used to rebuild the new environment and load the data in.

#. Stop CommCare
#. Stop CommCare services to prevent background processes from writing to databases.
Contributor:

👍


.. code-block::

$ cchq <env_name> service commcare stop
$ cchq <env_name> downtime start
Contributor:

(What's the difference between commcare stop and downtime start? When would one use one versus the other?)

Contributor Author:

Good question. This may not actually be necessary to change. Both commands operate on COMMCARE_INVENTORY_GROUPS, and stop those services using supervisorctl. The downtime command also involves creating a record of downtime in Datadog, which is obviously unnecessary in this case, so I can switch it back.

Contributor Author:

The downtime command does give the user the option to wait for all processes to be killed, whereas service stop might take a bit of time; but by the time they run through the steps to wipe BlobDB and ES, the Celery shutdown should be complete.

Contributor Author:

Yeah on second thought, I like downtime for that reason in particular, and updated the instructions to select the kill option when prompted in b7b46e6

# Choose option to kill any running processes when prompted

How To Wipe Persistent Data
---------------------------

These steps are intended to be run in the sequence given below, so you shouldn't proceed to the next step until
the prior step is completed.


#. Add "wipe_environment_enabled: True" to `public.yml` file.

#. Reset PostgreSQL and PgBouncer
#. Wipe BlobDB, Elasticsearch, and Couch using management commands.
Contributor:

Good to check that no services are running before doing this. Services could still be running if:

  1. someone jumps straight to this step in the documentation without reading the block above
  2. the services are restarted by monit or someone/something else unknowingly
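
One way to verify is a quick status check (a sketch, assuming the ``service`` subcommand used elsewhere in this document also reports status for the commcare group):

.. code-block::

    $ cchq <env_name> service commcare status
    # all processes should show as stopped before wiping any data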


.. code-block::

$ cchq <env_name> ap deploy_postgres.yml
$ cchq <env_name> django-manage wipe_blobdb --commit
$ cchq <env_name> django-manage wipe_es --commit
$ cchq <env_name> django-manage delete_couch_dbs --commit

#. Wipe PostgreSQL data

Check status. Once status is "OK", wipe PostgreSQL data
#. Wipe PostgreSQL data (restart first to kill any existing connections)

.. code-block::

$ cchq <env_name> service postgresql status
$ cchq <env_name> service postgresql restart
$ cchq <env_name> ap wipe_postgres.yml

#. Clear the Redis cache data
@@ -56,41 +72,37 @@ in the sequence given below, so you shouldn't proceed to next steps until the pr

$ cchq <env_name> ap wipe_kafka.yml

#. Remove the "wipe_environment_enabled: True" line in your `public.yml` file.
Contributor:

Is this a new option that has been added? "wipe_environment_enabled"?

Contributor Author:

No, it is used in Ansible playbooks that wipe data, as an extra safety feature to prevent people from accidentally running those playbooks. I just moved it to "wrap" all of the steps in the wiping-data section, even though it doesn't apply to the Django management commands, because I thought that made more logical sense.

Contributor (@kaapstorm, Jan 30, 2024):

This option was added when we added the Ansible tasks for nuking environments' data, to ensure that the people with their fingers on the button know what they're doing.

[meme image: cat-north-korea]

Edit: I should have refreshed my browser before responding. I see gherceg replied hours ago. ... I'll leave this here though, cos the meme is funny. 😏


You can check they have been removed by confirming that the following shows
no output:

**Note**\ : Use below command when the ``kafka version is < 3.x``. The ``--zookeeper`` argument is removed from 3.x.

.. code-block::

$ kafka-topics.sh --zookeeper localhost:2181 --list

**Note**\ : Use below command when the ``kafka version is >= 3.x``.

.. code-block::

$ kafka-topics.sh --bootstrap-server localhost:9092 --list
Contributor:

We don't need this anymore?

Contributor Author:

My interpretation was that we never needed this; it was just a nice additional check. However, given that we don't do any similar checks for the other steps, I see no reason to treat Kafka any differently, and I had no issues when running wipe_kafka myself. So I removed it because I prioritized simplifying these steps.


Rebuilding environment
----------------------


#. Remove the "wipe_environment_enabled: True" line in your `public.yml` file.

#. Run Ansible playbook to recreate databases.
#. Recreate all databases

.. code-block::

$ cchq <env_name> ap deploy_db.yml --skip-check

Run initial migration
#. Run migrations for fresh install

.. code-block::

$ cchq <env_name> ap migrate_on_fresh_install.yml -e CCHQ_IS_FRESH_INSTALL=1

#. Run a code deploy to create Kafka topics and Elasticsearch indices.
#. Create Kafka topics

.. code-block::

$ cchq <env_name> django-manage create_kafka_topics

.. note::

If you are migrating a project to a new environment, you can return to the steps outlined in
`Import the data to the new environment <installation/migration/1-migrating-project.html#import-the-data-to-the-new-environment>`_.
Otherwise, you can continue with the following steps.

#. Run a code deploy to start CommCare back up.

.. code-block::

@@ -104,3 +116,17 @@ Rebuilding environment
.. code-block::

$ cchq <env_name> django-manage make_superuser [email protected]

Troubleshooting
---------------

Issues with check_services
~~~~~~~~~~~~~~~~~~~~~~~~~~

* Kafka: No Brokers Available - Try resetting Zookeeper by performing the following steps:

.. code-block::

$ cchq monolith service kafka stop
$ rm -rf /var/lib/zookeeper/*
$ cchq monolith service kafka restart