Update documentation around migrating data #6214

Merged: 3 commits, Jan 30, 2024

37 changes: 24 additions & 13 deletions docs/source/installation/migration/1-migrating-project.rst

Only after all the devices are updated to use a new/mobile URL, you can proceed
2. Pull the domain data from the old environment
------------------------------------------------

The migration will require you to block data access to prevent loss of data created during the migration. If you
would like to do a practice run, you will still need to block data access to ensure the exported data is in a
clean state, and the data will need to be cleared before the real run.

During the downtime, mobile users will still be able to collect data, but they
will be unable to submit forms or sync with the server.
* ``./manage.py dump_domain_data <domain_name>``
* ``./manage.py run_blob_export --all <domain_name>``

* Transfer these two files to the new environment.
.. note::
It is important to have the commit hash that ``dump_domain_data`` and ``run_blob_export`` were run from. If
Dimagi does not provide you with this commit hash, please follow up to ensure you are able to reference this
hash in future steps.
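
   For example, the hash can be recorded on the old environment at the time of the dump (a generic git command;
   this sketch assumes it is run from the commcare-hq checkout the dump commands were run from):

   .. code-block::

      $ git rev-parse HEAD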

* Transfer these files to the new environment.
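
  For example, a minimal sketch using ``scp`` (the file names, user, and host below are placeholders; any secure
  file-transfer tool works):

  .. code-block::

     $ scp <domain_data_dump_file> <blob_export_file> <user>@<new_environment_host>:<destination_path>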

.. note::
If you are not able to use your own domain for a test run and would like dump data for a test domain for
practising or testing, please contact us via https://forum.dimagi.com/c/developers/ with the subject
"Request for test domain dump data for migration testing" and mention this page. A Dimagi developer will
provide you with the above data for any test/QA domains (casesearch, ccqa, dataregistry, qateam, ben-test, qa-erm-v1-downstream1)
from https://staging.commcarehq.org.


3. Prepare the new environment to be populated
----------------------------------------------

* Ensure you are running the following steps from a release created using the CommCare version/commit hash that you
should have been provided in Step 2. This ensures the database will be migrated to the same state it was in when
the data was dumped.

Contributor: Thanks for adding this @gherceg.

There is one challenge, though. The documentation for deploy does not say how it is to be done. So we would need to update there as well.

Contributor Author: Thanks Manish, I'll make a new PR as there are some other changes I'd like to make too. Feel free to add any additional comments you may still have and I'll address them there! I admittedly rushed this out just to get something closer to up-to-date instructions out there.

* Set up a new environment by following :ref:`deploy-commcarehq`
* Do a commcare-hq deploy using :ref:`operations/2-deploys:Deploying CommCare HQ code changes`
* Follow steps in
:ref:`reference/howto/wipe_persistent_data:How To Rebuild a CommCare HQ environment`
to ensure your environment is in a clean state before attempting to import data.
* Proceed to step 4.

If you have performed any tests on your new environment that created test data, you can delete that data by
rebuilding your environment using
:ref:`reference/howto/wipe_persistent_data:How To Rebuild a CommCare HQ environment`
before importing data from the old environment.


4. Import the data to the new environment
-----------------------------------------

* Ensure you are running the following steps from a release created using the CommCare version/commit hash that you
should have been provided in Step 2. This ensures the database will be migrated to the same state it was in when
the data was dumped.
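
  One way to pin the new environment's release to that revision is the deploy command used elsewhere in this
  guide (a sketch; adjust to your own deploy workflow, where ``<commit-hash>`` is the hash you were given):

  .. code-block::

     $ cchq <env_name> deploy commcare --commcare-rev=<commit-hash>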

* Import the dump files (each blob file will need to be imported individually)
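
  A minimal sketch of the corresponding load commands, assuming the standard CommCare HQ counterparts to the dump
  commands (``load_domain_data`` and ``run_blob_import``); verify the exact command names and options against your
  CommCare HQ release:

  .. code-block::

     $ ./manage.py load_domain_data <path/to/domain_dump_file>
     $ ./manage.py run_blob_import <path/to/blob_export_file>  # repeat for each blob export file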

* Rebuild elasticsearch indices

* Rebuild the indices with the new data
``./manage.py ptop_preindex``
``./manage.py ptop_preindex --reset``

* Print the database numbers and compare them to the values obtained previously
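
  For example, assuming the ``print_domain_stats`` management command was also used to capture the numbers on the
  old environment (verify the command name against your release):

  .. code-block::

     $ ./manage.py print_domain_stats <domain_name>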

109 changes: 70 additions & 39 deletions docs/source/reference/howto/wipe_persistent_data.rst
How To Rebuild a CommCare HQ environment
========================================

These steps delete *all* CommCare data in your environment.

In practice, you will likely *only* need this to delete test environments. We strongly discourage using any of
these steps on production data. Please fully understand this before proceeding as this will permanently
delete all of your data.

Prior to Wiping Data
--------------------

#. Ensure CommCare services are in a healthy state. If you observe any issues, see the Troubleshooting section below.

.. code-block::

$ cchq <env_name> django-manage check_services


#. If planning to migrate data, deploy CommCare from a specific revision

.. code-block::

$ cchq <env_name> deploy commcare --commcare-rev=<commit-hash>

.. note::
You should have been given a commit hash that matches the revision of CommCare used to generate your
exported data, and it is critical that this same CommCare revision is used to rebuild the new environment
and load the data. Please request a commit hash if you were not provided one.
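
   For example, to confirm which revision a release was built from (a sketch that assumes the conventional
   commcare-cloud code directory; adjust the path to your setup):

   .. code-block::

      $ git -C /home/cchq/www/<env_name>/current rev-parse HEAD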

#. Stop CommCare services to prevent background processes from writing to databases.
Contributor: 👍


.. code-block::

$ cchq <env_name> downtime start
# Choose the option to kill any running processes when prompted

Contributor: (What's the difference between commcare stop and downtime start? When would one use one versus the other?)

Contributor Author: Good question. This may not actually be necessary to change. Both commands operate on COMMCARE_INVENTORY_GROUPS, and stop those services using supervisorctl. The downtime command also involves creating a record of downtime in Datadog, which is obviously unnecessary in this case, so I can switch it back.

Contributor Author: The downtime command does give the user the option to wait for all processes to be killed, whereas service stop might take a bit of time; but by the time they run through the steps to wipe the BlobDB and ES, Celery should be complete.

Contributor Author: Yeah, on second thought, I like downtime for that reason in particular, and updated the instructions to select the kill option when prompted in b7b46e6.

How To Wipe Persistent Data
---------------------------

These steps are intended to be run in the sequence given below, so you shouldn't proceed to the next step until
the prior step is completed.

#. Ensure CommCare services are stopped to prevent background processes from writing to databases.

.. code-block::

$ cchq <env_name> service commcare status

#. Add "wipe_environment_enabled: True" to `public.yml` file.
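
   For example (assuming the standard commcare-cloud layout, where the file lives at ``environments/<env_name>/public.yml``):

   .. code-block::

      # environments/<env_name>/public.yml
      wipe_environment_enabled: True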

#. Wipe BlobDB, Elasticsearch, and Couch using management commands.

Contributor: Good to check that no services are running before doing this. Services could still be running if:

  1. someone jumps straight to this step in the documentation without reading the block above, or
  2. the services are restarted by monit or someone/something else unknowingly.


.. code-block::

$ cchq <env_name> django-manage wipe_blobdb --commit
$ cchq <env_name> django-manage wipe_es --commit
$ cchq <env_name> django-manage delete_couch_dbs --commit

#. Wipe PostgreSQL data (restart first to kill any existing connections)

.. code-block::

$ cchq <env_name> service postgresql restart
$ cchq <env_name> ap wipe_postgres.yml

#. Clear the Redis cache data
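
   A generic sketch for clearing Redis, assuming you can reach the Redis service directly from the machine you
   are on; adapt this to however Redis is managed in your environment:

   .. code-block::

      $ redis-cli -h <redis_host> flushall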

#. Wipe Kafka topics

.. code-block::

$ cchq <env_name> ap wipe_kafka.yml

Contributor: Is this a new option that has been added, "wipe_environment_enabled"?

Contributor Author: No, it is used in Ansible playbooks that wipe data as an extra safety feature to prevent people from accidentally running those playbooks. I just moved it to "wrap" all of the steps in the wiping data section, even though it doesn't apply to Django management commands, because I thought that made more logical sense.

Contributor (@kaapstorm, Jan 30, 2024): This option was added when we added the Ansible tasks for nuking environments' data, to ensure that the people with their fingers on the button know what they're doing.

(meme image: cat-north-korea)

Edit: I should have refreshed my browser before responding. I see gherceg replied hours ago. ... I'll leave this here though, cos the meme is funny. 😏


You can check they have been removed by confirming that the following shows
no output:

**Note**\ : Use the command below when the Kafka version is < 3.x. The ``--zookeeper`` argument was removed in 3.x.

.. code-block::

$ kafka-topics.sh --zookeeper localhost:2181 --list

**Note**\ : Use the command below when the Kafka version is >= 3.x.

.. code-block::

$ kafka-topics.sh --bootstrap-server localhost:9092 --list

Contributor: We don't need this anymore?

Contributor Author: My interpretation was that we never needed this, but it was just a nice additional check. However, given that we don't do any similar checks for the other steps that are run, I see no reason why we should treat Kafka any differently, and I had no issues when running wipe_kafka myself. So I removed it because I prioritized simplifying these steps.


Rebuilding environment
----------------------


#. Remove the "wipe_environment_enabled: True" line in your `public.yml` file.

#. Recreate all databases

.. code-block::

$ cchq <env_name> ap deploy_db.yml --skip-check

#. Run migrations for fresh install

.. code-block::

$ cchq <env_name> ap migrate_on_fresh_install.yml -e CCHQ_IS_FRESH_INSTALL=1

#. Create Kafka topics

.. code-block::

$ cchq <env_name> django-manage create_kafka_topics

.. note::

If you are migrating a project to a new environment, you can return to the steps outlined in
`Import the data to the new environment <installation/migration/1-migrating-project.html#import-the-data-to-the-new-environment>`_.
Otherwise, you can continue with the following steps.

#. Run a code deploy to start CommCare back up.

.. code-block::

$ cchq <env_name> deploy commcare

#. Recreate a superuser, if needed

.. code-block::

$ cchq <env_name> django-manage make_superuser [email protected]

Troubleshooting
---------------

Issues with check_services
~~~~~~~~~~~~~~~~~~~~~~~~~~

* Kafka: No Brokers Available - Try resetting Zookeeper by performing the following steps:

.. code-block::

$ cchq monolith service kafka stop
$ rm -rf /var/lib/zookeeper/*
$ cchq monolith service kafka restart