ocp13 replacement #518

Merged
RobHooper merged 25 commits into main from ocp27-onboarding
Oct 29, 2024


RobHooper
Contributor

Resolves #494 and #504.

@RobHooper RobHooper self-assigned this Aug 21, 2024
@jpmckinney
Member

jpmckinney commented Sep 27, 2024

I have noted that scrapyd needs a project to be configured. On ocp13, this is configured under the datlab user (/home/datlab/.config/scrapy.cfg).

@RobHooper Is it that you observed that e.g. collect.data.open-contracting.org lists no projects? That's because we need to do something like scrapyd-deploy registry from our local computers. Not sure why Datlab has that config.

  • We can document this on a new page for the registry.

@jpmckinney
Member

I will enable mod_md again when DNS is live (sending too many failed requests, as we are at the moment, causes Let's Encrypt to block us).

Yeah, I have a note about that in the docs https://ocdsdeploy.readthedocs.io/en/latest/develop/update/apache.html

Let’s Encrypt will reach a Failed Validation limit if DNS is not propagated.

I've now also added:

In the meantime, you can use Let’s Encrypt’s staging environment.

@jpmckinney
Member

jpmckinney commented Sep 27, 2024


James when you have a moment, please can you confirm what data needs migrating to the new server from /data/.

332M	/data/storage/spoonbill

Yes, we also need to move /data/storage/spoonbill (looks like you already have).

It contains the Django media files for that project. It has a var subdirectory which is not referenced by anything and that has no changes since 2021, so I deleted it on both the old and new server (I assume it was from an old version).


Alongside this work, we can remove the ntp state that disables the NTP service. This was only required for older systems (Ubuntu 20.04).
https://github.com/open-contracting/deploy/blob/main/salt/core/systemd/ntp.sls#L31-L34

  • @RobHooper Did you want to do this, or did you decide not to?

For the data support server, we have:

Adjust reserved disk space to 1% for large disks:

tune2fs -m 1 /dev/md2

  • Do we want to do this for the registry server?

@jpmckinney
Member

jpmckinney commented Sep 27, 2024

In terms of migration process, is this the plan?

  • Do we adjust the TTLs to speed up the DNS switchover?
  • ocp13: Remove/comment out DATA_REGISTRY_CBOM so that new jobs aren't started
  • ocp13: Delete /etc/cron.d/postgres_backups
  • ocp27: We should re-copy any new files (data/storage/exporter) and databases from ocp13
  • ocp27: Can we copy the mod_md files for data.open-contracting.org from ocp13? If so, we want to also restore the Apache conf files and reload Apache.
  • GoDaddy: Change the DNS over for data.open-contracting.org
  • Run scrapyd-deploy once DNS has propagated
  • Deploy ocp27 with Salt, to install the deployer's crontab and the postgres_backups
  • When satisfied, decommission ocp13
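On the TTL question in the list above, here is a minimal sketch for checking whether a record's TTL has already been lowered before the switchover. It assumes the standard `dig +noall +answer` line layout (name, TTL, class, type, value). The helper names and the 300-second threshold are hypothetical, not something agreed in this thread.

```shell
# Hypothetical helpers: inspect the TTL in one line of `dig +noall +answer` output.
# Assumed line layout: "<name> <ttl> <class> <type> <value>".

# Print the TTL (second field) of the first answer line.
answer_ttl() {
  printf '%s\n' "$1" | awk 'NR == 1 { print $2 }'
}

# Succeed if the TTL is at or below a maximum (default 300 s, an assumed value).
ttl_is_low() {
  ttl=$(answer_ttl "$1")
  max=${2:-300}
  [ -n "$ttl" ] && [ "$ttl" -le "$max" ]
}

# Example usage (needs network access, so it is left commented out):
#   ttl_is_low "$(dig +noall +answer data.open-contracting.org)" && echo "TTL is low"
```

Checking this before changing the record at GoDaddy avoids waiting out an old, long TTL that resolvers have already cached.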

Member

@jpmckinney jpmckinney left a comment


Looks good! Just one comment.

salt/aws/init.sls (resolved)
@jpmckinney
Member

I added a documentation page that covers most of what I wrote above (except where I still have a question). Please add the steps to restore the data_registry and spoonbill databases from backups.

We should add a subheading on this other page that explains, in general, how to restore from individual database backups, as opposed to the existing docs about pgBackRest backups: https://ocdsdeploy.readthedocs.io/en/latest/maintain/databases.html#restore-from-backup. We can then just link to that from the registry page.

@RobHooper
Contributor Author

In the meantime, you can use Let’s Encrypt’s staging environment.

The Let's Encrypt staging environment won't generate SSL certificates without live DNS, so this doesn't help in this situation.

@RobHooper Did you want to [remove the disable NTP state], or did you decide not to?

This was already done in a previous PR :)

Do we want to [modify the reserved disk space] for the registry server?

Yes, I have now run tune2fs -m 1 /dev/md2.
I think we should run this by default on any disk larger than ~50 GB.
Where would be best to document this?
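As a rough sketch of the "by default on any disk larger than ~50 GB" rule suggested above: the helper below picks a reserved-block percentage from a disk size. The function name is hypothetical, the 50 GB threshold comes from the comment above, and 5% is the usual ext4 default, used here as the assumed fallback.

```shell
# Hypothetical helper: choose the reserved-block percentage for tune2fs -m.
# Disks larger than the threshold (default 50 GB) get 1%.
# Smaller disks keep 5%, the usual ext4 default (assumed fallback).
reserved_percent_for() {
  size_gb=$1
  threshold_gb=${2:-50}
  if [ "$size_gb" -gt "$threshold_gb" ]; then
    echo 1
  else
    echo 5
  fi
}

# Example usage (changes filesystem settings, so it is left commented out):
#   tune2fs -m "$(reserved_percent_for 2000)" /dev/md2
```

Keeping the decision in one place would make it easier to reference from create_server.rst, wherever the step ends up being documented.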

@jpmckinney
Member

Where would be best to document this?

Probably at an appropriate step in create_server.rst, and maybe as a note at https://ocdsdeploy.readthedocs.io/en/latest/maintain/hosting.html#rescale-a-server

@jpmckinney
Member

The Let's Encrypt staging environment won't generate SSL certificates without live DNS, so this doesn't help in this situation.

It helps to not get temporarily blocked by Let's Encrypt :)

@RobHooper
Contributor Author

RobHooper commented Oct 24, 2024

Detailed migration plan as follows:

Before go-live:

  1. Inform parties (Dogsbody, OCP)

  2. Reduce TTLs for:

  • collect.data.open-contracting.org
  • data.open-contracting.org
  • flatten.open-contracting.org
  • rabbitmq.data.open-contracting.org

On the day:

  1. Merge GitHub Pull Request

  2. Disable crons on ocp13:

  • PELICAN_BACKEND_UPDATE_EXCHANGE_RATES
  • DATA_REGISTRY_CBOM
  • /etc/cron.d/postgres_backups
  3. Disable sites on ocp13 (ensure no new data is created and lost).

docker compose down everything except the data-registry web and static containers

  4. Copy data from ocp13 and import on the new server

    4a. Rsync files

    rsync -avze "ssh -p 2223" [email protected]:/data/storage/spoonbill/ /data/storage/spoonbill/
    rsync -avze "ssh -p 2223" [email protected]:/data/storage/exporter_dumps/ /data/storage/exporter/
    rsync -avze "ssh -p 2223" [email protected]:/home/collect/scrapyd/logs/ /home/collect/scrapyd/logs/

    4b. Migrate databases

    # Export databases on ocp13 (the path is left unquoted so that ~ expands to the postgres user's home directory)
    su - postgres -c "/usr/bin/pg_dump -Ft -f ~/spoonbill_web.tar spoonbill_web"
    su - postgres -c "/usr/bin/pg_dump -Ft -f ~/data_registry.tar data_registry"
    
    # Copy both files over
    
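    # Import databases on ocp27
    # With -C, the -d database is used only for the initial DROP/CREATE DATABASE; data is restored into the database named in the archive
    sudo -u postgres pg_restore -cC --if-exists -v -U postgres -d postgres spoonbill_web.tar
    sudo -u postgres pg_restore -cC --if-exists -v -U postgres -d postgres data_registry.tar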

    4c. Migrate exchange rates table
    https://ocdsdeploy.readthedocs.io/en/latest/deploy/servers/data-support.html#pelican-backend

  5. Run docker app migrations
    https://ocdsdeploy.readthedocs.io/en/latest/deploy/servers/data-support.html#docker-apps

  6. Test sites are working - James and Bob

  7. Update DNS to point to the ocp27 CNAME.

  8. Run scrapyd-deploy once DNS has propagated - With James

  9. Deploy ocp27 with Salt, to install the deployer's crontab, postgres_backups and enable Apache mod_md.

Post-Migration

  1. Restore TTLs to 1 hour
  2. Deploy Docs with Salt, updating Tinyproxy with the new registry IP address
  3. Deploy Prometheus with Salt
  4. Block all access to ocp13, then decommission it once we are happy (usually a couple of weeks after the migration).

Changes in this PR to docker and rsyslog affect all docker servers, so deploy carefully and restart the Docker daemon.
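For the "once DNS has propagated" step in the plan above, a minimal sketch of a check to run before scrapyd-deploy: it compares the CNAME a resolver returns against the expected target. The function name and the example target hostname are hypothetical, and `dig +short CNAME` is assumed to be available on the machine doing the check.

```shell
# Hypothetical helper: succeed if the resolved CNAME matches the expected target.
# Comparison ignores case and a trailing dot, as `dig +short` prints "host.".
cname_matches() {
  expected=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  actual=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  [ -n "$actual" ] && [ "${actual%.}" = "${expected%.}" ]
}

# Example usage (needs network access; the target hostname is an assumption):
#   cname_matches "ocp27.example.org" "$(dig +short CNAME data.open-contracting.org)" \
#     && echo "DNS has propagated for this resolver"
```

A single resolver agreeing does not prove global propagation, so this is only a first signal before re-enabling mod_md and running scrapyd-deploy.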

@RobHooper
Contributor Author

ocp27: Can we copy the mod_md files for data.open-contracting.org from ocp13? If so, we want to also restore the Apache conf files and reload Apache.

mod_md on the new server should instantly provision new SSL certificates as soon as DNS is live (and it is re-enabled).
I will only copy data from mod_md on ocp13 if there is a problem getting a new certificate.

@jpmckinney
Member

jpmckinney commented Oct 24, 2024

  • Step 0: I need to message Dogsbody to say we're decommissioning ocp13, so there will be alerts about it, data.open-contracting.org, etc.

Reduce DNS TTLs for:

  • Aren't those for ocp23? We're replacing ocp13. (data.open-contracting.org, etc.)

Disable sites on ocp13

  • I think we can just (1) docker compose down everything except the data-registry web and static containers and (2) email [email protected] that work is starting (so that Yohanna doesn't edit any publications).

Test sites are working - James will you be online for this?

  • At what time are you expecting?

Run scrapyd-deploy once DNS has propagated - With James?

  • Yes, I can do this one. As I understand it, the DATA_REGISTRY_CBOM cron job will be disabled until the next step.

Copy scrapyd logs

  • Let's do this before step 9. Once cron starts I'm not sure how rsync'ing the directory works.

@jpmckinney
Member

Some other tasks I did:

@RobHooper RobHooper merged commit 0f7116d into main Oct 29, 2024
9 checks passed
@RobHooper RobHooper deleted the ocp27-onboarding branch October 29, 2024 14:04
@jpmckinney
Member

Changes in this PR to docker and rsyslog affect all docker servers, so deploy carefully and restart the Docker daemon.

Is there anything extra to do when deploying other servers, or should Docker (and containers) restart normally?

@RobHooper
Contributor Author

Changes in this PR to docker and rsyslog affect all docker servers, so deploy carefully and restart the Docker daemon.

Is there anything extra to do when deploying other servers, or should Docker (and containers) restart normally?

Docker should restart as part of the Salt deployment, causing a small outage.
I will be re-deploying the affected servers tomorrow morning, while traffic is quiet.

Successfully merging this pull request may close these issues.

Upgrade ocp13 (registry) to Ubuntu 24.04 LTS
2 participants