Maintenance Schedule Guide

Samuel Arogbonlo edited this page Aug 24, 2023 · 3 revisions

Forest IAC Services Bi-weekly Maintenance Guide

Purpose of the Guide

This guide serves as a detailed roadmap for the bi-weekly maintenance of forest-mainnet, forest-calibnet, lotus-mainnet, Daily-Snapshot, and Sync-check. By providing comprehensive steps, our objective is to uphold the reliability, security, and optimal performance of these critical services.

Rationale for Bi-weekly Maintenance

A bi-weekly maintenance routine offers a balanced approach between ensuring our services stay current and minimizing potential disruptions. By doing so, we can swiftly identify and address issues, reducing extended downtime or unexpected complications.

Maintenance Steps

Basic Pre-checks:

  • Verify the status of all nodes to confirm they are operational. Review the New Relic Dashboard for all active nodes, then log into DigitalOcean for further confirmation. Document any inconsistencies or anomalies you observe, or open an issue in the GitHub repository.

  • Check the snapshot buckets for mainnet and calibnet. Confirm that snapshots update as expected and adhere to their hourly schedules.
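
The hourly-schedule check can be made concrete by comparing the age of the newest snapshot against a tolerance. The sketch below is only an illustration, not the team's tooling: the function name `snapshot_is_fresh`, the two-hour default tolerance, and passing the last-modified time as a Unix timestamp are all assumptions. How that timestamp is obtained depends on the bucket provider.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if a snapshot is recent enough.
#   $1  last-modified time of the newest snapshot, in Unix seconds
#   $2  maximum acceptable age in seconds (assumed default: 7200,
#       i.e. two hours of slack for an hourly schedule)
#   $3  current time in Unix seconds (defaults to now; overridable for testing)
snapshot_is_fresh() {
  local last_modified="$1"
  local max_age="${2:-7200}"
  local now="${3:-$(date +%s)}"
  local age=$(( now - last_modified ))
  [ "$age" -le "$max_age" ]
}
```

A call such as `snapshot_is_fresh "$last_modified" || echo "snapshot is stale"` would flag a bucket that has stopped updating.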

Filecoin Node Health and Synchronization:

For insight into how the Filecoin nodes are running, follow the steps below.

New Relic Dashboards

  • Dashboards provide a summary, but the devil is in the details. What do key metrics like Epoch Count, Healthy Peers, and Process Wall Time tell us about forest-mainnet and forest-calibnet?

  • Log Delve: Logs narrate the node's story. Scour the logs for forest-mainnet, forest-calibnet, and lotus-mainnet. Look beyond errors: unusual patterns can indicate looming issues even when nothing has failed outright.
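
As a first pass over a log stream, it can help to count how many lines carry each severity before reading them in detail. The following is a rough sketch under stated assumptions: it assumes the logs have been exported as plain text and that severity appears as the literal words `ERROR` and `WARN`. The function name `count_log_levels` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and print how many contain
# the literal strings ERROR and WARN (the log format is an assumption).
count_log_levels() {
  awk '
    /ERROR/ { errors++ }
    /WARN/  { warnings++ }
    END { printf "errors=%d warnings=%d\n", errors, warnings }
  '
}
```

A sudden jump in the warning count between maintenance windows is the kind of non-error pattern worth a closer look.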

Lotus-mainnet Deep Dive

For the lotus-mainnet node, you might need to log in to the node to gain more insight and confirm its synchronization status with the wider network.

  • Begin by logging into the node.
  • Once inside, execute the following command to enter the running container:
      docker exec -it lotus-mainnet bash
  • To verify the synchronization status, use:
      lotus sync status
  • Obtain a general node overview:
      lotus info
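
When only the command output is needed, the same checks can be run from the host without opening an interactive shell in the container. This is a small sketch, assuming the container is named lotus-mainnet as above. The wrapper name `lotus_in_container` and the `DOCKER` override (useful for a dry run) are hypothetical.

```shell
#!/usr/bin/env bash
# Run a lotus subcommand inside the lotus-mainnet container, non-interactively.
# DOCKER is a hypothetical override (e.g. DOCKER=echo prints the command
# instead of running it); it defaults to the real docker binary.
lotus_in_container() {
  ${DOCKER:-docker} exec lotus-mainnet lotus "$@"
}

# Example, run on the node itself:
#   lotus_in_container sync status
#   lotus_in_container info
```

Dropping `-it` keeps the call usable from scripts and cron jobs, where no terminal is attached.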

Service Health Checks

Daily-Snapshot:

  • Verify that the snapshot generation process is free of errors. Check the forest-notification channel for the status of recent uploads, and check the logs for any errors.
  • Inspect the latest snapshots for data consistency and completeness.
  • Ensure there's sufficient storage space for upcoming snapshots.
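
If snapshots are staged on a node's local disk before upload (an assumption; this guide does not say where they are written), the storage check can be sketched with POSIX `df -P`. The helper names and the 80% default threshold below are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the used-space percentage (digits only) of the
# filesystem holding the given path, parsed from POSIX `df -P` output.
disk_used_percent() {
  df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Hypothetical helper: succeeds while usage is below the threshold
# (assumed default: 80 percent).
has_free_space() {
  local used
  used="$(disk_used_percent "$1")"
  [ "$used" -lt "${2:-80}" ]
}
```

For example, `has_free_space /var/lib 90 || echo "low on disk"` would warn once the filesystem is 90% full or more. The path here is a placeholder.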

Sync-check:

  • Execute the synchronization check script/tool.
  • Confirm the synchronization status reported in the forest-notification channel and document any discrepancies.

Security Reviews

  • Review the service logs on New Relic for any indications of unusual behavior or potential unauthorized access.
  • Ensure that the firewall configurations on DigitalOcean permit only essential traffic.

Post-maintenance Review:

  • Review health metrics and performance indicators across all nodes and services.
  • Ensure services are operational, accessible, and delivering expected performance levels.
  • Thoroughly document any discrepancies found during maintenance and open issues for them on the GitHub repository where needed.

Feedback and Improvements:

  • Reflect upon the maintenance process. Identify any bottlenecks or challenges encountered.
  • Explore avenues to refine, automate, or enhance the maintenance procedure.
  • Periodically review and revise this guide to incorporate changes, augmentations, or new methodologies.

By regularly adhering to this guide, we cement our commitment to maintaining a stable, secure, and high-performance environment for all our services.