Database versioning and migration #3255

Closed
wants to merge 34 commits into from
Changes from 28 commits
Commits
654a5f1
add basic structure
sudo-shashank Jul 21, 2023
19abb27
complete pre migration check
sudo-shashank Jul 21, 2023
92d7391
use migrate db in daemon
sudo-shashank Jul 21, 2023
6ffb5e7
cleanup
sudo-shashank Jul 21, 2023
83a5b0e
Add comments
sudo-shashank Jul 21, 2023
9d75b30
correction
sudo-shashank Jul 21, 2023
aa7e4de
fmt
sudo-shashank Jul 21, 2023
d087781
Add dev db support
sudo-shashank Jul 25, 2023
dd1888a
Added migration check script
sudo-shashank Jul 25, 2023
85ba181
Added migration CI
sudo-shashank Jul 25, 2023
9b5cc33
fix migration check and cleanup
sudo-shashank Jul 25, 2023
cefd847
fmt
sudo-shashank Jul 26, 2023
1cbedd2
Merge branch 'main' into shashank/db-migration
sudo-shashank Jul 26, 2023
91d2b62
fix shellcheck
sudo-shashank Jul 26, 2023
6ea25cf
lint fix
sudo-shashank Jul 26, 2023
9e1fd70
fmt
sudo-shashank Jul 26, 2023
3948dbc
fmt
sudo-shashank Jul 26, 2023
19592d0
use released forest binary
sudo-shashank Jul 26, 2023
707e512
change migration CI test
sudo-shashank Jul 26, 2023
256cde7
fix CI
sudo-shashank Jul 26, 2023
38351aa
fix CI test
sudo-shashank Jul 26, 2023
57e0f15
fix CI
sudo-shashank Jul 26, 2023
5f6f7a8
cleanup
sudo-shashank Jul 26, 2023
92e3eec
fmt
sudo-shashank Jul 26, 2023
151458a
remove unwrap
sudo-shashank Jul 26, 2023
2bd40f8
update changelog
sudo-shashank Jul 26, 2023
77d453a
fmt
sudo-shashank Jul 26, 2023
5282c56
fix comments
sudo-shashank Jul 26, 2023
94dbd37
Merge branch 'main' into shashank/db-migration
sudo-shashank Jul 27, 2023
cbfc575
Fix
sudo-shashank Aug 3, 2023
3056580
fix dev db
sudo-shashank Aug 3, 2023
583cb8d
Merge branch 'main' into shashank/db-migration
sudo-shashank Aug 8, 2023
18a6140
fix chainstore init
sudo-shashank Aug 8, 2023
a7d126a
update version in CI
sudo-shashank Aug 8, 2023
29 changes: 29 additions & 0 deletions .github/workflows/migration-check.yml
@@ -0,0 +1,29 @@
name: Migration Check

# Cancel workflow if there is a new change to the branch.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

on:
  workflow_dispatch:
  pull_request:
    branches:
      - main
  push:
    branches:
      - main

jobs:
  migration-check:
    name: Forest database migration checks
    runs-on: ubuntu-latest
    steps:
      - name: Checkout sources
        uses: actions/checkout@v3
      - name: Setup sccache
        uses: mozilla-actions/[email protected]
        timeout-minutes: ${{ fromJSON(env.CACHE_TIMEOUT_MINUTES) }}
        continue-on-error: true
      - name: migration check
        run: ./scripts/migration_check.sh
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -27,6 +27,8 @@

### Breaking

- [#3203](https://github.com/ChainSafe/forest/issues/3203): Implemented database
versioning and migration
- [#3189](https://github.com/ChainSafe/forest/issues/3189): Changed the database
organisation to use multiple columns. The database will need to be recreated.
- [#3220](https://github.com/ChainSafe/forest/pull/3220): Removed the
10 changes: 10 additions & 0 deletions documentation/src/developer_documentation/db_migration.md
@@ -0,0 +1,10 @@
# Steps to add support for a new database version:

- Add a new enum variant for the new database version to `DBVersion`.
- Update `get_db_version` to recognise the newly added enum variant.
- Add the version transition for the new `DBVersion` in the `migrate_db` method.
- Add the steps required for the new migration in the `migrate` method. Each
  migration step can either migrate in place or use `temp_db/` to migrate data
  from the existing database; in the latter case it must finish by atomically
  renaming `temp_db/` back to the existing database name.
- Update `LATEST_DB_VERSION` to the latest database version.
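
The steps above can be sketched as follows. This is a minimal, self-contained illustration rather than the code in this PR: the `DbVersion` enum, the `V12` variant, the `"0.12"` prefix and the helper names are all hypothetical, chosen only to show where each step lands. It assumes the target version is never older than the current one.

```rust
/// Hypothetical stand-in for the PR's `DBVersion` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DbVersion {
    V0,
    V11,
    V12, // Step 1: new variant for the new database version (made up here).
}

/// Step 5: the newest version this binary knows about.
pub const LATEST: DbVersion = DbVersion::V12;

/// Step 2: map a database directory name to a version.
pub fn version_from_dir_name(name: &str) -> DbVersion {
    if name.starts_with("0.12") {
        DbVersion::V12
    } else if name.starts_with("0.11") {
        DbVersion::V11
    } else {
        DbVersion::V0
    }
}

/// Step 3: the single-step transition out of each version.
pub fn next_version(current: DbVersion) -> Option<DbVersion> {
    match current {
        DbVersion::V0 => Some(DbVersion::V11),
        DbVersion::V11 => Some(DbVersion::V12),
        DbVersion::V12 => None,
    }
}

/// Lists the versions to migrate through, in order, to get from `current`
/// to `target`. Step 4 (the per-version migration work) would run once for
/// each entry. Stops early if there is no further transition.
pub fn migration_path(mut current: DbVersion, target: DbVersion) -> Vec<DbVersion> {
    let mut steps = Vec::new();
    while current != target {
        match next_version(current) {
            Some(next) => {
                steps.push(next);
                current = next;
            }
            None => break,
        }
    }
    steps
}
```

For example, `migration_path(DbVersion::V0, LATEST)` yields the two intermediate steps `V11` and `V12`, while a database already at `LATEST` yields an empty list.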
69 changes: 69 additions & 0 deletions scripts/migration_check.sh
@@ -0,0 +1,69 @@
#!/usr/bin/env bash
set -euo pipefail

# Function to sync using a specific tag
function sync_with_tag() {
  local tag=$1
  echo "Syncing using tag ($tag)..."

  # Create the download URL for the forest release
  URL="https://github.com/ChainSafe/forest/releases/download/${tag}/forest-${tag}-linux-amd64.zip"
  # Download the release using curl
  curl -LJO "${URL}"

  # Unzip the downloaded file
  unzip "forest-${tag}-linux-amd64.zip"
  cd "forest-${tag}"

  # Run forest daemon
  ./forest --chain calibnet --encrypt-keystore false --auto-download-snapshot --detach

  # Check if the sync succeeded for the tag
  if ./forest-cli --chain calibnet sync wait; then
    echo "Sync successful for tag: $tag"
    pkill -9 forest
    # clean up
    cd ..
    rm -rf "forest-${tag}-linux-amd64.zip" "forest-${tag}"
    sleep 5s
  else
    echo "Sync failed for tag: $tag"
    exit 1
  fi
}

# DB migrations are supported from v0.11.1 onwards
START_TAG="v0.11.1"

# Fetch the latest tags from the remote repository
git fetch --tags

# Get a list of all tags sorted chronologically
tags=$(git tag --sort=creatordate)
# Track the most recent tag seen while looping
LATEST_TAG=""

# Database migrations are not supported for Forest versions below `v0.11.1`
is_tag_valid=false

echo "Testing db migrations from v0.11.1 to latest, one by one"
# Loop through each tag and sync with the corresponding version
for tag in $tags; do
  # Check if the current tag matches the start tag
  if [ "$tag" = "$START_TAG" ]; then
    is_tag_valid=true
  fi
  if $is_tag_valid; then
    # Run sync check with the current tag
    sync_with_tag "$tag"
  fi
  LATEST_TAG="$tag"
done

echo "Testing db migration from v0.11.1 to latest, at once"
# Sync calibnet with Forest `v0.11.1`
sync_with_tag "$START_TAG"
# Sync calibnet with the latest version of Forest
sync_with_tag "$LATEST_TAG"

echo "Migration check completed successfully."
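
The tag selection in the script (skip everything before the start tag, then test each later tag, then test a direct jump from the start tag to the latest one) can be expressed as a small pure function, which makes the logic easy to unit-test without downloading releases. The sketch below is an illustration only: `tags_from` and `direct_jump` are hypothetical names, and the tag list used with them is made up.

```rust
/// Returns the tags from `start_tag` (inclusive) onward, preserving the
/// input order. Yields an empty list when `start_tag` is not present.
pub fn tags_from<'a>(tags: &[&'a str], start_tag: &str) -> Vec<&'a str> {
    tags.iter()
        .copied()
        .skip_while(|tag| *tag != start_tag)
        .collect()
}

/// Returns the (first, last) pair of the selected tags, i.e. the two
/// versions used for the "at once" migration check.
pub fn direct_jump<'a>(tags: &[&'a str], start_tag: &str) -> Option<(&'a str, &'a str)> {
    let selected = tags_from(tags, start_tag);
    Some((*selected.first()?, *selected.last()?))
}
```

Unlike the shell loop, which silently tests nothing when the start tag is missing, `direct_jump` returns `None` in that case, so a caller can fail loudly.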
9 changes: 8 additions & 1 deletion src/cli_shared/mod.rs
@@ -13,7 +13,14 @@ pub use tikv_jemallocator;

/// Gets chain data directory
pub fn chain_path(config: &crate::cli_shared::cli::Config) -> PathBuf {
PathBuf::from(&config.client.data_dir).join(config.chain.network.to_string())
let chain_path = PathBuf::from(&config.client.data_dir).join(config.chain.network.to_string());
// Use the dev database if it exists, else use versioned database
let dev_path = chain_path.join("dev");

Review comment (Contributor): When is the database ever put in the dev folder?

if dev_path.exists() {
dev_path
} else {
chain_path.join(env!("CARGO_PKG_VERSION"))
}
}

pub mod snapshot;
5 changes: 5 additions & 0 deletions src/daemon/mod.rs
@@ -15,6 +15,7 @@ use crate::cli_shared::{
};
use crate::db::{
db_engine::{db_root, open_proxy_db},
migration::{check_if_another_db_exist, migrate_db, LATEST_DB_VERSION},
rolling::DbGarbageCollector,
};
use crate::genesis::{get_network_name_from_genesis, import_chain, read_genesis_header};
@@ -150,6 +151,10 @@ pub(super) async fn start(

let keystore = Arc::new(RwLock::new(keystore));

if let Some(db_path) = check_if_another_db_exist(&config) {
migrate_db(&config, db_path, LATEST_DB_VERSION).await?;
}

let chain_data_path = chain_path(&config);
let db = Arc::new(open_proxy_db(
db_root(&chain_data_path),
129 changes: 129 additions & 0 deletions src/db/migration/mod.rs
@@ -0,0 +1,129 @@
// Copyright 2019-2023 ChainSafe Systems
// SPDX-License-Identifier: Apache-2.0, MIT

use super::db_engine::{db_root, open_proxy_db};
use crate::chain::ChainStore;
use crate::cli_shared::{chain_path, cli::Config};
use crate::fil_cns::composition as cns;
use crate::genesis::read_genesis_header;
use crate::state_manager::StateManager;
use crate::utils::proofs_api::paramfetch::{
ensure_params_downloaded, set_proofs_parameter_cache_dir_env,
};
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::Arc;
use tracing::info;

pub const LATEST_DB_VERSION: DBVersion = DBVersion::V11;

Review comment (Contributor): Why manually define this? It could be inferred from the version number of Forest.

/// Database version for each forest version which supports db migration
#[derive(Debug, Eq, PartialEq)]
pub enum DBVersion {
V0, // Default DBVersion for any unknown db

Review comment (Contributor): If the version is unknown, let's call it Unknown rather than V0.

V11,

Review comment (Contributor): Why is this version 11? Forest is at version 0.11.1. How about you just use https://docs.rs/semver/latest/semver/struct.Version.html to exactly capture the Forest version?

}
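
One way to act on the two review comments above (an explicit `Unknown` variant, and an exact version instead of `V11`) is sketched below. To stay dependency-free it parses `major.minor.patch` by hand; this is a std-only stand-in for the `semver` crate's `Version` type that the reviewer links to, not a use of it. The `DirVersion` type and its `parse` method are hypothetical names, not part of this PR.

```rust
/// Hypothetical replacement for `DBVersion`: either an exact
/// `major.minor.patch` triple or an explicit `Unknown`.
/// The derived ordering places `Unknown` below every known version and
/// compares known versions by major, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum DirVersion {
    Unknown,
    Known { major: u64, minor: u64, patch: u64 },
}

impl DirVersion {
    /// Parses a directory name such as `0.11.1`. Anything that is not
    /// exactly three dot-separated unsigned integers becomes `Unknown`.
    pub fn parse(dir_name: &str) -> Self {
        let mut parts = dir_name.split('.').map(|part| part.parse::<u64>());
        match (parts.next(), parts.next(), parts.next(), parts.next()) {
            (Some(Ok(major)), Some(Ok(minor)), Some(Ok(patch)), None) => {
                DirVersion::Known { major, minor, patch }
            }
            _ => DirVersion::Unknown,
        }
    }
}
```

Because the type is ordered, "does this database need migrating?" reduces to a comparison such as `DirVersion::parse(dir_name) < DirVersion::parse(env!("CARGO_PKG_VERSION"))`, with no hand-maintained `LATEST_DB_VERSION` constant. Pre-release and build-metadata suffixes are not handled here; the `semver` crate covers those.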

/// Check to verify database migrations
async fn migration_check(config: &Config, existing_chain_data_root: &Path) -> anyhow::Result<()> {
info!(
"Running database migration checks for: {}",
existing_chain_data_root.display()
);
// Set proof param dir env path, required for running validations
if cns::FETCH_PARAMS {
set_proofs_parameter_cache_dir_env(&config.client.data_dir);
}
ensure_params_downloaded().await?;

// Open db
let db = Arc::new(open_proxy_db(
db_root(existing_chain_data_root),
config.db_config().clone(),
)?);
let genesis = read_genesis_header(None, config.chain.genesis_bytes(), &db).await?;

Review comment (Contributor): You probably want ts.genesis().

let chain_store = Arc::new(ChainStore::new(
db,
Arc::clone(&config.chain),
genesis,
existing_chain_data_root,
)?);
let state_manager = Arc::new(StateManager::new(chain_store, Arc::clone(&config.chain))?);

let ts = state_manager.chain_store().heaviest_tipset();
let height = ts.epoch();
assert!(height.is_positive());

Review comment (Contributor): I don't understand the purpose of this assertion. More documentation is needed here.

Reply (Author): not required, removed

// re-compute 100 tipsets only
state_manager.validate_range((height - 100)..=height)?;
Ok(())
}
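
A note on the range passed to `validate_range`: `(height - 100)..=height` goes negative when the heaviest tipset is below epoch 100, and because the range is inclusive it spans 101 epochs rather than 100. A clamped helper avoids the first problem. The sketch below assumes epochs are signed 64-bit integers; `recent_epochs` is a hypothetical name, not a function in this PR.

```rust
use std::ops::RangeInclusive;

/// Returns the inclusive epoch range reaching back `window` epochs from
/// `head`, clamped so that it never starts below epoch 0.
pub fn recent_epochs(head: i64, window: i64) -> RangeInclusive<i64> {
    head.saturating_sub(window).max(0)..=head
}
```

With this helper, `recent_epochs(1_000, 100)` is `900..=1_000` and `recent_epochs(40, 100)` is `0..=40`.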

/// Migrate database to latest version
pub async fn migrate_db(
config: &Config,
db_path: PathBuf,
target_version: DBVersion,
) -> anyhow::Result<()> {
info!("Running database migrations...");
// Get DBVersion from existing db path
let mut current_version = get_db_version(&db_path);
// Run pre-migration checks, which includes:
// - re-compute 100 tipsets
migration_check(config, &db_path).await?;

// Iterate over all `DBVersion`s until the database is migrated to the latest version
while current_version != target_version {
let next_version = match current_version {
DBVersion::V0 => DBVersion::V11,
_ => break,
};

Review comment (Contributor): I need help understanding the details of this code. The list of Forest versions is static, and we shouldn't have to specify the next_version manually. For example, the last three Forest versions are: ["0.10.0", "0.11.0", "0.11.1"]. Keeping the versions in a vector tells you about the known versions and their ordering.

Reply (Author): I am maintaining the list of versions now

// Execute the migration steps for the intermediate version
migrate(&db_path, &next_version)?;
current_version = next_version;
}
// Run post-migration checks, which includes:
// - re-compute 100 tipsets
migration_check(config, &db_path).await?;

// Rename db to latest versioned db
fs::rename(db_path.as_path(), chain_path(config))?;

info!("Database Successfully Migrated to {:?}", target_version);
Ok(())
}
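
The reviewer's suggestion of an ordered list of known versions, in place of a hand-written `next_version` match, could look like the sketch below. The three version strings are the ones quoted in the review comment; `KNOWN_VERSIONS` and `pending_migrations` are hypothetical names and not necessarily what the author's follow-up commits use.

```rust
/// Known Forest versions, oldest first (the list quoted in the review).
pub const KNOWN_VERSIONS: &[&str] = &["0.10.0", "0.11.0", "0.11.1"];

/// Returns the versions to migrate through, in order, to get from
/// `current` to `target`: everything after `current` up to and including
/// `target`. Returns `None` if either version is not in the list or if
/// `target` is older than `current`.
pub fn pending_migrations(current: &str, target: &str) -> Option<&'static [&'static str]> {
    let from = KNOWN_VERSIONS.iter().position(|v| *v == current)?;
    let to = KNOWN_VERSIONS.iter().position(|v| *v == target)?;
    if to < from {
        return None;
    }
    Some(&KNOWN_VERSIONS[from + 1..to + 1])
}
```

Adding a release then means appending one entry to the list; the ordering and the "next version" relation both fall out of the position in the slice.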

// TODO: Add Steps required for new migration
/// Migrate to an intermediate db version
fn migrate(_existing_db_path: &Path, next_version: &DBVersion) -> anyhow::Result<()> {

Review comment (Contributor): It looks like you want the migrations to happen inside the database folder. That's not safe. If something goes wrong, we want to revert back to the last known good state.

Reply (Author): I am currently using temp_dir for intermediate db migrations, if something goes wrong we can revert back to last good state easily

match next_version {
DBVersion::V11 => Ok(()),
_ => Ok(()),
}
}
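
The temp-directory approach discussed in the exchange above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's implementation: `migrate_via_temp_dir` and the `.bak` backup name are hypothetical, both directories are assumed to live on the same filesystem (so each `fs::rename` is a single atomic step on POSIX systems), and no backup directory is assumed to exist beforehand. Note that the final swap is two renames, so it is not atomic as a whole; a crash between them leaves the old data intact under the backup name.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Builds the migrated database in `temp_dir` by calling `build(old, new)`,
/// then swaps it in for `db_dir`. If `build` fails, the scratch directory is
/// discarded and `db_dir` is left untouched.
pub fn migrate_via_temp_dir<F>(db_dir: &Path, temp_dir: &Path, build: F) -> io::Result<()>
where
    F: FnOnce(&Path, &Path) -> io::Result<()>,
{
    // Start from a clean scratch directory.
    if temp_dir.exists() {
        fs::remove_dir_all(temp_dir)?;
    }
    fs::create_dir_all(temp_dir)?;

    // Write the new layout; on error, clean up (best effort) and bail out.
    if let Err(err) = build(db_dir, temp_dir) {
        let _ = fs::remove_dir_all(temp_dir);
        return Err(err);
    }

    // Move the old database aside, then rename the new one into place.
    let mut backup = db_dir.as_os_str().to_owned();
    backup.push(".bak");
    let backup = PathBuf::from(backup);
    fs::rename(db_dir, &backup)?;
    fs::rename(temp_dir, db_dir)?;

    // Only now is it safe to drop the previous version.
    fs::remove_dir_all(&backup)
}
```

Keeping the old directory until the new one is in place is what makes "revert to the last known good state" possible: at every point, at least one complete copy of the database exists on disk.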

/// Checks whether another database already exists
pub fn check_if_another_db_exist(config: &Config) -> Option<PathBuf> {
let dir = PathBuf::from(&config.client.data_dir).join(config.chain.network.to_string());
let paths = fs::read_dir(&dir).ok()?;
for dir in paths.flatten() {
let path = dir.path();
if path.is_dir() && path != chain_path(config) {
return Some(path);
}
}
None
}

/// Returns `DBVersion` for the given database path.
fn get_db_version(db_path: &Path) -> DBVersion {
    // `db_path` is `<data_dir>/<network>/<version>`, so the version is the
    // final path component; `parent()` would yield the network directory.
    match db_path.file_name().and_then(|dir_name| dir_name.to_str()) {
        Some(name) if name.starts_with("0.11") => DBVersion::V11,
        _ => DBVersion::V0, // Defaults to V0
    }
}
4 changes: 2 additions & 2 deletions src/db/mod.rs
@@ -7,6 +7,7 @@ pub mod parity_db;
pub mod parity_db_config;
pub use memory::MemoryDB;

pub mod migration;
pub mod rolling;

/// Interface used to store and retrieve settings from the database.
@@ -64,9 +65,8 @@ impl<DB: DBStatistics> DBStatistics for std::sync::Arc<DB> {
}

pub mod db_engine {
use std::path::{Path, PathBuf};

use crate::db::rolling::*;
use std::path::{Path, PathBuf};

pub type Db = crate::db::parity_db::ParityDb;
pub type DbConfig = crate::db::parity_db_config::ParityDbConfig;