Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - sql: add copy-based migrations and VACUUM INTO #6085

Closed
wants to merge 79 commits into from

Conversation

ivan4th
Copy link
Contributor

@ivan4th ivan4th commented Jun 27, 2024


Motivation

Currently, when vacuuming is required after database migration (according to the configured db-vacuum-state value), simple SQL VACUUM command is used. According to the description in the SQLite docs, VACUUM requires as much as twice the size of the original database file in free disk space, as first it makes a copy of the database in the temporary directory, and then it copies the vacuumed database back to the original database utilizing WAL file which grows to the size of the database itself before changes are committed.
According to the remark in the SQLite source

** Only 1x temporary space and only 1x writes would be required if
** the copy of step (3) were replaced by deleting the original database
** and renaming the transient database as the original.  But that will
** not work if other processes are attached to the original database.
** And a power loss in between deleting the original and renaming the
** transient would cause the database file to appear to be deleted
** following reboot.

In case of go-spacemesh, we don't need concurrent access to the database from multiple processes, so we can optimize the vacuuming step to use only 1x space of the original database for vacuuming, given that VACUUM INTO makes a vacuumed copy of the database with no space requirements besides the size of the copy. Moreover, we can use the temporary copy of the database to run migrations faster as we can use PRAGMA journal_mode=OFF and PRAGMA synchronous=OFF for the temporary database safely, just dropping it if something goes wrong during migration.

On top of requiring 2x space, normal VACUUM (without INTO) has another problem: very slow last step in which the original database is replaced. Let's compare VACUUM with VACUUM INTO on a Mac M3 Max laptop:

$ ls -lh /tmp/state.sql*
total 164511960
-rw-r--r--  1 ivan4th  staff    78G Jun 28 00:06 state.sql
-rw-r--r--  1 ivan4th  staff    32K Jun 28 01:32 state.sql-shm
-rw-r--r--  1 ivan4th  staff     0B Jun 28 00:06 state.sql-wal

$ time sqlite3 ~/rmme/sm-data/state.sql vacuum
real    67m48.211s
user    57m13.518s
sys     4m41.778s

$ time sqlite3 ~/rmme/sm-data/state.sql "vacuum into '/tmp/state.sql'"
real    2m27.813s
user    0m43.039s
sys     0m56.265s

As it can be seen, VACUUM INTO is about 27 times faster on this particular machine.

Another problem described in #6069 is that on Linux systems, $TMP directory is utilized during normal VACUUM, and in case if it's a RAM disk the vacuuming process may run out of space.

Description

The process below applies to both state and local database.

When no vacuum is required, the migration process is unchanged, with migrations done in-place and each one wrapped in a transaction.

When vacuuming is needed according to db-vacuum-state, the following steps are performed. Marker files which are DBNAME_done are used to indicate that the temporary database is complete and synced and can be used to replace the source database. The temporary database is named DBNAME_migrate.

  1. The database is opened, so that it is locked and any concurrent go-spacemesh processes will fail to open it at the same time.
  2. A temporary copy of the database is created in the go-spacemesh data directory by means of VACUUM INTO. As it can be seen from above "benchmark", VACUUM INTO is relatively fast; also, this way we may hold the lock on the database by simply having it open during the copying. In addition to that, if the source database became bloated somehow due to unused (free) pages, the copy will require less disk space.
  3. The migrations are run on the temporary copy with PRAGMA journal_mode=OFF and PRAGMA synchronous=OFF.
  4. The temporary database is synced to disk after setting PRAGMA journal_mode=WAL and PRAGMA synchronous=NORMAL.
  5. The marker file is created and synced to disk.
  6. The temporary database is copied over the source database by means of VACUUM INTO.
  7. The marker file and the temporary database are deleted.

Additionally, before opening a database no matter what its schema version is, the following steps are performed:

  1. If the temporary database exists along with the "..._done" marker file, this means that a previous migration was interrupted when replacing the source database, or immediately preceding that; in this case, the temporary database is opened, copied over the source database by means of VACUUM INTO, after which the marker file and the temporary database are deleted.
  2. If the temporary database exists but there's no marker file, this means that previous migration has failed and in this case the temporary database is deleted.

In addition to the above, this PR adds logging of parts of migration SQL statements (first lines) when debug logging is set for stateDb in the logging section of the config. This makes it easier to see what parts of a migration are taking long time to execute.

Closes #6069

Test Plan

Tested on a mainnet node.

ivan4th added 30 commits June 1, 2024 05:38
Use consistent package/folder structure for local.sql and state.sql
databases.

When a new database is created, use schema script instead of running
all the migrations. Also, check that there's no schema drift by
diffing database schema against the schema script after running
migrations.
Also, add a migration that removes stray tables from quicksynced DBs
sql/database.go Outdated Show resolved Hide resolved
@ivan4th
Copy link
Contributor Author

ivan4th commented Aug 22, 2024

bors merge

@spacemesh-bors
Copy link

👎 Rejected by code reviews

@fasmat
Copy link
Member

fasmat commented Aug 22, 2024

bors merge

@spacemesh-bors
Copy link

👎 Rejected by code reviews

@fasmat
Copy link
Member

fasmat commented Aug 22, 2024

I think @acud needs to give approval for bors accepting the merge

Copy link
Contributor

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor nitpicks as usual :)

sql/database_test.go Outdated Show resolved Hide resolved
sql/schema.go Outdated
}

if _, err := db.Exec("PRAGMA synchronous=FULL", nil, nil); err != nil {
return fmt.Errorf("PRAGMA journal_mode=OFF: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the error annotation here is wrong, maybe we can just instrument like pragma full: %w? otherwise we're using errors to be log messages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

sql/schema.go Outdated

// This should trigger file sync
if err := s.setVersion(db, v); err != nil {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instrumenting the error would be nice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

sql/database.go Outdated

if config.exclusive {
if err := db.startExclusive(); err != nil {
return nil, fmt.Errorf("error switching to exclusive mode: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the error could be reduced, like start exclusive: %w

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

sql/database.go Outdated
return nil, fmt.Errorf("PRAGMA journal_mode=OFF: %w", err)
}
if _, err := db.Exec("PRAGMA synchronous=OFF", nil, nil); err != nil {
return nil, fmt.Errorf("PRAGMA journal_mode=OFF: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

sql/database.go Outdated
// EXCLUSIVE mode, a shared lock is obtained and held. The first time the
// database is written, an exclusive lock is obtained and held.
if _, err := exec(conn, "PRAGMA locking_mode=EXCLUSIVE", nil, nil); err != nil {
db.pool.Put(conn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are copied around and can just go into a defer statement after L636

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

sql/database.go Outdated
func (db *sqliteDatabase) copyMigrateDB(config *conf) (finalDB *sqliteDatabase, err error) {
dbPath, migratedPath, err := dbMigrationPaths(config.uri)
if err != nil {
return nil, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to have error instrumentation here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -415,6 +720,13 @@ func (db *sqliteDatabase) WithTxImmediate(
// Note that Exec will block until database is closed or statement has finished.
// If application needs to control statement execution lifetime use one of the transaction.
func (db *sqliteDatabase) Exec(query string, encoder Encoder, decoder Decoder) (int, error) {
if err := db.runInterceptors(query); err != nil {
return 0, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to have error instrumentation here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

sql/database.go Outdated
func handleIncompleteCopyMigration(config *conf) error {
dbPath, migratedPath, err := dbMigrationPaths(config.uri)
if err != nil {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to have error instrumentation here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@ivan4th
Copy link
Contributor Author

ivan4th commented Aug 22, 2024

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Aug 22, 2024
-------
## Motivation

Currently, when vacuuming is required after database migration (according to the configured `db-vacuum-state` value), simple SQL `VACUUM` command is used. According to the [description](https://www.sqlite.org/lang_vacuum.html#how_vacuum_works) in the SQLite docs, `VACUUM` requires as much as twice the size of the original database file in free disk space, as first it makes a copy of the database in the temporary directory, and then it copies the vacuumed database back to the original database utilizing WAL file which grows to the size of the database itself before changes are committed.
According to the [remark in the SQLite source](https://github.com/sqlite/sqlite/blob/105c20648e1b05839fd0638686b95f2e3998abcb/src/vacuum.c#L97-L103)

```
** Only 1x temporary space and only 1x writes would be required if
** the copy of step (3) were replaced by deleting the original database
** and renaming the transient database as the original.  But that will
** not work if other processes are attached to the original database.
** And a power loss in between deleting the original and renaming the
** transient would cause the database file to appear to be deleted
** following reboot.
```

In case of go-spacemesh, we don't need concurrent access to the database from multiple processes, so we can optimize the vacuuming step to use only 1x space of the original database for vacuuming, given that `VACUUM INTO` makes a vacuumed copy of the database with no space requirements besides the size of the copy. Moreover, we can use the temporary copy of the database to run migrations faster as we can use `PRAGMA journal_mode=OFF` and `PRAGMA synchronous=OFF` for the temporary database safely, just dropping it if something goes wrong during migration.

On top of requiring 2x space, normal `VACUUM` (without `INTO`) has another problem: very slow last step in which the original database is replaced. Let's compare `VACUUM` with `VACUUM INTO` on a Mac M3 Max laptop:

```console
$ ls -lh /tmp/state.sql*
total 164511960
-rw-r--r--  1 ivan4th  staff    78G Jun 28 00:06 state.sql
-rw-r--r--  1 ivan4th  staff    32K Jun 28 01:32 state.sql-shm
-rw-r--r--  1 ivan4th  staff     0B Jun 28 00:06 state.sql-wal

$ time sqlite3 ~/rmme/sm-data/state.sql vacuum
real    67m48.211s
user    57m13.518s
sys     4m41.778s

$ time sqlite3 ~/rmme/sm-data/state.sql "vacuum into '/tmp/state.sql'"
real    2m27.813s
user    0m43.039s
sys     0m56.265s
```

As it can be seen, `VACUUM INTO` is about 27 times faster on this particular machine.

Another problem described in #6069 is that on Linux systems, `$TMP` directory is utilized during normal `VACUUM`, and in case if it's a RAM disk the vacuuming process may run out of space.
@spacemesh-bors
Copy link

Build failed:

  • systest-status

@ivan4th
Copy link
Contributor Author

ivan4th commented Aug 22, 2024

Systests timed out, trying again

@ivan4th
Copy link
Contributor Author

ivan4th commented Aug 22, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Aug 22, 2024
@ivan4th
Copy link
Contributor Author

ivan4th commented Aug 22, 2024

bors cancel

@spacemesh-bors
Copy link

try

Build failed:

@ivan4th
Copy link
Contributor Author

ivan4th commented Aug 22, 2024

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Aug 22, 2024
-------
## Motivation

Currently, when vacuuming is required after database migration (according to the configured `db-vacuum-state` value), simple SQL `VACUUM` command is used. According to the [description](https://www.sqlite.org/lang_vacuum.html#how_vacuum_works) in the SQLite docs, `VACUUM` requires as much as twice the size of the original database file in free disk space, as first it makes a copy of the database in the temporary directory, and then it copies the vacuumed database back to the original database utilizing WAL file which grows to the size of the database itself before changes are committed.
According to the [remark in the SQLite source](https://github.com/sqlite/sqlite/blob/105c20648e1b05839fd0638686b95f2e3998abcb/src/vacuum.c#L97-L103)

```
** Only 1x temporary space and only 1x writes would be required if
** the copy of step (3) were replaced by deleting the original database
** and renaming the transient database as the original.  But that will
** not work if other processes are attached to the original database.
** And a power loss in between deleting the original and renaming the
** transient would cause the database file to appear to be deleted
** following reboot.
```

In case of go-spacemesh, we don't need concurrent access to the database from multiple processes, so we can optimize the vacuuming step to use only 1x space of the original database for vacuuming, given that `VACUUM INTO` makes a vacuumed copy of the database with no space requirements besides the size of the copy. Moreover, we can use the temporary copy of the database to run migrations faster as we can use `PRAGMA journal_mode=OFF` and `PRAGMA synchronous=OFF` for the temporary database safely, just dropping it if something goes wrong during migration.

On top of requiring 2x space, normal `VACUUM` (without `INTO`) has another problem: very slow last step in which the original database is replaced. Let's compare `VACUUM` with `VACUUM INTO` on a Mac M3 Max laptop:

```console
$ ls -lh /tmp/state.sql*
total 164511960
-rw-r--r--  1 ivan4th  staff    78G Jun 28 00:06 state.sql
-rw-r--r--  1 ivan4th  staff    32K Jun 28 01:32 state.sql-shm
-rw-r--r--  1 ivan4th  staff     0B Jun 28 00:06 state.sql-wal

$ time sqlite3 ~/rmme/sm-data/state.sql vacuum
real    67m48.211s
user    57m13.518s
sys     4m41.778s

$ time sqlite3 ~/rmme/sm-data/state.sql "vacuum into '/tmp/state.sql'"
real    2m27.813s
user    0m43.039s
sys     0m56.265s
```

As it can be seen, `VACUUM INTO` is about 27 times faster on this particular machine.

Another problem described in #6069 is that on Linux systems, `$TMP` directory is utilized during normal `VACUUM`, and in case if it's a RAM disk the vacuuming process may run out of space.
@spacemesh-bors
Copy link

Pull request successfully merged into develop.

Build succeeded:

@spacemesh-bors spacemesh-bors bot changed the title sql: add copy-based migrations and VACUUM INTO [Merged by Bors] - sql: add copy-based migrations and VACUUM INTO Aug 22, 2024
@spacemesh-bors spacemesh-bors bot closed this Aug 22, 2024
@spacemesh-bors spacemesh-bors bot deleted the feature/vacuum-into branch August 22, 2024 18:12
spacemesh-bors bot pushed a commit that referenced this pull request Aug 22, 2024
## Motivation

We need to vacuum the DB after migrations, and with #6085 merged, we need to use VACUUM INTO based migration for recent DB schema updates
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make VACUUM to not use $TMP
3 participants