During our investigation into why the size of our database kept growing endlessly, even when no data was being written to cete, we found an important design flaw in how BadgerDB and Raft interact.
The flaw is explained as follows:
1. The cete server is started
2. Data is sent to the server
3. Raft generates new snapshots at regular intervals
4. Badger writes new vlog files with the logs for the incoming data
5. The server is shut down

Here starts the issue:

6. The server is restarted
7. Raft restores the latest snapshot, with all key-value pairs snapshotted up to this point
8. All pairs are replayed through a call to `Set()`, which stores the data in Badger
9. Badger writes all pairs from the snapshot again, generating new logs that end up in new vlog files
10. The server is shut down - go back to step 6 and repeat
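For context, the replay path looks roughly like the sketch below, assuming a hashicorp/raft FSM backed by Badger. The type names and the JSON snapshot encoding are illustrative assumptions, not cete's actual code; the point is that `Restore()` pushes every pair back through `Set()`, so Badger appends the whole dataset to fresh vlog files on every restart.

```go
// Minimal sketch of the replay path, assuming a hashicorp/raft FSM backed by
// Badger. Type names and the JSON snapshot encoding are illustrative
// assumptions, not cete's actual code.
package fsm

import (
	"encoding/json"
	"io"

	"github.com/dgraph-io/badger/v2"
)

type kvPair struct {
	Key   []byte `json:"key"`
	Value []byte `json:"value"`
}

type badgerFSM struct {
	db *badger.DB
}

// Set writes one pair into Badger; every call appends an entry to the value log.
func (f *badgerFSM) Set(key, value []byte) error {
	return f.db.Update(func(txn *badger.Txn) error {
		return txn.Set(key, value)
	})
}

// Restore is invoked by Raft at startup whenever a snapshot exists. It replays
// every pair through Set(), so the entire dataset is rewritten into new vlog
// files on each restart, even though the data is already on disk.
func (f *badgerFSM) Restore(rc io.ReadCloser) error {
	defer rc.Close()
	dec := json.NewDecoder(rc)
	for {
		var p kvPair
		if err := dec.Decode(&p); err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		if err := f.Set(p.Key, p.Value); err != nil {
			return err
		}
	}
}
```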
TL;DR: every time the server is restarted, all KV pairs are replayed into Badger, causing a massive increase in the size of the database and eventually leading to a full disk.
Please note that the garbage collector is of no use while KV pairs are being replayed. The replay also causes a massive consumption of resources (CPU, RAM, I/O) at startup. The situation is even worse in a Kubernetes environment, where probes can kill the process if it takes too long to start, making the problem compound even further.
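For reference, Badger's value-log GC has to be driven explicitly, typically with a loop like the minimal sketch below (not cete's actual code). As described above, running it while the snapshot is being replayed does not help, because the replay itself keeps appending fresh entries to the value log.

```go
// Typical Badger value-log GC loop (a minimal sketch, not cete's actual code).
// As described above, running it while the snapshot is being replayed does not
// help: the replay keeps appending new entries to the value log.
package fsm

import (
	"time"

	"github.com/dgraph-io/badger/v2"
)

func runValueLogGC(db *badger.DB, stop <-chan struct{}) {
	ticker := time.NewTicker(5 * time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			// Re-run GC until Badger reports there is nothing left to rewrite.
			for {
				if err := db.RunValueLogGC(0.5); err != nil {
					// badger.ErrNoRewrite means no value-log file was reclaimed.
					break
				}
			}
		}
	}
}
```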
The three options I could think of to solve this issue are the following:

1. Snapshot restore is disabled at startup via `config.NoSnapshotRestoreOnStart = true`, but can still be executed manually to recover from disasters (this is what we use, since we are running on a single node - see the sketch after this list)
2. Badger is wiped completely at startup via `db.DropAll()` and the snapshot is used to re-populate the database (RAM-, CPU-, and I/O-intensive)
3. Snapshots carry an index, and only the records with an index greater than what is already stored in Badger are replayed (a.k.a. incremental snapshots)
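For reference, option 1 boils down to a single hashicorp/raft config field; the sketch below shows roughly how it is wired in. `NoSnapshotRestoreOnStart` is an actual `raft.Config` field, while the wrapper function and its arguments are illustrative assumptions (the stores and transport are created elsewhere).

```go
// Option 1 sketch: start Raft without replaying the latest snapshot.
// NoSnapshotRestoreOnStart is an actual hashicorp/raft config field; the
// wrapper function and its arguments are illustrative assumptions.
package fsm

import (
	"github.com/hashicorp/raft"
)

func newRaftNode(id string, fsm raft.FSM, logs raft.LogStore, stable raft.StableStore,
	snaps raft.SnapshotStore, trans raft.Transport) (*raft.Raft, error) {
	config := raft.DefaultConfig()
	config.LocalID = raft.ServerID(id)
	// Skip the automatic snapshot restore at boot; a restore can still be
	// triggered manually when recovering from a disaster.
	config.NoSnapshotRestoreOnStart = true
	return raft.NewRaft(config, fsm, logs, stable, snaps, trans)
}
```

Option 3 would additionally require persisting the last applied Raft index next to the data in Badger and skipping any snapshot record whose index is not newer, which implies a change to the snapshot format.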