Resume investigation in using FoundationDB as persistence service #2172

Open
masih opened this issue Jul 27, 2023 · 0 comments

We recently hit a suspected memory leak in our FoundationDB deployment, version 7.1.33, in both the development and production environments. After approximately two weeks of operation, every storage server consistently consumed 100% of its memory, and the scheduler frequently shut them down. The deployment used the RocksDB storage engine, which is known to have had memory leak problems in earlier versions of FoundationDB; it remains uncertain whether those problems persist in the version we deployed.
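For anyone resuming this, a minimal sketch of how storage server memory pressure could be flagged before the scheduler intervenes. It is not what we ran. It assumes the cluster status has already been fetched as JSON (for example from `fdbcli` with `status json`) and parsed into a dict, and it assumes the field names `cluster.processes`, `roles[].role`, `memory.used_bytes` and `memory.limit_bytes` match the machine-readable status schema of the deployed version; verify them before relying on this. The function name and the 0.9 threshold are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def storage_memory_pressure(
    status: Dict[str, Any], threshold: float = 0.9
) -> List[Tuple[str, float]]:
    """Return (address, used/limit) for storage processes at or above threshold.

    `status` is a parsed status JSON document. Field names are assumptions
    about the FoundationDB status schema and should be checked against the
    deployed version.
    """
    flagged = []
    processes = status.get("cluster", {}).get("processes", {})
    for proc in processes.values():
        # Only consider processes that hold the storage role.
        roles = {r.get("role") for r in proc.get("roles", [])}
        if "storage" not in roles:
            continue
        mem = proc.get("memory", {})
        used, limit = mem.get("used_bytes"), mem.get("limit_bytes")
        if not used or not limit:
            continue  # Skip processes that do not report memory figures.
        ratio = used / limit
        if ratio >= threshold:
            flagged.append((proc.get("address", "unknown"), ratio))
    # Highest memory pressure first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Running this periodically and recording the ratios over time would also show whether usage grows steadily, as a leak would, or plateaus.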

To address the problem, we rejuvenated the production cluster and upgraded to version 7.3.7, the latest pre-release of FoundationDB available at the time. This involved migrating the data to new storage servers. We also planned to explore other, newer storage engines such as Redwood.

Two weeks post-upgrade, we observed reduced memory usage on some but not all storage servers, and the data migration was still incomplete. These findings suggest that data migration can be a lengthy process in FoundationDB. Since data migration consumes read bandwidth, we're currently unsure how this would affect read performance in a live production setup, particularly when FoundationDB is in the read traffic path.

Because further investigation would be time-consuming, we decided to temporarily shut down the deployments in both development and production environments. Despite these issues, FoundationDB demonstrated significant potential. With a replication factor of two, we achieved a multihash ingest rate as high as 250K per second, the highest we have recorded, even compared with our current non-replicated Pebble backends.

Once time permits, we intend to revisit FoundationDB for further testing and potential use.

For instructions on restarting the instances in the Kubernetes (K8S) setup, please refer to the shutdown PR:
