Intro
Our current file-based storage backend is starting to reach its scalability limits. During load spikes, when too many requests arrive at the same time, Icepeak now occasionally serves 503s (Service Unavailable: "Dropped Icepeak status update, the queue was full").
Icepeak is working as expected here: whenever its internal queue is full it should serve a 503. However, the goal of a new storage backend would be to increase the throughput that we can achieve so that we can handle a higher request load before having to serve 503s.
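To make the failure mode concrete, here is a minimal sketch of a bounded queue that rejects work once it is full. It is not Icepeak's actual code: the queue capacity, the `handle_put` name, and the 202 success status are all assumptions made for illustration.

```python
import queue

# Hypothetical bounded queue of pending updates; the capacity of 2 is made up.
updates: "queue.Queue[tuple[str, object]]" = queue.Queue(maxsize=2)


def handle_put(path: str, value: object) -> int:
    """Return the HTTP status a request handler would send back."""
    try:
        # Never block the request: either the update fits or it is dropped.
        updates.put_nowait((path, value))
    except queue.Full:
        return 503  # queue full: drop the update and tell the client
    return 202  # assumed success status; a worker would apply the update later


# Nothing drains the queue here, so the third request is rejected.
statuses = [handle_put(f"/k{i}", i) for i in range(3)]
```

A faster backend does not remove the 503 path; it only lets the worker drain the queue quickly enough that the limit is hit later.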
There are a few requirements for a new storage backend:
- must be able to store JSON objects
- must allow us to subscribe to and update arbitrary paths in the JSON object
- should give us higher write throughput
- should not make read throughput worse
- should allow us to store larger datasets more efficiently
Additionally, it would be nice if we can keep the current "zero configuration" approach, where you don't have to set up any external services first, and don't have to configure any schemas first, but can simply start using Icepeak right away.
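As a point of reference for the path requirement in the list above, this is roughly what reading and updating an arbitrary path means on an in-memory JSON object. The function names `get_path` and `put_path` are hypothetical, and subscriptions are left out of this sketch entirely.

```python
from typing import Any, Optional


def put_path(doc: dict, path: list[str], value: Any) -> None:
    """Set `value` at a non-empty `path`, creating intermediate objects."""
    node = doc
    for key in path[:-1]:
        child = node.get(key)
        if not isinstance(child, dict):
            # Overwrite a missing or non-object value with a fresh object.
            child = {}
            node[key] = child
        node = child
    node[path[-1]] = value


def get_path(doc: Any, path: list[str]) -> Optional[Any]:
    """Return the value at `path`, or None if the path does not exist."""
    node = doc
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


doc: dict = {}
put_path(doc, ["users", "alice", "age"], 30)
put_path(doc, ["users", "bob"], {"age": 25})
```

Whatever backend is chosen has to offer these two operations on persisted data without rewriting the whole object on every update.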
Storage options
There are various existing storage backends that we could use here. First, there are embedded key/value stores like LevelDB or RocksDB. Second, there are embedded databases like SQLite or DuckDB. Third, it would also be possible to extend our current file-based system by splitting it up into multiple files (e.g. sharding on the top-level keys).
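The third option, sharding on the top-level keys, could look roughly like the sketch below. The one-file-per-key layout and the function names are assumptions, and using the key directly as a file name is only safe if keys never contain path separators or other characters that are invalid in file names.

```python
import json
import tempfile
from pathlib import Path


def write_shard(directory: Path, key: str, subtree: object) -> None:
    """Persist one top-level key to its own file (hypothetical layout)."""
    (directory / f"{key}.json").write_text(json.dumps(subtree))


def write_shards(doc: dict, directory: Path) -> None:
    """Split the document into one file per top-level key."""
    for key, subtree in doc.items():
        write_shard(directory, key, subtree)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_shards({"users": {"a": 1}, "jobs": {"b": 2}}, root)
    # An update under "users" only rewrites users.json, not jobs.json.
    write_shard(root, "users", {"a": 42})
    shard_names = sorted(p.name for p in root.iterdir())
    users_after = json.loads((root / "users.json").read_text())
    jobs_after = json.loads((root / "jobs.json").read_text())
```

The gain depends on how evenly the data is spread across top-level keys: a single very large key would still be rewritten in full on every change.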
Note: I won't consider non-embeddable systems like Redis and Postgres here. The extra complexity of having to set them up and configure them runs counter to Icepeak's current simplicity and zero-config nature. In the unlikely case that we ever need to scale Icepeak further than what we can do with an embedded datastore, we can reconsider.
Each of these options has various pros and cons. I will discuss two of them below.
Option 1: RocksDB
RocksDB is a storage engine with a key/value interface, where keys and values are arbitrary byte streams. It is a C++ library that was developed at Facebook based on LevelDB, and it provides backwards-compatible support for the LevelDB APIs.
Pros:
- Simple K/V store where keys and values are arbitrary byte streams
- Embeddable: we can easily ship it together with Icepeak and it won't need any configuration
- Supports prefix seek, which would allow us to optimize a JSON path get
Cons:
- Encoding a JSON value into a K/V structure requires more implementation work, and it's not obvious what the best way to do it is
- There is a trade-off between storing a larger JSON object as a single value and breaking it up into individual K/V pairs, one for each leaf value. The former gives faster reads when you want a whole value, at the cost of always having to write the whole value when any part of it changes. The latter gives fine-grained reads and writes, at the cost of expensive reads and writes for a larger JSON object, since each leaf value has to be read or written individually.
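The per-leaf encoding and the prefix seek mentioned above can be sketched without RocksDB at all, using a sorted list of byte keys as a stand-in for the store's ordered key space. Everything here is an assumption for illustration: the NUL separator (which requires that JSON keys never contain a NUL character), the choice to treat arrays and empty objects as leaves, and the function names.

```python
import bisect
import json
from typing import Any, Iterator

SEP = b"\x00"  # hypothetical separator; assumes JSON keys never contain NUL


def encode_path(path: tuple[str, ...]) -> bytes:
    """Terminate every segment so that 'a/b' is never a prefix of 'a/bc'."""
    return b"".join(seg.encode("utf-8") + SEP for seg in path)


def flatten(value: Any, path: tuple[str, ...] = ()) -> Iterator[tuple[bytes, bytes]]:
    """Yield one (key, value) pair per leaf; non-empty objects are recursed into."""
    if isinstance(value, dict) and value:
        for key, child in value.items():
            yield from flatten(child, path + (key,))
    else:
        yield encode_path(path), json.dumps(value).encode("utf-8")


def prefix_scan(sorted_keys: list[bytes], prefix: bytes) -> list[bytes]:
    """Seek to the first key >= prefix, then read while the prefix matches."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches


doc = {"a": {"b": {"c": 1, "d": 2}, "bc": 3}, "e": 4}
store = dict(flatten(doc))
keys = sorted(store)
subtree = prefix_scan(keys, encode_path(("a", "b")))
```

With this layout, updating `a.b.c` touches one small pair, while reading all of `a` requires a scan over every leaf below it, which is the trade-off described in the list above.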
One nice thing about using a K/V store is that we could abstract it behind a simple type class for get/put/delete, which could then have several instances, e.g. for LevelDB and RocksDB. This would allow us to easily benchmark the different implementations.
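The shape of such an abstraction is sketched below, with a Python `Protocol` standing in for the type class. The names `KVStore`, `InMemoryStore`, and `roundtrip` are hypothetical, and the in-memory instance only takes the place of real LevelDB or RocksDB bindings, which are not shown.

```python
from typing import Optional, Protocol


class KVStore(Protocol):
    """Minimal get/put/delete interface that every backend would implement."""

    def get(self, key: bytes) -> Optional[bytes]: ...
    def put(self, key: bytes, value: bytes) -> None: ...
    def delete(self, key: bytes) -> None: ...


class InMemoryStore:
    """Stand-in backend; a LevelDB or RocksDB instance would have the same shape."""

    def __init__(self) -> None:
        self._data: dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


def roundtrip(store: KVStore) -> Optional[bytes]:
    """Backend-agnostic code, e.g. the body of a benchmark loop."""
    store.put(b"k", b"v")
    value = store.get(b"k")
    store.delete(b"k")
    return value


memory_store = InMemoryStore()
result = roundtrip(memory_store)
```

Because `roundtrip` only depends on the interface, the same benchmark could be run against each backend by swapping the instance that is passed in.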
Option 2: SQLite
Pros:
- Support for JSON functions and operators
Cons:
- Using indexes requires some knowledge of the schema. However, since Icepeak supports free-form JSON objects we likely won't be able to add any indexes automatically. This would still be a manual task/optimization that a user could take advantage of themselves.
- We would need to generate a SQL schema that is flexible enough to store any JSON object. We could not take advantage of data-specific domain knowledge (as long as we want to keep the zero-config requirement).
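One possible shape for such a generic schema is a single table with one row per leaf, keyed by the encoded path. This is a sketch under stated assumptions, not a proposal for the final design: the table name `leaves`, the slash-terminated path encoding (which requires that JSON keys never contain `/`), and the helper names are all made up, and it deliberately uses only plain SQL rather than SQLite's JSON functions.

```python
import json
import sqlite3
from typing import Any

conn = sqlite3.connect(":memory:")
# One row per leaf; the primary key doubles as the index for subtree reads.
conn.execute("CREATE TABLE leaves (path TEXT PRIMARY KEY, value TEXT NOT NULL)")


def put_leaf(path: str, value: Any) -> None:
    """Insert or overwrite a single leaf; `path` looks like '/users/alice/age/'."""
    conn.execute(
        "INSERT OR REPLACE INTO leaves (path, value) VALUES (?, ?)",
        (path, json.dumps(value)),
    )


def subtree(prefix: str) -> list[tuple[str, Any]]:
    """Range scan over all paths that start with `prefix` (must end in '/')."""
    # '0' is the character right after '/' in ASCII, so this is a tight upper bound.
    upper = prefix[:-1] + "0"
    rows = conn.execute(
        "SELECT path, value FROM leaves WHERE path >= ? AND path < ? ORDER BY path",
        (prefix, upper),
    )
    return [(path, json.loads(value)) for path, value in rows]


put_leaf("/users/alice/age/", 30)
put_leaf("/users/alice/name/", "Alice")
put_leaf("/users/bob/age/", 25)
alice = subtree("/users/alice/")
```

This keeps the zero-config property because the schema is the same for every dataset, but it also shows the cost named above: the only index is on the path, so nothing data-specific can be exploited.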
Conclusion
In my view, an embedded data store like RocksDB or SQLite makes the most sense for the second iteration of Icepeak's storage backend. My preference would probably be SQLite, since it gives us all of the power of SQL, both for modeling the data and for querying it.