Add Etcd as a data store backend #742
base: main
Conversation
@@ -71,6 +71,8 @@ impl DistributedBlob {
     Store::MySQL(store) => store.get_blob(key, read_range).await,
     #[cfg(feature = "rocks")]
     Store::RocksDb(store) => store.get_blob(key, read_range).await,
+    #[cfg(feature = "etcd")]
+    Store::Etcd(_) => unimplemented!(),
How can I declare that my backend will not be able to handle blobs, and avoid lines like this?
@mdecimus could you help me please?
You should implement etcd as a lookup store, not a data store. Check the Redis implementation for guidelines.
Well, I want to store data but not blobs:
- Data store
- Full-text store
- Lookup store
https://stalw.art/docs/get-started
Should I still implement blobs even if this is not recommended with etcd?
I really want this implementation to check all the boxes on the https://stalw.art/docs/get-started page except blobs, since I want Garage/S3 to store them.
> Well, I want to store data but not blobs

That is not possible with the current design. A data store also has to offer blob and lookup functionality in order to be functional.
I don't have experience with etcd, but according to their website it was designed as a store for settings and metadata. For this reason it should be added as a lookup store rather than a data store.
Does it store large data?
Not if you use an external blob store. If you are sure that etcd can store large amounts of data, go ahead, but it has to implement the blob store methods even if they are not used.
Also, if this feature is merged it won't be distributed by default, in order to keep the binary size small, since etcd is not a popular choice for storing indexes.
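A minimal sketch of what "implement the blob store methods even if they are not used" could look like: the backend satisfies the interface but returns a typed error instead of panicking with `unimplemented!()`, so callers can fall back to an external blob store such as S3/Garage. The trait and error names below are illustrative, not Stalwart's actual API.

```rust
use std::ops::Range;

// Hypothetical error type; Stalwart's real error enum will differ.
#[derive(Debug, PartialEq)]
enum StoreError {
    Unsupported(&'static str),
}

// Hypothetical minimal shape of the blob methods a data store must provide.
trait BlobStore {
    fn get_blob(&self, key: &[u8], range: Range<usize>) -> Result<Option<Vec<u8>>, StoreError>;
    fn put_blob(&self, key: &[u8], data: &[u8]) -> Result<(), StoreError>;
}

struct EtcdStore;

impl BlobStore for EtcdStore {
    // The methods exist (satisfying the trait), but reject blob operations
    // with an error instead of crashing the server via unimplemented!().
    fn get_blob(&self, _key: &[u8], _range: Range<usize>) -> Result<Option<Vec<u8>>, StoreError> {
        Err(StoreError::Unsupported("etcd backend does not store blobs"))
    }
    fn put_blob(&self, _key: &[u8], _data: &[u8]) -> Result<(), StoreError> {
        Err(StoreError::Unsupported("etcd backend does not store blobs"))
    }
}

fn main() {
    let store = EtcdStore;
    // Blob operations compile and run, but fail gracefully.
    assert!(store.put_blob(b"k", b"v").is_err());
    assert!(store.get_blob(b"k", 0..1).is_err());
}
```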
I did more research about the limits:
https://etcd.io/docs/v3.5/dev-guide/limit/

Request size limit:
> etcd is designed to handle small key value pairs typical for metadata. Larger requests will work, but may increase the latency of other requests. By default, the maximum size of any request is 1.5 MiB. This limit is configurable through --max-request-bytes flag for etcd server.

Storage size limit:
> The default storage size limit is 2 GiB, configurable with --quota-backend-bytes flag. 8 GiB is a suggested maximum size for normal environments and etcd warns at startup if the configured value exceeds it.

I guess this should be more than enough.
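For reference, both limits from the etcd v3.5 docs linked above can be raised at startup. The values below are illustrative examples, not recommendations:

```shell
# Raise the per-request limit to 10 MiB and the backend quota to the
# suggested 8 GiB maximum (flag names from the etcd v3.5 limits docs).
etcd --max-request-bytes=10485760 \
     --quota-backend-bytes=8589934592
```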
My use case is finding the right software that scales across multiple data centers. Multi-master, multi-read. The worst setup you could imagine.
Garage works perfectly out of the box for S3, great. They have a KV API, but it is not what we need for this project.
PostgreSQL is terrible to configure for multiple data centers. A nightmare. FoundationDB should have worked but did not, and was nonsense to set up across multiple data centers.
Have you considered SeaweedFS (à la Haystack) & Cockroach (à la Spanner)? The former, raw, is a slight variant of a vanilla S3 I imagine, and the latter is Postgres-y clustering as you'd like. Combined I think they'd be an ideal fit for this.
Thank you for the two references, I did not know about SeaweedFS. I did check it out.
I would say that SeaweedFS and Cockroach seem to do quite a lot, and talk about pricing to enable features.
For S3 I use Garage: pure Rust, no complex features. It works out of the box across multiple datacenters.
See: https://garagehq.deuxfleurs.fr/
Etcd was quite easy to set up; I guess for now this is only a matter of implementation: https://blog.williamdes.eu/Infrastructure/tutorials/install-a-distributed-etcd-cluster/
Overall, I tend to avoid and ban tools that have many more features than I use.
For example PostgreSQL: it has too many features, and using it to do the equivalent of a key-value system is a waste (IMO).
That's why tools like etcd or Garage, which do only one task, are so good.
Do you know some simple, good software that works well across datacenters and could eventually be an alternative to some of the software I use?
For now only the LDAP server is not distributed, and I manually copy-paste changes. If you know a good and easy distributed LDAP, that would be great.
Probably getting way off topic w.r.t. the pull request, but --
Seaweed is Apache 2.0, and I am using https://github.com/oxidecomputer/cockroach, so it'll also be Apache 2.0 on 2025-04-01; it's BSL for now, which is acceptable in this capacity, I think. The former does not need any additional features to fit the role of an incredibly fast and massively scalable blob storage, and the latter is also pretty feature-complete for me out of the box. I'll be trying to implement this stuff anyway, so I too will be digging into the internals, and I run atop illumos, so I will have ZFS to lean on for snapshotting etc. (and will probably need to adjust code; I already had to hack it up to get it built).
I've not yet looked at the code for authentication, but I'm going to try to centralise as much as possible and just shove this into a cockroach table as well; I don't see any downside as of yet.
If you're keen on self-hosting with minimal effort, in terms of tooling that helps, I can recommend tailscale (+ headscale, if you want to avoid "tailscale the product"), which will massively simplify any network topology and cross-site security -- hell, you might not even care about setting up TLS for your internal services, since it's already end-to-end authenticated encryption on the wire. Then you just set up communication on the WireGuard interfaces, and whether a node is another-continent-remote or rack-local, you can treat everything as though it were a homelab.
Good luck.
Thank you Alvin for the TiKV implementation. Co-Authored-By: Alvin Peters <[email protected]>
Enjoy this new tutorial: https://blog.williamdes.eu/Infrastructure/tutorials/install-a-distributed-etcd-cluster/
@mdecimus does .write( of the batch in the store mean that all operations MUST be in a transaction, or can I commit each operation one by one?
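For what it's worth, the practical difference between the two options is atomicity: an etcd KV transaction applies a batch all-or-nothing, while committing operations one by one can leave partial state behind if a later operation fails. A toy in-memory sketch of the all-or-nothing semantics (this is neither Stalwart's `Store` API nor the etcd client, just an illustration):

```rust
use std::collections::HashMap;

// Toy in-memory KV used only to illustrate transactional (all-or-nothing)
// batch writes, as opposed to committing each operation individually.
#[derive(Default)]
struct MockKv {
    data: HashMap<String, String>,
}

enum Op {
    Put(String, String),
    Delete(String),
}

impl MockKv {
    // Atomic batch: stage all operations on a copy, and swap it in only
    // if every operation succeeds. A failure leaves the store untouched.
    fn write_txn(&mut self, ops: Vec<Op>) -> Result<(), &'static str> {
        let mut staged = self.data.clone();
        for op in ops {
            match op {
                Op::Put(k, v) => {
                    staged.insert(k, v);
                }
                Op::Delete(k) => {
                    staged.remove(&k).ok_or("delete of missing key")?;
                }
            }
        }
        self.data = staged;
        Ok(())
    }
}

fn main() {
    let mut kv = MockKv::default();
    // A failing op in the middle must not leave the earlier put behind.
    let result = kv.write_txn(vec![
        Op::Put("a".into(), "1".into()),
        Op::Delete("missing".into()),
    ]);
    assert!(result.is_err());
    assert!(kv.data.is_empty()); // nothing was committed
}
```

If each operation were committed separately instead, the `Put` would survive the failed `Delete`, which is the behaviour a transactional batch write is meant to rule out.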
Ref: #634
See: etcd-server and etcd-client on Debian/Ubuntu
Goals: provide a backend to store everything I need to store, except the blob store (etcd is not made for large data): https://stalw.art/docs/get-started