
Add Etcd as a data store backend #742

Draft · wants to merge 11 commits into main

Conversation

williamdes
Contributor

Ref: #634

See:

Goals: provide a backend to store everything I need to store (a rough sketch of the basic operations follows after this list)

  • Data store
    • read
    • write
    • delete
  • Blob store (etcd is not made for large data)
  • Full-text store
  • Lookup store

https://stalw.art/docs/get-started
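
For context, a minimal sketch of what the read/write/delete side of a data store maps to on etcd, using the etcd-client crate (endpoint, key and value are placeholders; this is not the code in this PR):

use etcd_client::{Client, Error};

#[tokio::main]
async fn main() -> Result<(), Error> {
    // Connect to a local etcd endpoint (placeholder address).
    let mut client = Client::connect(["localhost:2379"], None).await?;
    // write
    client.put("hello", "world", None).await?;
    // read
    let resp = client.get("hello", None).await?;
    if let Some(kv) = resp.kvs().first() {
        println!("{} = {}", kv.key_str()?, kv.value_str()?);
    }
    // delete
    client.delete("hello", None).await?;
    Ok(())
}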

@CLAassistant

CLAassistant commented Sep 8, 2024

CLA assistant check
All committers have signed the CLA.

@williamdes
Contributor Author

@mdecimus can you backport be18ddf into main?

STORE=etcd cargo test store_tests --no-default-features --features=etcd -- --nocapture
# or
STORE=sqlite cargo test store_tests --no-default-features --features=sqlite -- --nocapture

Before this patch, the tests could not be run with everything disabled except one store type.

@@ -71,6 +71,8 @@ impl DistributedBlob {
Store::MySQL(store) => store.get_blob(key, read_range).await,
#[cfg(feature = "rocks")]
Store::RocksDb(store) => store.get_blob(key, read_range).await,
#[cfg(feature = "etcd")]
Store::Etcd(_) => unimplemented!(),
Contributor Author

How can I declare that my backend will not be able to handle blobs, and avoid lines like this?

Contributor Author

@mdecimus could you help me please?

Member

You should implement etcd as a lookup store, not a data store. Check the Redis implementation for guidelines.
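
For a lookup store, the main requirement beyond get/set/delete is typically key expiry; on etcd that maps to leases. A rough sketch of the etcd side, using the etcd-client crate (the function name and error handling are illustrative only, not Stalwart's actual lookup store trait):

use etcd_client::{Client, Error, PutOptions};

// Illustrative helper, not Stalwart's real API: store a value under `key`
// that etcd expires after `ttl_secs` by attaching the put to a lease.
async fn set_with_expiry(
    client: &mut Client,
    key: &str,
    value: &str,
    ttl_secs: i64,
) -> Result<(), Error> {
    let lease = client.lease_grant(ttl_secs, None).await?;
    client
        .put(key, value, Some(PutOptions::new().with_lease(lease.id())))
        .await?;
    Ok(())
}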

Contributor Author

Well, I want to store data but not blobs:

  • Data store
  • Full-text store
  • Lookup store

https://stalw.art/docs/get-started

Should I still implement blobs even though this is not recommended with etcd?
I really want this implementation to check all the boxes on the https://stalw.art/docs/get-started page except blobs, since I want Garage/S3 to store them.

Member

Well, I want to store data but not blobs:

That is not possible with the current design. A data store also has to provide blob and lookup functionality in order to be functional.

I don't have experience with etcd but according to their website it was designed as a store for settings and metadata. For this reason it should be added as a lookup store rather than a data store.

Member

Does it store large data?

Not if you use an external blob store. If you are sure that etcd can store large amounts of data go ahead, but it has to implement the blob store methods even if they are not used.

Also, if this feature is merged it won't be distributed by default, in order to keep the binary size small, since etcd is not a popular choice for storing indexes.
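
One way to satisfy "the blob store methods must exist even if they are not used" without panicking is to stub them out with an explicit error, so a misconfiguration fails with a clear message instead of hitting unimplemented!(). A hypothetical shape only; the struct name, signatures and String error type below are placeholders, not Stalwart's actual blob store API:

struct EtcdStore; // placeholder name for the backend added in this PR

// Hypothetical stubs: signatures and error type are illustrative only.
impl EtcdStore {
    pub async fn get_blob(
        &self,
        _key: &[u8],
        _range: std::ops::Range<usize>,
    ) -> Result<Option<Vec<u8>>, String> {
        Err("the etcd backend does not store blobs; configure an external blob store".into())
    }

    pub async fn put_blob(&self, _key: &[u8], _data: &[u8]) -> Result<(), String> {
        Err("the etcd backend does not store blobs; configure an external blob store".into())
    }
}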

Contributor Author

I did more research about limits
https://etcd.io/docs/v3.5/dev-guide/limit/

Request size limit
etcd is designed to handle small key value pairs typical for metadata. Larger requests will work, but may increase the latency of other requests. By default, the maximum size of any request is 1.5 MiB. This limit is configurable through --max-request-bytes flag for etcd server.

Storage size limit
The default storage size limit is 2 GiB, configurable with --quota-backend-bytes flag. 8 GiB is a suggested maximum size for normal environments and etcd warns at startup if the configured value exceeds it.

I guess this should be more than enough


My use case is finding the right software that scales across multiple data centers. Multi-master, multi-read. The worst setup you could imagine.
Garage works perfectly out of the box for S3, great. They have a KV API, but not what we need for this project.
PostgreSQL is terrible to configure across multiple data centers. A nightmare. FoundationDB should have worked but did not, and was nonsense to set up across multiple data centers.

Have you considered SeaweedFS (à la Haystack) & Cockroach (à la Spanner)? The former, raw, is a slight variant of a vanilla S3 I imagine, and the latter is Postgres-y clustering as you'd like. Combined I think they'd be an ideal fit for this.

Contributor Author

Thank you for the two references, I did not know about SeaweedFS. I did check it out.

I would say that SeaweedFS and Cockroach seem to do quite a lot, and they talk about pricing to enable features.

For S3 I use Garage: pure Rust, no complex features. It works out of the box across multiple data centers.
See: https://garagehq.deuxfleurs.fr/

Etcd was quite easy to set up, so I guess for now this is only a matter of implementation: https://blog.williamdes.eu/Infrastructure/tutorials/install-a-distributed-etcd-cluster/

Overall, I tend to avoid and ban tools that have many features I do not use.
For example PostgreSQL: it has too many features, and using it as the equivalent of a key-value system is a waste (IMO).

That's why tools like etcd or Garage that do only one task are so good.

Do you know of simple, good software that works well across data centers and could eventually be an alternative to some of the software I use?

For now only the LDAP server is not distributed, and I manually copy-paste changes. If you know of a good and easy distributed LDAP, that would be great.


Probably getting way off topic wrt the pullreq, but --

Seaweed is Apache 2.0, and I am using https://github.com/oxidecomputer/cockroach, so it will also be Apache 2.0 on 2025-04-01 (BSL for now, which I think is acceptable in this capacity). The former does not need any additional features to fill the role of an incredibly fast and massively scalable blob storage, and the latter is also pretty feature-complete out of the box for me. I'll be trying to implement this stuff anyway, so I too will be digging into the internals. I run atop illumos, so I will have ZFS to lean on for snapshotting etc. (and will probably need to adjust code; I already had to hack it up to get it built).

I've not yet looked at the code for authentication, but I'm going to try to centralise as much as possible and just shove this into a Cockroach table as well; I don't see any downside as of yet.

If you're keen on self-hosting with minimal effort, one tool I can recommend is tailscale (+ headscale, if you want to avoid "tailscale the product"), which will massively simplify any network topology and cross-site security -- hell, you might not even care about setting up TLS for your internal services, since the wire already carries end-to-end authenticated encryption. Then you just set up communication on the WireGuard interfaces, and whether a node is another-continent-remote or rack-local, you can treat everything as though it were a homelab.

Good luck.

@williamdes
Contributor Author

Enjoy this new tutorial: https://blog.williamdes.eu/Infrastructure/tutorials/install-a-distributed-etcd-cluster/
And let me know if anything is wrong

@williamdes
Contributor Author

@mdecimus does calling .write( with the batch in the store mean that all operations MUST be in a single transaction, or can I commit each operation one by one?
The issue is that the entire batch can be larger than the max request size, and transactions are limited to one request.
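
In case the answer is that the batch does not have to be atomic as a whole, here is a rough sketch of committing it in size-bounded chunks, one etcd transaction per chunk, so that no single request exceeds --max-request-bytes (etcd-client crate; simple puts only, and sizes are approximated by key/value byte lengths):

use etcd_client::{Client, Error, Txn, TxnOp};

// Sketch only: split (key, value) puts into chunks whose combined size
// stays under the request limit, and commit each chunk as its own
// transaction. Only valid if the batch may be split across requests.
async fn write_chunked(
    client: &mut Client,
    ops: Vec<(Vec<u8>, Vec<u8>)>,
    max_request_bytes: usize,
) -> Result<(), Error> {
    let mut chunk: Vec<TxnOp> = Vec::new();
    let mut chunk_bytes = 0;
    for (key, value) in ops {
        let size = key.len() + value.len();
        if !chunk.is_empty() && chunk_bytes + size > max_request_bytes {
            client.txn(Txn::new().and_then(std::mem::take(&mut chunk))).await?;
            chunk_bytes = 0;
        }
        chunk_bytes += size;
        chunk.push(TxnOp::put(key, value, None));
    }
    if !chunk.is_empty() {
        client.txn(Txn::new().and_then(chunk)).await?;
    }
    Ok(())
}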
