-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should functions be synchronized among cluster nodes? #58
Comments
I don't think slots should be associated with functions. functions should be treated as a resource at node level. On each node addition, all functions existing in the cluster/standalone primary should be applied. Overall, I think we should treat functions, ACLs and config(s) as resource(s) which should be applied to all the nodes and build it in the cluster revamp which we have been talking for a while. |
IIRC it was decided that functions are similar to configuration and that it's the admin's responsibility to define functions on all nodes. I don't know if it's a good decision. The official admin tool is valkey-cli, so when managing a cluster (creating a cluster, adding nodes, etc.) perhaps valkey-cli should try to synchronous functions across nodes? That would at least match the current design decisions. It's the same problem with ACL and other config in a cluster: They are expected to be the same throughout the cluster but the cluster doesn't guarantee this. I'm not sure I believe in a complete rewrite of the cluster bus. Usually such projects are never finished. Can we try gossipping this information instead? |
Zhao can keep me honest, but this is the only time there was a real 3-2 decision amongst the old maintainers. (It was a more broad discussion on functions, but basically they said this is what Redis wanted). I generally want to implement a new system that is distributed configuration, that supports functions, ACLS, configs and allows modules to also store distributed configuration. This can be built ontop of the existing clusterbus extensions, but will be in its best form when built with the cluster v2. (I see Viktor is dubious, but Ping and I are highly motivated to make this happen now) |
I'm motivated, but I also realize we have a lot of things going on, and I want us to make releases more often. Can we make it a focus area for 9.0? (That means start prototyping now.) |
My vision, is that we finish the needed refactoring work to make cluster a module for valkey 8. Then it becomes easy to iterate and prototype in a separate repo the new clustering work. Then someone can "opt-in" to the beta of it while Valkey 8 is out and about, and we can finalize it for 9. |
I think we shall start using some of the reserved bits in the clusterMsg to indicate support for new cluster bus messages. From there, we can add new message types and turn the cluster bus more into a proper raft cluster, step by step. Easier to test, easier to get things right, if we do things in smaller steps. |
Timeline aside, this is actually a problem that can ONLY be properly solved by cluster V2 because it gives you a strongly consistent (meta)data service that manages cluster-wide metadata like cluster topology, node configs, ACLs, and Functions, etc Cluster V1 being eventually consistent (which is not 100% true by the way) and the coupling of data serving logic and topology management logic on the same node are the source of these oddities, IMO. I also think implementing a true raft cluster inside of the Valkey server is no less work than cluster V2 itself. The coupling shouldn't be there in the first place. I am in favor of moving the entire cluster out of the engine and into its own module and this needs to be the precursor to the real cluster V2 work. That said, I don't have a clear idea on the complexity involved so I don't know if it can happen in Valkey 8, yet. Also agreed that building cluster V2 into the engine is a non-starter. |
It's very good that we're having this discussion. We need to have all the information and aspects on the table to be able to make informed decisions.
Beautiful, but I don't think we shall ignore time. Just adding module APIs normally involves a lot of work just in terms of API design, testing and documentation. This makes adding APIs much more work than adding internal functions. Which module API functions do we need to add and what will these APIs look like? Will the cluster module keep all the cluster state in its own memory, intercept commands and sometimes return redirects for them?
So we need to have module APIs for all of those, including some event mechanism so if a user or any other module changes any of these, so the cluster module can get these changes. (Not a very small and easy job.)
Who agreed on this? If we're using a raft library, we avoid that coupling. A node can still be a cluster node without serving any data. Agreed that moving things to a module is a guarantee when it comes to separation of concerns, but it is also much more work. |
Can anyone suggest a use case where we would actually prefer to have different implementations of the same function on different nodes? If not, then we have the answer to the question raised by this issue. |
@daniel-house You're right, it's better that they are synchronized. The open question is how. Can you suggest how to implement it within the current cluster solution that makes sure they are synchronized on all nodes? |
OK, we agree that they should be synchronized. I can't say I understand the current cluster solution, but I can see that we are having problems synchronizing hostnames (#304), so I have serious doubts that we can synchronize functions at this time. It seems to me that a good solution would also solve the hostnames issue and the problems raised in #114 . If anyone can point me to useful reading about the current cluster solution, please do. I would appreciate any pointers to more information about cluster V2. I get the general impression that it will use Raft leader election, and would be perfect for this, hostnames, and the ultimate goals of #114 . If so, I would suggest punting all of these issues until after cluster V2 is implemented and assigning a very high priority to cluster V2. |
|
If we want to do something like the Redis cluster v2, then I believe we can have cluster V2 in a few years, say Valkey 10 or 11, maybe not even that. Rewriting something from scratch that can't be used until everything is in place is very likely to never be launched IMO. That's why I think Redis will fail. I hope we will choose a different, incremental approach. It's not that easy but I believe it's necessary. (We haven't really discussed it enough in the Valkey core team.) |
Thanks. I've looked at those in the past, and again today. They are indeed useful. (1) Are the docs up to date? (2) The docs seem too superficial for this purpose (proposing an extension to synchronize functions). Searching those docs for The information about a gossip protocol is helpful, but which one?
Ha, ha. Very funny. And, sadly, all I really expected to find. According to the cluster-spec:
Why didn't we add the hostname here instead of ping/pong? I'll read some more from those docs and the code. |
This link is something I really needed. Thankyou. So, you think cluster V2 will never happen. Is that the consensus? I will read it in detail. Even if it is far in the future it will help me understand today's pain points. Are there any places I should subscribe to so that I can follow any discussions of cluster V2? How about this incremental approach? Link in a high quality implementation of the Raft protocol. Define a state-machine to manage with the Raft protocol. Only put one thing in it, a list of node-IDs with hostnames. After that works, add function definitions to the state-machine. We should then be able to incrementally extend the state-machine to cover everything needed for #114 Alternatively, use CRDT registers (simple strings) to follow the same incremental path. |
That's not what I said. (I said that if we want to do it in a certain non-incremental way, then it will be difficult.)
No. |
I don't think we should reference back to the old cluster V2 design here in the Valkey forum. That work was started by Redis. We should take a fresh look at the problems again and start from scratch. The only connection is just the name "cluster V2". Please disregard the "Redis Cluster V2" design completely. |
@PingXie Works for me. Is there a list of problems/requirements? |
Few of the problems/requirements:
|
Thanks @hpatro. Here are the problems that I think we need to solve in the current cluster:
cluster topology is concerned with which nodes own which slots and primaryship. The current cluster implementation is not even eventually consistent by design because there are places where node epochs are bumped without consensus (trade-offs). This leads to increased complexity on the client side.
This particular issue provides the exact context on this pain point
Today, both the cluster bus and the client workload run on the same main thread. So a demanding client workload has the potential to starve the cluster bus and leads to unnecessary failover.
The V1 cluster is a mesh so the cluster gossip traffic is proportional to N^2, where N is the (data) nodes in the cluster. The practical limit of a V1 cluster is ~500 nodes. |
I don't see how you would perform a rolling upgrade to a cluster that meets these requirements. Is that acceptable? |
We haven't finalized the v1 to v2 |
We haven't finalized almost anything about v2. :) I think we need many different threads to discuss all these aspects. Shall we use a wiki for that? Or just many issues and sub-issues to discuss? I think we should really explore the possibilities for a running cluster to auto-upgrade to a new consensus mode. For example, the node with the highest epoch can decide to switch to a raft algorithm, when all known nodes are known to support it, and start sending out new messages on a new format. When such message reach the other nodes, they switch too. We can use the existing TCP connections. (We could use UDP but it's harder to get DTLS working (OpenSSL doesn't even have DTLS 1.3 yet) than to just use TCP and TLS. But that's a different discussion.) This sketch ↑ is just to reason that it's theoretically possible. Then we can weigh how important it is, compared to safety, simplicity, separation and other aspects. |
True but my statement is correct too :-)
Yeah, I will create a dedicated issue for cluster v2. Let's start from the pain points in today's cluster design/implementation. It is important to have a consensus on the problem definition first. |
This issue was first raised in Redis community redis/redis#11780, it is a major miss from Redis 7.0, but haven't got a solution. I believe we still need to discuss this issue in this new project, perhaps as a part of new cluster architecture.
The problem:
Let's say:
key1
, is migrated from node A to node B, operated by client1myfunc
is already loaded, but in node B it is notNow client2, that used to handle
key1
in node A, sendsFCALL myfunc 1 key1
to node A, which will be redirected to node B, will then get an unexpected error.Or if node B also has a function
myfunc
but it does different things, client2 may trigger unexpected results.Some thoughts:
1.Should functions be migrated together with slots?
Maybe not. This may overwrite the libraries already loaded in the nodes importing slots. But we may set a version to each library so that we can always choose to keep the latest one.
2.Can functions be broadcasted to all nodes?
May cause confict between nodes. And when adding new nodes we need to have a full synchronization.
The text was updated successfully, but these errors were encountered: