[NEW] Deterministic CLUSTER SHARDS Command #114
Comments
I'm generally against a lot of complexity like what we would need for filtering, so I would prefer we just add the sorting.
I agree with this.
I like the word "deterministic". Clearly you feel that the fields are not presented in a way that is easy for the client to filter. Perhaps
Not sure if the format is the concern. The last time I discussed this with Bar (who works at AWS), the point was that it was simpler and faster to do it without parsing. Maybe there is more to it, though.
The way I see it, the problem with "deterministic", on the other hand, is that it is a bit subjective and leaves room for different interpretations.
Sorting the nodes can't be a breaking change (like what @hpatro did for CLUSTER SLOTS in #265). Regarding filtering arguments, I'm not so sure... For hashing the output without parsing, I think that's one of the reasons CLUSTER NODES returns a string rather than RESP-structured data. For a structured RESP reply like this, a client needs to parse the RESP reply to know where the end of the reply is; you'd need a very special parser to find the length of the response without parsing it. So I think it'd be fine for clients to get the reply, remove some fields (which we can document), and then compute a hash of the rest of it.
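For illustration, a minimal sketch of that client-side step, assuming the client library materializes the CLUSTER SHARDS reply as a list of dicts and treating replication-offset and health as the fields to drop (which fields a client considers volatile is its own choice):

```python
import hashlib
import json

# Assumption: these are the fields that can differ between nodes whose
# topology views actually agree.
VOLATILE_FIELDS = {"replication-offset", "health"}

def canonicalize(shards):
    """Drop volatile fields and sort everything so identical views serialize identically."""
    result = []
    for shard in shards:
        nodes = [{k: v for k, v in node.items() if k not in VOLATILE_FIELDS}
                 for node in shard["nodes"]]
        nodes.sort(key=lambda n: n["id"])      # replica order is not stable across nodes
        result.append({"slots": list(shard["slots"]), "nodes": nodes})
    result.sort(key=lambda s: s["slots"])      # order shards by their slot ranges
    return result

def view_hash(shards):
    """Hash a canonicalized reply so two nodes' views can be compared cheaply."""
    blob = json.dumps(canonicalize(shards), sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()
```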
I think the proposal is to have the server remove volatile fields, as opposed to clients doing the filtering. |
@PingXie I know that's the suggestion, but adding filter arguments to the command is a larger change than documenting what clients need to do if they want to compute this kind of hash.
I'd think a binary filter would be acceptable, something like "topology" vs. "all"; there is a straightforward story there, IMO. On fine-grained filtering, agreed.
The problem seems to be
The rest of the initial post appears to be offering ways to make it easier for a client to get an accurate picture of the entire cluster's state. I think this problem may be theoretically unsolvable. If so, let us seek a sufficiently useful approximate picture.

Suppose we had some report from each node that was deterministically comparable, as requested by Bar. (Note: there is nothing subjective about the word deterministic. As Bar wrote: nodes with identical views return identical outputs. However, the choice of deterministic output is subjective as it depends on how we will use it.) Would it be sufficient to find a quorum that is reporting exactly the same state? How big would the quorum have to be? Can we determine if the cluster's topology is in flux due to adding or removing nodes, slot-migration or fail-over, so that we might need to try again later?

What is the ultimate use case? Why do we need an accurate picture of the entire cluster's state? Are we presenting a dashboard to the admins? Are we trying to update the client's slot-map for following moved keys? Are we building a replacement for Sentinel? Do we want to give advice for re-sharding? All of the above?

Would the report be more useful for more use cases if the server did some filtering? Would it be more useful if it were trivial to parse using easily available libraries (e.g., JSON)? Should it be one or more new CLUSTER subcommands such as STATE, TOPOLOGY, or HEALTH?
There was some discussion of cluster V2 in #58. Could someone link information about cluster V2 into this issue? I don't know anything about it except what I gathered from issue 58, but it seems to me that cluster V2 might provide a better solution for the use case that motivates this discussion of CLUSTER SHARDS.
Cluster v1 is eventually consistent (if it ever converges). Whenever there is a topology change, such as adding replicas, scaling out/re-sharding, or performing rolling upgrades, there needs to be a way for the client to detect that the cluster topology has quiesced before it can safely update its view of the slot->node mapping. The signal that @barshaul is looking for here is that "a majority of nodes agree on the cluster topology", as reported by CLUSTER SHARDS.
Request routing is performed by the client based on the key hash. The client computes the crc16 of the key in question to figure out the (logical) slot it belongs to. However, in order to figure out which node hosts that slot, the client relies on this slot->node mapping, which is obtained via one of the cluster topology query commands such as CLUSTER SLOTS or CLUSTER SHARDS.
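For reference, a minimal sketch of that key-to-slot computation (CRC16/XMODEM over the key, or over the hash tag when one is present, modulo 16384, per the cluster spec):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (the XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else crc << 1
            crc &= 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    """Map a key to one of the 16384 cluster slots, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:                    # only a non-empty tag is used
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384

# user:1000 and {user:1000}.followers land on the same slot, because only the
# tag between the braces is fed to the CRC for the second key.
assert key_slot(b"user:1000") == key_slot(b"{user:1000}.followers")
```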
Not the primary use case
It is a type of cluster topology change.
Orthogonal to Sentinel. This is all about clusters.
Server-side filtering vs. client-side filtering boils down to implement-it-once (on the server) vs. implement-it-many-times (in every interested client).
That would be a breaking change.
I would never suggest replacing what we have today with JSON because that would indeed be a breaking change. Would allowing the client to request JSON instead of the current format be better than having the client specify a filter? Please help me continue to pin down the requirements. At this point they appear to be to provide a way for the client to
In all of these cases I think we can do much better by providing a new CLUSTER subcommand. For example, suppose the goal is (2) determine that we have a quorum (the percentage might be a variable). We could make it much easier to wait for a quorum by associating a unique, non-decreasing ID with each new topology. Then the client only needs to poll until a quorum reports the same maximal ID. Note: I have an intense dislike for polling, and would look for a better solution. I can think of lots of possible solutions, but I'd rather wait until we have a solid agreement on the goal.
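To illustrate just the polling flow (not an endorsement of polling), a sketch that assumes a hypothetical get_topology_id(node) accessor for the proposed non-decreasing ID; no such command exists today:

```python
import time

def wait_for_topology_quorum(nodes, get_topology_id, quorum=0.5, interval=0.5, timeout=30.0):
    """Poll until more than `quorum` of the nodes report the same, maximal topology ID.

    `get_topology_id` is a hypothetical callable standing in for whatever command
    would expose the proposed ID.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        ids = [get_topology_id(node) for node in nodes]
        newest = max(ids)
        if ids.count(newest) > quorum * len(nodes):
            return newest                      # enough nodes agree on the newest topology
        time.sleep(interval)
    raise TimeoutError("cluster topology did not quiesce within the timeout")
```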
JSON or not is not at the core of this problem. The output of the existing
Can you elaborate on how this ID would be maintained? There is the epoch, which has this monotonically increasing property, but it is a per-shard concept and it is not reported by CLUSTER SHARDS.
Yes, and I would say no. 2 and 3 are my understanding of the goal. That said, these are unfortunately band-aid requirements/solutions for cluster v1, IMO. I'd say we really need strong consistency for the topology (#58 (comment)).
I don't see a clear winner between the two, either. Now that we reverted the decision on the deprecation of CLUSTER SLOTS, we might need to eventually implement this "topology query" support in both commands, but for now, I would go with whichever one is easier for clients to adopt.
We should evaluate the use cases for each of these commands. I think we should continue to prefer CLUSTER SLOTS for clients, optionally with the new dense version for clients that support it. We already added caching for the CLUSTER SLOTS responses. AFAICT, no clients have started using CLUSTER SHARDS yet anyway. We can think of CLUSTER SHARDS as an admin command, as a structured variant of CLUSTER NODES. The #411 PR adds some complexity and has a negative performance impact. Maybe it doesn't need to be deterministic, since the caller can sort and filter the result, so I'm not sure if #411 is necessary. I don't think we should add caching of the CLUSTER SHARDS reply either.
The problem/use-case that the feature addresses
Currently, some clients rely on a single node's perspective to obtain the cluster topology, which may not accurately reflect the entire cluster's state. Ideally, clients should query multiple nodes and compare their topology views to determine the most consistent representation. To make this comparison efficient on the client side, Valkey could offer a deterministic view in which nodes with identical views return identical outputs; clients could then simply hash and compare these outputs. At the moment, certain fields in the CLUSTER SHARDS reply, such as "replication-offset", may differ between nodes even when they share the same view, so the client must parse each view and filter those fields out itself. Furthermore, in the current output of CLUSTER SHARDS (or CLUSTER SLOTS), nodes with identical views may list the replicas of specific slots in different orders, which forces clients to parse and sort the outputs before they can compare views accurately.
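As a rough sketch of the client-side workflow described above (fetch_shards(node) is a placeholder for the client's own CLUSTER SHARDS call, and view_hash for a canonicalize-and-hash helper like the one sketched earlier in this thread):

```python
from collections import Counter

def most_consistent_view(nodes, fetch_shards, view_hash):
    """Query every node, hash its topology view, and return the majority view.

    `fetch_shards` and `view_hash` are placeholders supplied by the client.
    """
    views, counts = {}, Counter()
    for node in nodes:
        shards = fetch_shards(node)
        digest = view_hash(shards)
        views[digest] = shards
        counts[digest] += 1
    digest, votes = counts.most_common(1)[0]
    return views[digest], votes, len(nodes)    # majority view, its vote count, nodes polled
```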
Description of the feature
In order to achieve deterministic output, two changes are suggested:
Adding a filter option to the CLUSTER SHARDS command. This enhancement would enable clients to selectively filter out fields that are irrelevant for their specific use cases, such as non-deterministic fields, nodes in 'fail' state that are irrelevant to the client's functionality, or other fields that the client doesn't need. It would enable hash-based comparison of views without the need to parse the output (see the sketch after this list), while also potentially reducing the size of each node's output.
Sorting the replica order in the output.
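With both changes in place, the comparison on the client side could collapse to hashing raw reply bytes, along these lines (the FILTER TOPOLOGY argument is only an illustrative spelling of the proposed option, not an existing one, and conn.execute_raw stands in for however a client obtains the unparsed reply):

```python
import hashlib

def topology_digest(conn):
    """Digest of a filtered, deterministically sorted CLUSTER SHARDS reply.

    'FILTER', 'TOPOLOGY' is a hypothetical argument for the proposal above, and
    `conn.execute_raw` a stand-in for fetching the raw reply bytes.
    """
    raw = conn.execute_raw("CLUSTER", "SHARDS", "FILTER", "TOPOLOGY")
    return hashlib.sha256(raw).hexdigest()

# Nodes with identical topology views would then return identical digests,
# with no RESP parsing or client-side sorting required.
```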