-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Import-mode: Avoid expiration and eviction during data syncing #1185
Conversation
52be593
to
b23b09b
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #1185 +/- ##
============================================
- Coverage 70.65% 70.57% -0.09%
============================================
Files 114 115 +1
Lines 61799 63171 +1372
============================================
+ Hits 43664 44582 +918
- Misses 18135 18589 +454
|
good work! @valkey-io/core-team please take a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it. I think it's a little bit strange, but with some better documentation it's not too strange.
We need to have good support for migration tools, especially for users who want to migrate from proprietary software to open source software. 😄
Redis already have a solution.
I remember this discussion from the Redis times. Do you know what Redis solution is (the same REPLCONF PSEUDO-MASTER
?) or is it secret?
Any better idea?
Have you considered the idea to let RediShake act as a primary and let the target database replicate from RediShake? It can act as a replication proxy?
+-----------+ PSYNC +-----------+ PSYNC +--------+
| Source DB |<---------| RediShake |<---------| Valkey |
+-----------+ +-----------+ +--------+
interesting, and we have also considered such a method, but there are several issues. First, as a cloud provider, we do not allow an instance to become a replica for external instances. This is a very risky operation, especially since the primary connection uses a super user, which has excessive permissions. Additionally, the replica needs to establish an outbound connection, this is also not allowed. These restrictions, I believe, are not unique to cloud providers, many users' security control policies also prohibit such actions. Another point is that, in a cluster mode, the source and target instances for migration typically have different slot distributions, and redisShake can help with correctly routing the data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moreover, I think we should only allow write commands for pseudo-master
client when server is in pseudo-replica
mode.
It's secret. I used to push this PR to Redis Community(#13077) and they say they will PR their implementation. But no news after that.
What @soloestoy said is a major limitation for us. BTW, sometimes the destination already has some data, |
i did not read it carefully, internally we simply pause the expiration on both side i guess. |
Interesting. I think this PR can solve that problem if the server has no other read. If the server has other read, one possible solution is to disable all expiration in |
@lyq2333 Can you commit the changes to When you run make locally and you have python3 installed, make updates commands.def if there are any new or changed commands. |
Signed-off-by: lvyanqi.lyq <[email protected]>
…mand Signed-off-by: lvyanqi.lyq <[email protected]>
Signed-off-by: lvyanqi.lyq <[email protected]>
Signed-off-by: lvyanqi.lyq <[email protected]>
Signed-off-by: lvyanqi.lyq <[email protected]>
Signed-off-by: lvyanqi.lyq <[email protected]>
Signed-off-by: lvyanqi.lyq <[email protected]>
Signed-off-by: lvyanqi.lyq <[email protected]>
0655bbc
to
7883e9f
Compare
@zuiderkwast Thanks. Sorry forgot it before. I also forgot to sign off and have to force push. No code is modified. |
Weekly core meeting. No specific comments other than we should review this offline and make progress. Directionally seems like a good idea. |
The scenario we encountered is that some users want to migrate data from Server A to Server B. Both A and B work as primary and B may have some data before the migration. We find expiration and eviction may lead to data inconsistency. So we came up with a simple method, but it still has a few places to discuss. We plan to introduce a new config(import-mode in this PR) to mark that this server is importing data. As for expiration, in import mode, active expiration in cron is prohibited and passive expiration in commands is limited depend on the client state. Clients marked as import source work like
As for eviction, should server disable eviction automatically to ensure data consistency in import mode?Or let users choose the maxmeory-policy, I guess no one will choose an option other than @valkey-io/core-team WDYT? Any suggestions would be greatly appreciated. |
The core issue is that for the target node of data migration that is active, we need to provide normal read and write services. However, we don't want to affect the data being migrated and can't identify which data is being migrated. Method 3 seems to be a relatively suitable option, but of course, it requires users to ensure they do not perform write operations on the data being migrated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
It would be nice to have a documentation page for data migration. We have https://valkey.io/topics/migration/ but I think we should explain how to move data using this feature as well.
Question: What happens if a cluster node is a primary in import mode and there is a failover so another node becomes the primary? The node becomes a replica. Do we automatically exit import-mode?
Co-authored-by: Madelyn Olson <[email protected]> Signed-off-by: Yanqi Lv <[email protected]>
Signed-off-by: lvyanqi.lyq <[email protected]>
e007d8d
to
be13edc
Compare
@zuiderkwast Thanks. Nice question. You remind me. How about using |
Force push because I forgot to sign off again. Just update the commands.def. |
Signed-off-by: lvyanqi.lyq <[email protected]>
DONE. |
Signed-off-by: lvyanqi.lyq <[email protected]>
Signed-off-by: lvyanqi.lyq <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still did not take a very close look. Overall it seems aligned with the discussions.
I would like to ask: what is the migration strategy regarding ACLs? I mean replica is basically permitted to all actions, but in case of a migration the Source would have to adjust it's user permissions?
Yes, I suggest creating a new user for data migration. |
Signed-off-by: lvyanqi.lyq <[email protected]>
So just to circle back and clarify this: the client which migrates the data will NOT automatically granted with superuser permissions but wil have to count on it's authenticated use to support all needed permissions? |
That's right. It would be good to mention this in the docs. It wouldn't be very secure if a client could get superuser permissions just by sending the command |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Looking forward to a doc PR for
In the migration guide, I hope we can have some guide about how to migrate data with import-mode, for example using redishake. |
@suxb201 hi, I think this pr is helpful for RedisShake. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my 2 cents: easy to introduce bug in the future where a dev can forget to check the import mode flag. It would be nice to introduce a macro for serverInImportMode
/ serverInDataServingMode
and use it everywhere instead of checking if it's primary and in/not-in import mode independently.
if (!strcasecmp(c->argv[2]->ptr, "on")) { | ||
c->flag.import_source = 1; | ||
addReply(c, shared.ok); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens to client import_source
flag which were turned on explicitly when server import_mode
is turned off?
@@ -832,6 +832,80 @@ start_server {tags {"expire"}} { | |||
close_replication_stream $repl | |||
assert_equal [r debug set-active-expire 1] {OK} | |||
} {} {needs:debug} | |||
|
|||
test {Import mode should forbid active expiration} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worthwhile to run these tests in cluster-enabled mode as well.
* Notice: other clients, apart from the import source, should not access | ||
* the data imported by import source. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be enforced by server instead of providing a warning in the code comment/config file? The client's not marked with import_source should receive LOADING
error if I'm understanding the use case correctly.
@@ -1651,7 +1651,7 @@ void beforeSleep(struct aeEventLoop *eventLoop) { | |||
|
|||
/* Run a fast expire cycle (the called function will return | |||
* ASAP if a fast cycle is not needed). */ | |||
if (server.active_expire_enabled && iAmPrimary()) activeExpireCycle(ACTIVE_EXPIRE_CYCLE_FAST); | |||
if (server.active_expire_enabled && !server.import_mode && iAmPrimary()) activeExpireCycle(ACTIVE_EXPIRE_CYCLE_FAST); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we could have had similar conditional statement above at line 1058
!server.import_mode && iAmPrimary()
After valkey-io#1185, a client in import-source state can visit expired key both in read commands and write commands, this commit handle keyIsExpired function to handle import-source state as well, so KEYS can visit the expired key. This is not particularly important, but it ensures the definition, also doing some cleanup around the test, verified that the client can indeed visit the expired key. Signed-off-by: Binbin <[email protected]>
New config:
import-mode (yes|no)
New command:
CLIENT IMPORT-SOURCE (ON|OFF)
The config, when set to
yes
, disables eviction and deletion of expired keys, except for commands coming from a client which has marked itself as an import-source, the data source when importing data from another node, using the CLIENT IMPORT-SOURCE command.When we sync data from the source Valkey to the destination Valkey using some sync tools like redis-shake, the destination Valkey can perform expiration and eviction, which may cause data corruption. This problem has been discussed in redis/redis#9760 (reply in thread) and Redis already have a solution. But in Valkey we haven't fixed it by now.
E.g. we call
set key 1 ex 1
on the source server and transfer this command to the destination server. Then we callincr key
on the source server before the key expired, we will have a key on the source server with a value of 2. But when the command arrived at the destination server, the key may be expired and has deleted. So we will have a key on the destination server with a value of 1, which is inconsistent with the source server.In standalone mode, we can use writable replica to simplify the sync process. However, in cluster mode, we still need a sync tool to help us transfer the source data to the destination. The sync tool usually work as a normal client and the destination works as a primary which keep expiration and eviction.
In this PR, we add a new mode named 'import-mode'. In this mode, server stop expiration and eviction just like a replica. Notice that this mode exists only in sync state to avoid data inconsistency caused by expiration and eviction. Import mode only takes effect on the primary. Sync tools can mark their clients as an import source by
CLIENT IMPORT-SOURCE
, which work like a client from primary and can visit expired keys inlookupkey
.Notice: during the migration, other clients, apart from the import source, should not access the data imported by import source.