Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wrong dbnum showed in cli after client reconnected #1382

Closed

Conversation

vitahlin
Copy link
Contributor

@vitahlin vitahlin commented Dec 3, 2024

When the server restarts while the CLI is connecting, the reconnection does not automatically select the previous db.
This may lead users to believe they are still in the previous db, in fact, they are in db0.

This PR will automatically reset the current dbnum and cliSelect() again when reconnecting.

Steps for reproducing bug

  1. start server and client:
./src/valkey-server
./src/valkey-cli
  1. set value in db0 and db5:
127.0.0.1:6379> set a 0
OK
127.0.0.1:6379> get a
"0"
127.0.0.1:6379> select 5
OK
127.0.0.1:6379[5]> set a 5
OK
127.0.0.1:6379[5]> get a
"5"
127.0.0.1:6379[5]> 
  1. Ctrl+C to stop server:
127.0.0.1:6379[5]> get a
Error: Connection reset by peer
not connected> 
  1. restart server then in valkey-cli:
not connected> get a
"0"
127.0.0.1:6379[5]> 

The valkey-cli run command at db0 but showed db5.

Solution

valkey-cli will re-select db5 after reconnected:

127.0.0.1:6379> get a
"0"
127.0.0.1:6379> select 5
OK
127.0.0.1:6379[5]> get a
"5"
127.0.0.1:6379[5]> get a
Error: Connection reset by peer
not connected> get a
"5"
127.0.0.1:6379[5]> 

Copy link

codecov bot commented Dec 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.77%. Comparing base (3df609e) to head (5cd9b3c).
Report is 167 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1382      +/-   ##
============================================
- Coverage     70.78%   70.77%   -0.02%     
============================================
  Files           118      118              
  Lines         63497    63499       +2     
============================================
- Hits          44945    44939       -6     
- Misses        18552    18560       +8     
Files with missing lines Coverage Δ
src/valkey-cli.c 55.54% <100.00%> (+0.08%) ⬆️

... and 12 files with indirect coverage changes

@@ -3055,6 +3055,9 @@ static int issueCommandRepeat(int argc, char **argv, long repeat) {
config.cluster_reissue_command = 0;
return REDIS_ERR;
}
/* Reset dbnum after reconnecting so we can re-select the previous db in cliSelect(). */
config.dbnum = 0;
cliSelect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reconnect can happen in a few places in the code so I think it's better that cliSelect() is done inside cliConnect().

I see now that inside cliConnect() with the CC_FORCE flag, we already set config.dbnum = 0 and we call cliSelect() and a few other things, such as

        /* Do AUTH, select the right DB, switch to RESP3 if needed. */
        if (cliAuth(context, config.conn_info.user, config.conn_info.auth) != REDIS_OK) return REDIS_ERR;
        if (cliSelect() != REDIS_OK) return REDIS_ERR;
        if (cliSwitchProto() != REDIS_OK) return REDIS_ERR;

I wonder why that doesn't work as intended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delayed response.

When the valkey-cli encounters a “Connection reset by peer” error, the failure is actually triggered in the following code block:

if (cliSendCommand(argc, argv, repeat) != REDIS_OK) {
    cliPrintContextError();
    redisFree(context);
    context = NULL;
    return REDIS_ERR;
}

The value of context is set to NULL.

After the server restarts and the client executes the GET a command, the client reconnects, but since context == NULL, the config.dbnum is not reset to db0:

static int cliConnect(int flags) {
    if (context == NULL || flags & CC_FORCE) {
        if (context != NULL) {
            redisFree(context);
            config.dbnum = 0;
            config.in_multi = 0;
            config.pubsub_mode = 0;
            cliRefreshPrompt();
        }

Later, when the code reaches the cliSelect(void) function, The values of config.conn_info.input_dbnum and config.dbnum are both 5, so it directly returns OK

static int cliSelect(void) {
    redisReply *reply;
    if (config.conn_info.input_dbnum == config.dbnum)
    {
        return REDIS_OK;
    }

Copy link
Contributor

@zuiderkwast zuiderkwast Dec 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we have no context here. Thanks!

I think you're right. This case is when we had no connection for a while (prompt is no connection>) and later we have a connection again, right?

Should we reset something else too, such as config.in_multi? I tried your branch and if I do SELECT 1 and MULTI, I get the prompt 127.0.0.1:6379[1](TX)> even after reconnected. When reconnecting, the transaction is lost so I think in_multi should be set to zero.

In subscribed mode, there seems to be another problem. The CLI exits if the server disconnects in this case. (Maybe we can fix that later in another PR.)

@zuiderkwast
Copy link
Contributor

Obsoleted by #1694, which is essentially the same fix plus a test case. Thanks anyway!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants