Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RDMA: Fix dead loop when transfer large data (20KB) #1386

Merged
merged 4 commits into from
Dec 5, 2024

Conversation

fengquyoumo
Copy link
Contributor

@fengquyoumo fengquyoumo commented Dec 3, 2024

Determine the status of the Client when attempting to read data. If state=CLIENT_COMPLETED_IO, no read attempt is made and I/O operations on the Client are rescheduled by the main thread.

And 20474 Byte = PROTO_IOBUF_LEN(16KB) + SDS_HDR_VAR(16, s)(4090 Byte)

Fixes #1385

@zuiderkwast
Copy link
Contributor

@pizhenwei RDMA expert, please take a look.

@fengquyoumo Thanks for your contribution. You need to commit with git commit -s which adds the Signed-off-by header in the commit message. See CONTRIBUTING.md about why we need this. How to fix it, see the Details link at the "DCO" CI job.

Copy link

codecov bot commented Dec 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.83%. Comparing base (3df609e) to head (8517cde).
Report is 7 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1386      +/-   ##
============================================
+ Coverage     70.78%   70.83%   +0.05%     
============================================
  Files           118      118              
  Lines         63497    63549      +52     
============================================
+ Hits          44945    45014      +69     
+ Misses        18552    18535      -17     
Files with missing lines Coverage Δ
src/rdma.c 100.00% <ø> (ø)

... and 16 files with indirect coverage changes

@pizhenwei
Copy link
Contributor

Hi @fengquyoumo
I've got a cold, and will be back soon. please be a little more patient.

@fengquyoumo
Copy link
Contributor Author

@zuiderkwast :) Thanks for your help!

@pizhenwei
Copy link
Contributor

Hi @fengquyoumo
Thanks for your report and contribution!

There is a lack of multi-threads support from RDMA. In general, you are right on this issue, but it's not valkey connection style. I suggest:

  • implement postpone_update_state and update_state for RDMA
  • CLIENT_PENDING_IO is only used by networking.c and io_threads.c, don't break this. Use RDMA_CONN_FLAG_POSTPONE_UPDATE_STATE in rdma.c
  • handle pending data in rdma.c correctly

Please see tls.c as example:

    .postpone_update_state = postPoneUpdateSSLState,
    .update_state = updateSSLState,

…pone_update_state and update_stat

Signed-off-by: fengquyoumo <[email protected]>
@fengquyoumo
Copy link
Contributor Author

fengquyoumo commented Dec 5, 2024

@pizhenwei Thanks for your help!
I implemented postpone_update_state and update_state for RDMA to solve this issue.
Also, the first implementation had serious performance issues, but this one worked just fine.

Please review the code again when you are free, Thanks!

src/rdma.c Outdated
typedef struct rdma_connection {
connection c;
int flags;
Copy link
Contributor

@pizhenwei pizhenwei Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess

struct rdma_cm_id *cm_id;
int flags;
int last_errno;

would reduce 8 bytes.

Otherwise LGTM. I also test it locally, it works fine.

Hi @zuiderkwast , would you please take a look at this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) Thanks a lot for the detailed feedback!

Copy link
Contributor

@pizhenwei pizhenwei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Signed-off-by: fengquyoumo <[email protected]>
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Thanks both of you for the fix and the review.

@zuiderkwast zuiderkwast changed the title fix dead loop when transfer large data (size>20474 Byte) by rdma RDMA: Fix dead loop when transfer large data (20KB) Dec 5, 2024
@zuiderkwast zuiderkwast added the release-notes This issue should get a line item in the release notes label Dec 5, 2024
@zuiderkwast zuiderkwast merged commit 6b3e122 into valkey-io:unstable Dec 5, 2024
47 of 48 checks passed
vudiep411 pushed a commit to Autxmaton/valkey that referenced this pull request Dec 15, 2024
Determine the status of the Client when attempting to read data. If
state=CLIENT_COMPLETED_IO, no read attempt is made and I/O operations on
the Client are rescheduled by the main thread.

> And 20474 Byte = PROTO_IOBUF_LEN(16KB) + SDS_HDR_VAR(16, s)(4090 Byte)

Fixes valkey-io#1385

---------

Signed-off-by: fengquyoumo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes This issue should get a line item in the release notes
Projects
None yet
3 participants