CA-404658: Split heartbeat thread #17
base: master
BengangY commented on Feb 14, 2025
- Split heartbeat thread
- Print thread ID at startup
Can you please share the test results with and without the splitting change?
addc01e to be67a8c
}

// start heartbeat sending thread
ret = pthread_create(&hb_send_thread, xhad_pthread_attr, hb_send, NULL);
if (ret)
Do we need to do anything here to cleanup the receiving thread if we fail to start the sending thread?
No need. Currently, if a heartbeat thread fails to be created, hb_initialize returns a non-zero status and hb_cleanup_objects is then called, but it doesn't do anything (the code in it is not compiled).
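If cleanup on partial failure were ever wanted, one option is to ask the already-running receiving thread to stop and join it before returning the error. The following is only a minimal sketch of that pattern, not the xha code: every name in it (hb_stop_flag, hb_recv_sketch, hb_send_sketch, hb_start_sketch, hb_stop_sketch) is hypothetical, the thread bodies are stand-ins, and it assumes the loops can poll a shared stop flag.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; none are taken from the xha source. */
static atomic_bool hb_stop_flag;   /* set to true to ask both loops to exit */
static atomic_int  hb_recv_loops;  /* iterations done by the receive loop */
static atomic_int  hb_send_loops;  /* iterations done by the send loop */

static void *hb_recv_sketch(void *arg)
{
    (void)arg;
    do {
        atomic_fetch_add(&hb_recv_loops, 1);  /* stand-in for receiving */
        sched_yield();
    } while (!atomic_load(&hb_stop_flag));
    return NULL;
}

static void *hb_send_sketch(void *arg)
{
    (void)arg;
    do {
        atomic_fetch_add(&hb_send_loops, 1);  /* stand-in for sending */
        sched_yield();
    } while (!atomic_load(&hb_stop_flag));
    return NULL;
}

/* Start both threads. If the second one cannot be created, stop and
 * join the first so that no thread is left running on failure. */
int hb_start_sketch(pthread_t *recv_thread, pthread_t *send_thread)
{
    int ret;

    atomic_store(&hb_stop_flag, false);

    ret = pthread_create(recv_thread, NULL, hb_recv_sketch, NULL);
    if (ret)
        return ret;

    ret = pthread_create(send_thread, NULL, hb_send_sketch, NULL);
    if (ret) {
        atomic_store(&hb_stop_flag, true);
        pthread_join(*recv_thread, NULL);
        return ret;
    }
    return 0;
}

/* Ask both threads to exit and wait for them. */
void hb_stop_sketch(pthread_t recv_thread, pthread_t send_thread)
{
    atomic_store(&hb_stop_flag, true);
    pthread_join(recv_thread, NULL);
    pthread_join(send_thread, NULL);
}
```

Whether this is worth doing depends on what the caller does after a failed initialization; if the process exits anyway, the extra join buys little.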
be67a8c to 4f0163e
I created a crontab entry to run "cat /proc/net/udp | grep '02B6'" on each host every 5 minutes to record packet drops on UDP port 694 (0x02B6). The test results are below:
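The drop counter that the cron job records can also be read programmatically. A minimal C sketch, assuming the usual Linux /proc/net/udp layout in which the second column is the local address as hex "ADDR:PORT" and the last column is the per-socket drops counter; the function names udp_line_drops and udp_port_drops are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one line of /proc/net/udp. If its local port equals `port`,
 * store the last column (drops) in *drops and return 1. Return 0 for
 * the header line, non-matching sockets, and malformed lines.
 */
int udp_line_drops(const char *line, unsigned port, unsigned long *drops)
{
    char local[64];
    const char *colon;
    char *stop;
    size_t start, end;
    unsigned long value;

    /* "  120: 00000000:02B6 ..." -> local = "00000000:02B6" */
    if (sscanf(line, " %*[^:]: %63s", local) != 1)
        return 0;

    colon = strrchr(local, ':');
    if (colon == NULL || strtoul(colon + 1, NULL, 16) != port)
        return 0;

    /* Locate the last whitespace-separated token on the line. */
    end = strlen(line);
    while (end > 0 && isspace((unsigned char)line[end - 1]))
        end--;
    start = end;
    while (start > 0 && !isspace((unsigned char)line[start - 1]))
        start--;
    if (start == end)
        return 0;

    value = strtoul(line + start, &stop, 10);
    if (stop != line + end)
        return 0;

    *drops = value;
    return 1;
}

/*
 * Sum the drops of every socket bound to `port` in the file at `path`
 * (normally "/proc/net/udp"). Returns 0 on success, -1 if the file
 * cannot be opened.
 */
int udp_port_drops(const char *path, unsigned port, unsigned long *total)
{
    char line[512];
    unsigned long sum = 0, d;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL)
        if (udp_line_drops(line, port, &d))
            sum += d;
    fclose(f);
    *total = sum;
    return 0;
}
```

Calling udp_port_drops("/proc/net/udp", 694, &total) at intervals and comparing successive totals gives the number of drops in each interval.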
I've recently added CI to this repo. If you rebase your branch on top of the latest master, we should start to see workflow runs here.
I've run some testing with this change, and in a situation where xha was failing to keep up with receiving heartbeats this does seem to resolve it.
There is a separate effort underway to identify why receiving is getting 'bogged down': the load isn't huge, so it should cope even while occasionally sending heartbeats as well. This change definitely appears to be an improvement, though, so I think it is worth taking regardless of the outcome of that investigation.
4f0163e to a4bf3d5
Split heartbeat thread into sending heartbeat thread and receiving heartbeat thread. Signed-off-by: Bengang Yuan <[email protected]>
e165992 to e57c590
Print all threads' ID in the xha.log at startup. Signed-off-by: Bengang Yuan <[email protected]>
e57c590 to 1770cff
I have rebased on master and fixed the CI issues. All checks now pass.