log: list the number of pending cmds for each device #533

Closed
wants to merge 2 commits into from

Conversation

lxbsz
Collaborator

@lxbsz lxbsz commented Feb 12, 2019

We have repeatedly hit a problem in the product where tcmu-runner
gets no response from the gluster backend, and there is no explicit
evidence of it in the logs.

This patch adds one timer for each cmd and records the number of
pending cmds for each tcmu device at 5/10/15/20/25/30+ seconds.
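
For illustration, the bucketing described above could be sketched roughly as follows. This is not the code from the patch: every name here (`pending_cmd`, `count_pending_cmds`, `PENDING_STEP_SECS`, `PENDING_BUCKETS`) is hypothetical, and the sketch assumes each bucket means "pending for at least that many seconds" and that cmds younger than 5 seconds are not reported.

```c
#include <stddef.h>

/* All names below are hypothetical; none come from the tcmu-runner source. */
#define PENDING_STEP_SECS 5   /* width of one reporting bucket, in seconds */
#define PENDING_BUCKETS   6   /* 5, 10, 15, 20, 25 and 30+ seconds */

struct pending_cmd {
    long start_secs;          /* time the cmd was handed to the backend */
};

/*
 * Fill counts[] with the number of cmds whose age falls into each bucket:
 * counts[0] holds cmds pending for 5..9 seconds, counts[1] 10..14, and so
 * on; counts[5] holds everything pending for 30 seconds or more.
 * Cmds younger than 5 seconds are skipped (an assumption of this sketch).
 */
void count_pending_cmds(const struct pending_cmd *cmds, size_t ncmds,
                        long now_secs, unsigned int counts[PENDING_BUCKETS])
{
    size_t i;

    for (i = 0; i < PENDING_BUCKETS; i++)
        counts[i] = 0;

    for (i = 0; i < ncmds; i++) {
        long age = now_secs - cmds[i].start_secs;
        size_t bucket;

        if (age < PENDING_STEP_SECS)
            continue;

        bucket = (size_t)(age / PENDING_STEP_SECS) - 1;
        if (bucket >= PENDING_BUCKETS)
            bucket = PENDING_BUCKETS - 1;   /* clamp into the 30+ bucket */
        counts[bucket]++;
    }
}
```

A per-device logger could call such a function each time the timer fires and print one line with the six counters, which would show at a glance whether cmds are piling up in the 30+ bucket.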

@lxbsz lxbsz force-pushed the timer_pending_cmds branch from cf50d18 to afc2d7b Compare February 12, 2019 08:18
@lxbsz
Collaborator Author

lxbsz commented Feb 12, 2019

@mikechristie @pkalever Please review, thanks.

@lxbsz lxbsz force-pushed the timer_pending_cmds branch from afc2d7b to 11567c9 Compare February 12, 2019 08:58
@mikechristie
Collaborator

@lxbsz just to make sure I am on the same page. This patch will be used instead of the other timer patch? Will you need the functionality in the other timer patch in the future?

@lxbsz
Collaborator Author

lxbsz commented Feb 14, 2019

> @lxbsz just to make sure I am on the same page. This patch will be used instead of the other timer patch? Will you need the functionality in the other timer patch in the future?

On the gluster side there are still some problems with adding the per-command timeout & ring buffer cleanup patch here.

This patch to list the pending cmds will be helpful until that patch is done, and it should also be useful for the other handlers, which do not have a timer in the backend storage service.

IMO, these two patches (the timeout & clean patch and the pending cmds listing patch) should coexist. The timeout & clean patch could help recover the iSCSI device or multipath device and keep it working well, and the second one could give us some clues for troubleshooting.

Thanks,

@lxbsz
Collaborator Author

lxbsz commented Mar 25, 2019

@mikechristie Hi Mike, what's the status of this PR?
We have hit the stuck issue many times, so we need this patch to get some useful logs; otherwise we must keep reproducing it again and again to confirm it.

Thanks.

@mikechristie
Collaborator

Sorry. I pinged Richard for you the other week. He is looking into the proper file heading for the timer code.

However, I was looking for possible existing timer libs that could be used. For example, I was thinking we could maybe use the uv timer lib from the azure PR

#540

Maybe move that timer code to the core libtcmu or tcmu-runner code and add a handler callback, so we can do what you need to log the cmd and also allow handlers to cancel cmds the way azure wants to.
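
To make the callback idea concrete, here is a rough sketch in plain C with no libuv dependency. It is not code from either PR: the `handler_ops` struct, the `cmd_timed_out` member, the `timeout_action` enum and both example handlers are hypothetical names, and the real core timer (libuv or otherwise) is assumed to call `core_cmd_timer_fired` when a cmd's timer expires.

```c
#include <stddef.h>

/* Hypothetical types and names, for illustration only. */
enum timeout_action {
    TIMEOUT_KEEP_WAITING,   /* leave the cmd pending */
    TIMEOUT_CANCEL          /* handler asks the core to cancel the cmd */
};

struct handler_ops {
    const char *name;
    /* Optional: called by the core when a cmd's timer fires. */
    enum timeout_action (*cmd_timed_out)(void *cmd, unsigned int pending_secs);
};

/* Core-side dispatch: handlers without a callback just keep waiting. */
enum timeout_action core_cmd_timer_fired(const struct handler_ops *ops,
                                         void *cmd, unsigned int pending_secs)
{
    if (!ops || !ops->cmd_timed_out)
        return TIMEOUT_KEEP_WAITING;
    return ops->cmd_timed_out(cmd, pending_secs);
}

/* Counter standing in for a log message in this sketch. */
unsigned int logged_timeouts;

/* Example handler that only records the event and never cancels. */
enum timeout_action log_only_timed_out(void *cmd, unsigned int pending_secs)
{
    (void)cmd;
    (void)pending_secs;
    logged_timeouts++;
    return TIMEOUT_KEEP_WAITING;
}

/* Example handler that cancels cmds pending for 30 seconds or more. */
enum timeout_action cancel_after_30_timed_out(void *cmd,
                                              unsigned int pending_secs)
{
    (void)cmd;
    return pending_secs >= 30 ? TIMEOUT_CANCEL : TIMEOUT_KEEP_WAITING;
}
```

With a shape like this, a handler that only wants logging and a handler that wants cancellation could share the same core timer, differing only in the callback they register.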

@lxbsz
Collaborator Author

lxbsz commented Mar 26, 2019

> Sorry. I pinged Richard for you the other week. He is looking into the proper file heading for the timer code.
>
> However, I was looking for possible existing timer libs that could be used. For example, I was thinking we could maybe use the uv timer lib from the azure PR
>
> #540
>
> Maybe move that timer code to the core libtcmu or tcmu-runner code and add a handler callback, so we can do what you need to log the cmd and also allow handlers to cancel cmds the way azure wants to.

Cool, it would be very nice if we could use the uv timer lib here. And it seems [1] could work for us, using a higher precision.

[1] http://docs.libuv.org/en/v1.x/timer.html

@lxbsz lxbsz force-pushed the timer_pending_cmds branch from 11567c9 to c060a1f Compare March 29, 2019 12:40
@lxbsz
Collaborator Author

lxbsz commented Mar 29, 2019

@mikechristie
Switched to libuv now.

@cavery
For the azblk handler, will this be helpful for you?

@pkalever Please review, thanks.

@lxbsz lxbsz force-pushed the timer_pending_cmds branch from c060a1f to eff7794 Compare April 8, 2019 09:59
lxbsz added 2 commits May 20, 2019 16:10
We have repeatedly hit a problem in the product where tcmu-runner
gets no response from the gluster backend, and there is no explicit
evidence of it in the logs.

This patch adds one timer for each cmd and records all the
pending cmds for each tcmu device at 5/10/15/20/25/30+ seconds.

Signed-off-by: Xiubo Li <[email protected]>
@lxbsz lxbsz force-pushed the timer_pending_cmds branch from eff7794 to 120c23a Compare May 20, 2019 08:49
@lxbsz
Collaborator Author

lxbsz commented May 20, 2019

Rebased to the latest and changed the STEP from 5 seconds to 30.

@mikechristie
Collaborator

Closing this PR. Implemented here

#568
