Journal thread fetch entry from queue cost too much time on acquireInterruptibly #2820
Comments
@merlimat @eolivelli @dlg99 Would you please help take a look?
It looks like the Journal is not able to serve write requests at the requested pace, so we are blocked on appending to the queue.
@hangc0276 I'd start by monitoring the situation. Is the disk a bottleneck? iostat, sar, etc. should help with this.

As a simple test, you can try configuring a 2-3 GB ramdrive (if node memory permits), moving the journal there, and disabling journalSyncData. Is the journal/queue still a bottleneck?

Check whether the disk configuration is optimized (deadline or noop scheduler for the disk, etc.; google for ideas on optimizing IO for NVMe SSDs).

After that you can try tuning some parameters: increase journalBufferedWritesThreshold, journalPreAllocSizeMB, journalWriteBufferSizeKB, journalQueueSize, etc. Experiment with read/write buffers for the entry log, and maybe try a flushEntrylogBytes of e.g. 64-128 MB (more frequent but faster flushes). Tune compaction intervals so that compaction runs less frequently and rewrites smaller data volumes, if disk space permits. I don't remember that well, but I think disabling journalRemoveFromPageCache might help if the node has a lot of memory.

If you can confirm that the queue (and not the disk) is definitely the bottleneck, we already have JCTools as a dependency and can try swapping the queue implementation. See #1682.

Hope this helps.
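For reference, the parameters named in this comment are bookie server settings, so an experiment might look roughly like the fragment below. This is only a sketch: the key names come from the comment, but every value except the 64 MB flushEntrylogBytes (the low end of the range suggested above) is an illustrative placeholder, not a recommendation, and should be checked against the defaults of the BookKeeper version in use.

```properties
# Sketch of tuning overrides for the bookie server configuration.
# All values are placeholders for experimentation unless noted otherwise.

# Only for the ramdrive experiment; do not disable in production without care.
journalSyncData=false

# "Increase" candidates named in the comment (values are assumptions).
journalBufferedWritesThreshold=1048576
journalPreAllocSizeMB=32
journalWriteBufferSizeKB=128
journalQueueSize=20000

# 64 MB, the low end of the 64-128 MB range suggested in the comment.
flushEntrylogBytes=67108864

# Suggested as possibly helpful on nodes with a lot of memory.
journalRemoveFromPageCache=false
```

Changing one setting at a time while watching iostat makes it easier to tell which change, if any, moves the bottleneck.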
BUG REPORT
Describe the bug
The bookie configuration is as follows.
The write throughput stays at 500 MB/s.
When read requests increase and the read throughput reaches about 600 MB/s, the journal queue stays full for a long time. I checked that the journal sync latency is low. I printed the bookie's stack and found that the BookieJournal-3181 thread keeps waiting in acquireInterruptibly inside the ArrayBlockingQueue.poll operation.
The bookie-io threads keep blocking in the ArrayBlockingQueue.put operation.
Does anyone have any ideas about this situation?
The jstack output is attached:
jstack.log
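As background for the stack described above: in the JDK, ArrayBlockingQueue guards both put and the timed poll with a single ReentrantLock acquired via lockInterruptibly, which is where the acquireInterruptibly frames come from, and put blocks while the queue is full. The following is a minimal standalone sketch of that bounded-queue behaviour, not BookKeeper's journal code; the class name BoundedQueueDemo and the helper offerAll are hypothetical, and a timed offer is used instead of put so that the example terminates.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedQueueDemo {

    // Tries to enqueue `count` items, waiting at most `timeoutMs` for each one.
    // Returns how many items the queue accepted.
    static int offerAll(ArrayBlockingQueue<Integer> queue, int count, long timeoutMs)
            throws InterruptedException {
        int accepted = 0;
        for (int i = 0; i < count; i++) {
            // Like put(), the timed offer takes the queue's single lock with
            // lockInterruptibly(); unlike put(), it gives up after the timeout.
            if (queue.offer(i, timeoutMs, TimeUnit.MILLISECONDS)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) throws InterruptedException {
        // Tiny capacity so the "queue full" state is reached immediately.
        ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2);

        // The third offer times out because nothing is draining the queue.
        System.out.println("accepted=" + offerAll(queue, 3, 10));

        // The consumer side: a timed poll, which takes the same lock.
        Integer head = queue.poll(10, TimeUnit.MILLISECONDS);
        System.out.println("polled=" + head);

        // One slot was freed, so exactly one more item fits.
        System.out.println("accepted after poll=" + offerAll(queue, 1, 10));
    }
}
```

With many producer threads and one consumer sharing that single lock, a full queue can show up in a thread dump as producers parked in put and the consumer waiting to acquire the lock in poll, which matches the pattern reported here; whether the lock or the disk is the real limit still has to be measured.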