-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Michael-Scott queue and remove snapshot
mechanism
#122
Conversation
572bf7e
to
4026c70
Compare
snapshot
mechanism
2e44242
to
311adcf
Compare
311adcf
to
e9da697
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR !
I made a few minor comments (including one related to the use of Obj.magic
that seems safer here then in the Spsc_queue
optimization PR).
I really like the refactoring work to factorize the exception/option version of a function !
head : ('a, [ `Next ]) Node.t Atomic.t; | ||
tail : ('a, [ `Next ]) Node.t Atomic.t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to put ('a option, ['Next]) Node.t Atomic.t
as a type for both the head and the tail and with minimum changes below, the code works :
- without
Obj.magic
- with (roughly) a 10% decrease in performance (I did not try it a lot so it may fall into error margins).
Also here Obj.magic
is used for value that should never be read so this seems to be an acceptable use for it !
loop b | ||
in | ||
loop b | ||
type ('a, _) poly = Option : ('a, 'a option) poly | Value : ('a, 'a) poly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice way of refactoring to avoid copy/paste code ! I will remember it :)
|
||
let peek_opt { head; _ } = | ||
let rec pop_as : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this function be [@inline]
? (as well as the other _as
functions below). I would I guess you use a GADT instead of a function to avoid the issue inlining have with functions.
let[@inline] as_atomic (Next r : ('a, [ `Next ]) t) = r.next | ||
let[@inline] make value = Next { next = Nil; value } | ||
|
||
external as_atomic : ('a, [ `Next ]) t -> ('a, [ `Nil | `Next ]) t Atomic.t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next_as_atomic
seems like a better name for readability (and I think it is the name you usually use). Also, I won't be again a comment in the code about this line :)
513ec24
to
8b3d673
Compare
61bc326
to
ceb8e26
Compare
ceb8e26
to
d495cb1
Compare
d495cb1
to
f580b8e
Compare
f580b8e
to
bd8eae0
Compare
src_lockfree/michael_scott_queue.ml
Outdated
if not (Atomic.compare_and_set tail old_tail new_node) then | ||
let backoff = Backoff.once Backoff.default in | ||
fix_tail tail new_node backoff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I think that calling fix_tail
at this point is atually unnecessary. It should be safe to simplify the above to:
Atomic.compare_and_set tail old_tail new_node |> ignore
The only reason for the CAS to fail at this point should be that some other thread already pushed another value into the queue and this means we don't need to worry about updating the tail anymore.
This moves the node representation to a separate file so that it is easy to change only the node representation. This also changes the queue representation to be closer to the traditional Michael-Scott queue where the head points to a node whose value is no longer logically part of the queue. The "snapshot" mechanism cannot work with this representation, because the nodes must be mutated as they are logically removed from the queue to free the values to avoid space leaks. This commit also implements various micro-optimizations and tweaks to the implementation that improve performance. For example, backoffs are added after failed CAS operations, which improves performance in case of contention.
This effectively halves the length of the linked list of nodes and roughly doubles the thruput of the queue.
bd8eae0
to
47963a3
Compare
Thanks @polytypic for this work. It has been merged via PR #146 . |
This PR optimizes the Michael-Scott queue implementation. The main optimizations are avoiding false sharing and then optimization of the node representation, which effectively halves the length of the linked list of nodes by "inlining" the atomic into the node record and improves performance dramatically.
This PR also removes the snapshot mechanism. The snapshot mechanism is not used in any tests in this repository. Furthermore, the snapshot mechanism cannot be made to work (efficiently) as such with the traditional Michael-Scott queue representation where the head points to a node whose value has been taken from the queue. That is because, to avoid space leaks, the value in the head node must be overwritten and that simply breaks the snapshot mechanism. The previous implementation avoided this problem by using an inefficient node representation that effectively made the linked list of nodes twice as long and operations on the queue twice as slow.
Here is a run of the benchmarks on my M3 Max before the optimizations:
And here after the optimizations:
Here is a run of the benchmarks without the inlined node representation:
As you can see, on the
one domain
and1 nb adder, 1 nb taker
configurations the inlined node representation is roughly twice as fast! On the more contentious configurations, the additional cores likely both help to overcome some of the additional overhead (when not using the inlined representation) and the additional contention brings down the performance of inlined representation so the performance difference is not quite as dramatic.You can run
cp test/michael_scott_queue/michael_scott_queue_node.ml src_lockfree
orgit reset --hard HEAD~1
to use the non-inlined node representation: