
Michael-Scott queue : safe - unsafe versions #146

Merged: 11 commits into ocaml-multicore:main, Aug 19, 2024

Conversation

@lyrm (Collaborator) commented Jul 11, 2024

This PR follows up on PR #122: the optimizations and changes made in that PR are split in two, giving a safe Michael-Scott queue and an "unsafe" one (which uses Obj.magic).

TODO:

  • For now, the dscheck tests for the unsafe version get segmentation faults.
  • Documentation

@lyrm changed the title from "Ms queue safe unsafe" to "Michael-Scott queue : safe - unsafe versions" on Jul 11, 2024
@lyrm marked this pull request as draft on July 11, 2024 21:00
@lyrm (Collaborator, Author) commented Jul 23, 2024

@polytypic: The dscheck tests of the unsafe version result in a segmentation fault. I encountered the same issue with PR #122 after rebasing on main and adding the few necessary changes. Could you have a look?

@lyrm marked this pull request as ready for review on July 25, 2024 15:03
@lyrm (Collaborator, Author) commented Jul 29, 2024

@polytypic: this PR is ready for review. Could you have a look?

README.md Outdated
@@ -286,6 +304,16 @@ Because of the great properties of OCaml 5 memory model (see the
more details), not a lot can go wrong here. At least, data corruption or
segmentation fault won't happen like it can in other languages.

## Safe and unsafe data structures
Some data structures are available in two versions: a normal version and a more optimized but **unsafe** version. The **unsafe** version utilizes `Obj.magic` in a way that may be unsafe with `flambda2` optimizations.
Contributor

Let's format .md files with prettier. I recommend setting up your editor so that it does that automatically.

Collaborator Author

It should have been working this way, but apparently my editor config was not working right! Thanks for catching it.

@@ -33,15 +34,24 @@ is distributed under the

# Contents

- [Saturn — Parallelism-Safe Data Structures for Multicore OCaml](#saturn--parallelism-safe-data-structures-for-multicore-ocaml)
@polytypic (Contributor) Jul 29, 2024

This is not an issue with this PR, but some time ago GitHub started automatically producing a kind of contents/navigation dropdown for .md files. So I've been thinking of potentially dropping the contents section from the .md files (less stuff to maintain). At a quick look, ocaml.org/packages doesn't yet produce such a navigation helper, so I guess there is still value in having a contents section. Just mentioning this here as a potential thing to discuss at some point.

@@ -75,7 +72,7 @@ let rec fix_tail tail new_tail =
&& not (Atomic.compare_and_set tail old_tail new_tail)
then fix_tail tail new_tail

let push { tail; _ } value =
let push_exn { tail; _ } value =
Contributor

Hmm... The convention has been to use the _exn suffix on operations that raise an expected exception in some normal cases (such as when the queue is empty or full), but in this case no such exception is being used. Why should we use the _exn suffix in this case?
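For reference, a minimal illustration of the convention in question, using a trivial (non-concurrent) list-backed stand-in for a queue; only the suffixes matter here:

exception Empty

type 'a t = { mutable items : 'a list }

(* Never raises in normal use, so no suffix. *)
let push t x = t.items <- t.items @ [ x ]

(* Signals emptiness with an option instead of raising. *)
let pop_opt t =
  match t.items with
  | [] -> None
  | x :: xs ->
      t.items <- xs;
      Some x

(* Raises the expected [Empty] exception on an empty queue, hence [_exn]. *)
let pop_exn t = match pop_opt t with Some x -> x | None -> raise Empty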

Collaborator Author

Good catch! I made the change in saturn_benchmarks to match the generic Queue signature. I should not have moved it here!

@polytypic (Contributor)

Looking at the "safe" version, the push logic:

(* ... *)

let push { tail; _ } value =
  let rec find_tail_and_enq curr_end node =
    if not (Atomic.compare_and_set curr_end Nil node) then
      match Atomic.get curr_end with
      | Nil -> find_tail_and_enq curr_end node
      | Next (_, n) -> find_tail_and_enq n node
  in
  let new_tail = Atomic.make Nil in
  let newnode = Next (value, new_tail) in
  let old_tail = Atomic.get tail in
  find_tail_and_enq old_tail newnode;
  if not (Atomic.compare_and_set tail old_tail new_tail) then
    fix_tail tail new_tail

doesn't have some of the (entirely safe) optimizations from the "unsafe" version:

(* ... *)

let push { tail; _ } value =
  let (Next _ as new_node : (_, [ `Next ]) Node.t) = Node.make value in
  let old_tail = Atomic.get tail in
  let link = Node.as_atomic old_tail in
  if Atomic.compare_and_set link Nil new_node then
    Atomic.compare_and_set tail old_tail new_node |> ignore
  else
    let backoff = Backoff.once Backoff.default in
    push tail link new_node backoff

In particular, notice how the "unsafe" version does not call any recursive functions in the fast path:

let push { tail; _ } value =
  let (Next _ as new_node : (_, [ `Next ]) Node.t) = Node.make value in
  let old_tail = Atomic.get tail in
  let link = Node.as_atomic old_tail in
  (* first we try the fast path *)
  if Atomic.compare_and_set link Nil new_node then
    Atomic.compare_and_set tail old_tail new_node |> ignore
  else
    (* failed, so we do the slow path *)

It should be possible to incorporate this optimization into the "safe" version and it might speed it up a tiny bit.
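For illustration, a rough sketch of what folding that fast path into the "safe" version could look like, reusing the Nil/Next representation and the fix_tail helper from the snippets above (a sketch of the idea, not the actual patch):

let push { tail; _ } value =
  let new_tail = Atomic.make Nil in
  let new_node = Next (value, new_tail) in
  let old_tail = Atomic.get tail in
  (* Fast path: [tail] is up to date and its link is still [Nil], so a
     single CAS appends the node without calling any recursive function. *)
  if not (Atomic.compare_and_set old_tail Nil new_node) then begin
    (* Slow path: walk from the stale tail to the real end of the list,
       exactly as before. *)
    let rec find_tail_and_enq curr_end node =
      if not (Atomic.compare_and_set curr_end Nil node) then
        match Atomic.get curr_end with
        | Nil -> find_tail_and_enq curr_end node
        | Next (_, n) -> find_tail_and_enq n node
    in
    find_tail_and_enq old_tail new_node
  end;
  if not (Atomic.compare_and_set tail old_tail new_tail) then
    fix_tail tail new_tail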

Comment on lines +159 to +162
let module Safe = Dscheck_ms_queue (Michael_scott_queue) in
let safe_test = Safe.tests "safe" in
let module Unsafe = Dscheck_ms_queue (Michael_scott_queue_unsafe) in
let unsafe_test = Unsafe.tests "unsafe" in
Contributor

Not a major issue, but this would probably be a bit less verbose with first-class modules. It could be like:

let safe_tests = tests "safe" (module Michael_scott_queue) in
let unsafe_tests = tests "unsafe" (module Michael_scott_queue_unsafe) in
...

The module language can be a bit cumbersome at times, and in a case like this, where we are only interested in a simple (non-module) return value (a list of tests), first-class modules can be (IMHO) more convenient.
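To make the suggested shape concrete, here is an illustrative sketch; the QUEUE signature and the test body are hypothetical stand-ins, not the actual test code:

(* Sketch only: [QUEUE] stands for whatever signature the two Michael-Scott
   modules share. *)
module type QUEUE = sig
  type 'a t
  val create : unit -> 'a t
  val push : 'a t -> 'a -> unit
  val pop_opt : 'a t -> 'a option
end

(* Returns a plain list of named test thunks; no module needs to escape. *)
let tests name (module Q : QUEUE) =
  [ (name ^ " push/pop", fun () ->
        let q = Q.create () in
        Q.push q 42;
        assert (Q.pop_opt q = Some 42)) ]

(* Usage would then look like:
     tests "safe" (module Michael_scott_queue)
     @ tests "unsafe" (module Michael_scott_queue_unsafe) *)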

Collaborator Author

So, I pushed the dscheck tests with a first-class module, and maybe I did not do it right, but it does not feel less verbose :) It works, though.

Comment on lines +68 to +73
let module Safe = STM_ms_queue (Ms_queues.Michael_scott_queue) in
let exit_code = Safe.run () in
if exit_code <> 0 then exit exit_code
else
let module Unsafe = STM_ms_queue (Ms_queues.Michael_scott_queue_unsafe) in
Unsafe.run () |> exit
Contributor

This could also be a bit less verbose using first-class modules. We are not really interested in the Safe and Unsafe modules as such. We just want to run the tests and get an int as a result.

@lyrm (Collaborator, Author) commented Jul 31, 2024

About the missing safe optimizations: I added the fast path in the push function. However, the benchmarks suggest that adding backoff (either in fix_tail or in find_tail_and_enq) does not improve performance (at least on my computer). Did I miss other safe optimizations?

@polytypic (Contributor)

> About the missing safe optimizations: I added the fast path in the push function. However, the benchmarks suggest that adding backoff (either in fix_tail or in find_tail_and_enq) does not improve performance (at least on my computer). Did I miss other safe optimizations?

Hmm... One could also leave the backoff out. It is possible that the backoff helps on other CPU (micro)architectures and we can add that later.
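(For context: "adding backoff" here refers to the usual CAS retry pattern built on the Backoff module from the backoff package. A generic sketch, not code from this PR:)

(* Retry a failed CAS after an exponentially growing wait; whether this
   helps depends on contention and on the CPU (micro)architecture. *)
let rec incr_with_backoff ?(backoff = Backoff.default) counter =
  let n = Atomic.get counter in
  if not (Atomic.compare_and_set counter n (n + 1)) then
    incr_with_backoff ~backoff:(Backoff.once backoff) counter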

@polytypic (Contributor) left a comment

The logic looks good to me. 🎉

@lyrm merged commit 1e3024b into ocaml-multicore:main on Aug 19, 2024
9 checks passed
@toots commented Aug 26, 2024

Hi there! While I understand and appreciate the hard work that went into optimizing the queue, what is the recommended alternative for grabbing the elements of the queue now that snapshots are gone? A pop loop followed by a push loop?
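(For concreteness, the pop-then-push workaround mentioned above could be written generically along these lines; note that it is destructive and not atomic with respect to concurrent pushes and pops:)

(* Sketch only: [pop_opt] and [push] are passed in, so nothing is assumed
   about the particular queue API. *)
let drain_and_restore ~pop_opt ~push q =
  let rec drain acc =
    match pop_opt q with None -> List.rev acc | Some v -> drain (v :: acc)
  in
  let xs = drain [] in
  (* Push the elements back in their original order. *)
  List.iter (push q) xs;
  xs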

@polytypic (Contributor) commented Aug 26, 2024

As a side note, the removal of the snapshot mechanism wasn't just about optimization.

Here is a puzzle about the snapshot mechanism. The code below uses the previous 0.4.1 version of this library, which is the only published version of this library that has the MS queue with the snapshot mechanism and no (additional) space leaks. Let's first define a module alias:

module Q = Saturn.Queue

Here is a function that converts a "snapshot" into a list:

let[@tail_mod_cons] rec to_list s =
  match Q.next s with
  | None -> []
  | Some (v, s) -> v :: to_list s

Question A: What is the return value of the below program?

let q = Q.create () in
let s = Q.snapshot q in
Q.push q 101;
ignore (Q.pop q);
to_list s

Question B: What is the return value of the below program?

let q = Q.create () in
Q.push q 101;
let s = Q.snapshot q in
ignore (Q.pop q);
Q.push q 42;
ignore (Q.pop q);
to_list s
Answer and explanation:

The first program returns [] and the second program returns [101; 42]. Note that there is no point in time when the queue in the second program contains both 101 and 42.

To me, a "snapshot" should be an immutable instant. But that is not how the mechanism worked. Instead, snapshot simply returned a pointer to the first non-dummy (if any) node of the queue and the next operation then dereferenced the (mutable) pointer to the next node.

This meant that if the queue happened to be empty, the snapshot would be empty. If the queue happened to be non-empty, the snapshot would contain all elements pushed to the queue up to that point and potentially all the elements pushed to the queue in the future.

This behaviour of the snapshot mechanism is, of course, not limited to simple sequential programs. Concurrent pushes to the queue are also potentially visible through the "snapshot".

@toots commented Aug 26, 2024

Good to know! Given the nature of the concurrent queue, I was never expecting strict control over interleaved calls, but this is clearly the opposite of what it was advertising.

I think it'd be nice to have a way to get through the elements in the queue without removing them, basically a deeper peek API.

I might see if I can propose a PR for that.

@polytypic (Contributor) commented Aug 27, 2024

Note that providing a snapshot operation can be easier with some other lock-free queue approaches.

The "lazy" queue, for example, makes it particularly easy.

The approach using two (essentially immutable) stacks also makes it relatively easy to take a snapshot of the queue. The Picos project has an implementation of a two-stack multi-producer, multi-consumer queue that should be relatively easy to extend with a snapshot operation.
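To illustrate why the two-stack approach makes this easy (purely a sketch, not the Picos implementation): the whole queue state is one immutable value behind a single Atomic, so a consistent snapshot is just an atomic read.

type 'a state = { front : 'a list; back : 'a list }
type 'a t = 'a state Atomic.t

let create () : 'a t = Atomic.make { front = []; back = [] }

let rec push t x =
  let old = Atomic.get t in
  if not (Atomic.compare_and_set t old { old with back = x :: old.back })
  then push t x

(* A consistent snapshot of the queue contents, front to back. *)
let snapshot t =
  let { front; back } = Atomic.get t in
  front @ List.rev back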

Also, the queue provided with Kcas has snapshot functionality.
