
<pre class='metadata'>
Title: `std::execution`
H1: <code>std::execution</code>
Shortname: D2300
Revision: 8
Status: D
Group: WG21
Audience: SG1, LEWG
Editor: Michał Dominiak, griwes@griwes.info
Editor: Georgy Evtushenko, evtushenko.georgy@gmail.com
Editor: Lewis Baker, lewissbaker@gmail.com
Editor: Lucian Radu Teodorescu, lucteo@lucteo.ro
Editor: Lee Howes, xrikcus@gmail.com
Editor: Kirk Shoop, kirk.shoop@gmail.com
Editor: Michael Garland, mgarland@nvidia.com
Editor: Eric Niebler, eric.niebler@gmail.com
Editor: Bryce Adelstein Lelbach, brycelelbach@gmail.com
URL: https://wg21.link/P2300
!Source: <a href="https://github.com/brycelelbach/wg21_p2300_execution/blob/main/execution.bs">GitHub</a>
Issue Tracking: GitHub https://github.com/brycelelbach/wg21_p2300_execution/issues
Metadata Order: Editor, This Version, Source, Issue Tracking, Project, Audience
Markup Shorthands: markdown yes
Toggle Diffs: no
No Abstract: yes
Default Biblio Display: inline
Default Highlight: c++
</pre>

<style>
pre {
  margin-top: 0px;
  margin-bottom: 0px;
}
table, th, tr, td {
  border: 2px solid black !important;
}
@media (prefers-color-scheme: dark) {
  table, th, tr, td {
    border: 2px solid white !important;
  }
}
.ins, ins, ins *, span.ins, span.ins * {
  background-color: rgb(200, 250, 200);
  color: rgb(0, 136, 0);
  text-decoration: none;
}
.del, del, del *, span.del, span.del * {
  background-color: rgb(250, 200, 200);
  color: rgb(255, 0, 0);
  text-decoration: line-through;
  text-decoration-color: rgb(255, 0, 0);
}
math, span.math {
  font-family: serif;
  font-style: italic;
}
ul {
  list-style-type: "— ";
}
blockquote {
  counter-reset: paragraph;
}
div.numbered, div.newnumbered {
  margin-left: 2em;
  margin-top: 1em;
  margin-bottom: 1em;
}
div.numbered:before, div.newnumbered:before {
  position: absolute;
  margin-left: -2em;
  display-style: block;
}
div.numbered:before {
  content: counter(paragraph);
  counter-increment: paragraph;
}
div.newnumbered:before {
  content: "�";
}
div.numbered ul, div.newnumbered ul {
  counter-reset: list_item;
}
div.numbered li, div.newnumbered li {
  margin-left: 3em;
}
div.numbered li:before, div.newnumbered li:before {
  position: absolute;
  margin-left: -4.8em;
  display-style: block;
}
div.numbered li:before {
  content: "(" counter(paragraph) "." counter(list_item) ")";
  counter-increment: list_item;
}
div.newnumbered li:before {
  content: "(�." counter(list_item) ")";
  counter-increment: list_item;
}
div.ed-note {
  color: blue !important;
  margin-left: 2em;
}
div.ed-note:before {
  content: "[Editorial note: ";
  font-style: italic;
}
div.ed-note:after {
  content: " -- end note]";
  font-style: italic;
}
div.ed-note * {
  color: blue !important;
  margin-top: 0em;
  margin-bottom: 0em;
}
div.ed-note blockquote {
  margin-left: 2em;
}
div.wg21note:before, span.wg21note:before {
  content: "[Note: ";
  font-style: italic;
}
div.wg21note:after, span.wg21note:after {
  content: " -- end note]";
  font-style: italic;
}
h5 {
  font-style: normal; /* turn off italics of h5 headers */
}
</style>

# Introduction # {#intro}

This paper proposes a self-contained design for a Standard C++ framework for managing asynchronous execution on generic execution resources. It is based on the ideas in [[P0443R14]] and its companion papers.

## Motivation ## {#motivation}

Today, C++ software is increasingly asynchronous and parallel, a trend that is likely to continue.
Asynchrony and parallelism appear everywhere, from processor hardware interfaces, to networking, to file I/O, to GUIs, to accelerators.
Every C++ domain and every platform needs to deal with asynchrony and parallelism, from scientific computing to video games to financial services, from the smallest mobile devices to your laptop to GPUs in the world's fastest supercomputer.

While the C++ Standard Library has a rich set of concurrency primitives (`std::atomic`, `std::mutex`, `std::counting_semaphore`, etc.) and lower-level building blocks (`std::thread`, etc.), we lack a Standard vocabulary and framework for asynchrony and parallelism that C++ programmers desperately need.
`std::async`/`std::future`/`std::promise`, C++11's intended exposure for asynchrony, is inefficient, hard to use correctly, and severely lacking in genericity, making it unusable in many contexts.
We introduced parallel algorithms to the C++ Standard Library in C++17, and while they are an excellent start, they are all inherently synchronous and not composable.

This paper proposes a Standard C++ model for asynchrony, based around three key abstractions: schedulers, senders, and receivers, and a set of customizable asynchronous algorithms.

## Priorities ## {#priorities}

* Be composable and generic, allowing users to write code that can be used with many different types of execution resources.
* Encapsulate common asynchronous patterns in customizable and reusable algorithms, so users don't have to invent things themselves.
* Make it easy to be correct by construction.
* Support the diversity of execution resources and execution agents, because not all execution agents are created equal; some are less capable than others, but not less important.
* Allow everything to be customized by an execution resource, including transfer to other execution resources, but don't require that execution resources customize everything.
* Care about all reasonable use cases, domains and platforms.
* Errors must be propagated, but error handling must not present a burden.
* Support cancellation, which is not an error.
* Have clear and concise answers for where things execute.
* Be able to manage and terminate the lifetimes of objects asynchronously.

## Examples: End User ## {#example-end-user}

In this section we demonstrate the end-user experience of asynchronous programming directly with the sender algorithms presented in this paper. See [[#design-sender-factories]], [[#design-sender-adaptors]], and [[#design-sender-consumers]] for short explanations of the algorithms used in these code examples.

### Hello world ### {#example-hello-world}

```c++
using namespace std::execution;

scheduler auto sch = thread_pool.scheduler();                                 // 1

sender auto begin = schedule(sch);                                            // 2
sender auto hi = then(begin, []{                                              // 3
    std::cout << "Hello world! Have an int.";                                 // 3
    return 13;                                                                // 3
});                                                                           // 3
sender auto add_42 = then(hi, [](int arg) { return arg + 42; });              // 4

auto [i] = this_thread::sync_wait(add_42).value();                            // 5
```

This example demonstrates the basics of schedulers, senders, and receivers:

1. First we need to get a scheduler from somewhere, such as a thread pool. A scheduler is a lightweight handle to an execution resource.
2. To start a chain of work on a scheduler, we call [[#design-sender-factory-schedule]], which returns a sender that completes on the scheduler. A sender describes asynchronous work and sends a signal (value, error, or stopped) to some recipient(s) when that work completes.
3. We use sender algorithms to produce senders and compose asynchronous work. [[#design-sender-adaptor-then]] is a sender adaptor that takes an input sender and a `std::invocable`, and calls the `std::invocable` on the signal sent by the input sender. The sender returned by `then` sends the result of that invocation. In this case, the input sender came from `schedule`, so it's `void`, meaning it won't send us a value, so our `std::invocable` takes no parameters. But we return an `int`, which will be sent to the next recipient.
4. Now, we add another operation to the chain, again using [[#design-sender-adaptor-then]]. This time, we get sent a value - the `int` from the previous step. We add `42` to it, and then return the result.
5. Finally, we're ready to submit the entire asynchronous pipeline and wait for its completion. Everything up until this point has been completely asynchronous; the work may not have even started yet. To ensure the work has started and then block pending its completion, we use [[#design-sender-consumer-sync_wait]], which will either return a `std::optional<std::tuple<...>>` with the value sent by the last sender, or an empty `std::optional` if the last sender sent a stopped signal, or it throws an exception if the last sender sent an error.
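
The three possible results of `sync_wait` described in step 5 can be modeled without any sender machinery. The following is a minimal sketch, not the proposed implementation: `stopped_t`, `outcome`, and `model_sync_wait` are hypothetical names invented for this illustration, and the model assumes a sender that sends a single `int`.

```cpp
#include <cassert>
#include <exception>
#include <optional>
#include <tuple>
#include <variant>

// Hypothetical tag for the stopped signal.
struct stopped_t {};

// Hypothetical model of the one signal that a sender of `int` eventually
// delivers: a value, an error, or stopped.
using outcome = std::variant<int, std::exception_ptr, stopped_t>;

// Maps a completion signal onto the result shape described for sync_wait:
// an engaged optional for a value, a rethrown exception for an error, and an
// empty optional for stopped.
std::optional<std::tuple<int>> model_sync_wait(const outcome& o) {
  if (const int* v = std::get_if<int>(&o)) {
    return std::tuple<int>{*v};
  }
  if (const std::exception_ptr* e = std::get_if<std::exception_ptr>(&o)) {
    assert(*e && "an error signal must carry an exception");
    std::rethrow_exception(*e);
  }
  return std::nullopt;  // stopped: no value and no exception
}
```

A caller that only handles the value case can write `auto [i] = model_sync_wait(o).value();`, which mirrors the last line of the example above.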

### Asynchronous inclusive scan ### {#example-async-inclusive-scan}

```c++
using namespace std::execution;

sender auto async_inclusive_scan(scheduler auto sch,                          // 2
                                 std::span<const double> input,               // 1
                                 std::span<double> output,                    // 1
                                 double init,                                 // 1
                                 std::size_t tile_count)                      // 3
{
  std::size_t const tile_size = (input.size() + tile_count - 1) / tile_count;

  std::vector<double> partials(tile_count + 1);                               // 4
  partials[0] = init;                                                         // 4

  return just(std::move(partials))                                            // 5
       | transfer(sch)
       | bulk(tile_count,                                                     // 6
           [ = ](std::size_t i, std::vector<double>& partials) {              // 7
             auto start = i * tile_size;                                      // 8
             auto end   = std::min(input.size(), (i + 1) * tile_size);        // 8
             partials[i + 1] = *--std::inclusive_scan(begin(input) + start,   // 9
                                                      begin(input) + end,     // 9
                                                      begin(output) + start); // 9
           })                                                                 // 10
       | then(                                                                // 11
           [](std::vector<double>&& partials) {
             std::inclusive_scan(begin(partials), end(partials),              // 12
                                 begin(partials));                            // 12
             return std::move(partials);                                      // 13
           })
       | bulk(tile_count,                                                     // 14
           [ = ](std::size_t i, std::vector<double>& partials) {              // 14
             auto start = i * tile_size;                                      // 14
             auto end   = std::min(input.size(), (i + 1) * tile_size);        // 14
             std::for_each(begin(output) + start, begin(output) + end,        // 14
               [&] (double& e) { e = partials[i] + e; }                       // 14
             );
           })
       | then(                                                                // 15
           [ = ](std::vector<double>&& partials) {                            // 15
             return output;                                                   // 15
           });                                                                // 15
}
```

This example builds an asynchronous computation of an inclusive scan:

1. It scans a sequence of `double`s (represented as the `std::span<const double>` `input`) and stores the result in another sequence of `double`s (represented as `std::span<double>` `output`).
2. It takes a scheduler, which specifies what execution resource the scan should be launched on.
3. It also takes a `tile_count` parameter that controls the number of execution agents that will be spawned.
4. First we need to allocate temporary storage needed for the algorithm, which we'll do with a `std::vector`, `partials`. We need one `double` of temporary storage for each execution agent we create.
5. Next we'll create our initial sender with [[#design-sender-factory-just]] and [[#design-sender-adaptor-transfer]]. These senders will send the temporary storage, which we've moved into the sender. The sender has a completion scheduler of `sch`, which means the next item in the chain will use `sch`.
6. Senders and sender adaptors support composition via `operator|`, similar to C++ ranges. We'll use `operator|` to attach the next piece of work, which will spawn `tile_count` execution agents using [[#design-sender-adaptor-bulk]] (see [[#design-pipeable]] for details).
7. Each agent will call a `std::invocable`, passing it two arguments. The first is the agent's index (`i`) in the [[#design-sender-adaptor-bulk]] operation, in this case a unique integer in `[0, tile_count)`. The second argument is what the input sender sent - the temporary storage.
8. We start by computing the start and end of the range of input and output elements that this agent is responsible for, based on our agent index.
9. Then we do a sequential `std::inclusive_scan` over our elements. We store the scan result for our last element, which is the sum of all of our elements, in our temporary storage `partials`.
10. After all computation in that initial [[#design-sender-adaptor-bulk]] pass has completed, every one of the spawned execution agents will have written the sum of its elements into its slot in `partials`.
11. Now we need to scan all of the values in `partials`. We'll do that with a single execution agent which will execute after the [[#design-sender-adaptor-bulk]] completes. We create that execution agent with [[#design-sender-adaptor-then]].
12. [[#design-sender-adaptor-then]] takes an input sender and a `std::invocable` and calls the `std::invocable` with the value sent by the input sender. Inside our `std::invocable`, we call `std::inclusive_scan` on `partials`, which the input sender will send to us.
13. Then we return `partials`, which the next phase will need.
14. Finally we do another [[#design-sender-adaptor-bulk]] of the same shape as before. In this [[#design-sender-adaptor-bulk]], we will use the scanned values in `partials` to integrate the sums from other tiles into our elements, completing the inclusive scan.
15. `async_inclusive_scan` returns a sender that sends the output `std::span<double>`. A consumer of the algorithm can chain additional work that uses the scan result. At the point at which `async_inclusive_scan` returns, the computation may not have completed. In fact, it may not have even started.
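
To make the three phases easier to follow, here is a sequential sketch of the same tiling scheme written with plain loops, where each loop iteration stands in for one execution agent of a `bulk` pass. It is an illustration only: `tiled_inclusive_scan` is a hypothetical name, the function is synchronous, it returns a vector instead of writing through a span, and it clamps the start of each tile so that trailing empty tiles are skipped (the code above does not do this).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Sequential model of the three phases of the tiled inclusive scan.
std::vector<double> tiled_inclusive_scan(const std::vector<double>& input,
                                         double init,
                                         std::size_t tile_count) {
  assert(tile_count > 0);
  std::vector<double> output(input.size());
  const std::size_t tile_size = (input.size() + tile_count - 1) / tile_count;

  // One slot per tile, plus slot 0 for the initial value.
  std::vector<double> partials(tile_count + 1, 0.0);
  partials[0] = init;

  // Half-open element range owned by tile i, clamped to the input size.
  auto tile_bounds = [&](std::size_t i) {
    const std::size_t start = std::min(input.size(), i * tile_size);
    const std::size_t end = std::min(input.size(), (i + 1) * tile_size);
    return std::pair<std::size_t, std::size_t>{start, end};
  };

  // Phase 1 (first bulk): scan each tile locally and record the tile's sum.
  for (std::size_t i = 0; i < tile_count; ++i) {
    const auto [start, end] = tile_bounds(i);
    if (start == end) continue;  // empty tile: its partial sum stays 0
    double* last = std::inclusive_scan(input.data() + start,
                                       input.data() + end,
                                       output.data() + start);
    partials[i + 1] = *--last;
  }

  // Phase 2 (then): scan the per-tile sums so that partials[i] becomes the
  // total of init and every element that precedes tile i.
  std::inclusive_scan(partials.begin(), partials.end(), partials.begin());

  // Phase 3 (second bulk): add each tile's offset to its local results.
  for (std::size_t i = 0; i < tile_count; ++i) {
    const auto [start, end] = tile_bounds(i);
    for (std::size_t j = start; j < end; ++j) {
      output[j] += partials[i];
    }
  }
  return output;
}
```

For the input `1, 2, 3, 4, 5` with `init` equal to `10` and two tiles, phase 1 leaves `partials` as `10, 6, 9`, phase 2 turns it into `10, 16, 25`, and phase 3 produces `11, 13, 16, 20, 25`.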

### Asynchronous dynamically-sized read ### {#example-async-dynamically-sized-read}

```c++
using namespace std::execution;

sender_of<std::size_t> auto async_read(                                       // 1
    sender_of<std::span<std::byte>> auto buffer,                              // 1
    auto handle);                                                             // 1

struct dynamic_buffer {                                                       // 3
  std::unique_ptr<std::byte[]> data;                                          // 3
  std::size_t size;                                                           // 3
};                                                                            // 3

sender_of<dynamic_buffer> auto async_read_array(auto handle) {                // 2
  return just(dynamic_buffer{})                                               // 4
       | let_value([handle] (dynamic_buffer& buf) {                           // 5
           return just(std::as_writable_bytes(std::span(&buf.size, 1)))       // 6
                | async_read(handle)                                          // 7
                | then(                                                       // 8
                    [&buf] (std::size_t bytes_read) {                         // 9
                      assert(bytes_read == sizeof(buf.size));                 // 10
                      buf.data = std::make_unique<std::byte[]>(buf.size);     // 11
                      return std::span(buf.data.get(), buf.size);             // 12
                    })
                | async_read(handle)                                          // 13
                | then(
                    [&buf] (std::size_t bytes_read) {
                      assert(bytes_read == buf.size);                         // 14
                      return std::move(buf);                                  // 15
                    });
       });
}
```

This example demonstrates a common asynchronous I/O pattern - reading a payload of a dynamic size by first reading the size, then reading the number of bytes specified by the size:

1. `async_read` is a pipeable sender adaptor. It's a customization point object, but this is what its call signature looks like. It takes a sender parameter which must send an input buffer in the form of a `std::span<std::byte>`, and a handle to an I/O context. It will asynchronously read into the input buffer, up to the size of the `std::span`. It returns a sender which will send the number of bytes read once the read completes.
2. `async_read_array` takes an I/O handle and reads a size from it, and then a buffer of that many bytes. It returns a sender that sends a `dynamic_buffer` object that owns the data that was sent.
3. `dynamic_buffer` is an aggregate struct that contains a `std::unique_ptr<std::byte[]>` and a size.
4. The first thing we do inside of `async_read_array` is create a sender that will send a new, empty `dynamic_buffer` object using [[#design-sender-factory-just]]. We can attach more work to the pipeline using `operator|` composition (see [[#design-pipeable]] for details).
5. We need the lifetime of this `dynamic_buffer` object to last for the entire pipeline. So, we use `let_value`, which takes an input sender and a `std::invocable` that must return a sender itself (see [[#design-sender-adaptor-let]] for details). `let_value` sends the value from the input sender to the `std::invocable`. Critically, the lifetime of the sent object will last until the sender returned by the `std::invocable` completes.
6. Inside of the `let_value` `std::invocable`, we have the rest of our logic. First, we want to initiate an `async_read` of the buffer size. To do that, we need to send a `std::span` pointing to `buf.size`. We can do that with [[#design-sender-factory-just]].
7. We chain the `async_read` onto the [[#design-sender-factory-just]] sender with `operator|`.
8. Next, we pipe a `std::invocable` that will be invoked after the `async_read` completes using [[#design-sender-adaptor-then]].
9. That `std::invocable` gets sent the number of bytes read.
10. We need to check that the number of bytes read is what we expected.
11. Now that we have read the size of the data, we can allocate storage for it.
12. We return a `std::span<std::byte>` to the storage for the data from the `std::invocable`. This will be sent to the next recipient in the pipeline.
13. And that recipient will be another `async_read`, which will read the data.
14. Once the data has been read, in another [[#design-sender-adaptor-then]], we confirm that we read the right number of bytes.
15. Finally, we move out of and return our `dynamic_buffer` object. It will get sent by the sender returned by `async_read_array`. We can attach more things to that sender to use the data in the buffer.
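
The size-then-payload pattern itself is independent of senders. As a point of comparison, the following is a synchronous sketch of the same two reads over an in-memory byte sequence. Everything except `dynamic_buffer` is hypothetical and introduced only for this illustration: `memory_handle` stands in for the I/O handle, `read_some` stands in for `async_read`, and the wire format is assumed to be a native-endian `std::size_t` followed by the payload. Unlike the example above, a short read is reported with an empty `std::optional` instead of an `assert`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

struct dynamic_buffer {
  std::unique_ptr<std::byte[]> data;
  std::size_t size = 0;
};

// Hypothetical stand-in for an I/O handle: a cursor over bytes in memory.
struct memory_handle {
  const std::byte* cursor;
  std::size_t remaining;
};

// Synchronous stand-in for async_read: copies up to `len` bytes into `dst`
// and returns the number of bytes actually read.
std::size_t read_some(memory_handle& h, std::byte* dst, std::size_t len) {
  assert(dst != nullptr || len == 0);
  const std::size_t n = std::min(len, h.remaining);
  if (n != 0) std::memcpy(dst, h.cursor, n);
  h.cursor += n;
  h.remaining -= n;
  return n;
}

// Reads the size prefix, then exactly that many payload bytes.
std::optional<dynamic_buffer> read_array(memory_handle& h) {
  dynamic_buffer buf;
  auto* size_bytes = reinterpret_cast<std::byte*>(&buf.size);
  if (read_some(h, size_bytes, sizeof(buf.size)) != sizeof(buf.size)) {
    return std::nullopt;  // the size prefix itself was truncated
  }
  if (buf.size > h.remaining) {
    return std::nullopt;  // payload is truncated; do not allocate for it
  }
  buf.data = std::make_unique<std::byte[]>(buf.size);
  if (read_some(h, buf.data.get(), buf.size) != buf.size) {
    return std::nullopt;
  }
  return std::optional<dynamic_buffer>{std::move(buf)};
}

// Builds a message in the wire format assumed by this sketch.
std::vector<std::byte> encode(const std::vector<std::byte>& payload) {
  const std::size_t n = payload.size();
  std::vector<std::byte> wire(sizeof(n) + n);
  std::memcpy(wire.data(), &n, sizeof(n));
  if (n != 0) std::memcpy(wire.data() + sizeof(n), payload.data(), n);
  return wire;
}
```

In the synchronous version `buf` is an ordinary local variable, so its lifetime trivially spans both reads. In the asynchronous version that guarantee has to come from `let_value`, because the enclosing function returns before either read runs.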

### Asynchronous Windows socket `recv` ### {#example-async-windows-socket-recv}

To get a better feel for how this interface might be used by low-level operations, see this example implementation
of a cancellable `async_recv()` operation for a Windows Socket.

```c++
struct operation_base : WSAOVERLAPPED {
    using completion_fn = void(operation_base* op, DWORD bytesTransferred, int errorCode) noexcept;

    // Assume IOCP event loop will call this when this OVERLAPPED structure is dequeued.
    completion_fn* completed;
};

template<typename Receiver>
struct recv_op : operation_base {
    recv_op(SOCKET s, void* data, size_t len, Receiver r)
    : receiver(std::move(r))
    , sock(s) {
        this->Internal = 0;
        this->InternalHigh = 0;
        this->Offset = 0;
        this->OffsetHigh = 0;
        this->hEvent = NULL;
        this->completed = &recv_op::on_complete;
        buffer.len = len;
        buffer.buf = static_cast<CHAR*>(data);
    }

    friend void tag_invoke(std::execution::start_t, recv_op& self) noexcept {
        // Avoid even calling WSARecv() if operation already cancelled
        auto st = std::execution::get_stop_token(
          std::get_env(self.receiver));
        if (st.stop_requested()) {
            std::execution::set_stopped(std::move(self.receiver));
            return;
        }

        // Store and cache result here in case it changes during execution
        const bool stopPossible = st.stop_possible();
        if (!stopPossible) {
            self.ready.store(true, std::memory_order_relaxed);
        }

        // Launch the operation
        DWORD bytesTransferred = 0;
        DWORD flags = 0;
        int result = WSARecv(self.sock, &self.buffer, 1, &bytesTransferred, &flags,
                             static_cast<WSAOVERLAPPED*>(&self), NULL);
        if (result == SOCKET_ERROR) {
            int errorCode = WSAGetLastError();
            if (errorCode != WSA_IO_PENDING) {
                if (errorCode == WSA_OPERATION_ABORTED) {
                    std::execution::set_stopped(std::move(self.receiver));
                } else {
                    std::execution::set_error(std::move(self.receiver),
                                              std::error_code(errorCode, std::system_category()));
                }
                return;
            }
        } else {
            // Completed synchronously (assuming FILE_SKIP_COMPLETION_PORT_ON_SUCCESS has been set)
            std::execution::set_value(std::move(self.receiver), bytesTransferred);
            return;
        }

        // If we get here then operation has launched successfully and will complete asynchronously.
        // May be completing concurrently on another thread already.
        if (stopPossible) {
            // Register the stop callback
            self.stopCallback.emplace(std::move(st), cancel_cb{self});

            // Mark as 'completed'
            if (self.ready.load(std::memory_order_acquire) ||
                self.ready.exchange(true, std::memory_order_acq_rel)) {
                // Already completed on another thread
                self.stopCallback.reset();

                BOOL ok = WSAGetOverlappedResult(self.sock, (WSAOVERLAPPED*)&self, &bytesTransferred, FALSE, &flags);
                if (ok) {
                    std::execution::set_value(std::move(self.receiver), bytesTransferred);
                } else {
                    int errorCode = WSAGetLastError();
                    std::execution::set_error(std::move(self.receiver),
                                              std::error_code(errorCode, std::system_category()));
                }
            }
        }
    }

    struct cancel_cb {
        recv_op& op;

        void operator()() noexcept {
            CancelIoEx((HANDLE)op.sock, (OVERLAPPED*)(WSAOVERLAPPED*)&op);
        }
    };

    static void on_complete(operation_base* op, DWORD bytesTransferred, int errorCode) noexcept {
        recv_op& self = *static_cast<recv_op*>(op);

        if (self.ready.load(std::memory_order_acquire) ||
            self.ready.exchange(true, std::memory_order_acq_rel)) {
            // Unsubscribe any stop-callback so we know that CancelIoEx() is not accessing 'op'
            // any more
            self.stopCallback.reset();

            if (errorCode == 0) {
                std::execution::set_value(std::move(self.receiver), bytesTransferred);
            } else {
                std::execution::set_error(std::move(self.receiver),
                                          std::error_code(errorCode, std::system_category()));
            }
            }
        }
    }

    Receiver receiver;
    SOCKET sock;
    WSABUF buffer;
    std::optional<typename stop_callback_type_t<Receiver>
        ::template callback_type<cancel_cb>> stopCallback;
    std::atomic<bool> ready{false};
};

struct recv_sender {
    using is_sender = void;
    SOCKET sock;
    void* data;
    size_t len;

    template<typename Receiver>
    friend recv_op<Receiver> tag_invoke(std::execution::connect_t,
                                        const recv_sender& s,
                                        Receiver r) {
        return recv_op<Receiver>{s.sock, s.data, s.len, std::move(r)};
    }
};

recv_sender async_recv(SOCKET s, void* data, size_t len) {
    return recv_sender{s, data, len};
}
```
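
The least obvious part of `recv_op` is the `ready` flag, which decides whether `start()` or `on_complete()` delivers the result when both may run concurrently. The following portable sketch isolates that handshake so it can be studied without the Windows APIs. The names `completion_handshake`, `after_launch`, `on_io_complete`, and `deliveries` are hypothetical, and the helper runs both parties on a single thread in a chosen order rather than racing them.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reduction of the `ready` flag protocol used by recv_op.
class completion_handshake {
 public:
  // When no stop request is possible, start() never takes part in the
  // handshake, so the flag is preset and the I/O completion always delivers.
  explicit completion_handshake(bool stop_possible) noexcept
      : stop_possible_(stop_possible), ready_(!stop_possible) {}

  // Models the tail of start(), after the I/O was launched and the stop
  // callback registered. Returns true if start() must deliver the result
  // because the I/O has already finished.
  bool after_launch() noexcept { return stop_possible_ && arrive(); }

  // Models on_complete(). Returns true if it must deliver the result.
  bool on_io_complete() noexcept { return arrive(); }

 private:
  // The first party to arrive flips the flag and gets false. The second
  // party observes true and becomes responsible for delivery.
  bool arrive() noexcept {
    return ready_.load(std::memory_order_acquire) ||
           ready_.exchange(true, std::memory_order_acq_rel);
  }

  bool stop_possible_;
  std::atomic<bool> ready_;
};

// Runs both parties in the given order and returns how many of them were
// told to deliver the result.
int deliveries(bool stop_possible, bool io_finishes_first) {
  completion_handshake h{stop_possible};
  int count = 0;
  if (io_finishes_first) {
    count += h.on_io_complete() ? 1 : 0;
    count += h.after_launch() ? 1 : 0;
  } else {
    count += h.after_launch() ? 1 : 0;
    count += h.on_io_complete() ? 1 : 0;
  }
  assert(count == 1);  // exactly one party delivers, in either order
  return count;
}
```

The point of the design is that the stop callback must be registered before anyone completes the receiver, and that the callback must be destroyed before the operation state can be torn down. Letting the second arrival deliver the result gives both guarantees without a lock.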

### More end-user examples ### {#example-moar}

#### Sudoku solver #### {#example-sudoku}

This example comes from Kirk Shoop, who ported an example from TBB's documentation to sender/receiver in his fork of the libunifex repo. It is a Sudoku solver that uses a configurable number of threads to explore the search space for solutions.

The sender/receiver-based Sudoku solver can be found [here](https://github.com/kirkshoop/libunifex/blob/sudoku/examples/sudoku.cpp). Some things that are worth noting about Kirk's solution:

1. Although it schedules asynchronous work onto a thread pool, and each unit of work will schedule more work, its use of structured concurrency patterns makes reference counting unnecessary. The solution does not make use of `shared_ptr`.

2. In addition to eliminating the need for reference counting, the use of structured concurrency makes it easy to ensure that resources are cleaned up on all code paths. In contrast, the TBB example that inspired this one [leaks memory](https://github.com/oneapi-src/oneTBB/issues/568).

For comparison, the TBB-based Sudoku solver can be found [here](https://github.com/oneapi-src/oneTBB/blob/40a9a1060069d37d5f66912c6ee4cf165144774b/examples/task_group/sudoku/sudoku.cpp).

#### File copy #### {#example-file-copy}

This example, also from Kirk Shoop, uses sender/receiver to recursively copy the files in a directory tree. It demonstrates how sender/receiver can be used to do IO, using a scheduler that schedules work on Linux's io_uring.

As with the Sudoku example, this example obviates the need for reference counting by employing structured concurrency. It uses iteration with an upper limit to avoid having too many open file handles.

You can find the example [here](https://github.com/kirkshoop/libunifex/blob/filecopy/examples/file_copy.cpp).

#### Echo server #### {#example-echo-server}

Dietmar Kuehl has a hobby project that implements networking APIs on top of sender/receiver. He recently implemented an echo server as a demo. His echo server code can be found [here](https://github.com/dietmarkuehl/kuhllib/blob/main/src/examples/echo_server.cpp).

Below is part of the echo server code. This code is executed for each client that connects to the echo server. In a loop, it reads input from a socket and echoes the input back to the same socket. All of this, including the loop, is implemented with generic async algorithms.

    <pre highlight="c++">
    outstanding.start(
        EX::repeat_effect_until(
              EX::let_value(
                  NN::async_read_some(ptr->d_socket,
                                      context.scheduler(),
                                      NN::buffer(ptr->d_buffer))
            | EX::then([ptr](::std::size_t n){
                ::std::cout &lt;&lt; "read='" &lt;&lt; ::std::string_view(ptr->d_buffer, n) &lt;&lt; "'\n";
                ptr->d_done = n == 0;
                return n;
            }),
              [&context, ptr](::std::size_t n){
                return NN::async_write_some(ptr->d_socket,
                                            context.scheduler(),
                                            NN::buffer(ptr->d_buffer, n));
              })
            | EX::then([](auto&&...){})
            , [owner = ::std::move(owner)]{ return owner->d_done; }
        )
    );
    </pre>

In this code, `NN::async_read_some` and `NN::async_write_some` are asynchronous socket-based networking APIs that return senders. `EX::repeat_effect_until`, `EX::let_value`, and `EX::then` are fully generic sender adaptor algorithms that accept and return senders.

This is a good example of seamless composition of async IO functions with non-IO operations. By composing the senders in this structured way, all the state for the composite operation -- the `repeat_effect_until` expression and all its child operations -- is stored all together in a single object.
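
For readers who find the nested expression hard to parse, the control flow it describes can be written as an ordinary loop. The sketch below is a synchronous model only, under the assumption that `repeat_effect_until` runs its effect once before each test of the predicate. `fake_socket` and `echo_until_closed` are hypothetical names, and the in-memory socket is a stand-in for the networking APIs used in the real code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical in-memory stand-in for a connected socket.
struct fake_socket {
  std::string incoming;      // bytes the peer will send
  std::string outgoing;      // bytes written back to the peer
  std::size_t read_pos = 0;  // how much of `incoming` has been consumed

  // Returns 0 once the peer has nothing more to send.
  std::size_t read_some(char* buf, std::size_t capacity) {
    assert(capacity > 0);
    const std::size_t n = std::min(capacity, incoming.size() - read_pos);
    incoming.copy(buf, n, read_pos);
    read_pos += n;
    return n;
  }

  std::size_t write_some(const char* buf, std::size_t n) {
    outgoing.append(buf, n);
    return n;
  }
};

// Synchronous model of the per-client pipeline.
void echo_until_closed(fake_socket& socket) {
  char buffer[4];
  bool done = false;
  do {                                                        // repeat_effect_until
    const std::size_t n = socket.read_some(buffer, sizeof(buffer));  // async_read_some
    done = (n == 0);                                          // then: note end of input
    socket.write_some(buffer, n);                             // let_value: write n bytes
  } while (!done);                                            // predicate: d_done
}
```

In this model `buffer` and `done` live on the stack of one function call. The sender version keeps the equivalent state inside the operation state of the composed sender, which is what allows it to survive across asynchronous suspension points.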

## Examples: Algorithms ## {#example-algorithm}

In this section we show a few simple sender/receiver-based algorithm implementations.

### `then` ### {#example-then}

```c++
namespace exec = std::execution;

template<class R, class F>
class _then_receiver
    : exec::receiver_adaptor<_then_receiver<R, F>, R> {
  friend exec::receiver_adaptor<_then_receiver, R>;
  F f_;

  // Customize set_value by invoking the callable and passing the result to the inner receiver
  template<class... As>
  void set_value(As&&... as) && noexcept try {
    exec::set_value(std::move(*this).base(), std::invoke((F&&) f_, (As&&) as...));
  } catch(...) {
    exec::set_error(std::move(*this).base(), std::current_exception());
  }

 public:
  _then_receiver(R r, F f)
   : exec::receiver_adaptor<_then_receiver, R>{std::move(r)}
   , f_(std::move(f)) {}
};

template<exec::sender S, class F>
struct _then_sender {
  using is_sender = void;
  S s_;
  F f_;

  template <class... Args>
    using _set_value_t = exec::completion_signatures<
      exec::set_value_t(std::invoke_result_t<F, Args...>)>;

  // Compute the completion signatures
  template<class Env>
  friend auto tag_invoke(exec::get_completion_signatures_t, _then_sender&&, Env)
    -> exec::transform_completion_signatures_of<S, Env,
        exec::completion_signatures<exec::set_error_t(std::exception_ptr)>,
        _set_value_t>;

  // Connect:
  template<exec::receiver R>
  friend auto tag_invoke(exec::connect_t, _then_sender&& self, R r)
    -> exec::connect_result_t<S, _then_receiver<R, F>> {
      return exec::connect(
        (S&&) self.s_, _then_receiver<R, F>{(R&&) r, (F&&) self.f_});
  }

  friend decltype(auto) tag_invoke(exec::get_env_t, const _then_sender& self) noexcept {
    return exec::get_env(self.s_);
  }
};

template<exec::sender S, class F>
exec::sender auto then(S s, F f) {
  return _then_sender<S, F>{(S&&) s, (F&&) f};
}
```

This code builds a `then` algorithm that transforms the value(s) from the input sender
with a transformation function. The result of the transformation becomes the new value.
The other receiver functions (`set_error` and `set_stopped`), as well as all receiver queries,
are passed through unchanged.

In detail, it does the following:

1. Defines a receiver in terms of `execution::receiver_adaptor` that aggregates
    another receiver and an invocable that:
    * Defines a constrained `tag_invoke` overload for transforming the value
        channel.
    * Defines another constrained overload of `tag_invoke` that passes all other
        customizations through unchanged.

    The `tag_invoke` overloads are actually implemented by
    `execution::receiver_adaptor`; they dispatch either to named members, as
    shown above with `_then_receiver::set_value`, or to the adapted receiver.
2. Defines a sender that aggregates another sender and the invocable, which defines a `tag_invoke` customization for `std::execution::connect` that wraps the incoming receiver in the receiver from (1) and passes it and the incoming sender to `std::execution::connect`, returning the result. It also defines a `tag_invoke` customization of `get_completion_signatures` that declares the sender's completion signatures when executed within a particular environment.
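
The wrapping described in (1) and (2) can be tried out in isolation with a toy model that compiles with any C++17 compiler. This sketch deliberately does not use the proposed API: plain member functions replace the `tag_invoke` customizations, there are no concepts, environments, or completion signatures, and every name in it (`toy_then_receiver`, `toy_just`, `toy_then_sender`, `toy_then`, `recording_receiver`, `run_toy_pipeline`) is hypothetical.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <utility>

// Receiver wrapper: transforms the value channel, forwards the others.
template <class R, class F>
struct toy_then_receiver {
  R inner;
  F f;

  template <class... As>
  void set_value(As&&... as) && noexcept {
    try {
      std::move(inner).set_value(
          std::invoke(std::move(f), std::forward<As>(as)...));
    } catch (...) {
      // An exception thrown by the callable becomes an error completion.
      std::move(inner).set_error(std::current_exception());
    }
  }
  void set_error(std::exception_ptr e) && noexcept {
    std::move(inner).set_error(std::move(e));
  }
  void set_stopped() && noexcept { std::move(inner).set_stopped(); }
};

// Toy sender that completes immediately with a stored value.
template <class T>
struct toy_just {
  T value;

  template <class R>
  struct op {
    T value;
    R receiver;
    void start() noexcept { std::move(receiver).set_value(std::move(value)); }
  };

  template <class R>
  op<R> connect(R r) && {
    return {std::move(value), std::move(r)};
  }
};

// Sender wrapper: connecting it wraps the incoming receiver and connects the
// wrapped receiver to the input sender.
template <class S, class F>
struct toy_then_sender {
  S s;
  F f;

  template <class R>
  auto connect(R r) && {
    return std::move(s).connect(
        toy_then_receiver<R, F>{std::move(r), std::move(f)});
  }
};

template <class S, class F>
toy_then_sender<S, F> toy_then(S s, F f) {
  return {std::move(s), std::move(f)};
}

// Final receiver that records what happened.
struct recording_receiver {
  int* value;
  bool* failed;
  void set_value(int v) && noexcept { *value = v; }
  void set_error(std::exception_ptr) && noexcept { *failed = true; }
  void set_stopped() && noexcept {}
};

// Builds the toy equivalent of just(start) | then(add 42), runs it, and
// returns the value that reached the final receiver.
int run_toy_pipeline(int start) {
  int value = 0;
  bool failed = false;
  auto op = toy_then(toy_just<int>{start}, [](int x) { return x + 42; })
                .connect(recording_receiver{&value, &failed});
  op.start();
  assert(!failed);
  return value;
}
```

Note that nothing runs until `start()` is called on the connected operation. Building the sender and connecting it only assemble objects, which matches the lazy behavior described for the real algorithm.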

### `retry` ### {#example-retry}

```c++
using namespace std;
namespace exec = execution;

template <class From, class To>
concept _decays_to = same_as<decay_t<From>, To>;

// _conv needed so we can emplace construct non-movable types into
// a std::optional.
template<invocable F>
  requires is_nothrow_move_constructible_v<F>
struct _conv {
  F f_;
  explicit _conv(F f) noexcept : f_((F&&) f) {}
  operator invoke_result_t<F>() && {
    return ((F&&) f_)();
  }
};

template<class S, class R>
struct _op;

// pass through all customizations except set_error, which retries the operation.
template<class S, class R>
struct _retry_receiver
  : exec::receiver_adaptor<_retry_receiver<S, R>> {
  _op<S, R>* o_;

  R&& base() && noexcept { return (R&&) o_->r_; }
  const R& base() const & noexcept { return o_->r_; }

  explicit _retry_receiver(_op<S, R>* o) : o_(o) {}

  void set_error(auto&&) && noexcept {
    o_->_retry(); // This causes the op to be retried
  }
};

// Hold the nested operation state in an optional so we can
// re-construct and re-start it if the operation fails.
template<class S, class R>
struct _op {
  S s_;
  R r_;
  optional<
      exec::connect_result_t<S&, _retry_receiver<S, R>>> o_;

  _op(S s, R r): s_((S&&)s), r_((R&&)r), o_{_connect()} {}
  _op(_op&&) = delete;

  auto _connect() noexcept {
    return _conv{[this] {
      return exec::connect(s_, _retry_receiver<S, R>{this});
    }};
  }
  void _retry() noexcept try {
    o_.emplace(_connect()); // potentially-throwing
    exec::start(*o_);
  } catch(...) {
    exec::set_error((R&&) r_, std::current_exception());
  }
  friend void tag_invoke(exec::start_t, _op& o) noexcept {
    exec::start(*o.o_);
  }
};

template<class S>
struct _retry_sender {
  using is_sender = void;
  S s_;
  explicit _retry_sender(S s) : s_((S&&) s) {}

  template <class... Ts>
    using _value_t =
      exec::completion_signatures<exec::set_value_t(Ts...)>;
  template <class>
    using _error_t = exec::completion_signatures<>;

  // Declare the signatures with which this sender can complete
  template <class Env>
  friend auto tag_invoke(exec::get_completion_signatures_t, const _retry_sender&, Env)
    -> exec::transform_completion_signatures_of<S&, Env,
        exec::completion_signatures<exec::set_error_t(std::exception_ptr)>,
        _value_t, _error_t>;

  template<exec::receiver R>
  friend _op<S, R> tag_invoke(exec::connect_t, _retry_sender&& self, R r) {
    return {(S&&) self.s_, (R&&) r};
  }

  friend decltype(auto) tag_invoke(exec::get_env_t, const _retry_sender& self) noexcept {
    return get_env(self.s_);
  }
};

template<exec::sender S>
exec::sender auto retry(S s) {
  return _retry_sender{(S&&) s};
}
```

The `retry` algorithm takes a multi-shot sender and causes it to repeat on error, passing
through values and stopped signals. Each time the input sender is restarted, a new receiver
is connected and the resulting operation state is stored in an `optional`, which allows us
to reinitialize it multiple times.

This example does the following:

1. Defines a `_conv` utility that takes advantage of C++17's guaranteed copy elision to
    emplace a non-movable type in a `std::optional`.

2. Defines a `_retry_receiver` that holds a pointer back to the operation state. It passes
    all customizations through unmodified to the inner receiver owned by the operation state
    except for `set_error`, which causes a `_retry()` function to be called instead.

3. Defines an operation state that aggregates the input sender and receiver, and declares
    storage for the nested operation state in an `optional`. Constructing the operation
    state constructs a `_retry_receiver` with a pointer to the (under construction) operation
    state and uses it to connect to the aggregated sender.

4. Starting the operation state dispatches to `start` on the inner operation state.

5. The `_retry()` function reinitializes the inner operation state by connecting the sender
    to a new receiver, holding a pointer back to the outer operation state as before.

6. After reinitializing the inner operation state, `_retry()` calls `start` on it, causing
    the failed operation to be rescheduled.

7. Defines a `_retry_sender` that implements the `connect` customization point to return
    an operation state constructed from the passed-in sender and receiver.

8. `_retry_sender` also implements the `get_completion_signatures` customization point to describe the ways this sender may complete when executed within a particular environment.
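
The `_conv` trick from step (1) can be tried in isolation with only the standard library. The sketch below assumes a hypothetical non-movable type `pinned` standing in for an operation state, a hypothetical factory `make_pinned`, and a hypothetical helper `reset`; `conv` has the same shape as `_conv` above, minus the constraints.

```c++
#include <cassert>
#include <optional>
#include <type_traits>
#include <utility>

// A type that can be neither copied nor moved, much like an operation state.
struct pinned {
  int value;
  explicit pinned(int v) : value(v) {}
  pinned(pinned&&) = delete;
};

// Factory returning a prvalue; guaranteed copy elision means no move is needed.
inline pinned make_pinned(int v) { return pinned{v}; }

// Same shape as _conv: implicitly converts to the result of calling f_.
template <class F>
struct conv {
  F f_;
  explicit conv(F f) noexcept : f_(std::move(f)) {}
  operator std::invoke_result_t<F>() && { return std::move(f_)(); }
};

// Hypothetical helper: (re)initializes the slot in place with pinned{v}.
// optional::emplace direct-initializes a pinned from the conv object, so the
// conversion operator constructs the new object directly in the optional's
// storage.
inline pinned& reset(std::optional<pinned>& slot, int v) {
  return slot.emplace(conv{[v] { return make_pinned(v); }});
}
```

Calling `reset` twice on the same `std::optional<pinned>` destroys the old object and builds a new one in place each time, which is the pattern `_retry()` relies on when it reconnects the sender.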

## Examples: Schedulers ## {#example-schedulers}

In this section we look at some schedulers of varying complexity.

### Inline scheduler ### {#example-schedulers-inline}

```c++
class inline_scheduler {
  template <class R>
    struct _op {
      [[no_unique_address]] R rec_;
      friend void tag_invoke(std::execution::start_t, _op& op) noexcept {
        std::execution::set_value((R&&) op.rec_);
      }
    };

  struct _env {
    template <class Tag>
      friend inline_scheduler tag_invoke(
          std::execution::get_completion_scheduler_t<Tag>, _env) noexcept {
        return {};
      }
  };

  struct _sender {
    using is_sender = void;
    using completion_signatures =
      std::execution::completion_signatures<std::execution::set_value_t()>;

    template <class R>
      friend auto tag_invoke(std::execution::connect_t, _sender, R&& rec)
        noexcept(std::is_nothrow_constructible_v<std::remove_cvref_t<R>, R>)
        -> _op<std::remove_cvref_t<R>> {
        return {(R&&) rec};
      }

    friend _env tag_invoke(std::get_env_t, _sender) noexcept {
      return {};
    }
  };

  friend _sender tag_invoke(std::execution::schedule_t, const inline_scheduler&) noexcept {
    return {};
  }

 public:
  inline_scheduler() = default;
  bool operator==(const inline_scheduler&) const noexcept = default;
};
```

The inline scheduler is a trivial scheduler that completes immediately and synchronously on
the thread that calls `std::execution::start` on the operation state produced by its sender.
In other words, <code>start(connect(schedule(<i>inline-scheduler</i>), receiver))</code> is
just a fancy way of saying `set_value(receiver)`, except that `start`
must be passed an lvalue.

Although not a particularly useful scheduler, it serves to illustrate the basics of
implementing one. The `inline_scheduler`:

1. Customizes `execution::schedule` to return an instance of the sender type
    `_sender`.
2. The `_sender` type models the `sender` concept and provides the metadata
    needed to describe it as a sender that sends no values and never calls
    `set_error` or `set_stopped`. This metadata is provided with the help of
    the `execution::completion_signatures` utility.
3. The `_sender` type customizes `execution::connect` to accept a receiver of no
    values. It returns an instance of type `_op` that holds the receiver by
    value.
4. The operation state customizes `std::execution::start` to call
    `std::execution::set_value` on the receiver.
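
The claim that starting the inline scheduler's operation completes the receiver synchronously on the calling thread can be checked with a toy model. This is a sketch under stated assumptions, not the proposed API: it uses plain member functions named `schedule`, `connect`, `start`, and `set_value` instead of customization points, and all the `toy_*` names and `thread_recording_receiver` are hypothetical.

```c++
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical receiver that records the thread on which it was completed.
struct thread_recording_receiver {
  std::thread::id* completed_on;
  void set_value() && noexcept { *completed_on = std::this_thread::get_id(); }
};

// Toy operation state: holds the receiver by value and completes it
// immediately when started. start() is lvalue-qualified, as in the text above.
template <class R>
struct toy_inline_op {
  R rec_;
  void start() & noexcept { std::move(rec_).set_value(); }
};

// Toy sender: connecting it only packages the receiver into an operation state.
struct toy_inline_sender {
  template <class R>
  toy_inline_op<R> connect(R rec) && {
    return {std::move(rec)};
  }
};

// Toy scheduler: schedule() hands out the sender.
struct toy_inline_scheduler {
  toy_inline_sender schedule() const noexcept { return {}; }
};
```

After `auto op = toy_inline_scheduler{}.schedule().connect(receiver); op.start();`, the receiver has already been completed, and the recorded thread id equals that of the caller.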

### Single thread scheduler ### {#example-single-thread}

This example shows how to create a scheduler for an execution resource that consists of a single
thread. It is implemented in terms of a lower-level execution resource called `std::execution::run_loop`.

```c++
class single_thread_context {
  std::execution::run_loop loop_;
  std::thread thread_;

public:
  single_thread_context()
    : loop_()
    , thread_([this] { loop_.run(); })
  {}

  ~single_thread_context() {
    loop_.finish();
    thread_.join();
  }

  auto get_scheduler() noexcept {
    return loop_.get_scheduler();
  }

  std::thread::id get_thread_id() const noexcept {
    return thread_.get_id();
  }
};
```

The `single_thread_context` owns an event loop and a thread to drive it. In the destructor, it tells the event
loop to finish up what it's doing and then joins the thread, blocking until the event loop drains.

The interesting bits are in the `execution::run_loop` context implementation. It
is slightly too long to include here, so we only provide [a reference to
it](https://github.com/NVIDIA/stdexec/blob/c2cdb2a2abe2b29a34cf277728319d6ca92ec0bb/include/stdexec/execution.hpp#L3916-L4101),
but there is one noteworthy detail about its implementation: It uses space in
its operation states to build an intrusive linked list of work items. In
structured concurrency patterns, the operation states of nested operations
compose statically, and in an algorithm like `this_thread::sync_wait`, the
composite operation state lives on the stack for the duration of the operation.
The end result is that work can be scheduled onto this thread with zero
allocations.
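
The intrusive-list idea can be illustrated with a small standard C++ sketch. It is not the `run_loop` implementation: the real queue is multi-producer and synchronized, whereas this model is single-threaded, and the names `work_item`, `intrusive_queue`, and `append_digit_op` are hypothetical. The point it shows is that the queue only links nodes that are embedded in the work items themselves, so enqueuing stack-allocated operations performs no allocation.

```c++
#include <cassert>
#include <cstddef>

// Hypothetical work item: the queue link lives inside the item itself.
struct work_item {
  work_item* next = nullptr;
  void (*execute)(work_item*) = nullptr;
};

// Single-threaded FIFO queue that links items together and never allocates.
class intrusive_queue {
  work_item* head_ = nullptr;
  work_item* tail_ = nullptr;

public:
  void push_back(work_item* item) noexcept {
    item->next = nullptr;
    if (tail_) tail_->next = item; else head_ = item;
    tail_ = item;
  }

  // Returns the oldest item, or nullptr if the queue is empty.
  work_item* pop_front() noexcept {
    work_item* item = head_;
    if (item) {
      head_ = item->next;
      if (!head_) tail_ = nullptr;
    }
    return item;
  }

  // Runs queued items in FIFO order until the queue is empty and returns
  // how many items were executed.
  std::size_t drain() {
    std::size_t n = 0;
    while (work_item* item = pop_front()) {
      item->execute(item);
      ++n;
    }
    return n;
  }
};

// An "operation state" that embeds its own queue node via inheritance.
struct append_digit_op : work_item {
  int* total;
  int digit;
  append_digit_op(int* t, int d) : total(t), digit(d) {
    execute = [](work_item* self) {
      auto* op = static_cast<append_digit_op*>(self);
      *op->total = *op->total * 10 + op->digit;
    };
  }
};
```

Two `append_digit_op` objects declared as local variables can be pushed and drained in order; their storage is the caller's stack frame, which mirrors how a composite operation state lives on the stack during `this_thread::sync_wait`.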

## Examples: Server theme ## {#example-server}

In this section we look at some examples of how one would use senders to implement an HTTP server. The examples ignore the low-level details of the HTTP server and look at how senders can be combined to achieve the goals of the project.

General application context:
* server application that processes images
* execution resources:
    - 1 dedicated thread for network I/O
    - N worker threads used for CPU-intensive work
    - M threads for auxiliary I/O
    - optional GPU context that may be used on some types of servers
* all parts of the applications can be asynchronous
* no locks shall be used in user code

### Composability with `execution::let_*` ### {#example-server-let}

Example context:
- we are looking at the flow of processing an HTTP request and sending back the response
- show how one can break the (slightly complex) flow into steps with `execution::let_*` functions
- different phases of processing HTTP requests are broken down into separate concerns
- each part of the processing might use different execution resources (details not shown in this example)
- error handling is generic, regardless of which component fails; we always send the right response to the clients

Goals:
- show how one can break more complex flows into steps with `let_*` functions
- exemplify the use of `let_value`, `let_error`, `let_stopped`, and `just` algorithms

```c++
namespace ex = std::execution;

// Returns a sender that yields an http_request object for an incoming request
ex::sender auto schedule_request_start(read_requests_ctx ctx) {...}
// Sends a response back to the client; yields a void signal on success
ex::sender auto send_response(const http_response& resp) {...}
// Validate that the HTTP request is well-formed; forwards the request on success
ex::sender auto validate_request(const http_request& req) {...}

// Handle the request; main application logic
ex::sender auto handle_request(const http_request& req) {
  //...
  return ex::just(http_response{200, result_body});
}

// Transforms server errors into responses to be sent to the client
ex::sender auto error_to_response(std::exception_ptr err) {
  try {
    std::rethrow_exception(err);
  } catch (const std::invalid_argument& e) {
    return ex::just(http_response{404, e.what()});
  } catch (const std::exception& e) {
    return ex::just(http_response{500, e.what()});
  } catch (...) {
    return ex::just(http_response{500, "Unknown server error"});
  }
}
// Transforms cancellation of the server into responses to be sent to the client
ex::sender auto stopped_to_response() {
  return ex::just(http_response{503, "Service temporarily unavailable"});
}
//...
// The whole flow for transforming incoming requests into responses
ex::sender auto snd =
    // get a sender when a new request comes
    schedule_request_start(the_read_requests_ctx)
    // make sure the request is valid; throw if not
    | ex::let_value(validate_request)
    // process the request in a function that may be using a different execution resource
    | ex::let_value(handle_request)
    // If there are errors, transform them into proper responses
    | ex::let_error(error_to_response)
    // If the flow is cancelled, send back a proper response
    | ex::let_stopped(stopped_to_response)
    // write the result back to the client
    | ex::let_value(send_response)
    // done
    ;
// execute the whole flow asynchronously
ex::start_detached(std::move(snd));
```

The example shows how one can separate out the concerns for interpreting requests, validating requests, running the main logic for handling the request, generating error responses, handling cancellation and sending the response back to the client.
They are all different phases in the application, and can be joined together with the `let_*` functions.

All our functions return `execution::sender` objects, so that they can all generate success, failure and cancellation paths.
For example, regardless of where an error is generated (reading the request, validating the request, or handling the request), we would have one common block to handle the error, and following error flows is easy.

Also, because every step uses `execution::sender` objects, any of these steps may be completely asynchronous; the overall flow doesn't care.
Regardless of the execution resource on which the steps, or parts of the steps, are executed, the flow is still the same.
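
The error-classification logic inside `error_to_response` can be exercised on its own, without senders. The sketch below assumes a hypothetical `http_response` aggregate with a status code and a body, and a hypothetical function `classify_error` that returns the response directly instead of wrapping it in `ex::just`.

```c++
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical response type; the paper does not define http_response.
struct http_response {
  int status;
  std::string body;
};

// Maps an in-flight exception to a response, most specific handler first.
// Precondition: err is not null.
inline http_response classify_error(std::exception_ptr err) {
  try {
    std::rethrow_exception(err);
  } catch (const std::invalid_argument& e) {
    return {404, e.what()};
  } catch (const std::exception& e) {
    return {500, e.what()};
  } catch (...) {
    return {500, "Unknown server error"};
  }
}
```

Because every error reaches this one function as a `std::exception_ptr`, the mapping from failure to status code stays in a single place no matter which earlier step produced the error.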

### Moving between execution resources with `execution::on` and `execution::transfer` ### {#example-server-on}

Example context:
- reading data from the socket before processing the request
- reading of the data is done on the I/O context
- no processing of the data needs to be done on the I/O context

Goals:
- show how one can change the execution resource
- exemplify the use of `on` and `transfer` algorithms


```c++
namespace ex = std::execution;

size_t legacy_read_from_socket(int sock, char* buffer, size_t buffer_len) {}
void process_read_data(const char* read_data, size_t read_len) {}
//...

// A sender that just calls the legacy read function
auto snd_read = ex::just(sock, buf, buf_len) | ex::then(legacy_read_from_socket);
// The entire flow
auto snd =
    // start by reading data on the I/O thread
    ex::on(io_sched, std::move(snd_read))
    // do the processing on the worker threads pool
    | ex::transfer(work_sched)
    // process the incoming data (on worker threads)
    | ex::then([buf](size_t read_len) { process_read_data(buf, read_len); })
    // done
    ;
// execute the whole flow asynchronously
ex::start_detached(std::move(snd));
```

The example assumes that we need to wrap some legacy code that reads from sockets, and to handle execution resource switching.
(This style of reading from a socket may not be the most efficient one, but it works for our purposes.)
For performance reasons, the reading from the socket needs to be done on the I/O thread, and all the processing needs to happen on a work-specific execution resource (i.e., a thread pool).

Calling `execution::on` will ensure that the given sender will be started on the given scheduler.
In our example, `snd_read` is going to be started on the I/O scheduler.
This sender will just call the legacy code.

The completion-signal will be issued in the I/O execution resource, so we have to move it to the work thread pool.
This is achieved with the help of the `execution::transfer` algorithm.
The rest of the processing (in our case, the last call to `then`) will happen in the work thread pool.

The reader should notice the difference between `execution::on` and `execution::transfer`.
The `execution::on` algorithm will ensure that the given sender will start in the specified context, and doesn't care where the completion-signal for that sender is sent.
The `execution::transfer` algorithm will not care where the given sender is going to be started, but will ensure that the completion-signal of that sender will be transferred to the given context.

## What this proposal is **not** ## {#intro-is-not}

This paper is not a patch on top of [[P0443R14]]. We are not asking to update the existing paper; we are asking to retire it in favor of this paper, which is already self-contained. Any example code within this paper can be written in Standard C++, without the need
to standardize any further facilities.

This paper is not an alternative design to [[P0443R14]]. Rather, we have taken the design in the current executors paper and applied targeted fixes to allow it to fulfill the promises of the sender/receiver model, as well as to provide all the facilities we consider
essential when writing user code using standard execution concepts. We have also applied the guidance of removing one-way executors from the paper entirely, and instead provided an algorithm based around senders that serves the same purpose.

## Design changes from P0443 ## {#intro-compare}

1. The `executor` concept has been removed and all of its proposed functionality
    is now based on schedulers and senders, as per SG1 direction.
2. Properties are not included in this paper. We see them as a possible future
    extension, if the committee gets more comfortable with them.
3. Senders now advertise what scheduler, if any, their evaluation will complete
    on.
4. The places of execution of user code in P0443 weren't precisely defined,
    whereas they are in this paper. See [[#design-propagation]].
5. P0443 did not propose a suite of sender algorithms necessary for writing
    sender code; this paper does. See [[#design-sender-factories]],
    [[#design-sender-adaptors]], and [[#design-sender-consumers]].
6. P0443 did not specify the semantics of variously qualified `connect`
    overloads; this paper does. See [[#design-shot]].
7. This paper extends the sender traits/typed sender design to support typed
    senders whose value/error types depend on type information provided late via
    the receiver.
8. Support for untyped senders is dropped; the `typed_sender` concept is renamed
    `sender`; `sender_traits` is replaced with `completion_signatures_of_t`.
9. Specific type erasure facilities are omitted, as per LEWG direction. Type
    erasure facilities can be built on top of this proposal, as discussed in
    [[#design-dispatch]].
10. A specific thread pool implementation is omitted, as per LEWG direction.
11. Some additional utilities are added:
    * <b>`run_loop`</b>: An execution resource that provides a multi-producer,
        single-consumer, first-in-first-out work queue.
    * <b>`receiver_adaptor`</b>: A utility for algorithm authors for defining one
        receiver type in terms of another.
    * <b>`completion_signatures`</b> and <b>`transform_completion_signatures`</b>:
        Utilities for describing the ways in which a sender can complete in a
        declarative syntax.

## Prior art ## {#intro-prior-art}

This proposal builds upon and learns from years of prior art with asynchronous and parallel programming frameworks in C++. In this section, we discuss async abstractions that have previously been suggested as a possible basis for asynchronous algorithms and why they fall short.

### Futures ### {#intro-prior-art-futures}

A future is a handle to work that has already been scheduled for execution. It is one end of a communication channel; the other end is a promise, used to receive the result from the concurrent operation and to communicate it to the future.

Futures, as traditionally realized, require the dynamic allocation and management of a shared state, synchronization, and typically type-erasure of work and continuation. Many of these costs are inherent in the nature of "future" as a handle to work that is already scheduled for execution. These expenses rule out the future abstraction for many uses and make it a poor choice for a basis of a generic mechanism.

### Coroutines ### {#intro-prior-art-coroutines}

C++20 coroutines are frequently suggested as a basis for asynchronous algorithms. It's fair to ask why, if we added coroutines to C++, we are suggesting the addition of a library-based abstraction for asynchrony. Certainly, coroutines come with huge syntactic and semantic advantages over the alternatives.

Although coroutines are lighter weight than futures, coroutines suffer from many of the same problems. Since they typically start suspended, they can avoid synchronizing the chaining of dependent work. However, in many cases, coroutine frames require an unavoidable dynamic allocation and indirect function calls. This is done to hide the layout of the coroutine frame from the C++ type system, which in turn makes possible the separate compilation of coroutines and certain compiler optimizations, such as optimization of the coroutine frame size.

Those advantages come at a cost, though. Because of the dynamic allocation of coroutine frames, coroutines in embedded or heterogeneous environments, which often lack support for dynamic allocation, require great attention to detail. And the allocations and indirections tend to complicate the job of the inliner, often resulting in sub-optimal codegen.

The coroutine language feature mitigates these shortcomings somewhat with the HALO optimization [[P0981R0]], which leverages existing compiler optimizations such as allocation elision and devirtualization to inline the coroutine, completely eliminating the runtime overhead. However, HALO requires a sophisticated compiler, and a fair number of stars need to align for the optimization to kick in. In our experience, more often than not in real-world code today's compilers are not able to inline the coroutine, resulting in allocations and indirections in the generated code.

In a suite of generic async algorithms that are expected to be callable from hot code paths, the extra allocations and indirections are a deal-breaker. It is for these reasons that we consider coroutines a poor choice for a basis of all standard async.

### Callbacks ### {#intro-prior-art-callbacks}

Callbacks are the oldest, simplest, most powerful, and most efficient mechanism for creating chains of work, but suffer problems of their own. Callbacks must propagate either errors or values. This simple requirement yields many different interface possibilities. The lack of a standard callback shape obstructs generic design.

Additionally, few of these possibilities accommodate cancellation signals when the user requests upstream work to stop and clean up.

## Field experience ## {#intro-field-experience}

### libunifex ### {#intro-field-experience-libunifex}

This proposal draws heavily from our field experience with [libunifex](https://github.com/facebookexperimental/libunifex). Libunifex implements all of the concepts and customization points defined in this paper (with slight variations -- the design of P2300 has evolved due to LEWG feedback), many of this paper's algorithms (some under different names), and much more besides.

Libunifex has several concrete schedulers in addition to the `run_loop` suggested here (where it is called `manual_event_loop`). It has schedulers that dispatch efficiently to epoll and io_uring on Linux and the Windows Thread Pool on Windows.

In addition to the proposed interfaces and the additional schedulers, it has several important extensions to the facilities described in this paper, which demonstrate directions in which these abstractions may be evolved over time, including:

* Timed schedulers, which permit scheduling work on an execution resource at a particular time or after a particular duration has elapsed. In addition, it provides time-based algorithms.
* File I/O schedulers, which permit filesystem I/O to be scheduled.
* Two complementary abstractions for streams (asynchronous ranges), and a set of stream-based algorithms.

Libunifex has seen heavy production use at Facebook. As of October 2021, it is used in production within the following applications and platforms:

* Facebook Messenger on iOS, Android, Windows, and macOS
* Instagram on iOS and Android
* Facebook on iOS and Android
* Portal
* An internal Facebook product that runs on Linux

All of these applications are making direct use of the sender/receiver abstraction as presented in this paper. One product (Instagram on iOS) is making use of the sender/coroutine integration as presented. The monthly active users of these products number in the billions.

### Other implementations ### {#intro-field-experience-other-implementations}

The authors are aware of a number of other implementations of sender/receiver from this paper. These are presented here in perceived order of maturity and field experience.

* <b>[[HPX]]</b>

    HPX is a general purpose C++ runtime system for parallel and distributed applications that has been under active development since 2007. HPX exposes a uniform, standards-oriented API, and keeps abreast of the latest standards and proposals. It is used in a wide variety of high-performance applications.

    The sender/receiver implementation in HPX has been under active development since May 2020. It is used to erase the overhead of futures and to make it possible to write efficient generic asynchronous algorithms that are agnostic to their execution resource. In HPX, algorithms can migrate execution between execution resources, even to GPUs and back, using a uniform standard interface with sender/receiver.

    Far and away, the HPX team has the greatest usage experience outside Facebook. Mikael Simberg summarizes the experience as follows:

    > Summarizing, for us the major benefits of sender/receiver compared to the old model are:
    >
    > 1. Proper hooks for transitioning between execution resources.
    > 2. The adaptors. Things like `let_value` are really nice additions.
    > 3. Separation of the error channel from the value channel (also cancellation, but we don't have much use for it at the moment). Even from a teaching perspective having to explain that the future `f2` in the continuation will always be ready here `f1.then([](future<T> f2) {...})` is enough of a reason to separate the channels. All the other obvious reasons apply as well of course.
    > 4. For futures we have a thing called `hpx::dataflow` which is an optimized version of `when_all(...).then(...)` which avoids intermediate allocations. With the sender/receiver `when_all(...) | then(...)` we get that "for free".

* <b>[kuhllib](https://github.com/dietmarkuehl/kuhllib/) by Dietmar Kuehl</b>

    This is a prototype Standard Template Library with an implementation of sender/receiver that has been under development since May, 2021. It is significant mostly for its support for sender/receiver-based networking interfaces.

    Here, Dietmar Kuehl speaks about the perceived complexity of sender/receiver:

    > ... and, also similar to STL: as I had tried to do things in that space before I recognize sender/receivers as being maybe complicated in one way but a huge simplification in another one: like with STL I think those who use it will benefit - if not from the algorithm from the clarity of abstraction: the separation of concerns of STL (the algorithm being detached from the details of the sequence representation) is a major leap. Here it is rather similar: the separation of the asynchronous algorithm from the details of execution. Sure, there is some glue to tie things back together but each of them is simpler than the combined result.

    Elsewhere, he said:

    > ... to me it feels like sender/receivers are like iterators when STL emerged: they are different from what everybody did in that space. However, everything people are already doing in that space isn’t right.

    Kuehl also has experience teaching sender/receiver at Bloomberg. About that experience he says:

    > When I asked [my students] specifically about how complex they consider the sender/receiver stuff the feedback was quite unanimous that the sender/receiver parts aren’t trivial but not what contributes to the complexity.

* <b>[The reference implementation](https://github.com/NVIDIA/stdexec)</b>

    This is a complete implementation written from the specification in this paper. Its primary purpose is to help find specification bugs and to harden the wording of the proposal. It is
    fit for broad use and for contribution to libc++.

    It is current with R7 of this paper.

* <b>[Reference implementation for the Microsoft STL](https://github.com/miscco/STL/tree/proposal/executors) by Michael Schellenberger Costa</b>

    This is another reference implementation of this proposal, this time in a fork of the Microsoft STL implementation. Michael Schellenberger Costa is not affiliated with Microsoft. He intends to contribute this implementation upstream when it is complete.

### Inspirations ### {#intro-field-experience-inspirations}

This proposal also draws heavily from our experience with [Thrust](https://github.com/NVIDIA/thrust) and [Agency](https://github.com/agency-library/agency). It is also inspired by the needs of countless other C++ frameworks for asynchrony, parallelism, and concurrency, including:

* <a href="https://github.com/STEllAR-GROUP/hpx">HPX</a>
* [Folly](https://github.com/facebook/folly/blob/master/folly/docs/Futures.md)
* [stlab](https://stlab.cc/libraries/concurrency/)

# Revision history # {#revisions}

## R8 ## {#r8}

The changes since R7 are as follows:

<b>Fixes:</b>

  * `get_env(obj)` is required to be nothrow.

  * `make_completion_signatures` is renamed `transform_completion_signatures_of`
    and is expressed in terms of the new `transform_completion_signatures`,
    which takes an input set of completion signatures instead of a sender and an
    environment.

  * Add a requirement on queryable objects that if `tag_invoke(query, env,
    args...)` is well-formed, then `query(env, args...)` is
    expression-equivalent to it. This is necessary to properly specify how to
    join two environments in the presence of queries that have defaults.

## R7 ## {#r7}

The changes since R6 are as follows:

<b>Fixes:</b>

  * Make it valid to pass non-variadic templates to the exposition-only alias
    template <code><i>gather-signatures</i></code>, fixing the definitions of
    `value_types_of_t`, `error_types_of_t`, and the exposition-only alias
    template <code><i>sync-wait-type</i></code>.
  * Removed the query forwarding from `receiver_adaptor` that was
    inadvertently left over from a previous edit.
  * When adapting a sender to an awaitable with `as_awaitable`, the sender's
    value result datum is decayed before being stored in the exposition-only
    `variant`.
  * Correctly specify the completion signatures of the `schedule_from`
    algorithm.
  * The `sender_of` concept no longer distinguishes between a sender of a
    type `T` and a sender of a type `T&&`.
  * The `just` and `just_error` sender factories now reject C-style arrays
    instead of silently decaying them to pointers.

<b>Enhancements:</b>

  * The `sender` and `receiver` concepts get explicit opt-in traits called
    `enable_sender` and `enable_receiver`, respectively. The traits have
    default implementations that look for nested `is_sender` and `is_receiver`
    types, respectively.
  * `get_attrs` is removed and `get_env` is used in its place.
  * The exposition-only type <code><i>empty-env</i></code> is made normative
    and is renamed `empty_env`.
  * `get_env` gets a fall-back implementation that simply returns `empty_env{}`
    if a `tag_invoke` overload is not found.
  * `get_env` is required to be insensitive to the cvref-qualification of its
    argument.
  * `get_env`, `empty_env`, and `env_of_t` are moved into the `std::` namespace.
  * Add a new subclause describing the async programming model of senders in
    abstract terms. See [[#spec-execution-async.ops]].

## R6 ## {#r6}

The changes since R5 are as follows:

<b>Fixes:</b>

  * Fix typo in the specification of `in_place_stop_source` about the relative
    lifetimes of the tokens and the source that produced them.
  * `get_completion_signatures` tests for awaitability with a promise type
    similar to the one used by `connect` for the sake of consistency.
  * A coroutine promise type is an environment provider (that is, it implements
    `get_env()`) rather than being directly queryable. The previous draft was
    inconsistent about that.

<b>Enhancements:</b>

  * Sender queries are moved into a separate queryable "attributes" object
    that is accessed by passing the sender to `get_attrs()` (see below). The
    `sender` concept is reexpressed to require `get_attrs()` and separated
    from a new `sender_in<Snd, Env>` concept for checking whether a type is
    a sender within a particular execution environment.
  * The placeholder types `no_env` and `dependent_completion_signatures<>`
    are no longer needed and are dropped.
  * `ensure_started` and `split` are changed to persist the result of
    calling `get_attrs()` on the input sender.
  * Reorder constraints of the `scheduler` and `receiver` concepts to avoid constraint recursion
    when used in tandem with poorly-constrained, implicitly convertible types.
  * Re-express the `sender_of` concept to be more ergonomic and general.
  * Make the specification of the alias templates `value_types_of_t` and
    `error_types_of_t`, and the variable template `sends_done` more concise by
    expressing them in terms of a new exposition-only alias template
    <code><i>gather-signatures</i></code>.

### Environments and attributes ### {#environments-and-attributes}

In earlier revisions, receivers, senders, and schedulers all were directly
queryable. In R4, receiver queries were moved into a separate "environment"
object, obtainable from a receiver with a `get_env` accessor. In R6, the
sender queries are given similar treatment, relocating to an "attributes"
object obtainable from a sender with a `get_attrs` accessor. This was done
to solve a number of design problems with the `split` and `ensure_started`
algorithms; _e.g._, see
[NVIDIA/stdexec#466](https://github.com/NVIDIA/stdexec/issues/466).

Schedulers, however, remain directly queryable. As lightweight handles
that are required to be movable and copyable, there is little reason to
want to dispose of a scheduler and yet persist the scheduler's queries.

This revision also makes operation states directly queryable, even though
there isn't yet a use for such. Some early prototypes of cooperative bulk
parallel sender algorithms done at NVIDIA suggest the utility of
forwardable operation state queries. The authors chose to make opstates
directly queryable since the opstate object is itself required to be kept
alive for the duration of the asynchronous operation.

## R5 ## {#r5}

The changes since R4 are as follows:

<b>Fixes:</b>

  * `start_detached` requires its argument to be a `void` sender (sends no values
    to `set_value`).

<b>Enhancements:</b>

  * Receiver concepts refactored to no longer require an error channel for
    `exception_ptr` or a stopped channel.
  * `sender_of` concept and `connect` customization point additionally require
    that the receiver is capable of receiving all of the sender's possible
    completions.
  * `get_completion_signatures` is now required to return an instance of either
    `completion_signatures` or `dependent_completion_signatures`.
  * `make_completion_signatures` made more general.
  * `receiver_adaptor` handles `get_env` as it does the `set_*` members; that is,
    `receiver_adaptor` will look for a member named `get_env()` in the derived
    class, and if found dispatch the `get_env_t` tag invoke customization to it.
  * `just`, `just_error`, `just_stopped`, and `into_variant` have been respecified
    as customization point objects instead of functions, following LEWG guidance.

## R4 ## {#r4}

The changes since R3 are as follows:

<b>Fixes:</b>

  * Fix specification of `get_completion_scheduler` on the `transfer`, `schedule_from`
    and `transfer_when_all` algorithms; the completion scheduler cannot be guaranteed
    for `set_error`.
  * The value of `sends_stopped` for the default sender traits of types that are
    generally awaitable was changed from `false` to `true` to acknowledge the
    fact that some coroutine types are generally awaitable and may implement the
    `unhandled_stopped()` protocol in their promise types.
  * Fix the incorrect use of inline namespaces in the `<execution>` header.
  * Shorten the stable names for the sections.
  * `sync_wait` now handles `std::error_code` specially by throwing a
    `std::system_error` on failure.
  * Fix how ADL isolation from class template arguments is specified so it
    doesn't constrain implementations.
  * Properly expose the tag types in the header `<execution>` synopsis.

<b>Enhancements:</b>

  * Support for "dependently-typed" senders, where the completion signatures -- and
    thus the sender metadata -- depend on the type of the receiver connected
    to it. See the section [dependently-typed
    senders](#dependently-typed-senders) below for more information.
  * Add a <code>read(<i>query</i>)</code> sender factory for issuing a query
    against a receiver and sending the result through the value channel. (This is
    a useful instance of a dependently-typed sender.)
  * Add `completion_signatures` utility for declaratively defining a typed
    sender's metadata.
  * Add `make_completion_signatures` utility for specifying a sender's completion
    signatures by adapting those of another sender.
  * Drop support for untyped senders and rename `typed_sender` to `sender`.
  * `set_done` is renamed to `set_stopped`. All occurrences of "`done`" in
    identifiers are replaced with "`stopped`".
  * Add customization points for controlling the forwarding of scheduler,
    sender, receiver, and environment queries through layers of adaptors;
    specify the behavior of the standard adaptors in terms of the new
    customization points.
  * Add `get_delegatee_scheduler` query to forward a scheduler that can be used
    by algorithms or by the scheduler to delegate work and forward progress.
  * Add `schedule_result_t` alias template.
  * More precisely specify the sender algorithms, including precisely what their
    completion signatures are.
  * `stopped_as_error` respecified as a customization point object.
  * `tag_invoke` respecified to improve diagnostics.

### Dependently-typed senders ### {#dependently-typed-senders}

**Background:**

In the sender/receiver model, as with coroutines, contextual information about
the current execution is most naturally propagated from the consumer to the
producer. In coroutines, that means information like stop tokens, allocators and
schedulers are propagated from the calling coroutine to the callee. In
sender/receiver, that means that the contextual information is associated with
the receiver and is queried by the sender and/or operation state after the
sender and the receiver are `connect`-ed.

**Problem:**

The implication of the above is that the sender alone does not have all the
information about the async computation it will ultimately initiate; some of
that information is provided late via the receiver. However, the `sender_traits`
mechanism, by which an algorithm can introspect the value and error types the
sender will propagate, *only* accepts a sender parameter. It does not take into
consideration the type information that will come in late via the receiver. The
effect of this is that some senders cannot be typed senders when they
otherwise could be.

**Example:**

To get concrete, consider the case of the "`get_scheduler()`" sender: when
`connect`-ed and `start`-ed, it queries the receiver for its associated
scheduler and passes it back to the receiver through the value channel. That
sender's "value type" is the type of the *receiver's* scheduler. What then
should `sender_traits<get_scheduler_sender>::value_types` report for the
`get_scheduler()`'s value type? It can't answer because it doesn't know.

This causes knock-on problems since some important algorithms require a typed
sender, such as `sync_wait`. To illustrate the problem, consider the following
code:

<pre highlight="c++">
namespace ex = std::execution;

ex::sender auto task =
  ex::let_value(
    ex::get_scheduler(), // Fetches scheduler from receiver.
    [](auto current_sched) {
      // Launch some nested work on the current scheduler:
      return ex::on(current_sched, <i>nested work...</i>);
    });

std::this_thread::sync_wait(std::move(task));
</pre>

The code above is attempting to schedule some work onto the `sync_wait`'s
`run_loop` execution resource. But `let_value` only returns a typed sender when
the input sender is typed. As we explained above, `get_scheduler()` is not
typed, so `task` is likewise not typed. Since `task` isn't typed, it cannot be
passed to `sync_wait` which is expecting a typed sender. The above code would
fail to compile.

**Solution:**

The solution is conceptually quite simple: extend the `sender_traits` mechanism
to optionally accept a receiver in addition to the sender. The algorithms can
use <code>sender_traits&lt;<i>Sender</i>, <i>Receiver</i>></code> to inspect the
async operation's completion-signals. The `typed_sender` concept would also need
to take an optional receiver parameter. This is the simplest change, and it
would solve the immediate problem.

**Design:**

Using the receiver type to compute the sender traits turns out to have pitfalls
in practice. Many receivers make use of that type information in their
implementation. It is very easy to create cycles in the type system, leading to
inscrutable errors. The design pursued in R4 is to give receivers an associated
*environment* object -- a bag of key/value pairs -- and to move the contextual
information (schedulers, etc) out of the receiver and into the environment. The
`sender_traits` template and the `typed_sender` concept, rather than taking a
receiver, take an environment. This is a much more robust design.

A further refinement of this design would be to separate the receiver and the
environment entirely, passing them as separate arguments along with the sender to
`connect`. This paper does not propose that change.
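
To make the idea concrete, the following sketch shows a sender whose value type can only be computed once an environment is known. It is a toy model under stated assumptions, not the proposed interface: the member `query`, the alias `completions_in`, and the types `env_with_scheduler`, `inline_scheduler`, and `pool_scheduler` are hypothetical names introduced for illustration.

```cpp
#include <type_traits>
#include <utility>

// Toy query tag: asks an environment for its associated scheduler.
struct get_scheduler_t {
  template <class Env>
  constexpr auto operator()(const Env& env) const {
    return env.query(*this);
  }
};
inline constexpr get_scheduler_t get_scheduler{};

struct set_value_t {};
template <class... Signatures>
struct completion_signatures {};

// Two unrelated scheduler types (hypothetical).
struct inline_scheduler {};
struct pool_scheduler {};

// A toy environment: answers the get_scheduler query with a stored scheduler.
template <class Scheduler>
struct env_with_scheduler {
  Scheduler sched;
  constexpr Scheduler query(get_scheduler_t) const { return sched; }
};

// A sender that completes with the scheduler found in the environment.
// Its completion signatures depend on Env, not on the sender alone.
struct read_scheduler_sender {
  template <class Env>
  using completions_in = completion_signatures<
      set_value_t(decltype(get_scheduler(std::declval<const Env&>())))>;
};

static_assert(std::is_same_v<
    read_scheduler_sender::completions_in<env_with_scheduler<inline_scheduler>>,
    completion_signatures<set_value_t(inline_scheduler)>>);
static_assert(std::is_same_v<
    read_scheduler_sender::completions_in<env_with_scheduler<pool_scheduler>>,
    completion_signatures<set_value_t(pool_scheduler)>>);
```

Because only the environment type participates in the computation, the receiver's own type never appears, which avoids the type-system cycles described above.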

**Impact:**

This change, apart from increasing the expressive power of the sender/receiver abstraction, has the following impact:

  * Typed senders become moderately more challenging to write. (The new
    `completion_signatures` and `transform_completion_signatures` utilities are added
    to ease this extra burden.)

  * Sender adaptor algorithms that previously constrained their sender arguments
    to satisfy the `typed_sender` concept can no longer do so as the receiver is
    not available yet. This can result in type-checking that is done later, when
    `connect` is ultimately called on the resulting sender adaptor.

  * Operation states that own receivers that add to or change the environment
    are typically larger by one pointer. It comes with the benefit of far fewer
    indirections to evaluate queries.

**"Has it been implemented?"**

Yes, the reference implementation, which can be found at
https://github.com/NVIDIA/stdexec, has implemented this
design as well as some dependently-typed senders to confirm that it works.

**Implementation experience**

Although this change has not yet been made in libunifex, the most widely adopted sender/receiver implementation, a similar design can be found in Folly's coroutine support library. In Folly.Coro, it is possible to await a special awaitable to obtain the current coroutine's associated scheduler (called an executor in Folly).

For instance, the following Folly code grabs the current executor, schedules a task for execution on that executor, and starts the resulting (scheduled) task by enqueueing it for execution.

```c++
// From Facebook's Folly open source library:
template <class T>
folly::coro::Task<void> CancellableAsyncScope::co_schedule(folly::coro::Task<T>&& task) {
  this->add(std::move(task).scheduleOn(co_await co_current_executor));
  co_return;
}
```

Facebook relies heavily on this pattern in its coroutine code. But as described
above, this pattern doesn't work with R3 of `std::execution` because of the lack
of dependently-typed senders. The change to `sender_traits` in R4 rectifies that.

**Why now?**

The authors are loath to make any changes to the design, however small, at this
stage of the C++23 release cycle. But we feel that, for a relatively minor
design change -- adding an extra template parameter to `sender_traits` and
`typed_sender` -- the returns are large enough to justify the change. And there
is no better time to make this change than as early as possible.

One might wonder why this missing feature has not been added to sender/receiver
before now. The designers of sender/receiver have long been aware of the need.
What was missing was a clean, robust, and simple design for the change, which we
now have.

**Drive-by:**

We took the opportunity to make an additional drive-by change: Rather than
providing the sender traits via a class template for users to specialize, we
changed it into a sender *query*: <code>get_completion_signatures(<i>sender</i>,
<i>env</i>)</code>. That function's return type is used as the sender's traits.
The authors feel this leads to a more uniform design and gives sender authors a
straightforward way to make the value/error types dependent on the cv- and
ref-qualification of the sender if need be.
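
The sketch below illustrates how a function-style query lets the reported signatures vary with the value category of the sender. It is a simplified model that uses plain overloads rather than `tag_invoke`; `just_sender`, `toy_env`, and the particular signatures chosen are assumptions made for the example.

```cpp
#include <type_traits>
#include <utility>

struct set_value_t {};
template <class... Signatures>
struct completion_signatures {};

// Placeholder environment used when the caller has nothing to say (hypothetical).
struct toy_env {};

// A toy sender that owns one value.
template <class T>
struct just_sender {
  T value;
};

// Connected as an lvalue, the sender keeps its value and sends a const reference.
template <class T, class Env>
constexpr auto get_completion_signatures(const just_sender<T>&, const Env&)
    -> completion_signatures<set_value_t(const T&)> {
  return {};
}

// Connected as an rvalue, the sender gives its value away.
template <class T, class Env>
constexpr auto get_completion_signatures(just_sender<T>&&, const Env&)
    -> completion_signatures<set_value_t(T&&)> {
  return {};
}

// The return type of the query serves as the sender's traits.
template <class Sender, class Env = toy_env>
using completion_signatures_of_t = decltype(get_completion_signatures(
    std::declval<Sender>(), std::declval<const Env&>()));

static_assert(std::is_same_v<completion_signatures_of_t<just_sender<int>&>,
                             completion_signatures<set_value_t(const int&)>>);
static_assert(std::is_same_v<completion_signatures_of_t<just_sender<int>>,
                             completion_signatures<set_value_t(int&&)>>);
```

A class template specialized per sender type would need separate specializations for each cv- and ref-qualified form to express the same distinction.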

**Details:**

Below are the salient parts of the new support for dependently-typed senders in
R4:

* Receiver queries have been moved from the receiver into a separate environment
    object.
* Receivers have an associated environment. The new `get_env` CPO retrieves a
    receiver's environment. If a receiver doesn't implement `get_env`, it returns
    an unspecified "empty" environment -- an empty struct.
* `sender_traits` now takes an optional `Env` parameter that is used to
    determine the error/value types.
* The primary `sender_traits` template is replaced with a `completion_signatures_of_t`
    alias implemented in terms of a new `get_completion_signatures` CPO that dispatches
    with `tag_invoke`. `get_completion_signatures` takes a sender and an optional
    environment. A sender can customize this to specify its value/error types.
* Support for untyped senders is dropped. The `typed_sender` concept has been
    renamed to `sender` and now takes an optional environment.
* The environment argument to the `sender` concept and the `get_completion_signatures`
    CPO defaults to `no_env`. All environment queries fail (are ill-formed) when
    passed an instance of `no_env`.
* A type `S` is required to satisfy <code>sender&lt;<i>S</i>></code> to be
    considered a sender. If it doesn't know what types it will complete with
    independent of an environment, it returns an instance of the placeholder
    traits `dependent_completion_signatures`.
* If a sender satisfies both <code>sender&lt;<i>S</i>></code> and
    <code>sender&lt;<i>S</i>, <i>Env</i>></code>, then the completion signatures
    for the two cannot be different in any way. It is possible for an
    implementation to enforce this statically, but not required.
* All of the algorithms and examples have been updated to work with
    dependently-typed senders.

## R3 ## {#r3}

The changes since R2 are as follows:

<b>Fixes:</b>

    * Fix specification of the `on` algorithm to clarify lifetimes of
        intermediate operation states and properly scope the `get_scheduler` query.
    * Fix a memory safety bug in the implementation of <code><i>connect-awaitable</i></code>.
    * Fix recursive definition of the `scheduler` concept.

<b>Enhancements:</b>

    * Add `run_loop` execution resource.
    * Add `receiver_adaptor` utility to simplify writing receivers.
    * Require a scheduler's sender to model `sender_of` and provide a completion scheduler.
    * Specify the cancellation scope of the `when_all` algorithm.
    * Make `as_awaitable` a customization point.
    * Change `connect`'s handling of awaitables to consider those types that are awaitable owing to customization of `as_awaitable`.
    * Add `value_types_of_t` and `error_types_of_t` alias templates; rename `stop_token_type_t` to `stop_token_of_t`.
    * Add a design rationale for the removal of the possibly eager algorithms.
    * Expand the section on field experience.

## R2 ## {#r2}

The changes since R1 are as follows:

* Remove the eagerly executing sender algorithms.
* Extend the `execution::connect` customization point and the `sender_traits<>` template to recognize awaitables as `typed_sender`s.
* Add utilities `as_awaitable()` and `with_awaitable_senders<>` so a coroutine type can trivially make senders awaitable with a coroutine.
* Add a section describing the design of the sender/awaitable interactions.
* Add a section describing the design of the cancellation support in sender/receiver.
* Add a section showing examples of simple sender adaptor algorithms.
* Add a section showing examples of simple schedulers.
* Add a few more examples: a sudoku solver, a parallel recursive file copy, and an echo server.
* Refined the forward progress guarantees on the `bulk` algorithm.
* Add a section describing how to use a range of senders to represent async sequences.
* Add a section showing how to use senders to represent partial success.
* Add sender factories `execution::just_error` and `execution::just_stopped`.
* Add sender adaptors `execution::stopped_as_optional` and `execution::stopped_as_error`.
* Document more production uses of sender/receiver at scale.
* Various fixes of typos and bugs.

## R1 ## {#r1}

The changes since R0 are as follows:

* Added a new concept, `sender_of`.
* Added a new scheduler query, `this_thread::execute_may_block_caller`.
* Added a new scheduler query, `get_forward_progress_guarantee`.
* Removed the `unschedule` adaptor.
* Various fixes of typos and bugs.

## R0 ## {#r0}

Initial revision.

# Design - introduction # {#design-intro}

The following three sections describe the entirety of the proposed design.

* [[#design-intro]] describes the conventions used through the rest of the
    design sections, as well as an example illustrating how we envision code will
    be written using this proposal.
* [[#design-user]] describes all the functionality from the perspective we
    intend for users: it describes the various concepts they will interact with,
    and what their programming model is.
* [[#design-implementer]] describes the machinery that allows for that
    programming model to function, and the information contained there is
    necessary for people implementing senders and sender algorithms (including the
    standard library ones) - but is not necessary to use senders productively.

## Conventions ## {#design-conventions}

The following conventions are used throughout the design section:

  1. The namespace proposed in this paper is the same as in [[P0443R14]]:
      `std::execution`; however, for brevity, the `std::` part of this name is
      omitted. When you see `execution::foo`, treat that as
      `std::execution::foo`.
  2. Universal references and explicit calls to `std::move`/`std::forward` are
      omitted in code samples and signatures for simplicity; assume universal
      references and perfect forwarding unless stated otherwise.
  3. None of the names proposed here are names that we are particularly attached
      to; consider the names to be reasonable placeholders that can freely be
      changed, should the committee want to do so.

## Queries and algorithms ## {#design-queries-and-algorithms}

A **query** is a callable that takes some set of objects (usually one) as
parameters and returns facts about those objects without modifying them. Queries
are usually customization point objects, but in some cases may be functions.

An **algorithm** is a callable that takes some set of objects as parameters and
causes those objects to do something. Algorithms are usually customization point
objects, but in some cases may be functions.

# Design - user side # {#design-user}

## Execution resources describe the place of execution ## {#design-contexts}

An [=execution resource=] is a resource that represents the *place* where
execution will happen. This could be a concrete resource - like a specific
thread pool object, or a GPU - or a more abstract one, like the current thread
of execution. Execution resources don't need to have a representation in code;
they are simply a term describing certain properties of execution of a function.

## Schedulers represent execution resources ## {#design-schedulers}

A [=scheduler=] is a lightweight handle that represents a strategy for
scheduling work onto an execution resource. Since execution resources don't
necessarily manifest in C++ code, it's not possible to program directly against
their API. A scheduler is a solution to that problem: the scheduler concept is
defined by a single sender algorithm, `schedule`, which returns a sender that
will complete on an execution resource determined by the scheduler. Logic that
you want to run on that resource can be placed in the receiver's
completion-signalling method.

<pre highlight="c++">
execution::scheduler auto sch = thread_pool.scheduler();
execution::sender auto snd = execution::schedule(sch);
// snd is a sender (see below) describing the creation of a new execution agent
// on the execution resource associated with sch
</pre>

Note that a particular scheduler type may provide other kinds of scheduling operations
which are supported by its associated execution resource. It is not limited to scheduling
purely using the `execution::schedule` API.
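
To show how `schedule`, the resulting sender, and a receiver fit together, here is a minimal toy model of a scheduler whose execution resource is simply the calling thread. It uses member functions instead of the customization points proposed in this paper, and `inline_scheduler`, `counting_receiver`, and `schedule_is_lazy` are hypothetical names.

```cpp
// Toy scheduler for an "inline" execution resource: work runs wherever
// start() is called.
struct inline_scheduler {
  template <class Receiver>
  struct operation {
    Receiver rcvr;
    // Signals completion on the resource; here, the calling thread.
    constexpr void start() { rcvr.set_value(); }
  };

  struct sender {
    template <class Receiver>
    constexpr operation<Receiver> connect(Receiver rcvr) const {
      return {rcvr};
    }
  };

  // The single defining operation of a scheduler: produce a sender.
  constexpr sender schedule() const { return {}; }
};

// The logic to run on the resource lives in the receiver's completion method.
struct counting_receiver {
  int* runs;
  constexpr void set_value() { ++*runs; }
};

constexpr bool schedule_is_lazy() {
  int runs = 0;
  inline_scheduler sch;
  auto snd = sch.schedule();                        // describes work only
  auto op = snd.connect(counting_receiver{&runs});  // still nothing has run
  const bool idle_before_start = (runs == 0);
  op.start();                                       // now the receiver is called
  return idle_before_start && runs == 1;
}

static_assert(schedule_is_lazy());
```

Nothing runs until the operation is started; creating and connecting the sender only describes the work.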

Future papers will propose additional scheduler concepts that extend `scheduler`
to add other capabilities. For example:

* A `time_scheduler` concept that extends `scheduler` to support time-based
    scheduling. Such a concept might provide access to `schedule_after(sched,
    duration)`, `schedule_at(sched, time_point)` and `now(sched)` APIs.
* Concepts that extend `scheduler` to support opening, reading and writing files
    asynchronously.
* Concepts that extend `scheduler` to support connecting, sending data and
    receiving data over the network asynchronously.

## Senders describe work ## {#design-senders}

A [=sender=] is an object that describes work. Senders are similar to futures in
existing asynchrony designs, but unlike futures, the work that is being done to
arrive at the values they will *send* is also directly described by the sender
object itself. A sender is said to *send* some values if a receiver connected
(see [[#design-connect]]) to that sender will eventually *receive* said values.

The primary defining sender algorithm is [[#design-connect]]; this function,
however, is not a user-facing API; it is used to facilitate communication
between senders and various sender algorithms, but end user code is not expected
to invoke it directly.

The way user code is expected to interact with senders is by using [=sender
algorithms=]. This paper proposes an initial set of such sender algorithms,
which are described in [[#design-composable]], [[#design-sender-factories]],
[[#design-sender-adaptors]], and [[#design-sender-consumers]]. For example, here
is how a user can create a new sender on a scheduler, attach a continuation to
it, and then wait for execution of the continuation to complete:

<pre highlight="c++">
execution::scheduler auto sch = thread_pool.scheduler();
execution::sender auto snd = execution::schedule(sch);
execution::sender auto cont = execution::then(snd, []{
    std::fstream file{ "result.txt" };
    file << compute_result;
});

this_thread::sync_wait(cont);
// at this point, cont has completed execution
</pre>

## Senders are composable through sender algorithms ## {#design-composable}

Asynchronous programming often departs from traditional code structure and control flow that we are familiar with.
A successful asynchronous framework must provide an intuitive story for composition of asynchronous work: expressing dependencies, passing objects, managing object lifetimes, etc.

The true power and utility of senders is in their composability.
With senders, users can describe generic execution pipelines and graphs, and then run them on and across a variety of different schedulers.
Senders are composed using [=sender algorithms=]:

* [=sender factories=], algorithms that take no senders and return a sender.
* [=sender adaptors=], algorithms that take (and potentially
    `execution::connect`) senders and return a sender.
* [=sender consumers=], algorithms that take (and potentially
    `execution::connect`) senders and do not return a sender.
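
The three kinds of algorithm can be illustrated with a small self-contained model. This is a sketch under simplifying assumptions (value channel only, member `connect`/`start`, synchronous completion), and `just`, `then`, and `run_inline` here are toy functions written for this example rather than the proposed algorithms.

```cpp
#include <utility>

// Factory: takes no sender and returns a sender of one value.
template <class T>
struct just_sender {
  T value;
  template <class Receiver>
  struct operation {
    T value;
    Receiver rcvr;
    constexpr void start() { rcvr.set_value(std::move(value)); }
  };
  template <class Receiver>
  constexpr operation<Receiver> connect(Receiver rcvr) const {
    return {value, std::move(rcvr)};
  }
};
template <class T>
constexpr just_sender<T> just(T value) { return {std::move(value)}; }

// Adaptor: takes a sender and returns a sender that applies fn to its value.
template <class Sender, class Fn>
struct then_sender {
  Sender pred;
  Fn fn;
  template <class Receiver>
  struct receiver {
    Fn fn;
    Receiver next;
    template <class V>
    constexpr void set_value(V&& v) { next.set_value(fn(std::forward<V>(v))); }
  };
  template <class Receiver>
  constexpr auto connect(Receiver rcvr) const {
    return pred.connect(receiver<Receiver>{fn, std::move(rcvr)});
  }
};
template <class Sender, class Fn>
constexpr then_sender<Sender, Fn> then(Sender snd, Fn fn) {
  return {std::move(snd), std::move(fn)};
}

// Consumer: takes a sender, runs it, and returns a plain value, not a sender.
template <class T>
struct store_receiver {
  T* out;
  constexpr void set_value(T v) { *out = std::move(v); }
};
template <class T, class Sender>
constexpr T run_inline(Sender snd) {
  T result{};
  auto op = snd.connect(store_receiver<T>{&result});
  op.start();
  return result;
}

constexpr int pipeline() {
  auto snd = then(then(just(20), [](int x) { return x + 1; }),
                  [](int x) { return x * 2; });
  return run_inline<int>(snd);
}

static_assert(pipeline() == 42);
```

Each adaptor wraps the downstream receiver in its own receiver, so the composed chain is assembled at `connect` time and runs only when the outermost operation is started.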

## Senders can propagate completion schedulers ## {#design-propagation}

One of the goals of executors is to support a diverse set of execution resources, including traditional thread pools, task and fiber frameworks (like <a href="https://github.com/STEllAR-GROUP/hpx">HPX</a> and [Legion](https://github.com/StanfordLegion/legion)), and GPUs and other accelerators (managed by runtimes such as CUDA or SYCL).
On many of these systems, not all execution agents are created equal and not all functions can be run on all execution agents.
Having precise control over the execution resource used for any given function call being submitted is important on such systems, and the users of standard execution facilities will expect to be able to express such requirements.

[[P0443R14]] was not always clear about the <i>place of execution</i> of any given piece of code.
Precise control was present in the two-way execution API present in earlier executor designs, but it has so far been missing from the senders design. There has been a proposal ([[P1897R3]]) to provide a number of sender algorithms that would enforce certain rules on the places of execution
of the work described by a sender, but we have found those sender algorithms to be insufficient for achieving the best performance on all platforms that are of interest to us. The implementation strategies that we are aware of result in one of the following situations:

  1. trying to submit work to one execution resource (such as a CPU thread pool) from another execution resource (such as a GPU or a task framework), which assumes that all execution agents are as capable as a `std::thread` (which they aren't).
  2. forcibly interleaving two adjacent execution graph nodes that are both executing on one execution resource (such as a GPU) with glue code that runs on another execution resource (such as a CPU), which is prohibitively expensive for some execution resources (such as CUDA or SYCL).
  3. having to customise most or all sender algorithms to support an execution resource, so that you can avoid problems described in 1. and 2., which we believe is impractical and brittle based on months of field experience attempting this in [Agency](https://github.com/agency-library/agency).

None of these implementation strategies are acceptable for many classes of parallel runtimes, such as task frameworks (like <a href="https://github.com/STEllAR-GROUP/hpx">HPX</a>) or accelerator runtimes (like CUDA or SYCL).

Therefore, in addition to the `on` sender algorithm from [[P1897R3]], we are proposing a way for senders to advertise what scheduler (and by extension what execution resource) they will complete on.
Any given sender <b>may</b> have [=completion schedulers=] for some or all of the signals (value, error, or stopped) it completes with (for more detail on the completion-signals, see [[#design-receivers]]).
When further work is attached to that sender by invoking sender algorithms, that work will also complete on an appropriate completion scheduler.

### `execution::get_completion_scheduler` ### {#design-sender-query-get_completion_scheduler}

`get_completion_scheduler` is a query that retrieves the completion scheduler for a specific completion-signal from a sender's environment.
For a sender that lacks a completion scheduler query for a given signal, calling `get_completion_scheduler` is ill-formed.
If a sender advertises a completion scheduler for a signal in this way, that sender <b>must</b> ensure that it [=send|sends=] that signal on an execution agent belonging to an execution resource represented by a scheduler returned from this function.
See [[#design-propagation]] for more details.

<pre highlight="c++">
execution::scheduler auto cpu_sched = new_thread_scheduler{};
execution::scheduler auto gpu_sched = cuda::scheduler();

execution::sender auto snd0 = execution::schedule(cpu_sched);
execution::scheduler auto completion_sch0 =
  execution::get_completion_scheduler&lt;execution::set_value_t>(get_env(snd0));
// completion_sch0 is equivalent to cpu_sched

execution::sender auto snd1 = execution::then(snd0, []{
    std::cout << "I am running on cpu_sched!\n";
});
execution::scheduler auto completion_sch1 =
  execution::get_completion_scheduler&lt;execution::set_value_t>(get_env(snd1));
// completion_sch1 is equivalent to cpu_sched

execution::sender auto snd2 = execution::transfer(snd1, gpu_sched);
execution::sender auto snd3 = execution::then(snd2, []{
    std::cout << "I am running on gpu_sched!\n";
});
execution::scheduler auto completion_sch3 =
  execution::get_completion_scheduler&lt;execution::set_value_t>(get_env(snd3));
// completion_sch3 is equivalent to gpu_sched
</pre>

## Execution resource transitions are explicit ## {#design-transitions}

[[P0443R14]] does not contain any mechanisms for performing an execution resource transition. The only sender algorithm that can create a sender that will move execution to a *specific* execution resource is `execution::schedule`, which does not take an input sender.
That means that there's no way to construct sender chains that traverse different execution resources. The ability to build such chains is necessary to fulfill the promise of senders being able to replace two-way executors, which had this capability.

We propose that, for senders advertising their [=completion scheduler=], all execution resource transitions <b>must</b> be explicit; running user code anywhere but where they defined it to run <b>must</b> be considered a bug.

The `execution::transfer` sender adaptor performs a transition from one execution resource to another:

<pre highlight="c++">
execution::scheduler auto sch1 = ...;
execution::scheduler auto sch2 = ...;

execution::sender auto snd1 = execution::schedule(sch1);
execution::sender auto then1 = execution::then(snd1, []{
    std::cout << "I am running on sch1!\n";
});

execution::sender auto snd2 = execution::transfer(then1, sch2);
execution::sender auto then2 = execution::then(snd2, []{
    std::cout << "I am running on sch2!\n";
});

this_thread::sync_wait(then2);
</pre>

## Senders can be either multi-shot or single-shot ## {#design-shot}

Some senders may only support launching their operation a single time, while others may be repeatable
and support being launched multiple times. Executing the operation may consume resources owned by the
sender.

For example, a sender may contain a `std::unique_ptr` that it will be transferring ownership of to the
operation-state returned by a call to `execution::connect` so that the operation has access to
this resource. In such a sender, calling `execution::connect` consumes the sender such that after
the call the input sender is no longer valid. Such a sender will also typically be move-only so that
it can maintain unique ownership of that resource.

A <dfn export=true>single-shot sender</dfn> can be connected to a receiver
at most once. Its implementation of `execution::connect` only has overloads for
an rvalue-qualified sender. Callers must pass the sender as an rvalue to the
call to `execution::connect`, indicating that the call consumes the sender.

A <dfn export=true>multi-shot sender</dfn> can be connected to multiple
receivers and can be launched multiple times. Multi-shot senders customise
`execution::connect` to accept an lvalue reference to the sender. Callers can
indicate that they want the sender to remain valid after the call to
`execution::connect` by passing an lvalue reference to the sender to call these
overloads. Multi-shot senders should also define overloads of
`execution::connect` that accept rvalue-qualified senders to allow the sender to
be also used in places where only a single-shot sender is required.

If the user of a sender does not require the sender to remain valid after connecting it to a
receiver then it can pass an rvalue-reference to the sender to the call to `execution::connect`.
Such usages should be able to accept either single-shot or multi-shot senders.

If the caller does wish for the sender to remain valid after the call then it can pass an lvalue-qualified sender
to the call to `execution::connect`. Such usages will only accept multi-shot senders.

Algorithms that accept senders will typically either decay-copy an input sender and store it somewhere
for later usage (for example as a data-member of the returned sender) or will immediately call
`execution::connect` on the input sender, such as in `this_thread::sync_wait` or `execution::start_detached`.

Some multi-use sender algorithms may require that an input sender be copy-constructible but will only call
`execution::connect` on an rvalue of each copy, which still results in effectively executing the operation multiple times.
Other multi-use sender algorithms may require that the sender is move-constructible but will invoke `execution::connect`
on an lvalue reference to the sender.

For a sender to be usable in both multi-use scenarios, it will generally be required to be both copy-constructible and lvalue-connectable.

## Senders are forkable ## {#design-forkable}

Any non-trivial program will eventually want to fork a chain of senders into independent streams of work, regardless of whether they are single-shot or multi-shot.
For instance, an incoming event to a middleware system may be required to trigger events on more than one downstream system.
This requires that we provide well defined mechanisms for making sure that connecting a sender multiple times is possible and correct.

The `split` sender adaptor facilitates connecting to a sender multiple times, regardless of whether it is single-shot or multi-shot:

<pre highlight="c++">
auto some_algorithm(execution::sender auto&& input) {
    execution::sender auto multi_shot = split(input);
    // "multi_shot" is guaranteed to be multi-shot,
    // regardless of whether "input" was multi-shot or not

    return when_all(
      then(multi_shot, [] { std::cout << "First continuation\n"; }),
      then(multi_shot, [] { std::cout << "Second continuation\n"; })
    );
}
</pre>
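
As a rough intuition for how a single-shot input can be made multi-shot, the sketch below moves the input into reference-counted shared state, runs it at most once, and lets every copy observe the cached result. It is a single-threaded toy model under our own assumptions, not the specification of `split`: `single_shot_work`, `toy_split`, `get` and `underlying_runs` are hypothetical names, and a real implementation would need synchronization and asynchronous completion.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <utility>

// A move-only, single-shot piece of work: run() is only callable on an rvalue.
struct single_shot_work {
    std::unique_ptr<int> input;
    int run() && { return *input * 2; }
};

// Toy stand-in for split(): copies of toy_split share one state object.
class toy_split {
    struct shared_state {
        single_shot_work work;
        std::optional<int> result;
        int runs = 0;
    };
    std::shared_ptr<shared_state> state_;

public:
    explicit toy_split(single_shot_work work)
        : state_(std::make_shared<shared_state>()) {
        state_->work = std::move(work);
    }

    // Runs the underlying work on first use, then serves the cached result.
    int get() const {
        if (!state_->result) {
            state_->result = std::move(state_->work).run();
            ++state_->runs;
        }
        return *state_->result;
    }

    int underlying_runs() const { return state_->runs; }
};

int fork_two_continuations() {
    toy_split multi_shot{single_shot_work{std::make_unique<int>(21)}};
    toy_split first = multi_shot;   // copyable even though the input is move-only
    toy_split second = multi_shot;
    int sum = first.get() + second.get();
    // The single-shot input must have been executed exactly once.
    return multi_shot.underlying_runs() == 1 ? sum : -1;
}
```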

## Senders support cancellation ## {#design-cancellation}

Senders are often used in scenarios where the application may be concurrently executing
multiple strategies for achieving some program goal. When one of these strategies succeeds
(or fails) it may not make sense to continue pursuing the other strategies as their results
are no longer useful.

For example, we may want to try to simultaneously connect to multiple network servers and use
whichever server responds first. Once the first server responds we no longer need to continue
trying to connect to the other servers.

Ideally, in these scenarios, we would somehow be able to request that those other strategies
stop executing promptly so that their resources (e.g. cpu, memory, I/O bandwidth) can be
released and used for other work.

While the design of senders has support for cancelling an operation before it starts
by simply destroying the sender or the operation-state returned from `execution::connect()`
before calling `execution::start()`, there also needs to be a standard, generic mechanism
to ask for an already-started operation to complete early.

The ability to cancel in-flight operations is fundamental to supporting some kinds
of generic concurrency algorithms.

For example:
* a `when_all(ops...)` algorithm should cancel other operations as soon as one operation fails
* a `first_successful(ops...)` algorithm should cancel the other operations as soon as one operation completes successfully
* a generic `timeout(src, duration)` algorithm needs to be able to cancel the `src` operation after the timeout duration has elapsed
* a `stop_when(src, trigger)` algorithm should cancel `src` if `trigger` completes first and cancel `trigger` if `src` completes first


The mechanism used for communicating cancellation-requests, or stop-requests, needs to have a uniform interface
so that generic algorithms that compose sender-based operations, such as the ones listed above, are able to
communicate these cancellation requests to senders that they don't know anything about.

The design is intended to be composable so that cancellation of higher-level operations can propagate
those cancellation requests through intermediate layers to lower-level operations that need to actually
respond to the cancellation requests.

For example, we can compose the algorithms mentioned above so that child operations
are cancelled when any one of the multiple cancellation conditions occurs:
<pre highlight="c++">
sender auto composed_cancellation_example(auto query) {
  return stop_when(
    timeout(
      when_all(
        first_successful(
          query_server_a(query),
          query_server_b(query)),
        load_file("some_file.jpg")),
      5s),
    cancelButton.on_click());
}
</pre>

In this example, if we take the operation returned by `query_server_b(query)`, this operation will
receive a stop-request when any of the following happens:
* The `first_successful` algorithm will send a stop-request if `query_server_a(query)` completes successfully.
* The `when_all` algorithm will send a stop-request if the `load_file("some_file.jpg")` operation completes with an error or stopped result.
* The `timeout` algorithm will send a stop-request if the operation does not complete within 5 seconds.
* The `stop_when` algorithm will send a stop-request if the user clicks on the "Cancel" button in the user-interface.
* The parent operation consuming the `composed_cancellation_example()` sends a stop-request.


Note that within this code there is no explicit mention of cancellation, stop-tokens, callbacks, etc.,
yet the example fully supports and responds to the various cancellation sources.

The intent of the design is that the common usage of cancellation in sender/receiver-based code is
primarily through use of concurrency algorithms that manage the detailed plumbing of cancellation
for you. Much like algorithms that compose senders relieve the user from having to write their own
receiver types, algorithms that introduce concurrency and provide higher-level cancellation semantics
relieve the user from having to deal with low-level details of cancellation.

### Cancellation design summary ### {#design-cancellation-summary}

The design of cancellation described in this paper is built on top of and extends the `std::stop_token`-based
cancellation facilities added in C++20, first proposed in [[P2175R0]].

At a high-level, the facilities proposed by this paper for supporting cancellation include:
* Add `std::stoppable_token` and `std::stoppable_token_for` concepts that generalise the interface of the `std::stop_token` type to allow other types with different implementation strategies.
* Add `std::unstoppable_token` concept for detecting whether a `stoppable_token` can never receive a stop-request.
* Add `std::in_place_stop_token`, `std::in_place_stop_source` and `std::in_place_stop_callback<CB>` types that provide a more efficient implementation of a stop-token for use in structured concurrency situations.
* Add `std::never_stop_token` for use in places where you never want to issue a stop-request.
* Add `std::execution::get_stop_token()` CPO for querying the stop-token to use for an operation from its receiver's execution environment.
* Add `std::execution::stop_token_of_t<T>` for querying the type of a stop-token returned from `get_stop_token()`.

In addition, there are requirements added to some of the algorithms to specify what their cancellation
behaviour is and what the requirements of customisations of those algorithms are with respect to
cancellation.

The key component that enables generic cancellation within sender-based operations is the `execution::get_stop_token()` CPO.
This CPO takes a single parameter, which is the execution environment of the receiver passed to `execution::connect`, and returns a `std::stoppable_token`
that the operation can use to check for stop-requests for that operation.

As the caller of `execution::connect` typically has control over the receiver
type it passes, it is able to customise the `std::get_env()` CPO for that
receiver to return an execution environment that hooks the
`execution::get_stop_token()` CPO to return a stop-token that the receiver has
control over and that it can use to communicate a stop-request to the operation
once it has started.

### Support for cancellation is optional ### {#design-cancellation-optional}

Support for cancellation is optional, both on the part of the author of the receiver and on the part of the author of the sender.

If the receiver's execution environment does not customise the
`execution::get_stop_token()` CPO then invoking the CPO on that receiver's
environment will invoke the default implementation which returns
`std::never_stop_token`. This is a special `stoppable_token` type that is
statically known to always return `false` from the `stop_possible()` method.

When sender code uses this stop-token, the code that handles stop-requests will in general be
compiled out, leaving little to no run-time overhead.

If the sender doesn't call `execution::get_stop_token()`, for example because the operation does not support
cancellation, then it will simply not respond to stop-requests from the caller.

Note that stop-requests are generally racy in nature as there is often a race between an operation completing
naturally and the stop-request being made. If the operation has already completed or is past the point at which
it can be cancelled when the stop-request is sent then the stop-request may just be ignored. An application
will typically need to be able to cope with senders that might ignore a stop-request anyway.

### Cancellation is inherently racy ### {#design-cancellation-racy}

Usually, an operation will attach a stop-callback at some point inside the call to `execution::start()` so that
a subsequent stop-request will interrupt the logic.

A stop-request can be issued concurrently from another thread. This means the implementation of `execution::start()`
needs to be careful to ensure that, once a stop-callback has been registered, there are no data-races between
a potentially concurrently-executing stop-callback and the rest of the `execution::start()` implementation.

An implementation of `execution::start()` that supports cancellation will generally need to perform (at least)
two separate steps: launch the operation, subscribe a stop-callback to the receiver's stop-token. Care needs
to be taken depending on the order in which these two steps are performed.

If the stop-callback is subscribed first and then the operation is launched, care needs to be taken to ensure
that a stop-request that invokes the stop-callback on another thread after the stop-callback is registered
but before the operation finishes launching does not result in either a missed cancellation request or a
data-race. This can be achieved, for example, by performing an atomic write after the launch has finished executing.

If the operation is launched first and then the stop-callback is subscribed, care needs to be taken to ensure
that if the launched operation completes concurrently on another thread, it does not destroy the operation-state
until after the stop-callback has been registered. This can be achieved, for example, by having the `execution::start`
implementation write to an atomic variable once it has finished registering the stop-callback, and having the
concurrent completion handler check that variable and either call the completion-signalling operation or store the
result and defer calling the receiver's completion-signalling operation to the `execution::start()` call (which is still executing).

For an example of an implementation strategy for solving these data-races see [[#example-async-windows-socket-recv]].

### Cancellation design status ### {#design-cancellation-status}

This paper currently includes the design for cancellation as proposed in
[[P2175R0]] - "Composable cancellation for sender-based async operations".
P2175R0 contains more details on the background motivation and prior-art and design rationale of this design.

It is important to note, however, that initial review of this design in the SG1 concurrency subgroup raised some concerns
related to runtime overhead of the design in single-threaded scenarios and these concerns are still being investigated.

The design of P2175R0 has been included in this paper for now, despite its potential to change, as we believe that
support for cancellation is a fundamental requirement for an async model and is required in some form to be able to
talk about the semantics of some of the algorithms proposed in this paper.

This paper will be updated in the future with any changes that arise from the investigations into P2175R0.

## Sender factories and adaptors are lazy ## {#design-lazy-algorithms}

In an earlier revision of this paper, some of the proposed algorithms supported
executing their logic eagerly; <i>i.e.</i>, before the returned sender has been
connected to a receiver and started. These algorithms were removed because eager
execution has a number of negative semantic and performance implications.

We originally included this functionality in the paper because of a long-standing
belief that eager execution is a mandatory feature to be included in the standard Executors
facility for that facility to be acceptable for accelerator vendors. A particular concern
was that we must be able to write generic algorithms that can run either eagerly or lazily,
depending on the kind of input sender or scheduler that has been passed into them as
an argument. We considered this a requirement, because the _latency_ of launching work on an
accelerator can sometimes be considerable.

However, in the process of working on this paper and implementations of the features
proposed within, our set of requirements has shifted, as we came to better understand the
different implementation strategies that are available for the feature set of this paper.
After weighing the earlier concerns against the points presented below, we
have arrived at the conclusion that a purely lazy model is enough for most algorithms,
and users who intend to launch work earlier may use an algorithm such as `ensure_started`
to achieve that goal. We have also come to deeply appreciate the fact that a purely
lazy model allows both the implementation and the compiler to have a much better
understanding of what the complete graph of tasks looks like, allowing them to better
optimize the code, including when targeting accelerators.

### Eager execution leads to detached work or worse ### {#design-lazy-algorithms-detached}

One of the questions that arises with APIs that can potentially return
eagerly-executing senders is "What happens when those senders are destructed
without a call to `execution::connect`?" or similarly, "What happens if a call
to `execution::connect` is made, but the returned operation state is destroyed
before `execution::start` is called on that operation state?"

In these cases, the operation represented by the sender is potentially executing
concurrently in another thread at the time that the destructor of the sender
and/or operation-state is running. In the case that the operation has not
completed executing by the time that the destructor is run we need to decide
what the semantics of the destructor is.

There are three main strategies that can be adopted here, none of which is
particularly satisfactory:

1.  Make this undefined-behaviour - the caller must ensure that any
    eagerly-executing sender is always joined by connecting and starting that
    sender. This approach is generally pretty hostile to programmers,
    particularly in the presence of exceptions, since it complicates the ability
    to compose these operations.

    Eager operations typically need to acquire resources when they are first
    called in order to start the operation early. This makes eager algorithms
    prone to failure. Consider, then, what might happen in an expression such as
    `when_all(eager_op_1(), eager_op_2())`. Imagine `eager_op_1()` starts an
    asynchronous operation successfully, but then `eager_op_2()` throws. For
    lazy senders, that failure happens in the context of the `when_all`
    algorithm, which handles the failure and ensures that async work joins on
    all code paths. In this case though -- the eager case -- the child operation
    has failed even before `when_all` has been called.

    It then becomes the responsibility, not of the algorithm, but of the end
    user to handle the exception and ensure that `eager_op_1()` is joined before
    allowing the exception to propagate. If they fail to do that, they incur
    undefined behavior.

2.  Detach from the computation - let the operation continue in the background -
    like an implicit call to `std::thread::detach()`. While this approach can
    work in some circumstances for some kinds of applications, in general it is
    also pretty user-hostile; it makes it difficult to reason about the safe
    destruction of resources used by these eager operations. In general,
    detached work necessitates some kind of garbage collection; <i>e.g.</i>,
    `std::shared_ptr`, to ensure resources are kept alive until the operations
    complete, and can make clean shutdown nigh impossible.

3.  Block in the destructor until the operation completes. This approach is
    probably the safest to use as it preserves the structured nature of the
    concurrent operations, but also introduces the potential for deadlocking the
    application if the completion of the operation depends on the current thread
    making forward progress.

    Deadlock might occur, for example, if a thread-pool with a
    small number of threads is executing code that creates a sender representing
    an eagerly-executing operation and then calls the destructor of that sender
    without joining it (e.g. because an exception was thrown). If the current
    thread blocks waiting for that eager operation to complete and that eager
    operation cannot complete until some entry enqueued to the thread-pool's
    queue of work is run then the thread may wait for an indefinite amount of
    time. If all threads of the thread-pool are simultaneously performing such
    blocking operations then deadlock can result.

There are also minor variations on each of these choices. For example:

4.  A variation of (1): Call `std::terminate` if an eager sender is destructed
    without joining it. This is the approach that the `std::thread` destructor
    takes.

5.  A variation of (2): Request cancellation of the operation before detaching.
    This reduces the chances of operations continuing to run indefinitely in the
    background once they have been detached but does not solve the
    lifetime- or shutdown-related challenges.

6.  A variation of (3): Request cancellation of the operation before blocking on
    its completion. This is the strategy that `std::jthread` uses for its
    destructor. It reduces the risk of deadlock but does not eliminate it.

### Eager senders complicate algorithm implementations ### {#design-lazy-algorithms-complexity}

Algorithms that can assume they are operating on senders with strictly lazy
semantics are able to make certain optimizations that are not available if
senders can be potentially eager. With lazy senders, an algorithm can safely
assume that a call to `execution::start` on an operation state strictly happens
before the execution of that async operation. This frees the algorithm from
needing to resolve potential race conditions. For example, consider an algorithm
`sequence` that puts async operations in sequence by starting an operation only
after the preceding one has completed. In an expression like `sequence(a(),
then(src, [] { b(); }), c())`, one may reasonably assume that `a()`, `b()` and
`c()` are sequenced and therefore do not need synchronisation. Eager algorithms
break that assumption.

When an algorithm needs to deal with potentially eager senders, the potential
race conditions can be resolved one of two ways, neither of which is desirable:

1.  Assume the worst and implement the algorithm defensively, assuming all
    senders are eager. This obviously has overheads both at runtime and in
    algorithm complexity. Resolving race conditions is hard.

2.  Require senders to declare whether they are eager or not with a query.
    Algorithms can then implement two different implementation strategies, one
    for strictly lazy senders and one for potentially eager senders. This
    addresses the performance problem of (1) while compounding the complexity
    problem.

### Eager senders incur cancellation-related overhead ### {#design-lazy-algorithms-runtime}

Another implication of the use of eager operations is with regards to
cancellation. The eagerly executing operation will not have access to the
caller's stop token until the sender is connected to a receiver. If we still
want to be able to cancel the eager operation then it will need to create a new
stop source and pass its associated stop token down to child operations. Then
when the returned sender is eventually connected it will register a stop
callback with the receiver's stop token that will request stop on the eager
sender's stop source.

As the eager operation does not know at the time that it is launched what the
type of the receiver is going to be, and thus whether or not the stop token
returned from `execution::get_stop_token` models `std::unstoppable_token` or not,
the eager operation is going to need to assume it might be later connected to a
receiver with a stop token that might actually issue a stop request. Thus it
needs to declare space in the operation state for a type-erased stop callback
and incur the runtime overhead of supporting cancellation, even if cancellation
will never be requested by the caller.

The eager operation will also need to do this to support sending a stop request
to the eager operation in the case that the sender representing the eager work
is destroyed before it has been joined (assuming strategy (5) or (6) listed
above is chosen).

### Eager senders cannot access execution resource from the receiver ### {#design-lazy-algorithms-context}

In sender/receiver, contextual information is passed from parent operations to
their children by way of receivers. Information like stop tokens, allocators,
current scheduler, priority, and deadline are propagated to child operations
with custom receivers at the time the operation is connected. That way, each
operation has the contextual information it needs before it is started.

But if the operation is started before it is connected to a receiver, then there
isn't a way for a parent operation to communicate contextual information to its
child operations, which may complete before a receiver is ever attached.

## Schedulers advertise their forward progress guarantees ## {#design-fpg}

To decide whether a scheduler (and its associated execution resource) is sufficient for a specific task, it may be necessary to know what kind of forward progress guarantees it provides for the execution agents it creates. The C++ Standard defines the following
forward progress guarantees:

* <i>concurrent</i>, which requires that a thread makes progress <i>eventually</i>;
* <i>parallel</i>, which requires that a thread makes progress once it executes a step; and
* <i>weakly parallel</i>, which does not require that the thread makes progress.

This paper introduces a scheduler query function, `get_forward_progress_guarantee`, which returns one of the enumerators of a new `enum` type, `forward_progress_guarantee`. Each enumerator of `forward_progress_guarantee` corresponds to one of the aforementioned
guarantees.
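
A generic algorithm might consult this query before scheduling work that waits on a peer agent that may not have started yet, since only the concurrent guarantee ensures that such a peer eventually runs. The following sketch assumes enumerator names `concurrent`, `parallel` and `weakly_parallel`; the three scheduler types, the `forward_progress()` member (standing in for the `get_forward_progress_guarantee` query) and `can_wait_on_unstarted_peer` are hypothetical.

```cpp
#include <cassert>

enum class forward_progress_guarantee { concurrent, parallel, weakly_parallel };

// Hypothetical schedulers, each advertising one guarantee.
struct toy_thread_scheduler {
    constexpr forward_progress_guarantee forward_progress() const noexcept {
        return forward_progress_guarantee::concurrent;
    }
};
struct toy_pool_scheduler {
    constexpr forward_progress_guarantee forward_progress() const noexcept {
        return forward_progress_guarantee::parallel;
    }
};
struct toy_simd_scheduler {
    constexpr forward_progress_guarantee forward_progress() const noexcept {
        return forward_progress_guarantee::weakly_parallel;
    }
};

// Work that blocks until a not-yet-started peer agent makes progress is only
// safe when every agent is guaranteed to make progress eventually.
template <class Scheduler>
constexpr bool can_wait_on_unstarted_peer(const Scheduler& sch) noexcept {
    return sch.forward_progress() == forward_progress_guarantee::concurrent;
}

static_assert(can_wait_on_unstarted_peer(toy_thread_scheduler{}));
static_assert(!can_wait_on_unstarted_peer(toy_pool_scheduler{}));
static_assert(!can_wait_on_unstarted_peer(toy_simd_scheduler{}));
```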

## Most sender adaptors are pipeable ## {#design-pipeable}

To facilitate an intuitive syntax for composition, most sender adaptors are <dfn export=true>pipeable</dfn>; they can be composed (<dfn export=true>piped</dfn>) together with `operator|`.
This mechanism is similar to the `operator|` composition that C++ range adaptors support and draws inspiration from piping in *nix shells.
Pipeable sender adaptors take a sender as their first parameter and have no other sender parameters.

`a | b` will pass the sender `a` as the first argument to the pipeable sender adaptor `b`. Pipeable sender adaptors support partial application of the parameters after the first. For example, all of the following are equivalent:

<pre highlight="c++">
execution::bulk(snd, N, [] (std::size_t i, auto d) {});
execution::bulk(N, [] (std::size_t i, auto d) {})(snd);
snd | execution::bulk(N, [] (std::size_t i, auto d) {});
</pre>
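
One way such an equivalence can be implemented is with a closure object that stores the partially applied arguments and an `operator|` that feeds the left-hand operand to it. The sketch below is a toy model, not the mechanism specified by this paper: a "sender" is just a nullary callable, and `toy_just`, `toy_then`, `pipe_closure` and `add_one` are hypothetical names.

```cpp
#include <cassert>
#include <utility>

// Toy lazy "sender": a nullary callable that produces a value when invoked.
inline constexpr auto toy_just = [](int v) { return [v] { return v; }; };

// Closure produced by partially applying an adaptor (everything but the input).
template <class Apply>
struct pipe_closure {
    Apply apply;

    template <class Input>
    auto operator()(Input&& input) const {
        return apply(std::forward<Input>(input));
    }
};
template <class Apply>
pipe_closure(Apply) -> pipe_closure<Apply>;

// `input | closure` passes `input` as the adaptor's first argument.
template <class Input, class Apply>
auto operator|(Input&& input, const pipe_closure<Apply>& closure) {
    return closure(std::forward<Input>(input));
}

// Full form: toy_then(input, fn). Nothing runs until the result is invoked.
template <class Input, class Fn>
auto toy_then(Input input, Fn fn) {
    return [input, fn] { return fn(input()); };
}

// Partially applied form: toy_then(fn) returns a pipeable closure.
template <class Fn>
auto toy_then(Fn fn) {
    return pipe_closure{[fn](auto input) { return toy_then(std::move(input), fn); }};
}

inline constexpr auto add_one = [](int x) { return x + 1; };

// The three spellings build the same computation.
int call_form()    { return toy_then(toy_just(20), add_one)(); }
int partial_form() { return toy_then(add_one)(toy_just(20))(); }
int pipe_form()    { return (toy_just(20) | toy_then(add_one))(); }

int chained() {
    auto snd = toy_just(20)
             | toy_then(add_one)
             | toy_then([](int x) { return x * 2; });
    return snd();
}
```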

Piping enables you to compose together senders with a linear syntax.
Without it, you'd have to use either nested function call syntax, which would cause a syntactic inversion of the direction of control flow, or you'd have to introduce a temporary variable for each stage of the pipeline.
Consider the following example where we want to execute first on a CPU thread pool, then on a CUDA GPU, then back on the CPU thread pool:

<table>
<tr>
<th>Syntax Style
<th>Example
<tr>
<th>Function call <br/> (nested)
<td><pre highlight="c++">
auto snd = execution::then(
             execution::transfer(
               execution::then(
                 execution::transfer(
                   execution::then(
                     execution::schedule(thread_pool.scheduler()),
                     []{ return 123; }),
                   cuda::new_stream_scheduler()),
                 [](int i){ return 123 * 5; }),
               thread_pool.scheduler()),
             [](int i){ return i - 5; });
auto [result] = this_thread::sync_wait(snd).value();
// result == 610
</pre>
<tr>
<th>Function call <br/> (named temporaries)
<td><pre highlight="c++">
auto snd0 = execution::schedule(thread_pool.scheduler());
auto snd1 = execution::then(snd0, []{ return 123; });
auto snd2 = execution::transfer(snd1, cuda::new_stream_scheduler());
auto snd3 = execution::then(snd2, [](int i){ return 123 * 5; });
auto snd4 = execution::transfer(snd3, thread_pool.scheduler());
auto snd5 = execution::then(snd4, [](int i){ return i - 5; });
auto [result] = *this_thread::sync_wait(snd5);
// result == 610
</pre>
<tr>
<th>Pipe
<td><pre highlight="c++">
auto snd = execution::schedule(thread_pool.scheduler())
         | execution::then([]{ return 123; })
         | execution::transfer(cuda::new_stream_scheduler())
         | execution::then([](int i){ return 123 * 5; })
         | execution::transfer(thread_pool.scheduler())
         | execution::then([](int i){ return i - 5; });
auto [result] = this_thread::sync_wait(snd).value();
// result == 610
</pre>
</table>

Certain sender adaptors are not pipeable, because using the pipeline syntax can result in confusion of the semantics of the adaptors involved. Specifically, the following sender adaptors are not pipeable.

* `execution::when_all` and `execution::when_all_with_variant`: Since these sender adaptors take a variadic pack of senders, a partially applied form would be ambiguous with a non partially applied form with an arity of one less.
* `execution::on`: This sender adaptor changes how the sender passed to it is executed, not what happens to its result, but allowing it in a pipeline makes it read as if it performed a function more similar to `transfer`.

Sender consumers could be made pipeable, but we have chosen not to do so.
Since these are terminal nodes in a pipeline and nothing can be piped after them, we believe a pipe syntax may be confusing as well as unnecessary, as consumers cannot be chained.
We believe sender consumers read better with function call syntax.

## A range of senders represents an async sequence of data ## {#design-range-of-senders}

Senders represent a single unit of asynchronous work. In many cases though, what is being modelled is a sequence of data arriving asynchronously, and you want computation to happen on demand, when each element arrives. This requires nothing more than what is in this paper and the range support in C++20. A range of senders would allow you to model such input as keystrokes, mouse movements, sensor readings, or network requests.

Given some expression <code><i>R</i></code> that is a range of senders, consider the following in a coroutine that returns an async generator type:

    <pre highlight="c++">
    for (auto snd : <i>R</i>) {
      if (auto opt = co_await execution::stopped_as_optional(std::move(snd)))
        co_yield fn(*std::move(opt));
      else
        break;
    }
    </pre>

This transforms each element of the asynchronous sequence <code><i>R</i></code> with the function `fn` on demand, as the data arrives. The result is a new asynchronous sequence of the transformed values.

Now imagine that <code><i>R</i></code> is the simple expression `views::iota(0) | views::transform(execution::just)`. This creates a lazy range of senders, each of which completes immediately with monotonically increasing integers. The above code churns through the range, generating a new infinite asynchronous range of values [`fn(0)`, `fn(1)`, `fn(2)`, ...].

Far more interesting would be if <code><i>R</i></code> were a range of senders representing, say, user actions in a UI. The above code gives a simple way to respond to user actions on demand.

## Senders can represent partial success ## {#design-partial-success}

Receivers have three ways they can complete: with success, failure, or cancellation. This raises the question of how they can be used to represent async operations that *partially* succeed. For example, consider an API that reads from a socket. The connection could drop after the API has filled in some of the buffer. In cases like that, it makes sense to want to report both that the connection dropped and that some data has been successfully read.

Often in the case of partial success, the error condition is not fatal nor does it mean the API has failed to satisfy its post-conditions. It is merely an extra piece of information about the nature of the completion. In those cases, "partial success" is another way of saying "success". As a result, it is sensible to pass both the error code and the result (if any) through the value channel, as shown below:

    <pre highlight="c++">
    // Capture a buffer for read_socket_async to fill in
    execution::just(array&lt;byte, 1024>{})
      | execution::let_value([socket](array&lt;byte, 1024>& buff) {
          // read_socket_async completes with two values: an error_code and
          // a count of bytes:
          return read_socket_async(socket, span{buff})
              // For success (partial and full), specify the next action:
            | execution::let_value([](error_code err, size_t bytes_read) {
                if (err) {
                  // OK, partial success. Decide how to deal with the partial results
                } else {
                  // OK, full success here.
                }
              });
        })
    </pre>

In other cases, the partial success is more of a partial *failure*. That happens when the error condition indicates that in some way the function failed to satisfy its post-conditions. In those cases, sending the error through the value channel loses valuable contextual information. It's possible that bundling the error and the incomplete results into an object and passing it through the error channel makes more sense. In that way, generic algorithms will not miss the fact that a post-condition has not been met and react inappropriately.

Another possibility is for an async API to return a *range* of senders: if the API completes with full success, full error, or cancellation, the returned range contains just one sender with the result. Otherwise, if the API partially fails (doesn't satisfy its post-conditions, but some incomplete result is available), the returned range would have *two* senders: the first containing the partial result, and the second containing the error. Such an API might be used in a coroutine as follows:

    <pre highlight="c++">
    // Declare a buffer for read_socket_async to fill in
    array&lt;byte, 1024> buff;

    for (auto snd : read_socket_async(socket, span{buff})) {
      try {
        if (optional&lt;size_t> bytes_read =
              co_await execution::stopped_as_optional(std::move(snd))) {
          // OK, we read some bytes into buff. Process them here....
        } else {
          // The socket read was cancelled and returned no data. React
          // appropriately.
        }
      } catch (...) {
        // read_socket_async failed to meet its post-conditions.
        // Do some cleanup and propagate the error...
      }
    }
    </pre>

Finally, it's possible to combine these two approaches when the API can both partially succeed (meeting its post-conditions) and partially fail (not meeting its post-conditions).

## All awaitables are senders ## {#design-awaitables-are-senders}

Since C++20 added coroutines to the standard, we expect that coroutines and awaitables will be how a great many will choose to express their asynchronous code. However, in this paper, we are proposing to add a suite of asynchronous algorithms that accept senders, not awaitables. One might wonder whether and how these algorithms will be accessible to those who choose coroutines instead of senders.

In truth there will be no problem because all generally awaitable types
automatically model the `sender` concept. The adaptation is transparent and
happens in the sender customization points, which are aware of awaitables. (By
"generally awaitable" we mean types that don't require custom `await_transform`
trickery from a promise type to make them awaitable.)

For an example, imagine a coroutine type called `task<T>` that knows nothing
about senders. It doesn't implement any of the sender customization points.
Despite that fact, and despite the fact that the `this_thread::sync_wait`
algorithm is constrained with the `sender` concept, the following would compile
and do what the user wants:

```c++
task<int> doSomeAsyncWork();

int main() {
  // OK, awaitable types satisfy the requirements for senders:
  auto o = this_thread::sync_wait(doSomeAsyncWork());
}
```

Since awaitables are senders, writing a sender-based asynchronous algorithm is trivial if you have a coroutine task type: implement the algorithm as a coroutine. If you are not bothered by the possibility of allocations and indirections as a result of using coroutines, then there is no need to ever write a sender, a receiver, or an operation state.

## Many senders can be trivially made awaitable ## {#design-senders-are-awaitable}

If you choose to implement your sender-based algorithms as coroutines, you'll run into the issue of how to retrieve results from a passed-in sender. This is not a problem. If the coroutine type opts in to sender support -- trivial with the `execution::with_awaitable_senders` utility -- then a large class of senders are transparently awaitable from within the coroutine.

For example, consider the following trivial implementation of the sender-based `retry` algorithm:

<pre highlight="c++">
template&lt;class S>
  requires <i>single-sender</i>&lt;S&> // see <a href="#spec-execution.coro_utils.as_awaitable">[exec.as.awaitable]</a>
task&lt;<i>single-sender-value-type</i>&lt;S>> retry(S s) {
  for (;;) {
    try {
      co_return co_await s;
    } catch(...) {
    }
  }
}
</pre>

Only *some* senders can be made awaitable directly because callbacks are more expressive than coroutines. An awaitable expression has a single type: the result value of the async operation. In contrast, a callback can accept multiple arguments as the result of an operation. What's more, the callback can have overloaded function call signatures that take different sets of arguments. There is no way to automatically map such senders into awaitables. The `with_awaitable_senders` utility recognizes as awaitables those senders that send a single value of a single type. To await another kind of sender, a user would have to first map its value channel into a single value of a single type -- say, with the `into_variant` sender algorithm -- before `co_await`-ing that sender.

## Cancellation of a sender can unwind a stack of coroutines ## {#design-native-coro-unwind}

When looking at the sender-based `retry` algorithm in the previous section, we can see that the value and error cases are correctly handled. But what about cancellation? What happens to a coroutine that is suspended awaiting a sender that completes by calling `execution::set_stopped`?

When your task type's promise inherits from `with_awaitable_senders`, what happens is this: the coroutine behaves as if an *uncatchable exception* had been thrown from the `co_await` expression. (It is not really an exception, but it's helpful to think of it that way.) Provided that the promise types of the calling coroutines also inherit from `with_awaitable_senders`, or more generally implement a member function called `unhandled_stopped`, the exception unwinds the chain of coroutines as if an exception were thrown except that it bypasses `catch(...)` clauses.

In order to "catch" this uncatchable stopped exception, one of the calling coroutines in the stack would have to await a sender that maps the stopped channel into either a value or an error. That is achievable with the `execution::let_stopped`, `execution::upon_stopped`, `execution::stopped_as_optional`, or `execution::stopped_as_error` sender adaptors. For instance, we can use `execution::stopped_as_optional` to "catch" the stopped signal and map it into an empty optional as shown below:

```c++
if (auto opt = co_await execution::stopped_as_optional(some_sender)) {
  // OK, some_sender completed successfully, and opt contains the result.
} else {
  // some_sender completed with a cancellation signal.
}
```

As described in the section <a href="#design-awaitables-are-senders">"All awaitables are senders"</a>, the sender customization points recognize awaitables and adapt them transparently to model the sender concept. When `connect`-ing an awaitable and a receiver, the adaptation layer awaits the awaitable within a coroutine that implements `unhandled_stopped` in its promise type. The effect of this is that an "uncatchable" stopped exception propagates seamlessly out of awaitables, causing `execution::set_stopped` to be called on the receiver.

Obviously, `unhandled_stopped` is a library extension of the coroutine promise interface. Many promise types will not implement `unhandled_stopped`. When an uncatchable stopped exception tries to propagate through such a coroutine, it is treated as an unhandled exception and `terminate` is called. The solution, as described above, is to use a sender adaptor to handle the stopped exception before awaiting it. It goes without saying that any future Standard Library coroutine types ought to implement `unhandled_stopped`. The author of [[P1056R1]], which proposes a standard coroutine task type, is in agreement.

## Composition with parallel algorithms ## {#design-parallel-algorithms}

The C++ Standard Library provides a large number of algorithms that offer the potential for non-sequential execution via the use of execution policies. The set of algorithms with execution policy overloads is often referred to as "parallel algorithms", although
additional policies are available.

Existing policies, such as `execution::par`, give the implementation permission to execute the algorithm in parallel. However, the choice of execution resources used to perform the work is left to the implementation.

We will propose a customization point for combining schedulers with policies in order to provide control over where work will execute.

<pre highlight="c++">
template&lt;class ExecutionPolicy>
<i>unspecified</i> executing_on(
    execution::scheduler auto scheduler,
    ExecutionPolicy && policy
);
</pre>

This function would return an object of an unspecified type which can be used in place of an execution policy as the first argument to one of the parallel algorithms. The overload selected by that object should execute its computation as requested by
`policy` while using `scheduler` to create any work to be run. The expression may be ill-formed if `scheduler` is not able to support the given policy.

The existing parallel algorithms are synchronous; all of the effects performed by the computation are complete before the algorithm returns to its caller. This remains unchanged with the `executing_on` customization point.

In the future, we expect additional papers will propose asynchronous forms of the parallel algorithms which (1) return senders rather than values or `void` and (2) where a customization point pairing a sender with an execution policy would similarly be used to
obtain an object of unspecified type to be provided as the first argument to the algorithm.

## User-facing sender factories ## {#design-sender-factories}

A [=sender factory=] is an algorithm that takes no senders as parameters and returns a sender.

### `execution::schedule` ### {#design-sender-factory-schedule}

<pre highlight="c++">
execution::sender auto schedule(
    execution::scheduler auto scheduler
);
</pre>

Returns a sender describing the start of a task graph on the provided scheduler. See [[#design-schedulers]].

<pre highlight="c++">
execution::scheduler auto sch1 = get_system_thread_pool().scheduler();

execution::sender auto snd1 = execution::schedule(sch1);
// snd1 describes the creation of a new task on the system thread pool
</pre>

### `execution::just` ### {#design-sender-factory-just}

<pre highlight="c++">
execution::sender auto just(
    auto &&... values
);
</pre>

Returns a sender with no [=completion scheduler|completion schedulers=], which [=send|sends=] the provided values. The input values are decay-copied into the returned sender. When the returned sender is connected to a receiver, the values are moved into the operation state if the sender is an rvalue; otherwise, they are copied. Then xvalues referencing the values in the operation state are passed to the receiver's `set_value`.

```c++
execution::sender auto snd1 = execution::just(3.14);
execution::sender auto then1 = execution::then(snd1, [] (double d) {
  std::cout << d << "\n";
});

execution::sender auto snd2 = execution::just(3.14, 42);
execution::sender auto then2 = execution::then(snd2, [] (double d, int i) {
  std::cout << d << ", " << i << "\n";
});

std::vector v3{1, 2, 3, 4, 5};
execution::sender auto snd3 = execution::just(v3);
execution::sender auto then3 = execution::then(snd3, [] (std::vector<int>&& v3copy) {
  for (auto&& e : v3copy) { e *= 2; }
  return std::move(v3copy);
});
auto&& [v3copy] = this_thread::sync_wait(then3).value();
// v3 contains {1, 2, 3, 4, 5}; v3copy will contain {2, 4, 6, 8, 10}.

execution::sender auto snd4 = execution::just(std::vector{1, 2, 3, 4, 5});
execution::sender auto then4 = execution::then(std::move(snd4), [] (std::vector<int>&& v4) {
  for (auto&& e : v4) { e *= 2; }
  return std::move(v4);
});
auto&& [v4] = this_thread::sync_wait(std::move(then4)).value();
// v4 contains {2, 4, 6, 8, 10}. No vectors were copied in this example.
```
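
The copy-versus-move rule described above can be sketched without the real library. The `toy` namespace, the member-function `connect`, and the `store_string` receiver below are assumptions made for illustration; they are not the customization points this paper proposes:

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

namespace toy {

// Operation state: owns the values and the receiver until start() is called.
template <class Receiver, class... Ts>
struct just_operation {
  std::tuple<Ts...> values;
  Receiver receiver;

  void start() {
    // Pass xvalues referring to the stored values to set_value.
    std::apply(
        [this](Ts&... vs) { receiver.set_value(std::move(vs)...); }, values);
  }
};

// Sender: holds decay-copied values; connect() copies or moves them into
// the operation state depending on the sender's value category.
template <class... Ts>
struct just_sender {
  std::tuple<Ts...> values;

  template <class Receiver>
  just_operation<Receiver, Ts...> connect(Receiver r) const& {
    return {values, std::move(r)};  // lvalue sender: copy
  }
  template <class Receiver>
  just_operation<Receiver, Ts...> connect(Receiver r) && {
    return {std::move(values), std::move(r)};  // rvalue sender: move
  }
};

template <class... Ts>
just_sender<std::decay_t<Ts>...> just(Ts&&... ts) {
  return {std::tuple<std::decay_t<Ts>...>(std::forward<Ts>(ts)...)};
}

}  // namespace toy

// A receiver that stores the string it receives into an external slot.
struct store_string {
  std::string* out;
  void set_value(std::string&& s) { *out = std::move(s); }
};

// Returns true if an lvalue sender is still intact after being connected
// and started, i.e. its value was copied rather than moved.
inline bool lvalue_sender_is_copied() {
  auto snd = toy::just(std::string("hello"));
  std::string result;
  auto op = snd.connect(store_string{&result});
  assert(result.empty());  // nothing is delivered before start()
  op.start();
  return result == "hello" && std::get<0>(snd.values) == "hello";
}
```

Connecting `std::move(snd)` instead would select the rvalue overload and move the string into the operation state.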

### `execution::just_error` ### {#design-sender-factory-just_error}

<pre highlight="c++">
execution::sender auto just_error(
    auto && error
);
</pre>

Returns a sender with no [=completion scheduler|completion schedulers=], which completes with the specified error. If the provided error is an lvalue reference, a copy is made inside the returned sender and a non-const lvalue reference to the copy is sent to the receiver's `set_error`. If the provided error is an rvalue reference, it is moved into the returned sender and an rvalue reference to it is sent to the receiver's `set_error`.

### `execution::just_stopped` ### {#design-sender-factory-just_stopped}

<pre highlight="c++">
execution::sender auto just_stopped();
</pre>

Returns a sender with no [=completion scheduler|completion schedulers=], which completes immediately by calling the receiver's `set_stopped`.

### `execution::read` ### {#design-sender-factory-read}

<pre highlight="c++">
execution::sender auto read(auto tag);

execution::sender auto get_scheduler() {
  return read(execution::get_scheduler);
}
execution::sender auto get_delegatee_scheduler() {
  return read(execution::get_delegatee_scheduler);
}
execution::sender auto get_allocator() {
  return read(execution::get_allocator);
}
execution::sender auto get_stop_token() {
  return read(execution::get_stop_token);
}
</pre>

Returns a sender that reaches into a receiver's environment and pulls out the current value associated with the customization point denoted by `tag`. It then sends the value read back to the receiver through the value channel. For instance, `get_scheduler()` (with no arguments) is a sender that asks the receiver for the currently suggested `scheduler` and passes it to the receiver's `set_value` completion-signal.

This can be useful when scheduling nested dependent work. The following sender pulls the current scheduler into the value channel and then schedules more work onto it.

    <pre highlight="c++">
    execution::sender auto task =
      execution::get_scheduler()
        | execution::let_value([](auto sched) {
            return execution::on(sched, <i>some nested work here</i>);
        });

    this_thread::sync_wait( std::move(task) ); // wait for it to finish
    </pre>

This code uses the fact that `sync_wait` associates a scheduler with the receiver that it connects with `task`. `get_scheduler()` reads that scheduler out of the receiver, and passes it to `let_value`'s receiver's `set_value` function, which in turn passes it to the lambda. That lambda returns a new sender that uses the scheduler to schedule some nested work onto `sync_wait`'s scheduler.

## User-facing sender adaptors ## {#design-sender-adaptors}

A [=sender adaptor=] is an algorithm that takes one or more senders, which it may `execution::connect`, as parameters, and returns a sender, whose completion is related to the sender arguments it has received.

Sender adaptors are <i>lazy</i>, that is, they are never allowed to submit any work for execution prior to the returned sender being [=started=] later on, and are also guaranteed to not start any input senders passed into them. Sender consumers
such as [[#design-sender-consumer-start_detached]] and [[#design-sender-consumer-sync_wait]] start senders.

For a more implementer-centric description of starting senders, see [[#design-laziness]].

### `execution::transfer` ### {#design-sender-adaptor-transfer}

<pre highlight="c++">
execution::sender auto transfer(
    execution::sender auto input,
    execution::scheduler auto scheduler
);
</pre>

Returns a sender describing the transition from the execution agent of the input sender to the execution agent of the target scheduler. See [[#design-transitions]].

<pre highlight="c++">
execution::scheduler auto cpu_sched = get_system_thread_pool().scheduler();
execution::scheduler auto gpu_sched = cuda::scheduler();

execution::sender auto cpu_task = execution::schedule(cpu_sched);
// cpu_task describes the creation of a new task on the system thread pool

execution::sender auto gpu_task = execution::transfer(cpu_task, gpu_sched);
// gpu_task describes the transition of the task graph described by cpu_task to the gpu
</pre>

### `execution::then` ### {#design-sender-adaptor-then}

<pre highlight="c++">
execution::sender auto then(
    execution::sender auto input,
    std::invocable&lt;<i>values-sent-by(input)</i>...> function
);
</pre>

`then` returns a sender describing the task graph described by the input sender, with an added node of invoking the provided function with the values [=send|sent=] by the input sender as arguments.

`then` is **guaranteed** to not begin executing `function` until the returned sender is started.

<pre highlight="c++">
execution::sender auto input = get_input();
execution::sender auto snd = execution::then(input, [](auto... args) {
    std::print(args...);
});
// snd describes the work described by input
// followed by printing all of the values sent by input
</pre>

This adaptor is included as it is necessary for writing any sender code that actually performs a useful function.
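
To make the laziness guarantee concrete, here is a rough sketch of how an adaptor like `then` can be built by wrapping the downstream receiver. Everything in the `toy` namespace, including the inline `just_int` source and the member-function `connect`, is a simplification assumed for this example and handles only the value channel:

```cpp
#include <cassert>
#include <utility>

namespace toy {

// A trivial source sender that completes inline with one int.
struct just_int {
  int value;

  template <class Receiver>
  struct operation {
    int value;
    Receiver receiver;
    void start() { receiver.set_value(value); }
  };

  template <class Receiver>
  operation<Receiver> connect(Receiver r) const {
    return {value, std::move(r)};
  }
};

// Receiver adaptor: applies fn to the incoming values and forwards the result.
template <class Fn, class Receiver>
struct then_receiver {
  Fn fn;
  Receiver next;

  template <class... Args>
  void set_value(Args&&... args) {
    next.set_value(fn(std::forward<Args>(args)...));
  }
};

// Sender adaptor: wraps the input sender; nothing runs until start().
template <class Sender, class Fn>
struct then_sender {
  Sender input;
  Fn fn;

  template <class Receiver>
  auto connect(Receiver r) const {
    return input.connect(then_receiver<Fn, Receiver>{fn, std::move(r)});
  }
};

template <class Sender, class Fn>
then_sender<Sender, Fn> then(Sender s, Fn f) {
  return {std::move(s), std::move(f)};
}

}  // namespace toy

struct store_int {
  int* out;
  void set_value(int v) { *out = v; }
};

// Builds just_int(20) -> then(+1) -> then(*2) and runs it.
inline int run_then_chain() {
  int calls = 0;
  auto snd = toy::then(
      toy::then(toy::just_int{20}, [&calls](int x) { ++calls; return x + 1; }),
      [&calls](int x) { ++calls; return x * 2; });
  assert(calls == 0);  // building the chain runs no function

  int result = 0;
  auto op = snd.connect(store_int{&result});
  assert(calls == 0);  // connecting runs no function either
  op.start();
  assert(calls == 2);
  return result;  // (20 + 1) * 2
}
```

The functions run only inside `start()`, which is the property the guarantee above describes.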

### `execution::upon_*` ### {#design-sender-adaptor-upon}

<pre highlight="c++">
execution::sender auto upon_error(
    execution::sender auto input,
    std::invocable&lt;<i>errors-sent-by(input)</i>...> function
);

execution::sender auto upon_stopped(
    execution::sender auto input,
    std::invocable auto function
);
</pre>

`upon_error` and `upon_stopped` are similar to `then`, but where `then` works with values sent by the input sender, `upon_error` works with errors, and `upon_stopped` is invoked when the "stopped" signal is sent.

### `execution::let_*` ### {#design-sender-adaptor-let}

<pre highlight="c++">
execution::sender auto let_value(
    execution::sender auto input,
    std::invocable&lt;<i>values-sent-by(input)</i>...> function
);

execution::sender auto let_error(
    execution::sender auto input,
    std::invocable&lt;<i>errors-sent-by(input)</i>...> function
);

execution::sender auto let_stopped(
    execution::sender auto input,
    std::invocable auto function
);
</pre>

`let_value` is very similar to `then`: when it is started, it invokes the provided function with the values [=send|sent=] by the input sender as arguments. However, where the sender returned from `then` sends exactly what that function ends up returning,
`let_value` requires that the function return a sender, and the sender returned by `let_value` sends the values sent by the sender returned from the callback. This is similar to the notion of "future unwrapping" in future/promise-based frameworks.

`let_value` is **guaranteed** to not begin executing `function` until the returned sender is started.

`let_error` and `let_stopped` are similar to `let_value`, but where `let_value` works with values sent by the input sender, `let_error` works with errors, and `let_stopped` is invoked when the "stopped" signal is sent.

### `execution::on` ### {#design-sender-adaptor-on}

<pre highlight="c++">
execution::sender auto on(
    execution::scheduler auto sched,
    execution::sender auto snd
);
</pre>

Returns a sender which, when started, will start the provided sender on an execution agent belonging to the execution resource associated with the provided scheduler. This returned sender has no [=completion scheduler|completion schedulers=].

### `execution::into_variant` ### {#design-sender-adaptor-into_variant}

<pre highlight="c++">
execution::sender auto into_variant(
    execution::sender auto snd
);
</pre>

Returns a sender which sends a variant of tuples of all the possible sets of types sent by the input sender. Senders can send multiple sets of values depending on runtime conditions; this is a helper function that turns them into a single variant value.
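
As an illustration of the resulting type, suppose a sender can complete with either `set_value(int)` or `set_value(std::string, double)`. The sketch below models only the shape of the adapted value; `result_variant`, `pack`, and `describe` are hypothetical names, not part of the proposed interface:

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>
#include <variant>

// One tuple alternative per possible set of value types.
using result_variant =
    std::variant<std::tuple<int>, std::tuple<std::string, double>>;

// Stand-in for the adapted completion: packs whichever argument set arrived
// into the matching tuple alternative.
template <class... Args>
result_variant pack(Args&&... args) {
  return result_variant{
      std::tuple<std::decay_t<Args>...>(std::forward<Args>(args)...)};
}

// A consumer now handles a single value type and dispatches with std::visit.
inline std::string describe(const result_variant& v) {
  return std::visit(
      [](const auto& tup) -> std::string {
        if constexpr (std::tuple_size_v<std::decay_t<decltype(tup)>> == 1) {
          return "int: " + std::to_string(std::get<0>(tup));
        } else {
          return "string: " + std::get<0>(tup);
        }
      },
      v);
}
```

Because the consumer sees exactly one value type, such a sender can also be awaited from a coroutine.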

### `execution::stopped_as_optional` ### {#design-sender-adaptor-stopped_as_optional}

<pre highlight="c++">
execution::sender auto stopped_as_optional(
    <i>single-sender</i> auto snd
);
</pre>

Returns a sender that maps the value channel from a `T` to an `optional<decay_t<T>>`, and maps the stopped channel to a value of an empty `optional<decay_t<T>>`.
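
The mapping can be pictured as a receiver adaptor that funnels two channels into one. This is a simplified sketch: `maybe_int` completes its receiver inline through a hypothetical `run` member rather than through `connect` and `start`, and all names are assumptions for illustration:

```cpp
#include <cassert>
#include <optional>
#include <utility>

namespace toy {

// A source that completes inline with a value, or with "stopped" when
// cancelled is true.
struct maybe_int {
  int value;
  bool cancelled;

  template <class Receiver>
  void run(Receiver r) const {
    if (cancelled) {
      r.set_stopped();
    } else {
      r.set_value(value);
    }
  }
};

// Receiver adaptor: both the value and the stopped completion of the input
// are forwarded to the value channel of the downstream receiver.
template <class T, class Receiver>
struct stopped_as_optional_receiver {
  Receiver next;
  void set_value(T v) { next.set_value(std::optional<T>(std::move(v))); }
  void set_stopped() { next.set_value(std::optional<T>()); }
};

}  // namespace toy

struct store_optional {
  std::optional<int>* out;
  void set_value(std::optional<int> v) { *out = v; }
};

inline std::optional<int> run_as_optional(toy::maybe_int source) {
  std::optional<int> result = -1;  // sentinel, overwritten on completion
  source.run(toy::stopped_as_optional_receiver<int, store_optional>{
      store_optional{&result}});
  return result;
}
```

Downstream code never sees the stopped channel; it only tests whether the optional is engaged.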

### `execution::stopped_as_error` ### {#design-sender-adaptor-stopped_as_error}

<pre highlight="c++">
template&lt;move_constructible Error>
execution::sender auto stopped_as_error(
    execution::sender auto snd,
    Error err
);
</pre>

Returns a sender that maps the stopped channel to an error of `err`.

### `execution::bulk` ### {#design-sender-adaptor-bulk}

<pre highlight="c++">
execution::sender auto bulk(
    execution::sender auto input,
    std::integral auto shape,
    invocable&lt;decltype(shape), <i>values-sent-by(input)</i>...> function
);
</pre>

Returns a sender describing the task of invoking the provided function with every index in the provided shape along with the values sent by the input sender. The returned sender completes once all invocations have completed, or an error has occurred. If it completes
by sending values, they are equivalent to those sent by the input sender.

No instance of `function` will begin executing until the returned sender is started. Each invocation of `function` runs in an execution agent whose forward progress guarantees are determined by the scheduler on which they are run. All agents created by a single use
of `bulk` execute with the same guarantee. The number of execution agents used by `bulk` is not specified. This allows a scheduler to execute some invocations of the `function` in parallel.

In this proposal, only integral types are used to specify the shape of the bulk section. We expect that future papers may wish to explore extensions of the interface to support additional kinds of shapes, such as multi-dimensional grids, that are commonly used for
parallel computing tasks.
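
For intuition about what the function is called with, here is a purely sequential stand-in. `bulk_sequential` is a hypothetical helper that runs eagerly and ignores senders entirely; a real scheduler is free to run the invocations in parallel:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Calls fn(i, values...) for every index i in [0, shape), passing the same
// values by reference to each invocation.
template <class Shape, class Fn, class... Values>
void bulk_sequential(Shape shape, Fn fn, Values&... values) {
  for (Shape i = 0; i < shape; ++i) {
    fn(i, values...);
  }
}

inline std::vector<int> square_all(std::vector<int> data) {
  bulk_sequential(
      data.size(),
      [](std::size_t i, std::vector<int>& v) { v[i] *= v[i]; },
      data);
  return data;  // the (possibly modified) values are what gets passed on
}
```

Each invocation touches only its own index, which is what makes it safe for an implementation to run the calls concurrently.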

### `execution::split` ### {#design-sender-adaptor-split}

<pre highlight="c++">
execution::sender auto split(execution::sender auto sender);
</pre>

If the provided sender is a multi-shot sender, returns that sender. Otherwise, returns a multi-shot sender which sends values equivalent to the values sent by the provided sender. See [[#design-shot]].

### `execution::when_all` ### {#design-sender-adaptor-when_all}

<pre highlight="c++">
execution::sender auto when_all(
    execution::sender auto ...inputs
);

execution::sender auto when_all_with_variant(
    execution::sender auto ...inputs
);
</pre>

`when_all` returns a sender that completes once all of the input senders have completed. It is constrained to only accept senders that can complete with a single set of values (_i.e._, it only calls one overload of `set_value` on its receiver). The values sent by this sender are the values sent by each of the input senders, in order of the arguments passed to `when_all`. It completes inline on the execution resource on which the last input sender completes, unless stop is requested before `when_all` is started, in which case it completes inline within the call to `start`.

`when_all_with_variant` does the same, but it adapts all the input senders using `into_variant`, and so it does not constrain the input arguments as `when_all` does.

The returned sender has no [=completion scheduler|completion schedulers=].

<pre highlight="c++">
execution::sender auto sends_1 = ...;
execution::sender auto sends_abc = ...;

execution::sender auto both = execution::when_all(
    sends_1,
    sends_abc
);

execution::sender auto final = execution::then(both, [](auto... args){
    std::cout << std::format("the two args: {}, {}", args...);
});
// when final executes, it will print "the two args: 1, abc"
</pre>
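
The ordering of the combined values can be modelled with plain tuples. `join_results` is a hypothetical helper that stands in for the value completion only; it says nothing about how the inputs are run or synchronized:

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>

// Each argument is the tuple of values one input sent; the result lists
// them in argument order.
template <class... Tuples>
auto join_results(Tuples&&... results) {
  return std::tuple_cat(std::forward<Tuples>(results)...);
}
```

With inputs that send `1` and `"abc"`, the joined result holds `1` first and `"abc"` second, regardless of which input finished first.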

### `execution::ensure_started` ### {#design-sender-adaptor-ensure_started}

<pre highlight="c++">
execution::sender auto ensure_started(
    execution::sender auto sender
);
</pre>

Once `ensure_started` returns, it is known that the provided sender has been [=connect|connected=] and `start` has been called on the resulting operation state (see [[#design-states]]); in other words, the work described by the provided sender has been submitted
for execution on the appropriate execution resources. Returns a sender which completes when the provided sender completes and sends values equivalent to those of the provided sender.

If the returned sender is destroyed before `execution::connect()` is called, or if `execution::connect()` is called but the
returned operation-state is destroyed before `execution::start()` is called, then a stop-request is sent to the eagerly launched
operation and the operation is detached and will run to completion in the background. Its result will be discarded when it
eventually completes.

Note that the application will need to make sure that resources are kept alive in the case that the operation detaches,
e.g., by holding a `std::shared_ptr` to those resources or otherwise having some out-of-band way to signal completion of
the operation so that resource release can be sequenced after the completion.

## User-facing sender consumers ## {#design-sender-consumers}

A [=sender consumer=] is an algorithm that takes one or more senders, which it may `execution::connect`, as parameters, and does not return a sender.

### `execution::start_detached` ### {#design-sender-consumer-start_detached}

<pre highlight="c++">
void start_detached(
    execution::sender auto sender
);
</pre>

Like `ensure_started`, but does not return a value; if the provided sender sends an error instead of a value, `std::terminate` is called.

### `this_thread::sync_wait` ### {#design-sender-consumer-sync_wait}

<pre highlight="c++">
auto sync_wait(
    execution::sender auto sender
) requires (<i>always-sends-same-values</i>(sender))
    -> std::optional&lt;std::tuple&lt;<i>values-sent-by</i>(sender)>>;
</pre>

`this_thread::sync_wait` is a sender consumer that submits the work described by the provided sender for execution, similarly to `ensure_started`, except that it blocks <b>the current `std::thread` or thread of `main`</b> until the work is completed, and returns
an optional tuple of values that were sent by the provided sender on its completion of work. Where [[#design-sender-factory-schedule]] and [[#design-sender-factory-just]] are meant to <i>enter</i> the domain of senders, `sync_wait` is meant to <i>exit</i> the domain of
senders, retrieving the result of the task graph.

If the provided sender sends an error instead of values, `sync_wait` throws that error as an exception, or rethrows the original exception if the error is of type `std::exception_ptr`.

If the provided sender sends the "stopped" signal instead of values, `sync_wait` returns an empty optional.
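
The three outcomes can be summarized in a small model. `toy_sync_wait` below is an assumption-laden sketch: it takes a callable that completes a recording receiver inline, supports only a single `int` value and `std::exception_ptr` errors, and never actually blocks:

```cpp
#include <cassert>
#include <exception>
#include <optional>
#include <stdexcept>
#include <tuple>
#include <utility>
#include <variant>

// What the receiver recorded: nothing yet, values, an error, or "stopped".
struct stopped_t {};
using completion = std::variant<std::monostate, std::tuple<int>,
                                std::exception_ptr, stopped_t>;

struct recording_receiver {
  completion* slot;
  void set_value(int v) { slot->emplace<std::tuple<int>>(v); }
  void set_error(std::exception_ptr e) {
    slot->emplace<std::exception_ptr>(std::move(e));
  }
  void set_stopped() { slot->emplace<stopped_t>(); }
};

// Runs the work and converts the recorded completion into a return value,
// a thrown exception, or an empty optional.
template <class Work>
std::optional<std::tuple<int>> toy_sync_wait(Work work) {
  completion c;
  work(recording_receiver{&c});
  assert(!std::holds_alternative<std::monostate>(c));  // must have completed
  if (auto* values = std::get_if<std::tuple<int>>(&c)) return *values;
  if (auto* error = std::get_if<std::exception_ptr>(&c)) {
    std::rethrow_exception(*error);
  }
  return std::nullopt;  // the stopped completion
}

// Example work item that completes through the error channel.
inline void fail_with_runtime_error(recording_receiver r) {
  r.set_error(std::make_exception_ptr(std::runtime_error("boom")));
}
```

A caller therefore distinguishes success from cancellation by testing the optional, and handles failure with an ordinary `try`/`catch`.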

For an explanation of the `requires` clause, see [[#design-typed]]. That clause also explains another sender consumer, built on top of `sync_wait`: `sync_wait_with_variant`.

Note: This function is specified inside `std::this_thread`, and not inside `execution`. This is because `sync_wait` has to block the <i>current</i> execution agent, but determining what the current execution agent is is not reliable. Since the standard
does not specify any functions on the current execution agent other than those in `std::this_thread`, this is the flavor of this function that is being proposed. If C++ ever obtains fibers, for instance, we expect that a variant of this function called
`std::this_fiber::sync_wait` would be provided. We also expect that runtimes with execution agents that use different synchronization mechanisms than `std::thread`'s will provide their own flavors of `sync_wait` as well (assuming their execution agents have the means
to block in a non-deadlock manner).

## `execution::execute` ## {#design-execute}

In addition to the three categories of functions presented above, we also propose to include a convenience function for fire-and-forget eager one-way submission of an invocable to a scheduler, to fulfil the role of one-way executors from P0443.

<pre highlight="c++">
void execution::execute(
    execution::scheduler auto sched,
    std::invocable auto fn
);
</pre>

Submits the provided function for execution on the provided scheduler, as-if by:

<pre highlight="c++">
auto snd = execution::schedule(sched);
auto work = execution::then(snd, fn);
execution::start_detached(work);
</pre>

# Design - implementer side # {#design-implementer}

## Receivers serve as glue between senders ## {#design-receivers}

A [=receiver=] is a callback that supports more than one channel. In fact, it supports three of them:

* `set_value`, which is the moral equivalent of an `operator()` or a function
    call, which signals successful completion of the operation its execution
    depends on;
* `set_error`, which signals that an error has happened during scheduling of the
    current work, executing the current work, or at some earlier point in the
    sender chain; and
* `set_stopped`, which signals that the operation completed without succeeding
    (`set_value`) and without failing (`set_error`). This result is often used
    to indicate that the operation stopped early, typically because it was asked
    to do so because the result is no longer needed.

Once an async operation has been started exactly one of these functions must be invoked
on a receiver before it is destroyed.

While the receiver interface may look novel, it is in fact very similar to the
interface of `std::promise`, which provides the first two signals as `set_value`
and `set_exception`, and it's possible to emulate the third channel with
lifetime management of the promise.

Receivers are not a part of the end-user-facing API of this proposal; they are necessary to allow unrelated senders to communicate with each other, but the only users who will interact with receivers directly are authors of senders.

Receivers are what is passed as the second argument to [[#design-connect]].

## Operation states represent work ## {#design-states}

An [=operation state=] is an object that represents work. Unlike senders, it is not a chaining mechanism; instead, it is a concrete object that packages the work described by a full sender chain, ready to be executed. An operation state is neither movable nor
copyable, and its interface consists of a single algorithm: `start`, which serves as the submission point of the work represented by a given operation state.

Operation states are not a part of the user-facing API of this proposal; they are necessary for implementing sender consumers like `execution::ensure_started` and `this_thread::sync_wait`, and the knowledge of them is necessary to implement senders, so the only users who will
interact with operation states directly are authors of senders and authors of sender algorithms.

The return value of [[#design-connect]] must satisfy the operation state concept.
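
One might wonder how a function can return an object that is neither movable nor copyable. Since C++17, a prvalue returned from a function initializes the caller's object directly, so no move is required. The sketch below uses hypothetical `toy_operation` and `toy_connect` names to show the idea:

```cpp
#include <cassert>
#include <type_traits>

// A toy operation state: it packages the work (store value into *out) and
// exposes only start().
class toy_operation {
 public:
  toy_operation(int value, int* out) : value_(value), out_(out) {}
  toy_operation(const toy_operation&) = delete;
  toy_operation(toy_operation&&) = delete;

  void start() { *out_ = value_; }

 private:
  int value_;
  int* out_;
};

static_assert(!std::is_move_constructible_v<toy_operation>);
static_assert(!std::is_copy_constructible_v<toy_operation>);

// Stand-in for connect: returns the operation state as a prvalue, which
// initializes the caller's object in place.
inline toy_operation toy_connect(int value, int* out) {
  return toy_operation(value, out);
}

inline int run_toy_operation() {
  int result = 0;
  toy_operation op = toy_connect(42, &result);  // no move takes place
  assert(result == 0);  // connecting alone performs no work
  op.start();
  return result;
}
```

Keeping the object immovable means its address is stable, so work that has been started can safely hold pointers into it.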

## `execution::connect` ## {#design-connect}

`execution::connect` is a customization point which [=connects=] senders with receivers, resulting in an operation state. If `start` is called on that operation state, it ensures that one of the completion operations will eventually be called on the receiver passed to `connect`.

<pre highlight="c++">
execution::sender auto snd = <i>some input sender</i>;
execution::receiver auto rcv = <i>some receiver</i>;
execution::operation_state auto state = execution::connect(snd, rcv);

execution::start(state);
// at this point, it is guaranteed that the work represented by state has been submitted
// to an execution resource, and that execution resource will eventually call one of the
// completion operations on rcv

// operation states are not movable, and therefore this operation state object must be
// kept alive until the operation finishes
</pre>

## Sender algorithms are customizable ## {#design-customization}

Senders being able to advertise what their [=completion schedulers=] are fulfills one of the promises of senders: that of being able to customize an implementation of a sender algorithm based on what scheduler any work it depends on will complete on.

The simple way to provide customizations for functions like `then`, that is for [=sender adaptors=] and [=sender consumers=], is to follow the customization scheme that has been adopted for the C++20 ranges library; to do that, we would define
the expression `execution::then(sender, invocable)` to be equivalent to:

  1. `sender.then(invocable)`, if that expression is well-formed; otherwise
  2. `then(sender, invocable)`, performed in a context where this call always performs ADL, if that expression is well-formed; otherwise
  3. a default implementation of `then`, which returns a sender adaptor, and then define the exact semantics of said adaptor.

However, this definition is problematic. Imagine another sender adaptor, `bulk`, which is a structured abstraction for a loop over an index space. Its default implementation is just a for loop. However, for accelerator runtimes like CUDA, we would like sender algorithms
like `bulk` to have specialized behavior, which invokes a kernel of more than one thread (with its size defined by the call to `bulk`); therefore, we would like to customize `bulk` for CUDA senders to achieve this. However, there's no reason for CUDA kernels to
necessarily customize the `then` sender adaptor, as the generic implementation is perfectly sufficient. This creates a problem, though; consider the following snippet:

<pre highlight="c++">
execution::scheduler auto cuda_sch = cuda_scheduler{};

execution::sender auto initial = execution::schedule(cuda_sch);
// the type of initial is a type defined by the cuda_scheduler
// let's call it cuda::schedule_sender&lt;>

execution::sender auto next = execution::then(initial, []{ return 1; });
// the type of next is a standard-library unspecified sender adaptor
// that wraps the cuda sender
// let's call it execution::then_sender_adaptor&lt;cuda::schedule_sender&lt;>>

execution::sender auto kernel_sender = execution::bulk(next, shape, [](int i){ ... });
</pre>

How can we specialize the `bulk` sender adaptor for our wrapped `schedule_sender`? Well, here's one possible approach, taking advantage of ADL (and the fact that the definition of "associated namespace" also recursively enumerates the associated namespaces of all template
parameters of a type):

<pre highlight="c++">
namespace cuda::for_adl_purposes {
template&lt;typename... SentValues>
class schedule_sender {
    execution::operation_state auto connect(execution::receiver auto rcv);
    execution::scheduler auto get_completion_scheduler() const;
};

execution::sender auto bulk(
    execution::sender auto && input,
    execution::shape auto && shape,
    invocable&lt;<i>sender-values(input)</i>&gt; auto && fn)
{
    // return a cuda sender representing a bulk kernel launch
}
} // namespace cuda::for_adl_purposes
</pre>

However, if the input sender is not just a `then_sender_adaptor` like in the example above, but another sender that overrides `bulk` by itself, as a member function, because its author believes they know an optimization for bulk - the specialization above will no
longer be selected, because a member function of the first argument is a better match than the ADL-found overload.

This means that well-meant specialization of sender algorithms that are entirely scheduler-agnostic can have negative consequences.
The scheduler-specific specialization - which is essential for good performance on platforms providing specialized ways to launch certain sender algorithms - would not be selected in such cases.
But it's really the scheduler that should control the behavior of sender algorithms when a non-default implementation exists, not the sender. Senders merely describe work; schedulers, however, are the handle to the
runtime that will eventually execute said work, and should thus have the final say in *how* the work is going to be executed.

Therefore, we are proposing the following customization scheme (also modified to take [[#design-dispatch]] into account): the expression `execution::<sender-algorithm>(sender, args...)`, for any given sender algorithm that accepts a sender as its first argument, should be
equivalent to:

  1. <code>tag_invoke(&lt;sender-algorithm>, get_completion_scheduler&lt;<i>Tag</i>>(get_env(sender)), sender, args...)</code>, if that expression is well-formed; otherwise
  2. `tag_invoke(<sender-algorithm>, sender, args...)`, if that expression is well-formed; otherwise
  3. a default implementation, if there exists a default implementation of the given sender algorithm.

where <code><i>Tag</i></code> is one of `set_value`, `set_error`, or `set_stopped`. For most sender algorithms, the completion scheduler for `set_value` would be used, but for some (like `upon_error` or `let_stopped`), one of the others would be used.

For sender algorithms which accept concepts other than `sender` as their first argument, we propose that the customization scheme remains as it has been in [[P0443R14]] so far, except it should also use `tag_invoke`.

## Sender adaptors are lazy ## {#design-laziness}

Contrary to early revisions of this paper, we propose to make all sender adaptors perform strictly lazy submission, unless specified otherwise (the one notable exception in this paper is [[#design-sender-adaptor-ensure_started]], whose sole purpose is to start an
input sender).

 <dfn export=true>Strictly lazy submission</dfn> means that there is a guarantee that no work is submitted to an execution resource before a receiver is connected to a sender, and `execution::start` is called on the resulting operation state.
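
To make the guarantee concrete, the following sketch builds a two-stage chain from toy types and counts how often the user-provided function has run at each step. It is only a model: `lazy_sketch`, its `then`, `just_sender` and `store_receiver` are hypothetical simplifications that send a single `int` and do not use the proposed API:

```cpp
#include <cassert>
#include <utility>

namespace lazy_sketch {

template <class Receiver>
struct just_operation {
    int value;
    Receiver rcv;
    void start() { rcv.set_value(value); }
};

struct just_sender {
    int value;
    template <class Receiver>
    just_operation<Receiver> connect(Receiver rcv) const {
        return {value, std::move(rcv)};
    }
};

// Receiver that applies fn to the incoming value and passes the result on.
template <class Fn, class Receiver>
struct then_receiver {
    Fn fn;
    Receiver rcv;
    void set_value(int v) { rcv.set_value(fn(v)); }
};

// Lazy adaptor: it only stores its input sender and the function.
template <class Sender, class Fn>
struct then_sender {
    Sender input;
    Fn fn;
    template <class Receiver>
    auto connect(Receiver rcv) const {
        return input.connect(then_receiver<Fn, Receiver>{fn, std::move(rcv)});
    }
};

template <class Sender, class Fn>
then_sender<Sender, Fn> then(Sender snd, Fn fn) {
    return {std::move(snd), std::move(fn)};
}

struct store_receiver {
    int* slot;
    void set_value(int v) { *slot = v; }
};

struct lazy_trace {
    int calls_after_then;
    int calls_after_connect;
    int calls_after_start;
    int result;
};

inline lazy_trace trace_lazy_chain() {
    int calls = 0;
    int result = 0;
    lazy_trace trace{};

    auto work = then(just_sender{20}, [&calls](int v) { ++calls; return v + 1; });
    trace.calls_after_then = calls;     // describing the work runs nothing

    auto state = work.connect(store_receiver{&result});
    trace.calls_after_connect = calls;  // connecting a receiver runs nothing

    state.start();
    trace.calls_after_start = calls;    // only start submits the work
    trace.result = result;
    return trace;
}

}  // namespace lazy_sketch
```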

## Lazy senders provide optimization opportunities ## {#design-fusion}

Because lazy senders fundamentally *describe* work, instead of describing or representing the submission of said work to an execution resource, and thanks to the flexibility of the customization of most sender algorithms, they provide an opportunity for fusing
multiple algorithms in a sender chain together, into a single function that can later be submitted for execution by an execution resource. There are two ways this can happen.

The first (and most common) way for such optimizations to happen is thanks to the structure of the implementation: because all the work is done within callbacks invoked on the completion of an earlier sender, recursively up to the original source of computation,
the compiler is able to see a chain of work described using senders as a tree of tail calls, allowing for inlining and removal of most of the sender machinery. In fact, when work is not submitted to execution resources outside of the current thread of execution,
compilers are capable of removing the senders abstraction entirely, while still allowing for composition of functions across different parts of a program.

The second way for this to occur is when a sender algorithm is specialized for a specific set of arguments. For instance, we expect that, for senders which are known to have been started already, [[#design-sender-adaptor-ensure_started]] will be an identity transformation,
because the sender algorithm will be specialized for such senders. Similarly, an implementation could recognize two subsequent [[#design-sender-adaptor-bulk]]s of compatible shapes, and merge them together into a single submission of a GPU kernel.

## Execution resource transitions are two-step ## {#design-transition-details}

Because `execution::transfer` takes a sender as its first argument, it is not actually directly customizable by the target scheduler. This is by design: the target scheduler may not know how to transition <i>from</i> a scheduler such as a CUDA scheduler;
transitioning away from a GPU in an efficient manner requires making runtime calls that are specific to the GPU in question, and the same is usually true for other kinds of accelerators too (or for schedulers running on remote systems). To avoid this problem,
specialized schedulers like the ones mentioned here can still hook into the transition mechanism, and inject a sender which will perform a transition to the regular CPU execution resource, so that any sender can be attached to it.

This, however, is a problem: because customization of sender algorithms must be controlled by the scheduler they will run on (see [[#design-customization]]), the type of the sender returned from `transfer` must be controllable by the target scheduler. Besides, the target
scheduler may itself represent a specialized execution resource, which requires additional work to be performed to transition <i>to</i> it. GPUs and remote node schedulers are once again good examples of such schedulers: executing code on their execution resources
requires making runtime API calls for work submission, and quite possibly for the data movement of the values being sent by the input sender passed into `transfer`.

To allow for such customization from both ends, we propose the inclusion of a secondary transitioning sender adaptor, called `schedule_from`. This adaptor is a form of `schedule`, but takes an additional, second argument: the input sender. This adaptor is not
meant to be invoked manually by the end users; they are always supposed to invoke `transfer`, to ensure that both schedulers have a say in how the transitions are made. Any scheduler that specializes `transfer(snd, sch)` shall ensure that the
return value of their customization is equivalent to `schedule_from(sch, snd2)`, where `snd2` is a successor of `snd` that sends values equivalent to those sent by `snd`.

The default implementation of `transfer(snd, sched)` is `schedule_from(sched, snd)`.
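
A rough model of how both schedulers get a say is sketched below. It deliberately uses ordinary overloads instead of `tag_invoke`, and represents a sender as the list of steps it would perform; `transition_sketch`, the two scheduler types and the step descriptions are all assumptions made for illustration:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

namespace transition_sketch {

struct cpu_scheduler {};
struct gpu_scheduler {};

// Toy sender: records the steps it performs, tagged with the scheduler it completes on.
template <class Scheduler>
struct sender {
    std::vector<std::string> steps;
};

// schedule_from: the *target* scheduler decides how work arrives on it.
inline sender<cpu_scheduler> schedule_from(cpu_scheduler, sender<cpu_scheduler> snd) {
    snd.steps.push_back("enqueue on CPU");
    return snd;
}

inline sender<gpu_scheduler> schedule_from(gpu_scheduler, sender<cpu_scheduler> snd) {
    snd.steps.push_back("copy values to device");
    snd.steps.push_back("launch on GPU");
    return sender<gpu_scheduler>{std::move(snd.steps)};
}

// Default transfer: simply schedule_from(target, snd).
template <class Target>
auto transfer(sender<cpu_scheduler> snd, Target target) {
    return schedule_from(target, std::move(snd));
}

// Customization by the *source* scheduler: first produce a successor sender
// that is back on the CPU, then hand that successor to schedule_from.
template <class Target>
auto transfer(sender<gpu_scheduler> snd, Target target) {
    snd.steps.push_back("copy values to host");
    return schedule_from(target, sender<cpu_scheduler>{std::move(snd.steps)});
}

}  // namespace transition_sketch
```

In this model, transferring from the toy GPU scheduler to the toy CPU scheduler yields the steps "copy values to host" followed by "enqueue on CPU": the first comes from the source scheduler's customization of `transfer`, the second from the target scheduler's `schedule_from`.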

## All senders are typed ## {#design-typed}

All senders must advertise the types they will [=send=] when they complete.
This is necessary for a number of features, and writing code in a way that's
agnostic of whether an input sender is typed or not in common sender adaptors
such as `execution::then` is hard.

The mechanism for this advertisement is similar to the one in [[P0443R14]]; the
way to query the types is through `completion_signatures_of_t<S,
[Env]>::value_types<tuple_like, variant_like>`.

`completion_signatures_of_t::value_types` is a template that takes two
arguments: one is a tuple-like template, the other is a variant-like template.
The tuple-like argument is required to represent senders sending more than one
value (such as `when_all`). The variant-like argument is required to represent
senders that choose which specific values to send at runtime.

There's a choice made in the specification of
[[#design-sender-consumer-sync_wait]]: it returns a tuple of values sent by the
sender passed to it, wrapped in `std::optional` to handle the `set_stopped`
signal. However, this assumes that those values can be represented as a tuple,
like here:

<pre highlight="c++">
execution::sender auto sends_1 = ...;
execution::sender auto sends_2 = ...;
execution::sender auto sends_3 = ...;

auto [a, b, c] = this_thread::sync_wait(
    execution::when_all(
        sends_1,
        sends_2,
        sends_3)
    | execution::transfer(
        execution::get_completion_scheduler&lt;execution::set_value_t>(get_env(sends_1)))
    ).value();
// a == 1
// b == 2
// c == 3
</pre>

This works well for senders that always send the same set of arguments. If we ignore the possibility of having a sender that sends different sets of arguments into a receiver, we can specify the "canonical" (i.e. required to be followed by all senders) form of
`value_types` of a sender which sends `Types...` to be as follows:

<pre highlight="c++">
template&lt;template&lt;typename ...> typename TupleLike>
using value_types = TupleLike&lt;Types...>;
</pre>

If senders could only ever send one specific set of values, this would probably need to be the required form of `value_types` for all senders; defining it otherwise would cause very weird results and should be considered a bug.

This matter is somewhat complicated by the fact that (1) `set_value` for receivers can be overloaded and accept different sets of arguments, and (2) senders are allowed to send multiple different sets of values, depending on runtime conditions, the data they
consumed, and so on. To accommodate this, [[P0443R14]] also includes a second template parameter to `value_types`, one that represents a variant-like type. If we permit such senders, we would almost certainly need to require that the canonical form of `value_types`
for *all* senders (to ensure consistency in how they are handled, and to avoid accidentally interpreting a user-provided variant as a sender-provided one) sending the different sets of arguments `Types1...`, `Types2...`, ..., `TypesN...` to be as follows:

<pre highlight="c++">
template&lt;
    template&lt;typename ...> typename TupleLike,
    template&lt;typename ...> typename VariantLike
>
using value_types = VariantLike&lt;
    TupleLike&lt;Types1...>,
    TupleLike&lt;Types2...>,
    ...,
    TupleLike&lt;TypesN...>
>;
</pre>

This, however, introduces a couple of complications:

1. A `just(1)` sender would also need to follow this structure, so the correct type for storing the value sent by it would be `std::variant<std::tuple<int>>` or some such. This introduces a lot of compile time overhead for the simplest senders, and this overhead
    effectively exists in all places in the code where `value_types` is queried, regardless of the tuple-like and variant-like templates passed to it. Such overhead does exist if only the tuple-like parameter exists, but is made much worse by adding this second
    wrapping layer.
2. As a consequence of (1): because `sync_wait` needs to store the above type, it can no longer return just a `std::tuple<int>` for `just(1)`; it has to return `std::variant<std::tuple<int>>`. C++ currently does not have an easy way to destructure this; it may get
    less awkward with pattern matching, but even then it seems extremely heavyweight to involve variants in this API, and for the purpose of generic code, the kind of the return type of `sync_wait` must be the same across all sender types.

One possible solution to (2) above is to place a requirement on `sync_wait` that it can only accept senders which send only a single set of values, therefore removing the need for `std::variant` to appear in its API; because of this, we propose to expose both
`sync_wait`, which is a simple, user-friendly version of the sender consumer, but requires that `value_types` have only one possible variant, and `sync_wait_with_variant`, which accepts any sender, but returns an optional whose value type is the variant of all the
possible tuples sent by the input sender:

<pre highlight="c++">
auto sync_wait_with_variant(
    execution::sender auto sender
) -> std::optional&lt;std::variant&lt;
        std::tuple&lt;<i>values<i><sub>0</sub></i>-sent-by</i>(sender)>,
        std::tuple&lt;<i>values<i><sub>1</sub></i>-sent-by</i>(sender)>,
        ...,
        std::tuple&lt;<i>values<i><sub>n</sub></i>-sent-by</i>(sender)>
    >>;

auto sync_wait(
    execution::sender auto sender
) requires (<i>always-sends-same-values</i>(sender))
    -> std::optional&lt;std::tuple&lt;<i>values-sent-by</i>(sender)>>;
</pre>

## Ranges-style CPOs vs `tag_invoke` ## {#design-dispatch}

The contemporary technique for customization in the Standard Library is customization point objects. A customization point object looks for member functions and then for nonmember functions with the same name as the customization point, and calls those if
they match. This is the technique used by the C++20 ranges library, and previous executors proposals ([[P0443R14]] and [[P1897R3]]) intended to use it as well. However, it has several unfortunate consequences:

1. It does not allow for easy propagation of customization points unknown to the adaptor to a wrapped object, which makes writing universal adaptor types much harder - and this proposal uses quite a lot of those.

2. It effectively reserves names globally. Because neither member names nor ADL-found functions can be qualified with a namespace, every customization point object that uses the ranges scheme reserves the name for all types in all namespaces. This is unfortunate
    due to the sheer number of customization points already in the paper, but also ones that we are envisioning in the future. It's also a big problem for one of the operations being proposed already: `sync_wait`. We imagine that if, in the future, C++ was to
    gain fibers support, we would want to also have `std::this_fiber::sync_wait`, in addition to `std::this_thread::sync_wait`. However, because we would want the names to be the same in both cases, we would need to make the names of the customizations not match the
    names of the customization points. This is undesirable.

This paper proposes to instead use the mechanism described in [[P1895R0]]: `tag_invoke`; the wording for `tag_invoke` has been incorporated into the proposed specification in this paper.

In short, instead of using globally reserved names, `tag_invoke` uses the <i>type</i> of the customization point object itself as the mechanism to find customizations. It globally reserves only a single name - `tag_invoke` - which itself is used the same way that
ranges-style customization points are used. All other customization points are defined in terms of `tag_invoke`. For example, the customization for `std::this_thread::sync_wait(s)` will call `tag_invoke(std::this_thread::sync_wait, s)`, instead of attempting
to invoke `s.sync_wait()`, and then `sync_wait(s)` if the member call is not valid.

Using `tag_invoke` has the following benefits:

1. It reserves only a single global name, instead of reserving a global name for every customization point object we define.

2. It is possible to propagate customizations to a subobject, because the information of which customization point is being resolved is in the type of an argument, and not in the name of the function:

    <pre highlight="c++">
    // forward most customizations to a subobject
    template&lt;typename Tag, typename ...Args>
    friend auto tag_invoke(Tag && tag, wrapper & self, Args &&... args) {
        return std::forward&lt;Tag>(tag)(self.subobject, std::forward&lt;Args>(args)...);
    }

    // but override one of them with a specific value
    friend auto tag_invoke(specific_customization_point_t, wrapper & self) {
        return self.some_value;
    }
    </pre>

3. It is possible to pass those as template arguments to types, because the information of which customization point is being resolved is in the type. Similarly to how [[P0443R14]] defines a polymorphic executor wrapper which accepts a list of properties it
    supports, we can imagine scheduler and sender wrappers that accept a list of queries and operations they support. That list can contain the types of the customization point objects, and the polymorphic wrappers can then specialize those customization points on
    themselves using `tag_invoke`, dispatching to manually constructed vtables containing pointers to specialized implementations for the wrapped objects. For an example of such a polymorphic wrapper, see
    <code>[unifex::any_unique](https://github.com/facebookexperimental/libunifex/blob/1a6fbfc9cc3829356ccbdcf9e8d1f3cc33a6d9e0/include/unifex/any_unique.hpp)</code>
    ([example](https://github.com/facebookexperimental/libunifex/blob/1a6fbfc9cc3829356ccbdcf9e8d1f3cc33a6d9e0/examples/any_unique.cpp)).

# Specification # {#spec}

Much of this wording follows the wording of [[P0443R14]].

[[#spec-library]] is meant to be a diff relative to the wording of the <b>[library]</b> clause of [[N4885]].

[[#spec-utilities]] is meant to be a diff relative to the wording of the <b>[utilities]</b> clause of [[N4885]]. This diff applies changes from [[P1895R0]].

[[#spec-thread]] is meant to be a diff relative to the wording of the <b>[thread]</b> clause of [[N4885]]. This diff applies changes from [[P2175R0]].

[[#spec-execution]] is meant to be added as a new library clause to the working draft of C++.

# Exception handling [except]   # {#spec-except}

## Special functions [except.special]  ## {#spec-except.special}

### General [except.special.general]   ### {#spec-except.special.general}

#### The `std::terminate` function [except.terminate]   #### {#spec-except.terminate}

<div class="ed-note">At the end of the bulleted list in the Note in paragraph 1, add a new bullet as follows:</div>

<ins>
<blockquote>
 - when a callback invocation exits via an exception when requesting stop on a `std::stop_source`
    or a `std::in_place_stop_source` ([stopsource.mem], [stopsource.inplace.mem]), or in
    the constructor of `std::stop_callback` or `std::in_place_stop_callback`
    ([stopcallback.cons], [stopcallback.inplace.cons]) when a callback invocation exits
    via an exception.

</blockquote>
</ins>

# Library introduction <b>[library]</b> # {#spec-library}

<div class="ed-note">Add the header `<execution>` to Table 23: C++ library headers [tab:headers.cpp]

In subclause [conforming], after [lib.types.movedfrom], add the following new subclause with suggested stable name [lib.tmpl-heads].</div>

<ins>
<blockquote>
**16.4.6.17  Class template-heads**

1. If a class template's template-head is marked with "*arguments are not
    associated entities*", any template arguments do not contribute to the
    associated entities ([basic.lookup.argdep]) of a function call where a
    specialization of the class template is an associated entity. In such a case,
    the class template can be implemented as an alias template referring to a
    templated class, or as a class template where the template arguments
    themselves are templated classes.

2. [*Example:*

    <pre highlight="c++">
    template&lt;class T> // arguments are not associated entities
    struct S {};

    namespace N {
      int f(auto);
      struct A {};
    }

    int x = f(S&lt;N::A>{});  // error: N::f not a candidate
    </pre>

    The template `S` specified above can be implemented as

    <pre highlight="c++">
    template&lt;class T>
    struct <i>s-impl</i> {
      struct type { };
    };

    template&lt;class T>
    using S = typename <i>s-impl</i>&lt;T>::type;
    </pre>

    or as

    <pre highlight="c++">
    template&lt;class T>
    struct <i>hidden</i> {
      using type = struct _ {
        using type = T;
      };
    };

    template&lt;class HiddenT>
    struct <i>s-impl</i> {
      using T = typename HiddenT::type;
    };

    template&lt;class T>
    using S = <i>s-impl</i>&lt;typename <i>hidden</i>&lt;T>::type>;
    </pre>

    -- <i>end example</i>]
    </blockquote>
    </ins>

# General utilities library <b>[utilities]</b> # {#spec-utilities}

## Function objects <b>[function.objects]</b> ## {#spec-function.objects}

### Header `<functional>` synopsis <b>[functional.syn]</b> ### {#spec-functional.syn}

At the end of this subclause, insert the following declarations into the synopsis within `namespace std`:

<ins>
<blockquote>
<pre highlight="c++">
<i>// exposition only:</i>
template&lt;class Fn, class... Args>
  concept <i>callable</i> =
    requires (Fn&& fn, Args&&... args) {
      std::forward&lt;Fn>(fn)(std::forward&lt;Args>(args)...);
    };
template&lt;class Fn, class... Args>
  concept <i>nothrow-callable</i> =
    <i>callable</i>&lt;Fn, Args...> &amp;&amp;
    requires (Fn&& fn, Args&&... args) {
      { std::forward&lt;Fn>(fn)(std::forward&lt;Args>(args)...) } noexcept;
    };
template&lt;class Fn, class... Args>
  using <i>call-result-t</i> = decltype(declval&lt;Fn>()(declval&lt;Args>()...));

// [func.tag_invoke], tag_invoke
namespace <i>tag-invoke</i> { <i>// exposition only</i>
  void tag_invoke();

  template&lt;class Tag, class... Args>
    concept tag_invocable =
      requires (Tag&& tag, Args&&... args) {
        tag_invoke(std::forward&lt;Tag>(tag), std::forward&lt;Args>(args)...);
      };

  template&lt;class Tag, class... Args>
    concept nothrow_tag_invocable =
      tag_invocable&lt;Tag, Args...> &&
      requires (Tag&& tag, Args&&... args) {
        { tag_invoke(std::forward&lt;Tag>(tag), std::forward&lt;Args>(args)...) } noexcept;
      };

  template&lt;class Tag, class... Args>
    using tag_invoke_result_t =
      decltype(tag_invoke(declval&lt;Tag>(), declval&lt;Args>()...));

  template&lt;class Tag, class... Args>
    struct tag_invoke_result {
      using type =
        tag_invoke_result_t&lt;Tag, Args...>; <i>// present if and only if tag_invocable&lt;Tag, Args...> is true</i>
    };

  struct <i>tag</i>; <i>// exposition only</i>
}
inline constexpr <i>tag-invoke</i>::<i>tag</i> tag_invoke {};
using <i>tag-invoke</i>::tag_invocable;
using <i>tag-invoke</i>::nothrow_tag_invocable;
using <i>tag-invoke</i>::tag_invoke_result_t;
using <i>tag-invoke</i>::tag_invoke_result;

template&lt;auto& Tag>
  using tag_t = decay_t&lt;decltype(Tag)>;
</pre>
</blockquote>
</ins>

### `tag_invoke` <b>[func.tag_invoke]</b> ### {#spec-func.tag_invoke}

Insert this subclause as a new subclause, between Searchers <b>[func.search]</b> and Class template `hash` <b>[unord.hash]</b>.

<ins>
<blockquote>

1. Given a subexpression `E`, let <code><i>REIFY</i>(E)</code> be expression-equivalent to
    a glvalue with the same type and value as `E` as if by `identity()(E)`.

2. The name `std::tag_invoke` denotes a customization point object [customization.point.object].
    Given subexpressions `T` and `A...`, the expression `std::tag_invoke(T, A...)` is
    expression-equivalent [defns.expression-equivalent] to
    <code>tag_invoke(<i>REIFY</i>(T), <i>REIFY</i>(A)...)</code>
    with overload resolution performed in a context in which unqualified lookup for `tag_invoke`
    finds only the declaration

    ```c++
    void tag_invoke();
    ```

3. [*Note:* Diagnosable ill-formed cases above result in substitution failure when `std::tag_invoke(T, A...)` appears in the immediate context of a template instantiation. *--end note*]

</blockquote>
</ins>

# Thread support library <b>[thread]</b> # {#spec-thread}

## Stop tokens <b>[thread.stoptoken]</b> ## {#spec-thread.stoptoken}

### Header `<stop_token>` synopsis <b>[thread.stoptoken.syn]</b> ### {#spec-thread.stoptoken.syn}

At the beginning of this subclause, insert the following declarations into the synopsis within `namespace std`:

<ins>
<blockquote>
<pre highlight="c++">
template&lt;template&lt;class> class>
  struct <i>check-type-alias-exists</i>; // exposition-only

template&lt;class T>
  concept stoppable_token = <i>see-below</i>;

template&lt;class T, class CB, class Initializer = CB>
  concept stoppable_token_for = <i>see-below</i>;

template&lt;class T>
  concept unstoppable_token = <i>see-below</i>;
</pre>
</blockquote>
</ins>

At the end of this subclause, insert the following declarations into the synopsis within `namespace std`:

<ins>
<blockquote>
<pre highlight="c++">
// [stoptoken.never], class never_stop_token
class never_stop_token;

// [stoptoken.inplace], class in_place_stop_token
class in_place_stop_token;

// [stopsource.inplace], class in_place_stop_source
class in_place_stop_source;

// [stopcallback.inplace], class template in_place_stop_callback
template&lt;class CB>
  class in_place_stop_callback;

template&lt;class T, class CB>
  using stop_callback_for_t = typename T::template callback_type&lt;CB>;
</pre>
</blockquote>
</ins>

### Stop token concepts <b>[thread.stoptoken.concepts]</b> ### {#spec-thread.stoptoken.concepts}

Insert this subclause as a new subclause between Header `<stop_token>` synopsis <b>[thread.stoptoken.syn]</b> and Class `stop_token` <b>[stoptoken]</b>.

<ins>
<blockquote>
1. The `stoppable_token` concept checks for the basic interface of a stop token
    that is copyable and allows polling to see if stop has been requested and
    also whether a stop request is possible. For a stop token type `T` and a type
    `CB` that is callable with no arguments, the type `T::callback_type<CB>` is
    valid and denotes the stop callback type to use to register a callback
    to be executed if a stop request is ever made on a `stoppable_token` of type
    `T`. The `stoppable_token_for` concept checks for a stop token type compatible
    with a given callback type. The `unstoppable_token` concept checks for a stop
    token type that does not allow stopping.

<pre highlight="c++">
template&lt;class T>
  concept stoppable_token =
    copyable&lt;T> &&
    equality_comparable&lt;T> &&
    requires (const T t) {
      { T(t) } noexcept; // <i>see implicit expression variations ([concepts.equality])</i>
      { t.stop_requested() } noexcept -> same_as&lt;bool>;
      { t.stop_possible() } noexcept -> same_as&lt;bool>;
      typename <i>check-type-alias-exists</i>&lt;T::template callback_type>;
    };

template&lt;class T, class CB, class Initializer = CB>
  concept stoppable_token_for =
    stoppable_token&lt;T> &&
    invocable&lt;CB> &&
    constructible_from&lt;CB, Initializer> &&
    requires { typename stop_callback_for_t&lt;T, CB>; } &&
    constructible_from&lt;stop_callback_for_t&lt;T, CB>, const T&, Initializer>;

template&lt;class T>
  concept unstoppable_token =
    stoppable_token&lt;T> &&
    requires {
      { bool_constant&lt;T::stop_possible()>{} } -> same_as&lt;false_type>;
    };
</pre>

<div class="ed-note">LWG directed me to replace `T::stop_possible()` with `t.stop_possible()` because
of the recent `constexpr` changes in [[P2280r2|P2280R2]]. However, even with those changes, a nested
requirement like `requires (!t.stop_possible())`, where `t` is an argument in the requirement-parameter-list, is ill-formed according to
<a href="http://eel.is/c++draft/expr.prim.req#nested-2.sentence-1">[expr.prim.req.nested/p2]</a>:

> A local parameter shall only appear as an unevaluated operand within the constraint-expression.

This is the subject of core issue [[cwg2517|2517]].
</div>

2. Let `t` and `u` be distinct, valid objects of type `T`. The type `T` models `stoppable_token` only if:

    1. If `t.stop_possible()` evaluates to `false` then, if `t` and `u` reference the same logical shared stop state, `u.stop_possible()` shall also subsequently evaluate to `false` and `u.stop_requested()` shall also subsequently evaluate to `false`.

    2. If `t.stop_requested()` evaluates to `true` then, if `t` and `u` reference the same logical shared stop state, `u.stop_requested()` shall also subsequently evaluate to `true` and `u.stop_possible()` shall also subsequently evaluate to `true`.

3. Let `t` and `u` be distinct, valid objects of type `T` and let `init` be an
    object of type `Initializer`. Then for some type `CB`, the type `T` models
    `stoppable_token_for<CB, Initializer>` only if:

    1. The type `T::callback_type<CB>` models:

        ```c++
        constructible_from<T, Initializer> &&
        constructible_from<T&, Initializer> &&
        constructible_from<const T, Initializer>
        ```

    2. Direct non-list initializing an object `cb` of type `T::callback_type<CB>`
        from `t, init` shall, if `t.stop_possible()` is `true`, construct an
        instance, `callback`, of type `CB`, direct-initialized with `init`,
        and register `callback` with `t`'s shared stop state such that `callback`
        will be invoked with an empty argument list if a stop request is made
        on the shared stop state.

        1. If `t.stop_requested()` evaluates to `true` at the time `callback` is
            registered then `callback` can be invoked on the thread executing
            `cb`'s constructor.

        2. If `callback` is invoked then, if `t` and `u` reference the same shared stop
            state, an evaluation of `u.stop_requested()` will be `true`
            if the beginning of the invocation of `callback`
            strongly-happens-before the evaluation of `u.stop_requested()`.

        3. [*Note:* If `t.stop_possible()` evaluates to `false` then the construction of
            `cb` is not required to construct and initialize `callback`. *--end note*]

    3. Construction of a `T::callback_type<CB>` instance shall only throw exceptions thrown by the initialization of the `CB` instance from the value of type `Initializer`.

    4. Destruction of the `T::callback_type<CB>` object, `cb`, removes `callback` from the shared stop state such that `callback` will not be invoked after the destructor returns.

        1. If `callback` is currently being invoked on another thread then the destructor of `cb` will block until the invocation of `callback` returns such that the return from the invocation of `callback` strongly-happens-before the destruction of `callback`.

        2. Destruction of a callback `cb` shall not block on the completion of the invocation of some other callback registered with the same shared stop state.

</blockquote>
</ins>

### Class `stop_token` <b>[stoptoken]</b> ### {#spec-stoptoken}

#### General <b>[stoptoken.general]</b> #### {#spec-stoptoken.general}

Modify the synopsis of class `stop_token` in subclause General <b>[stoptoken.general]</b> as follows:

<pre highlight="c++">
namespace std {
  class stop_token {
  public:
<ins>    template&lt;class T>
      using callback_type = stop_callback&lt;T>;</ins>

    // [stoptoken.cons], constructors, copy, and assignment
    stop_token() noexcept;

    // ...
</pre>

### Class `never_stop_token` <b>[stoptoken.never]</b> ### {#spec-stoptoken.never}

Insert a new subclause, Class `never_stop_token` <b>[stoptoken.never]</b>, after subclause Class template `stop_callback` <b>[stopcallback]</b>, as a new subclause of Stop tokens <b>[thread.stoptoken]</b>.

#### General <b>[stoptoken.never.general]</b> #### {#spec-stoptoken.never.general}

1. The class `never_stop_token` provides an implementation of the `unstoppable_token` concept. It provides a stop token interface, but also provides static information that a stop is never possible nor requested.

<pre highlight="c++">
namespace std
{
  class never_stop_token {
    <i>// exposition only</i>
    struct <i>callback</i> {
      explicit <i>callback</i>(never_stop_token, auto&&) noexcept {}
    };
  public:
    template&lt;class>
      using callback_type = <i>callback</i>;

    [[nodiscard]] static constexpr bool stop_requested() noexcept { return false; }
    [[nodiscard]] static constexpr bool stop_possible() noexcept { return false; }

    [[nodiscard]] friend bool operator==(const never_stop_token&, const never_stop_token&) noexcept = default;
  };
}
</pre>

### Class `in_place_stop_token` <b>[stoptoken.inplace]</b> ### {#spec-stoptoken.inplace}

Insert a new subclause, Class `in_place_stop_token` <b>[stoptoken.inplace]</b>, after the subclause added above, as a new subclause of Stop tokens <b>[thread.stoptoken]</b>.

#### General <b>[stoptoken.inplace.general]</b> #### {#spec-stoptoken.inplace.general}

1. The class `in_place_stop_token` provides an interface for querying whether a stop request has been made (`stop_requested`) or can ever be made (`stop_possible`) using an associated `in_place_stop_source` object ([stopsource.inplace]).
    An `in_place_stop_token` can also be passed to an `in_place_stop_callback` ([stopcallback.inplace]) constructor to register a callback to be called when a stop request has been made from an associated `in_place_stop_source`.

<pre highlight="c++">
namespace std {
  class in_place_stop_token {
  public:
    template&lt;class CB>
      using callback_type = in_place_stop_callback&lt;CB>;

    // [stoptoken.inplace.cons], constructors, copy, and assignment
    in_place_stop_token() noexcept;
    ~in_place_stop_token();
    void swap(in_place_stop_token&) noexcept;

    // [stoptoken.inplace.mem], stop handling
    [[nodiscard]] bool stop_requested() const noexcept;
    [[nodiscard]] bool stop_possible() const noexcept;

    [[nodiscard]] friend bool operator==(const in_place_stop_token&, const in_place_stop_token&) noexcept = default;
    friend void swap(in_place_stop_token& lhs, in_place_stop_token& rhs) noexcept;

  private:
    const in_place_stop_source* <i>source_</i>; // exposition only
  };
}
</pre>

#### Constructors, copy, and assignment <b>[stoptoken.inplace.cons]</b> #### {#spec-stoptoken.inplace.cons}

<pre highlight="c++">
in_place_stop_token() noexcept;
</pre>

1. *Effects*: Initializes <code><i>source_</i></code> with `nullptr`.

<pre highlight="c++">
void swap(in_place_stop_token& rhs) noexcept;
</pre>

2. *Effects*: Exchanges the values of <code><i>source_</i></code> and <code>rhs.<i>source_</i></code>.

#### Members <b>[stoptoken.inplace.mem]</b> #### {#spec-stoptoken.inplace.mem}

<pre highlight="c++">
[[nodiscard]] bool stop_requested() const noexcept;
</pre>

1. *Effects*: Equivalent to: <code>return <i>source_</i> != nullptr && <i>source_</i>->stop_requested();</code>

2. [*Note*: The behavior of `stop_requested()` is undefined unless the call
    strongly happens before the start of the destructor of the associated
    `in_place_stop_source`, if any ([basic.life]). --*end note*]

<pre highlight="c++">
[[nodiscard]] bool stop_possible() const noexcept;
</pre>

3. *Effects*: Equivalent to: <code>return <i>source_</i> != nullptr;</code>

4. [*Note*: The behavior of `stop_possible()` is undefined unless
    the call strongly happens before the end of the storage duration of the
    associated `in_place_stop_source` object, if any ([basic.stc.general]). --*end note*]

#### Non-member functions <b>[stoptoken.inplace.nonmembers]</b> #### {#spec-stoptoken.inplace.nonmembers}

<pre highlight="c++">
friend void swap(in_place_stop_token& x, in_place_stop_token& y) noexcept;
</pre>

1. *Effects*: Equivalent to: `x.swap(y)`.

### Class `in_place_stop_source` <b>[stopsource.inplace]</b> ### {#spec-stopsource.inplace}

Insert a new subclause, Class `in_place_stop_source` <b>[stopsource.inplace]</b>, after the subclause added above, as a new subclause of Stop tokens <b>[thread.stoptoken]</b>.

#### General <b>[stopsource.inplace.general]</b> #### {#spec-stopsource.inplace.general}

1. The class `in_place_stop_source` implements the semantics of making a stop request, without the need for a dynamic allocation of a shared state.
    A stop request made on an `in_place_stop_source` object is visible to all associated `in_place_stop_token` ([stoptoken.inplace]) objects.
    Once a stop request has been made it cannot be withdrawn (a subsequent stop request has no effect).
    All uses of `in_place_stop_token` objects associated with a given `in_place_stop_source` object must happen before the start of the destructor of that `in_place_stop_source` object.

<pre highlight="c++">
namespace std {
  class in_place_stop_source {
  public:
    // [stopsource.inplace.cons], constructors, copy, and assignment
    in_place_stop_source() noexcept;

    in_place_stop_source(in_place_stop_source&&) noexcept = delete;
    ~in_place_stop_source();

    // [stopsource.inplace.mem], stop handling
    [[nodiscard]] in_place_stop_token get_token() const noexcept;
    [[nodiscard]] static constexpr bool stop_possible() noexcept { return true; }
    [[nodiscard]] bool stop_requested() const noexcept;
    bool request_stop() noexcept;
  };
}
</pre>

2. An instance of `in_place_stop_source` maintains a list of registered callback invocations.
    The registration of a callback invocation either succeeds or fails. When an invocation
    of a callback is registered, the following happens atomically:

        - The stop state is checked. If stop has not been requested, the callback invocation is
            added to the list of registered callback invocations, and registration has succeeded.

        - Otherwise, registration has failed.

    When an invocation of a callback is unregistered, the invocation is atomically removed
    from the list of registered callback invocations. The removal is not blocked by the concurrent
    execution of another callback invocation in the list. If the callback invocation
    being unregistered is currently executing, then:

        - If the execution of the callback invocation is happening concurrently on another thread,
            the completion of the execution strongly happens before ([intro.races]) the end of the
            callback's lifetime.

        - Otherwise, the execution is happening on the current thread. Removal of the
            callback invocation does not block waiting for the execution to complete.

#### Constructors, copy, and assignment <b>[stopsource.inplace.cons]</b> #### {#spec-stopsource.inplace.cons}

<pre highlight="c++">
in_place_stop_source() noexcept;
</pre>

1. *Effects*: Initializes a new stop state inside `*this`.

2. *Postconditions*: `stop_requested()` is `false`.

#### Members <b>[stopsource.inplace.mem]</b> #### {#spec-stopsource.inplace.mem}

<pre highlight="c++">
[[nodiscard]] in_place_stop_token get_token() const noexcept;
</pre>

1. *Returns*: A new associated `in_place_stop_token` object.

<pre highlight="c++">
[[nodiscard]] bool stop_requested() const noexcept;
</pre>

3. *Returns*: `true` if the stop state inside `*this` has received a stop request; otherwise, `false`.

<pre highlight="c++">
bool request_stop() noexcept;
</pre>

4. *Effects*: Atomically determines whether the stop state inside `*this` has received a stop request, and if not, makes a stop request.
    The determination and making of the stop request are an atomic read-modify-write operation ([intro.races]).
    If the request was made, the registered invocations are executed and the evaluations of the invocations are indeterminately sequenced.
    If an invocation of a callback exits via an exception then `terminate` is invoked ([except.terminate]).

5. *Postconditions*: `stop_requested()` is `true`.

6. *Returns*: `true` if this call made a stop request; otherwise `false`.
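
*Non-normative sketch.* One possible way to obtain the source and token behavior described above without allocating a shared state is for the token to hold a pointer to the source, and for `request_stop` to be a single atomic exchange. The types `source_sketch` and `token_sketch` are hypothetical stand-ins, and callback registration is deliberately left out.

```cpp
#include <atomic>
#include <cassert>

class source_sketch;  // hypothetical stand-in for in_place_stop_source

class token_sketch {  // hypothetical stand-in for in_place_stop_token
public:
  token_sketch() noexcept = default;
  [[nodiscard]] bool stop_requested() const noexcept;
  [[nodiscard]] bool stop_possible() const noexcept { return source_ != nullptr; }

private:
  friend class source_sketch;
  explicit token_sketch(const source_sketch* s) noexcept : source_(s) {}
  const source_sketch* source_ = nullptr;  // points at the source; no allocation
};

class source_sketch {
public:
  source_sketch() noexcept = default;
  source_sketch(source_sketch&&) = delete;  // tokens refer to this object

  [[nodiscard]] token_sketch get_token() const noexcept { return token_sketch(this); }
  [[nodiscard]] bool stop_requested() const noexcept {
    return requested_.load(std::memory_order_acquire);
  }
  // One atomic read-modify-write: only the call that flips the flag
  // reports that it made the stop request.
  bool request_stop() noexcept {
    return !requested_.exchange(true, std::memory_order_acq_rel);
  }

private:
  std::atomic<bool> requested_{false};
};

inline bool token_sketch::stop_requested() const noexcept {
  return source_ != nullptr && source_->stop_requested();
}

bool in_place_demo() {
  token_sketch detached;  // default-constructed: no associated source
  source_sketch src;
  token_sketch tok = src.get_token();
  bool before = !detached.stop_possible() && tok.stop_possible() &&
                !tok.stop_requested();
  bool first = src.request_stop();   // makes the stop request
  bool second = src.request_stop();  // a later request has no effect
  return before && first && !second && tok.stop_requested() &&
         !detached.stop_requested();
}
```

Because the token only stores a pointer, every use of `tok` has to occur before `src` is destroyed, which is the lifetime rule stated in [stopsource.inplace.general].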

### Class template `in_place_stop_callback` <b>[stopcallback.inplace]</b> ### {#spec-stopcallback.inplace}

Insert a new subclause, Class template `in_place_stop_callback` <b>[stopcallback.inplace]</b>, after the subclause added above, as a new subclause of Stop tokens <b>[thread.stoptoken]</b>.

#### General <b>[stopcallback.inplace.general]</b> #### {#spec-stopcallback.inplace.general}

1.

    <pre highlight="c++">
    namespace std {
      template&lt;class Callback>
      class in_place_stop_callback {
      public:
        using callback_type = Callback;

        // [stopcallback.inplace.cons], constructors and destructor
        template&lt;class C>
          explicit in_place_stop_callback(in_place_stop_token st, C&& cb)
            noexcept(is_nothrow_constructible_v&lt;Callback, C>);
        ~in_place_stop_callback();

        in_place_stop_callback(in_place_stop_callback&&) = delete;

      private:
        Callback <i>callback_</i>;      <i>// exposition only</i>
      };

      template&lt;class Callback>
        in_place_stop_callback(in_place_stop_token, Callback)
          -> in_place_stop_callback&lt;Callback>;
    }
    </pre>

2. *Mandates*: `in_place_stop_callback` is instantiated with an argument for the template parameter `Callback` that satisfies both `invocable` and `destructible`.

3. *Preconditions*: `in_place_stop_callback` is instantiated with an argument for the template parameter `Callback` that models both `invocable` and `destructible`.

4. *Recommended practice*: Implementations should use the storage of the `in_place_stop_callback` objects to store the state necessary for their association with an `in_place_stop_source` object.

#### Constructors and destructor <b>[stopcallback.inplace.cons]</b> #### {#spec-stopcallback.inplace.cons}

<pre highlight="c++">
template&lt;class C>
  explicit in_place_stop_callback(in_place_stop_token st, C&& cb)
    noexcept(is_nothrow_constructible_v&lt;Callback, C>);
</pre>

1. *Constraints*: `Callback` and `C` satisfy `constructible_from<Callback, C>`.

2. *Preconditions*: `Callback` and `C` model `constructible_from<Callback, C>`.

3. *Effects*: Initializes <code><i>callback_</i></code> with `std::forward<C>(cb)`.
    Any `in_place_stop_source` associated with `st` becomes associated with `*this`.
    Registers ([stopsource.inplace.general]) the callback invocation
    <code>std::forward&lt;Callback>(<i>callback_</i>)()</code> with the associated
    `in_place_stop_source`, if any. If the registration fails, evaluates
    the callback invocation.

4. *Throws*: Any exception thrown by the initialization of <code><i>callback_</i></code>.

5. *Remarks*: If evaluating <code>std::forward&lt;Callback>(<i>callback_</i>)()</code>
    exits via an exception, then `terminate` is invoked ([except.terminate]).

<pre highlight="c++">
~in_place_stop_callback();
</pre>

6. *Effects*: Unregisters ([stopsource.inplace.general]) the callback invocation from
    the associated `in_place_stop_source` object, if any.

7. *Remarks*: A program has undefined behavior if the start of this destructor does
    not strongly happen before the start of the destructor of the associated
    `in_place_stop_source` object, if any.
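
*Non-normative sketch.* The registration rules can be modelled with an intrusive list protected by a mutex. This is a simplified model under stated assumptions, not the specified design: all names are hypothetical, the callback is constructed from a stop state rather than from a token, and the destructor does not wait for an invocation that is running concurrently on another thread.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

struct node_sketch {  // one registered callback invocation
  void (*run)(node_sketch*) = nullptr;
  node_sketch* next = nullptr;
};

class stop_state_sketch {
public:
  // Checks the state and links the node as one step (under the lock).
  // Returns false when a stop was already requested: registration failed.
  bool try_register(node_sketch* n) {
    std::lock_guard lock(mutex_);
    if (requested_) return false;
    n->next = head_;
    head_ = n;
    return true;
  }
  void unregister(node_sketch* n) {
    std::lock_guard lock(mutex_);
    for (node_sketch** p = &head_; *p != nullptr; p = &(*p)->next)
      if (*p == n) { *p = n->next; return; }
  }
  bool request_stop() {
    std::unique_lock lock(mutex_);
    if (requested_) return false;
    requested_ = true;
    while (head_ != nullptr) {
      node_sketch* n = head_;
      head_ = n->next;
      lock.unlock();  // invoke without holding the lock
      n->run(n);
      lock.lock();
    }
    return true;
  }

private:
  std::mutex mutex_;
  node_sketch* head_ = nullptr;
  bool requested_ = false;
};

template <class Callback>
class callback_sketch : private node_sketch {
public:
  template <class C>
  explicit callback_sketch(stop_state_sketch& state, C&& cb)
      : callback_(std::forward<C>(cb)) {
    run = &callback_sketch::invoke;
    if (state.try_register(this)) {
      state_ = &state;  // remember where to unregister
    } else {
      invoke(this);     // registration failed: evaluate the invocation now
    }
  }
  ~callback_sketch() { if (state_ != nullptr) state_->unregister(this); }
  callback_sketch(callback_sketch&&) = delete;

private:
  static void invoke(node_sketch* n) {
    std::forward<Callback>(static_cast<callback_sketch*>(n)->callback_)();
  }
  Callback callback_;
  stop_state_sketch* state_ = nullptr;
};

int callback_demo() {
  int calls = 0;
  auto bump = [&calls] { ++calls; };
  using cb_t = callback_sketch<decltype(bump)>;

  stop_state_sketch state;
  { cb_t dropped(state, bump); }  // unregistered by its destructor
  cb_t kept(state, bump);         // registration succeeds
  state.request_stop();           // runs `kept` once
  cb_t late(state, bump);         // registration fails, runs in the constructor
  return calls;
}
```

The node lives inside the callback object itself, which is the storage strategy suggested by the recommended practice in [stopcallback.inplace.general].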

# Execution control library <b>[exec]</b> # {#spec-execution}

## General <b>[exec.general]</b> ## {#spec-execution.general}

1. This Clause describes components supporting execution of function objects
    ([function.objects]).

2. The following subclauses describe the requirements, concepts, and components
    for execution control primitives as summarized in Table 1.

<table>
<caption>Table <i>N</i>: Execution control library summary <b>[tab:execution.summary]</b></caption>
<tr><th></th><th>Subclause</th><th>Header</th></tr>
<tr style="border-bottom-style: hidden;"><td><a href="#spec-execution.schedulers">[exec.sched]</a></td><td>Schedulers</td><td>`<execution>`</td></tr>
<tr style="border-bottom-style: hidden;"><td><a href="#spec-execution.receivers">[exec.recv]</a></td><td>Receivers</td><td></td></tr>
<tr style="border-bottom-style: hidden;"><td><a href="#spec-execution.opstate">[exec.opstate]</a></td><td>Operation states</td><td></td></tr>
<tr style="border-bottom-style: hidden;"><td><a href="#spec-execution.senders">[exec.snd]</a></td><td>Senders</td><td></td></tr>
<tr><td><a href="#spec-execution.execute">[exec.execute]</a></td><td>One-way execution</td><td></td></tr>
</table>

3. [<i>Note:</i> A large number of execution control primitives are
    customization point objects. These customization point objects fall into
    several categories, and different rules apply to each category. Table 2
    shows the types of customization point objects used in the execution
    control library:

<table>
<caption>Table <i>N+1</i>: Types of customization point objects in the execution control library <b>[tab:execution.cpos]</b></caption>
<tr>
    <th>Customization point object type</th>
    <th>Purpose</th>
    <th>Examples</th>
</tr>
<tr>
    <td>core</td>
    <td>provide core execution functionality, and connection between core components</td>
    <td>`connect`, `start`, `execute`</td>
</tr>
<tr>
    <td>completion functions</td>
    <td>called by senders to announce the completion of the work (success, error, or cancellation)</td>
    <td>`set_value`, `set_error`, `set_stopped`</td>
</tr>
<tr>
    <td><a href="#spec-execution.senders">senders</a></td>
    <td>allow the specialization of the provided sender algorithms</td>
    <td>
        <ul>
            <li>sender factories (`schedule`, `just`, `read`, ...)</li>
            <li>sender adaptors (`transfer`, `then`, `let_value`, ...)</li>
            <li>sender consumers (`start_detached`, `sync_wait`)</li>
        </ul>
    </td>
</tr>
<tr>
    <td><a href="#spec-execution.queries">queries</a></td>
    <td>allow querying different properties of objects</td>
    <td>
        <ul>
            <li>general queries (`get_allocator`, `get_stop_token`, ...)</li>
            <li>environment queries (`get_scheduler`, `get_delegatee_scheduler`, ...)</li>
            <li>scheduler queries (`get_forward_progress_guarantee`, `execute_may_block_caller`, ...)</li>
            <li>sender attribute queries (`get_completion_scheduler`)</li>
        </ul>
    </td>
</tr>
</table>

-- <i>end note</i>]

4. This clause makes use of the following exposition-only entities:

    1. <pre highlight="c++">
        template&lt;class Fn, class... Args>
            requires <i>callable</i>&lt;Fn, Args...>
          constexpr auto <i>mandate-nothrow-call</i>(Fn&& fn, Args&&... args) noexcept
            -> <i>call-result-t</i>&lt;Fn, Args...> {
            return std::forward&lt;Fn>(fn)(std::forward&lt;Args>(args)...);
          }
        </pre>

        * <i>Mandates:</i> <code><i>nothrow-callable</i>&lt;Fn, Args...></code> is `true`.

    2. <pre highlight="c++">
        template&lt;class T>
          concept <i>movable-value</i> =
            move_constructible&lt;decay_t&lt;T>> &&
            constructible_from&lt;decay_t&lt;T>, T>;
        </pre>

    3. For function types `F1` and `F2` denoting `R1(Args1...)` and `R2(Args2...)`
        respectively, <code><i>MATCHING-SIG</i>(F1, F2)</code> is `true` if and only if
        `same_as<R1(Args1&&...), R2(Args2&&...)>` is `true`.

## Queries and queryables <b>[exec.queryable]</b> ## {#spec-execution.queryable}

### General <b>[exec.queryable.general]</b> ### {#spec-execution.queryable.general}

1. A <dfn export=true>queryable object</dfn> is a read-only collection of
    key/value pairs where each key is a customization point object known as a
    <dfn export=true lt="query object">query object</dfn>. A <dfn
    export=true>query</dfn> is an invocation of a query object with a queryable
    object as its first argument and a (possibly empty) set of additional
    arguments. The result of a query expression is valid as long as the
    queryable object is valid. <span class="wg21note">A query imposes syntactic
    and semantic requirements on its invocations.</span>

2. Given a subexpression `e` that refers to a queryable object `o`, a query
    object <code><i>q</i></code>, and a (possibly empty) pack of subexpressions
    `args`, the expression <code><i>q</i>(e, args...)</code> is equal to
    ([concepts.equality]) the expression <code><i>q</i>(c, args...)</code> where
    `c` is a `const` lvalue reference to `o`.

3. The type of a query expression cannot be `void`.

4. The expression <code><i>q</i>(e, args...)</code> is equality-preserving
    ([concepts.equality]) and does not modify the function object or the
    arguments.

5. If <code>tag_invoke(<i>q</i>, e, args...)</code> is well-formed, then
    <code><i>q</i>(e, args...)</code> is expression-equivalent to
    <code>tag_invoke(<i>q</i>, e, args...)</code>.

6. Unless otherwise specified, the value returned by the expression
    <code><i>q</i>(e, args...)</code> is valid as long as `e` is valid.
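
*Non-normative sketch.* A query object and a queryable object might look as follows. Everything in namespace `sketch` is hypothetical, including the unqualified `tag_invoke` call, which stands in for the dispatch mechanism referred to in paragraph 5. The query object takes its argument by `const` reference, so querying through `e` and through a `const` lvalue reference to the same object yields equal results.

```cpp
#include <cassert>

namespace sketch {  // hypothetical names; not the proposed library entities

struct get_answer_t {
  // The queryable object is always observed through a const lvalue.
  template <class Env>
  constexpr auto operator()(const Env& env) const
      -> decltype(tag_invoke(*this, env)) {
    return tag_invoke(*this, env);  // found by argument-dependent lookup
  }
};
inline constexpr get_answer_t get_answer{};

struct my_env {
  int answer;
  // The key/value pair: key get_answer_t, value `answer`.
  friend constexpr int tag_invoke(get_answer_t, const my_env& self) noexcept {
    return self.answer;
  }
};

}  // namespace sketch

bool query_demo() {
  sketch::my_env env{42};
  const sketch::my_env& c = env;
  // q(e) equals q(c), and the result type is not void.
  return sketch::get_answer(env) == sketch::get_answer(c) &&
         sketch::get_answer(env) == 42;
}
```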

### `queryable` concept <b>[exec.queryable.concept]</b> ### {#spec-execution.queryable.concept}

    <pre highlight="c++">
    template&lt;class T>
      concept queryable = destructible&lt;T>;
    </pre>

1. The `queryable` concept specifies the constraints on the types of queryable
    objects.

2. Let `e` be an object of type `E`. The type `E` models `queryable` if for each
    callable object <code><i>q</i></code> and a pack of subexpressions `args`,
    if <code>requires { <i>q</i>(e, args...) }</code> is `true` then
    <code><i>q</i>(e, args...)</code> meets any semantic requirements imposed by
    <code><i>q</i></code>.

## Asynchronous operations <b>[async.ops]</b> ## {#spec-execution-async.ops}

1. An <dfn export=true>execution resource</dfn> is a program entity that manages
    a (possibly dynamic) set of execution agents
    ([thread.req.lockable.general]), which it uses to execute parallel work on
    behalf of callers. [*Example 1*: The currently active thread, a
    system-provided thread pool, and uses of an API associated with an external
    hardware accelerator are all examples of execution resources. -- *end
    example*] Execution resources execute asynchronous operations. An execution
    resource is either valid or invalid.

2. An <dfn export=true>asynchronous operation</dfn> is a distinct unit of
    program execution that:

      - is explicitly created;

      - can be explicitly <dfn lt="start" export=true>started</dfn>; an
          asynchronous operation can be started once at most;

      - if started, eventually <dfn lt="complete" export=true>completes</dfn>
          with a (possibly empty) set of result datums, and in exactly one of
          three modes: success, failure, or cancellation, known as the
          operation's <dfn export=true>disposition</dfn>; an asynchronous
          operation can only complete once; a successful completion, also known
          as a <dfn export=true>value completion</dfn>, can have an arbitrary
          number of result datums; a failure completion, also known as an <dfn
          export=true>error completion</dfn>, has a single result datum; a
          cancellation completion, also known as a <dfn export=true>stopped
          completion</dfn>, has no result datum; an asynchronous operation's
          <dfn export=true>async result</dfn> is its disposition and its
          (possibly empty) set of result datums.

      - can complete on a different execution resource than that on which it
          started; and

      - can create and start other asynchronous operations called <dfn
          export=true>child operations</dfn>. A child operation is an
          asynchronous operation that is created by the parent operation and, if
          started, completes before the parent operation completes. A <dfn
          export=true>parent operation</dfn> is the asynchronous operation that
          created a particular child operation.

    <span class="wg21note">An asynchronous operation can in fact execute
    synchronously; that is, it can complete during the execution of its start
    operation on the thread of execution that started it.</span>

3. An asynchronous operation has associated state known as its <dfn
    export=true>operation state</dfn>.

    <!-- Mutating any part of an asynchronous
    operation's operation state is undefined behavior unless performed by the
    asynchronous operation or a child operation thereof. -->

4. An asynchronous operation has an associated environment. An <dfn
    export=true>environment</dfn> is a queryable object ([exec.queryable])
    representing the execution-time properties of the operation's caller. The
    <dfn export=true lt="caller">caller of an asynchronous operation</dfn> is
    its parent operation or the function that created it. An asynchronous
    operation's operation state owns the operation's environment.

5. An asynchronous operation has an associated receiver. A <dfn
    export=true>receiver</dfn> is an aggregation of three handlers for the three
    asynchronous completion dispositions: a value completion handler for a value
    completion, an error completion handler for an error completion, and a
    stopped completion handler for a stopped completion. A receiver has an
    associated environment. An asynchronous operation's operation state owns the
    operation's receiver. The environment of an asynchronous operation is equal
    to its receiver's environment.

6. For each completion disposition, there is a <dfn export=true>completion
    function</dfn>. A completion function is a customization point object
    ([customization.point.object]) that accepts an asynchronous operation's
    receiver as the first argument and the result datums of the asynchronous
    operation as additional arguments. The value completion function invokes the
    receiver's value completion handler with the value result datums; likewise
    for the error completion function and the stopped completion function. A
    completion function has an associated type known as its <dfn
    export=true>completion tag</dfn> that names the unqualified type of the
    completion function. A valid invocation of a completion function is called a
    <dfn export=true>completion operation</dfn>.

7. The <dfn lt="asychronous operation lifetime" export=true>lifetime of an
    asynchronous operation</dfn>, also known as the operation's <dfn
    export=true>async lifetime</dfn>, begins when its start operation begins
    executing and ends when its completion operation begins executing. If the
    lifetime of an asynchronous operation's associated operation state ends
    before the lifetime of the asynchronous operation, the behavior is
    undefined. After an asynchronous operation executes a completion operation,
    its associated operation state is invalid. Accessing any part of an invalid
    operation state is undefined behavior.

8. An asynchronous operation shall not execute a completion operation before its
    start operation has begun executing. After its start operation has begun
    executing, exactly one completion operation shall execute. The lifetime of an
    asynchronous operation's operation state can end during the execution of the
    completion operation.

9. A <dfn export=true>sender</dfn> is a factory for one or more asynchronous
    operations. <dfn export=true lt="connect">Connecting</dfn> a sender and a
    receiver creates an asynchronous operation. The asynchronous operation's
    associated receiver is equal to the receiver used to create it, and its
    associated environment is equal to the environment associated with the
    receiver used to create it. The lifetime of an asynchronous operation's
    associated operation state does not depend on the lifetimes of either the
    sender or the receiver from which it was created. A sender <dfn lt="send"
    export=true>sends</dfn> its results by way of the asynchronous operation(s)
    it produces, and a receiver <dfn lt="receive" export=true>receives</dfn>
    those results. A sender is either valid or invalid; it becomes invalid
    when its parent sender (see below) becomes invalid.

10. A <dfn export=true>scheduler</dfn> is an abstraction of an execution
    resource with a uniform, generic interface for scheduling work onto that
    resource. It is a factory for senders whose asynchronous operations execute
    value completion operations on an execution agent belonging to the
    scheduler's associated execution resource. A <dfn
    export=true>schedule-expression</dfn> obtains such a sender from a
    scheduler. A <dfn export=true>schedule sender</dfn> is the result of a
    schedule expression. On success, an asynchronous operation produced by a
    schedule sender executes a value completion operation with an empty set of
    result datums. Multiple schedulers can refer to the same execution resource.
    A scheduler can be valid or invalid. A scheduler becomes invalid when the
    execution resource to which it refers becomes invalid, as do any schedule
    senders obtained from the scheduler, and any operation states obtained from
    those senders.

11. An asynchronous operation has one or more associated completion schedulers
    for each of its possible dispositions. A <dfn export=true>completion
    scheduler</dfn> is a scheduler whose associated execution resource is used
    to execute a completion operation for an asynchronous operation. A value
    completion scheduler is a scheduler on which an asynchronous operation's
    value completion operation can execute. Likewise for error completion
    schedulers and stopped completion schedulers.

12. A sender has an associated queryable object ([exec.queryable]) known as its
    <dfn export=true>attributes</dfn> that describes various characteristics of
    the sender and of the asynchronous operation(s) it produces. For each
    disposition, there is a query object for reading the associated completion
    scheduler from a sender's attributes; *i.e.*, a value completion scheduler
    query object for reading a sender's value completion scheduler, *etc*. If a
    completion scheduler query is well-formed, the returned completion scheduler
    is unique for that disposition for any asynchronous operation the sender
    creates. A schedule sender is required to have a value completion scheduler
    attribute whose value is equal to the scheduler that produced the schedule
    sender.

13. A completion operation has an associated completion signature. A <dfn
    export=true>completion signature</dfn> is a function type that describes a
    completion operation. An asynchronous operation has a finite set of possible
    completion signatures corresponding to the completion operations that the
    asynchronous operation potentially evaluates ([basic.def.odr]). The
    completion signature's return type is the completion tag associated with the
    completion function that executes the completion operation. The completion
    signature's parameter types are the (possibly `cv`- and reference-qualified)
    types of the asynchronous operation's result datums, modulo rvalue reference
    qualification (see <code><i>MATCHING-SIG</i></code> in [exec.general]).
    [*Example:* For subexpressions `x` and `y`, the completion signature
    corresponding to the completion operation `set_value(std::move(rcvr), x, y)`
    is `set_value_t(decltype((x)), decltype((y)))`. -- *end example*] Together,
    a sender type and an environment type `E` determine the set of completion
    signatures of an asynchronous operation that results from connecting the
    sender with a receiver whose environment has type `E`. <span
    class="wg21note">The type of the receiver does not affect an asynchronous
    operation's completion signatures, only the type of the receiver's
    environment.</span>

14. A <dfn export=true>sender algorithm</dfn> is a function that takes and/or
    returns a sender. There are three categories of sender algorithms:

    * A <dfn export=true>sender factory</dfn> is a function that takes
        non-senders as arguments and that returns a sender.

    * A <dfn export=true>sender adaptor</dfn> is a function that constructs and
        returns a parent sender from a set of one or more child senders and a
        (possibly empty) set of additional arguments. An asynchronous operation
        created by a <dfn export=true>parent sender</dfn> is a parent to the
        child operations created by the <dfn export=true lt="child sender">child
        senders</dfn>.

    * A <dfn export=true>sender consumer</dfn> is a function that takes one or
        more senders and a (possibly empty) set of additional arguments, and
        whose return type is not the type of a sender.
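
*Non-normative sketch.* The vocabulary of this subclause can be tied together with a toy model. Every name in namespace `toy` is hypothetical and is not the proposed API. The model fixes the value type to `int`, completes synchronously inside `start`, and leaves out environments, stop tokens and completion signatures. It shows a sender factory, a sender adaptor and a sender consumer, with connecting a sender to a receiver producing an operation state that owns the receiver.

```cpp
#include <cassert>
#include <exception>
#include <optional>
#include <utility>

namespace toy {  // every name here is hypothetical

// Sender factory: takes a non-sender and returns a sender.
struct just_sender {
  int value;
  template <class Receiver>
  struct operation {
    int value;
    Receiver rcvr;  // the operation state owns its receiver
    // Exactly one completion operation executes after start begins.
    void start() noexcept { std::move(rcvr).set_value(value); }
  };
  template <class Receiver>
  operation<Receiver> connect(Receiver rcvr) const {
    return {value, std::move(rcvr)};
  }
};
inline just_sender just(int v) { return {v}; }

// Sender adaptor: builds a parent sender from a child sender.
template <class Child, class Fn>
struct then_sender {
  Child child;
  Fn fn;
  template <class Receiver>
  struct then_receiver {
    Fn fn;
    Receiver rcvr;
    void set_value(int v) && { std::move(rcvr).set_value(fn(v)); }
    void set_error(std::exception_ptr e) && { std::move(rcvr).set_error(std::move(e)); }
    void set_stopped() && { std::move(rcvr).set_stopped(); }
  };
  template <class Receiver>
  auto connect(Receiver rcvr) const {
    return child.connect(then_receiver<Receiver>{fn, std::move(rcvr)});
  }
};
template <class Child, class Fn>
then_sender<Child, Fn> then(Child child, Fn fn) {
  return {std::move(child), std::move(fn)};
}

// Sender consumer: takes a sender and returns a non-sender.
struct result_receiver {
  std::optional<int>* out;
  void set_value(int v) && { *out = v; }       // value completion handler
  void set_error(std::exception_ptr) && {}     // error completion handler
  void set_stopped() && {}                     // stopped completion handler
};
template <class Sender>
std::optional<int> run(const Sender& sndr) {
  std::optional<int> result;
  auto op = sndr.connect(result_receiver{&result});  // creates the operation
  op.start();  // in this toy model the operation completes before returning
  return result;
}

}  // namespace toy

int pipeline_demo() {
  auto sndr = toy::then(toy::just(20), [](int v) { return v + 1; });
  return toy::run(sndr).value_or(-1);
}
```

Here `pipeline_demo()` yields 21: the `just` operation sends 20, the adaptor's receiver applies the function, and the consumer's receiver stores the result.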

## Header `<execution>` synopsis <b>[exec.syn]</b> ## {#spec-execution.syn}

<pre highlight="c++">
namespace std {
  // [exec.general], helper concepts
  template&lt;class T>
    concept <i>movable-value</i> = <i>see-below</i>; <i>// exposition only</i>

  template&lt;class From, class To>
    concept <i>decays-to</i> = same_as&lt;decay_t&lt;From>, To>; <i>// exposition only</i>

  template&lt;class T>
    concept <i>class-type</i> = <i>decays-to</i>&lt;T, T> && is_class_v&lt;T>;  <i>// exposition only</i>

  // [exec.queryable], queryable objects
  template&lt;class T>
    concept queryable = destructible&lt;T>;

  // [exec.queries], queries
  namespace <i>queries</i> { <i>// exposition only</i>
    struct forwarding_query_t;
    struct get_allocator_t;
    struct get_stop_token_t;
  }
  using <i>queries</i>::forwarding_query_t;
  using <i>queries</i>::get_allocator_t;
  using <i>queries</i>::get_stop_token_t;
  inline constexpr forwarding_query_t forwarding_query{};
  inline constexpr get_allocator_t get_allocator{};
  inline constexpr get_stop_token_t get_stop_token{};

  template&lt;class T>
    using stop_token_of_t =
      remove_cvref_t&lt;decltype(get_stop_token(declval&lt;T>()))>;

  template&lt;class T>
    concept <i>forwarding-query</i> = // exposition only
      forwarding_query(T{});

  namespace <i>exec-envs</i> { // exposition only
    struct empty_env {};
    struct get_env_t;
  }
  using <i>exec-envs</i>::empty_env;
  using <i>exec-envs</i>::get_env_t;
  inline constexpr get_env_t get_env {};

  template&lt;class T>
    using env_of_t = decltype(get_env(declval&lt;T>()));
}

namespace std::execution {
  // [exec.queries], queries
  enum class forward_progress_guarantee;
  namespace <i>queries</i> { // exposition only
    struct get_domain_t;
    struct get_scheduler_t;
    struct get_delegatee_scheduler_t;
    struct get_forward_progress_guarantee_t;
    template&lt;class CPO>
      struct get_completion_scheduler_t;
  }
  using <i>queries</i>::get_domain_t;
  using <i>queries</i>::get_scheduler_t;
  using <i>queries</i>::get_delegatee_scheduler_t;
  using <i>queries</i>::get_forward_progress_guarantee_t;
  using <i>queries</i>::get_completion_scheduler_t;
  inline constexpr get_domain_t get_domain{};
  inline constexpr get_scheduler_t get_scheduler{};
  inline constexpr get_delegatee_scheduler_t get_delegatee_scheduler{};
  inline constexpr get_forward_progress_guarantee_t get_forward_progress_guarantee{};
  template&lt;class CPO>
    inline constexpr get_completion_scheduler_t&lt;CPO> get_completion_scheduler{};

  // [exec.domain.default], execution domains
  struct default_domain;

  // [exec.sched], schedulers
  template&lt;class S>
    concept scheduler = <i>see-below</i>;

  // [exec.recv], receivers
  template&lt;class R>
    inline constexpr bool enable_receiver = <i>see-below</i>;

  template&lt;class R>
    concept receiver = <i>see-below</i>;

  template&lt;class R, class Completions>
    concept receiver_of = <i>see-below</i>;

  namespace <i>receivers</i> { // exposition only
    struct set_value_t;
    struct set_error_t;
    struct set_stopped_t;
  }
  using <i>receivers</i>::set_value_t;
  using <i>receivers</i>::set_error_t;
  using <i>receivers</i>::set_stopped_t;
  inline constexpr set_value_t set_value{};
  inline constexpr set_error_t set_error{};
  inline constexpr set_stopped_t set_stopped{};

  // [exec.opstate], operation states
  template&lt;class O>
    concept operation_state = <i>see-below</i>;

  namespace <i>op-state</i> { // exposition only
    struct start_t;
  }
  using <i>op-state</i>::start_t;
  inline constexpr start_t start{};

  // [exec.snd], senders
  template&lt;class S>
    inline constexpr bool enable_sender = <i>see below</i>;

  template&lt;class S>
    concept sender = <i>see-below</i>;

  template&lt;class S, class E = empty_env>
    concept sender_in = <i>see-below</i>;

  template&lt;class S, class R>
    concept sender_to = <i>see-below</i>;

  template&lt;class... Ts>
    struct <i>type-list</i>; // exposition only

  template&lt;class S, class E = empty_env>
    using <i>single-sender-value-type</i> = <i>see below</i>; // exposition only

  template&lt;class S, class E = empty_env>
    concept <i>single-sender</i> = <i>see below</i>; // exposition only

  // [exec.getcomplsigs], completion signatures
  namespace <i>completion-signatures</i> { // exposition only
    struct get_completion_signatures_t;
  }
  using <i>completion-signatures</i>::get_completion_signatures_t;
  inline constexpr get_completion_signatures_t get_completion_signatures {};

  template&lt;class S, class E = empty_env>
      requires sender_in&lt;S, E>
    using completion_signatures_of_t = <i>call-result-t</i>&lt;get_completion_signatures_t, S, E>;

  template&lt;class... Ts>
    using <i>decayed-tuple</i> = tuple&lt;decay_t&lt;Ts>...>; // exposition only

  template&lt;class... Ts>
    using <i>variant-or-empty</i> = <i>see below</i>; // exposition only

  template&lt;class S,
           class E = empty_env,
           template&lt;class...> class Tuple = <i>decayed-tuple</i>,
           template&lt;class...> class Variant = <i>variant-or-empty</i>>
      requires sender_in&lt;S, E>
    using value_types_of_t = <i>see below</i>;

  template&lt;class S,
           class E = empty_env,
           template&lt;class...> class Variant = <i>variant-or-empty</i>>
      requires sender_in&lt;S, E>
    using error_types_of_t = <i>see below</i>;

  template&lt;class S, class E = empty_env>
      requires sender_in&lt;S, E>
    inline constexpr bool sends_stopped = <i>see below</i>;

  template &lt;sender Sender>
    using tag_of_t = <i>see below</i>;

  // [exec.snd.transform], sender transformations
  template&lt;class Domain, sender Sender, class... Env>
      requires (sizeof...(Env) &lt;= 1)
    constexpr sender decltype(auto) transform_sender(Domain dom, Sender&& sndr, const Env&... env);

  template&lt;class Domain, sender Sender, class Env>
    constexpr decltype(auto) transform_env(Domain dom, Sender&& sndr, Env&& env) noexcept;

  // [exec.snd.apply], sender algorithm application
  template&lt;class Domain, class Tag, sender Sender, class... Args>
    constexpr decltype(auto) apply_sender(Domain dom, Tag, Sender&& sndr, Args&&... args) noexcept(<i>see below</i>);

  // [exec.connect], the connect sender algorithm
  namespace <i>senders-connect</i> { // exposition only
    struct connect_t;
  }
  using <i>senders-connect</i>::connect_t;
  inline constexpr connect_t connect{};

  template&lt;class S, class R>
    using connect_result_t = decltype(connect(declval&lt;S>(), declval&lt;R>()));

  // [exec.factories], sender factories
  namespace <i>senders-factories</i> { // exposition only
    struct just_t;
    struct just_error_t;
    struct just_stopped_t;
    struct schedule_t;
  }
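  using <i>senders-factories</i>::just_t;
  using <i>senders-factories</i>::just_error_t;
  using <i>senders-factories</i>::just_stopped_t;
  inline constexpr just_t just{};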
  inline constexpr just_error_t just_error{};
  inline constexpr just_stopped_t just_stopped{};
  using <i>senders-factories</i>::schedule_t;
  inline constexpr schedule_t schedule{};
  inline constexpr <i>unspecified</i> read{};

  template&lt;scheduler S>
    using schedule_result_t = decltype(schedule(declval&lt;S>()));

  // [exec.adapt], sender adaptors
  namespace <i>sender-adaptor-closure</i> { // exposition only
    template&lt;<i>class-type</i> D>
      struct sender_adaptor_closure { };
  }
  using <i>sender-adaptor-closure</i>::sender_adaptor_closure;

  namespace <i>sender-adaptors</i> { // exposition only
    struct on_t;
    struct transfer_t;
    struct schedule_from_t;
    struct then_t;
    struct upon_error_t;
    struct upon_stopped_t;
    struct let_value_t;
    struct let_error_t;
    struct let_stopped_t;
    struct bulk_t;
    struct split_t;
    struct when_all_t;
    struct when_all_with_variant_t;
    struct into_variant_t;
    struct stopped_as_optional_t;
    struct stopped_as_error_t;
    struct ensure_started_t;
  }
  using <i>sender-adaptors</i>::on_t;
  using <i>sender-adaptors</i>::transfer_t;
  using <i>sender-adaptors</i>::schedule_from_t;
  using <i>sender-adaptors</i>::then_t;
  using <i>sender-adaptors</i>::upon_error_t;
  using <i>sender-adaptors</i>::upon_stopped_t;
  using <i>sender-adaptors</i>::let_value_t;
  using <i>sender-adaptors</i>::let_error_t;
  using <i>sender-adaptors</i>::let_stopped_t;
  using <i>sender-adaptors</i>::bulk_t;
  using <i>sender-adaptors</i>::split_t;
  using <i>sender-adaptors</i>::when_all_t;
  using <i>sender-adaptors</i>::when_all_with_variant_t;
  using <i>sender-adaptors</i>::into_variant_t;
  using <i>sender-adaptors</i>::stopped_as_optional_t;
  using <i>sender-adaptors</i>::stopped_as_error_t;
  using <i>sender-adaptors</i>::ensure_started_t;

  inline constexpr on_t on{};
  inline constexpr transfer_t transfer{};
  inline constexpr schedule_from_t schedule_from{};

  inline constexpr then_t then{};
  inline constexpr upon_error_t upon_error{};
  inline constexpr upon_stopped_t upon_stopped{};

  inline constexpr let_value_t let_value{};
  inline constexpr let_error_t let_error{};
  inline constexpr let_stopped_t let_stopped{};

  inline constexpr bulk_t bulk{};

  inline constexpr split_t split{};
  inline constexpr when_all_t when_all{};
  inline constexpr when_all_with_variant_t when_all_with_variant{};

  inline constexpr into_variant_t into_variant{};

  inline constexpr stopped_as_optional_t stopped_as_optional{};

  inline constexpr stopped_as_error_t stopped_as_error{};

  inline constexpr ensure_started_t ensure_started{};

  // [exec.consumers], sender consumers
  namespace <i>sender-consumers</i> { // exposition only
    struct start_detached_t;
  }
  using <i>sender-consumers</i>::start_detached_t;
  inline constexpr start_detached_t start_detached{};

  // [exec.utils], sender and receiver utilities
  // [exec.utils.rcvr.adptr]
  template&lt;
      <i>class-type</i> Derived,
      receiver Base = <i>unspecified</i>> // arguments are not associated entities ([lib.tmpl-heads])
    class receiver_adaptor;

  template&lt;class Fn>
    concept <i>completion-signature</i> = <i>// exposition only</i>
      <i>see below</i>;

  // [exec.utils.cmplsigs]
  template&lt;<i>completion-signature</i>... Fns>
    struct completion_signatures {};

  template&lt;class... Args> <i>// exposition only</i>
    using <i>default-set-value</i> =
      completion_signatures&lt;set_value_t(Args...)>;

  template&lt;class Err> <i>// exposition only</i>
    using <i>default-set-error</i> =
      completion_signatures&lt;set_error_t(Err)>;

  template&lt;class Sigs> // exposition only
    concept <i>valid-completion-signatures</i> = <i>see below</i>;

  // [exec.utils.mkcmplsigs]
  template&lt;
    <i>valid-completion-signatures</i> InputSignatures,
    <i>valid-completion-signatures</i> AdditionalSignatures = completion_signatures&lt;>,
    template&lt;class...> class SetValue = <i>see below</i>,
    template&lt;class> class SetError = <i>see below</i>,
    <i>valid-completion-signatures</i> SetStopped = completion_signatures&lt;set_stopped_t()>>
  using transform_completion_signatures = completion_signatures&lt;<i>see below</i>>;

  template&lt;
    sender Sndr,
    class Env = empty_env,
    <i>valid-completion-signatures</i> AdditionalSignatures = completion_signatures&lt;>,
    template&lt;class...> class SetValue = <i>see below</i>,
    template&lt;class> class SetError = <i>see below</i>,
    <i>valid-completion-signatures</i> SetStopped = completion_signatures&lt;set_stopped_t()>>
      requires sender_in&lt;Sndr, Env>
  using transform_completion_signatures_of =
    transform_completion_signatures&lt;
      completion_signatures_of_t&lt;Sndr, Env>, AdditionalSignatures, SetValue, SetError, SetStopped>;

  // [exec.ctx], execution resources
  class run_loop;
}

namespace std::this_thread {
  // [exec.queries], queries
  namespace <i>queries</i> { <i>// exposition only</i>
    struct execute_may_block_caller_t;
  }
  using <i>queries</i>::execute_may_block_caller_t;
  inline constexpr execute_may_block_caller_t execute_may_block_caller{};

  namespace <i>this-thread</i> { <i>// exposition only</i>
    struct <i>sync-wait-env</i>; <i>// exposition only</i>
    template&lt;class S>
        requires sender_in&lt;S, <i>sync-wait-env</i>>
      using <i>sync-wait-type</i> = <i>see-below</i>; <i>// exposition only</i>
    template&lt;class S>
      using <i>sync-wait-with-variant-type</i> = <i>see-below</i>; <i>// exposition only</i>

    struct sync_wait_t;
    struct sync_wait_with_variant_t;
  }
  using <i>this-thread</i>::sync_wait_t;
  using <i>this-thread</i>::sync_wait_with_variant_t;
  inline constexpr sync_wait_t sync_wait{};
  inline constexpr sync_wait_with_variant_t sync_wait_with_variant{};
}

namespace std::execution {
  // [exec.execute], one-way execution
  namespace <i>execute</i> { // exposition only
    struct execute_t;
  }
  using <i>execute</i>::execute_t;
  inline constexpr execute_t execute{};

  // [exec.as.awaitable]
  namespace <i>coro-utils</i> { // exposition only
    struct as_awaitable_t;
  }
  using <i>coro-utils</i>::as_awaitable_t;
  inline constexpr as_awaitable_t as_awaitable{};

  // [exec.with.awaitable.senders]
  template&lt;<i>class-type</i> Promise>
    struct with_awaitable_senders;
}
</pre>

1. The exposition-only type <code><i>variant-or-empty&lt;Ts...></i></code> is
     defined as follows:

    1. If `sizeof...(Ts)` is greater than zero,
        <code><i>variant-or-empty&lt;Ts...></i></code> names the type
        `variant<Us...>` where `Us...` is the pack `decay_t<Ts>...` with
        duplicate types removed.

    2. Otherwise, <code><i>variant-or-empty&lt;Ts...></i></code> names the
        exposition-only class type:

        <pre highlight="c++">
        struct <i>empty-variant</i> {
          <i>empty-variant</i>() = delete;
        };
        </pre>
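
<i>Non-normative illustration.</i> The definition of <code><i>variant-or-empty</i></code> above can be sketched in C++20 as shown below. The names `type_list`, `unique_impl`, `variant_or_empty_impl`, and `empty_variant` are assumptions of this sketch and are not part of the specification. The sketch keeps the first occurrence of each decayed type; the wording above does not prescribe an order, so this is only one possible reading.

```cpp
#include <type_traits>
#include <variant>

namespace sketch {

// Hypothetical helper: a plain list of types.
template <class... Ts>
struct type_list {};

// Hypothetical helper: accumulates unique types into type_list<Us...>.
template <class List, class... Ts>
struct unique_impl;

// No input types left: the accumulated list becomes the variant.
template <class... Us>
struct unique_impl<type_list<Us...>> {
  using type = std::variant<Us...>;
};

// Skip T if it is already in Us..., otherwise append it.
template <class... Us, class T, class... Rest>
struct unique_impl<type_list<Us...>, T, Rest...>
  : std::conditional_t<(std::is_same_v<Us, T> || ...),
                       unique_impl<type_list<Us...>, Rest...>,
                       unique_impl<type_list<Us..., T>, Rest...>> {};

// Stand-in for the exposition-only empty-variant class.
struct empty_variant {
  empty_variant() = delete;
};

template <class... Ts>
struct variant_or_empty_impl {
  using type = typename unique_impl<type_list<>, std::decay_t<Ts>...>::type;
};

template <>
struct variant_or_empty_impl<> {
  using type = empty_variant;
};

template <class... Ts>
using variant_or_empty = typename variant_or_empty_impl<Ts...>::type;

// int and const int& both decay to int, so only one alternative remains.
static_assert(std::is_same_v<variant_or_empty<int, const int&, double>,
                             std::variant<int, double>>);
// An empty pack yields the non-constructible placeholder type.
static_assert(std::is_same_v<variant_or_empty<>, empty_variant>);
static_assert(!std::is_default_constructible_v<variant_or_empty<>>);

} // namespace sketch
```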

## Queries <b>[exec.queries]</b> ## {#spec-execution.queries}

### `std::get_env` <b>[exec.get.env]</b> ### {#spec-execution.environment.get_env}

1. `get_env` is a customization point object. For some subexpression `o` of type
    `O`, `get_env(o)` is expression-equivalent to

    1. `tag_invoke(std::get_env, const_cast<const O&>(o))` if that expression is
        well-formed.

        * <i>Mandates:</i> The expression above is not potentially throwing, and
            its type satisfies `queryable` ([exec.queryable]).

    2. Otherwise, `empty_env{}`.

2. The value of `get_env(o)` shall be valid while `o` is valid.

3. When passed a sender object, `get_env` returns the sender's attributes. When
    passed a receiver, `get_env` returns the receiver's environment.

### `std::forwarding_query` <b>[exec.fwd.env]</b> ### {#spec-execution.forwarding_query}

1. `forwarding_query` asks a query object whether it should be forwarded
    through queryable adaptors.

2. The name `forwarding_query` denotes a query object. For some query
    object `q` of type `Q`, `forwarding_query(q)` is expression-equivalent
    to:

    1. <code><i>mandate-nothrow-call</i>(tag_invoke, forwarding_query,
        q)</code> if that expression is well-formed.

        * <i>Mandates:</i> The expression above has type `bool` and is a core
            constant expression if `q` is a core constant expression.

    2. Otherwise, `true` if `derived_from<Q, forwarding_query_t>` is
        `true`.

    3. Otherwise, `false`.

### `std::get_allocator` <b>[exec.get.allocator]</b> ### {#spec-execution.get_allocator}

1. `get_allocator` asks an object for its associated allocator.

2. The name `get_allocator` denotes a query object. For some subexpression `r`,
    `get_allocator(r)` is expression-equivalent to
    <code><i>mandate-nothrow-call</i>(tag_invoke, std::get_allocator,
    as_const(r))</code>.

        * <i>Mandates:</i> The type of the expression above
            satisfies <i>Allocator</i>.

3. `forwarding_query(std::get_allocator)` is `true`.

4. `get_allocator()` (with no arguments) is expression-equivalent to
    `execution::read(std::get_allocator)` ([exec.read]).

### `std::get_stop_token` <b>[exec.get.stop.token]</b> ### {#spec-execution.get_stop_token}

1. `get_stop_token` asks an object for an associated stop token.

2. The name `get_stop_token` denotes a query object. For some subexpression `r`,
    `get_stop_token(r)` is expression-equivalent to:

    1. <code><i>mandate-nothrow-call</i>(tag_invoke, std::get_stop_token,
        as_const(r))</code>, if this expression is well-formed.

        * <i>Mandates:</i> The type of the expression above satisfies
            `stoppable_token`.

    2. Otherwise, `never_stop_token{}`.

3. `forwarding_query(std::get_stop_token)` is a core constant
    expression and has value `true`.

4. `get_stop_token()` (with no arguments) is expression-equivalent to
    `execution::read(std::get_stop_token)` ([exec.read]).

### `execution::get_domain` <b>[exec.get.domain]</b> ### {#spec-execution.get_domain}

1. `get_domain` asks an object for an associated execution domain tag.

2. The name `get_domain` denotes a query object. For some subexpression `r`,
    `get_domain(r)` is expression-equivalent to
    <code><i>mandate-nothrow-call</i>(tag_invoke, get_domain, as_const(r))</code>,
    if this expression is well-formed.

3. `forwarding_query(execution::get_domain)` is a core constant
    expression and has value `true`.

4. `get_domain()` (with no arguments) is expression-equivalent to
    `execution::read(get_domain)` ([exec.read]).

### `execution::get_scheduler` <b>[exec.get.scheduler]</b> ### {#spec-execution.get_scheduler}

1. `get_scheduler` asks an object for its associated scheduler.

2. The name `get_scheduler` denotes a query object. For some
    subexpression `r`,  `get_scheduler(r)` is expression-equivalent to
    <code><i>mandate-nothrow-call</i>(tag_invoke, get_scheduler, as_const(r))</code>.

        * <i>Mandates:</i> The type of the expression above satisfies `scheduler`.

3. `forwarding_query(execution::get_scheduler)` is a core constant
    expression and has value `true`.

4. `get_scheduler()` (with no arguments) is expression-equivalent to `execution::read(get_scheduler)` ([exec.read]).

### `execution::get_delegatee_scheduler` <b>[exec.get.delegatee.scheduler]</b> ### {#spec-execution.get_delegatee_scheduler}

1. `get_delegatee_scheduler` asks an object for a scheduler to which work can be delegated for the purpose of forward progress delegation.

2. The name `get_delegatee_scheduler` denotes a query object. For some
    subexpression `r`, `get_delegatee_scheduler(r)` is expression-equivalent to
    <code><i>mandate-nothrow-call</i>(tag_invoke, get_delegatee_scheduler, as_const(r))</code>.

        * <i>Mandates:</i> The type of the expression above satisfies `scheduler`.

3. `forwarding_query(execution::get_delegatee_scheduler)` is a core
    constant expression and has value `true`.

4. `get_delegatee_scheduler()` (with no arguments) is expression-equivalent to `execution::read(get_delegatee_scheduler)` ([exec.read]).

### `execution::get_forward_progress_guarantee` <b>[exec.get.forward.progress.guarantee]</b> ### {#spec-execution.get_forward_progress_guarantee}

<pre highlight="c++">
enum class forward_progress_guarantee {
  concurrent,
  parallel,
  weakly_parallel
};
</pre>

1. `get_forward_progress_guarantee` asks a scheduler about the forward progress guarantees of execution agents created by that scheduler.

2. The name `get_forward_progress_guarantee` denotes a query object. For some subexpression `s`, let `S` be `decltype((s))`. If `S` does not satisfy `scheduler`, `get_forward_progress_guarantee` is ill-formed.
    Otherwise, `get_forward_progress_guarantee(s)` is expression-equivalent to:

    1. <code><i>mandate-nothrow-call</i>(tag_invoke, get_forward_progress_guarantee, as_const(s))</code>, if this expression is well-formed.

        * <i>Mandates:</i> The type of the expression above is
            `forward_progress_guarantee`.

    2. Otherwise, `forward_progress_guarantee::weakly_parallel`.

3. If `get_forward_progress_guarantee(s)` for some scheduler `s` returns `forward_progress_guarantee::concurrent`, all execution agents created by that scheduler shall provide the concurrent forward progress guarantee. If it returns
    `forward_progress_guarantee::parallel`, all execution agents created by that scheduler shall provide at least the parallel forward progress guarantee.

### `this_thread::execute_may_block_caller` <b>[exec.execute.may.block.caller]</b> ### {#spec-execution.execute_may_block_caller}

1. `this_thread::execute_may_block_caller` asks a scheduler `s` whether a call `execute(s, f)` with any invocable `f` may block the thread where such a call occurs.

2. The name `this_thread::execute_may_block_caller` denotes a query object. For some subexpression `s`, let `S` be `decltype((s))`. If `S` does not satisfy `scheduler`, `this_thread::execute_may_block_caller` is ill-formed. Otherwise,
    `this_thread::execute_may_block_caller(s)` is expression-equivalent to:

    1. <code><i>mandate-nothrow-call</i>(tag_invoke, this_thread::execute_may_block_caller, as_const(s))</code>, if this expression is well-formed.

        * <i>Mandates:</i> The type of the expression above is `bool`.

    2. Otherwise, `true`.

3. If `this_thread::execute_may_block_caller(s)` for some scheduler `s` returns `false`, no `execute(s, f)` call with some invocable `f` shall block the calling thread.

### `execution::get_completion_scheduler` <b>[exec.completion.scheduler]</b> ### {#spec-execution.get_completion_scheduler}

1. <code>get_completion_scheduler&lt;<i>completion-tag</i>></code> obtains the
    completion scheduler associated with a completion tag from a sender's
    attributes.

2. The name `get_completion_scheduler` denotes a query object template. For some
    subexpression `q`, let `Q` be `decltype((q))`. If the template argument
    `Tag` in `get_completion_scheduler<Tag>(q)` is not one of `set_value_t`,
    `set_error_t`, or `set_stopped_t`, `get_completion_scheduler<Tag>(q)` is
    ill-formed. Otherwise, `get_completion_scheduler<Tag>(q)` is
    expression-equivalent to <code><i>mandate-nothrow-call</i>(tag_invoke,
    get_completion_scheduler&lt;Tag>, as_const(q))</code> if this expression is
    well-formed.

        * <i>Mandates:</i> The type of the expression above satisfies
            `scheduler`.

3. If, for some sender `s` and completion function `C` that has an associated
    completion tag `Tag`, `get_completion_scheduler<Tag>(get_env(s))` is
    well-formed and results in a scheduler `sch`, and the sender `s` invokes
    `C(r, args...)`, for some receiver `r` that has been connected to `s`, with
    additional arguments `args...`, on an execution agent that does not
    belong to the associated execution resource of `sch`, the behavior is
    undefined.

4. The expression `forwarding_query(get_completion_scheduler<CPO>)`
    is a core constant expression and has value `true`.

## Schedulers <b>[exec.sched]</b> ## {#spec-execution.schedulers}

1. The `scheduler` concept defines the requirements of a scheduler type
    ([async.ops]). `schedule` is a customization point object that accepts a
    scheduler. A valid invocation of `schedule` is a schedule-expression.

    <pre highlight="c++">
    template&lt;class S>
      concept scheduler =
        queryable&lt;S> &&
        requires(S&& s, const get_completion_scheduler_t&lt;set_value_t> tag) {
          { schedule(std::forward&lt;S>(s)) } -> sender;
          { tag_invoke(tag, std::get_env(
              schedule(std::forward&lt;S>(s)))) } -> same_as&lt;remove_cvref_t&lt;S>>;
        } &&
        equality_comparable&lt;remove_cvref_t&lt;S>> &&
        copy_constructible&lt;remove_cvref_t&lt;S>>;
    </pre>

2. Let `S` be the type of a scheduler and let `E` be the type of an execution
    environment for which `sender_in<schedule_result_t<S>, E>` is `true`. Then
    <code><i>sender-of-in</i>&lt;schedule_result_t&lt;S>, E></code> shall be `true`.

3. None of a scheduler's copy constructor, destructor, equality comparison, or
    `swap` member functions shall exit via an exception.

4. None of these member functions, nor a scheduler type's `schedule` function,
    shall introduce data races as a result of concurrent invocations of those
    functions from different threads.

5. For any two (possibly `const`) values `s1` and `s2` of some scheduler type
    `S`, `s1 == s2` shall return `true` only if both `s1` and `s2` share the
    same associated execution resource.

6. For a given scheduler expression `s`, the expression
    `get_completion_scheduler<set_value_t>(std::get_env(schedule(s)))` shall
    compare equal to `s`.

7. For a given scheduler expression `s`, if the expression `get_domain(s)`
    is well-formed, then the expression `get_domain(get_env(schedule(s)))`
    is also well-formed and has the same type.

8. A scheduler type's destructor shall not block pending completion of any
    receivers connected to the sender objects returned from `schedule`. <span
    class="wg21note">The ability to wait for completion of submitted function
    objects can be provided by the associated execution resource of the
    scheduler.</span>

## Receivers <b>[exec.recv]</b> ## {#spec-execution.receivers}

### Receiver concepts <b>[exec.recv.concepts]</b> ### {#spec-execution.receiver_concepts}

1. A receiver represents the continuation of an asynchronous operation. The
    `receiver` concept defines the requirements for a receiver type
    ([async.ops]). The `receiver_of` concept defines the requirements for a
    receiver type that is usable as the first argument of a set of completion
    operations corresponding to a set of completion signatures. The `get_env`
    customization point is used to access a receiver's associated environment.

    <pre highlight="c++">
    template&lt;class R>
      inline constexpr bool enable_receiver =
        requires {
          typename R::is_receiver;
        };

    template&lt;class R>
      concept receiver =
        enable_receiver&lt;remove_cvref_t&lt;R>> &&
        requires(const remove_cvref_t&lt;R>& r) {
          { get_env(r) } -> queryable;
        } &&
        move_constructible&lt;remove_cvref_t&lt;R>> &&  <i>// rvalues are movable, and</i>
        constructible_from&lt;remove_cvref_t&lt;R>, R>; <i>// lvalues are copyable</i>

    template&lt;class Signature, class R>
      concept <i>valid-completion-for</i> = <i>// exposition only</i>
        requires (Signature* sig) {
          []&lt;class Tag, class... Args>(Tag(*)(Args...))
              requires <i>callable</i>&lt;Tag, remove_cvref_t&lt;R>, Args...>
          {}(sig);
        };

    template&lt;class R, class Completions>
      concept receiver_of =
        receiver&lt;R> &amp;&amp;
        requires (Completions* completions) {
          []&lt;<i>valid-completion-for</i>&lt;R>...Sigs>(completion_signatures&lt;Sigs...>*)
          {}(completions);
        };
    </pre>

3. <i>Remarks:</i> Pursuant to [namespace.std], users can specialize `enable_receiver` to
    `true` for cv-unqualified program-defined types that model `receiver`, and `false`
    for types that do not. Such specializations shall be usable in constant
    expressions ([expr.const]) and have type `const bool`.

4. Let `r` be a receiver and let `op_state` be an operation state associated
    with an asynchronous operation created by connecting `r` with a sender. Let
    `token` be a stop token equal to `get_stop_token(get_env(r))`. `token` shall
    remain valid for the duration of the asynchronous operation's lifetime
    ([async.ops]). <span class="wg21note">This means that, unless it knows about
    further guarantees provided by the type of receiver `r`, the implementation
    of `op_state` cannot use `token` after it executes a completion operation.
    This also implies that any stop callbacks registered on `token` must be
    destroyed before the invocation of the completion operation.</span>

### `execution::set_value` <b>[exec.set.value]</b> ### {#spec-execution.receivers.set_value}

1. `set_value` is a value completion function ([async.ops]). Its associated
    completion tag is `set_value_t`. The expression `set_value(R, Vs...)` for
    some subexpression `R` and pack of subexpressions `Vs` is ill-formed if `R`
    is an lvalue or a `const` rvalue. Otherwise, it is expression-equivalent to
    <code><i>mandate-nothrow-call</i>(tag_invoke, set_value, R, Vs...)</code>.

### `execution::set_error` <b>[exec.set.error]</b> ### {#spec-execution.receivers.set_error}

1. `set_error` is an error completion function. Its associated completion tag is
    `set_error_t`. The expression `set_error(R, E)` for some subexpressions `R`
    and `E` is ill-formed if `R` is an lvalue or a `const` rvalue. Otherwise, it is
    expression-equivalent to <code><i>mandate-nothrow-call</i>(tag_invoke,
    set_error, R, E)</code>.

### `execution::set_stopped` <b>[exec.set.stopped]</b> ### {#spec-execution.receivers.set_stopped}

1. `set_stopped` is a stopped completion function. Its associated completion tag
    is `set_stopped_t`.  The expression `set_stopped(R)` for some subexpression
    `R` is ill-formed if `R` is an lvalue or a `const` rvalue. Otherwise, it is
    expression-equivalent to <code><i>mandate-nothrow-call</i>(tag_invoke,
    set_stopped, R)</code>.

## Operation states <b>[exec.opstate]</b> ## {#spec-execution.opstate}

1. The `operation_state` concept defines the requirements of an operation state
    type ([async.ops]).

    <pre highlight="c++">
    template&lt;class O>
      concept operation_state =
        queryable&lt;O> &&
        is_object_v&lt;O> &&
        requires (O& o) {
          { start(o) } noexcept;
        };
    </pre>

2. If an `operation_state` object is moved during the lifetime of its
    asynchronous operation ([async.ops]), the behavior is undefined.

3. Library-provided operation state types are non-movable.

### `execution::start` <b>[exec.opstate.start]</b> ### {#spec-execution.opstate.start}

1. The name `start` denotes a customization point object that starts
    ([async.ops]) the asynchronous operation associated with the operation state
    object. The expression `start(O)` for some subexpression `O` is ill-formed
    if `O` is an rvalue. Otherwise, it is expression-equivalent to:

    <pre highlight="c++">
    <i>mandate-nothrow-call</i>(tag_invoke, start, O)
    </pre>

2. If the function selected by `tag_invoke` does not start the asynchronous
    operation associated with the operation state `O`, the behavior of calling
    `start(O)` is undefined.

## Senders <b>[exec.snd]</b> ## {#spec-execution.senders}

### General <b>[exec.snd.general]</b> ### {#spec-execution.senders.general}

1. For the purposes of this subclause, a sender is an object that satisfies the
    `sender` concept ([async.ops]).

2. Subclauses [exec.factories] and [exec.adapt] define customizable algorithms
    that return senders. Each algorithm has a default implementation specified
    herein. Let `sndr` be the result of an invocation of such an algorithm or an
    object equal to such ([concepts.equality]), and let `Sndr` be
    `decltype((sndr))`. Let `Env` be the type of an environment `env` such that
    `sender_in<Sndr, Env>` is `true`, and let `rcvr` be a receiver whose
    associated environment is `Env`. For the default implementation of the
    algorithm that produced `sndr`, connecting `sndr` to `rcvr` and starting the
    resulting operation state ([async.ops]) necessarily results in the potential
    evaluation ([intro.execution]) of a set of completion operations whose first
    argument is a subexpression equal to `rcvr`. Let `Sigs` be a pack of
    completion signatures corresponding to this set of completion operations.
    Then the type of the expression `get_completion_signatures(sndr, env)` is a
    specialization of the class template `completion_signatures`
    ([exec.utils.cmplsigs]), the set of whose template arguments is `Sigs`. If a
    user-provided implementation of the algorithm that produced `sndr` is
    selected instead of the default, the set of completion signatures denoted by
    the return type of `get_completion_signatures(sndr, env)` shall be a
    superset of `Sigs`, with any additional completion signatures corresponding
    to error or stopped completion operations.

3. This subclause makes use of the following exposition-only entities.

    1. For a queryable object `e`, let <code><i>FWD-ENV</i>(e)</code> be a
        queryable object such that for a query object `q` and a pack of
        subexpressions `as`, the expression <code>tag_invoke(q,
        <i>FWD-ENV</i>(e), as...)</code> is ill-formed if
        `forwarding_query(q)` is `false`;
        otherwise, it is expression-equivalent to `tag_invoke(q, e, as...)`.

    2. For a query object `q` and a subexpression `v`, let
        <code><i>MAKE-ENV</i>(q, v)</code> be a queryable object `e` such that
        `tag_invoke(q, e)` is a `const` lvalue reference to an object
        decay-copied from `v`. Unless otherwise stated, the object to which
        `tag_invoke(q, e)` refers remains valid while `e` remains valid.

    3. For two queryable objects `e1` and `e2`, a query object `q` and a pack of
        subexpressions `as`, let <code><i>JOIN-ENV</i>(e1, e2)</code> be an
        environment `e3` such that `tag_invoke(q, e3, as...)` is
        expression-equivalent to:

          - `tag_invoke(q, e1, as...)` if that expression is well-formed,

          - otherwise, `tag_invoke(q, e2, as...)` if that expression is
              well-formed,

          - otherwise, `tag_invoke(q, e3, as...)` is ill-formed.

    4. For two subexpressions `r` and `e`, let <code><i>SET-VALUE</i>(r,
        e)</code> be `(e, set_value(r))` if the type of `e` is `void`;
        otherwise, it is `set_value(r, e)`. Let <code><i>TRY-SET-VALUE</i>(r,
        e)</code> be:

            <pre highlight="c++">
            try {
              <i>SET-VALUE</i>(r, e);
            } catch(...) {
              set_error(r, current_exception());
            }
            </pre>

        if `e` is potentially-throwing, except that `r` is evaluated only once;
        or <code><i>SET-VALUE</i>(r, e)</code> otherwise.

    5.  <pre highlight="c++">
        template&lt;class Default = default_domain, class Sender>
        constexpr auto <i>completion-domain</i>(const Sender& sndr) noexcept;
        </pre>

        1. *Effects:* Let <code><i>COMPL-DOMAIN</i>(T)</code> be the type of the expression
            `get_domain(get_completion_scheduler<T>(get_env(sndr)))`. If
            <code><i>COMPL-DOMAIN</i>(set_value_t)</code>,
            <code><i>COMPL-DOMAIN</i>(set_error_t)</code>, and
            <code><i>COMPL-DOMAIN</i>(set_stopped_t)</code> all share a common type
            [meta.trans.other] (ignoring those types that are ill-formed), then
            <code><i>completion-domain</i>&lt;Default>(sndr)</code> is a default-constructed
            prvalue of that type.
            Otherwise, if all of those types are ill-formed,
            <code><i>completion-domain</i>&lt;Default>(sndr)</code> is a default-constructed
            prvalue of type `Default`.
            Otherwise, <code><i>completion-domain</i>&lt;Default>(sndr)</code> is ill-formed.

    6.  <pre highlight="c++">
        template&lt;class Tag, class Env, class Default>
        constexpr decltype(auto) <i>query-with-default</i>(Tag, const Env& env, Default&& value) noexcept(<i>see below</i>);
        </pre>

        1. <i>Effects:</i> Equivalent to:

            - `return Tag()(env);` if that expression is well-formed,

            - `return static_cast<Default>(std::forward<Default>(value));` otherwise.

        2. <i>Remarks:</i> The expression in the `noexcept` clause is:

                <pre highlight="c++">
                is_invocable_v&lt;Tag, const Env&> ? is_nothrow_invocable_v&lt;Tag, const Env&>
                                                : is_nothrow_constructible_v&lt;Default, Default>
                </pre>

    7.  <pre highlight="c++">
        template&lt;class Sender>
        constexpr auto <i>get-domain-early</i>(const Sender& sndr) noexcept;
        </pre>

        1. <i>Effects:</i> Equivalent to the first of the following that is well-formed:

            - `return get_domain(get_env(sndr));`

            - <code>return <i>completion-domain</i>(sndr);</code>

            - `return default_domain();`

    8.  <pre highlight="c++">
        template&lt;class Sender, class Env>
        constexpr auto <i>get-domain-late</i>(const Sender& sndr, const Env& env) noexcept;
        </pre>

        1. <i>Effects:</i> Equivalent to:

            - If <code><i>sender-for</i>&lt;Sender, transfer_t></code> is `true`, then
                <code>return <i>query-with-default</i>(get_domain, sch, default_domain())</code> where `sch`
                is the scheduler that was used to construct `sndr`,

            - Otherwise, `return get_domain(get_env(sndr));` if that expression is well-formed,

            - Otherwise, <code>return <i>completion-domain</i>&lt;<i>X</i>>(sndr);</code>
                if that expression is well-formed and its type is not <code><i>X</i></code> where
                <code><i>X</i></code> is an unspecified type,

            - Otherwise, `return get_domain(env);` if that expression is well-formed,

            - Otherwise, `return get_domain(get_scheduler(env));` if that expression is well-formed,

            - Otherwise, `return default_domain();`.

            [*Note:* The `transfer` algorithm is unique in that it ignores the execution domain of
            its predecessor, using only its destination scheduler to select a customization. *--end note*]


    9.  <pre highlight="c++">
        template&lt;class... T>
        struct <i>tuple-like</i> {
          T<sub><i>0</i></sub> <i>t<sub>0</sub></i>;      // exposition only
          T<sub><i>1</i></sub> <i>t<sub>1</sub></i>;      // exposition only
            ...
          T<sub><i>n-1</i></sub> <i>t<sub>n-1</sub></i>;   // exposition only
        };
        </pre>

        - <span class="wg21note">An expression of type <code><i>tuple-like</i></code> is usable as the initializer
            of a structured binding declaration [dcl.struct.bind].</span>

    10. <pre highlight="c++">
        template &lt;semiregular Tag, <i>movable-value</i> Data, sender... Child>
        constexpr auto <i>make-sender</i>(Tag, Data&& data, Child&&... child);
        </pre>

        1. <i>Returns:</i> A prvalue of type <code><i>basic-sender</i>&lt;Tag, decay_t&lt;Data>, decay_t&lt;Child>...></code>
            where the <code><i>tag</i></code> member has been default-initialized and the
            <code><i>data</i></code> and <code><i>child<sub>n</sub></i>...</code> members have
            been direct-initialized from their respective forwarded arguments, where
            <code><i>basic-sender</i></code> is the following exposition-only class template
            except as noted below:

              <pre highlight="c++">
              template&lt;class T, class... Us>
              concept <i>one-of</i> = (same_as&lt;T, Us> ||...); // exposition only

              template&lt;template&lt;class...> class T, class... Args>
              concept <i>well-formed</i> = requires { typename T&lt;Args...>; }; // exposition only

              template&lt;const auto& Fun, class... Args>
              concept <i>cpo-callable</i> = <i>callable</i>&lt;decltype(Fun), Args...>; // exposition only

              template&lt;const auto& Fun, class... Args>
              using <i>cpo-result-t</i> = <i>call-result-t</i>&lt;decltype(Fun), Args...>; // exposition only

              struct <i>default-impls</i> {  // exposition only
                static constexpr auto <i>get-attrs</i> = <i>see below</i>;
                static constexpr auto <i>get-env</i> = <i>see below</i>;
                static constexpr auto <i>get-state</i> = <i>see below</i>;
                static constexpr auto <i>start</i> = <i>see below</i>;
                static constexpr auto <i>complete</i> = <i>see below</i>;
              };

              template&lt;class Tag>
              struct <i>impls-for</i> : <i>default-impls</i> {}; // exposition only

              template&lt;class Sndr, class Rcvr> // exposition only
              using <i>state-type</i> = decay_t&lt;<i>cpo-result-t</i>&lt;
                <i>impls-for</i>&lt;tag_of_t&lt;Sndr>>::<i>get-state</i>, Sndr, Rcvr&>>;

              template&lt;class Index, class Sndr, class Rcvr> // exposition only
              using <i>env-type</i> = <i>cpo-result-t</i>&lt;
                <i>impls-for</i>&lt;tag_of_t&lt;Sndr>>::<i>get-env</i>, Index,
                <i>state-type</i>&lt;Sndr, Rcvr>&, const Rcvr&>;

              template&lt;class Sndr, class Rcvr, class Index>  // arguments are not associated entities ([lib.tmpl-heads])
                requires <i>well-formed</i>&lt;<i>env-type</i>, Index, Sndr, Rcvr>
              struct <i>basic-receiver</i> {  // exposition only
                using tag_t = tag_of_t&lt;Sndr>; // exposition only
                using receiver_concept = receiver_t;

                template&lt;<i>one-of</i>&lt;set_value_t, set_error_t, set_stopped_t> Tag, class... Args>
                  requires <i>cpo-callable</i>&lt;<i>impls-for</i>&lt;tag_t>::<i>complete</i>,
                    Index, <i>state-type</i>&lt;Sndr, Rcvr>&, Rcvr&, Tag, Args...>
                friend void tag_invoke(Tag, <i>basic-receiver</i>&& self, Args&&... args) noexcept {
                  (void) <i>impls-for</i>&lt;tag_t>::<i>complete</i>(
                    Index(), self.op_->state_, self.op_->rcvr_, Tag(), std::forward&lt;Args>(args)...);
                }

                template&lt;same_as&lt;get_env_t> Tag>
                friend auto tag_invoke(Tag, const <i>basic-receiver</i>& self) noexcept
                  -> <i>env-type</i>&lt;Index, Sndr, Rcvr> {
                  const auto& rcvr = self.op_->rcvr_;
                  return <i>impls-for</i>&lt;tag_t>::<i>get-env</i>(Index(), self.op_->state_, rcvr);
                }

                <i>basic-operation</i>&lt;Sndr, Rcvr>* op_; // exposition only
              };

              constexpr auto <i>connect-all</i> =   // exposition only
                []&lt;class Sndr, class Rcvr, size_t... Is>(
                  <i>basic-operation</i>&lt;Sndr, Rcvr>* op, Sndr&& sndr, index_sequence&lt;Is...>)
                    noexcept( <i><b>TODO</b></i> ) requires ( <i><b>TODO</b></i> ) {
                    auto&& [ign1, ign2, child...] = std::forward&lt;Sndr>(sndr);
                    return <i>tuple-like</i>{connect(
                      std::forward_like&lt;Sndr>(child),
                      <i>basic-receiver</i>&lt;Sndr, Rcvr, integral_constant&lt;size_t, Is>>{op})...};
                  };

              template&lt;class Sndr>
              using <i>indices-for</i> = make_index_sequence&lt;tuple_size_v&lt;Sndr>-2>; // exposition only

              template&lt;class Sndr, class Rcvr>
              using <i>inner-ops-tuple</i> =   // exposition only
                <i>cpo-result-t</i>&lt;<i>connect-all</i>, <i>basic-operation</i>&lt;Sndr, Rcvr>*, Sndr,
                  <i>indices-for</i>&lt;Sndr>>;

              template&lt;class Sndr, class Rcvr> // arguments are not associated entities ([lib.tmpl-heads])
                requires <i>well-formed</i>&lt;<i>state-type</i>, Sndr, Rcvr> &&
                  <i>well-formed</i>&lt;<i>inner-ops-tuple</i>, Sndr, Rcvr>
              struct <i>basic-operation</i> {  // exposition only
                using tag_t = tag_of_t&lt;Sndr>; // exposition only

                Rcvr rcvr_; // exposition only
                <i>state-type</i>&lt;Sndr, Rcvr> state_; // exposition only
                <i>inner-ops-tuple</i>&lt;Sndr, Rcvr> inner_ops_; // exposition only

                <i>basic-operation</i>(Sndr&& sndr, Rcvr rcvr)  // exposition only
                  : rcvr_(std::move(rcvr))
                  , state_(<i>impls-for</i>&lt;tag_t>::<i>get-state</i>(std::forward&lt;Sndr>(sndr), rcvr_))
                  , inner_ops_(<i>connect-all</i>(this, std::forward&lt;Sndr>(sndr), <i>indices-for</i>&lt;Sndr>()))
                {}

                friend void tag_invoke(start_t, <i>basic-operation</i>& self) noexcept {
                  auto& [ops...] = self.inner_ops_;
                  <i>impls-for</i>&lt;tag_t>::<i>start</i>(self.state_, self.rcvr_, ops...);
                }
              };

              template&lt;class Tag, class Data, class... Child> // arguments are not associated entities ([lib.tmpl-heads])
              struct <i>basic-sender</i> {  // exposition only
                using sender_concept = sender_t;

                template&lt;same_as&lt;get_env_t> GetEnvTag>
                friend decltype(auto) tag_invoke(GetEnvTag, const <i>basic-sender</i>& self) noexcept {
                  return <i>impls-for</i>&lt;Tag>::<i>get-attrs</i>(self.<i>data</i>, self.<i>child<sub>0</sub></i>, ... self.<i>child<sub>n-1</sub></i>);
                }

                template&lt;same_as&lt;connect_t> ConnectTag,
                         <i>decays-to</i>&lt;<i>basic-sender</i>> Self, receiver Rcvr>
                friend auto tag_invoke(ConnectTag, Self&& self, Rcvr rcvr)
                  -> <i>basic-operation</i>&lt;Self, Rcvr> {
                  return {std::forward&lt;Self>(self), std::move(rcvr)};
                }

                template&lt;same_as&lt;get_completion_signatures_t> GetComplSigsTag,
                         <i>decays-to</i>&lt;<i>basic-sender</i>> Self, class Env>
                friend auto tag_invoke(GetComplSigsTag, Self&& self, Env&& env) noexcept
                  -> <i>see below</i> {
                  return {};
                }

                [[no_unique_address]] Tag <i>tag</i>;  // exposition only
                Data <i>data</i>;          // exposition only
                Child<sub><i>0</i></sub> <i>child<sub>0</sub></i>;      // exposition only
                Child<sub><i>1</i></sub> <i>child<sub>1</sub></i>;      // exposition only
                  ...
                Child<sub><i>n-1</i></sub> <i>child<sub>n-1</sub></i>;   // exposition only
              };
              </pre>

        2. It is unspecified whether instances of <code><i>basic-sender</i></code> can be
            aggregate initialized.

        3. <span class="wg21note">An expression of type <code><i>basic-sender</i></code> is usable as the
            initializer of a structured binding declaration
            [dcl.struct.bind].</span>

        4. The member <code><i>default-impls</i>::<i>get-attrs</i></code> is initialized
            with a callable object equal to the following lambda:

              <pre highlight="c++">
              [](const auto& data, const auto&... child) noexcept -> decltype(auto) {
                if constexpr (sizeof...(child) == 1)
                  return <i>FWD-ENV</i>(execution::get_env(child...));
                else
                  return empty_env();
              }
              </pre>

        5. The member <code><i>default-impls</i>::<i>get-env</i></code> is initialized
            with a callable object equal to the following lambda:

              <pre highlight="c++">
              []&lt;class Rcvr>(auto index, auto& state, const Rcvr& rcvr) noexcept
                -> decltype(<i>FWD-ENV</i>(execution::get_env(rcvr))) {
                return <i>FWD-ENV</i>(execution::get_env(rcvr));
              }
              </pre>

        6. The member <code><i>default-impls</i>::<i>get-state</i></code> is initialized
            with a callable object equal to the following lambda:

              <pre highlight="c++">
              []&lt;class Sndr>(Sndr&& sndr, auto& rcvr) noexcept -> decltype(auto) {
                return get&lt;1>(std::forward&lt;Sndr>(sndr));
              }
              </pre>

        7. The member <code><i>default-impls</i>::<i>start</i></code> is initialized
            with a callable object equal to the following lambda:

              <pre highlight="c++">
              [](auto& state, auto& rcvr, auto&... ops) noexcept -> completion_signatures<> {
                (execution::start(ops), ...);
                return {};
              }
              </pre>

        8. The member <code><i>default-impls</i>::<i>complete</i></code> is initialized
            with a callable object equal to the following lambda:

              <pre highlight="c++">
              []&lt;class Index, class Rcvr, class Tag, class... Args>(
                Index, auto& state, Rcvr& rcvr, Tag, Args&&... args) noexcept
                  -> completion_signatures&lt;Tag(Args...)>
                    requires <i>callable</i>&lt;Tag, Rcvr, Args...> {
                <i>// Mandates: Index::value == 0</i>
                Tag()(std::move(rcvr), std::forward&lt;Args>(args)...);
                return {};
              }
              </pre>

        9. The return type of <code><i>basic-sender</i></code>'s customization of
            `get_completion_signatures` is computed as follows.

              1. Let `Env` be the type of an environment, let the type `Sndr` be
                  the (possibly cv- and ref-qualified) type of a specialization
                  of <code><i>basic-sender</i></code>, and let
                  <code>Child<sub><i>i</i></sub></code> be the type of the
                  <code><i>i</i></code>-th child sender of `Sndr`, with the same
                  cv- and ref-qualifications as `Sndr`.

              2. Let <code>ChildEnv<sub><i>i</i></sub></code> be the type 
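
[*Example:* The exposition-only helper <code><i>query-with-default</i></code> described
earlier in this subclause can be approximated in ordinary C++17. The following is a
non-normative sketch under stated assumptions: the names `get_priority_t`,
`env_with_priority`, `env_without_priority` and `query_with_default` are hypothetical
and exist only for illustration; they are not part of this proposal.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical query object: answers only for environments with a `priority` member.
struct get_priority_t {
  template <class Env>
  constexpr auto operator()(const Env& env) const noexcept -> decltype(env.priority) {
    return env.priority;
  }
};

struct env_with_priority { int priority; };  // answers the query
struct env_without_priority {};              // does not answer the query

// Sketch of query-with-default: use Tag()(env) when that expression is
// well-formed, otherwise return the caller-supplied default.
template <class Tag, class Env, class Default>
constexpr decltype(auto) query_with_default(Tag, const Env& env, Default&& value) noexcept(
    std::is_invocable_v<Tag, const Env&>
        ? std::is_nothrow_invocable_v<Tag, const Env&>
        : std::is_nothrow_constructible_v<Default, Default>) {
  if constexpr (std::is_invocable_v<Tag, const Env&>) {
    return Tag()(env);
  } else {
    return static_cast<Default>(std::forward<Default>(value));
  }
}

// The environment's own answer wins; the default is used only as a fallback.
static_assert(query_with_default(get_priority_t{}, env_with_priority{3}, 7) == 3);
static_assert(query_with_default(get_priority_t{}, env_without_priority{}, 7) == 7);
```

The `noexcept` expression mirrors the *Remarks* wording: it inspects whichever of the
two `return` statements is selected. *--end example*]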

### Sender concepts <b>[exec.snd.concepts]</b> ### {#spec-execution.snd.concepts}

1. The `sender` concept defines the requirements for a sender type
    ([async.ops]). The `sender_in` concept defines the requirements for a sender
    type that can create asynchronous operations given an associated environment
    type. The `sender_to` concept defines the requirements for a sender type
    that can connect with a specific receiver type. The `get_env` customization
    point object is used to access a sender's associated attributes. The
    `connect` customization point object is used to connect ([async.ops]) a
    sender and a receiver to produce an operation state.

    <pre highlight="c++">
    template&lt;class Sigs>
      concept <i>valid-completion-signatures</i> = <i>see below</i>;

    template&lt;class S>
      inline constexpr bool enable_sender =
        requires { typename S::is_sender; };

    template&lt;<i>is-awaitable</i>&lt;<i>env-promise</i>&lt;empty_env>> S> <i>// [exec.awaitables]</i>
      inline constexpr bool enable_sender&lt;S> = true;

    template&lt;class S>
      concept sender =
        enable_sender&lt;remove_cvref_t&lt;S>> &&
        requires (const remove_cvref_t&lt;S>& s) {
          { get_env(s) } -> queryable;
          <i>completion-domain</i>(s);
        } &&
        move_constructible&lt;remove_cvref_t&lt;S>> &&  <i>// rvalues are movable, and</i>
        constructible_from&lt;remove_cvref_t&lt;S>, S>; <i>// lvalues are copyable</i>

    template&lt;class S, class E = empty_env>
      concept sender_in =
        sender&lt;S> &&
        requires (S&& s, E&& e) {
          { get_completion_signatures(std::forward&lt;S>(s), std::forward&lt;E>(e)) } ->
            <i>valid-completion-signatures</i>;
        };

    template&lt;class S, class R>
      concept sender_to =
        sender_in&lt;S, env_of_t&lt;R>> &&
        receiver_of&lt;R, completion_signatures_of_t&lt;S, env_of_t&lt;R>>> &&
        requires (S&& s, R&& r) {
          connect(std::forward&lt;S>(s), std::forward&lt;R>(r));
        };
    </pre>

2. Given a subexpression `sndr`, let `Sndr` be `decltype((sndr))`, let `Env` be
    the type of an environment, and let `rcvr` be a receiver whose associated
    environment is `Env`. A completion operation is a <dfn
    export=true>permissible completion</dfn> for `Sndr` and `Env` if its
    completion signature appears in the argument list of the specialization of
    `completion_signatures` denoted by `completion_signatures_of_t<Sndr, Env>`.
    `Sndr` and `Env` model `sender_in<Sndr, Env>` if connecting `sndr` to `rcvr`
    and starting the resulting operation state does not cause the potential
    evaluation of any completion operation on `rcvr` that is not a permissible
    completion for `Sndr` and `Env`.

3. A type `Sigs` satisfies and models the exposition-only concept
    <code><i>valid-completion-signatures</i></code> if it names a specialization
    of the `completion_signatures` class template.

4. <i>Remarks:</i> Pursuant to [namespace.std], users can specialize `enable_sender` to
    `true` for cv-unqualified program-defined types that model `sender`, and `false`
    for types that do not. Such specializations shall be usable in constant
    expressions ([expr.const]) and have type `const bool`.

5. The exposition-only concepts <code><i>sender-of</i></code> and
    <code><i>sender-of-in</i></code> define the requirements for a sender
    type that completes with a given unique set of value result types.

    <pre highlight="c++">
    template&lt;class... As>
      using <i>value-signature</i> = set_value_t(As...); <i>// exposition only</i>

    template&lt;class S, class E, class... Values>
      concept <i>sender-of-in</i> =
        sender_in&lt;S, E> &&
        <i>MATCHING-SIG</i>( <i>// see [exec.general]</i>
          set_value_t(Values...),
          value_types_of_t&lt;S, E, <i>value-signature</i>, type_identity_t>);

    template&lt;class S, class... Values>
      concept <i>sender-of</i> = <i>sender-of-in</i>&lt;S, empty_env, Values...>;
    </pre>

6. Let `s` be an expression such that `decltype((s))` is `S`. The type
    `tag_of_t<S>` is as follows:

      - If the declaration `auto&& [tag, data, ...children] = s;` would be
        well-formed, `tag_of_t<S>` is an alias for `decltype(auto(tag))`.

      - Otherwise, `tag_of_t<S>` is ill-formed.

    <div class="ed-note">
    There is no way in standard C++ to determine whether the above declaration
    is well-formed without causing a hard error, so this presumes compiler
    magic. However, the author anticipates the adoption of [@P2141R1], which
    makes it possible to implement this purely in the library. P2141 has already
    been approved by EWG for C++26.</div>

7. Let <code><i>sender-for</i></code> be an exposition-only concept defined as follows:

    <pre highlight="c++">
    template&lt;class Sender, class Tag>
    concept <i>sender-for</i> =
      sender&lt;Sender> &&
      same_as&lt;tag_of_t&lt;Sender>, Tag>;
    </pre>

8. For a type `T`, <code><i>SET-VALUE-SIG</i>(T)</code> names the type
    `set_value_t()` if `T` is *cv* `void`; otherwise, it names the type
    `set_value_t(T)`.

9. Library-provided sender types:
      - Always expose an overload of a customization of `connect`
          that accepts an rvalue sender.
      - Only expose an overload of a customization of `connect` that
          accepts an lvalue sender if they model `copy_constructible`.
      - Model `copy_constructible` if they satisfy `copy_constructible`.
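
[*Example:* The <code><i>SET-VALUE-SIG</i>(T)</code> mapping defined in this subclause
can be modeled as a small type computation. The sketch below is non-normative and
assumes a stand-in tag type `toy_set_value_t` in place of `set_value_t`; the names
`set_value_sig_impl` and `set_value_sig` are likewise hypothetical. A partial
specialization is used rather than `conditional_t` because forming the function type
`toy_set_value_t(T)` with a dependent `T` that is *cv* `void` would be ill-formed.

```cpp
#include <type_traits>

struct toy_set_value_t {};  // stand-in for execution::set_value_t (assumption)

// Primary template: a non-void T becomes the single parameter of the signature.
template <class T, bool IsVoid = std::is_void_v<T>>
struct set_value_sig_impl {
  using type = toy_set_value_t(T);
};

// cv void maps to the nullary signature.
template <class T>
struct set_value_sig_impl<T, true> {
  using type = toy_set_value_t();
};

template <class T>
using set_value_sig = typename set_value_sig_impl<T>::type;

static_assert(std::is_same_v<set_value_sig<void>, toy_set_value_t()>);
static_assert(std::is_same_v<set_value_sig<const void>, toy_set_value_t()>);
static_assert(std::is_same_v<set_value_sig<int>, toy_set_value_t(int)>);
```

*--end example*]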

### Awaitable helpers <b>[exec.awaitables]</b> ### {#spec.exec-awaitables}

1. The sender concepts recognize awaitables as senders. For this clause
    ([exec]), an <dfn export=true>awaitable</dfn> is an expression that would be
    well-formed as the operand of a `co_await` expression within a given
    context.

2. For a subexpression `c`, let <code><i>GET-AWAITER</i>(c, p)</code> be
    expression-equivalent to the series of transformations and conversions
    applied to `c` as the operand of an *await-expression* in a coroutine,
    resulting in lvalue <i>`e`</i> as described by [expr.await]/3.2-4, where `p`
    is an lvalue referring to the coroutine's promise type, `P`. <span
    class="wg21note">This includes the invocation of the promise type's
    `await_transform` member if any, the invocation of the `operator co_await`
    picked by overload resolution if any, and any necessary implicit
    conversions and materializations.</span>

    <div class="ed-note">I have opened
    [cwg#250](https://github.com/cplusplus/CWG/issues/250) to give these
    transformations a term-of-art so we can more easily refer to it here.</div>

3. Let <code><i>is-awaitable</i></code> be the following exposition-only
    concept:

    <pre highlight="c++">
    template&lt;class T>
    concept <i>await-suspend-result</i> = <i>see below</i>;

    template&lt;class A, class P>
    concept <i>is-awaiter</i> = <i>// exposition only</i>
      requires (A& a, coroutine_handle&lt;P> h) {
        a.await_ready() ? 1 : 0;
        { a.await_suspend(h) } -> <i>await-suspend-result</i>;
        a.await_resume();
      };

    template&lt;class C, class P>
    concept <i>is-awaitable</i> =
      requires (C (*fc)() noexcept, P& p) {
        { <i>GET-AWAITER</i>(fc(), p) } -> <i>is-awaiter</i>&lt;P>;
      };
    </pre>

    <code><i>await-suspend-result</i>&lt;T></code> is `true` if and only if one
      of the following is `true`:

        - `T` is `void`, or
        - `T` is `bool`, or
        - `T` is a specialization of `coroutine_handle`.

3. For a subexpression `c` such that `decltype((c))` is type `C`, and
    an lvalue `p` of type `P`, <code><i>await-result-type</i>&lt;C, P></code>
    names the type <code>decltype(<i>GET-AWAITER</i>(c, p).await_resume())</code>.

4. Let <code><i>with-await-transform</i></code> be the exposition-only class template:

    <pre highlight="c++">
    template&lt;class Derived>
    struct <i>with-await-transform</i> {
      template&lt;class T>
      T&& await_transform(T&& value) noexcept {
        return std::forward&lt;T>(value);
      }

      template&lt;class T>
        requires tag_invocable&lt;as_awaitable_t, T, Derived&>
      auto await_transform(T&& value)
        noexcept(nothrow_tag_invocable&lt;as_awaitable_t, T, Derived&>)
        -> tag_invoke_result_t&lt;as_awaitable_t, T, Derived&> {
        return tag_invoke(as_awaitable, std::forward&lt;T>(value), static_cast&lt;Derived&>(*this));
      }
    };
    </pre>

5. Let <code><i>env-promise</i></code> be the exposition-only class template:

    <pre highlight="c++">
    template&lt;class Env>
    struct <i>env-promise</i> : <i>with-await-transform</i>&lt;<i>env-promise</i>&lt;Env>> {
      <i>unspecified</i> get_return_object() noexcept;
      <i>unspecified</i> initial_suspend() noexcept;
      <i>unspecified</i> final_suspend() noexcept;
      void unhandled_exception() noexcept;
      void return_void() noexcept;
      coroutine_handle&lt;> unhandled_stopped() noexcept;

      friend const Env& tag_invoke(get_env_t, const <i>env-promise</i>&) noexcept;
    };
    </pre>

    <span class="wg21note">Specializations of <code><i>env-promise</i></code>
      are only used for the purpose of type computation; its members need not be
      defined.</span>
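
[*Example:* The checks performed by <code><i>await-suspend-result</i></code> and
<code><i>is-awaiter</i></code> can be illustrated without a coroutine-enabled
toolchain. The sketch below is non-normative and written for C++17, so it uses the
detection idiom instead of concepts and a hypothetical `toy_handle` template as a
stand-in for `coroutine_handle`. All other names (`is_toy_handle`, `is_awaiter`,
`good_awaiter`, `bad_suspend_awaiter`, `not_an_awaiter`, `toy_promise`) are also
assumptions made for illustration.

```cpp
#include <type_traits>
#include <utility>

// Stand-in for std::coroutine_handle: a typed handle converts to the erased one.
template <class Promise = void> struct toy_handle;
template <> struct toy_handle<void> {};
template <class Promise>
struct toy_handle {
  constexpr operator toy_handle<void>() const noexcept { return {}; }
};

template <class T> struct is_toy_handle : std::false_type {};
template <class P> struct is_toy_handle<toy_handle<P>> : std::true_type {};

// await-suspend-result: void, bool, or a specialization of the handle template.
template <class T>
inline constexpr bool await_suspend_result =
    std::is_same_v<T, void> || std::is_same_v<T, bool> || is_toy_handle<T>::value;

// is-awaiter: all three members are callable on an lvalue, and await_suspend
// returns one of the accepted types.
template <class A, class P, class = void>
struct is_awaiter : std::false_type {};

template <class A, class P>
struct is_awaiter<A, P, std::void_t<
    decltype(std::declval<A&>().await_ready() ? 1 : 0),
    decltype(std::declval<A&>().await_suspend(std::declval<toy_handle<P>>())),
    decltype(std::declval<A&>().await_resume())>>
  : std::bool_constant<await_suspend_result<
        decltype(std::declval<A&>().await_suspend(std::declval<toy_handle<P>>()))>> {};

struct toy_promise {};

struct good_awaiter {
  bool await_ready() const noexcept { return false; }
  bool await_suspend(toy_handle<>) noexcept { return true; }
  int await_resume() noexcept { return 0; }
};

struct bad_suspend_awaiter {
  bool await_ready() const noexcept { return false; }
  int await_suspend(toy_handle<>) noexcept { return 0; }  // int is not accepted
  void await_resume() noexcept {}
};

struct not_an_awaiter {};

static_assert(is_awaiter<good_awaiter, toy_promise>::value);
static_assert(!is_awaiter<bad_suspend_awaiter, toy_promise>::value);
static_assert(!is_awaiter<not_an_awaiter, toy_promise>::value);
```

The sketch covers only the awaiter check; it does not attempt to model
<code><i>GET-AWAITER</i></code>, which depends on core-language rules for
*await-expression*s. *--end example*]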

### `execution::default_domain` <b>[exec.domain.default]</b> ### {#spec-execution.default_domain}

<pre highlight="c++">
struct default_domain {
  template &lt;sender Sender, class... Env>
      requires (sizeof...(Env) &lt;= 1)
    static constexpr sender decltype(auto) transform_sender(Sender&& sndr, const Env&... env) noexcept(<i>see below</i>);

  template &lt;sender Sender, class Env>
    static constexpr decltype(auto) transform_env(Sender&& sndr, Env&& env) noexcept;

  template&lt;class Tag, sender Sender, class... Args>
    static constexpr decltype(auto) apply_sender(Tag, Sender&& sndr, Args&&... args) noexcept(<i>see below</i>);
};
</pre>

#### Static members <b>[exec.domain.default.statics]</b> #### {#spec-execution.default_domain.statics}

<pre highlight="c++">
template &lt;sender Sender, class... Env>
    requires (sizeof...(Env) &lt;= 1)
  constexpr sender decltype(auto) default_domain::transform_sender(Sender&& sndr, const Env&... env) noexcept(<i>see below</i>);
</pre>

1. <i>Returns:</i> `tag_of_t<Sender>().transform_sender(std::forward<Sender>(sndr), env...)`
    if that expression is well-formed; otherwise, `std::forward<Sender>(sndr)`.

2. <i>Remarks:</i> The exception specification is equivalent to:

    <pre highlight="c++">
    noexcept(tag_of_t&lt;Sender>().transform_sender(std::forward&lt;Sender>(sndr), env...))
    </pre>

    if that expression is well-formed; otherwise, `true`.

<pre highlight="c++">
template &lt;sender Sender, class Env>
  constexpr decltype(auto) default_domain::transform_env(Sender&& sndr, Env&& env) noexcept;
</pre>

3. <i>Returns:</i> `tag_of_t<Sender>().transform_env(std::forward<Sender>(sndr), std::forward<Env>(env))`
    if that expression is well-formed; otherwise, `static_cast<Env>(std::forward<Env>(env))`.

4. <i>Mandates:</i> The selected expression in <i>Returns:</i> is not potentially throwing.

<pre highlight="c++">
template&lt;class Tag, sender Sender, class... Args>
  constexpr decltype(auto) default_domain::apply_sender(Tag, Sender&& sndr, Args&&... args) noexcept(<i>see below</i>);
</pre>

5. <i>Returns:</i> `Tag().apply_sender(std::forward<Sender>(sndr), std::forward<Args>(args)...)`
    if that expression is well-formed; otherwise, this function shall not participate
    in overload resolution.

6. <i>Remarks:</i> The exception specification is equivalent to:

    <pre highlight="c++">
    noexcept(Tag().apply_sender(std::forward&lt;Sender>(sndr), std::forward&lt;Args>(args)...))
    </pre>
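
[*Example:* The dispatch performed by `default_domain::transform_sender` (defer to the
sender's tag when it provides a `transform_sender` member, otherwise forward the sender
unchanged) might be sketched as follows. This is a non-normative C++17 illustration
under several assumptions: the environment argument is omitted, the tag is read from
a hypothetical nested `tag` typedef rather than through `tag_of_t`, and all names
(`toy_default_domain`, `toy_sender`, `plain_tag`, `lowering_tag`, `lowered_sender`,
`tag_transforms`) are invented for this example.

```cpp
#include <type_traits>
#include <utility>

// Detects whether Tag().transform_sender(sndr) is well-formed.
template <class Tag, class Sender, class = void>
struct tag_transforms : std::false_type {};

template <class Tag, class Sender>
struct tag_transforms<Tag, Sender,
    std::void_t<decltype(Tag().transform_sender(std::declval<Sender>()))>>
  : std::true_type {};

struct lowered_sender { int value; };  // result of a tag-provided transformation

struct plain_tag {};  // provides no transform_sender member

struct lowering_tag {
  template <class Sender>
  constexpr lowered_sender transform_sender(Sender&& sndr) const noexcept {
    return {sndr.value * 2};
  }
};

// Toy sender that names its algorithm tag through a nested typedef (assumption).
template <class Tag>
struct toy_sender {
  using tag = Tag;
  int value;
};

struct toy_default_domain {
  template <class Sender>
  static constexpr decltype(auto) transform_sender(Sender&& sndr) {
    using tag = typename std::remove_cv_t<std::remove_reference_t<Sender>>::tag;
    if constexpr (tag_transforms<tag, Sender>::value)
      return tag().transform_sender(std::forward<Sender>(sndr));  // tag customizes
    else
      return std::forward<Sender>(sndr);  // identity: the sender is forwarded as-is
  }
};

static_assert(toy_default_domain::transform_sender(toy_sender<lowering_tag>{5}).value == 10);
static_assert(toy_default_domain::transform_sender(toy_sender<plain_tag>{5}).value == 5);
```

In the identity case the `decltype(auto)` return type is a reference to the argument,
so the result is only valid for as long as the argument is. *--end example*]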

### `execution::transform_sender` <b>[exec.snd.transform]</b> ### {#spec-execution.sender_transform}

<pre highlight="c++">
template&lt;class Domain, sender Sender, class... Env>
    requires (sizeof...(Env) &lt;= 1)
  constexpr sender decltype(auto) transform_sender(Domain dom, Sender&& sndr, const Env&... env);
</pre>

1. <i>Returns:</i> Let `s2` be the expression
      `dom.transform_sender(std::forward<Sender>(sndr), env...)` if that
      expression is well-formed; otherwise,
      `default_domain().transform_sender(std::forward<Sender>(sndr), env...)`.
      If `s2` and `sndr` have the same type ignoring _cv_ qualifiers,
      returns `s2`; otherwise, `transform_sender(dom, s2, env...)`.

<pre highlight="c++">
template&lt;class Domain, sender Sender, class Env>
  constexpr decltype(auto) transform_env(Domain dom, Sender&& sndr, Env&& env) noexcept;
</pre>

2. <i>Returns:</i> `dom.transform_env(std::forward<Sender>(sndr), std::forward<Env>(env))` if that
      expression is well-formed; otherwise,
      `default_domain().transform_env(std::forward<Sender>(sndr), std::forward<Env>(env))`.
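
[*Example:* `transform_sender` reapplies the domain's transformation until the sender's
type stops changing. The following non-normative C++17 sketch illustrates only that
fixed-point recursion; it passes senders by value, omits the environment argument and
the fallback to `default_domain`, and uses hypothetical names throughout
(`stage_a`, `stage_b`, `stage_c`, `staged_domain`, `transform_to_fixed_point`).

```cpp
#include <type_traits>

struct stage_a { int v; };
struct stage_b { int v; };
struct stage_c { int v; };

// Hypothetical domain that lowers stage_a to stage_b to stage_c and then stops.
struct staged_domain {
  constexpr stage_b transform_sender(stage_a s) const noexcept { return {s.v + 1}; }
  constexpr stage_c transform_sender(stage_b s) const noexcept { return {s.v + 1}; }
  constexpr stage_c transform_sender(stage_c s) const noexcept { return s; }
};

template <class Domain, class Sender>
constexpr auto transform_to_fixed_point(Domain dom, Sender sndr) {
  auto s2 = dom.transform_sender(sndr);
  if constexpr (std::is_same_v<decltype(s2), Sender>)
    return s2;                                 // type unchanged: done
  else
    return transform_to_fixed_point(dom, s2);  // type changed: transform again
}

// stage_a is transformed twice before the type stabilizes at stage_c.
static_assert(transform_to_fixed_point(staged_domain{}, stage_a{0}).v == 2);
static_assert(std::is_same_v<
    decltype(transform_to_fixed_point(staged_domain{}, stage_a{0})), stage_c>);
```

*--end example*]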

### `execution::apply_sender` <b>[exec.snd.apply]</b> ### {#spec-execution.apply_sender}

<pre highlight="c++">
template&lt;class Domain, class Tag, sender Sender, class... Args>
  constexpr decltype(auto) apply_sender(Domain dom, Tag, Sender&& sndr, Args&&... args) noexcept(<i>see below</i>);
</pre>

1. <i>Returns:</i> `dom.apply_sender(Tag(), std::forward<Sender>(sndr), std::forward<Args>(args)...)` if that
      expression is well-formed; otherwise,
      `default_domain().apply_sender(Tag(), std::forward<Sender>(sndr), std::forward<Args>(args)...)`
      if that expression is well-formed; otherwise, this function shall not participate in
      overload resolution.

2. <i>Remarks:</i> The exception specification is equivalent to:

      <pre highlight="c++">
      noexcept(dom.apply_sender(Tag(), std::forward&lt;Sender>(sndr), std::forward&lt;Args>(args)...))
      </pre>

      if that expression is well-formed; otherwise,

      <pre highlight="c++">
      noexcept(default_domain().apply_sender(Tag(), std::forward&lt;Sender>(sndr), std::forward&lt;Args>(args)...))
      </pre>
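
[*Example:* The two-step lookup of `apply_sender` (try the supplied domain first, then
fall back to the default domain, which defers to the tag) might be sketched as shown
below. This is a non-normative C++17 illustration; trailing arguments and the
exception specification are omitted, and every name in it (`value_sender`,
`read_value_tag`, `fallback_domain`, `plain_domain`, `scaling_domain`,
`domain_handles`, `toy_apply_sender`) is an assumption made for the example.

```cpp
#include <type_traits>
#include <utility>

struct value_sender { int value; };

// Hypothetical algorithm tag with a default implementation of the algorithm.
struct read_value_tag {
  template <class Sender>
  constexpr int apply_sender(Sender&& sndr) const noexcept { return sndr.value; }
};

// Plays the role of default_domain: defers to the tag, and drops out of
// overload resolution when the tag has no suitable apply_sender.
struct fallback_domain {
  template <class Tag, class Sender>
  static constexpr auto apply_sender(Tag, Sender&& sndr) noexcept
      -> decltype(Tag().apply_sender(std::forward<Sender>(sndr))) {
    return Tag().apply_sender(std::forward<Sender>(sndr));
  }
};

struct plain_domain {};  // customizes nothing

struct scaling_domain {  // customizes read_value_tag
  template <class Sender>
  constexpr int apply_sender(read_value_tag, Sender&& sndr) const noexcept {
    return sndr.value * 100;
  }
};

// Detects whether dom.apply_sender(Tag(), sndr) is well-formed.
template <class Domain, class Tag, class Sender, class = void>
struct domain_handles : std::false_type {};

template <class Domain, class Tag, class Sender>
struct domain_handles<Domain, Tag, Sender, std::void_t<decltype(
    std::declval<Domain&>().apply_sender(Tag(), std::declval<Sender>()))>>
  : std::true_type {};

template <class Domain, class Tag, class Sender>
constexpr decltype(auto) toy_apply_sender([[maybe_unused]] Domain dom, Tag, Sender&& sndr) {
  if constexpr (domain_handles<Domain, Tag, Sender>::value)
    return dom.apply_sender(Tag(), std::forward<Sender>(sndr));
  else
    return fallback_domain().apply_sender(Tag(), std::forward<Sender>(sndr));
}

static_assert(toy_apply_sender(plain_domain{}, read_value_tag{}, value_sender{7}) == 7);
static_assert(toy_apply_sender(scaling_domain{}, read_value_tag{}, value_sender{7}) == 700);
```

*--end example*]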

### `execution::get_completion_signatures` <b>[exec.getcomplsigs]</b> ### {#spec-execution.getcomplsigs}

1. `get_completion_signatures` is a customization point object. Let `s` be an
    expression such that `decltype((s))` is `S`, and let `e` be an expression
    such that `decltype((e))` is `E`. Then `get_completion_signatures(s, e)` is
    expression-equivalent to:

    1. `tag_invoke_result_t<get_completion_signatures_t, S, E>{}` if that
        expression is well-formed,

        * <i>Mandates:</i>
            <code><i>valid-completion-signatures</i>&lt;Sigs></code>, where
            `Sigs` names the type
            `tag_invoke_result_t<get_completion_signatures_t, S, E>`.

    2. Otherwise, `remove_cvref_t<S>::completion_signatures{}` if that expression is well-formed,

        * <i>Mandates:</i> <code><i>valid-completion-signatures</i>&lt;Sigs></code>,
            where `Sigs` names the type `remove_cvref_t<S>::completion_signatures`.

    3. Otherwise, if <code><i>is-awaitable</i>&lt;S, <i>env-promise</i>&lt;E>></code>
        is `true`, then:

            <pre highlight="c++">
            completion_signatures<
              <i>SET-VALUE-SIG</i>(<i>await-result-type</i>&lt;S, <i>env-promise</i>&lt;E>>), <i>// see [exec.snd.concepts]</i>
              set_error_t(exception_ptr),
              set_stopped_t()>{}
            </pre>

    4. Otherwise, `get_completion_signatures(s, e)` is ill-formed.

2. Let `r` be an rvalue receiver of type `R`, and let `S` be the type of a
    sender such that `sender_in<S, env_of_t<R>>` is `true`. Let `Sigs...` be the
    template arguments of the `completion_signatures` specialization named by
    `completion_signatures_of_t<S, env_of_t<R>>`. Let <code><i>CSO</i></code> be
    a completion function. If sender `S` or its operation state causes the
    expression <code><i>CSO</i>(r, args...)</code> to be potentially evaluated
    ([basic.def.odr]) then there shall be a signature `Sig` in `Sigs...` such
    that <code><i>MATCHING-SIG</i>(tag_t&lt;<i>CSO</i>>(decltype(args)...),
    Sig)</code> is `true` ([exec.general]).
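
[*Example:* The following non-normative sketch illustrates only the nested-typedef fallback (the second bullet of paragraph 1). It does not use the real library: `completion_signatures`, the tag types, `member_sigs_t`, `sig_count`, and `int_sender` are hypothetical stand-ins defined here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <type_traits>

// Hypothetical stand-ins; none of these are the real std::execution entities.
struct set_value_t {};
struct set_error_t {};
struct set_stopped_t {};

template <class... Sigs>
struct completion_signatures {};

// Mirrors the fallback: strip cv-ref from S, then read the nested typedef.
template <class S>
using member_sigs_t = typename std::remove_cv_t<
    std::remove_reference_t<S>>::completion_signatures;

// Counts the signatures carried by a completion_signatures specialization.
template <class... Sigs>
constexpr std::size_t sig_count(completion_signatures<Sigs...>) {
  return sizeof...(Sigs);
}

// A sender type that advertises its completions through the nested typedef.
struct int_sender {
  using completion_signatures = ::completion_signatures<
      set_value_t(int), set_error_t(std::exception_ptr), set_stopped_t()>;
};

// The query sees through references and cv-qualifiers on the sender type.
static_assert(std::is_same_v<
    member_sigs_t<const int_sender&>,
    completion_signatures<set_value_t(int), set_error_t(std::exception_ptr),
                          set_stopped_t()>>);
```

Each signature is a function type whose return type is the completion tag, so the set of completions can be inspected entirely at compile time. -- <i>end example</i>]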

### `execution::connect` <b>[exec.connect]</b> ### {#spec-execution.senders.connect}

1. `connect` connects ([async.ops]) a sender with a receiver.

2. The name `connect` denotes a customization point object. For subexpressions
    `s` and `r`, let `S` be `decltype((s))` and `R` be `decltype((r))`, and let
    `DS` and `DR` be the decayed types of `S` and `R`, respectively. 
    
3. Let <code><i>connect-awaitable-promise</i></code> be the following class:

    <pre highlight="c++">
    struct <i>connect-awaitable-promise</i> : <i>with-await-transform</i>&lt;<i>connect-awaitable-promise</i>> {
      DR& <i>rcvr</i>; <i>// exposition only</i>

      <i>connect-awaitable-promise</i>(DS&, DR& r) noexcept : <i>rcvr</i>(r) {}

      suspend_always initial_suspend() noexcept { return {}; }
      [[noreturn]] suspend_always final_suspend() noexcept { std::terminate(); }
      [[noreturn]] void unhandled_exception() noexcept { std::terminate(); }
      [[noreturn]] void return_void() noexcept { std::terminate(); }

      coroutine_handle<> unhandled_stopped() noexcept {
        set_stopped((DR&&) <i>rcvr</i>);
        return noop_coroutine();
      }

      <i>operation-state-task</i> get_return_object() noexcept {
        return <i>operation-state-task</i>{
          coroutine_handle&lt;<i>connect-awaitable-promise</i>>::from_promise(*this)};
      }

      friend auto tag_invoke(get_env_t, <i>connect-awaitable-promise</i>& self)
        noexcept(<i>nothrow-callable</i>&lt;get_env_t, const DR&>) -> env_of_t&lt;const DR&> {
        return get_env(self.<i>rcvr</i>);
      }
    };
    </pre>

4. Let <code><i>operation-state-task</i></code> be the following class:

    <pre highlight="c++">
    struct <i>operation-state-task</i> {
      using promise_type = <i>connect-awaitable-promise</i>;
      coroutine_handle<> <i>coro</i>; <i>// exposition only</i>

      explicit <i>operation-state-task</i>(coroutine_handle<> h) noexcept : <i>coro</i>(h) {}
      <i>operation-state-task</i>(<i>operation-state-task</i>&& o) noexcept
        : <i>coro</i>(exchange(o.<i>coro</i>, {})) {}
      ~<i>operation-state-task</i>() { if (<i>coro</i>) <i>coro</i>.destroy(); }

      friend void tag_invoke(start_t, <i>operation-state-task</i>& self) noexcept {
        self.<i>coro</i>.resume();
      }
    };
    </pre>

5. Let `V` name the type <code><i>await-result-type</i>&lt;DS,
    <i>connect-awaitable-promise</i>></code>, let `Sigs` name the type:
    
    <pre highlight="c++">
    completion_signatures&lt;
      <i>SET-VALUE-SIG</i>(V), <i>// see [exec.snd.concepts]</i>
      set_error_t(exception_ptr),
      set_stopped_t()>
    </pre>

    and let <code><i>connect-awaitable</i></code> be an exposition-only
    coroutine defined as follows:

    <pre highlight="c++">
    template&lt;class Fun, class... Ts>
    auto <i>suspend-complete</i>(Fun fun, Ts&&... as) noexcept { <i>// exposition only</i>
      auto fn = [&, fun]() noexcept { fun(std::forward&lt;Ts>(as)...); };

      struct awaiter {
        decltype(fn) <i>fn_</i>;

        static bool await_ready() noexcept { return false; }
        void await_suspend(coroutine_handle<>) noexcept { <i>fn_</i>(); }
        [[noreturn]] void await_resume() noexcept { unreachable(); }
      };
      return awaiter{fn};
    }

    <i>operation-state-task</i> <i>connect-awaitable</i>(DS s, DR r) requires receiver_of&lt;DR, Sigs> {
      exception_ptr ep;
      try {
        if constexpr (same_as&lt;V, void>) {
          co_await std::move(s);
          co_await <i>suspend-complete</i>(set_value, std::move(r));
        } else {
          co_await <i>suspend-complete</i>(set_value, std::move(r), co_await std::move(s));
        }
      } catch(...) {
        ep = current_exception();
      }
      co_await <i>suspend-complete</i>(set_error, std::move(r), std::move(ep));
    }
    </pre>

6. If `S` does not satisfy `sender` or if `R` does not satisfy `receiver`,
    `connect(s, r)` is ill-formed. Otherwise, the expression `connect(s, r)` is
    expression-equivalent to:

    1. `tag_invoke(connect, s, r)` if
        <code><i>connectable-with-tag-invoke</i>&lt;S, R></code> is modeled.

        * <i>Mandates:</i> The type of the `tag_invoke` expression above
            satisfies `operation_state`.

    2. Otherwise, <code><i>connect-awaitable</i>(s, r)</code> if that expression is
        well-formed.

    3. Otherwise, `connect(s, r)` is ill-formed.
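
[*Example:* A non-normative sketch of the `tag_invoke` branch of `connect`, written against hypothetical miniature customization point objects rather than the real ones. Everything in namespace `mini` is an assumption made for illustration; concept checks, the awaitable fallback, and error channels are omitted.

```cpp
#include <cassert>
#include <utility>

namespace mini {

// Hypothetical miniature CPOs that dispatch to a tag_invoke found by ADL.
struct connect_t {
  template <class S, class R>
  auto operator()(S&& s, R&& r) const {
    return tag_invoke(*this, std::forward<S>(s), std::forward<R>(r));
  }
};
struct start_t {
  template <class Op>
  void operator()(Op& op) const noexcept { tag_invoke(*this, op); }
};
inline constexpr connect_t connect{};
inline constexpr start_t start{};

// Receiver that records the value it is completed with.
struct recording_receiver {
  int* out;
  void set_value(int v) && noexcept { *out = v; }
};

// Sender whose operation completes synchronously with 42 when started.
struct answer_sender {
  template <class R>
  struct operation {
    R rcvr;
    friend void tag_invoke(start_t, operation& self) noexcept {
      std::move(self.rcvr).set_value(42);
    }
  };

  template <class R>
  friend operation<R> tag_invoke(connect_t, answer_sender, R r) {
    return operation<R>{std::move(r)};
  }
};

// connect() only builds the operation state; start() performs the work.
inline int connect_then_start() {
  int result = 0;
  auto op = connect(answer_sender{}, recording_receiver{&result});
  int before_start = result;  // nothing has run yet
  start(op);
  return before_start == 0 ? result : -1;
}

}  // namespace mini
```

The split between building the operation state and starting it is what lets a caller decide where the operation state lives before any work begins. -- <i>end example</i>]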

### Sender factories <b>[exec.factories]</b> ### {#spec-execution.senders.factories}

#### `execution::schedule` <b>[exec.schedule]</b> #### {#spec-execution.senders.schedule}

1. `schedule` obtains a schedule-sender ([async.ops]) from a scheduler.

2. The name `schedule` denotes a customization point object. For some
    subexpression `s`, the expression `schedule(s)` is expression-equivalent to:

    1. `tag_invoke(schedule, s)`, if that expression is valid. If the function
        selected by `tag_invoke` does not return a sender whose `set_value`
        completion scheduler is equivalent to `s`, the behavior of calling
        `schedule(s)` is undefined.

        * <i>Mandates:</i> The type of the `tag_invoke` expression above
            satisfies `sender`.

    2. Otherwise, `schedule(s)` is ill-formed.

#### `execution::just`, `execution::just_error`, `execution::just_stopped` <b>[exec.just]</b> #### {#spec-execution.senders.just}

1. `just`, `just_error`, and `just_stopped` are sender factories whose
    asynchronous operations complete synchronously in their start operation
    with a value completion operation, an error completion operation, or a
    stopped completion operation respectively.
   
2. Let <code><i>just-sender</i></code> be the class template:

    <pre highlight="c++">
    template&lt;class Tag, <i>movable-value</i>... Ts> // arguments are not associated entities ([lib.tmpl-heads])
    struct <i>just-sender</i> { // exposition only
      using is_sender = <i>unspecified</i>;
      using completion_signatures =
        execution::completion_signatures&lt;Tag(Ts...)>;

      tuple&lt;Ts...> <i>vs</i>; // exposition only

      template&lt;class R> // arguments are not associated entities ([lib.tmpl-heads])
      struct <i>operation</i> { // exposition only
        tuple&lt;Ts...> <i>vs</i>; // exposition only
        R <i>r</i>; // exposition only

        friend void tag_invoke(start_t, <i>operation</i>& s) noexcept {
          apply([&s](Ts&... values) {
            Tag()(std::move(s.<i>r</i>), std::move(values)...);
          }, s.<i>vs</i>);
        }
      };

      template&lt;receiver_of&lt;completion_signatures> R>
        requires (copy_constructible&lt;Ts> &&...)
      friend <i>operation</i>&lt;decay_t&lt;R>> tag_invoke(connect_t, const <i>just-sender</i>& s, R && r) {
        return { s.<i>vs</i>, std::forward&lt;R>(r) };
      }

      template&lt;receiver_of&lt;completion_signatures> R>
      friend <i>operation</i>&lt;decay_t&lt;R>> tag_invoke(connect_t, <i>just-sender</i>&& s, R && r) {
        return { std::move(s.<i>vs</i>), std::forward&lt;R>(r) };
      }
    };
    </pre>

3. The name `just` denotes a customization point object. For some pack of
    subexpressions `vs`, let `Vs` be the template parameter pack
    `decltype((vs))`. `just(vs...)` is expression-equivalent to
    <code><i>just-sender</i>&lt;set_value_t,
    remove_cvref_t&lt;Vs>...>({vs...})</code>.

4. The name `just_error` denotes a customization point object. For some
    subexpression `err`, let `Err` be `decltype((err))`. `just_error(err)` is expression-equivalent to
    <code><i>just-sender</i>&lt;set_error_t, remove_cvref_t&lt;Err>>({err})</code>.

5. The name `just_stopped` denotes a customization point object. `just_stopped()`
    is expression-equivalent to
    <code><i>just-sender</i>&lt;set_stopped_t>()</code>.

6. When used as the initializer of a structured binding declaration,
    expressions of type <code><i>just-sender</i>&lt;Tag, Ts...></code> behave as do
    expressions of type <code><i>basic-sender</i>&lt;Tag, <i>tuple-like</i>&lt;Ts...>></code>.
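
[*Example:* A non-normative miniature of the data flow described for <code><i>just-sender</i></code>: the values are decay-copied into the sender, moved into the operation state on connect, and moved into the receiver's value completion on start. `mini_just`, `mini_just_sender`, and `pair_receiver` are hypothetical names; member functions replace the `tag_invoke` customizations to keep the sketch short.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical miniature of just-sender for the set_value case only.
template <class... Ts>
struct mini_just_sender {
  std::tuple<Ts...> vs;

  template <class R>
  struct operation {
    std::tuple<Ts...> vs;
    R rcvr;
    // Completes synchronously: unpack the stored values into set_value.
    void start() noexcept {
      std::apply(
          [this](Ts&... values) {
            std::move(rcvr).set_value(std::move(values)...);
          },
          vs);
    }
  };

  // Rvalue connect: the stored values are moved into the operation state.
  template <class R>
  operation<R> connect(R rcvr) && {
    return {std::move(vs), std::move(rcvr)};
  }
};

// Decay-copies the arguments into the sender, like just(vs...).
template <class... Vs>
auto mini_just(Vs&&... vs) {
  using sender = mini_just_sender<std::decay_t<Vs>...>;
  return sender{std::tuple<std::decay_t<Vs>...>(std::forward<Vs>(vs)...)};
}

// Receiver that stores both values it is completed with.
struct pair_receiver {
  int* n;
  std::string* text;
  void set_value(int v, std::string s) && noexcept {
    *n = v;
    *text = std::move(s);
  }
};

inline std::string run_mini_just() {
  int n = 0;
  std::string text;
  auto op = mini_just(7, std::string("seven")).connect(pair_receiver{&n, &text});
  op.start();
  return text + "=" + std::to_string(n);
}
```

Calling `run_mini_just()` yields `"seven=7"`, showing that both values reach the receiver from within `start`. -- <i>end example</i>]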

#### `execution::read` <b>[exec.read]</b> #### {#spec-execution.senders.read}

1. `read` is a sender factory for a sender whose asynchronous operation
    completes synchronously in its start operation with a value completion
    result equal to a value read from the receiver's associated environment.

2. `read` is a customization point object of the unspecified class type:

    <pre highlight="c++">
    template&lt;class Tag>
      struct <i>read-sender</i>; // exposition only

    struct <i>read-t</i> { // exposition only
      template&lt;class Tag>
        constexpr <i>read-sender</i>&lt;Tag> operator()(Tag) const noexcept {
          return {};
        }
    };
    </pre>

3. <code><i>read-sender</i></code> is the exposition-only class template:

    <pre highlight="c++">
    template&lt;class Tag>
      struct <i>read-sender</i> { // exposition only
        using is_sender = <i>unspecified</i>;
        template&lt;class R>
          struct <i>operation-state</i> { // exposition only
            R r_; // exposition only

            friend void tag_invoke(start_t, <i>operation-state</i>& s) noexcept {
              <i>TRY-SET-VALUE</i>(std::move(s.r_), Tag{}(get_env(s.r_)));
            }
          };

        template&lt;receiver R>
        friend <i>operation-state</i>&lt;decay_t&lt;R>> tag_invoke(connect_t, <i>read-sender</i>, R && r) {
          return { std::forward&lt;R>(r) };
        }

        template&lt;class Env>
            requires <i>callable</i>&lt;Tag, Env>
          friend auto tag_invoke(get_completion_signatures_t, <i>read-sender</i>, Env)
            -> completion_signatures&lt;
              set_value_t(<i>call-result-t</i>&lt;Tag, Env>), set_error_t(exception_ptr)>; <i>// not defined</i>

        template&lt;class Env>
            requires <i>nothrow-callable</i>&lt;Tag, Env>
          friend auto tag_invoke(get_completion_signatures_t, <i>read-sender</i>, Env)
            -> completion_signatures&lt;set_value_t(<i>call-result-t</i>&lt;Tag, Env>)>; <i>// not defined</i>

        friend empty_env tag_invoke(get_env_t, const <i>read-sender</i>&) noexcept {
          return {};
        }
      };
    </pre>
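
[*Example:* A non-normative sketch of the behavior of <code><i>read-sender</i></code>: on start, the operation queries the receiver's environment with the tag and completes the receiver with the result. The query `get_answer_t`, the environment `answer_env`, and the `mini_read` names are hypothetical; the receiver exposes its environment through a member function instead of `get_env`.

```cpp
#include <cassert>
#include <utility>

// Hypothetical query tag: asks an environment for its "answer".
struct get_answer_t {
  template <class Env>
  int operator()(const Env& env) const noexcept { return env.answer; }
};

// Hypothetical environment carried by the receiver.
struct answer_env { int answer; };

struct env_receiver {
  answer_env env;
  int* out;
  answer_env get_env() const noexcept { return env; }
  void set_value(int v) && noexcept { *out = v; }
};

// Miniature read-sender: the sender itself stores nothing; the value comes
// from the environment of whichever receiver it is connected to.
template <class Tag>
struct mini_read_sender {
  template <class R>
  struct operation {
    R rcvr;
    void start() noexcept {
      auto value = Tag{}(rcvr.get_env());           // read from the environment
      std::move(rcvr).set_value(std::move(value));  // complete synchronously
    }
  };

  template <class R>
  operation<R> connect(R rcvr) const { return {std::move(rcvr)}; }
};

template <class Tag>
mini_read_sender<Tag> mini_read(Tag) noexcept { return {}; }

inline int run_mini_read(int answer) {
  int out = 0;
  auto op = mini_read(get_answer_t{}).connect(env_receiver{{answer}, &out});
  op.start();
  return out;
}
```

The same sender object produces different values depending on the receiver it is connected to, which is how a child operation can obtain context from its consumer. -- <i>end example</i>]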

### Sender adaptors <b>[exec.adapt]</b> ### {#spec-execution.senders.adapt}

#### General <b>[exec.adapt.general]</b> #### {#spec-execution.senders.adapt.general}

1. Subclause [exec.adapt] specifies a set of sender adaptors.

2. The bitwise OR operator is overloaded for the purpose of creating sender
    chains. The adaptors also support function call syntax with equivalent
    semantics.

3. Unless otherwise specified, a sender adaptor is required to not begin
    executing any functions that would observe or modify any of the arguments
    of the adaptor before the returned sender is connected with a receiver using
    `connect`, and `start` is called on the resulting operation state. This
    requirement applies to any function that is selected by the implementation
    of the sender adaptor.

4. Unless otherwise specified, a parent sender ([async.ops]) with a single child
    sender `s` has an associated attributes object equal to
    <code><i>FWD-ENV</i>(get_env(s))</code> ([exec.fwd.env]). Unless
    otherwise specified, a parent sender with more than one child sender has an
    associated attributes object equal to <code>empty_env{}</code>. These
    requirements apply to any function that is selected by the implementation of
    the sender adaptor.

5. Unless otherwise specified, when a parent sender is connected to a receiver
    `r`, any receiver used to connect a child sender has an associated
    environment equal to <code><i>FWD-ENV</i>(get_env(r))</code>. This
    requirement applies to any sender returned from a function that is selected
    by the implementation of such a sender adaptor.

6. For any sender type, receiver type, operation state type, queryable type, or
    coroutine promise type that is part of the implementation of any sender
    adaptor in this subclause and that is a class template, the template
    arguments do not contribute to the associated entities
    ([basic.lookup.argdep]) of a function call where a specialization of the
    class template is an associated entity.

    [*Example:*

    <pre highlight="c++">
    namespace <i>sender-adaptors</i> { // exposition only
      template&lt;class Sch, class S> // arguments are not associated entities ([lib.tmpl-heads])
      class <i>on-sender</i> {
        // ...
      };

      struct on_t {
        template&lt;scheduler Sch, sender S>
        <i>on-sender</i>&lt;Sch, S> operator()(Sch&& sch, S&& s) const {
          // ...
        }
      };
    }
    inline constexpr <i>sender-adaptors</i>::on_t on{};
    </pre>

    -- <i>end example</i>]

7. If a sender returned from a sender adaptor specified in this subclause is
    specified to include `set_error_t(E)` among its set of completion signatures
    where `decay_t<E>` names the type `exception_ptr`, but the implementation
    does not potentially evaluate an error completion operation with an
    `exception_ptr` argument, the implementation is allowed to omit the
    `exception_ptr` error completion signature from the set.

#### Sender adaptor closure objects <b>[exec.adapt.objects]</b> #### {#spec-execution.senders.adaptor.objects}

1. A <i>pipeable sender adaptor closure object</i> is a function object that accepts one or more `sender` arguments and returns a `sender`. For a sender adaptor closure object `C` and an expression `S` such that `decltype((S))` models `sender`, the following
    expressions are equivalent and yield a `sender`:

    <pre highlight="c++">
    C(S)
    S | C
    </pre>

    Given an additional pipeable sender adaptor closure object `D`, the expression `C | D` produces another pipeable sender adaptor closure object `E`:

    `E` is a perfect forwarding call wrapper ([func.require]) with the following properties:

    - Its target object is an object `d` of type `decay_t<decltype((D))>` direct-non-list-initialized with `D`.

    - It has one bound argument entity, an object `c` of type `decay_t<decltype((C))>` direct-non-list-initialized with `C`.

    - Its call pattern is `d(c(arg))`, where `arg` is the argument used in a function call expression of `E`.

    The expression `C | D` is well-formed if and only if the initializations of the state entities of `E` are all well-formed.

2. An object `t` of type `T` is a pipeable sender adaptor closure object if `T` models `derived_from<sender_adaptor_closure<T>>`, `T` has no other base
    classes of type `sender_adaptor_closure<U>` for any other type `U`, and `T` does not model `sender`.

3. The template parameter `D` for `sender_adaptor_closure` can be an incomplete type. Before any expression of type <code><i>cv</i> D</code> appears as
    an operand to the `|` operator, `D` shall be complete and model `derived_from<sender_adaptor_closure<D>>`. The behavior of an expression involving an
    object of type <code><i>cv</i> D</code> as an operand to the `|` operator is undefined if overload resolution selects a program-defined `operator|`
    function.

4. A <i>pipeable sender adaptor object</i> is a customization point object that accepts a `sender` as its first argument and returns a `sender`.

5. If a pipeable sender adaptor object accepts only one argument, then it is a pipeable sender adaptor closure object.

6. If a pipeable sender adaptor object `adaptor` accepts more than one argument, then let `s` be an expression such that `decltype((s))` models `sender`,
    let `args...` be arguments such that `adaptor(s, args...)` is a well-formed expression as specified in the rest of this subclause
    ([exec.adapt.objects]), and let `BoundArgs` be a pack that denotes `decay_t<decltype((args))>...`. The expression `adaptor(args...)`
    produces a pipeable sender adaptor closure object `f` that is a perfect forwarding call wrapper with the following properties:

    - Its target object is a copy of `adaptor`.

    - Its bound argument entities `bound_args` consist of objects of types `BoundArgs...` direct-non-list-initialized with `std::forward<decltype((args))>(args)...`, respectively.

    - Its call pattern is `adaptor(r, bound_args...)`, where `r` is the argument used in a function call expression of `f`.

    The expression `adaptor(args...)` is well-formed if and only if the initializations of the bound argument entities of the result, as specified above,
     are all well-formed.
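
[*Example:* A non-normative sketch of the three equivalences in this subclause: `S | C` equals `C(S)`, `C | D` has the call pattern `d(c(arg))`, and an adaptor called with only its trailing arguments returns a closure. `value_sender`, `closure`, `make_closure`, `add`, and `twice` are hypothetical stand-ins; a real closure type would derive from `sender_adaptor_closure` and operate on actual senders.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a sender: it just carries the value it would send.
struct value_sender { int value; };

// Hypothetical closure wrapper playing the role of a sender adaptor closure.
template <class F>
struct closure {
  F f;
  value_sender operator()(value_sender s) const { return f(s); }
};

template <class F>
closure<std::decay_t<F>> make_closure(F&& f) { return {std::forward<F>(f)}; }

// s | c is equivalent to c(s).
template <class F>
value_sender operator|(value_sender s, const closure<F>& c) { return c(s); }

// c | d yields a closure whose call pattern is d(c(arg)).
template <class F, class G>
auto operator|(closure<F> c, closure<G> d) {
  return make_closure([c, d](value_sender s) { return d(c(s)); });
}

// An adaptor with an extra argument: add(n) binds n and returns a closure.
inline auto add(int n) {
  return make_closure([n](value_sender s) { return value_sender{s.value + n}; });
}
// An adaptor with no extra arguments.
inline auto twice() {
  return make_closure([](value_sender s) { return value_sender{s.value * 2}; });
}

inline int piped() { return (value_sender{3} | add(1) | twice()).value; }
inline int called() { return twice()(add(1)(value_sender{3})).value; }
inline int composed() {
  auto both = add(1) | twice();  // composed before any sender is supplied
  return (value_sender{3} | both).value;
}
```

All three functions compute `(3 + 1) * 2`, so the pipe form, the call form, and the pre-composed closure agree. -- <i>end example</i>]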

#### `execution::on` <b>[exec.on]</b> #### {#spec-execution.senders.adapt.on}

1. `on` adapts an input sender into a sender that will start on an execution
    agent belonging to a particular scheduler's associated execution resource.

2. Let <code><i>replace-scheduler</i>(e, sch)</code> be an expression denoting
    an object `e2` such that `get_scheduler(e2)` returns a copy of `sch`,
    `get_domain(e2)` is expression-equivalent to `get_domain(sch)`, and
    `tag_invoke(tag, e2, args...)` is expression-equivalent to `tag(e, args...)`
    for all arguments `args...` and for all `tag` whose type satisfies
    <code><i>forwarding-query</i></code> and is not `get_scheduler_t` or
    `get_domain_t`.

3. The name `on` denotes a customization point object. For some subexpressions
    `sch` and `s`, let `Sch` be `decltype((sch))` and `S` be `decltype((s))`. If
    `Sch` does not satisfy `scheduler`, or `S` does not satisfy `sender`,
    `on(sch, s)` is ill-formed. Otherwise, the expression `on(sch, s)` is
    expression-equivalent to:

        <pre highlight="c++">
        transform_sender(
          <i>query-or-default</i>(get_domain, sch, default_domain()),
          <i>make-sender</i>(on, sch, s));
        </pre>

    1. If a sender `S` returned from `on(sch, s)` is
        connected with a receiver `R` with environment `E` such that
        <code>transform_sender(<i>get-domain-late</i>(S, E), S, E)</code> does not return a sender that starts `s` on an execution agent of the associated execution resource of `sch` when
        started, the behavior of calling `connect(S, R)` is undefined.

4. Let `s` and `e` be subexpressions such that `S` is `decltype((s))`. If
    <code><i>sender-for</i>&lt;S, on_t></code> is `false`, then the expression
    `on_t().transform_sender(s, e)` is ill-formed; otherwise, it returns a sender `s1`
    such that when `s1` is connected with some receiver `out_r`, it:

    1. Constructs a receiver `r` such that:

        1. When `set_value(r)` is called, it calls `connect(s, r2)`, where `r2` is as specified below, which results in `op_state3`. It calls `start(op_state3)`. If any of these throws an exception, it calls `set_error` on `out_r`, passing `current_exception()` as the second argument.

        2. `set_error(r, e)` is expression-equivalent to `set_error(out_r, e)`.

        3. `set_stopped(r)` is expression-equivalent to `set_stopped(out_r)`.

        4. `get_env(r)` is expression-equivalent to `get_env(out_r)`.

    2. Calls `schedule(sch)`, which results in `s2`. It then calls `connect(s2, r)`, resulting in `op_state2`.

    3. `op_state2` is wrapped by a new operation state, `op_state1`, that is returned to the caller.

    4. `r2` is a receiver that wraps a reference to `out_r` and forwards all
        completion operations to it. In addition, `get_env(r2)` returns
        <code><i>replace-scheduler</i>(e, sch)</code>.

    5. When `start` is called on `op_state1`, it calls `start` on `op_state2`.

    6. The lifetime of `op_state2`, once constructed, lasts until either `op_state3` is constructed or `op_state1` is destroyed, whichever comes first. The lifetime of `op_state3`, once constructed, lasts until `op_state1` is destroyed.

5. Let `s` and `e` be subexpressions such that `S` is `decltype((s))`. If 
    <code><i>sender-for</i>&lt;S, on_t></code> is `false`, then the expression
    `on_t().transform_env(s, e)` is ill-formed;
    otherwise, let `sch` be the scheduler used to construct `s`.
    `on_t().transform_env(s, e)` is equal to <code><i>replace-scheduler</i>(e, sch)</code>.
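
[*Example:* A non-normative sketch of the ordering that `on` establishes: starting the outer operation only schedules onto the target resource, and the input work is started from that scheduled task. `manual_context`, `mini_on_operation`, and `mini_on_connect` are hypothetical; a plain callable stands in for the input sender, and the environment handling performed by <code><i>replace-scheduler</i></code> is omitted.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical execution resource: queued work runs only when drain() is called.
struct manual_context {
  std::deque<std::function<void()>> queue;
  void drain() {
    while (!queue.empty()) {
      auto task = std::move(queue.front());
      queue.pop_front();
      task();
    }
  }
};

// Miniature of the operation produced by on(sch, s).
template <class Work, class Receiver>
struct mini_on_operation {
  manual_context* ctx;
  Work work;      // stands in for the input sender s
  Receiver rcvr;  // stands in for out_r
  // start() schedules first; the work itself runs from the scheduled task.
  void start() {
    ctx->queue.push_back([this] { std::move(rcvr).set_value(work()); });
  }
};

template <class Work, class Receiver>
mini_on_operation<Work, Receiver> mini_on_connect(manual_context& ctx,
                                                  Work work, Receiver rcvr) {
  return {&ctx, std::move(work), std::move(rcvr)};
}

struct int_receiver {
  int* out;
  void set_value(int v) && noexcept { *out = v; }
};

inline bool on_defers_work_until_scheduled() {
  manual_context ctx;
  int started = 0, result = 0;
  auto op = mini_on_connect(ctx, [&started] { ++started; return 5; },
                            int_receiver{&result});
  op.start();
  bool nothing_ran_yet = (started == 0 && result == 0);
  ctx.drain();  // the context now runs the work and completes the receiver
  return nothing_ran_yet && started == 1 && result == 5;
}
```

The operation state must stay alive and at a stable address until the scheduled task has run, which is why the sketch captures `this` only inside `start`. -- <i>end example</i>]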

#### `execution::transfer` <b>[exec.transfer]</b> #### {#spec-execution.senders.adapt.transfer}

1. `transfer` adapts a sender into a sender with a different associated
    `set_value` completion scheduler. <span class="wg21note">It results in a
    transition between different execution resources when executed.</span>

2. The name `transfer` denotes a customization point object. For some
    subexpressions `sch` and `s`, let `Sch` be `decltype((sch))` and `S` be
    `decltype((s))`. If `Sch` does not satisfy `scheduler`, or `S` does not
    satisfy `sender`, `transfer(s, sch)` is ill-formed. Otherwise, the
    expression `transfer(s, sch)` is expression-equivalent to:

        <pre highlight="c++">
        transform_sender(
          <i>get-domain-early</i>(s),
          <i>make-sender</i>(transfer, sch, s));
        </pre>

    If a sender `S` returned from `transfer(s, sch)` is connected with a
    receiver `R` with environment `E` such that
    <code>transform_sender(<i>get-domain-late</i>(S, E), S, E)</code> does not
    return a sender that is a result of a call to
    <code>transform_sender(<i>get-domain-late</i>(S, E), schedule_from(sch, s2), E)</code>,
    where `s2` is a sender that sends values equal to those sent by `s`,
    the behavior of calling `connect(S, R)` is undefined.

3. For a sender `t` returned from `transfer(s, sch)`, `get_env(t)` shall return
    a queryable object `q` such that `get_domain(q)` is expression-equivalent to
    `get_domain(sch)` and `get_completion_scheduler<CPO>(q)` returns a copy of
    `sch`, where `CPO` is either `set_value_t` or `set_stopped_t`. <span
    class="wg21note">The `get_completion_scheduler<set_error_t>` query is not
    implemented, as the scheduler cannot be guaranteed in case an error is
    thrown while trying to schedule work on the given scheduler object.</span>
    For all other query objects <code><i>Q</i></code> whose type satisfies
    <code><i>forwarding-query</i></code>, the expression <code><i>Q</i>(q,
    args...)</code> shall be equivalent to <code><i>Q</i>(get_env(s),
    args...)</code>.

4. Let `s` and `e` be subexpressions such that `S` is `decltype((s))`. If
    <code><i>sender-for</i>&lt;S, transfer_t></code> is `false`, then the expression
    `transfer_t().transform_sender(s, e)` is ill-formed; otherwise, it
    is equal to:

        <pre highlight="c++">
        const auto& env = e;
        auto domain = <i>get-domain-late</i>(s, env);
        auto [tag, data, child] = s;
        return schedule_from(std::move(data), std::move(child));
        </pre>

    <span class="wg21note">This causes the `transfer(s, sch)` sender to become
    `schedule_from(sch, s)` when it is connected with a receiver whose
    execution domain does not customize `transfer`.</span>
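
[*Example:* A non-normative sketch of the lowering step in paragraph 4: the `transfer` sender is decomposed with a structured binding into its tag, data (the scheduler), and child, and a `schedule_from` sender is rebuilt from the last two. `mini_basic_sender`, `lower_transfer`, `toy_scheduler`, and `toy_sender` are hypothetical; domain lookup is left out.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical adaptor tags.
struct transfer_t {};
struct schedule_from_t {};

// Hypothetical three-member sender that supports structured bindings.
template <class Tag, class Data, class Child>
struct mini_basic_sender {
  Tag tag;
  Data data;    // for transfer: the target scheduler
  Child child;  // the input sender
};

struct toy_scheduler { int id; };
struct toy_sender { int value; };

// Rebuilds a transfer sender as a schedule_from sender with the same parts.
template <class Data, class Child>
auto lower_transfer(mini_basic_sender<transfer_t, Data, Child> s) {
  auto [tag, data, child] = std::move(s);
  (void)tag;  // the original tag is discarded
  return mini_basic_sender<schedule_from_t, Data, Child>{
      schedule_from_t{}, std::move(data), std::move(child)};
}

inline bool transfer_lowers_to_schedule_from() {
  mini_basic_sender<transfer_t, toy_scheduler, toy_sender> s{{}, {3}, {9}};
  auto lowered = lower_transfer(s);
  return std::is_same_v<decltype(lowered.tag), schedule_from_t> &&
         lowered.data.id == 3 && lowered.child.value == 9;
}
```

Only the tag changes; the scheduler and the child sender are carried over unchanged. -- <i>end example</i>]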

#### `execution::schedule_from` <b>[exec.schedule.from]</b> #### {#spec-execution.senders.adaptors.schedule_from}

1. `schedule_from` schedules work dependent on the completion of a sender onto a
    scheduler's associated execution resource. <span
    class="wg21note">`schedule_from` is not meant to be used in user code; it is
    used in the implementation of `transfer`.</span>

2. The name `schedule_from` denotes a customization point object. For some
    subexpressions `sch` and `s`, let `Sch` be `decltype((sch))` and `S` be
    `decltype((s))`. If `Sch` does not satisfy `scheduler`, or `S` does not
    satisfy `sender`, `schedule_from(sch, s)` is ill-formed. Otherwise, the
    expression `schedule_from(sch, s)` is expression-equivalent to:

        <pre highlight="c++">
        transform_sender(
          <i>query-or-default</i>(get_domain, sch, default_domain()),
          <i>make-schedule-from-sender</i>(sch, s));
        </pre>

        where <code><i>make-schedule-from-sender</i>(sch, s)</code> is expression-equivalent to
        <code><i>make-sender</i>(schedule_from, sch, s)</code> and returns a sender object
        `s2` that behaves as follows:

    1. When `s2` is connected with some receiver `out_r`, it:

        1. Constructs a receiver `r` such that when a receiver completion
            operation <code><i>Tag</i>(r, args...)</code> is called, it
            decay-copies `args...` into `op_state` (see below) as `args2...` and
            constructs a receiver `r2` such that:

            1. When `set_value(r2)` is called, it calls
                <code><i>Tag</i>(out_r, std::move(args2)...)</code>.

            2. `set_error(r2, e)` is expression-equivalent to `set_error(out_r, e)`.

            3. `set_stopped(r2)` is expression-equivalent to `set_stopped(out_r)`.

            4. `get_env(r2)` is equal to `get_env(r)`.

            It then calls `schedule(sch)`, resulting in a sender `s3`. It then
            calls `connect(s3, r2)`, resulting in an operation state
            `op_state3`. It then calls `start(op_state3)`. If any of these
            throws an exception, it catches it and calls `set_error(out_r,
            current_exception())`. If any of these expressions would be
            ill-formed, then <code><i>Tag</i>(r, args...)</code> is ill-formed.

        2. Calls `connect(s, r)` resulting in an operation state `op_state2`. If
            this expression would be ill-formed, `connect(s2, out_r)` is
            ill-formed.

        3. Returns an operation state `op_state` that contains `op_state2`. When
            `start(op_state)` is called, calls `start(op_state2)`. The lifetime
            of `op_state3` ends when `op_state` is destroyed.

    2. If a sender `S` returned from `schedule_from(sch, s)` is connected with a
        receiver `R` with environment `E` such that
        <code>transform_sender(<i>get-domain-late</i>(S, E), S, E)</code> does not
        return a sender that completes on an execution agent belonging to the
        associated execution resource of `sch` with the same
        async result ([async.ops]) as `s`, the behavior of calling
        `connect(S, R)` is undefined.


3. For a sender `t` returned from `schedule_from(sch, s)`, `get_env(t)` shall
    return a queryable object `q` such that `get_domain(q)` is
    expression-equivalent to `get_domain(sch)` and
    `get_completion_scheduler<CPO>(q)` returns a copy of `sch`, where `CPO` is
    either `set_value_t` or `set_stopped_t`. The
    `get_completion_scheduler<set_error_t>` query is not implemented, as the
    scheduler cannot be guaranteed in case an error is thrown while trying to
    schedule work on the given scheduler object. For all other query objects
    <code><i>Q</i></code> whose type satisfies
    <code><i>forwarding-query</i></code>, the expression <code><i>Q</i>(q,
    args...)</code> shall be equivalent to <code><i>Q</i>(get_env(s),
    args...)</code>.
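
[*Example:* A non-normative sketch of the ordering that `schedule_from` establishes: the input work runs where the operation is started, its result is stored in the operation state, and only the completion is delivered from the target resource. `manual_context`, `mini_schedule_from_operation`, and `mini_schedule_from_connect` are hypothetical; the stored result is fixed to `int` for brevity and the error and stopped channels are omitted.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical execution resource: queued work runs only when drain() is called.
struct manual_context {
  std::deque<std::function<void()>> queue;
  void drain() {
    while (!queue.empty()) {
      auto task = std::move(queue.front());
      queue.pop_front();
      task();
    }
  }
};

// Miniature of the operation produced by schedule_from(sch, s).
template <class Work, class Receiver>
struct mini_schedule_from_operation {
  manual_context* ctx;
  Work work;                  // stands in for the input sender s
  Receiver rcvr;              // stands in for out_r
  std::optional<int> stored;  // plays the role of the decay-copied args2...
  void start() {
    stored = work();  // the input completes on the starting context
    // Only the completion is deferred to the target context.
    ctx->queue.push_back(
        [this] { std::move(rcvr).set_value(std::move(*stored)); });
  }
};

template <class Work, class Receiver>
mini_schedule_from_operation<Work, Receiver> mini_schedule_from_connect(
    manual_context& ctx, Work work, Receiver rcvr) {
  return {&ctx, std::move(work), std::move(rcvr), std::nullopt};
}

struct int_receiver {
  int* out;
  void set_value(int v) && noexcept { *out = v; }
};

inline bool schedule_from_defers_only_the_completion() {
  manual_context ctx;
  int started = 0, result = 0;
  auto op = mini_schedule_from_connect(
      ctx, [&started] { ++started; return 5; }, int_receiver{&result});
  op.start();
  bool work_ran_inline = (started == 1 && result == 0);
  ctx.drain();  // the receiver is completed from the target context
  return work_ran_inline && result == 5;
}
```

Storing the result in the operation state is what keeps it alive across the hop to the target resource. -- <i>end example</i>]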

#### `execution::then`, `execution::upon_error`, `execution::upon_stopped` <b>[exec.then]</b> #### {#spec-execution.senders.adaptor.then}

1. `then` attaches an invocable as a continuation for an input sender's value
    completion operation. `upon_error` and `upon_stopped` do the same for the
    error and stopped completion operations respectively, sending the result
    of the invocable as a value completion.

2. The names `then`, `upon_error`, and `upon_stopped` denote customization point
    objects. Let the expression <code><i>then-cpo</i></code> be one of `then`,
    `upon_error`, or `upon_stopped`. For subexpressions `s` and `f`, let `S` be
    `decltype((s))` and let `F` be the decayed type of `f`. If `S` does not
    satisfy `sender`, or `F` does not satisfy <code><i>movable-value</i></code>,
    <code><i>then-cpo</i>(s, f)</code> is ill-formed.
    
3. Otherwise, the expression <code><i>then-cpo</i>(s, f)</code> is
    expression-equivalent to:

        <pre highlight="c++">
        transform_sender(
          <i>get-domain-early</i>(s),
          <i>make-sender</i>(<i>then-cpo</i>, f, s));
        </pre>

4. For `then`, `upon_error`, and `upon_stopped`, let <code><i>set-cpo</i></code>
    be `set_value_t`, `set_error_t`, and `set_stopped_t` respectively. The
    exposition-only class template <code><i>impls-for</i></code>
    ([exec.snd.general]) is specialized for each of `then_t`, `upon_error_t`,
    and `upon_stopped_t` as follows:

        <pre highlight="c++">
        template&lt;>
        struct <i>impls-for</i>&lt;<i>then-cpo</i>> : <i>default-impls</i> {
          static constexpr auto complete =
            []&lt;class Tag, class... Args>(auto, auto& fn, auto& rcvr, Tag, Args&&... args) noexcept -> void {
              if constexpr (same_as&lt;Tag, <i>set-cpo</i>>) {
                <i>TRY-SET-VALUE</i>(std::move(rcvr),
                              invoke(std::move(fn), std::forward&lt;Args>(args)...));
              } else {
                Tag()(std::move(rcvr), std::forward&lt;Args>(args)...);
              }
            }
        };
        </pre>

5. The expression <code><i>then-cpo</i>(s, f)</code> has undefined behavior
    unless it returns a sender `out_s` that:

    1. Invokes `f` or a copy of such with the value, error, or stopped result
        datums of `s` (for `then`, `upon_error`, and `upon_stopped`
        respectively), using the result value of `f` as `out_s`'s value
        completion, and

    2. Forwards all other completion operations unchanged.
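
[*Example:* A non-normative sketch of the completion logic described for `then`: a value completion is routed through the invocable, an exception thrown by the invocable becomes an error completion, and every other completion is forwarded unchanged. `mini_then_receiver`, `make_then_receiver`, and `log_receiver` are hypothetical names, and the sketch assumes the invocable returns a non-`void` value.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical receiver that records which completion it observed.
struct log_receiver {
  std::string* log;
  void set_value(int v) && noexcept { *log = "value:" + std::to_string(v); }
  void set_error(std::exception_ptr) && noexcept { *log = "error"; }
  void set_stopped() && noexcept { *log = "stopped"; }
};

// Miniature of the receiver that a then-like adaptor connects to its input.
template <class F, class Receiver>
struct mini_then_receiver {
  F fn;
  Receiver rcvr;

  // Value completion: invoke fn and send its result as a value, or send
  // the exception it throws as an error.
  template <class... Args>
  void set_value(Args&&... args) && noexcept {
    try {
      std::move(rcvr).set_value(std::move(fn)(std::forward<Args>(args)...));
    } catch (...) {
      std::move(rcvr).set_error(std::current_exception());
    }
  }
  // All other completions pass through untouched.
  template <class E>
  void set_error(E&& e) && noexcept {
    std::move(rcvr).set_error(std::forward<E>(e));
  }
  void set_stopped() && noexcept { std::move(rcvr).set_stopped(); }
};

template <class F, class Receiver>
mini_then_receiver<F, Receiver> make_then_receiver(F fn, Receiver rcvr) {
  return {std::move(fn), std::move(rcvr)};
}

inline std::string then_on_value() {
  std::string log;
  make_then_receiver([](int x) { return x + 1; }, log_receiver{&log})
      .set_value(41);
  return log;
}
inline std::string then_on_throw() {
  std::string log;
  make_then_receiver([](int) -> int { throw std::runtime_error("boom"); },
                     log_receiver{&log})
      .set_value(0);
  return log;
}
inline std::string then_on_stopped() {
  std::string log;
  make_then_receiver([](int x) { return x; }, log_receiver{&log}).set_stopped();
  return log;
}
```

`upon_error` and `upon_stopped` follow the same shape with the intercepted channel swapped. -- <i>end example</i>]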

#### `execution::let_value`, `execution::let_error`, `execution::let_stopped` <b>[exec.let]</b> #### {#spec-execution.senders.adapt.let}

1. `let_value` transforms a sender's value completion into a new child
    asynchronous operation. `let_error` transforms a sender's error completion
    into a new child asynchronous operation. `let_stopped` transforms a sender's
    stopped completion into a new child asynchronous operation.

2. Let the expression <code><i>let-cpo</i></code> be one of `let_value`,
    `let_error`, or `let_stopped` and let <code><i>set-cpo</i></code> be the
    completion function that corresponds to <code><i>let-cpo</i></code>
    (`set_value` for `let_value`, etc.). For subexpressions `s` and `re`, let
    <code><i>inner-env</i>(s, re)</code> be an environment `e` such that:

        1. `get_domain(e)` is expression-equivalent to
            <code><i>get-domain-late</i>(s, re)</code>,
        
        2. `get_scheduler(e)` is expression-equivalent to the first
            well-formed expression below:

            - <code>get_completion_scheduler&lt;<i>set-cpo-t</i>>(get_env(s))</code>,
                where <code><i>set-cpo-t</i></code> is the type of
                <code><i>set-cpo</i></code>.
            
            - `get_scheduler(re)`
            
            or, if neither of them is well-formed, `get_scheduler(e)` is ill-formed.

        3. For all other query objects <code><i>Q</i></code> and arguments `args...`,
            <code><i>Q</i>(e, args...)</code> is expression-equivalent to
            <code><i>Q</i>(re, args...)</code>.

3. The names `let_value`, `let_error`, and `let_stopped` denote customization
    point objects. For subexpressions `s` and `f`, let `S` be `decltype((s))`,
    let `F` be the decayed type of `f`, and let `f2` be an xvalue that refers to
    an object decay-copied from `f`. If `S` does not satisfy `sender` or if `F`
    does not satisfy <code><i>movable-value</i></code>, the expression
    <code><i>let-cpo</i>(s, f)</code> is ill-formed. If `F` does not satisfy
    `invocable`, the expression `let_stopped(s, f)` is ill-formed. Otherwise,
    the expression <code><i>let-cpo</i>(s, f)</code> is expression-equivalent
    to:

      <pre highlight="c++">
      transform_sender(
        <i>get-domain-early</i>(s),
        <i>make-let-sender</i>(f, s));
      </pre>

    where <code><i>make-let-sender</i>(f, s)</code> is expression-equivalent to
    <code><i>make-sender</i>(<i>let-cpo</i>, f, s)</code> and returns a sender
    object `s2` that behaves as follows:

    1. When `s2` is connected to some receiver `out_r`, it:

        1. Decay-copies `out_r` into `op_state2` (see below). `out_r2` is an
            xvalue referring to the copy of `out_r`. 

        2. Constructs a receiver `r` such that:

            1. When <code><i>set-cpo</i>(r, args...)</code> is called, the
                receiver `r` decay-copies `args...` into `op_state2` as
                `args2...`, then calls `invoke(f2, args2...)`, resulting in a
                sender `s3`. It then calls `connect(s3, out_r3)`, resulting in
                an operation state `op_state3`, where `out_r3` is a receiver
                described below. `op_state3` is saved as a part of `op_state2`.
                It then calls `start(op_state3)`. If any of these throws an
                exception, it catches it and calls `set_error(out_r2,
                current_exception())`. If any of these expressions would be
                ill-formed, <code><i>set-cpo</i>(r, args...)</code> is
                ill-formed.

            2. <code><i>CF</i>(r, args...)</code> is expression-equivalent to
                <code><i>CF</i>(out_r2, args...)</code>, where
                <code><i>CF</i></code> is a completion function other than
                <code><i>set-cpo</i></code>.
            
            3. `get_env(r)` is expression-equivalent to `get_env(out_r)`.

            4. `out_r3` is a receiver that forwards its completion operations
                to `out_r2` and for which `get_env(out_r3)` returns
                <code><i>inner-env</i>(get_env(s), get_env(out_r2))</code>.

        3. Calls `connect(s, r)`, resulting in an operation state `op_state2`.
            If the expression `connect(s, r)` is ill-formed, `connect(s2,
            out_r)` is ill-formed.

        4. Returns an operation state `op_state` that stores `op_state2`.
            `start(op_state)` is expression-equivalent to `start(op_state2)`.

5. Let `s` and `e` be subexpressions such that `S` is `decltype((s))` and `E` is
    `decltype((e))`. If <code><i>sender-for</i>&lt;S, <i>let-cpo-t</i>></code> is `false` where
    <code><i>let-cpo-t</i></code> is the type of <code><i>let-cpo</i></code>, then the expression
    <code><i>let-cpo-t</i>().transform_env(s, e)</code> is ill-formed. Otherwise, it is equal
    to <code><i>inner-env</i>(get_env(s), e)</code>.

6. If a sender `S` returned from <code><i>let-cpo</i>(s, f)</code> is connected to a
    receiver `R` with environment `E` such that
    <code>transform_sender(<i>get-domain-late</i>(S, E), S, E)</code> does not return a
    sender that:
  
        - invokes `f` when <code><i>set-cpo</i></code> is called with `s`'s result datums,
        
        - makes its completion dependent on the completion of a sender returned by
            `f`, and

        - propagates the other completion operations sent by `s`,
    
    the behavior of calling `connect(S, R)` is undefined.

#### `execution::bulk` <b>[exec.bulk]</b> #### {#spec-execution.senders.adapt.bulk}

1. `bulk` runs a task repeatedly for every index in an index space.

2. The name `bulk` denotes a customization point object. For some
    subexpressions `s`, `shape`, and `f`, let `S` be `decltype((s))`, `Shape` be
    `decltype((shape))`, and `F` be `decltype((f))`. If `S` does not satisfy
    `sender` or `Shape` does not satisfy `integral`, the expression
    `bulk(s, shape, f)` is ill-formed. Otherwise, it is
    expression-equivalent to:

      <pre highlight="c++">
      transform_sender(
        <i>get-domain-early</i>(s),
        <i>make-bulk-sender</i>(<i>tuple-like</i>{shape, f}, s));
      </pre>

    where <code><i>make-bulk-sender</i>(t, s)</code> is expression-equivalent to
    <code><i>make-sender</i>(bulk, t, s)</code> for a subexpression `t` and
    returns a sender object `s2` that behaves as follows:

    3. When `s2` is connected with a receiver `out_r`, it:

        1. Constructs a receiver `r`:

            1. When `set_value(r, args...)` is called, calls `f(i, args...)` for
                each `i` of type `Shape` from `0` up to, but not including, `shape`, then calls
                `set_value(out_r, args...)`. If any of these throws an
                exception, it catches it and calls `set_error(out_r,
                current_exception())`. If any of these expressions are
                ill-formed, `set_value(r, args...)` is ill-formed.

            2. When `set_error(r, e)` is called, calls `set_error(out_r, e)`.

            3. When `set_stopped(r)` is called, calls `set_stopped(out_r)`.

        2. Calls `connect(s, r)`, which results in an operation state `op_state2`.

        3. Returns an operation state `op_state` that contains `op_state2`. When
            `start(op_state)` is called, calls `start(op_state2)`.

    4. Let `S` be the result of calling `bulk(s, shape, f)` or a copy of such.
        If `S` is connected to a receiver `R` with environment `E` such that
        <code>transform_sender(<i>get-domain-late</i>(S, E), S, E)</code> does
        not return a sender that invokes `f(i, args...)` for each `i` of type
        `Shape` from `0` up to, but not including, `shape`, where `args` is a pack of subexpressions
        referring to the value completion result datums of the input sender, or
        does not execute a value completion operation with said datums, the
        behavior of calling `connect(S, R)` is undefined.
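
The default behavior of the receiver `r` described above can be modeled outside of the sender/receiver machinery. The following is a minimal sketch under stated assumptions, not the specified implementation: `bulk_set_value` and `recording_receiver` are hypothetical names introduced only for illustration, the completion functions are modeled as ordinary member functions, and the index range is taken to be half-open.

```cpp
#include <exception>
#include <vector>

// Hypothetical stand-in for the downstream receiver `out_r`.
struct recording_receiver {
  std::vector<int> values;  // datums delivered by the value completion
  bool got_error = false;   // set when the error completion is used instead

  void set_value(const std::vector<int>& v) { values = v; }
  void set_error(std::exception_ptr) noexcept { got_error = true; }
};

// Hypothetical model of what `set_value(r, args...)` does for `bulk`:
// invoke f(i, args...) for every index, then forward the same datums.
template <class Receiver, class Shape, class F, class... Args>
void bulk_set_value(Receiver& out_r, Shape shape, F f, Args&... args) {
  try {
    for (Shape i = 0; i < shape; ++i) {
      f(i, args...);
    }
    out_r.set_value(args...);
  } catch (...) {
    // An exception from f or from the value completion becomes an
    // error completion carrying the current exception.
    out_r.set_error(std::current_exception());
  }
}
```

In this model, calling `bulk_set_value` with a shape of `3` and a function that scales element `i` of a vector modifies every element before the vector is forwarded, while a throwing function leaves the value completion unreached.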

#### `execution::split` <b>[exec.split]</b> #### {#spec-execution.senders.adapt.split}

1. `split` adapts an arbitrary sender into a sender that can be connected multiple times.

2. Let <code><i>split-env</i></code> be the type of an environment such that,
    given an instance `e`, the expression `get_stop_token(e)` is well-formed and
    has type `stop_token`.

3. The name `split` denotes a customization point object. For some
    subexpression `s`, let `S` be `decltype((s))`. If
    <code>sender_in&lt;S, <i>split-env</i>></code> or
    `constructible_from<decay_t<env_of_t<S>>, env_of_t<S>>` is `false`,
    `split(s)` is ill-formed. Otherwise, the expression
    `split(s)` is expression-equivalent to:

      <pre highlight="c++">
      transform_sender(
        <i>get-domain-early</i>(s),
        <i>make-sender</i>(split, s));
      </pre>

    1. Let `s` be a subexpression such that `S` is `decltype((s))`, and let
        `e...` be a pack of subexpressions such that `sizeof...(e) <= 1` is
        `true`. If <code><i>sender-for</i>&lt;S, split_t></code> is `false`,
        then the expression `split_t().transform_sender(s, e...)` is ill-formed;
        otherwise, it returns a sender `s2` that:

        1. Creates an object `sh_state` that contains a `stop_source`, a list of
            pointers to operation states awaiting the completion of `s`, and that
            also reserves space for storing:

            * the operation state that results from connecting `s` with `r` described below,
            * the sets of values and errors with which `s` can complete, with
                the addition of `exception_ptr`, and
            * the result of decay-copying `get_env(s)`.

        2. Constructs a receiver `r` such that:

            1. When `set_value(r, args...)` is called, decay-copies
                the expressions `args...` into `sh_state`. It then notifies all
                the operation states in `sh_state`'s list of operation states
                that the results are ready. If any exceptions are thrown, the
                exception is caught and `set_error(r,
                current_exception())` is called instead.

            2. When `set_error(r, e)` is called, decay-copies `e`
                into `sh_state`. It then notifies the operation states in
                `sh_state`'s list of operation states that the results are ready.

            3. When `set_stopped(r)` is called, notifies the
                operation states in `sh_state`'s list of operation states that
                the results are ready.

            4. `get_env(r)` is an expression <code><i>e</i></code> of type
                <code><i>split-env</i></code> such that
                <code>get_stop_token(<i>e</i>)</code> is well-formed
                and returns the results of calling `get_token()` on `sh_state`'s
                stop source.

        3. Calls `get_env(s)` and decay-copies the result into
            `sh_state`.

        4. Calls `connect(s, r)`, resulting in an operation state
            `op_state2`. `op_state2` is saved in `sh_state`.

        5. When `s2` is connected with a receiver `out_r` of type `OutR`, it
            returns an operation state object `op_state` that contains:

              * An object `out_r2` of type `OutR` decay-copied from `out_r`,
              * A reference to `sh_state`,
              * A stop callback of type
                <code>optional&lt;stop_token_of_t&lt;env_of_t&lt;OutR>>::callback_type&lt;<i>stop-callback-fn</i>>></code>,
                where <code><i>stop-callback-fn</i></code> is the unspecified
                class type:

                <pre highlight="c++">
                struct <i>stop-callback-fn</i> {
                  stop_source& <i>stop_src_</i>;
                  void operator()() noexcept {
                    <i>stop_src_</i>.request_stop();
                  }
                };
                </pre>

        6. When `start(op_state)` is called:

            * If one of `r`'s completion functions has executed, then let
                <code><i>Tag</i></code> be the completion function that was
                called. Calls <code><i>Tag</i>(out_r2, args2...)</code>,
                where `args2...` is a pack of const lvalues referencing the
                subobjects of `sh_state` that have been saved by the original
                call to <code><i>Tag</i>(r, args...)</code> and returns.

            * Otherwise, it emplace constructs the stop callback optional with
                the arguments `get_stop_token(get_env(out_r2))` and
                <code><i>stop-callback-fn</i>{<i>stop-src</i>}</code>, where
                <code><i>stop-src</i></code> refers to the stop source of
                `sh_state`.

            * It then adds a pointer to `op_state` to the list of
                operation states in `sh_state`. If `op_state` is the first such
                state added to the list:

                  * If <code><i>stop-src</i>.stop_requested()</code> is `true`,
                      all of the operation states in `sh_state`'s list of operation
                      states are notified as if `set_stopped(r)` had
                      been called.

                  * Otherwise, `start(op_state2)` is called.

        7. When `r` completes it will notify `op_state` that the results are
            ready. Let <code><i>Tag</i></code> be whichever
            completion function was called on receiver `r`. `op_state`'s
            stop callback optional is reset. Then
            <code><i>Tag</i>(std::move(out_r2), args2...)</code> is called,
            where `args2...` is a pack of const lvalues referencing the subobjects of
            `sh_state` that have been saved by the original call to
            <code><i>Tag</i>(r, args...)</code>.

        8. Ownership of `sh_state` is shared by `s2` and by every `op_state`
            that results from connecting `s2` to a receiver.

    2. Given a subexpression `s2`, where `s2` is a sender returned from `split`
        or a copy of such, `get_env(s2)` shall return an lvalue reference to the
        object in `sh_state` that was initialized with the result of `get_env(s)`.

4. Let `s` be a sender expression, `r` be an instance of the receiver type
    described above, `s2` be a sender returned from `split(s)` or a copy of
    such, `r2` be the receiver to which `s2` is connected, and `args` be the
    pack of subexpressions passed to `r`'s completion function
    <code><i>CSO</i></code> when `s` completes. `s2` shall complete either by
    invoking <code><i>CSO</i>(r2, args2...)</code>, where `args2` is a pack of
    const lvalue references to objects decay-copied from `args`, or by calling
    <code>set_error(r2, e2)</code> for some subexpression `e2`. The objects
    passed to `r2`'s completion operation shall be valid until after the
    completion of the invocation of `r2`'s completion operation.

#### `execution::when_all` <b>[exec.when.all]</b> #### {#spec-execution.senders.adaptor.when_all}

1. `when_all` and `when_all_with_variant` both adapt multiple input senders into
    a sender that completes when all input senders have completed. `when_all`
    only accepts senders with a single value completion signature and on success
    concatenates all the input senders' value result datums into its own value
    completion operation. `when_all_with_variant(s...)` is semantically
    equivalent to `when_all(into_variant(s)...)`, where `s` is a pack of
    subexpressions of sender types.

2. The names `when_all` and `when_all_with_variant` denote customization point
    objects. For some subexpressions <code>s<i><sub>i</sub></i>...</code>, let
    <code>S<i><sub>i</sub></i>...</code> be
    <code>decltype((s<i><sub>i</sub></i>))...</code>. The expressions
    <code>when_all(s<i><sub>i</sub></i>...)</code> and
    <code>when_all_with_variant(s<i><sub>i</sub></i>...)</code> are ill-formed if
    any of the following is true:

      * the number of subexpressions <code>s<i><sub>i</sub></i>...</code> is 0,

      * any type <code>S<i><sub>i</sub></i></code> does not satisfy `sender`, or

      * the type of the expression <code><i>get-domain-early</i>(s<sub><i>i</i></sub>)</code>
        is not the same for all values of <code><i>i</i></code>.

    Otherwise, those expressions have the semantics specified below.

3. The expression <code>when_all(s<i><sub>i</sub></i>...)</code> is
    expression-equivalent to:

      <pre highlight="c++">
      transform_sender(
        <i>get-domain-early</i>(s<sub><i>0</i></sub>),
        <i>make-when-all-sender</i>(s<sub><i>0</i></sub>, ..., s<sub><i>n-1</i></sub>));
      </pre>

    where <code><i>make-when-all-sender</i>(s<sub><i>i</i></sub>...)</code> is
    expression-equivalent to <code><i>make-sender</i>(when_all,
    <i>unspecified</i>, s<sub><i>i</i></sub>...)</code> and returns a sender
    object `w` of type `W` that behaves as follows:

    1. When `w` is connected with some receiver `out_r` of type `OutR`, it
        returns an operation state `op_state` specified as below:

        1. For each sender <code>s<i><sub>i</sub></i></code>, constructs a
            receiver <code>r<i><sub>i</sub></i></code> such that:

            1. If <code>set_value(r<i><sub>i</sub></i>,
                t<i><sub>i</sub></i>...)</code> is called for every
                <code>r<i><sub>i</sub></i></code>, `op_state`'s associated stop
                callback optional is reset and <code>set_value(out_r,
                t<i><sub>0</sub></i>..., t<i><sub>1</sub></i>..., ...,
                t<i><sub>n-1</sub></i>...)</code> is called, where `n` is the number
                of subexpressions in <code>s<i><sub>i</sub></i>...</code>.

            2. Otherwise, `set_error` or `set_stopped` was called for at least
                one receiver <code>r<i><sub>i</sub></i></code>. If the first such
                to complete did so with the call
                <code>set_error(r<i><sub>i</sub></i>, e)</code>, `request_stop`
                is called on `op_state`'s associated stop source. When all child
                operations have completed, `op_state`'s associated stop callback
                optional is reset and `set_error(out_r, e)` is called.

            3. Otherwise, `request_stop` is called on `op_state`'s associated
                stop source. When all child operations have completed,
                `op_state`'s associated stop callback optional is reset and
                `set_stopped(out_r)` is called.

            4. For each receiver <code>r<i><sub>i</sub></i></code>,
                <code>get_env(r<i><sub>i</sub></i>)</code> is an expression
                <code><i>e</i></code> such that
                <code>get_stop_token(<i>e</i>)</code> is well-formed and returns
                the results of calling `get_token()` on `op_state`'s associated
                stop source, and for which <code>tag_invoke(tag, <i>e</i>,
                args...)</code> is expression-equivalent to `tag(get_env(out_r),
                args...)` for all arguments `args...` and all `tag` whose type
                satisfies <code><i>forwarding-query</i></code> and is not
                `get_stop_token_t`.

        2. For each sender <code>s<i><sub>i</sub></i></code>, calls
            <code>connect(s<i><sub>i</sub></i>, r<i><sub>i</sub></i>)</code>,
            resulting in operation states
            <code>child_op<i><sub>i</sub></i></code>.

        3. Returns an operation state `op_state` that contains:

            * Each operation state <code>child_op<i><sub>i</sub></i></code>,

            * A stop source of type `in_place_stop_source`,

            * A stop callback of type
                <code>optional&lt;stop_token_of_t&lt;env_of_t&lt;OutR>>::callback_type&lt;<i>stop-callback-fn</i>>></code>,
                where <code><i>stop-callback-fn</i></code> is the unspecified
                class type:

                <pre highlight="c++">
                struct <i>stop-callback-fn</i> {
                  in_place_stop_source& <i>stop_src_</i>;
                  void operator()() noexcept {
                    <i>stop_src_</i>.request_stop();
                  }
                };
                </pre>

        4. When `start(op_state)` is called it:

            * Emplace constructs the stop callback optional with the arguments
                `get_stop_token(get_env(out_r))` and
                <code><i>stop-callback-fn</i>{<i>stop-src</i>}</code>, where
                <code><i>stop-src</i></code> refers to the stop source of
                `op_state`.

            * Then, it checks to see if
                <code><i>stop-src</i>.stop_requested()</code> is true. If so, it
                calls `set_stopped(out_r)`.

            * Otherwise, calls <code>start(child_op<i><sub>i</sub></i>)</code>
                for each <code>child_op<i><sub>i</sub></i></code>.

4. The expression <code>when_all_with_variant(s<sub><i>i</i></sub>...)</code> is
    expression-equivalent to:

      <pre highlight="c++">
      transform_sender(
        <i>get-domain-early</i>(s<sub><i>0</i></sub>),
        <i>make-sender</i>(when_all_with_variant, <i>unspecified</i>, s<sub><i>0</i></sub>, ..., s<sub><i>n-1</i></sub>));
      </pre>

5. Let `s` and `e` be subexpressions such that `S` is `decltype((s))`. If
    <code><i>sender-for</i>&lt;S, when_all_with_variant_t></code> is `false`,
    then the expression `when_all_with_variant_t().transform_sender(s, e)` is
    ill-formed; otherwise, it is equal to:

      <pre highlight="c++">
      const auto& env = e;
      auto domain = <i>get-domain-late</i>(s, env);
      auto [tag, data, ...child] = s;
      return when_all(into_variant(std::move(child))...);
      </pre>

    <span class="wg21note">This causes the `when_all_with_variant(s...)` sender
    to become `when_all(into_variant(s)...)` when it is connected with a
    receiver whose execution domain does not customize
    `when_all_with_variant`.</span>

6. Given a pack of subexpressions `s...`, let `S` be an object returned from
    `when_all(s...)` or `when_all_with_variant(s...)` or a copy of such, and let
    `E` be the environment object returned from `get_env(S)`. Given a query
    object `Q`, `tag_invoke(Q, E)` is expression-equivalent to
    <code><i>get-domain-early</i>(s<sub><i>0</i></sub>)</code> when `Q` is
    `get_domain`; otherwise, it is ill-formed.
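
The concatenation of value result datums performed by `when_all` on success can be illustrated with ordinary tuples. This is a hedged sketch only: `concat_child_values` is a hypothetical helper, each argument stands for the value datums of one child sender, and the stop-source and error handling described above are deliberately left out.

```cpp
#include <string>
#include <tuple>
#include <utility>

// Hypothetical helper: each argument models the value datums of one child
// sender, packed as a tuple. The result models the argument list of
// set_value(out_r, t0..., t1..., ..., tn-1...), in child order.
template <class... ChildValues>
auto concat_child_values(ChildValues&&... child_values) {
  return std::tuple_cat(std::forward<ChildValues>(child_values)...);
}
```

Under these assumptions, children completing with `(int)`, `(std::string, double)`, and no values at all would yield a combined `(int, std::string, double)`. A child that sends no values contributes nothing to the result.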

#### `execution::into_variant` <b>[exec.into.variant]</b> #### {#spec-execution.senders.adapt.into_variant}

1. `into_variant` adapts a sender with multiple value completion signatures into
    a sender with just one consisting of a `variant` of `tuple`s.

2. The template <code><i>into-variant-type</i></code> computes the type sent by
    a sender returned from `into_variant`.

    <pre highlight="c++">
        template&lt;class S, class E>
            requires sender_in&lt;S, E>
          using <i>into-variant-type</i> =
            value_types_of_t&lt;S, E>;
    </pre>

3. `into_variant` is a customization point object. For some subexpression `s`,
    let `S` be `decltype((s))`. If `S` does not satisfy `sender`,
    `into_variant(s)` is ill-formed. Otherwise, `into_variant(s)` is
    expression-equivalent to:

      <pre highlight="c++">
      transform_sender(
        <i>get-domain-early</i>(s),
        <i>make-into-variant-sender</i>(s))
      </pre>

    where <code><i>make-into-variant-sender</i>(s)</code> is
    expression-equivalent to <code><i>make-sender</i>(into_variant,
    <i>unspecified</i>, s)</code> and returns a sender object `s2` that behaves
    as follows:

    1. When `s2` is connected with some receiver `out_r`, it:

        1. Constructs a receiver `r` such that:

            1. If `set_value(r, ts...)` is called, calls <code>set_value(out_r,
                <i>into-variant-type</i>&lt;S,
                env_of_t&lt;decltype((r))>>(<i>decayed-tuple</i>&lt;decltype(ts)...>(ts...)))</code>.
                If this expression throws an exception, calls `set_error(out_r,
                current_exception())`.

            2. `set_error(r, e)` is expression-equivalent to `set_error(out_r,
                e)`.

            3. `set_stopped(r)` is expression-equivalent to
                `set_stopped(out_r)`.

        2. Calls `connect(s, r)`, resulting in an operation state `op_state2`.

        3. Returns an operation state `op_state` that contains `op_state2`. When
            `start(op_state)` is called, calls `start(op_state2)`.
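
The wrapping step performed by `set_value(r, ts...)` can be sketched with the standard `variant` and `tuple` templates. This is an illustration under assumptions, not the specified implementation: `result_variant` and `to_variant` are hypothetical names, and the input sender is assumed to have exactly two value completion signatures, `set_value_t(int)` and `set_value_t(std::string, double)`.

```cpp
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical result type for a sender that can complete with either
// set_value(int) or set_value(std::string, double).
using result_variant =
    std::variant<std::tuple<int>, std::tuple<std::string, double>>;

// Hypothetical model of the wrapping step: build a tuple of decayed values
// and convert it to the variant alternative that matches the completion.
template <class... Ts>
result_variant to_variant(Ts&&... ts) {
  return result_variant(
      std::tuple<std::decay_t<Ts>...>(std::forward<Ts>(ts)...));
}
```

With this model, a completion carrying a single `int` selects the first alternative and a completion carrying a `std::string` and a `double` selects the second, so a downstream receiver only has to handle one value completion signature.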

#### `execution::stopped_as_optional` <b>[exec.stopped.as.optional]</b> #### {#spec-execution.senders.adapt.stopped_as_optional}

1. `stopped_as_optional` maps an input sender's stopped completion operation into the value completion operation as an empty optional. The input sender's value completion operation is also converted into an optional. The result is a sender that never completes with stopped, reporting cancellation by completing with an empty optional.

2. The name `stopped_as_optional` denotes a customization point object. For some subexpression `s`, let `S` be `decltype((s))`.
    The expression `stopped_as_optional(s)` is expression-equivalent to:

    <pre highlight="c++">
    transform_sender(
      <i>get-sender-domain</i>(s),
      <i>make-sender</i>(stopped_as_optional, <i>unspecified</i>, s))
    </pre>

3. Let `s` and `e` be subexpressions such that `S` is `decltype((s))` and `E` is `decltype((e))`.
    If either <code><i>sender-for</i>&lt;S, stopped_as_optional_t></code> or <code><i>single-sender</i>&lt;S, E></code> is `false`
    then the expression `stopped_as_optional_t().transform_sender(s, e)` is ill-formed; otherwise, it is equal to:

    <pre highlight="c++">
    const auto& env = e;
    auto domain = <i>get-env-domain</i>(env);
    auto [tag, data, child] = s;
    using V = <i>single-sender-value-type</i>&lt;S, E>;
    return let_stopped(
        then(std::move(child),
                  []&lt;class T>(T&& t) { return optional&lt;V>(std::forward&lt;T>(t)); }),
        []() noexcept { return just(optional&lt;V>()); });
    </pre>
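
The mapping that this transformation produces can be modeled with plain values. The sketch below is only an assumption-laden illustration: `stopped_t`, `completion`, and `stopped_as_optional_model` are hypothetical names, and a completion is represented as a `std::variant` holding either the single value or a stopped marker.

```cpp
#include <optional>
#include <utility>
#include <variant>

// Hypothetical marker for a stopped completion.
struct stopped_t {};

// Hypothetical model of a single-value sender's outcome:
// either one value of type V or a stopped signal.
template <class V>
using completion = std::variant<V, stopped_t>;

// A value completion becomes an engaged optional and a stopped completion
// becomes an empty optional, so the result is never "stopped".
template <class V>
std::optional<V> stopped_as_optional_model(completion<V> c) {
  if (V* v = std::get_if<V>(&c)) {
    return std::optional<V>(std::move(*v));
  }
  return std::optional<V>();
}
```

The two branches correspond to the `then` and `let_stopped` callables in the expression above: the first wraps the value, the second substitutes an empty optional.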

#### `execution::stopped_as_error` <b>[exec.stopped.as.error]</b> #### {#spec-execution.senders.adapt.stopped_as_error}

1. `stopped_as_error` maps an input sender's stopped completion operation into
    an error completion operation as a custom error type. The result is a sender
    that never completes with stopped, reporting cancellation by completing with
    an error.

2. The name `stopped_as_error` denotes a customization point object. For some
    subexpressions `s` and `e`, let `S` be `decltype((s))` and let `E` be
    `decltype((e))`. If the type `S` does not satisfy `sender` or if the type
    `E` does not satisfy <code><i>movable-value</i></code>,
    `stopped_as_error(s, e)` is ill-formed. Otherwise, the expression
    `stopped_as_error(s, e)` is expression-equivalent to:

    <pre highlight="c++">
    transform_sender(
      <i>get-sender-domain</i>(s),
      <i>make-sender</i>(stopped_as_error, e, s))
    </pre>

3. Let `s` and `e` be subexpressions such that `S` is `decltype((s))` and `E` is `decltype((e))`.
    If <code><i>sender-for</i>&lt;S, stopped_as_error_t></code> is `false`, then the expression
    `stopped_as_error_t().transform_sender(s, e)` is ill-formed; otherwise, it is equal to:

    <pre highlight="c++">
    const auto& env = e;
    auto domain = <i>get-env-domain</i>(env);
    auto [tag, data, child] = s;
    return let_stopped(
        std::move(child),
        [err = std::move(data)]() mutable { return just_error(std::move(err)); });
    </pre>

#### `execution::ensure_started` <b>[exec.ensure.started]</b> #### {#spec-execution.senders.adapt.ensure_started}

1. `ensure_started` eagerly starts the execution of a sender, returning a sender
    that is usable as input to additional sender algorithms.

2. Let <code><i>ensure-started-env</i></code> be the type of an execution
    environment such that, given an instance `e`, the expression
    `get_stop_token(e)` is well-formed and has type `stop_token`.

3. The name `ensure_started` denotes a customization point object.
    For some subexpression `s`, let `S` be `decltype((s))`. If
    <code>sender_in&lt;S, <i>ensure-started-env</i>></code> or
    `constructible_from<decay_t<env_of_t<S>>, env_of_t<S>>` is
    `false`, `ensure_started(s)` is ill-formed. Otherwise, the
    expression `ensure_started(s)` is expression-equivalent to:

    <pre highlight="c++">
    transform_sender(
      <i>get-sender-domain</i>(s),
      <i>make-sender</i>(ensure_started, s))
    </pre>

    1. Let `s` be a subexpression such that `S` is `decltype((s))`, and let `e...` be a pack of
        subexpressions such that <code>sizeof...(e) &lt;= 1</code> is `true`.
        If <code><i>sender-for</i>&lt;S, ensure_started_t></code> is `false`, then the expression
        `ensure_started_t().transform_sender(s, e...)` is ill-formed; otherwise, it returns
        a sender `s2`, that:

        1. Creates an object `sh_state` that contains a `stop_source`, an
            initially null pointer to an operation state awaiting completion,
            and that also reserves space for storing:

            * the operation state that results from connecting `s` with `r` described below,
            * the sets of values and errors with which `s` can complete, with
                the addition of `exception_ptr`, and
            * the result of decay-copying `get_env(s)`.

            `s2` shares ownership of `sh_state` with `r` described below.

        2. Constructs a receiver `r` such that:

            1. When `set_value(r, args...)` is called, decay-copies
                the expressions `args...` into `sh_state`. It then checks
                `sh_state` to see if there is an operation state awaiting
                completion; if so, it notifies the operation state that the
                results are ready. If any exceptions are thrown, the exception
                is caught and `set_error(r, current_exception())` is
                called instead.

            2. When `set_error(r, e)` is called, decay-copies `e`
                into `sh_state`. If there is an operation state awaiting completion,
                it then notifies the operation state that the results are ready.

            3. When `set_stopped(r)` is called, it notifies any
                awaiting operation state that the results are ready.

            4. `get_env(r)` is an expression <code><i>e</i></code> of type
                <code><i>ensure-started-env</i></code> such that
                <code>get_stop_token(<i>e</i>)</code> is well-formed
                and returns the results of calling `get_token()` on `sh_state`'s
                stop source.

            5. `r` shares ownership of `sh_state` with `s2`. After `r`
                has been completed, it releases its ownership of `sh_state`.

        3. Calls `get_env(s)` and decay-copies the result into
            `sh_state`.

        4. Calls `connect(s, r)`, resulting in an operation state
            `op_state2`. `op_state2` is saved in `sh_state`. It then calls
            `start(op_state2)`.

        5. When `s2` is connected with a receiver `out_r` of type `OutR`, it
            returns an operation state object `op_state` that contains:

              * An object `out_r2` of type `OutR` decay-copied from `out_r`,
              * A reference to `sh_state`,
              * A stop callback of type
                <code>optional&lt;stop_token_of_t&lt;env_of_t&lt;OutR>>::callback_type&lt;<i>stop-callback-fn</i>>></code>,
                where <code><i>stop-callback-fn</i></code> is the unspecified
                class type:

                <pre highlight="c++">
                struct <i>stop-callback-fn</i> {
                  stop_source& <i>stop_src_</i>;
                  void operator()() noexcept {
                    <i>stop_src_</i>.request_stop();
                  }
                };
                </pre>

              `s2` transfers its ownership of `sh_state` to `op_state`.

        6. When `start(op_state)` is called:

            * If `r` has already been completed, then let
                <code><i>CF</i></code> be whichever completion function
                was used to complete `r`. Calls
                <code><i>CF</i>(out_r2, args2...)</code>, where `args2...` is a
                pack of xvalues referencing the subobjects of `sh_state` that have
                been saved by the original call to <code><i>CF</i>(r,
                args...)</code> and returns.

            * Otherwise, it emplace constructs the stop callback optional with
                the arguments `get_stop_token(get_env(out_r2))` and
                <code><i>stop-callback-fn</i>{<i>stop-src</i>}</code>, where
                <code><i>stop-src</i></code> refers to the stop source of
                `sh_state`.

            * Then, it checks to see if
                <code><i>stop-src</i>.stop_requested()</code> is `true`. If so, it
                calls `set_stopped(out_r2)`.

            * Otherwise, it sets `sh_state`'s operation state pointer to the
                address of `op_state`, registering itself as awaiting the result
                of the completion of `r`.

        7. When `r` completes it will notify `op_state` that the results are
            ready. Let <code><i>CF</i></code> be whichever
            completion function was used to complete `r`. `op_state`'s stop
            callback optional is reset. Then
            <code><i>CF</i>(std::move(out_r2), args2...)</code> is called,
            where `args2...` is a pack of xvalues referencing the subobjects of
            `sh_state` that have been saved by the original call to
            <code><i>CF</i>(r, args...)</code>.

        8. [*Note:* If sender `s2` is destroyed without being connected to a
            receiver, or if it is connected but the operation state is destroyed
            without having been started, then when `r`
            completes and it releases its shared ownership of `sh_state`,
            `sh_state` will be destroyed and the results of the operation are
            discarded. -- *end note*]

    2. Given a subexpression `s`, let `s2` be the result of `ensure_started(s)`.
        `get_env(s2)` shall return an lvalue reference to the
        object in `sh_state` that was initialized with the result of `get_env(s)`.

  4. Let `s` be a sender expression, let `r` be an instance of the receiver type
      described above, let `s2` be a sender returned
      from `ensure_started(s)` or a copy of such, let `r2` be the receiver
      to which `s2` is connected, and let `args` be the pack of subexpressions
      passed to `r`'s completion function <code><i>CSO</i></code>
      when `s` completes. `s2` shall complete either by invoking <code><i>CSO</i>(r2, args2...)</code>, where
      `args2` is a pack of xvalue references to objects decay-copied from
      `args`, or by calling <code>set_error(r2, e2)</code> for some subexpression
      `e2`. The objects passed to `r2`'s completion operation shall
      be valid until after the completion of the invocation of `r2`'s completion
      operation.

### Sender consumers <b>[exec.consumers]</b> ### {#spec-execution.senders.consumers}

#### `execution::start_detached` <b>[exec.start.detached]</b> #### {#spec-execution.senders.consumers.start_detached}

1. `start_detached` eagerly starts a sender without the caller needing to manage
    the lifetimes of any objects.

2. The name `start_detached` denotes a customization point object. For some
    subexpression `s`, let `S` be `decltype((s))`. If `S` does not satisfy
    <code>sender_in&lt;empty_env></code>, `start_detached(s)` is ill-formed.
    Otherwise, the expression `start_detached(s)` is expression-equivalent to:

    <pre highlight="c++">
    apply_sender(<i>get-sender-domain</i>(s), start_detached, s)
    </pre>
    
    * <i>Mandates:</i> The type of the expression above is `void`.

    If the expression above does not eagerly start the sender `s` after
    connecting it with a receiver that ignores value and stopped completion
    operations and calls `terminate()` on error completions, the behavior of
    calling `start_detached(s)` is undefined.

3. Let `s` be a subexpression such that `S` is `decltype((s))`, and let
    <code><i>detached-receiver</i></code> and
    <code><i>detached-operation</i></code> be the following exposition-only
    class types:

    <pre highlight="c++">
    struct <i>detached-receiver</i> {
      using is_receiver = <i>unspecified</i>;
      <i>detached-operation</i>* <i>op</i>; // <i>exposition only</i>

      friend void tag_invoke(set_value_t, <i>detached-receiver</i>&& self) noexcept { delete self.op; }
      friend void tag_invoke(set_error_t, <i>detached-receiver</i>&&, auto&&) noexcept { terminate(); }
      friend void tag_invoke(set_stopped_t, <i>detached-receiver</i>&& self) noexcept { delete self.op; }
      friend empty_env tag_invoke(get_env_t, const <i>detached-receiver</i>&) { return {}; }
    };

    struct <i>detached-operation</i> {
      connect_result_t&lt;S, <i>detached-receiver</i>> op; // <i>exposition only</i>

      explicit <i>detached-operation</i>(S&& s)
        : op(connect(std::forward&lt;S>(s), <i>detached-receiver</i>{this}))
      {}
    };
    </pre>

4. If <code>sender_to&lt;S, <i>detached-receiver</i>></code> is `false`, the
    expression `start_detached.apply_sender(s)` is ill-formed; otherwise, it is
    expression-equivalent to <code>start((new
    <i>detached-operation</i>(s))->op)</code>.
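
[<i>Example:</i> The following non-normative sketch illustrates the self-deleting pattern that <code><i>detached-receiver</i></code> and <code><i>detached-operation</i></code> describe. It does not use `std::execution`; `toy_sender`, `toy_start_detached`, and the `live` counter are hypothetical names introduced only for this illustration, and the toy sender is assumed to complete synchronously with `set_value`.

```cpp
#include <cassert>
#include <exception>
#include <utility>

// Hypothetical toy sender: completes immediately with set_value().
struct toy_sender {
  int* counter;  // incremented when the operation is started

  template <class Receiver>
  struct operation {
    int* counter;
    Receiver rcvr;
    void start() noexcept {
      ++*counter;
      // The receiver may delete the enclosing object, so nothing is
      // accessed after this call.
      std::move(rcvr).set_value();
    }
  };

  template <class Receiver>
  operation<Receiver> connect(Receiver r) const {
    return {counter, std::move(r)};
  }
};

struct detached_operation;

// Ignores value and stopped completions (apart from cleanup) and
// terminates on error, as described for start_detached.
struct detached_receiver {
  detached_operation* op;
  void set_value() && noexcept;
  void set_stopped() && noexcept;
  template <class E>
  void set_error(E&&) && noexcept { std::terminate(); }
};

struct detached_operation {
  toy_sender::operation<detached_receiver> op;
  static inline int live = 0;  // live operations; for the demonstration only

  explicit detached_operation(toy_sender s)
    : op(s.connect(detached_receiver{this})) { ++live; }
  ~detached_operation() { --live; }
};

inline void detached_receiver::set_value() && noexcept { delete op; }
inline void detached_receiver::set_stopped() && noexcept { delete op; }

// Allocates the operation, starts it, and leaves cleanup to the receiver.
inline void toy_start_detached(toy_sender s) {
  (new detached_operation(s))->op.start();
}
```

The caller of `toy_start_detached` owns nothing afterwards: the heap-allocated operation is released by whichever completion function the sender calls. -- <i>end example</i>]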

#### `this_thread::sync_wait` <b>[exec.sync.wait]</b> #### {#spec-execution.senders.consumers.sync_wait}

1. `this_thread::sync_wait` and `this_thread::sync_wait_with_variant` are used
    to block the current thread until a sender passed into it as an argument has
    completed, and to obtain the values (if any) it completed with. `sync_wait`
    requires that the input sender has exactly one value completion signature.

2. For any receiver `r` created by an implementation of `sync_wait` and
    `sync_wait_with_variant`, the expressions `get_scheduler(get_env(r))` and
    `get_delegatee_scheduler(get_env(r))` shall be well-formed. For a receiver
    created by the default implementation of `this_thread::sync_wait`, these
    expressions shall return a scheduler to the same thread-safe,
    first-in-first-out queue of work such that tasks scheduled to the queue
    execute on the thread of the caller of `sync_wait`. [<i>Note:</i> The
    scheduler for an instance of `run_loop` that is a local variable
    within `sync_wait` is one valid implementation. -- <i>end note</i>]

3. The templates <code><i>sync-wait-type</i></code> and
    <code><i>sync-wait-with-variant-type</i></code> are used to determine the
    return types of `this_thread::sync_wait` and
    `this_thread::sync_wait_with_variant`. Let <code><i>sync-wait-env</i></code>
    be the type of the expression `get_env(r)` where `r` is an instance of the
    receiver created by the default implementation of `sync_wait`.

    <pre highlight="c++">
    template&lt;sender_in&lt;<i>sync-wait-env</i>> S>
      using <i>sync-wait-type</i> =
        optional&lt;value_types_of_t&lt;S, <i>sync-wait-env</i>, <i>decayed-tuple</i>, type_identity_t>>;

    template&lt;sender_in&lt;<i>sync-wait-env</i>> S>
      using <i>sync-wait-with-variant-type</i> = optional&lt;<i>into-variant-type</i>&lt;S, <i>sync-wait-env</i>>>;
    </pre>

4. The name `this_thread::sync_wait` denotes a customization point object. For
    some subexpression `s`, let `S` be `decltype((s))`. If
    <code>sender_in&lt;S, <i>sync-wait-env</i>></code> is `false`,
    or if the type <code>completion_signatures_of_t&lt;S, <i>sync-wait-env</i>, <i>type-list</i>, type_identity_t></code> is ill-formed,
    `this_thread::sync_wait(s)` is ill-formed.
    Otherwise, `this_thread::sync_wait(s)` is expression-equivalent to:

    <pre highlight="c++">
    apply_sender(<i>get-sender-domain</i>(s), sync_wait, s)
    </pre>

    * <i>Mandates:</i> The type of the expression above is <code><i>sync-wait-type</i>&lt;S, <i>sync-wait-env</i>></code>.

5. Let <code><i>sync-wait-receiver</i></code> be a class type that satisfies `receiver`, let `r` be an xvalue of that type,
    and let `cr` be a const lvalue referring to `r` such that `get_env(cr)` has type <code><i>sync-wait-env</i></code>.
    If <code>sender_in&lt;S, <i>sync-wait-env</i>></code> is `false`, or if the type
    <code>completion_signatures_of_t&lt;S, <i>sync-wait-env</i>, <i>type-list</i>, type_identity_t></code> is ill-formed,
    the expression `sync_wait_t().apply_sender(s)` is ill-formed; otherwise it has the following effects:

        1. Calls `connect(s, r)`, resulting in an operation state `op_state`, then calls `start(op_state)`.

        2. Blocks the current thread until a completion operation of `r` is executed. When it is:

            1. If `set_value(r, ts...)` has been called, returns <code><i>sync-wait-type</i>&lt;S, <i>sync-wait-env</i>>{<i>decayed-tuple</i>&lt;decltype(ts)...>{ts...}}</code>. If that expression exits exceptionally, the exception is propagated to the caller of `sync_wait`.

            2. If `set_error(r, e)` has been called, let `E` be the decayed type of `e`. If `E` is `exception_ptr`, calls `std::rethrow_exception(e)`. Otherwise, if `E` is `error_code`, throws `system_error(e)`. Otherwise, throws `e`.

            3. If `set_stopped(r)` has been called, returns <code><i>sync-wait-type</i>&lt;S, <i>sync-wait-env</i>>{}</code>.

6. The name `this_thread::sync_wait_with_variant` denotes a customization point
    object. For some subexpression `s`, let `S` be the type of
    `into_variant(s)`. If <code>sender_in&lt;S,
    <i>sync-wait-env</i>></code> is `false`,
    `this_thread::sync_wait_with_variant(s)` is ill-formed. Otherwise,
    `this_thread::sync_wait_with_variant(s)` is expression-equivalent to:

    <pre highlight="c++">
    apply_sender(<i>get-sender-domain</i>(s), sync_wait_with_variant, s)
    </pre>

    * <i>Mandates:</i> The type of the expression above is <code><i>sync-wait-with-variant-type</i>&lt;S, <i>sync-wait-env</i>></code>.

7. The expression `sync_wait_with_variant_t().apply_sender(s)` is expression-equivalent to `this_thread::sync_wait(into_variant(s))`.
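
[<i>Example:</i> The following non-normative sketch shows one way the `set_error` case of `sync_wait_t().apply_sender(s)` could map an error object to a thrown exception. The helper names `throw_sync_wait_error` and `classify` are hypothetical and are not part of this proposal.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <system_error>
#include <type_traits>
#include <utility>

// Hypothetical helper: converts the error passed to set_error into a
// thrown exception, following the three cases listed for sync_wait.
template <class Error>
void throw_sync_wait_error(Error&& e) {
  using E = std::decay_t<Error>;
  if constexpr (std::is_same_v<E, std::exception_ptr>) {
    std::rethrow_exception(std::forward<Error>(e));  // case 1: exception_ptr
  } else if constexpr (std::is_same_v<E, std::error_code>) {
    throw std::system_error(e);                      // case 2: error_code
  } else {
    throw std::forward<Error>(e);                    // case 3: anything else
  }
}

// Demonstration only: reports which kind of exception reached the caller.
template <class Error>
int classify(Error e) {
  try {
    throw_sync_wait_error(std::move(e));
  } catch (const std::system_error&) {
    return 2;  // must precede runtime_error, its base class
  } catch (const std::runtime_error&) {
    return 1;
  } catch (int) {
    return 3;
  }
  return 0;  // not reached: every branch above throws
}
```

Here `classify(std::make_error_code(std::errc::timed_out))` observes a `std::system_error`, while an `int` error is thrown unchanged. -- <i>end example</i>]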

## `execution::execute` <b>[exec.execute]</b> ## {#spec-execution.execute}

1. `execute` creates fire-and-forget tasks on a specified scheduler.

2. The name `execute` denotes a customization point object. For some subexpressions `sch` and `f`, let `Sch` be `decltype((sch))` and `F` be `decltype((f))`. If `Sch` does not satisfy `scheduler` or `F` does not satisfy `invocable`,
    `execute(sch, f)` is ill-formed. Otherwise, `execute(sch, f)` is expression-equivalent to:

    <pre highlight="c++">
    apply_sender(
      <i>query-or-default</i>(get_domain, sch, default_domain()),
      execute, schedule(sch), f)
    </pre>
    * <i>Mandates:</i> The type of the expression above is `void`.

3. For some subexpressions `s` and `f` where `F` is `decltype((f))`,
    if `F` does not satisfy `invocable`, the expression
    `execute_t().apply_sender(s, f)` is ill-formed; otherwise it is
    expression-equivalent to `start_detached(then(s, f))`.

## Sender/receiver utilities <b>[exec.utils]</b> ## {#spec-execution.snd_rec_utils}

1. This subclause makes use of the following exposition-only entities:

    <pre highlight="c++">
    // [<i>Editorial note:</i> copy_cvref_t as in [[P1450R3]] -- <i>end note</i>]
    // Mandates: is_base_of_v&lt;T, remove_reference_t&lt;U>> is true
    template&lt;class T, class U>
      copy_cvref_t&lt;U&amp;&amp;, T> <i>c-style-cast</i>(U&amp;&amp; u) noexcept requires <i>decays-to</i>&lt;T, T> {
        return (copy_cvref_t&lt;U&amp;&amp;, T>) std::forward&lt;U>(u);
      }
    </pre>

2. [<i>Note:</i> The C-style cast in <tt><i>c-style-cast</i></tt> is to disable accessibility checks. -- <i>end note</i>]

### `execution::receiver_adaptor` <b>[exec.utils.rcvr.adptr]</b> ### {#spec-execution.snd_rec_utils.rcvr_adptr}

    <pre highlight="c++">
    template&lt;
        <i>class-type</i> Derived,
        receiver Base = <i>unspecified</i>> // arguments are not associated entities ([lib.tmpl-heads])
      class receiver_adaptor;
    </pre>

1. `receiver_adaptor` simplifies the implementation of one receiver type in terms of another. It defines `tag_invoke` overloads that forward to named members if they exist, and to the adapted receiver otherwise.

2. If `Base` is an alias for the unspecified default template argument, then:

    - Let <code><i>HAS-BASE</i></code> be `false`, and
    - Let <code><i>GET-BASE</i>(d)</code> be `d.base()`.

    Otherwise:

    - Let <code><i>HAS-BASE</i></code> be `true`, and
    - Let <code><i>GET-BASE</i>(d)</code> be <code><i>c-style-cast</i>&lt;receiver_adaptor&lt;Derived, Base>>(d).base()</code>.

    Let <code><i>BASE-TYPE</i>(D)</code> be the type of <code><i>GET-BASE</i>(declval&lt;D>())</code>.

3. `receiver_adaptor<Derived, Base>` is equivalent to the following:

    <pre highlight="c++">
    template&lt;
      <i>class-type</i> Derived,
      receiver Base = <i>unspecified</i>> // arguments are not associated entities ([lib.tmpl-heads])
    class receiver_adaptor {
      friend Derived;
     public:
      using is_receiver = <i>unspecified</i>;

      // Constructors
      receiver_adaptor() = default;
      template&lt;class B>
          requires <i>HAS-BASE</i> && constructible_from&lt;Base, B>
        explicit receiver_adaptor(B&& base) : base_(std::forward&lt;B>(base)) {}

     private:
      using set_value = <i>unspecified</i>;
      using set_error = <i>unspecified</i>;
      using set_stopped = <i>unspecified</i>;
      using get_env = <i>unspecified</i>;

      // Member functions
      template&lt;class Self>
        requires <i>HAS-BASE</i>
      decltype(auto) base(this Self&amp;&amp; self) noexcept {
        return (std::forward&lt;Self>(self).base_);
      }

      // [exec.utils.rcvr.adptr.nonmembers] Non-member functions
      template&lt;class... As>
        friend void tag_invoke(set_value_t, Derived&amp;&amp; self, As&amp;&amp;... as) noexcept;

      template&lt;class E>
        friend void tag_invoke(set_error_t, Derived&amp;&amp; self, E&amp;&amp; e) noexcept;

      friend void tag_invoke(set_stopped_t, Derived&amp;&amp; self) noexcept;

      friend decltype(auto) tag_invoke(get_env_t, const Derived&amp; self)
          noexcept(<i>see below</i>);

      [[no_unique_address]] Base base_; // present if and only if <i>HAS-BASE</i> is true
    };
    </pre>

4. [<i>Note:</i> `receiver_adaptor` provides `tag_invoke` overloads on behalf of
    the derived class `Derived`, which is incomplete when `receiver_adaptor` is
    instantiated. -- <i>end note</i>]

5. [<i>Example:</i>
     <pre highlight="c++">
     using _int_completion =
       completion_signatures&lt;set_value_t(int)>;

     template&lt;receiver_of&lt;_int_completion> R>
       class my_receiver : receiver_adaptor&lt;my_receiver&lt;R>, R> {
         friend receiver_adaptor&lt;my_receiver, R>;
         void set_value() && {
           execution::set_value(std::move(*this).base(), 42);
         }
        public:
         using receiver_adaptor&lt;my_receiver, R>::receiver_adaptor;
       };
     </pre>
     -- <i>end example</i>]

#### Non-member functions <b>[exec.utils.rcvr.adptr.nonmembers]</b> #### {#spec-execution.snd_rec_utils.receiver_adaptor.nonmembers}

    <pre highlight="c++">
    template&lt;class... As>
      friend void tag_invoke(set_value_t, Derived&amp;&amp; self, As&amp;&amp;... as) noexcept;
    </pre>

    1. Let `SET-VALUE-MBR` be the expression `std::move(self).set_value(std::forward<As>(as)...)`.

    2. <i>Constraints:</i> Either `SET-VALUE-MBR` is a valid expression or `typename Derived::set_value` denotes a type and <code><i>callable</i>&lt;set_value_t, <i>BASE-TYPE</i>(Derived), As...></code> is `true`.

    3. <i>Mandates:</i> `SET-VALUE-MBR`, if that expression is valid, is not potentially-throwing.

    4. <i>Effects:</i> Equivalent to:

        * If `SET-VALUE-MBR` is a valid expression, `SET-VALUE-MBR`;

        * Otherwise, <code>set_value(<i>GET-BASE</i>(std::move(self)), std::forward&lt;As>(as)...)</code>.

    <pre highlight="c++">
    template&lt;class E>
      friend void tag_invoke(set_error_t, Derived&amp;&amp; self, E&amp;&amp; e) noexcept;
    </pre>

    1. Let `SET-ERROR-MBR` be the expression `std::move(self).set_error(std::forward<E>(e))`.

    2. <i>Constraints:</i> Either `SET-ERROR-MBR` is a valid expression or `typename Derived::set_error` denotes a type and <code><i>callable</i>&lt;set_error_t, <i>BASE-TYPE</i>(Derived), E></code> is `true`.

    3. <i>Mandates:</i> `SET-ERROR-MBR`, if that expression is valid, is not potentially-throwing.

    4. <i>Effects:</i> Equivalent to:

        * If `SET-ERROR-MBR` is a valid expression, `SET-ERROR-MBR`;

        * Otherwise, <code>set_error(<i>GET-BASE</i>(std::move(self)), std::forward&lt;E>(e))</code>.

    <pre highlight="c++">
    friend void tag_invoke(set_stopped_t, Derived&amp;&amp; self) noexcept;
    </pre>

    1. Let `SET-STOPPED-MBR` be the expression `std::move(self).set_stopped()`.

    2. <i>Constraints:</i> Either `SET-STOPPED-MBR` is a valid expression or `typename Derived::set_stopped` denotes a type and <code><i>callable</i>&lt;set_stopped_t, <i>BASE-TYPE</i>(Derived)></code> is `true`.

    3. <i>Mandates:</i> `SET-STOPPED-MBR`, if that expression is valid, is not potentially-throwing.

    4. <i>Effects:</i> Equivalent to:

        * If `SET-STOPPED-MBR` is a valid expression, `SET-STOPPED-MBR`;

        * Otherwise, <code>set_stopped(<i>GET-BASE</i>(std::move(self)))</code>.

    <pre highlight="c++">
    friend decltype(auto) tag_invoke(get_env_t, const Derived&amp; self)
      noexcept(<i>see below</i>);
    </pre>

    1. <i>Constraints:</i> Either `self.get_env()` is a valid expression or `typename Derived::get_env` denotes a type and <code><i>callable</i>&lt;get_env_t, <i>BASE-TYPE</i>(const Derived&amp;)></code> is `true`.

    2. <i>Effects:</i> Equivalent to:

        * If `self.get_env()` is a valid expression, `self.get_env()`;

        * Otherwise, <code>std::get_env(<i>GET-BASE</i>(self))</code>.

    3. <i>Remarks:</i> The expression in the `noexcept` clause is:

        * If `self.get_env()` is a valid expression, `noexcept(self.get_env())`;

        * Otherwise, <code>noexcept(std::get_env(<i>GET-BASE</i>(self)))</code>.
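
[<i>Example:</i> The "member if valid, otherwise base" rule used by the overloads above can be sketched in plain C++17 as follows. This is a non-normative illustration that does not use `receiver_adaptor`; `toy_base_receiver`, `has_set_value_member`, and `dispatch_set_value` are hypothetical names, and the dispatcher is assumed to be called with rvalue receivers only.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Toy stand-in for the adapted receiver; it stores the value it is sent.
struct toy_base_receiver {
  int* out;
  void set_value(int v) && noexcept { *out = v; }
};

// Detects whether std::declval<D>().set_value(as...) is a valid expression.
template <class Void, class D, class... As>
struct has_set_value_member : std::false_type {};

template <class D, class... As>
struct has_set_value_member<
    std::void_t<decltype(std::declval<D>().set_value(std::declval<As>()...))>,
    D, As...> : std::true_type {};

// Hypothetical dispatcher: call the member if it exists, else forward to base().
template <class Derived, class... As>
void dispatch_set_value(Derived&& self, As&&... as) noexcept {
  if constexpr (has_set_value_member<void, Derived, As...>::value) {
    std::move(self).set_value(std::forward<As>(as)...);
  } else {
    std::move(self).base().set_value(std::forward<As>(as)...);
  }
}

// Has no set_value member, so completions go to the base receiver.
struct forwards_to_base {
  toy_base_receiver inner;
  toy_base_receiver&& base() && noexcept { return std::move(inner); }
};

// Provides its own set_value(), which sends 42 to the base receiver.
struct overrides_set_value {
  toy_base_receiver inner;
  toy_base_receiver&& base() && noexcept { return std::move(inner); }
  void set_value() && noexcept { std::move(*this).base().set_value(42); }
};
```

Calling `dispatch_set_value(forwards_to_base{{&x}}, 7)` reaches the base receiver directly, while `dispatch_set_value(overrides_set_value{{&y}})` selects the member. -- <i>end example</i>]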

### `execution::completion_signatures` <b>[exec.utils.cmplsigs]</b> ### {#spec-execution.snd_rec_utils.completion_sigs}

1. `completion_signatures` is a type that encodes a set of completion signatures
    ([async.ops]).

2. [<i>Example:</i>
     <pre highlight="c++">
      class my_sender {
        using completion_signatures =
          execution::completion_signatures&lt;
            set_value_t(),
            set_value_t(int, float),
            set_error_t(exception_ptr),
            set_error_t(error_code),
            set_stopped_t()>;
      };

      // Declares my_sender to be a sender that can complete by calling
      // one of the following for a receiver expression R:
      //    set_value(R)
      //    set_value(R, int{...}, float{...})
      //    set_error(R, exception_ptr{...})
      //    set_error(R, error_code{...})
      //    set_stopped(R)
     </pre>
     -- <i>end example</i>]

3. This subclause makes use of the following exposition-only entities:

    <pre highlight="c++">
    template&lt;class Fn>
      concept <i>completion-signature</i> = <i>see below</i>;

    template&lt;bool>
      struct <i>indirect-meta-apply</i> {
        template&lt;template&lt;class...> class T, class... As>
          using <i>meta-apply</i> = T&lt;As...>; <i>// exposition only</i>
      };

    template&lt;class...>
      concept <i>always-true</i> = true; <i>// exposition only</i>
    </pre>

    1. A type `Fn` satisfies <code><i>completion-signature</i></code> if and only if it is a function type with one of the following forms:

        * <code>set_value_t(<i>Vs</i>...)</code>, where <code><i>Vs</i></code> is an arbitrary parameter pack.
        * <code>set_error_t(<i>E</i>)</code>, where <code><i>E</i></code> is an arbitrary type.
        * `set_stopped_t()`

    <pre highlight="c++">
    template&lt;class Tag,
              <i>valid-completion-signatures</i> Completions,
              template&lt;class...> class Tuple,
              template&lt;class...> class Variant>
      using <i>gather-signatures</i> = <i>see below</i>;
    </pre>

    2. Let `Fns...` be a template parameter pack of the arguments of the
        `completion_signatures` specialization named by
        `Completions`, let <code><i>TagFns</i></code> be a
        template parameter pack of the function types in `Fns` whose return types
        are `Tag`, and let
        <code><i>Ts<i><sub>n</sub></i></i></code> be a template parameter
        pack of the function argument types in the <code><i>n</i></code>-th type
        in <code><i>TagFns</i></code>. Then, given two variadic templates
        <code><i>Tuple</i></code> and <code><i>Variant</i></code>, the type
        <code><i>gather-signatures</i>&lt;Tag, Completions, <i>Tuple</i>, <i>Variant</i>></code>
        names the type
        <code><i>META-APPLY</i>(<i>Variant</i>, <i>META-APPLY</i>(<i>Tuple</i>, <i>Ts<i><sub>0</sub></i></i>...),
        <i>META-APPLY</i>(<i>Tuple</i>, <i>Ts<i><sub>1</sub></i></i>...), ...
        <i>META-APPLY</i>(<i>Tuple</i>, <i>Ts<i><sub>m-1</sub></i></i>...))</code>, where
        <code><i>m</i></code> is the size of the parameter pack
        <code><i>TagFns</i></code> and <code><i>META-APPLY</i>(<i>T</i>, <i>As</i>...)</code> is
        equivalent to:

        <pre highlight="c++">
        typename <i>indirect-meta-apply</i>&lt;<i>always-true</i>&lt;<i>As</i>...>>::template <i>meta-apply</i>&lt;<i>T</i>, <i>As</i>...>;
        </pre>

    3. <span class="wg21note">The purpose of <code><i>META-APPLY</i></code> is to make it
        valid to use non-variadic templates as <code><i>Variant</i></code> and <code><i>Tuple</i></code> arguments to <code><i>gather-signatures</i></code>.</span>

4.  <pre highlight="c++">
    template&lt;<i>completion-signature</i>... Fns>
      struct completion_signatures {};

    template&lt;class S,
              class E = empty_env,
              template&lt;class...> class Tuple = <i>decayed-tuple</i>,
              template&lt;class...> class Variant = <i>variant-or-empty</i>>
        requires sender_in&lt;S, E>
      using value_types_of_t =
          <i>gather-signatures</i>&lt;set_value_t, completion_signatures_of_t&lt;S, E>, Tuple, Variant>;

    template&lt;class S,
              class E = empty_env,
              template&lt;class...> class Variant = <i>variant-or-empty</i>>
        requires sender_in&lt;S, E>
      using error_types_of_t =
          <i>gather-signatures</i>&lt;set_error_t, completion_signatures_of_t&lt;S, E>, type_identity_t, Variant>;

    template&lt;class S, class E = empty_env>
        requires sender_in&lt;S, E>
      inline constexpr bool sends_stopped =
          !same_as&lt;
            <i>type-list</i>&lt;>,
            <i>gather-signatures</i>&lt;set_stopped_t, completion_signatures_of_t&lt;S, E>, <i>type-list</i>, <i>type-list</i>>>;
    </pre>
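
[<i>Example:</i> The following non-normative sketch shows how a facility like <code><i>gather-signatures</i></code> might be written. It is self-contained and uses stand-in types: `value_tag`, `error_tag`, `stopped_tag`, `signatures`, `type_list`, and `gather_t` are hypothetical names. It omits the <code><i>META-APPLY</i></code> indirection, so it assumes that `Tuple` and `Variant` are variadic templates.

```cpp
#include <exception>
#include <tuple>
#include <type_traits>
#include <variant>

// Stand-ins for set_value_t, set_error_t, set_stopped_t.
struct value_tag {};
struct error_tag {};
struct stopped_tag {};

template <class... Sigs> struct signatures {};  // stand-in for completion_signatures
template <class... Ts> struct type_list {};

// Walk the signatures, collecting Tuple<Args...> for each one of the
// form Tag(Args...) and skipping all others.
template <class Tag, template <class...> class Tuple, class Acc, class... Sigs>
struct gather_impl;

template <class Tag, template <class...> class Tuple, class... Done>
struct gather_impl<Tag, Tuple, type_list<Done...>> {
  using type = type_list<Done...>;
};

template <class Tag, template <class...> class Tuple, class... Done,
          class... Args, class... Rest>
struct gather_impl<Tag, Tuple, type_list<Done...>, Tag(Args...), Rest...>
  : gather_impl<Tag, Tuple, type_list<Done..., Tuple<Args...>>, Rest...> {};

template <class Tag, template <class...> class Tuple, class... Done,
          class Other, class... Args, class... Rest>
struct gather_impl<Tag, Tuple, type_list<Done...>, Other(Args...), Rest...>
  : gather_impl<Tag, Tuple, type_list<Done...>, Rest...> {};

// Wrap the collected tuples in Variant.
template <class List, template <class...> class Variant>
struct apply_list;
template <class... Ts, template <class...> class Variant>
struct apply_list<type_list<Ts...>, Variant> { using type = Variant<Ts...>; };

template <class Tag, class Completions,
          template <class...> class Tuple, template <class...> class Variant>
struct gather;
template <class Tag, class... Sigs,
          template <class...> class Tuple, template <class...> class Variant>
struct gather<Tag, signatures<Sigs...>, Tuple, Variant> {
  using type = typename apply_list<
      typename gather_impl<Tag, Tuple, type_list<>, Sigs...>::type,
      Variant>::type;
};

template <class Tag, class Completions,
          template <class...> class Tuple, template <class...> class Variant>
using gather_t = typename gather<Tag, Completions, Tuple, Variant>::type;

using demo_sigs = signatures<value_tag(), value_tag(int, float),
                             error_tag(std::exception_ptr), stopped_tag()>;

static_assert(std::is_same_v<
    gather_t<value_tag, demo_sigs, std::tuple, std::variant>,
    std::variant<std::tuple<>, std::tuple<int, float>>>);
```

Gathering `stopped_tag` with `type_list` for both `Tuple` and `Variant` yields `type_list<type_list<>>` for `demo_sigs`, which differs from `type_list<>`; this is the comparison on which `sends_stopped` relies. -- <i>end example</i>]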

### `execution::transform_completion_signatures` <b>[exec.utils.tfxcmplsigs]</b> ### {#spec-execution.snd_rec_utils.transform_completion_sigs}

1. `transform_completion_signatures` is an alias template used to transform one
    set of completion signatures into another. It takes a set of completion
    signatures and several other template arguments that apply modifications to
    each completion signature in the set to generate a new specialization of
    `completion_signatures`.

2. [<i>Example:</i>
    <pre highlight="c++">
    // Given a sender S and an environment Env, adapt the completion
    // signatures of S by lvalue-ref qualifying the values, adding an additional
    // exception_ptr error completion if it's not already there, and leaving the
    // other completion signatures alone.
    template&lt;class... Args>
      using my_set_value_t =
        completion_signatures&lt;
          set_value_t(add_lvalue_reference_t&lt;Args>...)>;

    using my_completion_signatures =
      transform_completion_signatures&lt;
        completion_signatures_of_t&lt;S, Env>,
        completion_signatures&lt;set_error_t(exception_ptr)>,
        my_set_value_t>;
    </pre>
    -- <i>end example</i>]

3. This subclause makes use of the following exposition-only entities:

    <pre highlight="c++">
    template&lt;class... As>
      using <i>default-set-value</i> =
        completion_signatures&lt;set_value_t(As...)>;

    template&lt;class Err>
      using <i>default-set-error</i> =
        completion_signatures&lt;set_error_t(Err)>;
    </pre>

4.  <pre highlight="c++">
    template&lt;<i>valid-completion-signatures</i> InputSignatures,
             <i>valid-completion-signatures</i> AdditionalSignatures =
                 completion_signatures&lt;>,
             template&lt;class...> class SetValue = <i>default-set-value</i>,
             template&lt;class> class SetError = <i>default-set-error</i>,
             <i>valid-completion-signatures</i> SetStopped =
                 completion_signatures&lt;set_stopped_t()>>
    using transform_completion_signatures =
      completion_signatures&lt;<i>see below</i>>;
    </pre>

    *  `SetValue` shall name an alias template such that for any template
        parameter pack `As...`, the type `SetValue<As...>` is either ill-formed
        or else <code><i>valid-completion-signatures</i>&lt;SetValue&lt;As...>></code>
        is satisfied.
    *  `SetError` shall name an alias template such that for any type `Err`,
        `SetError<Err>` is either ill-formed or else
        <code><i>valid-completion-signatures</i>&lt;SetError&lt;Err>></code>
        is satisfied.

    Then:

        * Let `Vs...` be a pack of the types in the <code><i>type-list</i></code> named
            by <code><i>gather-signatures</i>&lt;set_value_t, InputSignatures, SetValue, <i>type-list</i>></code>.

        *  Let `Es...` be a pack of the types in the
            <code><i>type-list</i></code> named by <code><i>gather-signatures</i>&lt;set_error_t, InputSignatures,
            type_identity_t, <i>error-list</i>></code>, where <code><i>error-list</i></code> is an
            alias template such that <code><i>error-list</i>&lt;Ts...></code> names
            <code><i>type-list</i>&lt;SetError&lt;Ts>...></code>.

        * Let `Ss` name the type `completion_signatures<>` if <code><i>gather-signatures</i>&lt;set_stopped_t, InputSignatures,
            <i>type-list</i>, <i>type-list</i>></code> is an alias for the type <code><i>type-list</i>&lt;></code>; otherwise, `SetStopped`.

        Then:

        1. If any of the above types are ill-formed, then
            `transform_completion_signatures<InputSignatures, AdditionalSignatures, SetValue, SetError,
            SetStopped>` is ill-formed,

        2. Otherwise, `transform_completion_signatures<InputSignatures, AdditionalSignatures, SetValue,
            SetError, SetStopped>` names the type `completion_signatures<Sigs...>`
            where `Sigs...` is the unique set of types in all the template arguments
            of all the `completion_signatures` specializations in `[AdditionalSignatures, Vs..., Es..., Ss]`.
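
[<i>Example:</i> The final step, forming the unique set of signatures from several lists, can be sketched as below. This is a non-normative illustration with hypothetical names (`type_list`, `merge_unique_t`, and the tag types). It keeps the first occurrence of each type, which is an assumption of the sketch; the wording above does not prescribe an order.

```cpp
#include <type_traits>

template <class... Ts> struct type_list {};

// True if T already appears in the list.
template <class T, class List> struct contains;
template <class T, class... Ts>
struct contains<T, type_list<Ts...>>
  : std::bool_constant<(std::is_same_v<T, Ts> || ...)> {};

// Append each type to Acc unless it is already present.
template <class Acc, class... Ts>
struct unique_impl { using type = Acc; };
template <class... Done, class T, class... Rest>
struct unique_impl<type_list<Done...>, T, Rest...>
  : std::conditional_t<contains<T, type_list<Done...>>::value,
                       unique_impl<type_list<Done...>, Rest...>,
                       unique_impl<type_list<Done..., T>, Rest...>> {};

// Concatenate all lists, then remove duplicates.
template <class... Lists> struct merge_unique;
template <class... Ts>
struct merge_unique<type_list<Ts...>> : unique_impl<type_list<>, Ts...> {};
template <class... As, class... Bs, class... Rest>
struct merge_unique<type_list<As...>, type_list<Bs...>, Rest...>
  : merge_unique<type_list<As..., Bs...>, Rest...> {};

template <class... Lists>
using merge_unique_t = typename merge_unique<Lists...>::type;

// Stand-ins for set_value_t, set_error_t, set_stopped_t.
struct value_tag {};
struct error_tag {};
struct stopped_tag {};

using additional = type_list<error_tag(int)>;
using values     = type_list<value_tag(int&)>;
using errors     = type_list<error_tag(int), error_tag(long)>;
using stopped    = type_list<stopped_tag()>;

// error_tag(int) appears in two inputs but only once in the result.
static_assert(std::is_same_v<
    merge_unique_t<additional, values, errors, stopped>,
    type_list<error_tag(int), value_tag(int&), error_tag(long), stopped_tag()>>);
```

-- <i>end example</i>]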

## Execution contexts <b>[exec.ctx]</b> ## {#spec-execution.contexts}

1. This subclause specifies some execution resources on which work can be scheduled.

### `run_loop` <b>[exec.run.loop]</b> ### {#spec-execution.contexts.run_loop}

1. A `run_loop` is an execution resource on which work can be scheduled. It maintains a simple, thread-safe first-in-first-out queue of work. Its `run()` member function removes elements from the queue and executes them in a loop on whatever thread of execution calls `run()`.

2. A `run_loop` instance has an associated <i>count</i> that corresponds to the number of work items that are in its queue. Additionally, a `run_loop` has an associated <i>state</i> that can be one of <i>starting</i>, <i>running</i>, or <i>finishing</i>.

3. Concurrent invocations of the member functions of `run_loop`, other than `run` and its destructor, do not introduce data races. The member functions `pop_front`, `push_back`, and `finish` execute atomically.

4. [<i>Note:</i> Implementations are encouraged to use an intrusive queue of operation states to hold the work units to make scheduling allocation-free. -- <i>end note</i>]

    <pre highlight="c++">
    class run_loop {
      // [exec.run.loop.types] Associated types
      class <i>run-loop-scheduler</i>; // exposition only
      class <i>run-loop-sender</i>; // exposition only
      struct <i>run-loop-opstate-base</i> { // exposition only
        virtual void execute() = 0;
        run_loop* <i>loop_</i>;
        <i>run-loop-opstate-base</i>* <i>next_</i>;
      };
      template&lt;receiver_of&lt;completion_signatures&lt;set_value_t()>> R>
        using <i>run-loop-opstate</i> = <i>unspecified</i>; // exposition only

      // [exec.run.loop.members] Member functions:
      <i>run-loop-opstate-base</i>* pop_front(); // exposition only
      void push_back(<i>run-loop-opstate-base</i>*); // exposition only

     public:
      // [exec.run.loop.ctor] construct/copy/destroy
      run_loop() noexcept;
      run_loop(run_loop&&) = delete;
      ~run_loop();

      // [exec.run.loop.members] Member functions:
      <i>run-loop-scheduler</i> get_scheduler();
      void run();
      void finish();
    };
    </pre>

#### Associated types <b>[exec.run.loop.types]</b> #### {#spec-execution.contexts.run_loop.types}

    <pre highlight="c++">
    class <i>run-loop-scheduler</i>;
    </pre>

  1. <code><i>run-loop-scheduler</i></code> is an unspecified type that models the `scheduler` concept.

  2. Instances of <code><i>run-loop-scheduler</i></code> remain valid until the end of the lifetime of the `run_loop` instance from which they were obtained.

  3. Two instances of <code><i>run-loop-scheduler</i></code> compare equal if and only if they were obtained from the same `run_loop` instance.

  4. Let <code><i>sch</i></code> be an expression of type <code><i>run-loop-scheduler</i></code>. The expression  <code>schedule(<i>sch</i>)</code> is not potentially-throwing and has type <code><i>run-loop-sender</i></code>.

  <pre highlight="c++">
  class <i>run-loop-sender</i>;
  </pre>

  1. <code><i>run-loop-sender</i></code> is an unspecified type such that
     <code><i>sender-of</i>&lt;<i>run-loop-sender</i>></code> is `true`.
     Additionally, the type reported by its `error_types` associated type is
     `exception_ptr`, and the value of its `sends_stopped` trait is `true`.

  2. An instance of <code><i>run-loop-sender</i></code> remains valid until the
     end of the lifetime of its associated `run_loop` instance.

  3. Let <code><i>s</i></code> be an expression of type
     <code><i>run-loop-sender</i></code>, let <code><i>r</i></code> be an
     expression such that <code>decltype(<i>r</i>)</code> models the
     `receiver_of` concept, and let `C` be either `set_value_t` or
     `set_stopped_t`. Then:

    * The expression <code>connect(<i>s</i>, <i>r</i>)</code> has type <code><i>run-loop-opstate</i>&lt;decay_t&lt;decltype(<i>r</i>)>></code> and is potentially-throwing if and only if the initialization of <code>decay_t&lt;decltype(<i>r</i>)></code> from <code><i>r</i></code> is potentially-throwing.

    * The expression <code>get_completion_scheduler&lt;C>(get_env(<i>s</i>))</code> is not potentially-throwing, has type <code><i>run-loop-scheduler</i></code>, and compares equal to the <code><i>run-loop-scheduler</i></code> instance from which <code><i>s</i></code> was obtained.

  <pre highlight="c++">
  template&lt;receiver_of&lt;completion_signatures&lt;set_value_t()>> R> // arguments are not associated entities ([lib.tmpl-heads])
    struct <i>run-loop-opstate</i>;
  </pre>

  1. <code><i>run-loop-opstate</i>&lt;<i>R</i>></code> inherits unambiguously from <code><i>run-loop-opstate-base</i></code>.

  2. Let <code><i>o</i></code> be a non-`const` lvalue of type <code><i>run-loop-opstate</i>&lt;R></code>, and let <code><i>REC</i>(<i>o</i>)</code> be a non-`const` lvalue reference to an instance of type <code><i>R</i></code> that was initialized with the expression <code><i>r</i></code> passed to the invocation of `connect` that returned <code><i>o</i></code>. Then:

    * The object to which <code><i>REC</i>(<i>o</i>)</code> refers remains valid for the lifetime of the object to which <code><i>o</i></code> refers.

    * The type <code><i>run-loop-opstate</i>&lt;R></code> overrides <code><i>run-loop-opstate-base</i>::execute()</code> such that <code><i>o</i>.execute()</code> is equivalent to the following:

        <pre highlight="c++">
        if (get_stop_token(get_env(<i>REC</i>(<i>o</i>))).stop_requested()) {
          set_stopped(std::move(<i>REC</i>(<i>o</i>)));
        } else {
          set_value(std::move(<i>REC</i>(<i>o</i>)));
        }
        </pre>

    * The expression <code>start(<i>o</i>)</code> is equivalent to the following:

        <pre highlight="c++">
        try {
          <i>o</i>.<i>loop_</i>->push_back(&<i>o</i>);
        } catch(...) {
          set_error(std::move(<i>REC</i>(<i>o</i>)), current_exception());
        }
        </pre>

#### Constructor and destructor <b>[exec.run.loop.ctor]</b> #### {#spec-execution.contexts.run_loop.ctor}

    <pre highlight="c++">
    run_loop::run_loop() noexcept;
    </pre>

    1. <i>Postconditions:</i> <i>count</i> is `0` and <i>state</i> is <i>starting</i>.

    <pre highlight="c++">
    run_loop::~run_loop();
    </pre>

    1. <i>Effects:</i> If <i>count</i> is not `0` or if <i>state</i> is <i>running</i>, invokes `terminate()`. Otherwise, has no effects.

#### Member functions <b>[exec.run.loop.members]</b> #### {#spec-execution.contexts.run_loop.members}

    <pre highlight="c++">
    <i>run-loop-opstate-base</i>* run_loop::pop_front();
    </pre>

    1. <i>Effects:</i> Blocks ([defns.block]) until one of the following conditions is `true`:

        * <i>count</i> is `0` and <i>state</i> is <i>finishing</i>, in which case `pop_front` returns `nullptr`; or

        * <i>count</i> is greater than `0`, in which case an item is removed from the front of the queue, <i>count</i> is decremented by `1`, and the removed item is returned.

    <pre highlight="c++">
    void run_loop::push_back(<i>run-loop-opstate-base</i>* item);
    </pre>

    1. <i>Effects:</i> Adds `item` to the back of the queue and increments <i>count</i> by `1`.

    2. <i>Synchronization:</i> This operation synchronizes with the `pop_front` operation that obtains `item`.

    <pre highlight="c++">
    <i>run-loop-scheduler</i> run_loop::get_scheduler();
    </pre>

    1. <i>Returns:</i> An instance of <code><i>run-loop-scheduler</i></code> that can be used to schedule work onto this `run_loop` instance.

    <pre highlight="c++">
    void run_loop::run();
    </pre>

    1. <i>Effects:</i> Equivalent to:

        <pre highlight="c++">
        while (auto* op = pop_front()) {
          op->execute();
        }
        </pre>

    2. <i>Preconditions:</i> <i>state</i> is <i>starting</i>.

    3. <i>Postconditions:</i> <i>state</i> is <i>finishing</i>.

    4. <i>Remarks:</i> While the loop is executing, <i>state</i> is <i>running</i>. When <i>state</i> changes, it does so without introducing data races.

    <pre highlight="c++">
    void run_loop::finish();
    </pre>

    1. <i>Effects:</i> Changes <i>state</i> to <i>finishing</i>.

    2. <i>Synchronization:</i> This operation synchronizes with all `pop_front` operations on this object.
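
As a non-normative illustration, the queue behavior specified for `pop_front`, `push_back`, `run`, and `finish` might be sketched with a mutex and a condition variable. The names `task_base`, `mini_run_loop`, `record_task`, `finish_task`, and `run_loop_demo` are hypothetical, and the choice of `std::deque` with a single lock is an assumption made for brevity; it says nothing about how a conforming `run_loop` is implemented.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical stand-in for run-loop-opstate-base: one queued unit of work.
struct task_base {
  virtual void execute() = 0;
  virtual ~task_base() = default;
};

class mini_run_loop {
  enum class state_t { starting, running, finishing };
  std::mutex mtx_;
  std::condition_variable cv_;
  std::deque<task_base*> queue_;  // "count" is queue_.size()
  state_t state_ = state_t::starting;

  // Blocks until an item is available or the loop is finishing and empty.
  task_base* pop_front() {
    std::unique_lock<std::mutex> lock(mtx_);
    cv_.wait(lock, [this] {
      return !queue_.empty() || state_ == state_t::finishing;
    });
    if (queue_.empty()) return nullptr;  // count is 0 and state is finishing
    task_base* item = queue_.front();
    queue_.pop_front();
    return item;
  }

 public:
  void push_back(task_base* item) {
    {
      std::lock_guard<std::mutex> lock(mtx_);
      queue_.push_back(item);
    }
    cv_.notify_one();
  }

  void run() {
    {
      std::lock_guard<std::mutex> lock(mtx_);
      if (state_ == state_t::starting) state_ = state_t::running;
    }
    while (task_base* op = pop_front()) op->execute();
  }

  void finish() {
    {
      std::lock_guard<std::mutex> lock(mtx_);
      state_ = state_t::finishing;
    }
    cv_.notify_all();  // wake any blocked pop_front
  }
};

// Hypothetical task that appends its id to a shared log.
struct record_task final : task_base {
  std::vector<int>* out;
  int id;
  record_task(std::vector<int>* o, int i) : out(o), id(i) {}
  void execute() override { out->push_back(id); }
};

// Hypothetical task that asks the loop to finish.
struct finish_task final : task_base {
  mini_run_loop* loop;
  explicit finish_task(mini_run_loop* l) : loop(l) {}
  void execute() override { loop->finish(); }
};

// Queues two recording tasks and a finishing task, then drains the loop.
std::vector<int> run_loop_demo() {
  std::vector<int> order;
  mini_run_loop loop;
  record_task a{&order, 1}, b{&order, 2};
  finish_task done{&loop};
  loop.push_back(&a);
  loop.push_back(&b);
  loop.push_back(&done);
  loop.run();  // runs a, b, done in FIFO order, then returns
  return order;
}
```

Items already queued when `finish` is called are still executed, because the sketch's `pop_front` returns `nullptr` only once the queue is empty and the state is finishing.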

## Coroutine utilities <b>[exec.coro.utils]</b> ## {#spec-execution.coro_utils}

### `execution::as_awaitable` <b>[exec.as.awaitable]</b> ### {#spec-execution.coro_utils.as_awaitable}

1. `as_awaitable` transforms an object into one that is awaitable within a particular coroutine. This subclause makes use of the following exposition-only entities:

    <pre highlight="c++">
    template&lt;class S, class E>
      using <i>single-sender-value-type</i> = <i>see below</i>;

    template&lt;class S, class E>
      concept <i>single-sender</i> =
        sender_in&lt;S, E> &amp;&amp;
        requires { typename <i>single-sender-value-type</i>&lt;S, E>; };

    template&lt;class S, class P>
      concept <i>awaitable-sender</i> =
        <i>single-sender</i>&lt;S, <i>ENV-OF</i>(P)> &amp;&amp;
        sender_to&lt;S, <i>awaitable-receiver</i>> &amp;&amp; // see below
        requires (P&amp; p) {
          { p.unhandled_stopped() } -> convertible_to&lt;coroutine_handle&lt;>>;
        };

    template&lt;class S, class P>
      class <i>sender-awaitable</i>;
    </pre>

    where <code><i>ENV-OF</i>(P)</code> names the type `env_of_t<P>` if that type
    is well-formed, or <code>empty_env</code> otherwise.

    1. Alias template <i>single-sender-value-type</i> is defined as follows:

        1. If `value_types_of_t<S, E, Tuple, Variant>` would have the form `Variant<Tuple<T>>`, then <code><i>single-sender-value-type</i>&lt;S, E></code> is an alias for type `decay_t<T>`.

        2. Otherwise, if `value_types_of_t<S, E, Tuple, Variant>` would have the form `Variant<Tuple<>>` or `Variant<>`, then <code><i>single-sender-value-type</i>&lt;S, E></code> is an alias for type `void`.

        3. Otherwise, <code><i>single-sender-value-type</i>&lt;S, E></code> is ill-formed.

    2. The type <code><i>sender-awaitable</i>&lt;S, P></code> is equivalent to the following:

        <pre highlight="c++">
        template&lt;class S, class P> // arguments are not associated entities ([lib.tmpl-heads])
        class <i>sender-awaitable</i> {
          struct unit {};
          using value_t = <i>single-sender-value-type</i>&lt;S, <i>ENV-OF</i>(P)>;
          using result_t = conditional_t&lt;is_void_v&lt;value_t>, unit, value_t>;
          struct <i>awaitable-receiver</i>;

          variant&lt;monostate, result_t, exception_ptr> <i>result_</i>{};
          connect_result_t&lt;S, <i>awaitable-receiver</i>> <i>state_</i>;

         public:
          <i>sender-awaitable</i>(S&& s, P& p);
          bool await_ready() const noexcept { return false; }
          void await_suspend(coroutine_handle&lt;P>) noexcept { start(<i>state_</i>); }
          value_t await_resume();
        };
        </pre>

        1. <code><i>awaitable-receiver</i></code> is equivalent to the following:

            <pre highlight="c++">
            struct <i>awaitable-receiver</i> {
              using is_receiver = <i>unspecified</i>;
              variant&lt;monostate, result_t, exception_ptr>* <i>result_ptr_</i>;
              coroutine_handle&lt;P> <i>continuation_</i>;
              // ... <i>see below</i>
            };
            </pre>

            Let `r` be an rvalue expression of type <code><i>awaitable-receiver</i></code>, let `cr` be a `const` lvalue that refers to `r`, let `vs...` be an arbitrary function parameter pack of types `Vs...`, and let `err` be an arbitrary expression of type `Err`. Then:

              1. If `constructible_from<result_t, Vs...>` is satisfied, the expression `set_value(r, vs...)` is equivalent to:

                  <pre highlight="c++">
                  try {
                    r.<i>result_ptr_</i>->emplace&lt;1>(vs...);
                  } catch(...) {
                    r.<i>result_ptr_</i>->emplace&lt;2>(current_exception());
                  }
                  r.<i>continuation_</i>.resume();
                  </pre>

                  Otherwise, `set_value(r, vs...)` is ill-formed.

              2. The expression `set_error(r, err)` is equivalent to:

                  <pre highlight="c++">
                  r.<i>result_ptr_</i>->emplace&lt;2>(<i>AS-EXCEPT-PTR</i>(err));
                  r.<i>continuation_</i>.resume();
                  </pre>

                  where <code><i>AS-EXCEPT-PTR</i>(err)</code> is:

                  1. `err` if `decay_t<Err>` names the same type as `exception_ptr`,

                  2. Otherwise, `make_exception_ptr(system_error(err))` if `decay_t<Err>` names the same type as `error_code`,

                  3. Otherwise, `make_exception_ptr(err)`.

              3. The expression `set_stopped(r)` is equivalent to
                <code>static_cast&lt;coroutine_handle&lt;>>(r.<i>continuation_</i>.promise().unhandled_stopped()).resume()</code>.

              4. For any expression `tag` whose type satisfies <code><i>forwarding-query</i></code>
                  and for any pack of subexpressions `as`, `tag_invoke(tag, get_env(cr), as...)`
                  is expression-equivalent to <code>tag(get_env(as_const(cr.<i>continuation_</i>.promise())),
                  as...)</code> when that expression is well-formed.

        2. <b><code><i>sender-awaitable</i>::<i>sender-awaitable</i>(S&& s, P& p)</code></b>

            - <i>Effects:</i> Initializes <code><i>state_</i></code> with <code>connect(std::forward&lt;S>(s), <i>awaitable-receiver</i>{&<i>result_</i>, coroutine_handle&lt;P>::from_promise(p)})</code>.

        3. <b><code>value_t <i>sender-awaitable</i>::await_resume()</code></b>

            - <i>Effects:</i> equivalent to:

                <pre highlight="c++">
                if (<i>result_</i>.index() == 2)
                  rethrow_exception(get&lt;2>(<i>result_</i>));
                if constexpr (!is_void_v&lt;value_t>)
                  return std::forward&lt;value_t>(get&lt;1>(<i>result_</i>));
                </pre>

2. `as_awaitable` is a customization point object. For some subexpressions `e` and `p` where `p` is an lvalue, `E` names the type `decltype((e))` and `P` names the type `decltype((p))`, `as_awaitable(e, p)` is expression-equivalent to the following:

    1. `tag_invoke(as_awaitable, e, p)` if that expression is well-formed.

        * <i>Mandates:</i> <code><i>is-awaitable</i>&lt;A, P></code> is `true`, where `A` is the type of the `tag_invoke` expression above.

    2. Otherwise, `e` if <code><i>is-awaitable</i>&lt;E, <i>U</i>></code> is
        `true`, where <code><i>U</i></code> is an unspecified class type that
        lacks a member named `await_transform`. <span class="wg21note">The
        condition is not <code><i>is-awaitable</i>&lt;E, P></code> as that
        creates the potential for constraint recursion.</span>

        * <i>Preconditions:</i> <code><i>is-awaitable</i>&lt;E, P></code> is
            `true` and the expression `co_await e` in a coroutine with promise
            type <code><i>U</i></code> is expression-equivalent to the same
            expression in a coroutine with promise type `P`.

    3. Otherwise, <code><i>sender-awaitable</i>{e, p}</code> if <code><i>awaitable-sender</i>&lt;E, P></code> is `true`.

    4. Otherwise, `e`.
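
As a non-normative illustration, two of the exposition-only helpers used earlier in this subclause can be approximated in ordinary C++: the mapping performed by <i>single-sender-value-type</i> and the conversion performed by <i>AS-EXCEPT-PTR</i>. The names `tuple_of`, `variant_of`, `single_value`, `single_value_t`, `as_except_ptr`, `classify`, and `passthrough_demo` are hypothetical. The sketch assumes the value completion signatures have already been computed into a `variant_of` of `tuple_of` shape, rather than querying a real sender.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <system_error>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for the Tuple and Variant placeholder templates.
template <class...> struct tuple_of {};
template <class...> struct variant_of {};

// Primary template has no ::type, so any other shape is ill-formed to use.
template <class Shape> struct single_value {};
// Exactly one value of type T: the result is decay_t<T>.
template <class T> struct single_value<variant_of<tuple_of<T>>> {
  using type = std::decay_t<T>;
};
// One empty value completion, or no value completion at all: void.
template <> struct single_value<variant_of<tuple_of<>>> { using type = void; };
template <> struct single_value<variant_of<>> { using type = void; };

template <class Shape>
using single_value_t = typename single_value<Shape>::type;

// Sketch of AS-EXCEPT-PTR: normalize any error into an exception_ptr.
template <class Err>
std::exception_ptr as_except_ptr(Err&& err) {
  using E = std::decay_t<Err>;
  if constexpr (std::is_same_v<E, std::exception_ptr>) {
    return std::forward<Err>(err);  // already an exception_ptr
  } else if constexpr (std::is_same_v<E, std::error_code>) {
    return std::make_exception_ptr(std::system_error(err));
  } else {
    return std::make_exception_ptr(std::forward<Err>(err));
  }
}

// Helper for the usage below: 1 = system_error, 2 = other std::exception,
// 3 = anything else.
inline int classify(std::exception_ptr p) {
  try {
    std::rethrow_exception(p);
  } catch (const std::system_error&) {
    return 1;
  } catch (const std::exception&) {
    return 2;
  } catch (...) {
    return 3;
  }
}

// An exception_ptr argument is returned unchanged.
inline bool passthrough_demo() {
  std::exception_ptr p = std::make_exception_ptr(42);
  return as_except_ptr(p) == p;
}
```

With these definitions, `single_value_t<variant_of<tuple_of<const int&>>>` is `int`, an `error_code` argument surfaces as a `system_error`, and any other error type is thrown as itself when the stored `exception_ptr` is rethrown.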

### `execution::with_awaitable_senders` <b>[exec.with.awaitable.senders]</b> ### {#spec-execution.coro_utils.with_awaitable_senders}

  1. `with_awaitable_senders`, when used as the base class of a coroutine promise type, makes senders awaitable in that coroutine type.

    In addition, it provides a default implementation of `unhandled_stopped()` such that if a sender completes by calling `set_stopped`, it is treated as if an uncatchable "stopped" exception were thrown from the <i>await-expression</i>. In practice, the coroutine is never resumed, and the `unhandled_stopped` of the coroutine caller's promise type is called.

    <pre highlight="c++">
    template&lt;<i>class-type</i> Promise>
      struct with_awaitable_senders {
        template&lt;class OtherPromise>
          requires (!same_as&lt;OtherPromise, void>)
        void set_continuation(coroutine_handle&lt;OtherPromise> h) noexcept;

        coroutine_handle&lt;> continuation() const noexcept { return <i>continuation_</i>; }

        coroutine_handle&lt;> unhandled_stopped() noexcept {
          return <i>stopped_handler_</i>(<i>continuation_</i>.address());
        }

        template&lt;class Value>
        <i>see-below</i> await_transform(Value&& value);

       private:
        // exposition only
        [[noreturn]] static coroutine_handle&lt;> default_unhandled_stopped(void*) noexcept {
          terminate();
        }
        coroutine_handle&lt;> <i>continuation_</i>{}; // exposition only
        // exposition only
        coroutine_handle&lt;> (*<i>stopped_handler_</i>)(void*) noexcept = &default_unhandled_stopped;
      };
    </pre>

  2. <b>`void set_continuation(coroutine_handle<OtherPromise> h) noexcept`</b>

      - <i>Effects:</i> equivalent to:

        <pre highlight="c++">
        <i>continuation_</i> = h;
        if constexpr ( requires(OtherPromise& other) { other.unhandled_stopped(); } ) {
          <i>stopped_handler_</i> = [](void* p) noexcept -> coroutine_handle&lt;> {
            return coroutine_handle&lt;OtherPromise>::from_address(p)
              .promise().unhandled_stopped();
          };
        } else {
          <i>stopped_handler_</i> = default_unhandled_stopped;
        }
        </pre>

  3. <b><code><i>call-result-t</i>&lt;as_awaitable_t, Value, Promise&amp;> await_transform(Value&amp;&amp; value)</code></b>

      - <i>Effects:</i> equivalent to:

        <pre highlight="c++">
        return as_awaitable(std::forward&lt;Value>(value), static_cast&lt;Promise&>(*this));
        </pre>
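
As a non-normative illustration, the type erasure performed by `set_continuation` can be sketched without coroutines. A captureless lambda remembers the concrete parent promise type and is stored as a plain function pointer, so a later `unhandled_stopped()` call can be forwarded to the parent. The names `erased_handle`, `has_unhandled_stopped`, `continuation_slot`, `stoppable_parent`, `plain_parent`, `forwarding_demo`, and `default_demo` are hypothetical. Two simplifications are assumed: a `void*` stands in for `coroutine_handle<>`, and the C++17 detection idiom stands in for the requires-expression.

```cpp
#include <cassert>
#include <exception>
#include <type_traits>
#include <utility>

// Assumption: an opaque address stands in for coroutine_handle<>.
using erased_handle = void*;

// Detects whether a promise-like type provides unhandled_stopped().
template <class P, class = void>
struct has_unhandled_stopped : std::false_type {};
template <class P>
struct has_unhandled_stopped<
    P, std::void_t<decltype(std::declval<P&>().unhandled_stopped())>>
    : std::true_type {};

class continuation_slot {
 public:
  template <class OtherPromise>
  void set_continuation(OtherPromise& other) noexcept {
    continuation_ = &other;
    if constexpr (has_unhandled_stopped<OtherPromise>::value) {
      // The lambda captures nothing; OtherPromise is baked into its body.
      stopped_handler_ = [](void* p) noexcept -> erased_handle {
        return static_cast<OtherPromise*>(p)->unhandled_stopped();
      };
    } else {
      stopped_handler_ = &default_unhandled_stopped;
    }
  }

  // Forwards the stopped signal to whatever handler was installed.
  erased_handle unhandled_stopped() noexcept {
    return stopped_handler_(continuation_);
  }

  bool uses_default_handler() const noexcept {
    return stopped_handler_ == &default_unhandled_stopped;
  }

 private:
  [[noreturn]] static erased_handle default_unhandled_stopped(void*) noexcept {
    std::terminate();
  }
  erased_handle continuation_ = nullptr;
  erased_handle (*stopped_handler_)(void*) noexcept =
      &default_unhandled_stopped;
};

// Hypothetical parent promise that handles the stopped signal.
struct stoppable_parent {
  int stopped_calls = 0;
  erased_handle unhandled_stopped() noexcept {
    ++stopped_calls;
    return this;
  }
};

// Hypothetical parent promise with no unhandled_stopped().
struct plain_parent {};

// Returns how many times the parent's handler ran, or -1 on a mismatch.
int forwarding_demo() {
  stoppable_parent parent;
  continuation_slot child;
  child.set_continuation(parent);
  erased_handle next = child.unhandled_stopped();
  return next == &parent ? parent.stopped_calls : -1;
}

// A parent without unhandled_stopped() leaves the terminating default.
bool default_demo() {
  plain_parent parent;
  continuation_slot child;
  child.set_continuation(parent);
  return child.uses_default_handler();
}
```

Storing a function pointer rather than a templated member keeps the slot's type independent of the parent promise type, which mirrors the motivation for the <i>stopped_handler_</i> member above.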

<pre class=biblio>
{
    "HPX": {
        "authors": [
            "Hartmut Kaiser",
            "Patrick Diehl",
            "Adrian S. Lemoine",
            "Bryce Adelstein Lelbach",
            "Parsa Amini",
            "Agustín Berge",
            "John Biddiscombe",
            "Steven R. Brandt",
            "Nikunj Gupta",
            "Thomas Heller",
            "Kevin Huck",
            "Zahra Khatami",
            "Alireza Kheirkhahan",
            "Auriane Reverdell",
            "Shahrzad Shirzad",
            "Mikael Simberg",
            "Bibek Wagle",
            "Weile Wei",
            "Tianyi Zhang"
        ],
        "href": "https://doi.org/10.21105/joss.02352",
        "title": "HPX - The C++ Standard Library for Parallelism and Concurrency",
        "volume": 5,
        "number": 53,
        "pages": 2352,
        "publisher": "The Open Journal",
        "journal": "Journal of Open Source Software"
    }
}
</pre>