Backtrace support, part 3: Actually produce continuation-aware backtraces #119

Merged
merged 9 commits on Mar 6, 2024
9 changes: 0 additions & 9 deletions build.rs
@@ -275,15 +275,6 @@ fn ignore(testsuite: &str, testname: &str, strategy: &str) -> bool {
}
}

// TODO(dhil): The unhandled test is flaky as it exposes a
// potential problem with trap generation and propagation across
// stacks.
if testsuite == "typed_continuations" {
if testname == "unhandled" {
return true;
}
}

match env::var("CARGO_CFG_TARGET_ARCH").unwrap().as_str() {
"s390x" => {
// TODO(#6530): These tests require tail calls, but s390x
30 changes: 30 additions & 0 deletions crates/continuations/src/lib.rs
@@ -157,6 +157,36 @@ impl StackChain {
pub fn is_main_stack(&self) -> bool {
matches!(self, StackChain::MainStack(_))
}

// We don't implement IntoIterator because our iterator is unsafe, so at
// least this gives us some way of indicating this, even though the actual
// unsafety lies in the `next` function.
pub unsafe fn into_iter(self) -> ContinuationChainIterator {
ContinuationChainIterator(self)
}
}

pub struct ContinuationChainIterator(StackChain);

impl Iterator for ContinuationChainIterator {
type Item = (Option<*mut ContinuationObject>, *mut StackLimits);

fn next(&mut self) -> Option<Self::Item> {
match self.0 {
StackChain::Absent => None,
StackChain::MainStack(ms) => {
let next = (None, ms);
self.0 = StackChain::Absent;
Some(next)
}
StackChain::Continuation(ptr) => {

Member:
We should be able to make this a NonNull pointer to get rid of the unsafe usage here, right?

Author:
No, any of the methods that dereference a NonNull are still unsafe.

Member:
Thinking again. We know that ptr is non-null, right? As far as I remember, it should never take the value of null. In this case, why don't we just do it all with raw pointers and make the entire arm unsafe?

Author:
What would the advantage of that be over the current version?

Member:
We wouldn't have any of the conversion into references... May be negligible.
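
To illustrate the alternative being discussed, here is a minimal sketch with stand-in types (an editorial example, not code from the PR): a NonNull-based accessor removes the null check and the unwrap, but dereferencing the pointer still requires an unsafe block, which is the author's point.

use std::ptr::NonNull;

// Stand-in types; the real definitions live in wasmtime_continuations.
struct StackLimits {
    stack_limit: usize,
}

struct ContinuationObject {
    limits: StackLimits,
}

fn limits_ptr(mut cont: NonNull<ContinuationObject>) -> *mut StackLimits {
    // NonNull encodes the non-null invariant in the type, but `as_mut` is
    // still unsafe: the pointer must be valid and not aliased by another
    // mutable reference.
    let cont_ref = unsafe { cont.as_mut() };
    &mut cont_ref.limits as *mut StackLimits
}

fn main() {
    let mut obj = ContinuationObject { limits: StackLimits { stack_limit: 0 } };
    let ptr = NonNull::from(&mut obj);
    let _limits = limits_ptr(ptr);
}

Either way, the unsafety only moves around rather than disappearing, which is why the iterator above keeps the raw-pointer version.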

let continuation = unsafe { ptr.as_mut().unwrap() };
let next = (Some(ptr), (&mut continuation.limits) as *mut StackLimits);
self.0 = continuation.parent_chain.clone();
Some(next)
}
}
}
}

#[repr(transparent)]
17 changes: 17 additions & 0 deletions crates/runtime/src/continuation.rs
@@ -313,6 +313,10 @@ pub mod baseline {
// A buffer to help propagate tag payloads across
// continuations.
static SUSPEND_PAYLOADS: RefCell<Vec<u128>> = RefCell::new(vec![]);

// This acts like a fuse that is set to true if this thread has ever
// executed a continuation (e.g., run `resume`).
static HAS_EVER_RUN_CONTINUATION: Cell<bool> = Cell::new(false);
}

/// Allocates a new continuation in suspended mode.
@@ -382,6 +386,9 @@
/// Continues a given continuation.
#[inline(always)]
pub fn resume(instance: &mut Instance, contref: &mut VMContRef) -> Result<u32, TrapReason> {
// Trigger fuse
HAS_EVER_RUN_CONTINUATION.set(true);

Member:
The following may be faster if Cell::set unconditionally writes to the internal memory cell, though, I don't know the specifics of the internals of Cell::set

Suggested change:
-    HAS_EVER_RUN_CONTINUATION.set(true);
+    if !HAS_EVER_RUN_CONTINUATION.get() {
+        HAS_EVER_RUN_CONTINUATION.set(true);
+    }

Member:
Thinking about it again, maybe you should use OnceCell.

Member:
Thinking yet again, there is a potential problem here (and regression). Consider the scenario: You run a continuation to completion (e.g. the unit continuation), and afterwards on the main stack you perform a wasm-to-host call that reenters wasm, but it would be prohibited from doing so because HAS_EVER_RUN_CONTINUATION is true even though we are not currently running on a continuation... or?

Author:
> The following may be faster if Cell::set unconditionally writes to the internal memory cell, though, I don't know the specifics of the internals of Cell::set

Writing to the Cell here compiles down to a simple move instruction. So there isn't much of a point in making it conditional.

> Thinking yet again, there is a potential problem here (and regression). Consider the scenario: You run a continuation to completion (e.g. the unit continuation), and afterwards on the main stack you perform a wasm-to-host call that reenters wasm, but it would be prohibited from doing so because HAS_EVER_RUN_CONTINUATION is true even though we are not currently running on a continuation... or?

Please note that HAS_EVER_RUN_CONTINUATION is not used by the mechanism that disallows certain forms of re-entering wasm. Instead, it is only used in the logic for generating backtraces: if we are using the baseline implementation and have ever run a continuation, we return an empty backtrace. In particular, the program you describe would still run normally. It's just that in your example, after returning to the main stack, we would still produce empty backtraces in the case of a trap. But I think that's acceptable; I'd rather not make the logic regarding this more complex.

Introducing the HAS_EVER_RUN_CONTINUATION flag is mostly just so that the full test suite passes when compiling with the baseline flag: There are plenty of non-wasmfx test cases that do something and inspect the resulting backtrace. These tests would fail if we unconditionally produced empty backtraces whenever the baseline feature is enabled.

Member:
> The following may be faster if Cell::set unconditionally writes to the internal memory cell, though, I don't know the specifics of the internals of Cell::set

> Writing to the Cell here compiles down to a simple move instruction. So there isn't much of a point in making it conditional.

I suppose this move is to an address in main memory. Writing to memory is often slower than reading from memory, hence why a branch can yield better performance -- we could potentially help the speculation engine by providing a cold-branch hint.

> Introducing the HAS_EVER_RUN_CONTINUATION flag is mostly just so that the full test suite passes when compiling with the baseline flag: There are plenty of non-wasmfx test cases that do something and inspect the resulting backtrace. These tests would fail if we unconditionally produced empty backtraces whenever the baseline feature is enabled.

I see, thanks.
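
To make the trade-off above concrete, here is a small self-contained sketch (an editorial example, not code from the PR): the read-before-write pattern with a hypothetical #[cold] helper standing in for the cold-branch hint mentioned above. The thread-local mirrors the PR's fuse in name only.

use std::cell::Cell;

thread_local! {
    // Per-thread fuse: flips to true the first time a continuation runs.
    static HAS_EVER_RUN_CONTINUATION: Cell<bool> = Cell::new(false);
}

// Hypothetical cold path: taken at most once per thread.
#[cold]
fn set_fuse() {
    HAS_EVER_RUN_CONTINUATION.set(true);
}

// Read first, so the steady state (fuse already set) performs no store.
fn mark_continuation_run() {
    if !HAS_EVER_RUN_CONTINUATION.get() {
        set_fuse();
    }
}

fn main() {
    mark_continuation_run();
    assert!(HAS_EVER_RUN_CONTINUATION.get());
}

Whether this beats the unconditional store depends on the target; as the author notes, the plain set compiles to a single move, so the PR keeps the simpler form.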


// Attach parent.
debug_assert!(contref.parent.is_null());
contref.parent = get_current_continuation();
@@ -551,6 +558,10 @@
fn set_current_continuation(cont: *mut VMContRef) {
CC.set(cont)
}

pub fn has_ever_run_continuation() -> bool {
HAS_EVER_RUN_CONTINUATION.get()
}
}

#[allow(missing_docs)]
@@ -640,4 +651,10 @@ pub mod baseline {
pub fn get_current_continuation() -> *mut VMContRef {
panic!("attempt to execute continuation::baseline::get_current_continuation without `typed_continuation_baseline_implementation` toggled!")
}

#[inline(always)]
#[allow(missing_docs)]
pub fn has_ever_run_continuation() -> bool {
panic!("attempt to execute continuation::baseline::has_ever_run_continuation without `typed_continuation_baseline_implementation` toggled!")
}
}
8 changes: 8 additions & 0 deletions crates/runtime/src/traphandlers.rs
@@ -311,6 +311,14 @@ mod call_thread_state {
old_last_wasm_exit_fp: Cell<usize>,
old_last_wasm_exit_pc: Cell<usize>,
old_last_wasm_entry_sp: Cell<usize>,
// Note that there is no need to save an `old_callee_stack_chain`
// value: due to the invariant that only the innermost
// `CallThreadState` executing wasm may do so off the main stack (see
// comments in `wasmtime::invoke_wasm_and_catch_traps`), all other
// states are already on the main stack at the time a `CallThreadState`
// for wasm execution is pushed to the head of a thread's CTS list.
}

impl Drop for CallThreadState {
114 changes: 104 additions & 10 deletions crates/runtime/src/traphandlers/backtrace.rs
@@ -20,6 +20,8 @@
//! exit FP and stopping once we reach the entry SP (meaning that the next older
//! frame is a host frame).

use wasmtime_continuations::StackChain;

use crate::arch;
use crate::{
traphandlers::{tls, CallThreadState},
@@ -101,14 +103,37 @@ impl Backtrace {
trap_pc_and_fp: Option<(usize, usize)>,
mut f: impl FnMut(Frame) -> ControlFlow<()>,
) {
if cfg!(feature = "typed_continuations_baseline_implementation") {
if crate::continuation::baseline::has_ever_run_continuation() {
log::info!("Backtrace generation not supported in baseline implementation once a continuation was invoked");
return;
}
}

log::trace!("====== Capturing Backtrace ======");

// We are only interested in wasm frames, not host frames. Thus, we peel
// away the first states in this thread's `CallThreadState` chain that
// do not execute wasm.
// Note(frank-emrich) I'm not entirely sure if it can ever be the case
// that `state` is not actually executing wasm. In other words, it may
// be the case that we always have `Some(state) == first_wasm_state`.
// Otherwise we would be building a backtrace while executing a host
// call. But host calls cannot trap, only panic, and we do not use this
// function to build backtraces for panics.
let first_wasm_state = state
.iter()
.flat_map(|head| head.iter())
.skip_while(|state| state.callee_stack_chain.is_none())
.next();

let (last_wasm_exit_pc, last_wasm_exit_fp) = match trap_pc_and_fp {
// If we exited Wasm by catching a trap, then the Wasm-to-host
// trampoline did not get a chance to save the last Wasm PC and FP,
// and we need to use the plumbed-through values instead.
Some((pc, fp)) => {
assert!(std::ptr::eq(limits, state.limits));
assert!(first_wasm_state.is_some());
(pc, fp)
}
// Either there is no Wasm currently on the stack, or we exited Wasm
@@ -120,33 +145,49 @@
}
};

let first_wasm_state_stack_chain = first_wasm_state
.map(|state| state.callee_stack_chain.map(|cell| &*(*cell).0.get()))
.flatten();

// The first value in `activations` is for the most recently running
// wasm. We thus provide the stack chain of `first_wasm_state` to
// traverse the potential continuation stacks. For the subsequent
// activations, we unconditionally use `None` as the corresponding stack
// chain. This is justified because only the most recent execution of
// wasm may execute off the main stack (see comments in
// `wasmtime::invoke_wasm_and_catch_traps` for details).
let activations = std::iter::once((
first_wasm_state_stack_chain,
last_wasm_exit_pc,
last_wasm_exit_fp,
*(*limits).last_wasm_entry_sp.get(),
))
.chain(
state
first_wasm_state
.iter()
.flat_map(|state| state.iter())
.filter(|state| std::ptr::eq(limits, state.limits))
.map(|state| {
(
None,
state.old_last_wasm_exit_pc(),
state.old_last_wasm_exit_fp(),
state.old_last_wasm_entry_sp(),
)
}),
)
.take_while(|&(pc, fp, sp)| {
.take_while(|&(_chain, pc, fp, sp)| {
if pc == 0 {
debug_assert_eq!(fp, 0);
debug_assert_eq!(sp, 0);
}
pc != 0
});

for (pc, fp, sp) in activations {
if let ControlFlow::Break(()) = Self::trace_through_wasm(pc, fp, sp, &mut f) {
for (chain, pc, fp, sp) in activations {
if let ControlFlow::Break(()) =
Self::trace_through_continuations(chain, pc, fp, sp, &mut f)
{
log::trace!("====== Done Capturing Backtrace (closure break) ======");
return;
}
@@ -155,6 +196,63 @@
log::trace!("====== Done Capturing Backtrace (reached end of activations) ======");
}

unsafe fn trace_through_continuations(
chain: Option<&StackChain>,
pc: usize,
fp: usize,
trampoline_sp: usize,
mut f: impl FnMut(Frame) -> ControlFlow<()>,
) -> ControlFlow<()> {
// Handle the stack that is currently running (which may be a
// continuation or the main stack).
Self::trace_through_wasm(pc, fp, trampoline_sp, &mut f)?;

chain.map_or(ControlFlow::Continue(()), |chain| {
debug_assert_ne!(*chain, StackChain::Absent);

let stack_limits_iter = chain.clone().into_iter();

// The very first entry in the stack chain is for what is currently
// running (which may be a continuation or main stack). However, for
// the currently running stack, the data in its associated
// `StackLimits` object is stale (see comment on
// `wasmtime_continuations::StackChain` for a description of the
// invariants).
// That's why we already handled the currently running stack at the
// beginning of the function, using data directly from the
// VMRuntimeLimits.
let remainder = stack_limits_iter.skip(1);

for (continuation_opt, limits) in remainder {
let limits = limits.as_ref().unwrap();
match continuation_opt {
Some(continuation) => {
let cont = unsafe { &*continuation };

Member (@dhil, Mar 4, 2024):
The unsafe block annotation should arguably be around the entire match arm, because the &*continuation could violate the strict aliasing rules. We should add a comment saying that this line is the unsafe bit of the arm.

Author:
I'm pretty sure that making a piece of code unsafe doesn't change the permitted aliasing. The aliasing here should be fine: We aren't violating the rule that there must be no more than one mutable reference at a time (because there are zero mutable references here).

Member:
That's not what I am saying. The entire block would be unsafe if we were to alias the continuation; the safety of the rest of the code relies on this assumption.
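
To make the two positions concrete, here is a minimal sketch with stand-in types (an editorial example, not code from the PR). Both scoping styles rely on exactly the same validity and aliasing assumptions about the pointer; they differ only in how much of the code is visibly marked as depending on them.

struct ContinuationObject {
    stack_limit: usize,
}

// Style used in the PR: scope `unsafe` tightly around the dereference.
fn read_limit_tight(ptr: *mut ContinuationObject) -> usize {
    let cont = unsafe { &*ptr };
    cont.stack_limit
}

// Style suggested in the review: mark the whole body and document the
// assumption once with a SAFETY comment.
fn read_limit_whole(ptr: *mut ContinuationObject) -> usize {
    // SAFETY: `ptr` must point to a live ContinuationObject that is not
    // aliased by a mutable reference for the duration of this call.
    unsafe { (*ptr).stack_limit }
}

fn main() {
    let mut c = ContinuationObject { stack_limit: 4096 };
    let p = &mut c as *mut ContinuationObject;
    assert_eq!(read_limit_tight(p), read_limit_whole(p));
}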

let stack_range = (*cont.fiber).stack().range().unwrap();
debug_assert!(stack_range.contains(&limits.last_wasm_exit_fp));
debug_assert!(stack_range.contains(&limits.last_wasm_entry_sp));
// TODO(frank-emrich) Enable this assertion once we stop
// zero-ing the stack limit in
// `wasmtime_runtime::continuation::resume`
//
// debug_assert_eq!(stack_range.end, limits.stack_limit);
}
None => {
// reached stack information for main stack
}
}

Self::trace_through_wasm(
limits.last_wasm_exit_pc,
limits.last_wasm_exit_fp,
limits.last_wasm_entry_sp,
&mut f,
)?
}
ControlFlow::Continue(())
})
}

/// Walk through a contiguous sequence of Wasm frames starting with the
/// frame at the given PC and FP and ending at `trampoline_sp`.
// TODO(frank-emrich) Implement tracing across continuations.
@@ -186,9 +284,7 @@ impl Backtrace {
// The stack grows down, and therefore any frame pointer we are
// dealing with should be less than the stack pointer on entry
// to Wasm.
// TODO(frank-emrich) Disabled for now, as it does not hold in
// presence of continuations.
//assert!(trampoline_sp >= fp, "{trampoline_sp:#x} >= {fp:#x}");
assert!(trampoline_sp >= fp, "{trampoline_sp:#x} >= {fp:#x}");

arch::assert_fp_is_aligned(fp);

@@ -254,9 +350,7 @@ impl Backtrace {

// Because the stack always grows down, the older FP must be greater
// than the current FP.
// TODO(dhil): The following assertion is not always true
// in the presence of stack switching.
//assert!(next_older_fp > fp, "{next_older_fp:#x} > {fp:#x}");
assert!(next_older_fp > fp, "{next_older_fp:#x} > {fp:#x}");
fp = next_older_fp;
}
}