storage: in remap, when since is empty, suspend instead of panic'ing
As the comment describes, this is a race condition that is expected to
happen and it's better to suspend rather than bring down the whole
cluster, which causes pain for customers/the oncall.
aljoscha committed Jan 29, 2025
1 parent 86b9bbc commit df33a47
Showing 1 changed file with 19 additions and 0 deletions.
19 changes: 19 additions & 0 deletions src/storage/src/source/reclock/compat.rs
@@ -117,6 +117,25 @@ where
// Allow manually simulating the scenario where the since of the remap
// shard has advanced too far.
fail_point!("invalid_remap_as_of");

if since.is_empty() {
    // This can happen when, say, a source is being dropped but we on
    // the cluster are busy and notice that only later. In those cases
    // it can happen that we still try to render an ingestion that is
    // not valid anymore and where the shards it uses are not valid to
    // use anymore.
    //
    // This is a rare race condition and something that is expected to
    // happen every now and then. It's not a bug in the current way of
    // how things work.
    tracing::info!(
        source_id = %id,
        %worker_id,
        "since of remap shard is the empty antichain, suspending...");
    std::future::pending::<()>().await;
    unreachable!("pending future never returns");
}

assert!(
    PartialOrder::less_equal(since, &as_of),
    "invalid as_of: as_of({as_of:?}) < since({since:?}), \

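The suspend-instead-of-panic behaviour added in this hunk can be tried in isolation. The following is a minimal sketch, not the code from `compat.rs`: `check_since`, `poll_once`, and `NoopWaker` are hypothetical names, `since` is modeled as a plain slice of `u64` timestamps (an empty slice standing in for the empty antichain), and the future is polled by hand with a do-nothing waker so that no async runtime is needed. It only shows that awaiting `std::future::pending` leaves the task pending forever, while the non-empty case still reaches the `as_of` assertion.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough to poll a future once by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for the remap check. An empty `since` slice plays
/// the role of the empty antichain (an assumption made for illustration).
async fn check_since(since: &[u64], as_of: u64) -> u64 {
    if since.is_empty() {
        // Park this task forever instead of panicking.
        std::future::pending::<()>().await;
        unreachable!("pending future never returns");
    }
    // Simplified analogue of `PartialOrder::less_equal(since, &as_of)`.
    assert!(
        since.iter().any(|s| *s <= as_of),
        "invalid as_of: as_of({as_of:?}) < since({since:?})"
    );
    as_of
}

/// Polls a future exactly once and returns the resulting `Poll`.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    fut.as_mut().poll(&mut cx)
}

fn main() {
    // Empty `since`: the task suspends rather than bringing the process down.
    assert!(poll_once(check_since(&[], 5)).is_pending());
    // Valid `since`: the check completes normally.
    assert_eq!(poll_once(check_since(&[3], 5)), Poll::Ready(5));
}
```

Because a pending future is never woken, the suspended task does no further work and simply holds its state until it is dropped, which is what lets a later teardown of the source clean it up without a crash.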