i#6336: Fix multiple replay bugs (#6353)
Adds a new unit test which targets replaying with region-of-interest limits stopping before the end of the input streams. The test has two variations, one with no interleavings and one with interleavings, each hitting different bugs.

Fixes one significant bug:
+ The main bug hit and fixed is a race where the candidate record already read from an input by one output is not queued until after giving up the lock and re-acquiring it later. In that window another output could pick up that input, think it's at the right spot, and miss the queued entry, which then may show up randomly later. The input queue is a deque so we can use both ends. The solution is to put the candidate record in the queue before giving up the lock, and if the input didn't change upon re-acquiring the lock, to pop it off the back. This requires some other changes: for skips we need to throw away the newly-queued item, which could include an instr (so our assert on queued instrs needs adjusting); and we now need to empty the queue for a synthetic-end marker. This was tested on the unit test, which was failing every 3rd run or so, by running 1000 times. It was also tested on a larger proprietary test which was failing every time: it now passes 2000 times in a row.

Two other fixes from the interleaving test:
+ On synthetic end, don't skip past it: else in some cases we miss the synthetic exit record.
+ Don't consider an input to be at a stop point when there are still entries in the queue.

Fixes two replay bugs hit with the non-interleaving test:
+ A skip on replay failed to change the input as it returns out of pick_next_input_as_previously(). We change it to just break out of the loop. Without this fix, the new unit test fails to run an entire input on replay (with no interleaving).
+ Non-file-based readers, like the mock_reader_t using a vector, were missing a reader init call on a replay path with a skip. Without this fix, the new unit test fails to run an entire input on replay (with no interleaving).
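The queue-before-unlock pattern behind the main fix can be sketched in isolation. This is only an illustration under assumptions, not the scheduler's code: record_t, input_t, and requeue_while_unlocked() are hypothetical names, and the pick callback stands in for whatever selects the next input while the lock is released.

```cpp
#include <cassert>
#include <deque>
#include <mutex>

// Hypothetical stand-ins for illustration; not the scheduler's real types.
struct record_t {
    int value;
};

struct input_t {
    std::mutex lock;
    std::deque<record_t> queue; // Readers consume from the front.
};

// Parks `candidate` on the back of the queue *before* dropping the lock, so
// that another output picking up this input in the unlocked window sees it.
// `pick` runs unlocked and returns true if this output kept the same input
// (assumed here to mean nobody else consumed from the queue meanwhile).
// On true, the candidate is taken back off the back of the queue; on false,
// it stays queued for whichever output reads this input next.
template <typename PickFn>
bool
requeue_while_unlocked(input_t &input, record_t &candidate, PickFn pick)
{
    std::unique_lock<std::mutex> guard(input.lock);
    input.queue.push_back(candidate);
    guard.unlock();
    bool same_input = pick();
    guard.lock();
    if (same_input) {
        assert(!input.queue.empty());
        candidate = input.queue.back();
        input.queue.pop_back();
    }
    return same_input;
}
```

Using the back of the deque for the parked record leaves the front untouched, so any entries already queued for the input keep their order.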

Fixes two bugs hit in a larger app which did not easily reproduce despite my attempts:
+ A hang: in pick_next_input_as_previously() the input instruction ordinal wasn't yet at the beyond-limit ordinal for the synthetic exit record so it kept doing a wait. The fix is to check for a synthetic exit record. This one was hard to replicate so the unit test does not reliably hit it, but the fix was tested on the original scenario.
+ "Failed to read from trace": pick_next_input_as_previously() returns STATUS_SKIPPED when it hits the end (beyond setting at_eof) because it needs to read the synthetic exit record inserted in the queue. But a core that was .waiting=true errors out on any status other than STATUS_OK, so we change that to not error on STATUS_SKIPPED. This one was hard to replicate so the unit test does not reliably hit it, but the fix was tested on the original scenario.

Fixes #6336
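
The status check from the last fix can be sketched as a small predicate. The enum and function here are hypothetical: only STATUS_OK and STATUS_SKIPPED are named in this message, and STATUS_ERROR is a placeholder for any failing status.

```cpp
// Hypothetical subset of status codes, for illustration only.
enum class status_t { STATUS_OK, STATUS_SKIPPED, STATUS_ERROR };

// Returns true if a core that was waiting may go on to read its next record.
// STATUS_SKIPPED is accepted because the caller still needs to read the
// synthetic exit record that was inserted in the queue.
bool
waiting_core_may_proceed(status_t res)
{
    return res == status_t::STATUS_OK || res == status_t::STATUS_SKIPPED;
}
```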
1 parent 910ccb2, commit 54f393b
Showing 3 changed files with 321 additions and 29 deletions.