Messages can be received out of order across multiple channels #3

Open
brt-colleen opened this issue Jan 17, 2025 · 3 comments

Comments

@brt-colleen
Contributor

brt-colleen commented Jan 17, 2025

I have a process receiving messages from multiple processes (messages are published on different channels). I have a handler waking up for each message as I receive it. If I look at monotonic time in each handler, it does not increase monotonically.

In lockless_queue there's a race between publishing and making the data visible.
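A minimal sketch of how a receiver could see timestamps go backwards across two channels. The thread does not spell out the exact mechanism inside lockless_queue, so this assumes each sender samples its timestamp before the step that makes the message visible. `Message`, `SimulateRacyInterleaving`, and `IsMonotonic` are hypothetical names for illustration only, and the interleaving is hard-coded rather than produced by real threads.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a received message; not the real queue layout.
struct Message {
  int channel;
  int64_t monotonic_sent_time;
};

// Replays one specific interleaving of two senders on different channels:
// A samples its time, B samples its time, B becomes visible, A becomes
// visible. Returns the messages in the order the receiver observes them.
std::vector<Message> SimulateRacyInterleaving() {
  int64_t fake_clock = 0;
  std::vector<Message> visible_order;

  Message a{/*channel=*/0, ++fake_clock};  // A samples time 1.
  Message b{/*channel=*/1, ++fake_clock};  // B samples time 2.

  visible_order.push_back(b);  // B's message becomes visible first.
  visible_order.push_back(a);  // A's message becomes visible second.
  return visible_order;
}

// Returns true if the timestamps never decrease in observation order.
bool IsMonotonic(const std::vector<Message> &messages) {
  for (std::size_t i = 1; i < messages.size(); ++i) {
    if (messages[i].monotonic_sent_time <
        messages[i - 1].monotonic_sent_time) {
      return false;
    }
  }
  return true;
}
```

With this interleaving the receiver handles the message stamped 2 before the one stamped 1, so `IsMonotonic(SimulateRacyInterleaving())` is false even though each sender read a monotonic clock.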

@philsc
Contributor

philsc commented Jan 22, 2025

Would it be fair to rename the ticket to be "can be received out of order"?

@AustinSchuh AustinSchuh changed the title Messages can be sent out of order Messages can be received out of order Jan 22, 2025
@AustinSchuh AustinSchuh changed the title Messages can be received out of order Messages can be received out of order across multiple channels Jan 22, 2025
@AustinSchuh
Member

Yeah. It is unclear exactly when in the process a message counts as "sent". The key here is that multiple channels are involved, and what matters is what the receiver observes.

@AustinSchuh
Member

Thinking out loud, we should be able to make timestamping happen after the compare-and-exchange that does the publishing.

The key here is that any subsequent operation (reading or writing) needs to fill in the timestamps before proceeding. The original sender also needs to attempt to timestamp. The message needs to have a reserved value in the timestamp fields when it is originally published. The first writer of the timestamp wins, and everyone else gives up. They all need to fill the timestamps in the same order too: monotonic first, then realtime. The trick is to make sure performance doesn't suffer too much, both average and peak.
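The "first writer wins" idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the lockless_queue implementation: `MessageHeader`, `kUnsetTime`, `FillOnce`, and `EnsureTimestamped` are hypothetical names, the reserved value is assumed to be the minimum `int64_t`, and the clocks are passed in as callables so the behavior can be checked deterministically.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical reserved value meaning "timestamp not filled in yet".
constexpr int64_t kUnsetTime = std::numeric_limits<int64_t>::min();

// Hypothetical message header; the real queue layout may differ.
struct MessageHeader {
  std::atomic<int64_t> monotonic_sent_time{kUnsetTime};
  std::atomic<int64_t> realtime_sent_time{kUnsetTime};
};

// Tries to store `value` into `slot` if it still holds the reserved value.
// Returns whatever value ends up in the slot: ours if we won the race,
// otherwise the value written by the earlier writer.
inline int64_t FillOnce(std::atomic<int64_t> &slot, int64_t value) {
  assert(value != kUnsetTime);  // A real clock must never yield the sentinel.
  int64_t expected = kUnsetTime;
  if (slot.compare_exchange_strong(expected, value,
                                   std::memory_order_acq_rel,
                                   std::memory_order_acquire)) {
    return value;
  }
  return expected;
}

// Intended to be called by the sender right after the publishing
// compare-and-exchange, and by any reader or writer before it moves past
// this message. Fills monotonic first, then realtime.
template <typename MonotonicNow, typename RealtimeNow>
void EnsureTimestamped(MessageHeader &header, MonotonicNow monotonic_now,
                       RealtimeNow realtime_now) {
  if (header.monotonic_sent_time.load(std::memory_order_acquire) ==
      kUnsetTime) {
    FillOnce(header.monotonic_sent_time, monotonic_now());
  }
  if (header.realtime_sent_time.load(std::memory_order_acquire) ==
      kUnsetTime) {
    FillOnce(header.realtime_sent_time, realtime_now());
  }
}
```

The clocks are only sampled when a field still looks unset, which keeps the common already-stamped path down to two atomic loads. Each field is a separate 64-bit atomic, so nothing here bounds the delay between the monotonic and realtime samples.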

We also need to make sure all fetchers and watchers treat the same state transition as making the next message available. That can be the publishing compare-and-exchange, or the update of the next pointer. Otherwise we could go backwards in even more subtle ways.

Today, we provide no guarantees about how much delay there can be between when the realtime and monotonic clocks get sampled. The realtime timestamp is guidance only. If we do the writes as two 64-bit atomic writes, we aren't making it worse.

The roboRIO is the only 32-bit platform around, and it is going EOL soon. @jkuszmaul and I vote that we don't fix this on 32-bit machines.

The real challenge is testing. The death test infrastructure should serve us well, along with queue racer.


3 participants