Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite loop of MediaSoup errors #2169

Open
jpd236 opened this issue Jun 7, 2024 · 3 comments
Open

Infinite loop of MediaSoup errors #2169

jpd236 opened this issue Jun 7, 2024 · 3 comments

Comments

@jpd236
Copy link
Contributor

jpd236 commented Jun 7, 2024

My JR instance has been up for ~3 weeks.

Starting on 6/4 (two days after #2167 happened, with nothing else in between in the server logs), the logs started showing a repeated sequence like this:

2024-06-04T02:40:07.127Z mediasoup:ERROR:Channel [pid:54 ChannelMessageRegistrator::RegisterHandler() | throwing MediaSoupError: Channel request handler with ID c93bdf42-c7e6-4f63-b288-f8d9546165d2 already exists
2024-06-04T02:40:07.170Z mediasoup:ERROR:Channel [pid:54 Worker::HandleRequest() | throwing MediaSoupError: Channel request handler with ID c93bdf42-c7e6-4f63-b288-f8d9546165d2 already exists [method:transport.produceData]
2024-06-04T02:40:07.178Z mediasoup:WARN:Channel request failed [method:transport.produceData, id:53]: Channel request handler with ID c93bdf42-c7e6-4f63-b288-f8d9546165d2 already exists [method:transport.produceData]
2024-06-04T02:40:51.254Z mediasoup:ERROR:Channel [pid:54 ChannelMessageRegistrator::RegisterHandler() | throwing MediaSoupError: Channel request handler with ID c93bdf42-c7e6-4f63-b288-f8d9546165d2 already exists
2024-06-04T02:40:51.254Z mediasoup:ERROR:Channel [pid:54 Worker::HandleRequest() | throwing MediaSoupError: Channel request handler with ID c93bdf42-c7e6-4f63-b288-f8d9546165d2 already exists [method:transport.produceData]
2024-06-04T02:40:51.255Z mediasoup:WARN:Channel request failed [method:transport.produceData, id:71]: Channel request handler with ID c93bdf42-c7e6-4f63-b288-f8d9546165d2 already exists [method:transport.produceData]
2024-06-04T02:41:20.346Z mediasoup:ERROR:Channel [pid:54 ChannelMessageRegistrator::RegisterHandler() | throwing MediaSoupError: Channel request handler with ID c93bdf42-c7e6-4f63-b288-f8d9546165d2 already exists
2024-06-04T02:41:20.348Z mediasoup:ERROR:Channel [pid:54 Worker::HandleRequest() | throwing MediaSoupError: Channel request handler with ID c93bdf42-c7e6-4f63-b288-f8d9546165d2 already exists [method:transport.produceData]
2024-06-04T02:41:20.349Z mediasoup:WARN:Channel request failed [method:transport.produceData, id:88]: Channel request handler with ID c93bdf42-c7e6-4f63-b288-f8d9546165d2 already exists [method:transport.produceData]
2024-06-04T02:41:48.808Z mediasoup:ERROR:Channel [pid:54 ChannelMessageRegistrator::RegisterHandler() | throwing MediaSoupError: Channel request handler with ID c93bdf42-c7e6-4f63-b288-f8d9546165d2 already exists
2024-06-04T02:41:48.809Z mediasoup:ERROR:Channel [pid:54 Worker::HandleRequest() | throwing MediaSoupError: Channel request handler with ID c93bdf42-c7e6-4f63-b288-f8d9546165d2 already exists [method:transport.produceData]

The IDs in the messages come and go, but each one appears tens of thousands of times.

Possibly unrelated, but this morning I got a flurry of MongoDB connection errors (MongoPoolClearedError, PoolClearedOnNetworkError, MongoServerSelectionError), and when I tried loading the site a little later it failed the first try and then was quite slow to come up, though it since recovered AFAICT aside from the log spam.

I pushed a new docker image to trigger an update from HEAD (with whatever has landed in Git in the last couple of weeks) + restart and that seems to have stopped the log spam for the moment.

@jpd236
Copy link
Contributor Author

jpd236 commented Sep 28, 2024

Just happened again (found out because my Papertrail account started approaching quota all of a sudden due to the log spam). Settled down again with just a regular restart.

@jpd236
Copy link
Contributor Author

jpd236 commented Dec 5, 2024

Someone just reported that they were having trouble staying connected to audio calls, and I checked the server logs and it had entered this state of spitting out "already exists" errors. They said:

We were all getting hold messages, and then it claimed there were 2-3 versions of each of us in the call while still not having any audio

@jpd236
Copy link
Contributor Author

jpd236 commented Jan 3, 2025

Grasping at straws a bit here, but I wonder if downgrading to mediasoup 3.11.2 would work around this. versatica/mediasoup#952, which was part of mediasoup 3.11.3, was pulled into JR in Nov 2022 in #1131. I'd have thus expected this issue to be observed by JR users in 2023/2024, but maybe somehow our team is triggering a unique scenario that hasn't been triggered before.

In the above error, ChannelMessageRegistrator::RegisterHandler is throwing because a channel already exists. Prior to that PR, if this situation occurs, registerHandler would catch the error and unregister the existing handler. While that seems like a good change in general to make, if there is a bug that causes JR to erroneously try to reuse a channel ID when it shouldn't, it could conceivably turn a one-off error that causes the existing channel to drop into an infinite loop where it repeatedly tries and fails.

So the root question is still why we're getting in this state where the channel already exists unexpectedly and isn't being reused properly, but perhaps downgrading will mitigate the impact of the bug in the near-term, if that version isn't too old/incompatible.

jpd236 added a commit to jpd236/jolly-roger that referenced this issue Jan 3, 2025
This is an attempt at working around deathandmayhem#2169. The bug indicates loops of
tens of thousands of ChannelMessageRegistrator::RegisterHandler failures
all for the same ID due to the handler already existing. In 3.11.2, the
first failure would cause the existing handler to be unregistered, which
would presumably make the second attempt succeed. In 3.11.3, it the
existing handler would be left around, which could certainly explain a
loop of repeated errors as the state isn't changing.

It seems likely that there is still another latent bug causing handlers
to be abandoned instead of being unregistered properly, but perhaps
going back to the old behavior is a better compromise until this can be
fixed properly.

See deathandmayhem#2169
jpd236 added a commit to jpd236/jolly-roger that referenced this issue Jan 3, 2025
This is an attempt at working around deathandmayhem#2169. The bug indicates loops of
tens of thousands of ChannelMessageRegistrator::RegisterHandler failures
all for the same ID due to the handler already existing. In 3.11.2, the
first failure would cause the existing handler to be unregistered, which
would presumably make the second attempt succeed. In 3.11.3, it the
existing handler would be left around, which could certainly explain a
loop of repeated errors as the state isn't changing.

It seems likely that there is still another latent bug causing handlers
to be abandoned instead of being unregistered properly, but perhaps
going back to the old behavior is a better compromise until this can be
fixed properly.

See deathandmayhem#2169
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant