
Chrome screensharing "eats" the quality of concurrent video track #1306

Open
steffentchr opened this issue Dec 5, 2020 · 56 comments
Labels: bug filed (a bug has been filed upstream for this issue), Chrome (issues related to Chrome), Group (issues related to Group Rooms)


steffentchr commented Dec 5, 2020

OVERVIEW

Since switching to Twilio's group room feature, we've seen egregiously poor webcam video quality while screen sharing in Chrome. We're using the 2.18.x version in production, but this happens on the 2.20.0 branch as well.

CAUSE

The problem initially seems to be caused by this exception for Chrome screen sharing made in connection with this previous issue. Skipping this exception resolves the problem in full.

REPRODUCTION CASE

https://gist.github.com/steffentchr/3f7ad9f24c4d2b825b2fbdd300718967

For reference, I have also made a recording of the full session available here:
https://training.twentythree.net/secret/65434556/cdd955e541389dec18f9ac5c9a7ea8c8 (direct download).

DETAILS

Our problem is that the webcam track drops to very low bitrates and extremely poor quality whenever screen sharing is started in Chrome within a group room. In our testing, we have been using this configuration:

  • Two participants, only one broadcasting.
  • The broadcasting participant sends two tracks from the same session.
  • A video track at 720p with priority "high".
  • A screen sharing track at 180p with priority "low" (sharing full screen from a 2020 iMac 27" fwiw)
  • The broadcasting browser tab is in focus throughout the testing session.
  • Bandwidth profiling is turned on (different combinations of codecs and profile mode options were tested with the same, disheartening result.)

The webcam video track quality stalls within 10-20s of screen sharing starting. The quality stays low and erratic even after screen sharing is stopped. It can take multiple minutes after the screen sharing track is removed before video quality recovers.

In the reproduction code linked above, the code creates a high priority video track on start. Two minutes later it adds a low priority screen sharing track, and another two minutes later it removes the screen sharing track again.
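
For orientation, this is roughly what that flow looks like with the twilio-video 2.x API. It is a hedged sketch, not the exact gist code; the room name, bandwidth profile values, and the ~180p screen constraint are approximations of the setup described above:

const { connect, createLocalVideoTrack, LocalVideoTrack } = Twilio.Video; // assuming the CDN bundle

async function repro(token) {
  // Join a group room with bandwidth profiling turned on (placeholder options).
  const room = await connect(token, {
    name: 'repro-1306',
    bandwidthProfile: { video: { mode: 'collaboration', dominantSpeakerPriority: 'high' } }
  });

  // Publish a 720p camera track with priority "high" on start.
  const camera = await createLocalVideoTrack({ width: 1280, height: 720 });
  await room.localParticipant.publishTrack(camera, { priority: 'high' });

  // Two minutes later, publish a low-priority screen sharing track (~180p).
  setTimeout(async () => {
    const stream = await navigator.mediaDevices.getDisplayMedia({ video: { height: 180 } });
    const screen = new LocalVideoTrack(stream.getVideoTracks()[0], { name: 'screen' });
    await room.localParticipant.publishTrack(screen, { priority: 'low' });

    // Another two minutes later, remove the screen sharing track again.
    setTimeout(() => {
      room.localParticipant.unpublishTrack(screen);
      screen.stop();
    }, 2 * 60 * 1000);
  }, 2 * 60 * 1000);
}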

During this process, the video quality drops from 2.3M initially to 300K after the screensharing track is added. As you can see from the chart, the lowered quality persists for a significant time even after the track is removed again.

Screenshot 2020-12-04 at 02 35 15

When running the same code in a Twilio P2P room, the problem is gone:

Screenshot 2020-12-04 at 02 35 20

As mentioned in the overview above, the problem seems to happen because of an exception in the library made for Chrome screen sharing tracks. When this exception is removed and the bitrate preference is correctly applied, the broadcast quality for the video track stays at the expected rate:

Screenshot 2020-12-05 at 17 56 12

As an aside, we have seen production cases on p2p similar to what's reported in JSDK-2557, where Chrome's screen sharing would send 0 bytes from the encoder/the encoder wouldn't start. The upsides of the solution as implemented, however, do not seem to outweigh the severe consequences on video quality reported here.

@steffentchr (Author)

Adding as well that this was previously submitted to Twilio support with ticket # 5500109.
An example of a room where the problem was observed is sid RM6c5599d1c06aeef89e26433f612dd73a.

@manjeshbhargav (Collaborator)

Hi @steffentchr ,

Sorry for the delayed response. I have created an internal JIRA ticket to track this work. I'll keep you posted.

Thanks,

Manjesh

manjeshbhargav added the Chrome and Group labels on Jan 30, 2021
charliesantos added the bug filed label on Mar 8, 2021

jayphawk commented May 13, 2021

Hello all,

Has there been any movement on this issue?

Thanks,

Jason

@charliesantos (Collaborator)

Hey @jayphawk , thanks for the ping! We don't have any updates on this one as of now due to other higher priority items we have on the roadmap. We already have an internal tracker filed and it will be reviewed for prioritization.

Thank you,
Charlie

@jayphawk

Thanks for the update @charliesantos. I'm disappointed it's not a higher priority since it is greatly affecting us, but I understand.

@steffentchr (Author)

@charliesantos Any news on the prioritization of this issue?

@charliesantos (Collaborator)

@steffentchr I bumped this up internally and we'll consider it in the next planning session. Meanwhile, are you able to observe the same issue in our reference react app? https://github.com/twilio/twilio-video-app-react

@steffentchr (Author)

To be frank, I spent time on providing the cleanest possible reproduction case rather than debugging the example app ;)
I can categorically say that the problem appears with the Twilio Video SDK, but haven't tested when that SDK is soaked in other code.

@jayphawk may be able to add additional information around how he's observing the problem.

@jayphawk

Unfortunately, I can really only offer an end-user perspective right now through a program called Switcher Studio.

We’ve done testing comparing the experience when screensharing is used versus when it’s not used, as well as starting screensharing and then stopping it to see whether the video recovers, which it does.

It’s an issue we’re watching closely because of the way we’re utilizing Switcher Studio and the screensharing functionality. There’s a significantly noticeable difference in video quality shortly after screensharing is initiated. Once screensharing is stopped, the video quality returns to high quality after a bit of time.

I’ve reported the issue to the Switcher Studio team, so I’ll see if I can get more information from them on what they see.

@jayphawk

> @charliesantos Any news on the prioritization of this issue?

@Bumbolio, would you mind looking into this issue from your end to see if you can replicate it? We are still experiencing this issue in Switcher Studio and it is seriously impacting our use of the platform.

@charliesantos (Collaborator)

Hi @jayphawk , thanks for the ping.

> The problem initially seems to be caused by this exception for Chrome screen sharing made in connection with this previous issue. Skipping this exception resolves the problem in full.

Per your comment above, please confirm that the issue is no longer reproducible if we remove that check. See below:

if (maxBitrate === null || maxBitrate === 0) {
  removeMaxBitrate(params);
} else {
  setMaxBitrate(params, maxBitrate);
}

We have to investigate and do some testing to make sure removing that check will not cause any other issues.
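
For readers following along: at the standard WebRTC level, "setting the max bitrate" for a sender boils down to something like the sketch below. This is an illustration only -- the SDK's setMaxBitrate/removeMaxBitrate helpers may apply the cap differently (e.g. via SDP) -- but it shows what the check above effectively skips for Chrome screen share tracks:

// Illustration: cap (or uncap) a video sender's bitrate with the plain
// RTCRtpSender API. Treat this as a sketch of the effect, not the SDK's code.
async function applyMaxBitrate(sender, maxBitrate) {
  const params = sender.getParameters();
  params.encodings = params.encodings && params.encodings.length ? params.encodings : [{}];
  params.encodings.forEach(encoding => {
    if (maxBitrate === null || maxBitrate === 0) {
      delete encoding.maxBitrate;        // no cap: congestion control decides
    } else {
      encoding.maxBitrate = maxBitrate;  // cap in bits per second
    }
  });
  await sender.setParameters(params);
}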

@jayphawk

Thanks for the reply, @charliesantos.

That seems to be the case from what I can tell, but I cannot take credit for the testing and documentation that @steffentchr provided. Thanks.

@charliesantos (Collaborator)

> Thanks for the reply, @charliesantos.
>
> That seems to be the case from what I can tell, but I cannot take credit for the testing and documentation that @steffentchr provided. Thanks.

@steffentchr please confirm.

@steffentchr (Author)

@charliesantos Yes, modifying the check resolves the problem. I have just retested this with the master version of the SDK.

For full visibility, some notes on the testing:

git clone git@github.com:twilio/twilio-video.js.git
cd twilio-video.js
npm i && npm run build:quick

After this I used the dist/ bundle with my reproduction code linked above.

Current master exhibits the problem:
Screenshot 2021-11-19 at 15 10 31

I removed the special case check:
Screenshot 2021-11-19 at 15 01 12

With the change, the drop in quality is gone:
Screenshot 2021-11-19 at 15 15 25

@jayphawk

Thanks @steffentchr!

@charliesantos, what else is needed to move this fix along? It would be brilliant to see this resolved after hovering for so long!

@Bumbolio

@jayphawk We're not currently setting a max bitrate for the screen sharing tracks, so the solution above would not have any impact. If we start setting a max bitrate, it could allow more bandwidth for other video tracks, but it would reduce the quality of the screen-sharing track. Twilio allows a maximum of 4Mbps for all video and audio tracks being received, so it's a careful balance between screen sharing and other video tracks. Currently, we set the screen-sharing track to the highest priority; without a bitrate cap, I could see it using up all the available bandwidth and reducing the quality of the other video tracks.

We'll look into this fix and experiment with setting a max bitrate. I have concerns that removing the JSDK-2557 patch could cause some users to not be able to do screen sharing at all in Chrome unless this bug has been fixed in Chromium.

@kasperkronborg

What came of the experiment with setting a max bitrate, @Bumbolio?
And is there any update on when we can expect this to be fixed? If we had a date we could somewhat aim towards, that would give us a chance to start planning it into our internal roadmap for updating dependent subsystems from the get-go.

@charliesantos (Collaborator)

Hi everyone, we are still evaluating the right way to fix this. While @steffentchr's suggestion may work, we're afraid it will re-introduce some of the old issues. We are also addressing other higher priority items right now while working through this. Please bear with us in the meantime.

@steffentchr (Author)

@charliesantos Appreciate the answer on status, and as customers we certainly feel the pain of other issues. Having said that, this is a confirmed issue affecting production use across your customers, and it was originally reported against 2.8.x. We're now at 2.18.x and a year later, so we would cherish any action taken to move us forward.

@akashdusky

> To be frank, I spent time on providing the cleanest possible reproduction case rather than debugging the example app ;) I can categorically say that the problem appears with the Twilio Video SDK, but haven't tested when that SDK is soaked in other code.
>
> @jayphawk may be able to add additional information around how he's observing the problem.

This actually also happens with the ION SFU SDK.

@steffentchr (Author)

@makarandp0 I see that considerable work was done on adaptive streaming and network management at 86924be.

Is it about time to review this one again? Or at least to do away with the JSDK-2557 exception as proposed above?

@steffentchr (Author)

@charliesantos @makarandp0 For good measure I retested the bug and reproduction case linked above, and the problem remains:

Screenshot 2022-02-14 at 12 34 22

@charliesantos (Collaborator)

Thanks for the ping @steffentchr. Which version of the SDK did you use for your most recent testing?

@steffentchr (Author)

@charliesantos The testing is with the 2.20.0 version directly from the CDN.

I don't know if this adds additional clarity to the issue, but this is a quick recap of throughput before, during, and after a screen sharing track is added:
Screenshot 2022-02-14 at 21 41 15

We see this issue in monitoring across customers with qualityLimitationReason reported as bandwidth.
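
(For reference, qualityLimitationReason is the standard outbound-rtp stat; it can be read roughly like this -- a sketch, assuming a handle on the underlying RTCPeerConnection:)

// Sketch: read qualityLimitationReason for outgoing video from getStats().
// Assumes `pc` is the underlying RTCPeerConnection.
async function videoQualityLimitation(pc) {
  const report = await pc.getStats();
  const reasons = [];
  report.forEach(stats => {
    if (stats.type === 'outbound-rtp' && stats.kind === 'video') {
      reasons.push({ ssrc: stats.ssrc, reason: stats.qualityLimitationReason });
    }
  });
  return reasons; // e.g. [{ ssrc: 12345, reason: 'bandwidth' }]
}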

We did some additional testing today, and the bug potentially isn't present on a wired network connection (we tested and saw the problem on multiple wifi connections, but couldn't reproduce it on the single ethernet connection available, so take this with a grain of salt). It isn't limited to a few networks, though, as we have seen the problem across thousands of Twilio rooms since the initial report. Crucially, we also tested the same pattern (adding a video and a screen sharing track) with Google Meet and did not see a quality decline under the same network conditions.

Just let me know if you need any additional information; as you can tell, I'm anxious to have this issue resolved.

@charliesantos (Collaborator)

Thanks @steffentchr . Another question, are you seeing this issue on both group and P2P rooms?

@steffentchr (Author)

In the original issue report, I noted that this was only the case for group rooms, not p2p. But hang on, and I'll quickly confirm this....

@charliesantos (Collaborator)

@steffentchr please post the room sids here as well. Thank you!


steffentchr commented Feb 14, 2022

Okay, I have just run a quick test in RMaf53102f1286e6d387294b84143596a4 with type=peer-to-peer. I repeated it a few times using the same room and got the same result, but still take it with a grain of salt.

The short version is that the problem doesn't show up in the p2p room. The video bandwidth dipped a few percent while the screen sharing track was on, but nothing near what we see in the group room -- and throughout, the availableOutgoingBitrate remains steady:

Screenshot 2022-02-14 at 23 22 36

Screenshot 2022-02-14 at 23 23 02

@steffentchr (Author)

For the fun of it, I did a quick reproduction in a group room, RMd8c38f31c6ea4397353af8aac788a7a1. The code for the clean reproduction case has also been updated to use 2.20.0.

@charliesantos (Collaborator)

@steffentchr we didn't see any participant join the p2p room in your example RMaf53102f1286e6d387294b84143596a4.
Can you please run an example with participants in the room?


steffentchr commented Feb 15, 2022

I've run a few tests, but to make sure there's a participant joining the p2p room, the code is slightly more complex than the repro case linked above. For control, I ran a group room (RMb178eb6f46a535b9ed1badb9af404d06), which clearly shows the pattern. I also ran two tests in p2p rooms (RMda2cbd0a51b83a5b3f4dee23b3cca707 and RMc7efaac579d116bb6ebe7f11e68de7ab).

I draw two conclusions from the tests:

  • Yes, the problem appears in P2P rooms.
  • But for whatever it's worth, the problem is less immediate in the two tests: in the group room, the quality drop happens after 1 minute, and in the p2p rooms it takes 2 minutes. The available bitrate bottoms out at 3M in the p2p rooms and under 2M in the group room (this could also just be a delay, though).

I'm attaching similar visualizations to the ones above to keep everything somewhat comparable.


Screenshot 2022-02-15 at 21 22 17


Screenshot 2022-02-15 at 21 22 22


Screenshot 2022-02-15 at 21 22 31

@charliesantos (Collaborator)

Hi @steffentchr thanks for providing more details. I spent some more time tracking this down but unfortunately, I'm not able to reproduce the issue.

Looking at the room sids you provided, the video SDK is still at 2.18.3, Chrome is at version 86, and there are no participants in the p2p room. I understand it is harder for you to test participants joining a p2p room due to the complexity of your setup, as you've mentioned, but it would really help us narrow down the issue if we could get logs for both p2p and group rooms with participants joining.

I will summarize my requests. Please provide as much as possible.

  • room sids for both p2p and group rooms
  • update and test the latest version of sdk
  • test on the latest version of chrome (v98+)
  • please try on safari and firefox and let us know if it's also reproducible on both

@steffentchr (Author)

Alright, I've spent some time compiling everything for comparison. These charts are drawn up using nothing but the reproduction code now available from https://github.com/steffentchr/twilio-issue-1306. Code is vanilla against the 2.20.0 SDK. Please refer to the code to be able to consistently reproduce this as well.

For this overview, I've focused on availableOutgoingBitrate, and I've used Chrome 98 to broadcast and Firefox to receive the stream. I have included a Google Meet example of the same test case, even if it's a bit apples-and-oranges.

Screenshot 2022-02-21 at 16 04 45

How I read the findings:

  • In peer-to-peer rooms, everything works perfectly (RMbb2cc9d060dc3062d9fce090069d0af0 is a wireless example)
  • On an ethernet connection, everything works perfectly (RM44b763c7dbab95161cba0a015a11b185 is a group room, wired example)
  • On a wifi connection in a group room, the bandwidth seems to be managed much more aggressively (RM3fb67d61575862687c1a8d80646e79b0 is an example without screensharing)
  • Google (I guess) does some dynamic throttling, but in the Twilio test case, adding the screen sharing track flatlines the bandwidth from ~6M to <1M -- and then it recovers immediately after the screen sharing track is removed (RM2303e58152b7539836690489af2f94c6 is the main example of the problem).

I also tested with Firefox broadcasting in a group room on wifi (RM472ce19fc6a8f2e77437d8f851fa933d). I don't have the same nice charts, but this is the available outgoing bitrate at 5.35M before screen sharing starts:
Screenshot 2022-02-21 at 16 37 22

The bandwidth remains stable while the screen sharing track is present, for example here at 5.66M two minutes after it's been added:
Screenshot 2022-02-21 at 16 39 48

Finally, I tested in Chrome with only a single screensharing track. In a group room on wifi (RM9339d0d8d3472ff54db87bcb8311419b), the outgoing bitrate is capped to 30000 bytes, which seems crazy on its own.

To summarize: Something is off in Chrome+group room+wifi when a screen sharing track is added. Quality plummets while the screen sharing track is on, and recovers after the screen track is removed.

@charliesantos (Collaborator)

Thanks @steffentchr for the detailed information. We'll investigate further.

@charliesantos (Collaborator)

Hi @steffentchr , the data you provided and how you presented it really helped a lot! Thanks so much!
To investigate further, and since you're able to easily reproduce it, we would like to ask if you can run additional tests when you get a chance.

For each test, please use the same configuration you have, with some minor changes. All tests should use group rooms; ignore p2p rooms. (A sketch of how simulcast is typically toggled follows the test list.)

Test 1) same test with simulcast turned off, also collect the available incoming bitrate from the receiver (subscriber) side
Test 2) same test with simulcast turned off, and no receiver on the other side. That is, just one participant publishing with no subscriber.
Test 3) same as test#1 but with simulcast turned on
Test 4) same as test#2 but with simulcast turned on
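
A sketch of what "simulcast turned on/off" means here, assuming it is toggled via the usual preferredVideoCodecs connect option:

// Sketch: VP8 simulcast on (tests 3 and 4) or off (tests 1 and 2).
async function joinRoom(token, simulcast) {
  return Twilio.Video.connect(token, {
    name: 'repro-1306',  // hypothetical room name
    preferredVideoCodecs: [{ codec: 'VP8', simulcast }]
  });
}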

@charliesantos (Collaborator)

@steffentchr , can you also please submit a support ticket and mention this github issue? We might need some account related information from you.


steffentchr commented Feb 23, 2022

@charliesantos No problem, certainly reproduction is easy.

This issue was filed with support on December 4th, 2020, ticket id #5500109.


This should cover the suggested test cases. Note that all the tests above are with {simulcast:false}, but that we usually use simulcast in our app, and have seen the issue consistently there.

This is also reflected in the test, with the issue present in all four test cases:
Screenshot 2022-02-23 at 22 59 16


I know it's clear from the description above that the problem here is specifically present when a screen sharing track is added.

In fact, if I add two video tracks, the available bandwidth goes up:

Screenshot 2022-02-23 at 23 03 14

@steffentchr (Author)

Another way of showing that this bug is specifically related to the handling of Chrome screensharing is to pipe the screensharing through an OffscreenCanvas -- with the same content and conditions, the problem magically disappears:
Screenshot 2022-02-24 at 14 24 10

The second example on the illustration above adds a NothingProcessor video processor:

screen.addProcessor(new Twilio.VideoProcessors.NothingProcessor());
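
For anyone who wants to try the same trick, the pass-through processor is roughly the following (a sketch against twilio-video's VideoProcessor interface, where processFrame receives the input frame buffer and an output canvas):

// Rough equivalent of the pass-through processor: copy every input frame onto
// the output canvas, forcing the track through the OffscreenCanvas pipeline
// without changing the content.
class NothingProcessor {
  processFrame(inputFrameBuffer, outputFrameBuffer) {
    const ctx = outputFrameBuffer.getContext('2d');
    ctx.drawImage(inputFrameBuffer, 0, 0, outputFrameBuffer.width, outputFrameBuffer.height);
  }
}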

@IvanKalachev

Hello,

We have exactly the same problem. In our app the user can have up to two video feeds and one screen share. When only one or two of the video feeds are active everything seems OK, but when the screen share track is added then the quality of the video feeds drops significantly. We are using group rooms without bandwidth profile and without simulcast.

We've implemented the proposed workaround, but it seems to have no effect. Do we have to set maxVideoBitrate on connect for the workaround to take effect, or should we set the bitrate only for the screen share track -- and if so, how?
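
For concreteness, these are the two knobs we are aware of (a sketch based on the documented ConnectOptions/EncodingParameters API; as far as we can tell, neither is per-track):

// Sketch of the two documented bitrate knobs; both apply to all outgoing video.
async function joinWithVideoCap(token) {
  // Option A: cap outgoing video at connect time via ConnectOptions.
  const room = await Twilio.Video.connect(token, {
    maxVideoBitrate: 2500000  // bits per second (placeholder value)
  });

  // Option B: adjust the cap later via EncodingParameters.
  room.localParticipant.setParameters({ maxVideoBitrate: 2500000 });
  return room;
}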

@charliesantos (Collaborator)

Thanks @steffentchr for the detailed results! We're still reviewing and we'll update this space once we have more information.

@IvanKalachev , thanks for the report. We're still investigating this issue.

@nicopernas

@steffentchr I'm trying to repro your scenario but I'm not able to. Any chance you could share a full WebRTC internals dump with us? It would be fantastic if you could also enable Chrome's logging like so: --enable-logging=stderr --v=1 --vmodule=*webrtc*=9 and share those too. Thanks.

(I can email you directly if you are not comfortable sharing those here)

@steffentchr (Author)

@nicopernas Of course. I'm surprised, though, that reproduction is proving elusive -- anyway, screenshot and Chrome log attached for RM4e8fe603aa779794952fa9bc12df0596:

Chrome debug log: https://we.tl/t-ZAUKt8ERT3

Screenshot 2022-04-25 at 17 45 22

@nicopernas

@steffentchr would it be possible to attach the full dump though? You can get it from here:
Screenshot from 2022-04-26 12-18-14

@steffentchr (Author)

Sure, RMca9e3921ea40d81b6c93199e71aeda8e and this timing:
Screenshot 2022-04-26 at 12 36 25

webrtc internals.zip

@nicopernas

Thanks for sharing. Here's a plausible explanation of what you are seeing, with a weird twist at the end.

This picture shows the available outgoing bitrate and "bytes sent (in bps)" on the first plot, as well as the reported packet loss on the camera feed track on a second plot. I added the vertical line to visualize where the screen share track started.

webrtc-internals-from-customer

The key thing to notice is how the packet loss trends up constantly, just when the bandwidth estimation plummets. A clear explanation for that would be that your network is indeed congested, so in order to prevent packet loss, the backend tells the browser to "start sending fewer bytes". When it does, and the bandwidth estimation (i.e. available outgoing bitrate) decreases, the packet loss is gone.

The following data was taken from our monitoring infrastructure. We can see the REMB values (i.e. bandwidth estimation) sent by the Twilio backend to your browser. This is only to show that it is us telling your browser to "send fewer bytes".
backend-remb

The thing that still bothers me is why this happens only when the screen share is added. So far, I hadn't been able to repro your scenario, but seeing the increasing packet loss in your data gave me an idea: I took my laptop and went outside to the terrace (far away from my home's router). It only took me one try to get an exact repro.

webrtc-internals-repro7

The important part from my repro is that there's very little to no packet loss being reported on the camera feed track.

From the repro, I collected extra data that I couldn't get from you so we might get lucky now :)

@steffentchr (Author)

Hi Nicolás -- great to hear that you were able to reproduce, and it's quite fascinating that there's a difference in packet loss on the two different feeds going over the same connection.

From the write-up, I agree with the assumption that this is related to network congestion, but it's worth adding that I don't believe it to be specific to my network -- rather, this is pretty common. We log webrtc stats from rooms every ten seconds (conceptually as sketched at the end of this comment), and I looked at the last ~600k reports:

  • With just a video feed, 8% of reports have limited available bandwidth (<1.5 mbps)
  • With both a video feed and screen feed, that number jumps to 21%.

Anyways, great that you have the data -- so please keep us updated as you work the problem.
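
For reference, the per-room sampling behind those numbers is conceptually as simple as this (a sketch, assuming a handle on the underlying RTCPeerConnection):

// Sketch: sample availableOutgoingBitrate from the active candidate pair
// every ten seconds. Assumes `pc` is the underlying RTCPeerConnection.
setInterval(async () => {
  const report = await pc.getStats();
  report.forEach(stats => {
    if (stats.type === 'candidate-pair' && stats.nominated) {
      console.log('availableOutgoingBitrate (bps):', stats.availableOutgoingBitrate);
    }
  });
}, 10000);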

@nicopernas

@steffentchr have you got a testing account with us? I'd like to enable something in the backend for you to test. Could I use the same account behind room RMca9e3921ea40d81b6c93199e71aeda8e?

@steffentchr (Author)

Hi Nicolás -- please use ACfff0fa30c3504381fbf5e5bc709f1820 for the test. (I believe it already has a slightly different congestion algorithm enabled, so please double check that there isn't a clash there. Either way, we're excited to test.)

@nicopernas

@steffentchr can you please re-test the same scenario using the ACfff0fa30c3504381fbf5e5bc709f1820 account and share the room sid?

@steffentchr (Author)

I don't know, @nicopernas -- I think our test account has an extremely aggressive congestion control algorithm turned on, which makes everything pretty erratic. Any chance of turning it off so I can run a clean test?

For reference, these are the results in RMbaf6da5e45584324333c1cd8e54cc2e0:
Screenshot 2022-05-02 at 17 54 21
As you can tell, everything goes a bit in all directions, but the key takeaway is maybe that there are only short-lived peaks -- and cases where the bandwidth bottoms out at 200k with screensharing on.

@steffentchr (Author)

Any news @nicopernas ? 😄

@nicopernas

Apologies for the silence. What you are seeing there is the browser's congestion control going haywire. If you look at the browser debug logs, you'll see Chrome's probing mechanism "saying that it can't seem to receive that much bandwidth", thus causing the bandwidth estimation to drop. This seems to happen only when a screen share track is published along with a "normal" video track. Note, though, that the original repro you got was using the "legacy" congestion control algorithm (which runs on the receiver) that we are currently trying to phase out, so even if we fixed that scenario it wouldn't be of much help. Hope that makes sense.

Since this seems to be happening only with Chrome, we are reaching out to their team so they can help us understand what's going on. I'll keep you posted.

@xtianjohns

@nicopernas or @makarandp0, is there any update from Chrome on this problem? Is there a bug opened with them where we might be able to track progress or see similar complaints from other folks doing screen sharing over WebRTC in Chrome?

@charliesantos (Collaborator)

Hey @xtianjohns here's a chrome bug related to this issue https://bugs.chromium.org/p/webrtc/issues/detail?id=14051

@steffentchr (Author)

@charliesantos Great to see that this is being escalated and looked at. The conversation on the Chrome bug is ~6 weeks old by now, though, and seems to be pointing to Twilio configurations rather than something within the Chrome code base. What's the next step? Anything we can do to help this along?

@charliesantos (Collaborator)

Thanks for the ping @steffentchr . We're currently looking at this. We'll let you know if we need additional information.
