Youtube videos are not working anymore #323
Comments
It looks like the YouTube player has changed and is now issuing range requests. Old URLs looked like this:
New URLs look like this:
Among other changes, there is a new Transferring this issue to warc2zim repo. |
Unfortunately, the new It also looks that we should take into account two different kinds of dataset, one with an What we have in the ZIM:
First requests issued are ok:
But then, for some reason, the player seems to try to fetch fewer bytes than on the real website, and then, realizing these calls don't work, it tries to fetch even fewer bytes (which does not work either):
It then stays stuck trying to fetch these bytes, cycling forever. It may be worth noting that just before these "changed" ranges, two requests fail with a 404, whereas on the real website they return a 204. This is expected because warc2zim does not support 204. I don't think the two are related, because these calls do not happen at exactly the same moment on the real website.
Other calls also occur after the failures, but they are probably less interesting to analyze. |
Returning an empty 200 response on calls to www.youtube.com/api/stats/* and www.youtube.com/ptracking* does not help at all (originally it is supposed to be a 204, maybe that's the problem). |
Can we "force" YT to behave the previous way, maybe with a user agent? |
That would be a workaround, not a fix, obviously. |
I spent some time trying to figure out whether there is a new switch in the player configuration which could be used to disable the new behavior and go back to the previous one, but I have failed so far.
This seems to be a promising track: on iOS 13 we still have the "old" mode, while on iOS 17 we have the "new" mode. I'm trying to see whether I can find the "magic switch". |
Player code is composed mostly of 1 HTML and 3 JS files are purely identical between iOS v13 and v17:
https://www.youtube.com/youtubei/v1/player?prettyPrint=false : (edit: JSON with) lots of different things, but nothing looking like a switch, mostly only URLs which are different due to tracking
https://www.youtube.com/embed/hQXa6TkSeH0 : this is the HTML, and seemed to be the most promising file, with lots of differences in the
All these modifications were done successfully ... but the player behavior is still the same, it adds the range query parameter in the calls to
So the most likely explanation is that detection is done client-side on how to call the |
I am struggling to reproduce the old behavior with the crawler. For the record:
With all this, I'm now mostly stuck: I cannot reproduce the old behavior inside the crawler, and I cannot find which switch / modification is needed in the YouTube player configuration / codebase to bring back the old behavior. Of course, I may have missed something (that would be good news indeed). |
@benoit74 If the whole mp4 has actually been scraped and is in the ZIM, then I think there is hope. A bit of brainstorming, in order of viability:
As Range Requests are also involved in solving kiwix/kiwix-js#1256, I'll do some investigating as well. I think the test ZIM has disappeared, though. Do we have one to work on that is displaying the 4-second bug? |
Thank you!
There is always hope in IT ^^ Sometimes it demands a big effort. The whole video is indeed present inside the WARC. It is not yet correctly transferred to the ZIM (but this is mostly a detail).
The WARC is valid (it represents with high fidelity what happens in the browser) BUT it is not replayable with replayweb.page, because wabac suffers from the same problem as warc2zim.
We do not use the upstream RegEx / fuzzy rules. I've modified the fuzzy rules (by adding the range + aitags distinction) so that everything is inside the ZIM. Unfortunately this is not sufficient, because the player dynamically adjusts the requested range.
There are no longer any real Range Requests like there were a few days ago (with a Range header, a 206 return code, ...). There are just "normal" GET requests with a 200 return code. The range is specified as a query parameter, so the browser and all other systems have no idea it is in fact a range request.
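To illustrate the difference, here is a minimal Python sketch. It is not taken from warc2zim or wabac.js; `query_range` is a hypothetical helper, and the sample URL is a heavily shortened version of the `videoplayback` URL quoted in this issue. It only shows that the byte range now has to be dug out of the query string, since nothing at the HTTP level marks the request as partial.

```python
from urllib.parse import urlsplit, parse_qs


def query_range(url: str):
    """Return (start, end) from a `range=start-end` query parameter, or None.

    Hypothetical helper for illustration only: with the new player the
    range is ordinary URL data, not a `Range:` request header.
    """
    values = parse_qs(urlsplit(url).query).get("range")
    if not values:
        return None
    start, sep, end = values[0].partition("-")
    if not sep or not start.isdigit() or not end.isdigit():
        return None
    return int(start), int(end)


# Shortened form of the URL quoted in this issue (most parameters dropped).
sample = (
    "https://rr3---sn-ixh7yn7e.googlevideo.com/videoplayback"
    "?itag=243&range=197056-262591&rn=15"
)
print(query_range(sample))
```

Because the range is part of the URL, two requests for different byte ranges are simply two different URLs as far as the browser, the WARC and the ZIM are concerned.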
Again, no more range requests. I tried to find a magic switch in the player configuration to restore the Range requests which were working a few days ago, no luck so far.
We always refused to go this way, I don't think this is going to change, but good to mention as a nuclear solution.
I will update the issue with links in a few minutes. |
So this issue is highly relevant for @ikreymer too, as a fix for YouTube video replay in Replayweb.page might well be applicable or adaptable here? |
I've uploaded the ZIM and WARC files to https://tmp.kiwix.org/ci/test-warc/100r.co/ Both ZIMs have just been built with the same 2.0.2 warc2zim.
Yep, I just waited a bit to have enough information before opening an issue in their repo, to confirm I wasn't reporting false information and to avoid duplicating effort if the fix was "achievable" by me in a few hours (they put in plenty of effort for us, so I don't mind helping them a bit when possible). It's now time to open the ticket. |
Nota: the test WARC and ZIM contain only the https://100r.co/site/orca.html page, to be as lightweight as possible, so do not try to navigate outside of the page ^^ |
I've downloaded those files, and I corroborate (via testing in the Kiwix PWA and with replayweb.page) that the "new" versions of both the ZIM and the WARC are unable to play the Orca video, while the "old" versions of both can play the Orca video. |
Problem is fixed in Browsertrix Crawler 1.2.0, see webrecorder/wabac.js#182 Transferring this back to zimit, nothing needs to be done in warc2zim |
That's good news. I'm glad a simple upgrade resolves it. |
Yep, me too. But it is important to note that this is only because the webrecorder team was able to quickly look into the issue, then find and implement a workaround (and it is only a workaround, not sure it will work in the medium term; we still have a pretty big concern about this new player, they just forced it to fall back to the previous "mode") |
Ah, thanks for the explanation. Well at least it buys some time to find a longer-term solution... |
Well, it was always a workaround. It was doing the same thing, just in a different way: it was explicitly disabling DASH via rewriting in JS, now it's done by disabling |
Again, Youtube videos are not working anymore.
The problem is different from previous times.
The test has been done with the current codebase (2.0.2) and a single page at https://100r.co/site/orca.html
The issue is not limited to the 100r.co website; the same problem has been observed on the test website, for instance.
The YouTube video plays a bit better on replayweb.page, even with a recent WARC, but only for the first 3 or 4 seconds of the video. The fuzzy rule does not seem to be applied at all here; it loads URLs like
https://replayweb.page/w/id-91fd65096030/20240620091901mp_/https://rr3---sn-ixh7yn7e.googlevideo.com/videoplayback?expire=1718896744&ei=CPRzZpLcFfWPv_IPjvSygAo&ip=135.181.181.97&id=o-AL0BCMRMuaikI70Sw1cEvsUP0llpXaRbt2RUG1K9pUl0&itag=243&aitags=133,134,135,160,242,243,244,278&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ==&mh=kw&mm=31,26&mn=sn-ixh7yn7e,sn-ugo7dn76&ms=au,onr&mv=m&mvi=3&pl=17&initcwndbps=458750&bui=AbKP-1Mo9GjiRwRL0ikYYxkrsoWyL4xu850dxEbOX2h0DN0GFKnazM15XVrw7NHp2_4QcfFEvMbTM4Kp&spc=UWF9f5U2kJlHxL16IKZBjf75XUuEd1D1UAZ7taixAMll90pghSMp&vprv=1&svpuc=1&mime=video/webm&ns=0J7ZUEUnZavF_9CYyLqdyIoQ&rqh=1&gir=yes&clen=1005408&dur=75.242&lmt=1636152001175039&mt=1718874771&fvip=4&keepalive=yes&c=WEB_EMBEDDED_PLAYER&sefc=1&txp=5311222&n=hMSqp81HO-ISEA&sparams=expire,ei,ip,id,aitags,source,requiressl,xpc,bui,spc,vprv,svpuc,mime,ns,rqh,gir,clen,dur,lmt&sig=AJfQdSswRQIgRtqyjdlrY2MlyHCQKOG0GqLCG8_iwSmp9RmCzLI8dZ4CIQCXgGClxMhvR1jhejK7i9fG3E6dsjNSmb3WX88lva7fNQ==&lsparams=mh,mm,mn,ms,mv,mvi,pl,initcwndbps&lsig=AHlkHjAwRQIgPb5Nsm9l8o1yBGPu3Y9OIr-UIZvJTY09UNOyLDM8-70CIQCty0yVKGQsCtS7GBE5FeDwqkiWEh3nSbVkrGY3y4bftw==&alr=yes&cpn=M9Gnq1A5TBTMVkJn&cver=1.20240616.00.00&range=197056-262591&rn=15&rbuf=3032&pot=KoEQBFBNRDpVbmRlZmluZWQ6Ok9xQGh0dHBzOi8vcmVwbGF5d2ViLnBhZ2Uvdy9pZC05MWZkNjUwOTYwMzAvMjAyNDA2MjAwOTE5MDFtcF8vaHR0cHM6Ly93d3cueW91dHViZS5jb20vcy9wbGF5ZXIvODQzMTRiZWYvcGxheWVyX2lhcy52ZmxzZXQvZW5fVVMvYmFzZS5qczo5OTc6MTMxCldnYUBodHRwczovL3JlcGxheXdlYi5wYWdlL3cvaWQtOTFmZDY1MDk2MDMwLzIwMjQwNjIwMDkxOTAxbXBfL2h0dHBzOi8vd3d3LnlvdXR1YmUuY29tL3MvcGxheWVyLzg0MzE0YmVmL3BsYXllcl9pYXMudmZsc2V0L2VuX1VTL2Jhc2UuanM6MTAwOTozNQpZZ2EvPEBodHRwczovL3JlcGxheXdlYi5wYWdlL3cvaWQtOTFmZDY1MDk2MDMwLzIwMjQwNjIwMDkxOTAxbXBfL2h0dHBzOi8vd3d3LnlvdXR1YmUuY29tL3MvcGxheWVyLzg0MzE0YmVmL3BsYXllcl9pYXMudmZsc2V0L2VuX1VTL2Jhc2UuanM6MTAxMjozODkKSWFAaHR0cHM6Ly9yZXBsYXl3ZWIucGFnZS93L2lkLTkxZmQ2NTA5NjAzMC8yMDI0MDYyMDA5MTkwMW1wXy9odHRwczovL3d3dy55b3V0dWJlLmNvbS9zL3BsYXllci84NDMxNGJlZi9wbGF5ZXJfaWFzLnZmbHNldC9lbl9VUy9iYX
NlLmpzOjE2OTo0MAppYWEvdGhpcy5uZXh0QGh0dHBzOi8vcmVwbGF5d2ViLnBhZ2Uvdy9pZC05MWZkNjUwOTYwMzAvMjAyNDA2MjAwOTE5MDFtcF8vaHR0cHM6Ly93d3cueW91dHViZS5jb20vcy9wbGF5ZXIvODQzMTRiZWYvcGxheWVyX2lhcy52ZmxzZXQvZW5fVVMvYmFzZS5qczoxNzA6OTIKYkBodHRwczovL3JlcGxheXdlYi5wYWdlL3cvaWQtOTFmZDY1MDk2MDMwLzIwMjQwNjIwMDkxOTAxbXBfL2h0dHBzOi8vd3d3LnlvdXR1YmUuY29tL3MvcGxheWVyLzg0MzE0YmVmL3BsYXllcl9pYXMudmZsc2V0L2VuX1VTL2Jhc2UuanM6MTc0OjQwCnByb21pc2UgY2FsbGJhY2sqZkBodHRwczovL3JlcGxheXdlYi5wYWdlL3cvaWQtOTFmZDY1MDk2MDMwLzIwMjQwNjIwMDkxOTAxbXBfL2h0dHBzOi8vd3d3LnlvdXR1YmUuY29tL3MvcGxheWVyLzg0MzE0YmVmL3BsYXllcl9pYXMudmZsc2V0L2VuX1VTL2Jhc2UuanM6MTc2OjkxCnByb21pc2UgY2FsbGJhY2sqZkBodHRwczovL3JlcGxheXdlYi5wYWdlL3cvaWQtOTFmZDY1MDk2MDMwLzIwMjQwNjIwMDkxOTAxbXBfL2h0dHBzOi8vd3d3LnlvdXR1YmUuY29tL3MvcGxheWVyLzg0MzE0YmVmL3BsYXllcl9pYXMudmZsc2V0L2VuX1VTL2Jhc2UuanM6MTc2OjEwMQpwcm9taXNlIGNhbGxiYWNrKmZAaHR0cHM6Ly9yZXBsYXl3ZWIucGFnZS93L2lkLTkxZmQ2NTA5NjAzMC8yMDI0MDYyMDA5MTkwMW1wXy9odHRwczovL3d3dy55b3V0dWJlLmNvbS9zL3BsYXllci84NDMxNGJlZi9wbGF5ZXJfaWFzLnZmbHNldC9lbl9VUy9iYXNlLmpzOjE3NjoxMDEKcHJvbWlzZSBjYWxsYmFjaypmQGh0dHBzOi8vcmVwbGF5d2ViLnBhZ2Uvdy9pZC05MWZkNjUwOTYwMzAvMjAyNDA2MjAwOTE5MDFtcF8vaHR0cHM6Ly93d3cueW91dHViZS5jb20vcy9wbGF5ZXIvODQzMTRiZWYvcGxheWVyX2lhcy52ZmxzZXQvZW5fVVMvYmFzZS5qczoxNzY6MTAxCnByb21pc2UgY2FsbGJhY2sqZkBodHRwczovL3JlcGxheXdlYi5wYWdlL3cvaWQtOTFmZDY1MDk2MDMwLzIwMjQwNjIwMDkxOTAxbXBfL2h0dHBzOi8vd3d3LnlvdXR1YmUuY29tL3MvcGxheWVyLzg0MzE0YmVmL3BsYXllcl9pYXMudmZsc2V0L2VuX1VTL2Jhc2UuanM6MTc2OjEwMQpwcm9taXNlIGNhbGxiYWNrKmZAaHR0cHM6Ly9yZXBsYXl3ZWIucGFnZS93L2lkLTkxZmQ2NTA5NjAzMC8yMDI0MDYyMDA5MTkwMW1wXy9odHRwczovL3d3dy55b3V0dWJlLmNvbS9zL3BsYXllci84NDMxNGJlZi9wbGF5ZXJfaWFzLnZmbHNldC9lbl9VUy9iYXNlLmpzOjE3NjoxMDEKcHJvbWlzZSBjYWxsYmFjaypmQGh0dHBzOi8vcmVwbGF5d2ViLnBhZ2Uvdy9pZC05MWZkNjUwOTYwMzAvMjAyNDA2MjAwOTE5MDFtcF8vaHR0cHM6Ly93d3cueW91dHViZS5jb20vcy9wbGF5ZXIvODQzMTRiZWYvcGxheWVyX2lhcy52ZmxzZXQvZW5fVVMvYmFzZS5qczoxNzY6MTAxCnByb21pc2UgY2FsbGJhY2sqZkBodHRwczovL3JlcGxheXdlYi5w&ump=1&srfvp=1
URLs of this kind are supposed to be rewritten by a fuzzy rule.
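As a rough illustration of what such a rule does, the Python sketch below reduces a `videoplayback` URL to a stable lookup key. This is an assumption-laden sketch, not the actual warc2zim or wabac.js rule: `fuzzy_key`, `KEPT_PARAMS`, the choice of retained parameters (`id`, `itag`, `aitags`, `range`) and the normalized host prefix are all hypothetical, picked only because `range` and `aitags` are mentioned in this thread.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: these are the parameters that identify the content.
# The real rule may keep a different set.
KEPT_PARAMS = ("id", "itag", "aitags", "range")


def fuzzy_key(url: str) -> str:
    """Reduce a videoplayback URL to a lookup key (hypothetical sketch).

    Volatile parts (CDN host name, expire, signatures, request counters)
    are dropped so that the URL seen at replay time can match the URL
    recorded at crawl time.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    kept = "&".join(
        f"{name}={query[name][0]}" for name in KEPT_PARAMS if name in query
    )
    # The CDN host (rr3---sn-..., rr1---sn-...) varies, so it is normalized.
    return f"googlevideo.com{parts.path}?{kept}"


crawl_url = (
    "https://rr3---sn-ixh7yn7e.googlevideo.com/videoplayback"
    "?expire=1718896744&id=o-AL0&itag=243&aitags=133,134"
    "&range=197056-262591&rn=15"
)
replay_url = (
    "https://rr1---sn-example.googlevideo.com/videoplayback"
    "?expire=1718999999&id=o-AL0&itag=243&aitags=133,134"
    "&range=197056-262591&rn=3"
)
print(fuzzy_key(crawl_url) == fuzzy_key(replay_url))
```

A key like this only helps when the player asks for exactly the ranges that were recorded; if it requests a different `range` at replay time, the key differs and the lookup misses.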