-
-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable injection of javascript content coming from ZIM files in jQuery mode #152
Comments
The PhET Simulations ZIM file is both small and relies on Javascript to render and use the exercises. It might be a useful testbed to test the support of JavaScript in kiwix-html5... |
@mossroy Were the JavaScript issues due to the decompression engine hanging, or were they to do with the interaction of JavaScript with the loaded page or app operation? If hanging, then it was probably a similar issue to overloading the engine with SVG decompression or even CSS decompression. As plain-text files are highly compressed, the engine deals poorly with decompressing several things thrown at it at once. I'm wondering if it will be feasible to re-enable JavaScript injection with a bit of pipeline management. |
It would probably be worth testing again : I don't remember exactly what were the symptoms, and they might have disappeared (if we're very lucky), with the improvements of recent browser engines. In any case, it will be more difficult in jQuery mode, because, unlike images and CSS, the injection timing of javascript code can sometimes be important. For example, some javascript code is sometimes launched on the "load" event of the page, because the javascript code of the "head" section should be ready at this time. If we don't inject the javascript at the right time, these javascript calls might fail. |
After a quick test, I managed to enable javascript in ServiceWorker mode, and it does seem to solve #355. |
In jQuery mode, I manage to inject the javascript content in the src tag, but it does not seem to be executed by the browser engine. I don't know if there's a way to ask for that. A solution that should work is to inject the javascript content inside the html string, before putting it in the DOM. I don't like that, as we have to use regular expressions to find the src tags. But it's probably the only way to keep the execution order of javascript. |
Maybe we can extract the JavaScript file content from the ZIM and make it into a blob, like we do with images and then inject them into the document in the right order? I do this with stylesheets already in Kiwix JS Windows, i.e. I extract the code, store it in the cache, and then insert the code as a blob file. I haven't tried it yet with JavaScript, but this is how I do it with CSS:
I'm pretty sure the DOM will execute a JavaScript URL injected using DOM methods even after the document has already loaded. I'd be happy to do some tests when I get some time, maybe just with dummy content to see if the principle works and on which systems. |
A consideration in working out a strategy for this: I believe we probably must delay the extraction of JavaScript from the ZIM until after the main page has loaded. The newer ZIMs have no fewer than six scripts for each page:
These are located just before the close body tag, so they are intended to be run after first paint. But more seriously for us, unless the scripts are stored uncompressed in the ZIM, we have little hope of extracting all of that with our current engine before we display the HTML and maintaining an acceptable user experience on lower-spec hardware. Have you been able to determine if these are as highly compressed as CSS is (which is extremely slow to extract on lower-spec hardware)? Unless we're very lucky with them, I think we should aim to extract them with or after images, and inject them into the DOM in the right order once we've gathered them all. We can probably inject them with element-creation and appending DOM methods, possibly as script blocks or possibly as blobs. |
I did not check, but it seems very probable that these files are stored compressed : in terms of ZIM file size, it would be a waste not to compress these textual files. |
Commit 0b79404 on the It's experimental but working generically with the supplied mw-offliner scripts and events. It's probably a long way from merging, but hopefully can be tested, improved, etc. It's all done with DOM methods. It can be applied to jQuery mode and ServiceWorker (untested) depending on the extent to which ServiceWorker mode is affected by the CSP. To prevent potential performance issues on large documents, I limit the number of events searched for, with a note to DEV to add more if needed. I also limit the number of tags to look inside (with querySelectorAll). The test code is running with events The most important thing here is that, even if we end up not needing to support inline events and scripts (though PhET may well have some), the methods used for attaching and running scripts here can easily be applied to scripts taken from the ZIM, if the store CSPs support running scripts from blobs. |
The latest commits (b67c5da and subsequent) radically simplify the procedure, and make it fully generic by using regexes. It actually turned out to be a lot easier than I thought. No longer any need to specify which tags or which events. The code has a function for attaching found functions with a single Blob that contains all functions. The next step is extracting JavaScript and attaching to the iframe, but it must be done after images and css are extracted, so we need to implement some form of queue. Any ideas gratefully received. The most basic method is to check the total number of images and keep count as each loads. But there must be better methods of queue control with promises... |
One minor bug I found : on wikivoyage_en_all_novid_2018-03.zim , click on Europe, then on any link, then use the button to go back to Europe : it triggers the following error in the console : That's really cool, and seems to work on current wikimedia ZIM files. But it does not work in every context : it seems to work in a Chrome extension, but not in a Firefox extension, which does not seem to allow javascript as blob. See at the end of https://developer.mozilla.org/Add-ons/WebExtensions/manifest.json/content_security_policy. Inline javascript will give us a lot of difficulties with these various CSPs. |
Ah yes, I didn't actually test it with the back button. Thanks for pointing that out. Need some code to handle going back. On Web extensions in Firefox, the implication of that page seems to be that we would need to loosen the CSP by specifying the sript-src as "blob:" in the manifest. That page seems to imply developers have some control over that. I agree the best solution is to work with mwoffliner etc. to ensure future ZIMs don't use inline JavaScript. This is intended as a workaround for now, but partly also as a stepping stone on the way to full support for scripts in the ZIM. On the latter point, I was wondering how difficult it would be to run xzdec.js or the file-reader wrappers as a web worker. If I understood what @sharun-s was doing with workers, I believe he out-sourced looking up dir entries to workers, which is slightly different from running the decompression in the worker. I might have misunderstood, but that's what I got from looking through the code for his finder.js. Using a web worker to decompress specific (groups of) files would also be the reverse of the Service Worker model. @mossroy, please correct this, but this is what I understand is happening with Service Worker: it deals with the browser's requests for files and passes messages to the main app asking it to decompress things the browser needs, which are then passed back to Service Worker via postMessage. Using a Web worker would kind of be the opposite, no? We just hive off the heavy grunt work to another thread and wait for it to post the content back. Shouldn't this be relatively easy? Might it even be possible to do it with some switch in require.js (see http://requirejs.org/docs/api.html#webworker)? I'm just thinking practically on how we add decompression of JavaScript to an already over-taxed app.... |
Hi @mossroy , this turns out to be a blocker bug even on our current master. I've been doing some tests on the history.back issue, which only occurs in Firefox (I haven't tested Chrome). What happens is that our iframe.onload function gets fired by Firefox after it returns the iframe to its previous state. This has some bad consequences: we re-parse all the anchors, for example. As a result, even in master, at least on Firefox Quantum, none of the internal links work after going back to a previous page, presumably because they have conflicting event listeners added. Can you confirm? If you can, I'll add an issue for this, because it's a blocker. This doesn't happen in Edge, where everything works fine (Edge actually reloads the previous page from the ZIM, so it's obviously putting the DOM back to an earlier state than Firefox is doing and is re-running app.js, whereas Firefox only seems to run the iframe.onload event). |
That's right, I see that regression too (on Firefox 52 ESR, master branch) : please create a separate issue for that (for version 2.3). |
Regarding ServiceWorker and/or WebWorker :
|
NB : I think the require.js feature you mention only allows the use of require.js inside a WebWorker. It does not create automatically a WebWorker around our javascript files. |
I've now rebased this javaScript-support branch onto the latest fixes in #366 that inject the stripped class(es) from the html tag into the body tag. And as a result, capable clients (Edge, IE and Firefox in browser context, not tested on Chrome) can now open and close content underneath the h2 headings by clicking on them, using the routines supplied in the ZIM to do this. This is a proof of concept, and is useful in showing that the .innerHTML injection method is not incompatible with JavaScript support. |
The JavaScript support branch has now been rebased on the latest master, so it now includes support in both jQuery and SW modes. |
I've tested a bit, and it works pretty well on PhET (with some exceptions you mentioned) : congratulations! It's a very technical task you're working on, as you're handling things a browser should natively handle. |
I rebased the JavaScript Support branch to enable testing of @ISNIT0's polyfill script for browsers that don't support the |
As a separate issue, due to the way this branch loads CSS (I think), the landing page is not displayed well, and has to be resized for the images to be rendered correctly with the CSS. This needs a fix in this branch, but that's not the point of this rebase. |
Just FYI, this will be necessary if you want to take benefit of the soon to be released first ZIM JS API. |
@kelson42 I'm interested in this. Enabling JS support in jQuery mode is possible, and it works with non-complex scripts. It doesn't support complex things like ReactJS currently, as it is too complicated to inject all the required script blocks in a compatible way, and this is an issue with PhET ZIMs. But it may be possible to inject known scripts where the functionality is useful and if the API is fairly generic across ZIM types. Once we have a sample, I'd like to test the possibilities. |
I've rebased the JavaScript support branch on master. |
The JavaScript Support branch has been rebased in order to test #650 (the ZIMs are ZSTD encoded). |
I don't think we're going to provide even any rudimentary JS support in jQuery mode any more due to #727. While I did try to rebase the JS Support branch a couple of months ago, there are now so many inline scripts in the average ZIM (see #789 for a consequence of this), that things do not work well. We are also close to making Service Worker the default, which does not need these workarounds. |
The current Ray Charles ZIM file does not have any testable javascript content.
And it was causing issues in Firefox and Firefox OS, so I disabled the injection of javascript (both in jQuery and ServiceWorker modes)
So we have to find a test ZIM file that has some, and re-enable injection of javascript
The text was updated successfully, but these errors were encountered: