Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable injection of javascript content coming from ZIM files in jQuery mode #152

Closed
mossroy opened this issue Jan 8, 2016 · 34 comments
Closed
Assignees
Milestone

Comments

@mossroy
Copy link
Contributor

mossroy commented Jan 8, 2016

The current Ray Charles ZIM file does not have any testable javascript content.
And it was causing issues in Firefox and Firefox OS, so I disabled the injection of javascript (both in jQuery and ServiceWorker modes)

So we have to find a test ZIM file that has some, and re-enable injection of javascript

@julianharty
Copy link
Contributor

The PhET Simulations ZIM file is both small and relies on Javascript to render and use the exercises. It might be a useful testbed to test the support of JavaScript in kiwix-html5...

@mossroy mossroy removed this from the v2.2 milestone Jan 4, 2018
@mossroy mossroy added this to the v2.3 milestone Jan 31, 2018
@Jaifroid
Copy link
Member

@mossroy Were the JavaScript issues due to the decompression engine hanging, or were they to do with the interaction of JavaScript with the loaded page or app operation? If hanging, then it was probably a similar issue to overloading the engine with SVG decompression or even CSS decompression. As plain-text files are highly compressed, the engine deals poorly with decompressing several things thrown at it at once.

I'm wondering if it will be feasible to re-enable JavaScript injection with a bit of pipeline management.

@mossroy
Copy link
Contributor Author

mossroy commented Apr 8, 2018

It would probably be worth testing again : I don't remember exactly what were the symptoms, and they might have disappeared (if we're very lucky), with the improvements of recent browser engines.

In any case, it will be more difficult in jQuery mode, because, unlike images and CSS, the injection timing of javascript code can sometimes be important. For example, some javascript code is sometimes launched on the "load" event of the page, because the javascript code of the "head" section should be ready at this time. If we don't inject the javascript at the right time, these javascript calls might fail.
In ServiceWorker mode, it should work naturally, as the browser engine will handle this by itself.

@mossroy
Copy link
Contributor Author

mossroy commented Apr 11, 2018

After a quick test, I managed to enable javascript in ServiceWorker mode, and it does seem to solve #355.
I'll try to test more later, and try to enable it in jQuery mode (which might be harder)

@mossroy
Copy link
Contributor Author

mossroy commented Apr 11, 2018

In jQuery mode, I manage to inject the javascript content in the src tag, but it does not seem to be executed by the browser engine. I don't know if there's a way to ask for that.
I tried to use eval() to execute it instead : it works, but seems to be executed in the global context of the document. As a consequence, the jQuery script (coming from the ZIM file) tries to load and sees that it is already loaded (because kiwix-js also uses jQuery). This eval() should be run in the context of the iframe : I don't know if it's possible.

A solution that should work is to inject the javascript content inside the html string, before putting it in the DOM. I don't like that, as we have to use regular expressions to find the src tags. But it's probably the only way to keep the execution order of javascript.

@Jaifroid
Copy link
Member

Maybe we can extract the JavaScript file content from the ZIM and make it into a blob, like we do with images and then inject them into the document in the right order? I do this with stylesheets already in Kiwix JS Windows, i.e. I extract the code, store it in the cache, and then insert the code as a blob file. I haven't tried it yet with JavaScript, but this is how I do it with CSS:

var cssBlob = new Blob([content], { type: 'text/css' });
var newURL = [fileDirEntry.namespace + "/" + fileDirEntry.url, URL.createObjectURL(cssBlob)];
blobArray.push(newURL);
if (cssBlobCache) {
    cssBlobCache.set(newURL[0], newURL[1]);
} 
injectCSS();

I'm pretty sure the DOM will execute a JavaScript URL injected using DOM methods even after the document has already loaded.

I'd be happy to do some tests when I get some time, maybe just with dummy content to see if the principle works and on which systems.

@Jaifroid
Copy link
Member

A consideration in working out a strategy for this:

I believe we probably must delay the extraction of JavaScript from the ZIM until after the main page has loaded. The newer ZIMs have no fewer than six scripts for each page:

<script src="../-/j/js_modules/jsConfigVars.js"></script>
<script src="../-/j/js_modules/startup.js"></script>
<script src="../-/j/js_modules/jquery.js"></script>
<script src="../-/j/js_modules/mediawiki.js"></script>
<script src="../-/j/js_modules/site.js"></script>
<script src="../-/j/js_modules/ext.cite.a11y.js"></script>

These are located just before the close body tag, so they are intended to be run after first paint. But more seriously for us, unless the scripts are stored uncompressed in the ZIM, we have little hope of extracting all of that with our current engine before we display the HTML and maintaining an acceptable user experience on lower-spec hardware. Have you been able to determine if these are as highly compressed as CSS is (which is extremely slow to extract on lower-spec hardware)? Unless we're very lucky with them, I think we should aim to extract them with or after images, and inject them into the DOM in the right order once we've gathered them all. We can probably inject them with element-creation and appending DOM methods, possibly as script blocks or possibly as blobs.

@mossroy
Copy link
Contributor Author

mossroy commented Apr 13, 2018

I did not check, but it seems very probable that these files are stored compressed : in terms of ZIM file size, it would be a waste not to compress these textual files.
And I agree with you that it might give the same performance issues as with CSS.

@Jaifroid
Copy link
Member

Commit 0b79404 on the javaScript-support branch now adds generic support for converting inline JavaScript, both inside <script>...</script> tags and as inline events such as onclick="...", and onmouseover="..." to attached scripts and to addEventListener events. It works for browsers and CSPs that support running scripts from blobs. Full functionality in Edge, Firefox and Internet Explorer (untested in Chrome yet). In FFOS, it degrades gracefully, with no errors. This branch is rebased on top of our current PR for opening headings, so it starts with all headings open by default.

It's experimental but working generically with the supplied mw-offliner scripts and events. It's probably a long way from merging, but hopefully can be tested, improved, etc. It's all done with DOM methods. It can be applied to jQuery mode and ServiceWorker (untested) depending on the extent to which ServiceWorker mode is affected by the CSP.

To prevent potential performance issues on large documents, I limit the number of events searched for, with a note to DEV to add more if needed. I also limit the number of tags to look inside (with querySelectorAll). The test code is running with events ["click", "mouseover"] and with tags ["h2", "p", "a"] just for testing purposes, and I'm not seeing performance issues. Adding more is just a matter of editing the signalled arrays. I realize this is not fully satisfactory, but an even more generic search would require a fair amount of Regular Expression intervention before we inject the HTML, and may not have performance advantages over simply expanding the lists sensibly and using querySelectorAll. If you can think of a better solution, I'm all ears.

The most important thing here is that, even if we end up not needing to support inline events and scripts (though PhET may well have some), the methods used for attaching and running scripts here can easily be applied to scripts taken from the ZIM, if the store CSPs support running scripts from blobs.

@Jaifroid
Copy link
Member

Jaifroid commented Apr 29, 2018

The latest commits (b67c5da and subsequent) radically simplify the procedure, and make it fully generic by using regexes. It actually turned out to be a lot easier than I thought. No longer any need to specify which tags or which events.

The code has a function for attaching found functions with a single Blob that contains all functions.
There is also an alternative function that does the same but using the data: URI scheme. Both are provided in case one turns out to be more acceptable than another when it comes to testing CSP for extensions.

The next step is extracting JavaScript and attaching to the iframe, but it must be done after images and css are extracted, so we need to implement some form of queue. Any ideas gratefully received. The most basic method is to check the total number of images and keep count as each loads. But there must be better methods of queue control with promises...

@mossroy
Copy link
Contributor Author

mossroy commented May 3, 2018

One minor bug I found : on wikivoyage_en_all_novid_2018-03.zim , click on Europe, then on any link, then use the button to go back to Europe : it triggers the following error in the console :
13:02:23,100 [attachInlineFunctions] The specified functions could not be found in the content window! 1 uiUtil.js:130:17
attachInlineFunctions file:///path/to/kiwix-js/www/js/lib/uiUtil.js:130:17
parseEvents/< file:///path/to/kiwix-js/www/js/app.js:899:21
createScriptBlob/newScript.onload file:///path/to/kiwix-js/www/js/lib/uiUtil.js:65:17

That's really cool, and seems to work on current wikimedia ZIM files. But it does not work in every context : it seems to work in a Chrome extension, but not in a Firefox extension, which does not seem to allow javascript as blob. See at the end of https://developer.mozilla.org/Add-ons/WebExtensions/manifest.json/content_security_policy.

Inline javascript will give us a lot of difficulties with these various CSPs.
I think we should also discuss with the openzim team (mwoffliner, and the other ones), to see if it would be possible to generate the ZIM files without inline javascript at all.

@Jaifroid
Copy link
Member

Jaifroid commented May 3, 2018

Ah yes, I didn't actually test it with the back button. Thanks for pointing that out. Need some code to handle going back.

On Web extensions in Firefox, the implication of that page seems to be that we would need to loosen the CSP by specifying the sript-src as "blob:" in the manifest. That page seems to imply developers have some control over that.

I agree the best solution is to work with mwoffliner etc. to ensure future ZIMs don't use inline JavaScript. This is intended as a workaround for now, but partly also as a stepping stone on the way to full support for scripts in the ZIM.

On the latter point, I was wondering how difficult it would be to run xzdec.js or the file-reader wrappers as a web worker. If I understood what @sharun-s was doing with workers, I believe he out-sourced looking up dir entries to workers, which is slightly different from running the decompression in the worker. I might have misunderstood, but that's what I got from looking through the code for his finder.js.

Using a web worker to decompress specific (groups of) files would also be the reverse of the Service Worker model. @mossroy, please correct this, but this is what I understand is happening with Service Worker: it deals with the browser's requests for files and passes messages to the main app asking it to decompress things the browser needs, which are then passed back to Service Worker via postMessage. Using a Web worker would kind of be the opposite, no? We just hive off the heavy grunt work to another thread and wait for it to post the content back. Shouldn't this be relatively easy? Might it even be possible to do it with some switch in require.js (see http://requirejs.org/docs/api.html#webworker)?

I'm just thinking practically on how we add decompression of JavaScript to an already over-taxed app....

@Jaifroid
Copy link
Member

Jaifroid commented May 4, 2018

Hi @mossroy , this turns out to be a blocker bug even on our current master. I've been doing some tests on the history.back issue, which only occurs in Firefox (I haven't tested Chrome). What happens is that our iframe.onload function gets fired by Firefox after it returns the iframe to its previous state. This has some bad consequences: we re-parse all the anchors, for example. As a result, even in master, at least on Firefox Quantum, none of the internal links work after going back to a previous page, presumably because they have conflicting event listeners added. Can you confirm? If you can, I'll add an issue for this, because it's a blocker. This doesn't happen in Edge, where everything works fine (Edge actually reloads the previous page from the ZIM, so it's obviously putting the DOM back to an earlier state than Firefox is doing and is re-running app.js, whereas Firefox only seems to run the iframe.onload event).

@mossroy
Copy link
Contributor Author

mossroy commented May 4, 2018

That's right, I see that regression too (on Firefox 52 ESR, master branch) : please create a separate issue for that (for version 2.3).
It looks like we'll have to wait a bit more before releasing 2.3...

@mossroy
Copy link
Contributor Author

mossroy commented May 6, 2018

Regarding ServiceWorker and/or WebWorker :

  • you got it right regarding how the ServiceWorker currently works in kiwix-js. It is not optimal, but we have to workaround the fact that a ServiceWorker can not access the DOM, and can not read our ZIM files directly (at least last time I checked)
  • a WebWorker is basically some code that can run in a different thread, in parallel with the main thread. As a ServiceWorker, it runs in a separate context, without DOM access. It would probably be efficient to run decompression inside a WebWorker, and it would help for both jQuery and ServiceWorker modes. But, if we're lucky, the WebAssembly technology might give us that for free : that's why I did not investigate much on it.

@mossroy
Copy link
Contributor Author

mossroy commented May 6, 2018

NB : I think the require.js feature you mention only allows the use of require.js inside a WebWorker. It does not create automatically a WebWorker around our javascript files.

@Jaifroid
Copy link
Member

Jaifroid commented May 9, 2018

I've now rebased this javaScript-support branch onto the latest fixes in #366 that inject the stripped class(es) from the html tag into the body tag. And as a result, capable clients (Edge, IE and Firefox in browser context, not tested on Chrome) can now open and close content underneath the h2 headings by clicking on them, using the routines supplied in the ZIM to do this. This is a proof of concept, and is useful in showing that the .innerHTML injection method is not incompatible with JavaScript support.

@Jaifroid
Copy link
Member

The JavaScript support branch has now been rebased on the latest master, so it now includes support in both jQuery and SW modes.

@mossroy
Copy link
Contributor Author

mossroy commented Nov 18, 2018

I've tested a bit, and it works pretty well on PhET (with some exceptions you mentioned) : congratulations!
I did not have a look at the code yet, as I don't have much time for that, and try to give higher priority to the work on ServiceWorker mode.

It's a very technical task you're working on, as you're handling things a browser should natively handle.
If you want more ZIMs to test, you can try gutenberg for example (at least the main page seems to use some javascript)

@Jaifroid
Copy link
Member

I rebased the JavaScript Support branch to enable testing of @ISNIT0's polyfill script for browsers that don't support the details tag and can't run SW mode ( (e.g. FFOS and UWP) . Below is what it looks like in Edge in jQuery mode. Note the sections are closed by default in desktop view. Clicking to open and close them works, and so does tapping (touchscreen).

image

@Jaifroid
Copy link
Member

As a separate issue, due to the way this branch loads CSS (I think), the landing page is not displayed well, and has to be resized for the images to be rendered correctly with the CSS. This needs a fix in this branch, but that's not the point of this rebase.

@mossroy mossroy modified the milestones: v2.6, v2.7 Jul 14, 2019
@kelson42
Copy link
Collaborator

Just FYI, this will be necessary if you want to take benefit of the soon to be released first ZIM JS API.

@Jaifroid
Copy link
Member

@kelson42 I'm interested in this. Enabling JS support in jQuery mode is possible, and it works with non-complex scripts. It doesn't support complex things like ReactJS currently, as it is too complicated to inject all the required script blocks in a compatible way, and this is an issue with PhET ZIMs. But it may be possible to inject known scripts where the functionality is useful and if the API is fairly generic across ZIM types. Once we have a sample, I'd like to test the possibilities.

@Jaifroid
Copy link
Member

I've rebased the JavaScript support branch on master.

@mossroy mossroy modified the milestones: v2.7, v2.8 Mar 29, 2020
@mossroy mossroy modified the milestones: v2.8, v2.9 Jul 11, 2020
@Jaifroid Jaifroid modified the milestones: v3.0, v3.1 Oct 3, 2020
@Jaifroid
Copy link
Member

The JavaScript Support branch has been rebased in order to test #650 (the ZIMs are ZSTD encoded).

@Jaifroid Jaifroid modified the milestones: v3.1, v3.2 Dec 6, 2020
@mossroy mossroy modified the milestones: v3.2, Backlog Aug 22, 2021
@mossroy mossroy changed the title Enable injection of javascript content coming from ZIM files Enable injection of javascript content coming from ZIM files in jQuery mode Aug 22, 2021
@Jaifroid
Copy link
Member

I don't think we're going to provide even any rudimentary JS support in jQuery mode any more due to #727. While I did try to rebase the JS Support branch a couple of months ago, there are now so many inline scripts in the average ZIM (see #789 for a consequence of this), that things do not work well. We are also close to making Service Worker the default, which does not need these workarounds.

@Jaifroid Jaifroid modified the milestones: Backlog, v3.8 Apr 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants