Use iframe instead of introduce taskbar. #394

Closed
mgautierfr opened this issue Jul 28, 2020 · 19 comments · Fixed by #716

@mgautierfr
Member

mgautierfr commented Jul 28, 2020

Rationale

A number of issues have been raised recently about the "taskbar".
Some are closely related to the warc2zim project: #391
Others are more generic and come from other (older) discussions/issues: #392, #322, kiwix/kiwix-tools#17, (kiwix/kiwix-tools#59)

I have personally always been a bit disturbed by the fact that we insert UI/chrome content into the main content itself.

All this makes me think that introducing things unrelated to the content into the content itself is maybe not a good idea. As the issue title implies, using an iframe inside a UI to display the (raw) content is probably better than modifying the content to insert UI.

The change

I've already worked on this on the ideascube side (https://framagit.org/ideascube/ideascube/-/merge_requests/590) but never finished it.

The main idea is to use a "proxy page" (a viewer) to display the article content in an iframe.
The viewer uses a JS script to keep track of the iframe URL, so that navigation is not broken (page title, history, ...).
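
A minimal TypeScript sketch of that tracking, assuming the second URL scheme proposed below (viewer at `/viewer#content=...&path=...`, content at `/<zim_name>/<path>`, zim files at the web root). `FrameLike`, `ViewerLike` and `syncViewerFromFrame` are hypothetical names; the interfaces stand in for the iframe and the viewer's `window` so the logic can run outside a browser. In a real viewer the function would be called from the iframe's `load` event.

```typescript
// Hypothetical stand-ins for the DOM objects involved.
interface FrameLike {
  currentPath(): string;  // e.g. iframe.contentWindow.location.pathname
  currentTitle(): string; // e.g. iframe.contentDocument.title
}

interface ViewerLike {
  title: string;                   // the viewer page's document.title
  readonly hash: string;           // the viewer page's location.hash
  replaceState(url: string): void; // e.g. history.replaceState(null, "", url)
}

// Mirror the iframe's current page into the viewer's title and URL fragment.
function syncViewerFromFrame(frame: FrameLike, viewer: ViewerLike): void {
  // "/wikipedia_en/A/Paris" -> book "wikipedia_en", entry path "A/Paris".
  // pathname is percent-encoded, so decode each segment before re-encoding it.
  const parts = frame.currentPath().replace(/^\/+/, "").split("/").map(decodeURIComponent);
  const zim = parts[0];
  const entry = parts.slice(1).join("/");
  const newHash =
    "#content=" + encodeURIComponent(zim) + "&path=" + encodeURIComponent(entry);

  viewer.title = frame.currentTitle();
  // Only touch the URL when it actually changed.
  if (viewer.hash !== newHash) {
    viewer.replaceState(newHash);
  }
}
```

Using `replaceState` rather than assigning `location.hash` should avoid pushing a second history entry on top of the one the iframe navigation itself creates, so the browser's back button keeps stepping through articles.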

I haven't fully thought through the URL organization (that's also why this issue exists), but we could have:

  • https://example.com/zim_name/foo.html : kiwix-serve returns the viewer for this URL. The JS viewer parses the URL and sets the iframe to https://example.com/raw?content=zim_name&path=foo.html (which returns the raw article without modification). This would not change the article URL, but would break the caching system.
  • https://example.com/viewer#content=zim&path=foo.html would be the viewer (efficient for browser/server caching, as the fragment is not sent to the server nor taken into account by the browser cache). The JS viewer parses the fragment and loads https://example.com/zim_name/foo.html in the iframe.
  • Any other combination we can find.
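
For the second scheme, the viewer's job on load boils down to turning the fragment back into a content URL. A minimal TypeScript sketch, assuming zim files are served at the web root and that the fragment uses exactly the `content` and `path` keys shown above; `parseViewerFragment` and `iframeSrcFor` are hypothetical names.

```typescript
// Parse a viewer fragment such as "#content=wikipedia_en&path=A%2FParis".
function parseViewerFragment(hash: string): { content: string; path: string } | null {
  const query = hash.charAt(0) === "#" ? hash.slice(1) : hash;
  const params: { [key: string]: string } = {};
  const pairs = query.split("&");
  for (let i = 0; i < pairs.length; i++) {
    const eq = pairs[i].indexOf("=");
    if (eq < 0) continue; // ignore malformed pieces
    params[decodeURIComponent(pairs[i].slice(0, eq))] = decodeURIComponent(pairs[i].slice(eq + 1));
  }
  if (!params["content"]) return null;
  return { content: params["content"], path: params["path"] || "" };
}

// Compute the iframe src for the second scheme: content lives at /<zim_name>/<path>.
function iframeSrcFor(hash: string): string | null {
  const target = parseViewerFragment(hash);
  if (target === null) return null;
  // Encode each segment but keep "/" so that relative links inside the article resolve.
  const encodedPath = target.path.split("/").map(encodeURIComponent).join("/");
  return "/" + encodeURIComponent(target.content) + "/" + encodedPath;
}
```

The viewer would then set its iframe's `src` to the result. Because the fragment never reaches the server, every article shares the single cacheable `/viewer` response.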

The second has my preference, for the reasons below, although it changes the URLs (the previous URLs now serve raw content: they still work, but there is no topbar anymore).

One advantage of an iframe over #392 is that it doesn't introduce any change in the zim file itself
(except if the zim file provides its own viewer, as for WARC).

Compatibility with WARC-based zim files:

They could have a metadata entry stating that they provide their own player; kiwix-serve would then use the viewer in the zim file instead of the integrated one.
As the random URL served by kiwix-serve will be a redirection to https://example.com/viewer#..., random would work. The same goes for search/suggestions (though it would be up to the viewer to provide the UI anyway).
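
A sketch of how the redirect target could be chosen, written in TypeScript for consistency with the viewer sketches even though kiwix-serve itself is C++. It assumes a hypothetical `customViewer` field reflecting the proposed metadata (no metadata name is defined in this discussion), and assumes a bundled viewer accepts the same `#content=...&path=...` fragment as the integrated one.

```typescript
// Hypothetical description of a book as kiwix-serve sees it.
interface BookInfo {
  name: string;          // e.g. "wikipedia_en"
  customViewer?: string; // hypothetical: path of a viewer page inside the zim file
}

// Build the Location header for a redirect to an entry (e.g. the one picked by "random").
function viewerRedirect(book: BookInfo, entryPath: string): string {
  const hash =
    "#content=" + encodeURIComponent(book.name) + "&path=" + encodeURIComponent(entryPath);
  if (book.customViewer) {
    // The zim file ships its own viewer: serve it from the zim instead of /viewer.
    return "/" + encodeURIComponent(book.name) + "/" + book.customViewer + hash;
  }
  return "/viewer" + hash;
}
```

Since the random entry only changes the fragment, the same function would serve search and suggestion links too.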

Customization:

While a vendor proxy could intercept the https://example.com/viewer request and serve its own viewer, this would break WARC zim files (which need a specific viewer).
Instead, the viewer page could load a JS file from a well-known URL. The vendor could intercept this request and serve its own script, which would then modify the viewer DOM (add a logo, change colors, change the home button's URL).
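
A sketch of what the script served from that well-known URL might do, in TypeScript. Everything here is hypothetical: the `ViewerDom` shape (standing in for the real viewer elements), the `VendorSettings` fields and the `--kiwix-bar-bg` CSS variable used in the usage example. The point is that the vendor only changes the viewer's chrome and never the article inside the iframe.

```typescript
// Hypothetical view of the few viewer elements a vendor would want to change.
interface ViewerDom {
  logo: { src: string; alt: string };
  homeButton: { href: string };
  setCssVariable(name: string, value: string): void; // e.g. root.style.setProperty(name, value)
}

// Hypothetical vendor settings.
interface VendorSettings {
  logoUrl?: string;
  logoAlt?: string;
  homeUrl?: string;
  colors?: { [cssVariable: string]: string };
}

// What the script served at the well-known URL could run once the viewer has loaded it.
function applyVendorSettings(dom: ViewerDom, settings: VendorSettings): void {
  if (settings.logoUrl) {
    dom.logo.src = settings.logoUrl;
    dom.logo.alt = settings.logoAlt || "";
  }
  if (settings.homeUrl) {
    dom.homeButton.href = settings.homeUrl;
  }
  const colors: { [cssVariable: string]: string } = settings.colors || {};
  for (const variable in colors) {
    if (Object.prototype.hasOwnProperty.call(colors, variable)) {
      dom.setCssVariable(variable, colors[variable]);
    }
  }
}
```

For example, `applyVendorSettings(dom, { homeUrl: "https://intranet.example/", colors: { "--kiwix-bar-bg": "#003366" } })` would repoint the home button and recolor the bar, assuming the viewer's CSS reads that variable.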

No-JS fallback:

The viewer can have a noscript element providing a link to the raw main page. From there, all links in the page would point to "raw" URLs. But, as @tim-moody said, we can almost assume that all browsers have JS/iframe support.
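
A sketch of a viewer page with that fallback, as a TypeScript function producing the HTML. The script name `viewer.js` and the element id are hypothetical; `rawMainPageUrl` would be the main page's basic (raw) URL, such as `/zim_name/foo.html` in the second scheme.

```typescript
// Escape text for use in HTML text and double-quoted attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a minimal viewer page. "viewer.js" is a hypothetical script that sets the
// iframe's src from the URL fragment; without JS the iframe simply stays empty.
function renderViewerPage(rawMainPageUrl: string, title: string): string {
  const safeTitle = escapeHtml(title);
  return [
    "<!DOCTYPE html>",
    '<html><head><meta charset="utf-8"><title>' + safeTitle + "</title>",
    '<script src="viewer.js" defer></script></head><body>',
    // Only rendered when scripting is disabled: a plain link to the raw main page.
    '<noscript><a href="' + escapeHtml(rawMainPageUrl) + '">' + safeTitle + "</a></noscript>",
    '<iframe id="content" title="' + safeTitle + '"></iframe>',
    "</body></html>",
  ].join("\n");
}
```

Without JS the iframe has no `src`, so the `<noscript>` link is the only way in; from there, navigation simply continues on raw URLs.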

Drawbacks

The only one I see is that the "basic" URLs now serve raw content.

  • It makes the URLs less user-friendly (though still understandable).
  • It changes the meaning of the URLs. This should not be a problem for classic navigation, as the kiwix-serve home page would point to the viewer instead of the direct URL. But existing links/bookmarks will now point to the raw page.

We could mitigate this by having https://example.com/raw?content=zim_name&path=foo.html serve the raw content and https://example.com/zim_name/foo.html redirect to the viewer. But this would break no-JS browsing. Maybe cookies could help us track the situation.
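
A sketch of the cookie idea, assuming a hypothetical `kiwix_js=1` cookie that the viewer's JS would set once it has run (a cookie set from script is sent back on later requests to the same origin). The server-side decision is written in TypeScript for illustration only, and assumes zim files at the web root.

```typescript
// Outcome for a request to a "basic" URL such as /zim_name/foo.html.
type Decision =
  | { kind: "raw"; zim: string; path: string }
  | { kind: "redirect"; location: string };

// True if the Cookie header contains the hypothetical marker set by the viewer's JS.
function viewerJsSeen(cookieHeader: string): boolean {
  const cookies = cookieHeader.split(";");
  for (let i = 0; i < cookies.length; i++) {
    if (cookies[i].trim() === "kiwix_js=1") return true;
  }
  return false;
}

function routeBasicUrl(urlPath: string, cookieHeader: string): Decision {
  const parts = urlPath.replace(/^\/+/, "").split("/").map(decodeURIComponent);
  const zim = parts[0];
  const entry = parts.slice(1).join("/");
  if (viewerJsSeen(cookieHeader)) {
    // JS is known to work in this browser: send the user to the viewer.
    const hash = "#content=" + encodeURIComponent(zim) + "&path=" + encodeURIComponent(entry);
    return { kind: "redirect", location: "/viewer" + hash };
  }
  // No evidence of JS: keep serving the raw article so no-JS browsing still works.
  return { kind: "raw", zim: zim, path: entry };
}
```

With this design the very first request, before the viewer has ever run, still gets the raw page; since the kiwix-serve home page would link to the viewer directly, this mostly matters for old bookmarks.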

@mgautierfr
Member Author

@Jaifroid while you are not affected by this potential change, your comment openzim/warc2zim#17 (comment) made me think that you also use an iframe in kiwix-js. It would be nice to have your insight on this (and maybe share some code/tips between kiwix-lib and kiwix-js).

@tim-moody

@mgautierfr a couple of comments:

  • Please don't assume that zims are in the web root. We use --urlRootLocation=/kiwix/.
  • I am very allergic to iframes, maybe because of the former practice of embedding whole web sites in them, but I find them unreliable.
  • If you want to have a raw mode from kiwix-serve where only the content, without any navigation framework, is returned, I could see the value of that. A Kiwix-independent viewer's JS would put the returned HTML in a div of its choosing. Search would have to do the same. (I currently parse the output from search in order to strip out the header and put the links in my own div.) You would also need to fix up URLs appropriately, as is currently done with urlRootLocation. The URL could use the same pattern as search, say
    <domain>://<RootLocation>/<zim>/?raw=<url>. I guess it would also work if someone wanted to put the returned results in an iframe.
  • I suppose mobile and desktop viewers could also each apply their own css.

@Jaifroid
Member

Jaifroid commented Jul 28, 2020

@mgautierfr: Oh yes, actually what you say makes me think that the kind of solution you're suggesting for Kiwix Serve would also work in Kiwix JS to solve kiwix/kiwix-js#644.

Yes, we use an iframe, and inject the content into it when we first load an article (usually a dummy article first, to properly initialize the iframe, then the landing page of the ZIM that has just been loaded). After the first "real" article from the ZIM is injected into the iframe, then a Service Worker acts as a backend "server", intercepting clicks on content in the iframe (if it is to internal content) and returning the requested article, very much like a webview control in other languages. We call this "SW mode". Code where this is injected is here.

The other way is for clients that don't support Service Worker. In that case, we inject the content into the same iframe, and then do DOM manipulation on it: we extract all the links to ZIM content and attach JavaScript (like click events), or manually extract and attach images, etc. It's not very good with dynamic content, hence this method will be deprecated, though it's still the "default" (we call it "jQuery mode", even though we're phasing out jQuery and it's no longer necessary). Code where the DOM manipulation starts is here.

However, it gives me the idea that we could have a third mode for ZIMs that come with their own built-in player, in which we'd basically "get out of the way", hide all UI, and just facilitate the loading of the player, not register a competing Service Worker. @ikreymer: would that be workable (for Kiwix JS and WARC)?
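
A sketch of how Kiwix JS could pick between the two existing modes and the proposed third one. `chooseMode`, `Environment` and the `hasOwnPlayer` flag are hypothetical; in a browser, `serviceWorkerAvailable` would typically come from a check like `"serviceWorker" in navigator`. The fallback when a ZIM has its own player but Service Worker is unavailable is an assumption, not something decided in this thread.

```typescript
// The two existing Kiwix JS modes plus the proposed third one.
type DisplayMode = "jquery" | "serviceworker" | "player";

interface Environment {
  serviceWorkerAvailable: boolean; // e.g. "serviceWorker" in navigator
}

interface LoadedZim {
  hasOwnPlayer: boolean; // hypothetical: the ZIM ships its own (e.g. WARC replay) player
}

function chooseMode(env: Environment, zim: LoadedZim): DisplayMode {
  if (zim.hasOwnPlayer && env.serviceWorkerAvailable) {
    // Get out of the way: hide our UI and let the bundled player register its own
    // Service Worker (ours must not control the same iframe).
    return "player";
  }
  if (env.serviceWorkerAvailable) return "serviceworker";
  // Assumption: without Service Worker support, fall back to DOM manipulation as a best effort.
  return "jquery";
}
```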

@Jaifroid
Member

@tim-moody iframes have come a long way from the old days when they were seen as a non-standard proprietary thing. They are now fully part of the standards. Nothing unreliable about them!

Unless you're willing to go the React way and delve into the Shadow DOM, it's still the only reliable way of attaching content that has its own context, completely separate from the main DOM. CSS and JS can all be applied in the iframe as if it's running in a Web View, and it is safely isolated too. There is one issue with competing Service Workers: two of them controlling the same iframe is not allowed.

@ikreymer

Re: iframes, I second the opinion that iframes are the way to go, and are the most reliable way to replay content from another site, and wrap it with a banner/header/taskbar/etc... For example, Internet Archive does not use iframes for their banner, and can get results like: http://web.archive.org/web/20200728133603/http://example.com/ (wrong styling) or http://web.archive.org/web/20200728142415/https://twitter.com/ (top of page covered up)
Though they've made efforts to improve this, I think it will never be perfect because any inserted content can always conflict with either the styles or the layout.

More generally, if the idea is to move to an iframe and JS-based viewer, the replayweb.page system could be adopted for that as well. It is being packaged into a ZIM mostly because that is what was requested, but it could also act as the default viewer, and handle search, suggestions, etc. The viewer I'm building is designed to be generic and not tied to the WARC format.

There could be a service-worker based viewer that renders the taskbar, and a way to 'theme' or 'style' the taskbar. Just wanted to mention this, as it seems to solve some of the same issues, just in a different way. (The taskbar could even be provided as a web component to make styling it easier.)

Once the service worker based solution is figured out on all platforms, my recommendation would be to consider adopting it for rendering all ZIM files in a consistent way, to minimize the maintenance of multiple rendering paths with different JS-based viewers.

@Jaifroid
Member

@ikreymer This looks good -- but I'd just caution we still support some very old browsers (IE11, and also Chromium on Windows XP and Vista!!). Of course there comes a time when we have to stop supporting those, but apparently we still have users (from Global South or behind censorship curtains)...

@mgautierfr
Member Author

Please don't assume that zims are in the web root. We use --urlRootLocation=/kiwix/.

I didn't mention it, but I haven't forgotten :)
This should work the same way with https://example.com/kiwix/viewer#content=zim&path=foo.html and https://example.com/kiwix/zim_name/foo.html.
"Simply", the viewer should be aware of the root location and parse/recreate URLs accordingly.
@ikreymer you are probably the one who will have to take this into account for now.
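
A minimal sketch of such root-aware parsing and building in TypeScript, for `--urlRootLocation` values like `/kiwix/` (or an empty root). The function names are hypothetical, and the content URL layout `<root>/<zim_name>/<path>` follows the examples above.

```typescript
// Normalize a root location such as "/kiwix/" or "kiwix" to "/kiwix" ("" for the web root).
function normalizeRoot(root: string): string {
  const trimmed = root.replace(/^\/+|\/+$/g, "");
  return trimmed === "" ? "" : "/" + trimmed;
}

// Build the viewer URL for an entry, honouring the root location.
function viewerUrl(root: string, zim: string, path: string): string {
  return normalizeRoot(root) + "/viewer#content=" + encodeURIComponent(zim) +
    "&path=" + encodeURIComponent(path);
}

// Recover (zim, path) from a content URL path served under the root location.
function splitContentPath(root: string, urlPath: string): { zim: string; path: string } | null {
  const prefix = normalizeRoot(root) + "/";
  if (urlPath.indexOf(prefix) !== 0) return null; // not under our root
  const rest = urlPath.slice(prefix.length);
  const slash = rest.indexOf("/");
  if (slash <= 0) return null; // no "<zim_name>/" part
  return {
    zim: decodeURIComponent(rest.slice(0, slash)),
    path: decodeURIComponent(rest.slice(slash + 1)),
  };
}
```

Keeping the root handling in one helper means the viewer never has to guess where the zim files live; it only needs the configured root passed in (for example, embedded in the viewer page by kiwix-serve).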

You would also need to fix up URLs appropriately, as is currently done with urlRootLocation

The viewer should be aware of that, but on the kiwix-serve side it should be OK.

The other way is for clients that don't support Service Worker. In that case, we inject the content into the same iframe, and then do DOM manipulation on it

My PR manages to track the changes without Service Worker, DOM manipulation, or tracking of user input.

We simply let the iframe live its life (as the URLs are valid).
We just track the iframe location to update the title and the undo/redo (history) stack.

What more do Service Worker or DOM manipulation provide?

The viewer I'm building is designed to be generic and not tied to the WARC format..

With all the different code we have (WARC viewer, viewjs, my ideascube PR), we should manage to build something.

Once the service worker based solution is figured out on all platforms, my recommendation would be to consider adopting it for rendering of all ZIM files in a consistent way, to minimize the maintenance of multiple rendering paths with different JS-based viewers..

The idea here is to use an iframe only for the content served by kiwix-serve (in a browser). Other clients (kiwix-desktop, android, ios, macos) will not use it, as they come with their own UI.

Whether or not WARC-based zim files need a viewer to be correctly viewable on all platforms is a separate question from this issue.

but I'd just caution we still support some very old browsers

Yes. With iframe and JS, we may lose the topbar on very old browsers. But browsing should still be possible, as basic URLs (https://example.com/kiwix/zim_name/foo.html) are valid URLs with valid raw content (as long as the zim file contains content that is valid without JS).

@mgautierfr
Member Author

I currently parse the output from search in order to strip out the header and put the links in my own div

This is the kind of sentence that makes me think we should rethink kiwix-serve a bit.
kiwix-serve was designed as a "full application" (UI, integrated search, listing available content, navigation between zim files, ...).

It works pretty well for uses like https://library.kiwix.org or some local servers.
But most of the time, kiwix-serve is deployed alongside other systems that need to interact with it and customize it. They provide their own UI, theming, main page, search system (searching in zim files and other content), ...

It is a pity that we spend time generating the HTML results of a search (#148 (comment)) and then you just parse them to get the raw results and recreate your own stuff.
But this should be tracked in #395.


All this to say that if we go this way, we could end up with a kiwix-serve doing very little (just extracting content from zim files and passing it on without any parsing/modification). The only real work would be the search itself (but without rendering the results).

All the fancy work would be done in the browser itself (with JS provided either by kiwix-serve itself or by the vendor).
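
To make "search without rendering" concrete, here is a sketch of client-side rendering from structured results, in TypeScript. The `SearchResponse` shape is hypothetical (no format is specified in this thread), and `viewerBase` assumes the viewer URL scheme proposed at the top of this issue.

```typescript
// Hypothetical shape of structured search results.
interface SearchHit {
  title: string;
  path: string; // entry path inside the zim file
  snippet?: string;
}
interface SearchResponse {
  zim: string;
  total: number;
  hits: SearchHit[];
}

// Escape text for use in HTML text and double-quoted attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render results on the client; another deployment could render the same data differently.
function renderHits(res: SearchResponse, viewerBase: string): string {
  const items = res.hits.map((hit) => {
    const href =
      viewerBase + "#content=" + encodeURIComponent(res.zim) + "&path=" + encodeURIComponent(hit.path);
    const snippet = hit.snippet ? "<p>" + escapeHtml(hit.snippet) + "</p>" : "";
    return '<li><a href="' + escapeHtml(href) + '">' + escapeHtml(hit.title) + "</a>" + snippet + "</li>";
  });
  return "<ul>" + items.join("") + "</ul>";
}
```

In the browser, `res` would come from `JSON.parse` on whatever search endpoint kiwix-serve ends up exposing; an integrator such as @tim-moody could then build their own div directly instead of scraping HTML.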

@Jaifroid
Member

Jaifroid commented Jul 28, 2020

What ServiceWorker or DOM manipulation provide more ?

DOM manipulation mode (in Kiwix JS) is currently a bit faster because we don't bother with extracting JavaScript in that mode!! (It's really hard to attach and run in a compatible way, except for the most basic scripts.) But the main advantage is compatibility with any browser (IE11, Firefox OS, Windows Mobile...). The main disadvantage is that dynamic content (PhET, custom search tools in Gutenberg) doesn't work in this mode.

Service Worker mode is much more complete, and it runs dynamic content, but it is incompatible with the mentioned browser engines (and some others) that will never be updated to provide Service Worker support.

I should be clear that we use Service Worker to provide the URL capturing that other systems provide through WebView controls. It's (currently) external to the content. The problem is arising because WARC is starting to include a Service Worker that conflicts with our Service Worker. That shouldn't be a problem for Kiwix Serve, which I think only needs to care about the environment Kiwix Serve is running in, not so much the client's environment.

@ikreymer

ikreymer commented Jul 28, 2020

The idea here is to use a iframe only for the content served by kiwix-serve (in a browser). Other clients (kiwix-desktop, android, ios, macos) will not use it as they come with their own UI.

The fact that warc based zim need a viewer to be correctly viewable (or not) on all platform is different from this issue.

Well, this all seems related. It seems like there are way too many configurations/combinations possible!
You have: non-iframe viewer x iframe viewer x service worker iframe viewer x android x desktop x ios x kiwixjs!

Here's my suggestions (which of course you're free to ignore :)

  1. Keep current system for older browsers/backwards compatibility.

  2. Build a new JS, iframe-based replay system that works with both existing ZIMs and ZIMs created from WARCs.
    The main differences are really that there are two lookups (A/ and H/) instead of one (A/).
    The system will use a service worker if available, but could perhaps further 'gracefully degrade' to a non-SW mode for regular ZIMs if service workers are not available.

    There would be a way to style/theme the look.

  3. Each of kiwix-serve, kiwix-apple, kiwix-desktop, kiwix-android would implement a consistent set of JSON based apis, for example:

  • /api/load?zim...&article=... - load the raw content from the ZIM
  • /api/search...
  • /api/suggestions...
  • /api/external - (returns whether a URL is blocked or not)
    ... there are probably more APIs that I'm missing, but they can all be enumerated
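
A small TypeScript helper showing how a shared viewer could address these endpoints on any backend. The endpoint names come from the list above, but they are a proposal here, not an existing kiwix-serve API; the parameter names (`zim`, `article`, `pattern`) are partly taken from the `/api/load` example and partly assumed.

```typescript
type KiwixEndpoint = "load" | "search" | "suggestions" | "external";

// Build an API URL under a root location such as "/kiwix/" (or "" for the web root).
// Parameters are passed as ordered pairs so the resulting query string is deterministic.
function apiUrl(root: string, endpoint: KiwixEndpoint, params: Array<[string, string]>): string {
  const query = params
    .map((p) => encodeURIComponent(p[0]) + "=" + encodeURIComponent(p[1]))
    .join("&");
  return root.replace(/\/+$/, "") + "/api/" + endpoint + (query ? "?" + query : "");
}
```

For example, `apiUrl("/kiwix/", "search", [["zim", "wikipedia_en"], ["pattern", "paris"]])` gives `/kiwix/api/search?zim=wikipedia_en&pattern=paris`; each platform would only need to answer the same URLs.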

Each of kiwix-serve/kiwix-apple/kiwix-desktop/kiwix-android/kiwix-js would then not have its own UI, but would serve a unified JS-based UI.

  4. For kiwix-js, these APIs would be implemented not via HTTP but via an importable library in the service worker, loaded with importScripts('kiwix.js'). Then the equivalent functions can be called like so: kiwixjs.load(...), kiwixjs.search(...)
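
A sketch of the kiwix-js side of that idea, in TypeScript: a function the Service Worker's `fetch` handler could call to answer `/api/...` requests from an in-process library instead of an HTTP server. `KiwixLibrary` stands in for whatever `importScripts('kiwix.js')` would expose; its method names follow the `kiwixjs.load(...)`/`kiwixjs.search(...)` example above, but its signatures are assumptions, as is the `pattern` parameter.

```typescript
// Hypothetical in-process library, as a stand-in for what kiwix.js would expose.
interface KiwixLibrary {
  load(zim: string, article: string): string | null; // HTML, or null if not found
  search(zim: string, pattern: string): string[];     // matching entry paths
}

// Minimal query-string parser ("+" is not treated as a space here).
function parseQuery(query: string): { [key: string]: string } {
  const params: { [key: string]: string } = {};
  const pairs = query.split("&");
  for (let i = 0; i < pairs.length; i++) {
    const eq = pairs[i].indexOf("=");
    if (eq < 0) continue;
    params[decodeURIComponent(pairs[i].slice(0, eq))] = decodeURIComponent(pairs[i].slice(eq + 1));
  }
  return params;
}

// Answer an /api/* request from the library; the caller would wrap the result in a Response.
function handleApiRequest(pathAndQuery: string, lib: KiwixLibrary): { status: number; body: string } {
  const q = pathAndQuery.indexOf("?");
  const path = q < 0 ? pathAndQuery : pathAndQuery.slice(0, q);
  const params = parseQuery(q < 0 ? "" : pathAndQuery.slice(q + 1));
  if (path === "/api/load") {
    const html = lib.load(params["zim"] || "", params["article"] || "");
    return html === null ? { status: 404, body: "Not found" } : { status: 200, body: html };
  }
  if (path === "/api/search") {
    return { status: 200, body: JSON.stringify(lib.search(params["zim"] || "", params["pattern"] || "")) };
  }
  return { status: 404, body: "Unknown API" };
}
```

In an actual Service Worker, the result would be returned through `event.respondWith(new Response(result.body, { status: result.status }))`, so the viewer would not need to know whether an HTTP server or the in-process library answered.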

I think having a new, unified JS-based viewer for 'modern' browsers and each platform implementing a consistent set of APIs will make the maintenance of this a lot simpler in the long term.

The unified JS viewer would of course need some more thought and design, but I think is definitely doable.
Again, I know this is not what's planned so far, and feel free to ignore, but these are my suggestions on learning about all the different variations and approaches :)

@tim-moody

I like the idea of increasing the api to kiwix-serve.

However, I hope all of this is supplemental to the current kiwix-serve and that kiwix-serve can still be made to operate as it currently does if desired.

As far as 'a new, unified JS-based viewer' is concerned, I thought the motivation was to eliminate the task bar and gain more flexibility to integrate with other systems. So this viewer needs to be an option and not reintroduce constraints on the front end that have been removed from the backend.

@ikreymer

I like the idea of increasing the api to kiwix-serve.

However, I hope all of this is supplemental to the current kiwix-serve and that kiwix-serve can still be made to operate as it currently does if desired.

Yes, I was suggesting not dropping any backwards compatibility, if possible.

As far as 'a new, unified JS-based viewer' is concerned, I thought the motivation was to eliminate the task bar and gain more flexibility to integrate with other systems. So this viewer needs to be an option and not reintroduce constraints on the front end that have been removed from the backend.

Yes, my suggestion would be to fully de-couple the backend from the frontend. There could be an official frontend JS viewer that works on all platforms using the specified API. Of course, you could also run a completely different viewer or access the api as needed, maybe even on a different host if CORS is enabled. (I don't know your exact use case so maybe this is wrong).

@stale

stale bot commented Sep 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@rgaudin
Member

rgaudin commented Jun 21, 2021

Just documenting another annoyance of the current toolbar: whenever you use an anchor, the content being anchored is usually partially hidden behind the bar.

[Two screenshots (2021-06-21)]

rgaudin added a commit to openzim/sotoki that referenced this issue Jul 1, 2021
StackExchange's CSS is setting some property on all elements : margins on <inputs />
and font-size on body that ends-up conflicting with the inserted Kiwix toolbar in
kiwix-serve.

IMO, this is a Kiwix-serve bug and scrapers should not care about how the reader might
affect its content. Related issue: kiwix/libkiwix#394

As a short-term measure, some kiwix-related styles have been overwritten in the scraper.
@rgaudin
Member

rgaudin commented Aug 19, 2021

Bumping this once again. Still a pain to write scrapers reusing source-website CSS, having to deal with kiwix-serve's intrusion.

[Screenshot (2021-08-19)]

@stale stale bot removed the stale label Aug 19, 2021
@kelson42
Collaborator

Once the next releases are done, I'm happy for us to give this idea a try.

@kelson42 kelson42 pinned this issue Aug 19, 2021
@kelson42 kelson42 added this to the 10.1.0 milestone Aug 19, 2021
@rgaudin
Member

rgaudin commented Nov 11, 2021

Yet another CSS hack in kolibri2zim. Not spamming, just archiving where the hacks are.

@kelson42
Collaborator

kelson42 commented Feb 3, 2022

@mgautierfr I want to confirm that in my opinion we are ready to go with this, using the simplest variant with an iframe. The goal is to avoid modifying the DOM of the article. Still, as a first step I would recommend building this as an option, to give us time to fully evaluate the solution.

@ikreymer Like @mgautierfr, I want to be cautious about having a reader in JS, in particular if it uses a service worker. It is quite clear that the WARC case has revealed a few problems in the way we deal with content. Some of them are already fixed with libzim7 and are currently being brought to warc2zim. A full re-evaluation of the situation will happen in Q1/Q2 2022 (hopefully with your input) and we will then decide about the next steps. The primary goal is definitely to remove the hacks we have been forced to do in the warc2zim project.

@veloman-yunkan
Collaborator

  • The JS viewer parses the URL and sets the iframe to https://example.com/raw?content=zim_name&path=foo.html (which returns the raw article without modification)

Such a URL scheme for iframe content will prevent relative links inside the iframe from working automatically. The current implementation of the /raw endpoint (/raw/BOOKNAME/content/PATH/TO/ZIM/ENTRY) should work out of the box.
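
A simplified sketch of why the path-based form matters: browsers resolve a relative link against the directory part of the page's path and ignore the page's query string. `resolveRelative` is a hypothetical helper that only handles plain relative paths (like `bar.html` or `../B/baz.html`), not absolute URLs, a trailing `.`/`..`, or queries and fragments in the link.

```typescript
// Resolve a plain relative href against the path of the page that contains it.
function resolveRelative(basePath: string, href: string): string {
  // Relative links resolve against the path only: drop the base's query string.
  const q = basePath.indexOf("?");
  const base = q < 0 ? basePath : basePath.slice(0, q);
  // Start from the "directory" of the base (everything before the last "/").
  const segments = base.slice(0, base.lastIndexOf("/")).split("/");
  const hrefSegments = href.split("/");
  for (let i = 0; i < hrefSegments.length; i++) {
    const seg = hrefSegments[i];
    if (seg === "..") {
      if (segments.length > 1) segments.pop(); // never climb above the root
    } else if (seg !== ".") {
      segments.push(seg);
    }
  }
  return segments.join("/");
}
```

With the query-string form, a link to `bar.html` in `/raw?content=zim_name&path=A/foo.html` lands on `/bar.html`, outside the book; with `/raw/zim_name/content/A/foo.html` it resolves to `/raw/zim_name/content/A/bar.html`, so links keep working without any rewriting.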

7 participants