Missing images in Wikipedia articles #141
Comments
Thanks @WolfgangDpunkt for the report, the issue should be fixed in version |
Thank you very much! I have completed the update and the progress is noticeable. Indeed, it now works with the example article "Canada" from the English Wikipedia. If you can find the patience to work on this problem further, I would be happy. Since there are hardly any other reliable tools to convert wiki articles to epub books via the command line, I think the bug is highly relevant. In this article, for example, almost all the pictures are missing: However, there does not seem to be a fundamental problem with international language versions of Wikipedia. The photo "Sachertorte" is missing in the epub, for example: In fact, the debug log does not mention the filename of this photo either; for whatever reason, this photo is ignored during the download (https://upload.wikimedia.org/wikipedia/commons/b/b8/Sachertorte_DSC03027.JPG).
Thanks for pointing out the broken pages, it will help out with debugging. This is mostly Readability removing the images, I will investigate how to prevent that from |
It seems that the HTML markup for images in Wikipedia is going to change soon: https://diff.wikimedia.org/2022/11/28/tech-news-2022-48/ (via @simevidas), so that may make handling them a bit easier.
…e images to not be fetched for EPUB archive (Re: #141)
It turns out that there was more than one issue at play preventing one image or the other from being properly fetched/bundled:
There may be additional issues with Readability as mentioned in earlier comments, but I'm confident upgrading to |
Environment
node --version: v17.9.0
npm --version: 8.18.0
yarn --version, if using Yarn:
percollate --version: v2.2.0
Description
When I convert Wikipedia articles to epubs with this otherwise great and very useful tool, some of the images get lost. An adblocker is not used in this environment.
Here is my command line:
percollate epub --individual --output /home/Perco-Epubs/ https://en.wikipedia.org/wiki/Canada --debug
And here is the resulting epub. I had to zip it, as GitHub does not accept epub files:
Canada.epub.zip
And here's the direct comparison: in the "British North America" section, the web version has two images, the epub version zero.
There are indeed images in the epub; percollate does not ignore all images, only most of them.
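To see which images actually ended up in the archive, the epub can be inspected directly, since an EPUB file is a ZIP container. The following is only a rough sketch in Python: the function name `list_epub_images` and the set of file extensions are assumptions made for illustration, and nothing here relies on how percollate lays out files inside the archive.

```python
import sys
import zipfile
from pathlib import PurePosixPath

# Extensions treated as images; this set is an assumption, extend it as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp"}


def list_epub_images(epub_path):
    """Return the sorted names of image files bundled inside an EPUB (a ZIP archive)."""
    with zipfile.ZipFile(epub_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Compare the extension case-insensitively (".JPG" and ".jpg" both match).
            if PurePosixPath(name).suffix.lower() in IMAGE_EXTENSIONS
        )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one bundled image path per line for the EPUB given on the command line.
    for image_name in list_epub_images(sys.argv[1]):
        print(image_name)
```

Saved under a hypothetical name such as list_images.py, it could be run as `python list_images.py Canada.epub` and the output compared by hand against the images visible in the web version of the article.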
What could be the reason? Thanks a lot!
Here comes the debug log: