When saving PDF files using PyMuPDF with linear=True for web optimization, the PDF fails to linearize properly, resulting in integrity errors. These failures were initially observed in PDF proxies generated by AWS Lambda running PyMuPDF (Python):
doc.save(output_file_path, garbage=4, clean=True, deflate=True, linear=True)
Upon further investigation using QPDF, the issue was reproducible on a local machine with the PyMuPDF CLI.
Steps to Reproduce
Start with a non-linearized PDF and run the PyMuPDF command:
PyMuPDF clean -linear input.pdf output.pdf
Check the output with qpdf:
qpdf --check output.pdf
checking output.pdf
PDF Version: 1.7
File is not encrypted
File is linearized
WARNING: output.pdf: error encountered while checking linearization data: overflow reading bit stream: wanted = 32; available = 16
qpdf: operation succeeded with warnings
Observe linearization failures
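The check step above can also be scripted, for example in a test suite or a Lambda post-processing step. The following is a minimal sketch, not part of PyMuPDF or qpdf: the helper names `linearization_warnings` and `check_pdf` are hypothetical, the pattern is based only on the warning line quoted in this report, and `check_pdf` assumes that `qpdf` is installed and on the PATH.

```python
import re
import subprocess

# Pattern modeled on the warning line quoted in this report; other qpdf
# versions may word the message differently (assumption).
_LINEARIZATION_WARNING = re.compile(
    r"^WARNING: .*error encountered while checking linearization data: (?P<detail>.+)$",
    re.MULTILINE,
)


def linearization_warnings(qpdf_output: str) -> list[str]:
    """Return the detail text of each linearization warning found in qpdf output."""
    return [
        match.group("detail").strip()
        for match in _LINEARIZATION_WARNING.finditer(qpdf_output)
    ]


def check_pdf(path: str) -> list[str]:
    """Run `qpdf --check` on a file and return any linearization warnings."""
    result = subprocess.run(
        ["qpdf", "--check", path],
        capture_output=True,
        text=True,
    )
    # Inspect both streams, since warnings may be written to stderr.
    return linearization_warnings(result.stdout + "\n" + result.stderr)
```

An empty list from `check_pdf("output.pdf")` would mean that no linearization warning matching this pattern was reported; it does not prove the file is otherwise valid.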
Expected vs Actual Behavior
Expected: PDF should save with deflate compression and linearized structure without errors.
Actual: File produces validation errors indicating invalid linearization structure.
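Note that in the qpdf output above the file is reported as linearized even though its linearization data fails validation, so "declares itself linearized" and "is validly linearized" are separate questions. As a rough illustration of the first, weaker check, here is a minimal sketch that looks for the linearization parameter dictionary, which the PDF specification requires to lie within the first 1024 bytes of the file. The function names are hypothetical, and the check says nothing about whether the hint tables are correct.

```python
import re

# The linearization parameter dictionary must be contained in the first
# 1024 bytes of a linearized PDF, so only that window is inspected.
_HEADER_WINDOW = 1024
_LINEARIZED_KEY = re.compile(rb"/Linearized\s+\d")


def declares_linearized(data: bytes) -> bool:
    """Return True if the leading bytes contain a /Linearized entry."""
    return _LINEARIZED_KEY.search(data[:_HEADER_WINDOW]) is not None


def file_declares_linearized(path: str) -> bool:
    """Read only the header window of a file and test it for the marker."""
    with open(path, "rb") as handle:
        return declares_linearized(handle.read(_HEADER_WINDOW))
```

A file such as the output.pdf in this report would be expected to pass this check and still fail `qpdf --check`, which is why the full validation is needed to detect the bug.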
User Impact
If the linearization isn't working correctly:
User Experience: The PDF will still download, but users won't be able to view the first page immediately. Instead, they might have to wait for the entire file to download before they can start reading.
Page Order: The pages might not load in the intended order, which can be confusing and disrupt the reading experience.
This is an upstream problem (MuPDF). We will create a report in their issue system.
You can recreate the issue without using PyMuPDF via this MuPDF CLI command:
mutool clean -lggggsz Twitter.4.pdf
Thereafter, running qpdf with the generated output PDF out.pdf shows the problem.
@JorjMcKie I added additional content on how this affects end users :)
Linearization Error with linear=True Parameters
Environment Information
Additional Context
This seems to happen with almost any PDF file.
Twitter 4_linerarized.pdf
Twitter 4.pdf
How to reproduce the bug
PyMuPDF version
1.25.2
Operating system
macOS
Python version
3.11