doc.save with linear=True PDF fails linearization check with 'overflow reading bit stream' #4263

EddieOner · 2025-01-31T21:00:16Z

Linearization Error with linear=True Parameters

Problem Description

When saving PDF files with PyMuPDF using linear=True (for web optimization), the output does not linearize properly and fails integrity checks. These failures were first observed in PDF proxies generated by an AWS Lambda function running PyMuPDF (Python):
doc.save(output_file_path, garbage=4, clean=True, deflate=True, linear=True)
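A quick first check on a saved file is whether it declares itself linearized at all. The sketch below assumes the PDF specification's rule that the linearization parameter dictionary lies within the first 1024 bytes of the file. The helper name is hypothetical. It only detects the marker: in this report qpdf says "File is linearized" and still rejects the hint data, so a passing result here does not rule out the bug.

```python
import re

# Assumption: per the PDF spec, the linearization parameter dictionary
# (containing "/Linearized <version>") lies within the first 1024 bytes.
_LINEARIZED_KEY = re.compile(rb"/Linearized\s+[0-9.]+")


def declares_linearization(head: bytes) -> bool:
    """Return True if the first 1024 bytes contain a /Linearized entry.

    Only the marker is detected; the hint stream that qpdf reports as
    broken in this issue is not validated.
    """
    return _LINEARIZED_KEY.search(head[:1024]) is not None


# Usage with a file saved via doc.save(..., linear=True):
# with open(output_file_path, "rb") as f:
#     print(declares_linearization(f.read(1024)))
```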

Further investigation with QPDF showed that the issue is reproducible on a local machine using the PyMuPDF CLI.

Steps to Reproduce

Start with a non-linearized PDF and run the pymupdf command:

pymupdf clean -linear input.pdf output.pdf

Check the output with qpdf:

qpdf --check output.pdf
checking output.pdf
PDF Version: 1.7
File is not encrypted
File is linearized
WARNING: output.pdf: error encountered while checking linearization data: overflow reading bit stream: wanted = 32; available = 16
qpdf: operation succeeded with warnings

Observe the linearization warning.
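To turn this check into a pass/fail signal (for example after the Lambda save step), a hypothetical helper can scan the `qpdf --check` output. It matches the warning text shown above verbatim; other qpdf versions may phrase it differently. qpdf also reports "warnings only" through its exit status (3, per its documentation as I understand it), which is an alternative to text matching.

```python
import re

# Matches the linearization warning exactly as printed in the report above.
_LINEARIZATION_WARNING = re.compile(
    r"error encountered while checking linearization data: (?P<detail>.+)"
)


def claims_linearized(check_output: str) -> bool:
    """True if qpdf reports that the file is (nominally) linearized."""
    return "File is linearized" in check_output


def linearization_issues(check_output: str) -> list[str]:
    """Return the detail text of each linearization warning in qpdf --check output."""
    return [m.group("detail").strip() for m in _LINEARIZATION_WARNING.finditer(check_output)]
```

A file that claims linearization but has a non-empty issue list is exactly the failure described in this issue.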

Expected vs Actual Behavior

Expected: the PDF saves with deflate compression and a valid linearized structure, without errors.
Actual: the output file fails qpdf's linearization check with the warning shown above, indicating an invalid linearization structure.

User Impact

If the linearization isn't working correctly:

User Experience: The PDF will still download, but users won't be able to view the first page immediately. Instead, they might have to wait for the entire file to download before they can start reading.

Page Order: The pages might not load in the intended order, which can be confusing and disrupt the reading experience.

Environment Information

pymupdf version: 1.25.2
qpdf version: 11.9.1
OS: macOS 15.3

Additional Context

This seems to happen with almost any PDF file. Example files:

Twitter 4_linerarized.pdf
Twitter 4.pdf

How to reproduce the bug

pip install pymupdf
brew install qpdf
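With both tools installed, the steps above can be scripted. This is a minimal sketch, assuming `qpdf` is on PATH; the function names are hypothetical and the save options are the ones from the report. PyMuPDF is imported inside the function so the module loads even where it is not installed.

```python
import shutil
import subprocess


def save_linearized(src: str, dst: str) -> None:
    """Re-save src as dst with the doc.save options used in this report."""
    import pymupdf  # lazy import: only needed when actually saving

    with pymupdf.open(src) as doc:
        doc.save(dst, garbage=4, clean=True, deflate=True, linear=True)


def qpdf_check(path: str) -> subprocess.CompletedProcess:
    """Run `qpdf --check` on path and capture stdout/stderr as text."""
    if shutil.which("qpdf") is None:
        raise RuntimeError("qpdf not found on PATH")
    # check=False: a non-zero exit (e.g. warnings) is returned, not raised.
    return subprocess.run(["qpdf", "--check", path], capture_output=True, text=True)
```

For example, `save_linearized("Twitter 4.pdf", "out.pdf")` followed by `print(qpdf_check("out.pdf").stderr)` should show the linearization warning while the upstream bug is present.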

PyMuPDF version

1.25.2

Operating system

macOS

Python version

3.11


JorjMcKie · 2025-02-01T11:01:11Z

This is an upstream problem (MuPDF). We will create a report in their issue tracker.
You can recreate the issue without using PyMuPDF via this MuPDF CLI command:

mutool clean -lggggsz Twitter.4.pdf

Running qpdf --check on the generated output file, out.pdf, then shows the problem.
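The mutool flags mirror the `doc.save` call from the report. The hypothetical helper below spells out that correspondence. The mapping (`-l` linearize, one `-g` per garbage level, `-s` sanitize content streams, `-z` deflate) is an assumption based on `mutool clean`'s usage text and should be checked against the installed MuPDF version.

```python
def mutool_clean_flags(garbage: int = 0, clean: bool = False,
                       deflate: bool = False, linear: bool = False) -> str:
    """Build a combined `mutool clean` option string from doc.save-style kwargs.

    Hypothetical helper; assumed mapping: linear -> l, garbage -> g repeated
    `garbage` times, clean -> s, deflate -> z.
    """
    if not 0 <= garbage <= 4:
        raise ValueError("garbage must be between 0 and 4")
    flags = (
        ("l" if linear else "")
        + "g" * garbage
        + ("s" if clean else "")
        + ("z" if deflate else "")
    )
    return f"-{flags}" if flags else ""


# The options from the original report reproduce the maintainer's command line.
print("mutool clean", mutool_clean_flags(garbage=4, clean=True, deflate=True, linear=True), "Twitter.4.pdf")
```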

JorjMcKie · 2025-02-01T11:06:56Z

Here is the link to MuPDF's issue: https://bugs.ghostscript.com/show_bug.cgi?id=708278

EddieOner · 2025-02-03T17:38:41Z

@JorjMcKie Added additional context on how this affects end users :)

JorjMcKie added the "upstream bug" label (bug outside this package) on Feb 1, 2025

EddieOner closed this as completed Feb 3, 2025

EddieOner reopened this Feb 3, 2025
