Long rendering times when grid items span #2336

Open
panghal0 opened this issue Dec 26, 2024 · 5 comments
Labels
performance Too slow renderings

Comments

@panghal0

panghal0 commented Dec 26, 2024

Hi team, I was wondering whether there are any best practices or recommendations to keep in mind when generating large PDF documents, in order to reduce the generation time. I have gone through the documentation and couldn't find anything related.

In our app we generate HTML reports, which sometimes result in HTML files of 100MB or more. Now a few of our users are asking for reports in PDF format. Since we already have a system in place to generate HTML reports, we thought of using WeasyPrint to convert these HTML files to PDFs.

I tried converting one HTML file to PDF using WeasyPrint, but it takes too long. Converting a 50MB HTML file took around 40 minutes, and the generation time shoots up to 2.25 hours for a 100MB HTML file.
I ran it from the command line, like this:

python3 -m weasyprint ./Desktop/Reports/report_name/report.html  ./Desktop/Reports/report_name/report.pdf

So I'd really appreciate it if someone could point me to some sort of checklist or best practices, such as keeping the DOM tree under a certain size or specific CSS properties to use or avoid, or anything in general that can help reduce the PDF generation time.

By the way our HTML and CSS structure is pretty basic (using CSS Grid though). Here is a sample of the html/css code.

Aside from the questions, thanks a lot for the great tool. ❤

@panghal0 panghal0 changed the title Performance considerations/best practices/checklist for generating large PDF documents [Help Needed] Performance considerations/best practices/checklist for generating large PDF documents Dec 26, 2024
@liZe liZe added the performance Too slow renderings label Dec 26, 2024
@liZe
Member

liZe commented Dec 26, 2024

Hi!

In our app we generate HTML reports, which sometimes result in HTML files of 100MB or more.

Then you could probably benefit from our professional support. 😄

By the way our HTML and CSS structure is pretty basic (using CSS Grid though).

Removing display: grid saves most of the generation time (from ~13s to ~1s for your example), so there’s probably a performance issue in our grid code. Let me ask my profiler!

Aside from the questions, thanks a lot for the great tool. ❤

You’re welcome. ❤

@liZe
Member

liZe commented Dec 26, 2024

The problem comes from:

if size_contribution in ('minimum', 'min-content'):
    space = min_content_width(context, item)

Calling min_content_width on text is quite slow, because we need to render all the words one by one to find the longest. This function is almost never called for normal layouts, but it’s necessary for grid (and flex) layout.
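To make the cost concrete, here is a toy model of the idea. It is not WeasyPrint's actual code: toy_min_content_width, split_pieces and CountingMeasure are hypothetical names, and the measure callback stands in for a real text-shaping call. It assumes the min-content width is the width of the widest piece that cannot be broken.

```python
from typing import Callable, List


def split_pieces(text: str, break_all: bool) -> List[str]:
    """Return the unbreakable pieces of a text (toy model, whitespace only)."""
    words = text.split()
    if break_all:
        # Assumption: with break-all, a break is allowed between any two
        # characters, so every character becomes its own piece.
        return [char for word in words for char in word]
    return words


def toy_min_content_width(
    text: str, measure: Callable[[str], int], break_all: bool = False
) -> int:
    """Widest unbreakable piece; one measure() call per piece."""
    pieces = split_pieces(text, break_all)
    return max((measure(piece) for piece in pieces), default=0)


class CountingMeasure:
    """Fake measurer: each character is 1 unit wide, and calls are counted."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, piece: str) -> int:
        self.calls += 1
        return len(piece)


by_word = CountingMeasure()
by_char = CountingMeasure()
text = "lorem ipsum dolor"
print(toy_min_content_width(text, by_word), by_word.calls)
print(toy_min_content_width(text, by_char, break_all=True), by_char.calls)
```

In this model the number of measurements grows with the number of pieces, so anything that splits text into more pieces makes the call proportionally more expensive.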

The overall _distribute_extra_space algorithm blindly follows the specification’s pseudo-algorithm, so there are probably many optimizations waiting to be found. I’ll try to find out whether there’s an easy workaround for your case or a quick fix in the code.

@liZe
Member

liZe commented Dec 26, 2024

That’s actually extremely slow because you use word-break: break-all for almost all your text, so each letter of each word is rendered separately to find the minimum width of your grid items 😄.

A possible workaround is to use word-break: break-all only when you really need it. When I use it only for the hash and the path, it only takes ~2s. Far from perfect, but already much better.
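As a rough sketch of this workaround, the helper below builds a CSS rule that applies word-break: break-all only to chosen selectors. The selectors .hash and .path are hypothetical names inferred from the comment above, since the report's real markup isn't shown in this thread, and scoped_break_all_rule is an illustrative helper, not part of WeasyPrint.

```python
from typing import Iterable


def scoped_break_all_rule(selectors: Iterable[str]) -> str:
    """Build one CSS rule that enables break-all for the given selectors only."""
    cleaned = [selector.strip() for selector in selectors if selector.strip()]
    if not cleaned:
        return ""  # nothing needs break-all, so emit no rule at all
    return ",\n".join(cleaned) + " {\n  word-break: break-all;\n}\n"


# Hypothetical selectors for the two kinds of cells that really need it.
rule = scoped_break_all_rule([".hash", ".path"])
print(rule)
```

The resulting rule would take the place of a blanket word-break: break-all declaration in the report's own stylesheet, so that ordinary text is measured word by word again.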

(I’ve also found a typo, and fixing it makes the grid layout go… even slower 😒. Note that the problem only happens when cells span, i.e. when they take multiple rows or columns.)

(And it looks like break-inside: avoid doesn’t work for grid containers, I’ve added an item to #2145.)

So I'd really appreciate it if someone could point me to some sort of checklist or best practices, such as keeping the DOM tree under a certain size or specific CSS properties to use or avoid, or anything in general that can help reduce the PDF generation time.

There’s a short list of best practices in the documentation; we could add grid and flex layouts alongside tables in the list of possibly long renderings.

@liZe liZe changed the title [Help Needed] Performance considerations/best practices/checklist for generating large PDF documents Long rendering times when grid items span Dec 26, 2024
@ollejernstrom

@liZe How do you profile? Is this something I can do myself to find bottlenecks?

@liZe
Member

liZe commented Jan 2, 2025

@liZe How do you profile? Is this something I can do myself to find bottlenecks?

There are many ways, but here is my way:

  • install Graphviz (not a Python package) and gprof2dot (Python package)
  • python -m cProfile -o /tmp/speed -m weasyprint /tmp/grid.html /tmp/grid.pdf
  • gprof2dot -f pstats /tmp/speed | dot -Tsvg -o /tmp/grid.svg

You then get an SVG showing the time spent in each function and how many times it’s been called.
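If Graphviz isn't available, a text report can be had from the standard library alone. The sketch below is one possible alternative to the steps above, not the maintainer's method: profile_call is a hypothetical helper that wraps cProfile and pstats and returns the functions with the highest cumulative time.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so that slow call chains appear first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def busy_work(count):
    """Stand-in workload so the example runs without WeasyPrint installed."""
    return sum(index * index for index in range(count))


result, report = profile_call(busy_work, 100_000)
print(report)

# With WeasyPrint installed, the same helper could wrap a real rendering,
# reusing the file paths from the commands above:
#   from weasyprint import HTML
#   profile_call(HTML(filename="/tmp/grid.html").write_pdf, "/tmp/grid.pdf")
```

The report lists call counts and cumulative times per function, which is the same information the SVG shows, without the call graph.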

The main problem in your case is that we have to get the width of all characters independently to resolve min_content_width when word-break: break-all is set.
