fix: truncate long lines in PDF conversion #439

ublefo · 2024-04-27T14:33:56Z

This fixes PDF processing failures caused by extremely long lines in submitted code files, using cut and fold from coreutils to truncate and wrap lines. It also introduces https://github.com/ruby/shellwords for proper shell escaping.
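A minimal sketch of why shell escaping matters here, assuming the wrapping is done by interpolating a file path into a shell command string. Shellwords.escape (from Ruby's standard shellwords library) quotes the path so that spaces or parentheses in a submitted filename cannot split the argument or inject extra commands. The helper name wrap_command is hypothetical, not the method added in this PR.

```ruby
require 'shellwords'

# Hypothetical helper: build a shell pipeline that truncates each line to
# `hard_limit` characters with cut, then wraps the rest at `soft_limit` with fold.
# Shellwords.escape backslash-escapes shell metacharacters in the path so a
# filename like "week 3/main (v2).py" stays a single, literal argument.
def wrap_command(path, hard_limit: 1000, soft_limit: 160)
  "cut -c 1-#{hard_limit} #{Shellwords.escape(path)} | fold -w #{soft_limit}"
end

puts wrap_command("week 3/main (v2).py")
```

Passing separate arguments to Open3 or Kernel#system avoids the shell entirely; escaping is what makes the string form safe when a shell pipeline is wanted.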

Two limits are configured: a hard limit of 1000 characters and a soft limit of 160 characters. Any line over 1000 characters is truncated, and line breaks are then inserted into any line over 160 characters. If the rendered file has been modified, a warning is added to the PDF document to indicate that the rendered file differs from the original submission. Unit tests and test files have been added for these changes.
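The two limits can be expressed in plain Ruby as a sketch. This is not the PR's implementation (which shells out to cut and fold); limit_lines and its return shape are hypothetical, and the second return value stands in for the "file modified" flag that would trigger the warning.

```ruby
HARD_LIMIT = 1000 # characters kept per line; the rest is dropped
SOFT_LIMIT = 160  # maximum width of each rendered line after wrapping

# Hypothetical sketch: truncate every line at `hard`, then split what remains
# into chunks of at most `soft` characters. Returns [rendered_text, modified?].
def limit_lines(text, hard: HARD_LIMIT, soft: SOFT_LIMIT)
  rendered = text.each_line.map do |line|
    newline = line.end_with?("\n") ? "\n" : ""
    body = line.delete_suffix("\n")[0, hard] # hard limit: cut off the excess
    chunks = body.scan(/.{1,#{soft}}/)      # soft limit: wrap into <= soft chunks
    chunks = [""] if chunks.empty?          # keep blank lines as blank lines
    chunks.join("\n") + newline
  end.join
  [rendered, rendered != text]
end
```

A file with no line over 160 characters comes back byte-for-byte identical, so the flag stays false and no warning would be shown.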

Inkscape has been replaced with librsvg for SVG-to-PDF conversion in jupynotex. Inkscape pulls in a lot of unnecessary dependencies (dbus, gtk, etc.), and it emits noisy, irrelevant errors that have to be redirected to /dev/null. rsvg-convert is much more lightweight in comparison, performs equally well, and has none of these downsides: https://gitlab.gnome.org/GNOME/librsvg
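A sketch of the conversion call, using a Ruby caller purely for illustration (jupynotex itself is Python, and its exact invocation may differ). rsvg-convert's -f option selects the output format and -o the output file; passing an argv array to Kernel#system bypasses the shell, so no escaping is needed. Both helper names are hypothetical.

```ruby
# Hypothetical helper: the argv for converting one SVG file to PDF with
# rsvg-convert (librsvg). Kept separate so it can be inspected or logged.
def svg_to_pdf_argv(svg_path, pdf_path)
  ["rsvg-convert", "-f", "pdf", "-o", pdf_path, svg_path]
end

# Runs the conversion; Kernel#system returns true on exit status 0,
# false on a non-zero exit, and nil if rsvg-convert could not be started.
def svg_to_pdf(svg_path, pdf_path)
  system(*svg_to_pdf_argv(svg_path, pdf_path))
end
```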

Generated result (excluding Jupyter Notebook):

Jupyter Notebook:

PDF processing fails with "! Dimension too large." if a line of text is too long. Implement a simple helper method that calls fold (from coreutils) to wrap long lines in all code files.
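A minimal sketch of such a helper, assuming it runs fold on a single file path and returns the wrapped text. fold_file is a hypothetical name; Open3.capture2 comes from Ruby's standard library and, when given separate arguments, runs fold without a shell.

```ruby
require 'open3'

# Hypothetical helper: wrap lines wider than `width` columns using coreutils
# fold and return the result as a string. Raises if fold exits non-zero.
def fold_file(path, width: 160)
  out, status = Open3.capture2("fold", "-w", width.to_s, path)
  raise "fold failed for #{path}" unless status.success?
  out
end
```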

We shouldn't modify student submissions; instead, we use a temp file when rendering them to PDF. Cleanup is performed after rendering completes.
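One way to get "temp file plus guaranteed cleanup", sketched with Ruby's Tempfile.create in block form, which closes and deletes the file when the block exits, even if rendering raises. with_rendered_copy is a hypothetical name; the PR's own temp file naming and cleanup code may differ.

```ruby
require 'tempfile'

# Hypothetical sketch: write the processed copy of a submission to a temp
# file, yield its path to the renderer, and return the renderer's result.
# The original submission on disk is never touched.
def with_rendered_copy(processed_text)
  Tempfile.create(["rendered", ".txt"]) do |tmp|
    tmp.write(processed_text)
    tmp.flush # make sure the content is on disk before the renderer reads it
    yield tmp.path
  end # Tempfile.create removes the file here
end
```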

Run fold on the provided files and compare the output with diff. If a file contains no lines over the configured threshold, the output is identical to the original; in that case, we replace the temp file with a symlink so the template can easily identify it.
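A sketch of the symlink trick, assuming the folded copy has already been written alongside the original. For brevity it compares the files with FileUtils.compare_file from Ruby's standard library instead of diff (a swapped-in technique that gives the same identical/different answer); link_if_unchanged and file_modified? are hypothetical names.

```ruby
require 'fileutils'

# Hypothetical sketch: if the folded copy at `tmp_path` is identical to the
# original, replace it with a symlink to the original and return true.
def link_if_unchanged(original_path, tmp_path)
  return false unless FileUtils.compare_file(original_path, tmp_path)

  File.delete(tmp_path)
  File.symlink(File.expand_path(original_path), tmp_path)
  true
end

# What a template could check: a symlink means "rendered as submitted",
# a regular file means the content was wrapped or truncated.
def file_modified?(rendered_path)
  !File.symlink?(rendered_path)
end
```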

Redirect stdout to /dev/null since we don't need the diff output.
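A sketch of a silenced diff check, assuming only diff's exit status is needed: Kernel#system with out: File::NULL discards the textual diff while still reporting whether the files match. diff exits 0 when the files are identical, 1 when they differ, and 2 on trouble. files_differ? is a hypothetical name.

```ruby
# Hypothetical sketch: true if the files differ, false if identical.
# stdout (the diff itself) is sent to the null device; stderr is left alone
# so real errors stay visible.
def files_differ?(original_path, rendered_path)
  result = system("diff", original_path, rendered_path, out: File::NULL)
  raise "diff could not compare the files" if result.nil? || $?.exitstatus == 2
  !result
end
```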

Set a hard limit of 1000 characters and truncate everything on a line after the limit is reached. Otherwise we could get PDF files with hundreds of pages, which would be completely unreadable.
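A sketch of the full hard-then-soft pipeline, assuming cut -c 1-1000 handles truncation and fold -w 160 handles wrapping, as the PR description states. Open3.pipeline_r (Ruby standard library) connects the two processes without a shell; truncate_and_fold is a hypothetical name. Note that GNU cut currently treats -c the same as -b, so for multibyte text the hard limit is effectively counted in bytes.

```ruby
require 'open3'

# Hypothetical sketch: cut keeps the first `hard` characters of each line,
# then fold wraps the remainder at `soft` columns. Raises if either fails.
def truncate_and_fold(path, hard: 1000, soft: 160)
  Open3.pipeline_r(["cut", "-c", "1-#{hard}", path], ["fold", "-w", soft.to_s]) do |out, threads|
    text = out.read
    raise "cut/fold failed for #{path}" unless threads.all? { |t| t.value.success? }
    text
  end
end
```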

macite

Can you see if we can do this by rewriting the original rather than creating a temp file?

app/models/task.rb

app/helpers/file_helper.rb

app/models/task.rb

app/views/task/task_pdf.pdf.erb

ublefo added 12 commits April 28, 2024 00:30

chore: add shellwords gem for proper shell path escape

5bf5857

fix: process code files to remove long lines

d82c3f8

refactor: write line-wrapped code files into temp files

15edb68

fix: silence diff

a538888

fix: change default column width limit to 160 characters

3f134b5

enhance: add a notice in the pdf template if file is modified

a802053
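A sketch of the conditional notice, using Ruby's standard ERB with plain text standing in for the real LaTeX template. The template fragment, the modified/filename/body variables, and the notice wording are all hypothetical; task_pdf.pdf.erb itself is not reproduced here.

```ruby
require 'erb'

# Hypothetical template fragment: print a notice only when the rendered copy
# differs from the submission. trim_mode '-' lets <%- ... -%> tags swallow
# their own line breaks so the unmodified case renders just the body.
NOTICE_TEMPLATE = ERB.new(<<~'TEMPLATE', trim_mode: '-')
  <%- if modified -%>
  Note: long lines in <%= filename %> were truncated or wrapped for rendering; the original submission is unchanged.
  <%- end -%>
  <%= body -%>
TEMPLATE

def render_listing(filename, body, modified:)
  NOTICE_TEMPLATE.result_with_hash(modified: modified, filename: filename, body: body)
end
```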

enhance: truncate super long lines in code files rendered in pdf

3e6bd55

refactor: update temp file names

b12b2f9

quality: add unit test for code submissions with long lines

cf0200f

quality: ensure long line notice is not included when not applicable

5ad8376

fix: update comment to reflect temp filename changes

bb930e4

ublefo mentioned this pull request Apr 27, 2024

fix: pdf conversion for long lines thoth-tech/doubtfire-api#17

Closed

macite requested changes May 2, 2024

app/models/task.rb (outdated, resolved)

app/helpers/file_helper.rb (resolved)

app/models/task.rb (outdated, resolved)

app/views/task/task_pdf.pdf.erb (outdated, resolved)

ublefo added 7 commits May 3, 2024 07:08

refactor: avoid making temp files since we are working with copies

1610d76

chore: update jupynotex.py to latest version

dd09a30

fix: truncate unreasonably long lines (1000 char) in source cells

42998fe
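A sketch of the same hard limit applied to notebook source cells, written in Ruby for consistency with the other examples even though jupynotex is Python. It assumes the standard .ipynb layout, where each cell's "source" is either a string or a list of strings; truncate_notebook_sources is a hypothetical name and not the upstream change.

```ruby
require 'json'

# Hypothetical sketch: cap every line of every cell's source at `hard`
# characters, keeping line endings, and return the notebook as JSON.
# String sources are normalised to a list of lines, which nbformat also allows.
def truncate_notebook_sources(ipynb_json, hard: 1000)
  nb = JSON.parse(ipynb_json)
  nb.fetch("cells", []).each do |cell|
    src = cell["source"]
    lines = src.is_a?(Array) ? src : src.to_s.lines
    cell["source"] = lines.map do |line|
      nl = line.end_with?("\n") ? "\n" : ""
      line.delete_suffix("\n")[0, hard] + nl
    end
  end
  JSON.generate(nb)
end
```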

fix: allow generation of notebooks with one cell (upstream bug)

7eecc0f

fix: force line wraps for source cells

d6e97dc

quality: test pdf conversion on ipynb with long lines

03a3ca7

fix: replace deprecated option and silence useless errors for inkscape

96351e8

ublefo force-pushed the pdf-long-lines branch from f964f27 to 96351e8 May 2, 2024 21:08

enhance: replace inkscape with librsvg

490fdb5

https://gitlab.gnome.org/GNOME/librsvg

ublefo requested a review from macite May 2, 2024 21:42

Merge branch 'development' into pdf-long-lines

757adba

macite merged commit 2425997 into doubtfire-lms:development May 25, 2024
2 of 3 checks passed

ublefo deleted the pdf-long-lines branch June 2, 2024 13:00
