PdfDocument::with_html does not exist, and some "advanced" HTML panics #202

Open
tgrushka opened this issue Jan 12, 2025 · 2 comments

@tgrushka

Hoping to try rendering HTML to PDF -- would be great as this crate doesn't depend on building the deprecated (abandoned?) libwkhtmltox C library and doesn't seem to require headless chrome... although that would work too.

Trying to figure out:

PdfDocument::new(&invoice.title())
    .html2pages(&html, options)
    .expect("Failed to render HTML to PDF")
    .save(&PdfSaveOptions::default())...

Obviously not, there's no .save(...) on Vec<PdfPage>

Is there a way to .collect::<PdfDocument>() the pages?

🤔

PdfDocument::new(&invoice.title())
    .html2pages(&html, options)
    .expect("Failed to render HTML to PDF")
    .iter()
    .collect::<PdfDocument>()
    .save(&PdfSaveOptions::default()),

Nope:

a value of type `printpdf::PdfDocument` cannot be built from an iterator over elements of type `&printpdf::PdfPage`
the trait `std::iter::FromIterator<&printpdf::PdfPage>` is not implemented for `printpdf::PdfDocument`

I searched the code for with_html and it only appears in the README file -- is this still being implemented?

...

Figured it out:

let options = XmlRenderOptions {
    // named images to be used in the HTML, i.e. ["image1.png" => DecodedImage(image1_bytes)]
    images: BTreeMap::new(),
    // named fonts to be used in the HTML, i.e. ["Roboto" => DecodedImage(roboto_bytes)]
    fonts: BTreeMap::new(),
    // default page width, printpdf will auto-page-break
    page_width: Mm(216.0),
    // default page height
    page_height: Mm(280.0),
    components: Vec::new(),
};

let mut pdf = PdfDocument::new(&invoice.title());
let pages = pdf
    .html2pages(&html, options)
    .expect("Failed to render HTML to PDF pages");
let pdf_bytes: Vec<u8> = pdf.with_pages(pages).save(&PdfSaveOptions::default());

🎉
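
Since the snippet above ends with a plain `Vec<u8>`, getting a file on disk needs nothing beyond the standard library. A minimal sketch follows; the helper name `write_pdf` is hypothetical and not part of printpdf.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes already-serialized PDF bytes to `path`.
/// Hypothetical helper, not a printpdf API: it only touches std.
fn write_pdf(path: &Path, pdf_bytes: &[u8]) -> io::Result<()> {
    // Create missing parent directories so that a path such as
    // "out/invoices/0001.pdf" works on a fresh checkout.
    if let Some(parent) = path.parent() {
        if !parent.as_os_str().is_empty() {
            fs::create_dir_all(parent)?;
        }
    }
    fs::write(path, pdf_bytes)
}
```

With the variables from the snippet above, a call would look like `write_pdf(Path::new("invoice.pdf"), &pdf_bytes)?`.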

But then:

Failed to render HTML to PDF pages: "Error parsing XML: Unknown token at line 1:7"

Bummer... 😞

It's HTML that starts like the following, which I assume is the problem (needs to be "simple"?):

<!DOCTYPE html>
<html lang="en">
<head>
    <!--[if !mso]><!-->
    <meta name="color-scheme" content="light dark">
    <meta name="supported-color-schemes" content="light dark">
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Invoice</title>
        <style>
        /* Reset for email clients */
        * {
            -webkit-text-size-adjust: 100%;
            -ms-text-size-adjust: 100%;
            box-sizing: border-box;
            margin: 0;
            padding: 0;
        }
        
        body {
            font-family: -apple-system, BlinkMacSystemFont, Arial, sans-serif;
            line-height: 1.4;
            margin: 0;
            padding: 0.75em;
        }
...
@fschutt
Owner

fschutt commented Jan 13, 2025

Yeah, as stated in the README, the HTML support is very, very, very alpha. It should be ready by 0.8 (so that you can lay out reports and simple books / booklets / menu cards - NOTHING fancy). I delayed working on the release due to some font problems; the only thing that is finished is porting the 0.7 API to 0.8 (there are large breaking changes from refactoring, but it's still "usable" to migrate from 0.7 to 0.8). I thought I'd finish it in December, but then another project came up, so I hope to tackle it in February.

The example in the README is outdated. It's html2pages now because I wanted a fn(html) -> Vec<PdfPage> so that later on you can manipulate the resulting PDF pages (insert / remove pages, manipulate page content from JS, serialize pages to JSON, etc.). The with_html function always

The "HTML parser" is really just an XML parser responding to HTML keywords. I wanted to keep the WASM build size minimal, so I don't accept HTML5. It only does style matching towards the DOM nodes.

<html>
  <body>
     <div style='padding:10px;background:lightblue;'>
       <p id='simpletext'>Very long text that breaks into multiple lines. asdfasd asdfasdf adsfasdf ladsjfplasdjf asdlfkjasdfl lasdkjfasdölkjf</p>
     </div>
  </body>
  <head>
    <style>
     #simpletext {
       font-size:12px;
       border:1px solid black;
       font-family:Times-Bold;
     }
    </style>
  </head>
</html>

See the SYNTAX.md for how it SHOULD be (but not all features like pagination or even images work yet). It's very very alpha.

Again, it's a mini-mini-HTML-CSS-layout engine, there are lots of bugs. It's more in a tech-demo-what-could-be stage, not in a stage where you can plug in HTML5 and get a PDF out.

Go to https://fschutt.github.io/printpdf/ and see if it works for your purpose. It's likely it won't work since I'm still working on the font subsetting problem (aka file size explodes when you embed Chinese fonts).

@fschutt
Owner

fschutt commented Jan 13, 2025

Is there a way to .collect::<PdfDocument>() the pages?

You'd still want to set the title on the document, so I think it would only save a couple of lines. I need the &self reference to add images and fonts.
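
For readers wondering what the compiler error was asking for: `.collect::<T>()` only works when `T` implements `FromIterator` for the iterator's item type. The sketch below uses stand-in `Page` and `Document` types (hypothetical, NOT printpdf's types) to show the shape of such an impl and why the title still has to be set afterwards.

```rust
/// Stand-in types for illustration only; these are NOT printpdf's types.
#[derive(Debug, Clone, PartialEq)]
struct Page {
    label: String,
}

#[derive(Debug, Default)]
struct Document {
    title: String,
    pages: Vec<Page>,
}

// This is the trait the compiler error refers to. The impl can only see
// the pages, so the title has to start out empty.
impl FromIterator<Page> for Document {
    fn from_iter<I: IntoIterator<Item = Page>>(iter: I) -> Self {
        Document {
            title: String::new(),
            pages: iter.into_iter().collect(),
        }
    }
}

impl Document {
    /// Sets the title after collecting, which is the extra step
    /// a `collect()`-based API would still need.
    fn with_title(mut self, title: &str) -> Self {
        self.title = title.to_string();
        self
    }
}
```

Note that `.iter()` yields `&Page` while `.into_iter()` on a `Vec<Page>` yields owned `Page` values, which is why the original error mentions `FromIterator<&PdfPage>` rather than `FromIterator<PdfPage>`.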

Failed to render HTML to PDF pages: "Error parsing XML: Unknown token at line 1:7"

It does not like the <!doctype>, and it does not like the unclosed <meta> tags (allowed in HTML, not allowed in XML).

I could get it to render this:

<html>
<head>
    <title>Invoice</title>
        <style>
        /* Reset for email clients */
        * {
            -webkit-text-size-adjust: 100%;
            -ms-text-size-adjust: 100%;
            box-sizing: border-box;
            margin: 0;
            padding: 0;
        }
        
        body {
            font-family: -apple-system, BlinkMacSystemFont, Arial, sans-serif;
            line-height: 1.4;
            margin: 0;
            padding: 0.75em;
        }
</style>
</head>
<body>
<p>Hello</p>
</body>
</html>
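
The manual cleanup above (dropping the doctype and dealing with unclosed tags) could be automated before calling `html2pages`. What follows is a rough, std-only sketch under stated assumptions: the function name `to_xml_friendly` and the `VOID_TAGS` list are made up for illustration, the scan is a naive string pass rather than a real HTML parser, and whether printpdf's XML parser accepts the resulting self-closed `<meta ... />` tags has not been verified here (the working example above simply removed them).

```rust
/// Elements that HTML allows unclosed but XML does not (partial list, an assumption).
const VOID_TAGS: [&str; 6] = ["meta", "link", "br", "hr", "img", "input"];

/// Returns a copy of `html` with the doctype and comments removed and
/// unclosed void elements such as `<meta ...>` rewritten as `<meta ... />`.
/// Naive by design: a `>` inside an attribute value will confuse it.
fn to_xml_friendly(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut rest = html;

    while let Some(lt) = rest.find('<') {
        // Copy the text before the next tag unchanged.
        out.push_str(&rest[..lt]);
        rest = &rest[lt..];

        // Drop comments, including conditional ones like <!--[if !mso]><!-->.
        if rest.starts_with("<!--") {
            rest = match rest.find("-->") {
                Some(end) => &rest[end + 3..],
                None => "", // unterminated comment: drop the tail
            };
            continue;
        }

        // Malformed tail without a closing '>': keep it verbatim.
        let gt = match rest.find('>') {
            Some(gt) => gt,
            None => break,
        };
        let tag = &rest[..=gt];
        rest = &rest[gt + 1..];

        // Drop <!DOCTYPE ...> in any letter case.
        let is_doctype = tag
            .get(..9)
            .map_or(false, |p| p.eq_ignore_ascii_case("<!doctype"));
        if is_doctype {
            continue;
        }

        // The tag name runs from after '<' to the first whitespace, '/' or '>'.
        let name_end = tag[1..]
            .find(|c: char| c.is_whitespace() || c == '/' || c == '>')
            .map_or(tag.len(), |i| i + 1);
        let name = &tag[1..name_end];
        let is_void = VOID_TAGS.iter().any(|v| v.eq_ignore_ascii_case(name));

        if is_void && !tag.ends_with("/>") {
            // Rewrite `<meta ...>` as `<meta ... />`.
            out.push_str(tag[..tag.len() - 1].trim_end());
            out.push_str(" />");
        } else {
            out.push_str(tag);
        }
    }

    out.push_str(rest);
    out
}
```

In the code from the first comment, this would slot in as `pdf.html2pages(&to_xml_friendly(&html), options)`. It only addresses the two problems named in this thread; unsupported CSS or other HTML5-only constructs would still need manual attention.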
