Don't include unused fonts in the PDF document #134
So I actually looked a bit into subsetting with […]. There currently is a problem with the […]. I'll look a bit more into this and do some more testing for my own project. Let me know if you would be interested in an implementation of this in the crate.

A pretty simple implementation could be to save all chars that get printed into a hashset linked to the current font. Then the font could be trimmed before writing it into the output PDF. That's probably not the most performant way to do this, but it might be fine, especially if it is an optional feature.

Edit: I just saw that the PDF text operations mostly use the codepoints directly and not the characters. That of course makes this more difficult. In theory it would still be possible to somehow remap the codepoints before finally writing the output PDF, but that seems a bit more tedious.
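The hashset idea from this comment could look roughly like the sketch below. It is only an illustration: `FontId` and `GlyphUsage` are hypothetical names and are not part of the crate's API. It shows only the bookkeeping step (which characters were printed with which font), not the trimming itself.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical handle identifying a font that was added to the document.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct FontId(pub String);

/// Hypothetical tracker: remembers every char printed with each font.
#[derive(Debug, Default)]
pub struct GlyphUsage {
    used: HashMap<FontId, HashSet<char>>,
}

impl GlyphUsage {
    pub fn new() -> Self {
        Self::default()
    }

    /// Call this whenever `text` is written while `font` is the current font.
    pub fn record(&mut self, font: &FontId, text: &str) {
        self.used
            .entry(font.clone())
            .or_default()
            .extend(text.chars());
    }

    /// Chars that have to survive trimming, sorted for a stable order.
    /// A font that was never used yields an empty list.
    pub fn chars_for(&self, font: &FontId) -> Vec<char> {
        let mut chars: Vec<char> = self
            .used
            .get(font)
            .map(|set| set.iter().copied().collect())
            .unwrap_or_default();
        chars.sort_unstable();
        chars
    }
}
```

Right before the fonts are written, `chars_for` would give the list of chars to keep for each font. Since the text operations work on codepoints rather than chars, a real implementation would probably need to track glyph IDs instead.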
Ok I did in fact manage to get automatic subsetting working in […]. My implementation runs when […]. Since it was honestly kind of a hassle to work with the current codebase, I merged the currently open PR #131 before implementing this feature. If that gets merged, it would be pretty easy to integrate […].
@dnlmlr merged, can you rebase / fix? thanks
- Todo: Don't panic!
- Instead of mapping the GIDs via the unicode values, we can just make use of the fact that the new GIDs are issued in the same order as the glyphs_to_keep are provided
- This means that the mapping can be significantly simplified, as the first *old* GID will have the new GID `1`, the next will have the new GID `2` and so on
- This also fixes the issue of glyphs that don't have a unicode value. These couldn't be mapped correctly before. This includes some specific math symbols, for example the glyph `radical.v1`
- This is still not optimal since errors are handled silently and simply cause a fallback to not using subsetting
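The order-based mapping described in these commit messages can be sketched as follows. This is a minimal illustration, not the code from the PR: `remap_gids` and `rewrite_gids` are hypothetical helpers, and the sketch assumes that glyph `0` (`.notdef`) keeps ID `0`, that `glyphs_to_keep` does not contain `0`, and that it contains no duplicates.

```rust
use std::collections::HashMap;

/// Builds the old-GID -> new-GID table from the order of `glyphs_to_keep`.
/// Assumption: the subset font reserves GID 0 for `.notdef`, so the first
/// kept glyph becomes 1, the second 2, and so on.
pub fn remap_gids(glyphs_to_keep: &[u16]) -> HashMap<u16, u16> {
    let mut map = HashMap::with_capacity(glyphs_to_keep.len() + 1);
    map.insert(0, 0);
    for (index, &old_gid) in glyphs_to_keep.iter().enumerate() {
        map.insert(old_gid, (index + 1) as u16);
    }
    map
}

/// Rewrites a run of old GIDs to the new numbering.
/// Returns `None` if any glyph is missing from the table, so the caller
/// can decide explicitly to fall back to the full font.
pub fn rewrite_gids(gids: &[u16], map: &HashMap<u16, u16>) -> Option<Vec<u16>> {
    gids.iter().map(|gid| map.get(gid).copied()).collect()
}
```

No unicode lookup is involved, so glyphs without a unicode value are remapped like any other. Returning an `Option` instead of panicking leaves the fallback decision to the caller, though it still does not report why the mapping failed.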
500e6ef to 248974c
I rebased my current […]. This also contains another […].
lgtm, although I think I should slowly work towards a proper data model for PdfPage, so that manipulation becomes easier
So I have seen some talk about using `allsorts` to do actual subsetting, which would significantly reduce the PDF size. This is not actually removing glyphs, but at least it is omitting completely unused fonts from the PDF output.

I am using the `PdfLayerReference::set_font` function to mark fonts as used and then skip unmarked fonts in `FontList::into_with_document`. This is a rather simple hack, but it allows for adding all the fonts you want without wasting space on fonts / font variants that are not used at all. The main use case for me is with the `genpdf` crate, where it is required to add all 4 variants of a font (regular, italic, bold, italic-bold) even if not all of them are used.

Let me know if there is something I missed here which could cause problems.
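The mark-and-skip approach can be sketched in isolation as below. This is a simplified stand-in under stated assumptions, not the crate's code: `FontRegistry`, `add_font` and `into_used_fonts` are hypothetical, and fonts are identified by a plain name instead of the crate's own font references.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical font list that remembers which fonts were actually selected.
#[derive(Debug, Default)]
pub struct FontRegistry {
    fonts: BTreeMap<String, Vec<u8>>, // font name -> raw font bytes
    used: BTreeSet<String>,           // names passed to `set_font`
}

impl FontRegistry {
    /// Registers a font; this alone does not cause it to be embedded.
    pub fn add_font(&mut self, name: &str, data: Vec<u8>) {
        self.fonts.insert(name.to_string(), data);
    }

    /// Marks a registered font as used, as a layer would when selecting it.
    pub fn set_font(&mut self, name: &str) {
        if self.fonts.contains_key(name) {
            self.used.insert(name.to_string());
        }
    }

    /// Consumes the registry and yields only the fonts that were marked,
    /// which is the point where unused fonts are dropped from the output.
    pub fn into_used_fonts(self) -> Vec<(String, Vec<u8>)> {
        let FontRegistry { fonts, used } = self;
        fonts
            .into_iter()
            .filter(|(name, _)| used.contains(name))
            .collect()
    }
}
```

With all four variants registered but only two of them ever selected, only those two would reach the document. One thing to watch in a real implementation is any code path that references a font without going through `set_font`, since such a font would be skipped by mistake.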