Does this support font names that contain Chinese characters? #38

Open
ZhangTiny1703 opened this issue Dec 10, 2019 · 16 comments

ZhangTiny1703 commented Dec 10, 2019

I added the following entries to cjkgs-founder.dat:

Name: 楷体_GB2312
Class: GB
TTFname: KaiTi_GB2312.ttf

Name: 方正黑体_GBK
Class: GB
TTFname:  FZHTK.TTF

Name: 方正细等线简体
Class: GB
TTFname:  FZXDXJW.TTF

I then ran the command perl cjk-gs-integrate.pl to generate cidfmap.local, which contains:

/方正细等线简体 << /FileType /TrueType
  /Path pssystemparams /GenericResourceDir get
  (CIDFSubst/FZXDXJW.TTF) concatstrings
  /CSI [(GB1) 5] >> ;

/方正黑体_GBK << /FileType /TrueType
  /Path pssystemparams /GenericResourceDir get
  (CIDFSubst/FZHTK.TTF) concatstrings
  /CSI [(GB1) 5] >> ;

/楷体_GB2312 << /FileType /TrueType
  /Path pssystemparams /GenericResourceDir get
  (CIDFSubst/KaiTi_GB2312.ttf) concatstrings
  /CSI [(GB1) 5] >> ;

Next I ran the command gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dPDFSTOPONERROR -dNOOUTERSAVE -dCompressFonts=true -dSubsetFonts=false -dEmbedAllFonts=true -sColorConversionStrategy=RGB -dCompatibilityLevel=1.6 -sOutputFile=output.pdf 1000027661706311.pdf to convert the PDF, and it failed with this error:

GPL Ghostscript 9.50 (2019-10-15)
Copyright (C) 2019 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
While reading gs_cidfm.ps:
Error: /syntaxerror in (binary token, type=150)
Operand stack:
(gs_cmap.ps\000gs_setpd.ps\000gs_fapi.ps\000gs_typ32.ps\000gs_frsd.ps\000gs_ll3.ps\000gs_icc.ps\000gs_mex_e.ps\000gs_mro_e.ps\000gs_pdf_e.ps\000gs_wan_e.ps\000pdf_ops.ps\000pdf_rbld.ps\000pdf_base.ps\000pdf_draw.ps\000gs_cff.ps\000gs_mgl_e.ps\000gs_ttf...)   (gs_cidfm.ps)   1   --nostringval--   FZBSJW--GB1-0   --dict:3/3(G)--   FZDBSJW--GB1-0   --dict:3/3(G)--   FZHTJW--GB1-0   --dict:3/3(G)--   FZHTK--GBK1-0   --dict:3/3(G)--   FZSSK--GBK1-0   --dict:3/3(G)--   FZXBSK--GBK1-0   --dict:3/3(G)--   FangSong   --dict:3/3(G)--   KaiTi   --dict:3/3(G)--   KaiTi_GB2312   --dict:3/3(G)--   MicrosoftYaHei   --dict:4/4(G)--   MicrosoftYaHei-Bold   --dict:4/4(G)--   MicrosoftYaHeiLight   --dict:4/4(G)--   NSimSun   --dict:4/4(G)--   SimHei   --dict:3/3(G)--   SimSun   --dict:4/4(G)--   WenQuanYiZenHei   --dict:4/4(G)--   WenQuanYiZenHei-Adobe-CNS1   --dict:4/4(G)--   WenQuanYiZenHeiMono   --dict:4/4(G)--   WenQuanYiZenHeiMono-Adobe-CNS1   --dict:4/4(G)--   WenQuanYiZenHeiSharp   --dict:4/4(G)--   WenQuanYiZenHeiSharp-Adobe-CNS1   --dict:4/4(G)--   YouYuan   --dict:4/4(G)--   
Execution stack:
   %interp_exit   --nostringval--   --nostringval--   %loop_continue   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   --nostringval--   --nostringval--   --nostringval--   --nostringval--   1817   5   6   %oparray_pop   --nostringval--   %errorexec_pop   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push
Dictionary stack:
   --dict:918/1123(G)--   --dict:0/20(G)--   --dict:71/200(L)--   --dict:918/1123(G)--   --dict:9/14(G)--   --dict:1/1(G)--
Current allocation mode is global
Current file position is 2184

I guessed that Ghostscript doesn't support Chinese and that all of its files must be ASCII text. But on your blog I read:

Name: HiraKakuPro-W3
Class: Japan
Provides(40): GothicBBB-Medium
Provides(40): A-OTF-GothicBBBPro-Medium
Filename(20): ヒラギノ角ゴ Pro W3.otf
Filename(10): HiraKakuPro-W3.otf

So can some kanji and katakana (漢字, カタカナ) appear in a config file?

@aminophen
Member

The .dat file in this project is a database file; it is not used directly as a Ghostscript config file.

The Filename: entry can contain any Unicode characters, including kanji, hiragana, and katakana (漢字/ひらがな/カタカナ), because that entry is used only for generating symlinks in the Ghostscript resource directory (the name of the symlink itself should be ASCII-only). However, the other entries (Name: and PSName:) are written to the Ghostscript config file, so they should not contain any such characters, because Ghostscript does not support them.

The Name: entry should be the PostScript name of a font, not its full font name. I don't have KaiTi_GB2312.ttf, but is its PostScript name really "楷体_GB2312"? I've never seen a font whose PostScript name contains kanji, hiragana, or katakana.

@ZhangTiny1703
Author

Thank you very much for your reply.
When I use the command gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dPDFSTOPONERROR -dNOOUTERSAVE -dCompressFonts=true -dSubsetFonts=false -dEmbedAllFonts=true -sColorConversionStrategy=RGB -dCompatibilityLevel=1.6 -sOutputFile=output.pdf 1000027661706311repair1.pdf to convert the PDF to PDF/A, it runs and reports these errors:

Can't find CID font "·½ֽºی䞇BK".
Attempting to substitute CID font /Adobe-GB1 for /·½ֽºی䞇BK, see doc/Use.htm#CIDFontSubstitution.
The substitute CID font "Adobe-GB1" is not provided either. attempting to use fallback CIDFont.See doc/Use.htm#CIDFontSubstitution.
Loading a TT font from /usr/local/share/ghostscript/9.50/Resource/CIDFSubst/DroidSansFallback.ttf to emulate a CID font Adobe-GB1 ... Done.
Can't find CID font "·½ֽϸµɏ߼󍣢.
Attempting to substitute CID font /Adobe-GB1 for /·½ֽϸµɏ߼󍣬 see doc/Use.htm#CIDFontSubstitution.
Can't find CID font "¿¬ͥ_GB2312".
Attempting to substitute CID font /Adobe-GB1 for /¿¬ͥ_GB2312, see doc/Use.htm#CIDFontSubstitution.
Loading NimbusRoman-Regular font from /usr/local/share/ghostscript/9.50/Resource/Font/NimbusRoman-Regular... 9356676 7963478 3568272 2060641 3 done.
Page 2
Can't find CID font "·½ֽºی䞇BK".
Attempting to substitute CID font /Adobe-GB1 for /·½ֽºی䞇BK, see doc/Use.htm#CIDFontSubstitution.
Loading a TT font from /usr/local/share/ghostscript/9.50/Resource/CIDFSubst/DroidSansFallback.ttf to emulate a CID font Adobe-GB1 ... Done.
Can't find CID font "·½ֽϸµɏ߼󍣢.
Attempting to substitute CID font /Adobe-GB1 for /·½ֽϸµɏ߼󍣬 see doc/Use.htm#CIDFontSubstitution.
Page 3
Can't find CID font "·½ֽϸµɏ߼󍣢.
Attempting to substitute CID font /Adobe-GB1 for /·½ֽϸµɏ߼󍣬 see doc/Use.htm#CIDFontSubstitution.
Loading a TT font from /usr/local/share/ghostscript/9.50/Resource/CIDFSubst/DroidSansFallback.ttf to emulate a CID font Adobe-GB1 ... Done.
Can't find CID font "·½ֽºی䞇BK".
Attempting to substitute CID font /Adobe-GB1 for /·½ֽºی䞇BK, see doc/Use.htm#CIDFontSubstitution.
Can't find CID font "¿¬ͥ_GB2312".
Attempting to substitute CID font /Adobe-GB1 for /¿¬ͥ_GB2312, see doc/Use.htm#CIDFontSubstitution.

When I change the shell's encoding to GB2312, it shows:

Can't find CID font "方正黑体_GBK".
Attempting to substitute CID font /Adobe-GB1 for /方正黑体_GBK, see doc/Use.htm#CIDFontSubstitution.
The substitute CID font "Adobe-GB1" is not provided either. attempting to use fallback CIDFont.See doc/Use.htm#CIDFontSubstitution.
Loading a TT font from /usr/local/share/ghostscript/9.50/Resource/CIDFSubst/DroidSansFallback.ttf to emulate a CID font Adobe-GB1 ... Done.
Can't find CID font "方正细等线简体".
Attempting to substitute CID font /Adobe-GB1 for /方正细等线简体, see doc/Use.htm#CIDFontSubstitution.
Can't find CID font "楷体_GB2312".
Attempting to substitute CID font /Adobe-GB1 for /楷体_GB2312, see doc/Use.htm#CIDFontSubstitution.
Loading NimbusRoman-Regular font from /usr/local/share/ghostscript/9.50/Resource/Font/NimbusRoman-Regular... 9356676 7963478 3568272 2060641 3 done.
Page 2
Can't find CID font "方正黑体_GBK".
Attempting to substitute CID font /Adobe-GB1 for /方正黑体_GBK, see doc/Use.htm#CIDFontSubstitution.
Loading a TT font from /usr/local/share/ghostscript/9.50/Resource/CIDFSubst/DroidSansFallback.ttf to emulate a CID font Adobe-GB1 ... Done.
Can't find CID font "方正细等线简体".
Attempting to substitute CID font /Adobe-GB1 for /方正细等线简体, see doc/Use.htm#CIDFontSubstitution.
Page 3
Can't find CID font "方正细等线简体".
Attempting to substitute CID font /Adobe-GB1 for /方正细等线简体, see doc/Use.htm#CIDFontSubstitution.
Loading a TT font from /usr/local/share/ghostscript/9.50/Resource/CIDFSubst/DroidSansFallback.ttf to emulate a CID font Adobe-GB1 ... Done.
Can't find CID font "方正黑体_GBK".
Attempting to substitute CID font /Adobe-GB1 for /方正黑体_GBK, see doc/Use.htm#CIDFontSubstitution.
Can't find CID font "楷体_GB2312".
Attempting to substitute CID font /Adobe-GB1 for /楷体_GB2312, see doc/Use.htm#CIDFontSubstitution.
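
The two logs above differ only in how the terminal decodes the same bytes. A minimal Python sketch of that effect, assuming the PDF stores the font name as GB2312 bytes (the variable names are only for illustration, and a real terminal may garble the bytes differently from the Latin-1 decoding shown here):

```python
# Font name as raw bytes, assuming the PDF stores it in GB2312.
raw = "方正黑体_GBK".encode("gb2312")

# Decoded one byte per character (Latin-1), the name turns into mojibake.
as_latin1 = raw.decode("latin-1")

# Decoded as GB2312, the intended name comes back.
as_gb2312 = raw.decode("gb2312")

print(raw.hex().upper())  # B7BDD5FDBADACCE55F47424B
print(as_latin1)          # starts with "·½", like the garbled log
print(as_gb2312)          # 方正黑体_GBK
```

The ASCII tail of the name (_GBK) survives either way, which is why it is still readable in the garbled log.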

These warnings indicate that the PDF needs these fonts, whose names contain Chinese characters.
Then I opened the PDF in Adobe Acrobat and checked File > Properties > Fonts. The result:

[Screenshot: Snipaste_2019-12-11_09-12-17]

1000027661706311.pdf

@aminophen
Member

Sorry, we have never tested this use case, which embeds a real font into a PDF whose fonts are not embedded; the script was originally intended to set up Ghostscript properly for PostScript -> PDF conversion when the PostScript input file contains some CJK font names. After a little testing, I found that it works when the PostScript name written in the PDF input file contains only ASCII characters, but fails when the PostScript name contains other characters, as in "方正黑体_GBK". This seems to be due to a limitation of Ghostscript.


However, I also found that a fallback font named "Adobe-GB1" can be used instead of such a non-ASCII PostScript Name. The hint was there in your post:

Can't find CID font "方正黑体_GBK".
Attempting to substitute CID font /Adobe-GB1 for /方正黑体_GBK, see doc/Use.htm#CIDFontSubstitution.

Currently the file cjkgs-founder.dat contains

# FZShuSong-Z01
Name: FZSSK--GBK1-0
Class: GB
Provides(55): STSong-Light
TTFname: FZSSK.TTF

When I add Provides(55): Adobe-GB1, as shown below,

# FZShuSong-Z01
Name: FZSSK--GBK1-0
Class: GB
Provides(55): STSong-Light
Provides(55): Adobe-GB1
TTFname: FZSSK.TTF

the resulting PDF has FZSSK.TTF embedded. Of course, there is a limitation: all such fonts are converted to a single font, regardless of the original font family (serif or sans-serif). Also, some non-CJK fonts are not rendered correctly, at least on my side, but this should be unrelated to this topic.

[Screenshot: 20191212-cjkgs-embed]


HinTak commented Mar 18, 2021

@aminophen you should read the "CIDFontSubstitution" section of the doc/Use.htm file in the Ghostscript source. Non-ASCII PostScript font names can be used, but they need to be hex-escaped. Thus, instead of /hanzi for the first part of a cidfmap entry, you can write <68616E7A69> cvn (68616E7A69 is "hanzi" in hex). You need to convert the Chinese characters to hex. Hope this helps.
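
For anyone following along, here is a minimal Python sketch of the hex conversion described above. The helper name ps_hex_name is made up for illustration, and the encoding argument is an assumption: it has to match whatever byte encoding the PDF actually uses for the font name.

```python
def ps_hex_name(name: str, encoding: str = "ascii") -> str:
    """Build a PostScript expression that yields a name object from hex bytes.

    `encoding` is an assumption about how the PDF stores the font name.
    """
    hex_bytes = name.encode(encoding).hex().upper()
    # A <...> hex string followed by cvn converts the string to a name.
    return f"<{hex_bytes}> cvn"


print(ps_hex_name("hanzi"))           # <68616E7A69> cvn
print(ps_hex_name("黑体", "gb2312"))  # <BADACCE5> cvn
```

The output replaces the leading /FontName part of a cidfmap entry; the rest of the entry stays as it was.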


HinTak commented Mar 18, 2021

Thus, if you have "方正黑体_GBK" as a TrueType font, you can tell Ghostscript about it by making a hex entry for that name.

@aminophen
Member

@HinTak Thanks for the information ;-) But I don't have such truetype font, so I can't test it.

@ZhangTiny1703 You have 3 truetype fonts, right? If so, could you test hex entries of those fonts?


HinTak commented Mar 19, 2021

@aminophen it is not a question of which fonts you have, but of which PDF requires such fonts (and does not have them embedded). The one above is such a PDF, I imagine.

@aminophen
Member

@HinTak Hmm, then it might not be something this project can support. For example, FZHTK.TTF is already registered in cjkgs-founder.dat; in this case, adding an alias "方正黑体_GBK (in hex) => FZHTK--GBK1-0" to cidfmap.aliases should work. But since I don't know which real font appears under which name in a PDF, I don't want to add such aliases. I mean, a simpler name such as "方正黑体" instead of "方正黑体_GBK" may appear in some PDFs, right?

@aminophen
Member

Or should I simply add code that encodes non-ASCII characters into hex, for those who want to add a user-defined database containing such characters?


HinTak commented Mar 19, 2021

@aminophen yes, I got here because I was looking for an answer for a PDF with "黑体" as one of its non-embedded font names (it looks like it was made by the same piece of sh*t software). I knew an answer existed, as I used to work on Ghostscript, and even on that part of it... Anyway, it is as I wrote: you write <68616E7A69> cvn, and it is documented in the correct place. One just needs to read it.

The table is non-exclusive: you can have multiple font names mapped to the same font file (i.e. substitution), and also the same font name mapped to different font files (later entries override earlier ones, I think).


HinTak commented Mar 19, 2021

Here is a "cidfmap" file which would process the pdf posted above (% for comments):

%%% 方正细等线简体
<B7BDD5FDCFB8B5C8CFDFBCF2CCE5> cvn << /FileType /TrueType /Path (方正细等线简体.ttf) /CSI [(GB1) 2] >> ;
%%% 方正黑体_GBK
<B7BDD5FDBADACCE55f47424b> cvn << /FileType /TrueType /Path (FZHTK.TTF) /CSI [(GB1) 2] >> ;
%%% 楷体_GB2312
<BFACCCE55f474232333132> cvn << /FileType /TrueType /Path (KaiTi_GB2312.ttf) /CSI [(GB1) 2] >> ;

Edit the paths as appropriate for your system. (I have all of the fonts symlinked in the current directory, one of them under a Chinese name, "方正细等线简体.ttf", but it will be different for you.)
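
Entries like the ones above can also be generated rather than typed by hand. A rough Python sketch, under these assumptions: the PDF stores the font names in GB2312, the font paths are plain file names without parentheses or backslashes, and the supplement value 2 is simply copied from the entries above. The helper name cidfmap_entry and the FONTS list are hypothetical.

```python
# (font name as it appears in the PDF, path to the TrueType file)
# The paths are placeholders; adjust them for your own system.
FONTS = [
    ("方正细等线简体", "FZXDXJW.TTF"),
    ("方正黑体_GBK", "FZHTK.TTF"),
    ("楷体_GB2312", "KaiTi_GB2312.ttf"),
]


def cidfmap_entry(name: str, path: str,
                  encoding: str = "gb2312", supplement: int = 2) -> str:
    """Return a comment line plus a cidfmap entry keyed by a hex-escaped name."""
    hex_name = name.encode(encoding).hex().upper()
    return (
        f"%%% {name}\n"
        f"<{hex_name}> cvn << /FileType /TrueType "
        f"/Path ({path}) /CSI [(GB1) {supplement}] >> ;"
    )


for font_name, font_path in FONTS:
    print(cidfmap_entry(font_name, font_path))
```

Hex digits are case-insensitive in PostScript hex strings, so the all-uppercase output is equivalent to the mixed-case entries above.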

This makes the Chinese content render as intended. There seems to be a Ghostscript bug affecting the English fonts, which I just filed as https://bugs.ghostscript.com/show_bug.cgi?id=703716 .

This is the converted output:
output.pdf

Note that the Chinese content is correct, but the English parts are not. See https://bugs.ghostscript.com/show_bug.cgi?id=703716 .


HinTak commented Mar 19, 2021

BTW, it is exactly as Ken Sharp replied on Stack Overflow (to the same reporter, I think); he just did not give you an actual example of how. E.g. "BFACCCE55f474232333132" is "楷体_GB2312" in GB2312 encoding in hex, and "5f474232333132" is "_GB2312", which is the same in UTF-8/ASCII encoding as in GB2312 encoding.
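
The claim is easy to check by decoding the hex string back. A small Python sketch (variable names are only illustrative):

```python
hex_name = "BFACCCE55f474232333132"

# bytes.fromhex accepts mixed-case hex digits.
raw = bytes.fromhex(hex_name)

# Decoding the whole byte string as GB2312 gives back the font name.
font_name = raw.decode("gb2312")
print(font_name)  # 楷体_GB2312

# The two Chinese characters take two bytes each; the rest is plain ASCII,
# so it decodes identically as ASCII, UTF-8, or GB2312.
suffix = raw[4:]
print(suffix.hex())            # 5f474232333132
print(suffix.decode("ascii"))  # _GB2312
```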
