Ghostscript font substitution riddle

How can I get Ghostscript to substitute Arial/Arial-Bold for Arial+000040/Arial,Bold+000041 when reading jhtest.pdf?
Ghostscript insists on substituting Helvetica-Bold for both fonts.
Changing the font names in the PDF with vim in binary mode helps (jhtest-patched.pdf).
Log for jhtest.pdf:
GS_FONTPATH=C:\Windows\Fonts
gs -dNOPAUSE -dBATCH -dCCFONTDEBUG -sDEVICE=nullpage jhtest.pdf
GPL Ghostscript 9.18 (2015-10-05)
Copyright (C) 2015 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
Scanning C:\windows\Fonts for fonts... 666 files, 473 scanned, 447 new fonts.
Querying operating system for font files...
Substituting font Helvetica-Bold for Arial+000040.
Loading NimbusSanL-Bol font from %rom%Resource/Font/NimbusSanL-Bol... 8611036 7144230 2673392 1348904 3 done.
Substituting font Helvetica-Bold for Arial,Bold+000041.
Substituting font Times-Bold for TimesNewRoman,Bold+000013.
Loading NimbusRomNo9L-Med font from %rom%Resource/Font/NimbusRomNo9L-Med... 8870100 7399404 3366000 1964135 3 done.
Log for jhtest-patched.pdf - Arial and Arial-Bold are substituted as expected.
GS_FONTPATH=C:\Windows\Fonts
gs -dNOPAUSE -dBATCH -dCCFONTDEBUG -sDEVICE=nullpage jhtest-patched.pdf
GPL Ghostscript 9.18 (2015-10-05)
Copyright (C) 2015 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
Can't find (or can't open) font file %rom%Resource/Font/ArialMT.
Can't find (or can't open) font file ArialMT.
Can't find (or can't open) font file %rom%Resource/Font/ArialMT.
Can't find (or can't open) font file ArialMT.
Scanning C:\windows\Fonts for fonts... 666 files, 473 scanned, 447 new fonts.
Can't find (or can't open) font file %rom%Resource/Font/ArialMT.
Can't find (or can't open) font file ArialMT.
Loading ArialMT font from C:\windows\Fonts/arial.ttf... 8312100 3435413 4127492 2703302 3 done.
Can't find (or can't open) font file %rom%Resource/Font/Arial-BoldMT.
Can't find (or can't open) font file Arial-BoldMT.
Loading Arial-BoldMT font from C:\windows\Fonts/arialbd.ttf... 8369364 3483445 6172560 4696464 3 done.
Querying operating system for font files...
Substituting font Times-Bold for TimesNewRoman,Bold+000013.
Loading NimbusRomNo9L-Med font from %rom%Resource/Font/NimbusRomNo9L-Med... 8413932 3678215 7135440 5602384 3 done.

This looks like an attempt by the creating software to include a subset font (subset fonts are normally named with a six-letter 'tag', a plus sign, and then the original font name). However, this isn't (obviously) a font that follows that scheme.
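To make the naming scheme concrete, here is a minimal Python sketch that checks whether a font name carries a standard subset tag (six uppercase letters followed by a plus sign). The function name split_subset_name is hypothetical, not part of Ghostscript. Under this rule the names in jhtest.pdf, such as Arial+000040, do not qualify, because the plus sign is followed by digits instead of being preceded by a tag.

```python
import re
from typing import Optional, Tuple

# Standard subset naming: six uppercase letters, '+', then the base font name.
SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_name(font_name: str) -> Tuple[Optional[str], str]:
    """Return (tag, base_name); tag is None if the name doesn't follow the scheme."""
    match = SUBSET_TAG.match(font_name)
    if match:
        return match.group(1), match.group(2)
    return None, font_name


if __name__ == "__main__":
    for name in ["TZVOMD+LMSans8-Regular-Identity-H", "Arial+000040", "Arial,Bold+000041"]:
        print(name, "->", split_subset_name(name))
```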
The fonts are not embedded, which is frankly a bad idea, and the names are non-standard. This means that the PDF consumer must use a substitute font. The default substitute font for Ghostscript is Helvetica, which is why you get that.
If you change the font names to match the 'real' font names, then Ghostscript (and other PDF consumers) are able to find Arial as a substitute.
In order to get Ghostscript to find the 'mangled' names in your file, you would have to specifically define a substitute for those exact font names.
Since you are using Windows, your build uses a ROM file system (hence the %rom% paths in your log). However, to complicate matters, you seem to be using a Linux version of Ghostscript (gs instead of gswin32 or gswin64).
This makes me unsure what exactly you are doing. However, if I take the Ghostscript source and add this line to /ghostpdl/Resource/Init/Fontmap.GS:
/Arial+000040 /ArialMT ;
and then run Ghostscript:
gswin32c -I/ghostpdl/Resource/Init jhtest.pdf
the result is that Arial is used for Arial+000040. You will need to modify this to suit your environment, and you will need to locate the resource files appropriate to the version of Ghostscript which you are using (because they are versioned).
You can then add as many substitutes as you like.
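If you need several such substitutes, a small helper can generate the alias lines in the same /Name /Target ; form. This is a sketch under assumptions: fontmap_alias and fontmap_aliases are hypothetical helpers, and mapping Arial,Bold+000041 to Arial-BoldMT is inferred from the patched log (where Arial-BoldMT was loaded from arialbd.ttf), not something stated above. The generated lines would go into a copy of Fontmap.GS that Ghostscript is pointed at.

```python
from typing import Dict, List

# Characters that end a PostScript name token (whitespace is checked separately).
PS_DELIMITERS = set("()<>[]{}/%")


def fontmap_alias(source: str, target: str) -> str:
    """Format one Fontmap alias entry: /Source /Target ;"""
    for name in (source, target):
        if not name or any(c in PS_DELIMITERS or c.isspace() for c in name):
            raise ValueError(f"not a plain PostScript name: {name!r}")
    return f"/{source} /{target} ;"


def fontmap_aliases(mapping: Dict[str, str]) -> List[str]:
    """One alias line per (mangled name -> real font name) pair, in insertion order."""
    return [fontmap_alias(src, dst) for src, dst in mapping.items()]


if __name__ == "__main__":
    print("\n".join(fontmap_aliases({
        "Arial+000040": "ArialMT",            # from the answer above
        "Arial,Bold+000041": "Arial-BoldMT",  # assumed pairing, see lead-in
    })))
```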
Or you can get 'Visual Software' to produce more sensible PDF files which have the fonts embedded. Or, failing that, at least not mangle the font names.

Related

why is ghostscript replacing embedded fonts?

Well, I've given up. I don't think I understand how GS works... As far as I understand, GS replaces all fonts that are not embedded and should not touch already embedded ones, so why is it replacing them? I have a PDF file that contains 2 embedded fonts and 1 non-embedded font (ArialMT).
I'm using this command:
gswin64c.exe -I "C:/Program Files/gs/gs9.56.1/Resource/Init" -sFONTMAP="Fontmap.GS" -dNOSAFER -dPDFACompatibilityPolicy=1 -sColorConversionStrategy=LeaveColorUnchanged -dBATCH -dNOPAUSE -sDEVICE="pdfwrite" -dAutoRotatePages=/None -dPDFA=3 -sOutputFile="pdfa.pdf" "original.pdf"
All I get with this command is this error:
GPL Ghostscript 9.56.1: Actual TT subtable offset xxxxx differs from one in the TT header yyyy. (multiple ones like this)
The following errors were encountered at least once while processing this file:
error executing PDF token
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> StreamServe Communication Server 16.6.1 GA Build 319 (64 bit) <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
The output is a PDF without ANY fonts...
Is there any way to force GS not to replace a font if it isn't found on the system?
Why is it replacing ArialMT with NimbusSans-Regular even though I have declared a specific path to ArialMT in my Fontmap.GS file?
I'd rather not share this PDF file as it contains sensitive customer data.
("osadzony podzestaw" means "embedded subset")

Ghostscript substitution requires embeddable fonts; on Windows those are usually stored in C:\Windows\Fonts.
So, if font substitution were simple (without look-up), your command could be simplified to:
gswin64c.exe -sFONTPATH="C:\Windows\Fonts" -dNOSAFER -sDEVICE=pdfwrite -dNEWPDF=false -dPDFA=3 -dPDFACompatibilityPolicy=1 -sColorConversionStrategy=LeaveColorUnchanged -dAutoRotatePages=/None -o"pdfa.pdf" "original.pdf"
You need to add -dNEWPDF=false. To include additional mappings, also add -I "C:/Program Files/gs/gs9.56.1/Resource/Init" -sFONTMAP=Fontmap.gs.
So the following should be a starting point:
gswin64c.exe -sFONTPATH="C:\Windows\Fonts" -I "C:/Program Files/gs/gs9.56.1/Resource/Init" -sFONTMAP=Fontmap.gs -dNOSAFER -sDEVICE=pdfwrite -dNEWPDF=false -dPDFA=3 -dPDFACompatibilityPolicy=1 -sColorConversionStrategy=LeaveColorUnchanged -dAutoRotatePages=/None -o"pdfa.pdf" "original.pdf"
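If you drive Ghostscript from another program, building the argument list explicitly avoids shell-quoting problems with paths such as C:/Program Files/... . The sketch below only assembles the switches used in the command above and does not run anything; build_pdfa_command is a hypothetical name. It uses the attached -I<dir> form (as in the cidfmap question further down) and passes -o and the file name as separate arguments.

```python
from typing import List, Optional


def build_pdfa_command(
    src: str,
    dst: str,
    gs_exe: str = "gswin64c.exe",
    font_path: str = r"C:\Windows\Fonts",
    init_dir: Optional[str] = None,
    fontmap: Optional[str] = None,
) -> List[str]:
    """Assemble the PDF/A-3 conversion command from the answer as an argv list."""
    cmd = [gs_exe, f"-sFONTPATH={font_path}"]
    if init_dir is not None:
        # One argv entry, so spaces in the path need no quoting.
        cmd.append(f"-I{init_dir}")
    if fontmap is not None:
        cmd.append(f"-sFONTMAP={fontmap}")
    cmd += [
        "-dNOSAFER",
        "-sDEVICE=pdfwrite",
        "-dNEWPDF=false",
        "-dPDFA=3",
        "-dPDFACompatibilityPolicy=1",
        "-sColorConversionStrategy=LeaveColorUnchanged",
        "-dAutoRotatePages=/None",
        "-o", dst,
        src,
    ]
    return cmd
```

On a machine with Ghostscript installed, the list could be passed to subprocess.run(cmd, check=True).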
It will not remove the warnings. Testing with a PDF file from the same producer, the difference was that there was no longer any mention of Nimbus, and the substitutions should be better/fuller, as the warning messages confirmed that the fonts were eventually taken from Windows.
Note that the file is smaller although the fonts are embedded, and in a side-by-side comparison they look the same.
GPL Ghostscript 9.56.1: PDFA doesn't allow images with Interpolate true.
and
The following errors were encountered at least once while processing this file:
missing white space after number
error executing PDF token
**** This file had errors that were repaired or ignored.
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
from their report.
If I save the file from Acrobat, the file size drops but the same issues remain.

How to use custom fonts in Ghostscript/PostScript?

I convert a PostScript file to PDF by Ghostscript. I have a problem embedding/installing Type 1 fonts.
For installing a Type 1 font, I can add the PFA file path to the Ghostscript Fontmap, which should be in /usr/share/ghostscript/version/FONTMAP, but I have no such file in /usr/share/ghostscript/9.50 or similar folders on Ubuntu 20.04.
How can I include the font file directly within the script?
Instead of
/Times-Bold findfont 10 scalefont setfont
something like
(/home/font.pfa) 10 scalefont setfont
Does PostScript/Ghostscript use the AFM file data, or does it read the glyph widths from the glyph structure provided in the PFA file?

The fonts and Fontmap file can be placed in several directories. Here is a typical search path:
/usr/share/ghostscript/9.52/Resource/Init/Fontmap
/usr/share/ghostscript/9.52/lib/Fontmap
/usr/share/ghostscript/9.52/Resource/Font/Fontmap
/usr/share/ghostscript/fonts/Fontmap
/usr/share/fonts/Type1/Fontmap
/usr/share/fonts/Fontmap
I sometimes use fonts that are not installed in the search path but are just in the current working directory. I use gs -P, and either of these works:
(font.pfa) 12 selectfont
/font.pfa 12 selectfont
The search path can also be modified by adding the directories to the GS_FONTPATH or GS_LIB environment variables.
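To see which of the candidate locations listed above actually hold a Fontmap on a given machine, a quick Python check might look like this. It is only an approximation: the directory list is copied from the answer (9.52 is just an example version), GS_LIB entries are checked first as a simplification, and it does not reproduce Ghostscript's exact search order. The function names are hypothetical.

```python
import os
from typing import Iterable, List, Mapping

# Typical locations from the answer; adjust the version directory to yours.
DEFAULT_DIRS = [
    "/usr/share/ghostscript/9.52/Resource/Init",
    "/usr/share/ghostscript/9.52/lib",
    "/usr/share/ghostscript/9.52/Resource/Font",
    "/usr/share/ghostscript/fonts",
    "/usr/share/fonts/Type1",
    "/usr/share/fonts",
]


def candidate_dirs(env: Mapping[str, str] = os.environ) -> List[str]:
    """GS_LIB entries (split on the OS path separator) followed by the defaults."""
    extra = [d for d in env.get("GS_LIB", "").split(os.pathsep) if d]
    return extra + DEFAULT_DIRS


def existing_fontmaps(dirs: Iterable[str], names=("Fontmap", "Fontmap.GS")) -> List[str]:
    """Return every Fontmap file that actually exists in the given directories."""
    found = []
    for d in dirs:
        for name in names:
            path = os.path.join(d, name)
            if os.path.isfile(path):
                found.append(path)
    return found


if __name__ == "__main__":
    for path in existing_fontmaps(candidate_dirs()):
        print(path)
```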
The AFM file is not mandatory and the metrics can be obtained from the font alone. Some programs use the AFM file instead of the actual fonts and so they are needed for those programs.

You do, you just have the path/name slightly wrong:
/usr/share/ghostscript/9.50/Resource/Init/Fontmap.GS
You can also use your own custom fontmap using a command line parameter: "-sFONTMAP=/path/to/custom/fontmap" (best to copy the system one, and add your customisations to the copy)
You can't, not like that anyway; that's not how PostScript works. PostScript always references fonts by name (not by file/path), so whilst there are ways to read the font file(s), you still need to know the font name(s) in order to scale and set the font(s).
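Since fonts are selected by name, the practical question is which name a given .pfa file defines. A Type 1 font in PFA form carries that name in a /FontName entry in its cleartext part, before the encrypted section, so a sketch like this one can read it. The helpers pfa_font_name and pfa_font_name_from_file are hypothetical; the returned name is what you would pass to findfont or selectfont.

```python
import re
from typing import Optional

# Matches e.g. "/FontName /MyFont-Regular def" and captures the name token.
FONTNAME_RE = re.compile(r"/FontName\s*/([^\s/\[\]{}()<>%]+)")


def pfa_font_name(pfa_text: str) -> Optional[str]:
    """Return the /FontName defined in a PFA file's cleartext header, if any."""
    match = FONTNAME_RE.search(pfa_text)
    return match.group(1) if match else None


def pfa_font_name_from_file(path: str) -> Optional[str]:
    # latin-1 decodes any byte, so the hex-encoded eexec section cannot break reading.
    with open(path, encoding="latin-1") as f:
        return pfa_font_name(f.read())
```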
Ghostscript does not use AFM files, it gets the metrics from the fonts and glyph outlines.
Hope that helps some....

Why won't Ghostscript recognize my modified 'cidfmap' file?

I'm attempting to use Ghostscript 9.27 on Windows 10 Pro to compress a PDF with CID fonts, using a modified 'cidfmap' file ($GS_HOME/Resource/Init/cidfmap). However, Ghostscript doesn't seem to recognize my changes to 'cidfmap', and instead wants to load the DroidSansFallback TrueType font to emulate the missing CID font.
I have tried using the "-I" command line parameter to tell Ghostscript to use the modified file in the $GS_HOME/Resource/Init directory, as specified in the documentation.
I've also tried building the source code within Developer Command Prompt for VS 2017, using the following command (and no errors):
nmake /A psi/msvc.mak MSVC_VERSION=15 WIN64=
Below is the full Ghostscript command I am running in the command prompt:
gswin64c.exe -I"C:/Program Files/gs/ghostscript-9.27/Resource/Init" -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dPDFSTOPONERROR -dBATCH -dNOPAUSE -sOutputFile=output.pdf m0001-062-1.pdf
the record added to the 'cidfmap' file (it's the only one):
/MSPGothic << /FileType /TrueType /Path ("C:/Windows/Fonts/msgothic.ttc") /SubfontID 0 /CSI [(Japan1) 2] >> ;
and the output from Ghostscript I've been receiving in both cases:
GPL Ghostscript 9.27 (2019-04-04)
Copyright (C) 2018 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 4.
Page 1
Loading NimbusRoman-Bold font from C:/Program Files/gs/ghostscript-9.27/Resource/Font/NimbusRoman-Bold... 4570288 3226611 4074256 2336262 4 done.
Page 2
Page 3
Querying operating system for font files...
Substituting font Helvetica for ArialMT.
Loading NimbusSans-Regular font from C:/Program Files/gs/ghostscript-9.27/Resource/Font/NimbusSans-Regular... 5086792 3742157 2284000 967988 4 done.
Substituting font Helvetica-Narrow for ArialNarrow.
Loading NimbusSansNarrow-Regular font from C:/Program Files/gs/ghostscript-9.27/Resource/Font/NimbusSansNarrow-Regular... 5273304 3930300 2397536 1064531 4 done.
Substituting font Helvetica-Bold for Arial-BoldMT.
Loading NimbusSans-Bold font from C:/Program Files/gs/ghostscript-9.27/Resource/Font/NimbusSans-Bold... 5500440 4150230 3021540 1680111 4 done.
Can't find CID font "MSPGothic".
Attempting to substitute CID font /Adobe-Japan1 for /MSPGothic, see doc/Use.htm#CIDFontSubstitution.
The substitute CID font "Adobe-Japan1" is not provided either. attempting to use fallback CIDFont.See doc/Use.htm#CIDFontSubstitution.
Loading a TT font from C:/Program Files/gs/ghostscript-9.27/Resource/CIDFSubst/DroidSansFallback.ttf to emulate a CID font Adobe-Japan1 ... Done.
Page 4
Can't find CID font "MSPGothic".
Attempting to substitute CID font /Adobe-Japan1 for /MSPGothic, see doc/Use.htm#CIDFontSubstitution.
Loading a TT font from C:/Program Files/gs/ghostscript-9.27/Resource/CIDFSubst/DroidSansFallback.ttf to emulate a CID font Adobe-Japan1 ... Done.
It seems as if I've missed something simple here, as others with similar questions got it working with just the "-I" command line parameter.
What am I doing wrong?

The problem is (once I looked carefully enough!) clear. You've put quotes "" around the filename.
The '(' and ')' characters are string delimiters in PostScript, not " (and the cidfmap file is read as a PostScript program), so by doing that you've made the " characters part of the path. Unsurprisingly, Ghostscript can't find a path that begins with a " character.
So if you change your cidfmap entry to:
/MSPGothic << /FileType /TrueType /Path (C:/Windows/Fonts/msgothic.ttc) /SubfontID 0 /CSI [(Japan1) 2] >> ;
you should find it works, it does for me.
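Because cidfmap is read as PostScript, any path in it must be written as a PostScript string: wrapped in parentheses, with backslashes and parentheses escaped. An unescaped backslash before an ordinary character is simply dropped by the PostScript scanner, so C:\Windows would turn into C:Windows; forward slashes, as in the corrected entry, avoid that. The helper names below are hypothetical; the sketch reproduces the corrected entry from the answer.

```python
def ps_string(text: str) -> str:
    """Quote text as a PostScript literal string, escaping \\, ( and )."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def cidfmap_truetype_entry(name: str, path: str, subfont_id: int,
                           ordering: str, supplement: int) -> str:
    """Build a cidfmap record that maps a CIDFont name to a TrueType file."""
    return (
        f"/{name} << /FileType /TrueType /Path {ps_string(path)} "
        f"/SubfontID {subfont_id} /CSI [{ps_string(ordering)} {supplement}] >> ;"
    )


if __name__ == "__main__":
    print(cidfmap_truetype_entry("MSPGothic", "C:/Windows/Fonts/msgothic.ttc", 0, "Japan1", 2))
```

Note that a double quote needs no escaping: ps_string keeps it as a literal character, which is exactly why the quoted path in the question failed.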

How do I convert a Markdown document with Japanese to Beamer?

For example, I have this Markdown document:
## Markdown test
Japanese 日本語
I run:
pandoc mwe.rmd -t beamer -o mwe.pdf --latex-engine=xelatex -V mainfont=MS\ Mincho
The words 日本語 simply disappeared in the resulting file. If I don't use the Beamer template, then it works correctly.
I don't have to use pandoc. Anything that gets me from Markdown to PDF (slides) on a Mac (with MacTex) would work for me.
If there is no easy solution, I'd be okay with anything that results in non-Beamer PDF slides.

I'm assuming that you do have a font named MS Mincho installed on your system, and that it shows up when you start Font Book.app? (It looks like it, otherwise your "normal" PDF output wouldn't work, but you said it does...)
There are various ways to check which exact font name you should use.
1. Font Book.app (GUI application)
Start Font Book.app.
Type mincho into the top right search box.
All installed fonts with 'Mincho' in their names show up.
Click on one of the font faces (NOT the main entry!) in the list.
A font sample will be displayed.
Click on the button with the little i-logo.
The font's metadata will display.
From the font's metadata you can infer the PostScript name and the Full name of the font. Both should work with XeLaTeX. (I usually put quotes around font names with spaces: -V mainfont="YuMincho Medium".)
Here is a screenshot with the relevant parts of the Font Book.app UI highlighted in red. Sorry, I do not have MS Mincho installed, I can only show it with another font:
2. fc-list (command line utility)
fc-list is a command line utility that is available via the MacPorts fontconfig package.
If you have it installed, use it.
To get a list of font names available for XeTeX, you can simply run:
fc-list -f "%{family}\n"
fc-list :outline -f "%{family}\n"
The second command suppresses the listing of bitmap-only fonts; such fonts are unusable for TeX. For some more verbosity, and nicely formatted info, you could also run:
fc-list :outline -f " family: %{family}\nfullname: %{fullname}\n file: %{file}\n\n"
To get a list of names containing 'Mincho', run:
fc-list -f "%{family}\n" | grep -i mincho
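The same search can be scripted. This sketch assumes fc-list is on the PATH (it returns an empty list otherwise) and splits each output line on commas, since fontconfig prints all family names of a font on one line; matching_families and fc_list_families are illustrative names.

```python
import shutil
import subprocess
from typing import Iterable, List


def matching_families(lines: Iterable[str], needle: str) -> List[str]:
    """Case-insensitive substring match (roughly grep -i), de-duplicated and sorted."""
    needle = needle.lower()
    found = set()
    for line in lines:
        for family in line.split(","):
            family = family.strip()
            if family and needle in family.lower():
                found.add(family)
    return sorted(found)


def fc_list_families() -> List[str]:
    """Family lines for outline fonts from fc-list; empty if fc-list is missing."""
    if shutil.which("fc-list") is None:
        return []
    result = subprocess.run(
        ["fc-list", ":outline", "-f", "%{family}\n"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

For example, matching_families(fc_list_families(), "mincho") corresponds to the grep pipeline above.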
Change your setup
Now that this smaller problem ("Which font names should I use?") is out of the way, let's deal with your main one:
The Pandoc Beamer template (and standard Beamer itself) does not use the \setmainfont command. Therefore putting -V mainfont=... on the Pandoc command line does not do anything.
You can check this by querying the default internal template used by Pandoc to produce beamer output:
$ pandoc -D beamer | less
Search for a $mainfont$ variable in there and you'll find none!
You have to modify your setup a bit to get success:
First, create a simple text file named mincho.tex with the following two lines of content (I'm using my Mincho font name here, so I can really test if my advice will work):
\usepackage{xeCJK}
\setCJKmainfont{YuMincho Medium}
The xeCJK package is required by XeLaTeX for supporting Japanese (and Chinese+Korean) fonts.
Second, add -H mincho.tex to the command line so the above code snippet is included into the LaTeX code generated by Pandoc.
This is the complete command to convert your Markdown to Beamer-PDF:
pandoc \
mwe.rmd \
-t beamer \
-o mwe.pdf \
--latex-engine=xelatex \
-H mincho.tex
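For repeated builds, the header file and the command can be generated from a script. A sketch under assumptions: write_cjk_header and pandoc_beamer_cmd are hypothetical helpers, and the engine flag is a parameter because pandoc 2.0 renamed --latex-engine to --pdf-engine, so pass whichever your pandoc accepts.

```python
from pathlib import Path
from typing import List


def write_cjk_header(path: Path, cjk_font: str) -> None:
    """Write the two-line xeCJK header from the answer, with your CJK font name."""
    path.write_text(
        "\\usepackage{xeCJK}\n"
        f"\\setCJKmainfont{{{cjk_font}}}\n",
        encoding="utf-8",
    )


def pandoc_beamer_cmd(src: str, dst: str, header: str,
                      engine_flag: str = "--latex-engine") -> List[str]:
    """Same arguments as the command above, as an argv list."""
    return ["pandoc", src, "-t", "beamer", "-o", dst,
            f"{engine_flag}=xelatex", "-H", header]
```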
Result (screenshot):
The fonts used by the Beamer-PDF are these:
$ pdffonts mwe.pdf
name                                type         encoding    emb sub   uni  objID
----------------------------------- ------------ ----------- --- ----- ---- -----
TZVOMD+LMSans8-Regular-Identity-H   CID Type 0C  Identity-H  yes yes   yes  7 0
WMSBXQ+LMSans12-Regular-Identity-H  CID Type 0C  Identity-H  yes yes   yes  30 0
FXCTKJ+LMSans10-Regular-Identity-H  CID Type 0C  Identity-H  yes yes   yes  32 0
NXJKDD+YuMin-Medium-Identity-H      CID Type 0C  Identity-H  yes yes   no   34 0
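pdffonts prints a fixed-width table, so the emb column can also be checked from a script, for example to list the fonts a viewer or Ghostscript would have to substitute. This sketch assumes the standard layout in which the dashed ruler line marks each column's width; parse_pdffonts and non_embedded are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


def parse_pdffonts(output: str) -> List[Dict[str, str]]:
    """Split pdffonts' fixed-width table into one dict per font row."""
    lines = [line for line in output.splitlines() if line.strip()]
    header, ruler, rows = lines[0], lines[1], lines[2:]
    # Each run of dashes in the ruler marks one column's start and end.
    spans: List[Tuple[int, Optional[int]]] = []
    start: Optional[int] = None
    for i, ch in enumerate(ruler + " "):  # trailing space closes the last run
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            spans.append((start, i))
            start = None
    # Read the last column to end of line in case a value is wider than its ruler.
    spans[-1] = (spans[-1][0], None)
    names = [header[a:b].strip() for a, b in spans]
    return [{n: row[a:b].strip() for n, (a, b) in zip(names, spans)} for row in rows]


def non_embedded(fonts: List[Dict[str, str]]) -> List[str]:
    """Names of fonts whose emb column says 'no'."""
    return [f["name"] for f in fonts if f.get("emb") == "no"]
```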

ABBYY finereader.exe: looking for cmd commands to use in other programs

I just bought ABBYY FineReader 11 Corporate to run it from another program, but I can't find any commands to use with finereader.exe.
Without any commands it simply opens and scans, but I need to tell it where to save the document and what to name it, and then to close the app again. It would also be nice to run it as a background task.

While doing my OCR research project, I found one. It works with FR12; I didn't test it with earlier versions.
FineCmd.exe PRESS2.TIFF /lang Mixed /out C:\temp\result.txt /quit
general command line: <open_keys/scanning> [<recognition_keys>] [<export_keys>]
<open_keys/scanning> ::= ImageFiles | /scan [SourceName] | /file [filename1 filename2], where
ImageFiles - list of files for recognition
SourceName - images source (scanner); if not specified, current is used
filename.. - list of files for recognition
<recognition_keys> ::= [/lang Language] [/optionsFile OptionsFileName], where
Language - name of language in English (russian, greek, Mixed)
OptionsFileName - path to options file
<export_key> ::= /out ExportFile | /send Target, where
ExportFile - name of file with extension to save file to
(txt, rtf, doc, docx, xml, htm(l), xls, xlsx, ppt, pptx, pdf, dbf, csv, lit);
Target - name of target app where to open
(MSWord, MSExcel, WordPro, WordPerfect, StarWriter, Mail, Clipboard, WebBrowser, Acrobat, PowerPoint)
This command opens the FR UI, processes the file and then closes it (if you pass the /quit argument). FineCmd.exe is located in the directory where you installed FR.
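If you call FineCmd.exe from your own program, the grammar above maps naturally onto an argument list. This is a sketch only: finecmd_args is a hypothetical helper, it covers just the keys listed in the help text (plus /quit), it treats /out and /send as alternatives as the grammar does, and it allows several languages after /lang as in the FineOCR/FineCmd examples in this thread. Behaviour differs between FineReader versions, so test with yours.

```python
from typing import List, Optional, Sequence


def finecmd_args(
    files: Sequence[str],
    lang: Sequence[str] = ("Mixed",),
    out: Optional[str] = None,
    send: Optional[str] = None,
    options_file: Optional[str] = None,
    quit_after: bool = True,
    exe: str = "FineCmd.exe",
) -> List[str]:
    """Build a FineCmd.exe argument list following the help-text grammar."""
    if not files:
        raise ValueError("at least one image file is required")
    if out is not None and send is not None:
        raise ValueError("the grammar allows /out or /send, not both")
    args = [exe, *files]
    if lang:
        args += ["/lang", *lang]
    if options_file is not None:
        args += ["/optionsFile", options_file]
    if out is not None:
        args += ["/out", out]          # export to a file; extension picks the format
    elif send is not None:
        args += ["/send", send]        # open the result in a target application
    if quit_after:
        args.append("/quit")           # close the FR UI when done
    return args
```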

Hello, I saw this message very late, but I have been using the ABBYY command line for 10 years.
I prefer ABBYY 8 because it does the same good job faster and does not open any GUI. It comes with FineOCR.exe:
"C:\...\ABBYY FineReader 8\FineOCR.exe" %1 /lang greek english /send MsWord
It does the OCR and opens MS Word. FineOCR.txt is a simple help file.
Regarding ABBYY 11 and 12 (all versions), there is a FineCmd.exe. Using something like:
"c:\...\FineReader\FineCMD.exe" %1 /lang greek english /send MsWord
does what FineOCR did before (but no .txt help file)

Unfortunately, such professional OCR software doesn't support command line utilities. For batch processing, it offers a HOT FOLDER utility inside it (from the GUI): http://informationworker.ru/finereader10.en/hotfolder_and_scheduling/installandrun.htm
If you want to do OCR batch processing from your program, they sell another product, called 'ABBYY Recognition Server'.
They also offer a comprehensive API for programmers: http://www.abbyy.com/ocr_sdk_windows/technical_specifications/developer_environment/
If your plan is to batch process the files and write their contents to a database, you can also use a programmatic trick to overcome this limitation, as I did recently in one of my projects (it is a bit of an offline approach, but it is simple and works). While parsing the files and inserting them into your database table from your program, move (or copy) them all into a folder, changing each file name to include its ID from your database table. Then use the 'hot folder' utility to OCR all the files, producing output with the same file name and a TXT extension (this is set in the 'hot folder' settings). Then, in your program, parse the folder's text files, read their content as strings, and parse the table IDs from the file names; the rest is updating your table with that information.
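The hot-folder workaround boils down to two steps: stage each file under a name that embeds its record ID, then read the ID back from the name of each TXT result. A minimal sketch, assuming a hypothetical <id>_<original name> naming convention, that the hot folder writes <same name>.txt into the folder you scan, and that the text is UTF-8; the database update itself is left out, and the function names are illustrative.

```python
import re
import shutil
from pathlib import Path
from typing import Dict

# Hypothetical naming convention: "<record id>_<original file name>".
ID_PREFIX = re.compile(r"^(\d+)_")


def stage_for_hot_folder(src: Path, record_id: int, hot_folder: Path) -> Path:
    """Copy src into the hot folder under a name that carries the record ID."""
    dest = hot_folder / f"{record_id}_{src.name}"
    shutil.copy2(src, dest)
    return dest


def collect_results(result_folder: Path) -> Dict[int, str]:
    """Map record ID -> OCR text for every '<id>_*.txt' file in the folder."""
    results: Dict[int, str] = {}
    for txt in sorted(result_folder.glob("*.txt")):
        match = ID_PREFIX.match(txt.name)
        if match:
            results[int(match.group(1))] = txt.read_text(encoding="utf-8", errors="replace")
    return results
```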

A year later, ABBYY does support command line usage: http://www.ocr4linux.com/en:documentation

Version 14 does not save the output file using:
FineCmd.exe PRESS2.TIFF /lang Mixed /out C:\temp\result.txt /quit
or
FineCmd.exe PRESS2.TIFF /lang Mixed /out C:\temp\result.txt
Versions 11 & 12 work well using the above commands (they do save the output) but do display the GUI, which can be closed using /quit.
Versions 9 & 10 don't come with FineCmd.exe or FineOCR.exe.
Version 8 can OCR and send the output to an application of choice but cannot save using /out. In my experience it does open the GUI.
