Question about retaining mixed plex when converting from PS to PDF - ghostscript

Good day,
We print Postscript files directly on industrial Xerox printers.
One client's Postscript files were getting garbled due to a font issue that I was unable to track down, so I used Adobe's Distiller to convert from PS to PDF. The same font issues turned up in the PDFs that were generated from Distiller. No amount of option tweaking helped me out, and find/replace font operations using the Callas pdfToolbox didn't work out for me.
So, I downloaded Ghostscript and spent an entertaining hour remembering how DOS worked. I was eventually able to convert several PS files into flawless-looking PDFs by going to the Ghostscript directory and doing this:
gswin64 -dQUIET -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=myoutputfilename.pdf myinputfilename.ps
But, I didn't think things all the way through because now I'm faced with the problem of mixed-plex. Some of the documents in the file are one-page documents and some are two-page documents, which should be printed duplex.
PS handles all of this for us when we put it on one of the Xerox printers. PDF, of course, does not. I can only specify simplex or duplex on the printer - so it's either one or the other, which doesn't work for a PDF with both.
Is there any clean, (or dirty), way to get around this? I was thinking of somehow instructing Ghostscript to insert blank pages after every simplex page of a PS file, and then just printing the entire PDF duplex, but have no idea how I would begin to do this.
Any assistance greatly appreciated. :)

It 'sounds like' you have concatenated several PostScript programs together here, is that the case ?
This isn't really a great idea, as it can lead to incorrect output. I wonder if this is the source of your problem with Distiller and your printer.
Have you tried producing PostScript instead of PDF, by using the ps2write device instead of pdfwrite ? While this won't carry any of the device-specific controls (such as /Duplex), you can easily put them back. In fact recent versions of the device will allow you to specify code to be inserted at document and/or page level.
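For the blank-page idea raised in the question, the bookkeeping can be sketched roughly as below. This is only a sketch: it assumes you already know how many pages each document in the file has (finding the document boundaries is not covered), and the function names are made up for illustration. It does not call Ghostscript at all; it only works out where a blank page would have to go so that every document starts on the front of a sheet when the whole job is printed duplex.

```python
def pad_for_duplex(doc_page_counts):
    """Return (doc_index, page_number) pairs in print order.

    page_number is None for a blank filler page. A blank is added after
    every document with an odd page count, so the next document always
    starts on the front side of a new sheet.
    """
    sequence = []
    for doc_index, count in enumerate(doc_page_counts):
        for page in range(1, count + 1):
            sequence.append((doc_index, page))
        if count % 2 == 1:
            sequence.append((doc_index, None))  # blank back side
    return sequence


def blank_positions(doc_page_counts):
    """1-based positions in the padded output that hold a blank page."""
    padded = pad_for_duplex(doc_page_counts)
    return [i + 1 for i, (_, page) in enumerate(padded) if page is None]
```

With three documents of 1, 2 and 1 pages, blank_positions([1, 2, 1]) returns [2, 6]: a blank goes behind each one-page document and the two-page document is left alone.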

Related

Ghostscript - Indentation of postscript code

Is there an option for me to ask Ghostscript to indent the PostScript it creates?
Everything starts at the beginning of a line and I find it difficult to follow.
Alternatively, I am using Emacs and ps-mode.
If anyone knows how to indent code in this mode I would appreciate a tip (apologies, because this may not be relevant to this StackExchange).
No, there is no option for indenting the output.
PostScript is pretty much regarded as a write-only language anyway, and the output of ps2write (which is what I assume you are using though you don't say) is particularly difficult since it fundamentally outputs PDF syntax with a PostScript program on the front to parse it into PostScript operations.
Why do you want to read it ?
[EDIT]
You can always edit your question, you don't need to post a new answer.
I'm afraid what you want to do isn't as simple as you might think.
It might be possible for this use case if the PDF files you receive are always created the same way, but there are significant problems.
The font you use as a substitute for the missing font must be encoded the same way. Say for example the font in the PDF file is encoded so that 0x41 is 'A', you need to make sure that the replacement font is also encoded so that 0x41 is an 'A'. So just the findfont, scalefont, setfont sequence is not always going to be sufficient, sometimes you will need to re-encode the font.
CIDFonts will be a major stumbling block. Firstly because ps2write simply doesn't emit CIDFonts at all. These were not part of level 2 PostScript. As a result all text in a CIDFont will be embedded as bitmaps. If your original file doesn't contain the CIDFont then you'll get the fallback CIDFont bitmapped.
Secondly CIDFonts can use multiple-byte character codes, of variable length. You can't simply replace a CIDFont with a Font, it just won't work.
The best solution, obviously, is to have the PDF files created with the fonts required embedded. This is best practice. If you can't get that, then I'd suggest that rather than trying to hand edit PostScript, you use the fontmap.GS and cidfmap files which Ghostscript uses to find fonts.
Ghostscript already has a load of code to do font substitution automatically, using both Fonts and CIDFonts as substitutes, and it does all the hard work of re-encoding the fonts or building CMaps as required. If you are on Windows much of this may already be done for you, when you install Ghostscript it will ask if you want to create font mappings. If you said yes then it will
Add the font substitutions you want to use in those files (they have comments explaining the layout) and then use the pdfwrite device to make a new PDF file. Set EmbedAllFonts to true (you may need to add an AlwaysEmbed font array as well, listing the fonts specifically) and SubsetFonts to false.
That should create a new PDF file where the missing fonts have been replaced by your defined substitutes; those substitutes will have been embedded in the new PDF file and they will not have been subset (Acrobat will generally refuse to edit text in a subset font).
The switches I mentioned above are standard Adobe Distiller parameters, but they are documented for pdfwrite here. There's some documentation on adding fonts here and here and specifically for CIDFonts here.
Basically I'd suggest you define your substitutions and let Ghostscript do the work for you.
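As a rough illustration of the pdfwrite step described above, the command line could be assembled like this. It is a sketch, not a tested recipe: the helper name and file names are hypothetical, the executable is assumed to be gs (on Windows it would be something like gswin64c), and the AlwaysEmbed array is left out because it has to be set through setdistillerparams rather than a simple switch.

```python
def pdfwrite_embed_command(src, dst, gs="gs"):
    """Build a Ghostscript argument list that rewrites src as a new PDF
    with all fonts embedded and font subsetting turned off."""
    return [
        gs,
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-dEmbedAllFonts=true",   # embed the substitutes too
        "-dSubsetFonts=false",    # keep full fonts so text stays editable
        f"-sOutputFile={dst}",
        src,
    ]
```

The list can be handed to subprocess.run(cmd, check=True) once the substitutions have been added to the font map files.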
This is not an answer to the problem but rather an answer to KenS's question about "Why do you want to read it?"
I tried to put it in the comment box but it was too long.
I am a retired engineer with a strong programming background.
I would like to read and understand the postscript code for the reason shown below.
I play duplicate bridge as a hobby. I receive a PDF file of what is known as a convention card (a single page document of bridge agreements).
Frequently I would like to edit these files.
When I open with Adobe Illustrator I have to spend a significant amount of time replacing fonts that are not on my system with fonts that I do have.
I can take the PDF and export it as a postscript file using Ghostscript.
I was going to write a little program to replace the embedded fonts with the fonts that I use to replace them.
I was going to leave the postscript file unaltered and insert things like
/HelveticaMonospacedPro-RG findfont
12 scalefont setfont
just above where the text is written.
I was planning on using the fonts that I have on my system (e.g., HelveticaMonospacedPro-RG).

GhostScript Image Quality issue while Printing

I am using GhostScript.Net version 1.2.0. I am converting a PDF file into a list of images to print. The height and width of the printed image are fine, but the printed image quality is poor. Please help me improve the image quality when converting a PDF to images using Ghostscript.Net.
You need to either take this up with the Ghostscript.Net maintainer or find some way to tell us what command line/configuration you are using (ALL of it!), you will also need to supply an example file and define what you find objectionable in your current prints. 'image quality is poor' is extremely subjective, not helpful at all, there could be many, many reasons for 'poor quality', starting with your input file.
You also need to state what operating system you are using, and what your printing setup is. If you have tried anything already, then you need to say what you have done or we will waste much time suggesting dead ends.
Note that if you are using the mswinpr2 device, there may be little that can be done as that relies on the printer driver in the Windows system to do the actual printing.

Add page to multiple PDFs in batch without messing with fonts

I'm trying to use Ghostscript to append a PDF as "last page" to multiple other PDFs. The problem I'm encountering is that Ghostscript walks through the whole PDF and does a bunch of font substitution.
I'm using the following batch script:
FOR %%G IN (*.pdf) DO IF NOT %%G==lastpage.pdf gswin64c -sDEVICE=pdfwrite -sOutputFile="output\%%G" -dNOPAUSE -dBATCH "%%G" lastpage.pdf
Example Error:
Page 12
Substituting font Courier for GGCJBF+Courier.
I will also sometimes get other errors, like this:
jbig2dec FATAL ERROR decoding image: prevent DOS while decoding height classes (segment 0x00)
failed to create parsed JBIG2GLOBALS object.
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
All I need gs to do is append my lastpage.pdf to the existing PDFs without walking through the entire PDF I'm appending to, especially with font substitution, because I will not have most of the fonts other people are using in their PDFs.
Is it possible in gs to simply append without walking through every page of the PDF? Is there another tool that will allow appending of PDFs in batches without this issue?
You need to be aware that Ghostscript does not simply manipulate the incoming PDF file, so you aren't 'appending' a page. What it does is interpret the incoming file into marking operations, pass those to a device, and that device takes further action on them. Rendering devices write to a bitmap, pdfwrite reassembles the marking operations into a brand new file.
That's why it 'walks through the whole file', it's the way it works. There are advantages to this (it's possible to alter the file contents for example) and disadvantages.
Now if you are getting a font substitution for an embedded font, there's something wrong with the embedded font (or possibly you are using a really old version of Ghostscript with a bug). You could try a newer version of Ghostscript but you're never going to get away from processing the entire input file.
Why not try pdftk.
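A minimal sketch of what the pdftk route might look like for the batch in the question. The helper name is made up, and the output folder and lastpage.pdf names are simply taken from the batch script above. pdftk's cat operation concatenates the input files at the PDF level into the file named after output.

```python
def pdftk_append_commands(pdf_names, last_page="lastpage.pdf", out_dir="output"):
    """Build one pdftk command per input PDF that appends last_page to it.

    pdf_names is a list of file names in the current folder; last_page
    itself is skipped, like the IF NOT test in the batch script.
    """
    commands = []
    for name in pdf_names:
        if name == last_page:
            continue
        commands.append(
            ["pdftk", name, last_page, "cat", "output", f"{out_dir}/{name}"]
        )
    return commands
```

Each command can then be run with subprocess.run(cmd, check=True); the output folder has to exist beforehand.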

Image Conversion library: Word, PDF, Excel to Images

We have a requirement to convert any incoming documents which are either in Excel, PDF and Word to images. Any recommendation?
I am NOT sure whether ImageMagick would do this; my understanding is that it is ONLY for format conversion of images, and I guess it handles PDF as well. What about Excel and Word?
Thanks in advance
You could convert everything to pdf first using:
$ libreoffice --headless --invisible --convert-to pdf *.libreofficeextension
and then use imagemagick...
you might have some formatting issues in word and especially in powerpoint
You're correct -- imagemagick won't handle the MS Office formats because it only handles image format conversion.
For PDFs, you can just use imagemagick directly:
convert -density 400 filename.pdf filename.jpeg
It will give you files:
filename-0.jpeg
filename-1.jpeg
...
filename-(N-1).jpeg
where N is the number of pages in your document. pdf2ps will achieve the same thing, but you'll need to play around with the command-line parameters to get the same output quality.
For the MS Office products, I remember that there is some sort of API that allows you access to the suite's features (this was MS Office 2007, from memory), like opening a file and exporting it to PDF. If you can get things out to PDF, then you can use the method above to convert it to images. Some negative points:
This was many years ago at my previous job, and I can't remember what exactly it was called or how to use it.
I remember the output PDF formatting wasn't great (not 100% like it appears on the screen) but it was readable. This may have improved since I last used it.
I have a vague recollection of it firing up an Excel window in the background, so it's not entirely a command-line solution (may be unsuitable for servers)
This is quite an old question, but this is how I solved it:
Use a Windows machine
Install the MS Office suite
Use https://officetopdf.codeplex.com/ for converting any office format to PDF
Use Imagemagick for pdf to image format.
Hope it helps someone.
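The two-step approach from the answers above (office document to PDF, then PDF to images) could be scripted along these lines. This is a sketch only: the function name and the pdf folder are hypothetical, the command names assume LibreOffice and ImageMagick are on the PATH (newer ImageMagick versions use magick instead of convert), and the density of 400 is just the value used earlier in the thread.

```python
from pathlib import Path


def office_to_images_commands(doc, pdf_dir="pdf", density=400):
    """Return the two commands needed to turn an office document into
    one JPEG per page: LibreOffice to PDF, then ImageMagick to images."""
    stem = Path(doc).stem
    pdf = f"{pdf_dir}/{stem}.pdf"
    to_pdf = [
        "libreoffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", pdf_dir,
        doc,
    ]
    to_images = ["convert", "-density", str(density), pdf, f"{stem}.jpeg"]
    return to_pdf, to_images
```

Running the first command and then the second (for example with subprocess.run) should leave one image per page, subject to the formatting caveats mentioned above.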

Hooks in ghostscript

Anyone know the right places to hook into ghostscript, so that when interpreting a ps file, I can get logs of all calls of the form:
draw_character(float x, float y, string font_name, int char_id); ?
Basically I want to take a postscript file, and get a list of where all characters are drawn to the screen.
Thanks!
I'm not sure if this answer is going to help you... but do you know how to harvest debugging information from Ghostscript on the commandline? Simply add "-dDEBUG" to the commandline and it will spit out lots of additional info. To get debugging info from only specific topics, you have these options:
-dCCFONTDEBUG Compiled-in Fonts
-dCFFDEBUG CFF Fonts
-dCMAPDEBUG CMAP
-dDOCIEDEBUG CIE color
-dEPSDEBUG EPS handling
-dFAPIDEBUG Font API
-dINITDEBUG Initialization
-dPDFDEBUG PDF Interpreter
-dPDFOPTDEBUG PDF Optimizer (Linearizer)
-dPDFWRDEBUG PDF Writer
-dSETPDDEBUG setpagedevice
-dSTRESDEBUG Static GS Resources
-dTTFDEBUG TTF Fonts
-dVGIFDEBUG ViewGIF
-dVJPGDEBUG ViewJPEG
Possibly, a PostScript programmer guru could write a little PostScript program that could do what you want by re-defining one of the operators (showglyph?) in a way that it prints out the info you want instead of (or before) drawing each individual character and run that against your target PS file.
Maybe you should ask your question in comp.text.pdf or in comp.lang.postscript ?
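A different route from the ones suggested above, offered only as something to try: as far as I understand the Ghostscript documentation, the txtwrite device with TextFormat set to 0 writes out the text it encounters together with position and font information, which is close to the list being asked for. The helper below just builds that command line; the function name and the gs executable name are assumptions, and the exact layout of the output is not something I would rely on without checking it against a real file.

```python
def txtwrite_command(src, dst, gs="gs"):
    """Build a Ghostscript argument list that runs src through the
    txtwrite device and writes the extracted text information to dst."""
    return [
        gs,
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=txtwrite",
        "-dTextFormat=0",        # most detailed output mode of txtwrite
        f"-sOutputFile={dst}",
        src,
    ]
```

This does not hook into the interpreter, so it only helps if a dump of the text and its placement after the fact is enough.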

Resources