GhostScript generated PDF shows hidden text

I have some postscript files that are hiding text by showing white text on top of it.
Here is a very simple example to illustrate the issue:
%!
/Times-Roman findfont
20 scalefont
setfont
newpath
0 setgray
72 72 moveto
(Hello, world!) show % Show some text
72 72 moveto
1 setgray
(Hello, world!) show % Hide some text
showpage
If I send this file directly to the printer, the hidden text is not printed.
However, when I use GhostScript (version 9.21) to convert this PS to a PDF, I can still see the outline of the text a little bit. This was the command I used:
gswin32.exe -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -sOutputFile=C:\temp\output.pdf test.ps
I also tried specifying the colors in CMYK, but that didn't help.
How can I configure GhostScript to generate a PDF without showing this 'hidden' text?
kind regards,
Maarten Coene

Well, obviously the text isn't hidden; it's been overdrawn in white.
I can see three likely possibilities and without seeing what you do, I can't decide which is happening.
1) Obviously the PDF consumer will have to turn black and white gray specifications into 'something else', usually RGB but possibly CMYK depending on whether you are viewing the PDF file on screen or printing it. If you see the entire solid text, but faintly, then that's what's happening.
2) Possibly the PDF consumer doesn't match up the two sets of text precisely. If you see partial outlines of the text, then that's what's going on.
3) Or (here's the sneaky bit) possibly your viewer uses some kind of anti-aliasing. If the black underlying text is anti-aliased, but the white isn't, then you'll see a sort of 'halo'. The entire outline of the text will be visible, probably in a faint gray, but the interior will be white.
I took your example file, and ran it through the current HEAD version of Ghostscript, writing the PDF file uncompressed and the resulting page content is:
5 0 obj
<</Length 6 0 R>>
stream
q 0.1 0 0 0.1 0 0 cm
/R7 gs
0 g
q
10 0 0 10 0 0 cm BT
/R8 20 Tf
1 0 0 1 72 72 Tm
(Hello, world!)Tj
ET
Q
1 g
q
10 0 0 10 0 0 cm BT
/R8 20 Tf
1 0 0 1 72 72 Tm
(Hello, world!)Tj
ET
Q
Q
endstream
endobj
As you can see, this maintains the pure black and pure white colour specifications for the text (the ExtGState simply sets the overprint mode to 1) and positions each piece of text in precisely the same place.
My guess is that your viewer is using anti-aliasing to draw the black text, but not the white text. FWIW Adobe Acrobat does not show this behaviour for me.
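To make possibility 3 concrete, here is a minimal numeric sketch in Python. It assumes a simple coverage-based blending model, which is an assumption about how a viewer might anti-alias edge pixels and not a description of any particular viewer. The function name `paint` and the 0.4 coverage value are made up for illustration.

```python
def paint(dest, src, coverage):
    """Blend gray level `src` over `dest`, weighted by pixel coverage (0..1)."""
    return dest * (1.0 - coverage) + src * coverage

WHITE, BLACK = 1.0, 0.0
edge_coverage = 0.4  # hypothetical: the glyph covers 40% of this edge pixel

# Black text drawn with anti-aliasing: the edge pixel becomes partly gray.
edge_after_black = paint(WHITE, BLACK, edge_coverage)

# White text drawn WITHOUT anti-aliasing paints a pixel fully or not at all.
# Assume this edge pixel is not selected, so the gray from the black pass stays.
edge_after_white = edge_after_black

# An interior pixel is fully covered in both passes, so it ends up pure white.
interior_after_white = paint(paint(WHITE, BLACK, 1.0), WHITE, 1.0)

print(edge_after_white, interior_after_white)
```

In this model the interior returns to white while the edge pixel stays gray, which matches the 'halo' described above.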

Related

Storing format for an image with its legend

I am trying to store images of plants and their legends (as text) together. However I can't find a straightforward way to do this.
I can of course use an "advanced" text editor (by advanced, I mean with formatting, not just raw text) in which I would import the image and write the text, before exporting to PDF. I have also thought about HTML, which could be used to create one stand-alone local web page for each image-legend pair. But there would still be two files per pair: one for the image and one for the HTML code.
However, those are quite heavy procedures, and I would be much more satisfied if I could "simply" use a rawer format in which the image's data and the text are more or less concatenated.
Do you know of any format of this kind? If not, I'd better just code it myself...
Thank you in advance !

Images can be polyglots of image plus text (not advisable)
Images can hold text as steganography (also unadvisable)
Images can hold textual metadata: think Exif, JPEG comments, TIFF tags or IPTC
You could even add a legend strip at the base of the image, but that's not "text". At the time of placement you paste both image and text.
HTML can hold the image as base64 text, but the encoded image requires about 133% of the original storage
FB2 is similar in that it is XML with encoded images, but it has the advantage of being stored zipped as FB2Z, which comes nearest to your concatenated requirement
PDF can hold both natively and, if done right, with less overhead than HTML but a bit more than exif.img
If done well as PDF/A, both the image and the text can be extracted perfectly raw from a PDF, so the separate image could be discarded. However, all too often they are mashed beyond pure extraction or even reuse.
In my case I can extract the image at 100% scale from this mini PDF, and here is the text extracted from it:
Hello, Flowers!
Microsoft Windows Welcome Scan
This was the code to store both together, using the cross-platform Artifex mutool:
mutool create -o "output.pdf" -O ascii "Page1.txt" ["page2.txt" ...]
%%MediaBox 0 0 595 842
%%Font Helv Helvetica Latin
%%Image Flowers1 C:/Users/name/Documents/WelcomeScan.jpg
% Draw an image. The six numbers before cm are: x width, y skew, x skew, y height, left offset, bottom offset. Units are points; cm is the concat-matrix operator, not centimetres
q 512 0.0 0.0 384 41.5 400 cm /Flowers1 Do Q
% Draw a rectangle. move line fill
q 1 0.5 1 rg 41.5 370 m 553.5 370 l 553.5 270 l 41.5 270 l f Q
% Show some text.
q 0 0 1 rg
BT /Helv 24 Tf 210 330 Td (Hello, Flowers!) Tj ET
BT /Helv 24 Tf 100 290 Td (Microsoft Windows Welcome Scan) Tj ET
Q
Notes
%%MediaBox is the paper size in points, so the above is A4 portrait.
%%Font must be declared so that the text style (and language) can be used later.
%%Image needs an internal name and the full path for pre-loading. Note that this image is 1024x768 when extracted @ 100% but is displayed, by choice, at 50% (512x384).
Lines starting with a single % are comments to remind me of the pseudo-PS directives that lay out the content. The q ... Q blocks are the guts of the page and are heavily abbreviated (the operator comes after its values), thus 1 0.5 1 rg is 50% green in RGB! Remove the comment lines in a working template or else they may be added to the PDF :-)
The trick is knowing how a PDF works page wise and places vectors or scaled images or text from bottom left origin bounded by a media box. Mutool takes the script and adds all the necessary overhead data for a valid PDF.
All the above can be easily templated and run with CMD or BASH, much in the same way an ePub can be templated then call TAR to convert folder into folder.epub, but the more complex ePub structure is not so easy to write in a script, thus suggest using a scriptable lib.
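As a rough illustration of that templating, here is a minimal Python sketch that builds the page script text for one image and its legend. The directives and numbers are copied from the example above; the function names `pdf_escape` and `page_script`, the fixed x position of 100 and the 40 pt line step are assumptions made for this sketch.

```python
def pdf_escape(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def page_script(image_path, caption_lines, image_name="Flowers1"):
    """Return a page script like the one above (layout values are illustrative)."""
    lines = [
        "%%MediaBox 0 0 595 842",
        "%%Font Helv Helvetica Latin",
        f"%%Image {image_name} {image_path}",
        f"q 512 0 0 384 41.5 400 cm /{image_name} Do Q",
        "q 0 0 1 rg",
    ]
    y = 330  # baseline of the first legend line, in points
    for caption in caption_lines:
        lines.append(f"BT /Helv 24 Tf 100 {y} Td ({pdf_escape(caption)}) Tj ET")
        y -= 40  # hypothetical spacing between legend lines
    lines.append("Q")
    return "\n".join(lines) + "\n"

script = page_script("C:/Users/name/Documents/WelcomeScan.jpg",
                     ["Hello, Flowers!", "Microsoft Windows Welcome Scan"])
print(script)
```

The returned text would be saved as Page1.txt and passed to the mutool create command shown earlier.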
ePub is the go-to answer, since the XHTML and the image are zipped in their native formats and can easily be printed to PDF or converted to normal HTML + images.

Ghostscript to concatenate multiple eps files into one big eps file

My task is to combine multiple small EPS files into one big EPS, with a condition that those small EPSs should not overlap each other.
I was hoping that this could be done programmatically, rather than manually adjusting them using GUI tools.
I've tried ghostscript commands but I ended up with those small eps on top of each other.
I also have a look at psutils (psnup/pstops) but I'm not really sure if it could help me.
I don't mind using heavier program/lib like Ghost4j (though I might have to add more functions there if it does not support my need). I just want to make sure that this cannot be done lightweight-ly or with existing tools.
Thank you!

Are you aware of how EPS files are supposed to be used? The point of an EPS file is that it is intended to be used as a 'black box' by an application.
When the application creates a PostScript program, it can include the EPS, without knowing anything about it other than its size, in the final output. So when the PostScript is generated, the application knows the size of the EPS, and modifies the CTM so as to scale the content as required, and locate it on the page.
If you want to use multiple EPS files then you must do the same, you must modify the CTM between each EPS file so that it is placed at the size and position on the page that you require. If you don't do this, then they all end up at the current position and scale on the page. As you say they end up on top of each other.
Now the whole point of an EPS file is that it can be placed programmatically, but you have to write the program to do it :-)
First you need to parse the bounding box from the EPS file. If the EPS is properly conforming this will be the %%BoundingBox comment and, optionally, the %%HiResBoundingBox comment.
Armed with that information, you then need to decide what size of media you are using and/or how to scale the EPS files to fit the desired media.
You then start a new PostScript program which begins by requesting a specific media size, then uses the scale and translate operators to move to the correct position on the media, and then executes the first EPS file (either by inclusion of the content, or by using the run operator).
Repeat the process for each EPS file.
Finally, write the new content using the showpage operator.
Assuming you have used the eps2write device in Ghostscript, the resulting file will be a new EPS file which embodies the content of the individual EPS files, scaled and placed as you wish.
So for example (all values are imaginary example data only):
%!
<< /PageSize [612 792] >> setpagedevice
gsave
306 396 translate % translate (not moveto) changes the CTM, so the EPS is placed here
0.5 0.5 scale
(example1.eps) run
grestore
gsave
306 0 translate
1.5 1.5 scale
(example2.eps) run
grestore
gsave
0 396 translate
(example3.eps) run
grestore
gsave
0 0 translate
0.66 0.66 scale
(example4.eps) run
grestore
showpage
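The steps described in this answer (parse the bounding box, then translate and scale before running each EPS) could be generated by a small script. The following Python sketch is one possible way to do it, not a tested production tool. The function names and the placement tuple layout are invented for illustration, and it assumes file names contain no parentheses or backslashes.

```python
import re

def parse_bounding_box(eps_text):
    """Return (llx, lly, urx, ury) from the first numeric %%BoundingBox comment."""
    match = re.search(
        r"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)",
        eps_text, re.MULTILINE)
    if match is None:
        raise ValueError("no numeric %%BoundingBox comment found")
    return tuple(int(v) for v in match.groups())

def wrapper_program(placements, page_size=(612, 792)):
    """Build a PostScript wrapper; placements holds (filename, bbox, x, y, scale)."""
    out = ["%!",
           f"<< /PageSize [{page_size[0]} {page_size[1]}] >> setpagedevice"]
    for filename, (llx, lly, urx, ury), x, y, scale in placements:
        out += [
            "gsave",
            f"{x} {y} translate",        # where the EPS should sit on the page
            f"{scale} {scale} scale",    # uniform scale factor
            f"{-llx} {-lly} translate",  # bring the box's lower-left corner to the origin
            f"({filename}) run",
            "grestore",
        ]
    out.append("showpage")
    return "\n".join(out) + "\n"

bbox = parse_bounding_box("%!PS-Adobe-3.0 EPSF-3.0\n%%BoundingBox: 10 20 110 220\n")
print(wrapper_program([("example1.eps", bbox, 306, 396, 0.5)]))
```

Choosing x, y and scale so that the boxes do not overlap is left to the caller, since it depends on the layout you want.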

TTF in PostScript using Ghostscript not showing glyphs in Adobe Illustrator

So, I have a PostScript that I'm converting into PDF using a custom font i.e. one that isn't included in the computer/Ghostscript library originally.
The font is rendered correctly in the PDF (visibly, and it is reported as embedded by the pre-flight prepress analysis), and when the PDF is viewed in Photoshop it also looks good.
However, when I bring it into Illustrator the font glyphs are not recognised and appear as a .notdef character (a rectangle with a cross through it).
I have the font installed, and it appears in Illustrator's font dropdown, but this still doesn't help.
Has anyone else had this issue, or can anyone replicate it?
While troubleshooting I have tried the following two ways of adding the font to my Ghostscript environment, but both produce incorrect results.
Converted the TTF into Type42
Added TTF into Ghostscript FontMap
Attached is the ZIP file containing the PDF and the TTF font I've used (I have used others also, with the same results). If you need any more files please let me know and I'll update.
Zip file
Below is the PostScript file (very simple) and my execution.
%!ps-nonconforming
/inch {72 mul} bind def
/Pacifico 20 selectfont
1 inch 10 inch moveto
/fontheight currentfont dup /FontBBox get dup 3 get % top
exch 1 get sub % top - bottom
exch /FontMatrix get 3 get mul def % adjusted by height multiplier
/lineheight fontheight 1 mul def % line spacing multiplier (1 = no extra spacing; 1.2 would add 20%)
/newline {0 lineheight neg rmoveto} bind def % negate height to move downwards
gsave (lineheight: ) show lineheight 20 string cvs show grestore
newline gsave (Museo) show grestore
Command:
gs -o fonttest.pdf -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 \
-dPDFSETTINGS=/prepress fonttest.ps
P.S.: I know that this could be an Illustrator bug, to which I have opened up a support ticket, but this can also be the way I'm embedding the font or someone out there may just have the answer :D

I found the thread "UTF-8 PDF generated with TCPDF showing up fine in Adobe Acrobat but corrupted in Illustrator and Google preview", which discusses corruption in Illustrator.
From this I suspected that font subsetting in Ghostscript was causing the issue.
I then found the thread "How to make GhostScript PS2PDF stop subsetting fonts" and applied the Ghostscript option it gives to stop font subsetting.
So in my Ghostscript command I added:
-dSubsetFonts=false
and that worked! In Illustrator the font displays as expected.
So my full gs command is:
gs -o output.pdf -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 \
-dSubsetFonts=false -dPDFSETTINGS=/prepress input.ps

How do I import a png file in Postscript with Ghostscript?

I'm trying to place a PNG image in a PostScript document for conversion to a PDF file using Ghostscript (v 9.15) ps2pdf. I've found that the following code works nicely with a JPG file, but I need to import PNG files instead. It looks like I need a different filter, but I can't find one that works. Does anyone have a solution?
239 % number of pixels in the horizontal axis
67 % number of pixels in the vertical axis
8 % bits per color channel (1, 2, 4, or 8)
[239 0 0 -67 0 67] % transform array... maps unit square to pixel [ w 0 0 -h 0 h ]
(My_Logo.jpg) (r) file % see page 587 and page 77 for more details
/DCTDecode filter % see page 589
false % pull channels from separate sources
3 % 3 color channels (RGB)
colorimage % see page 544 and page 288 for more detail

PostScript doesn't support PNG directly; it does support JPEG, which is why your code above works.
If you want to read image data from a PNG file you will need to open the file, strip the header, then read each chunk individually, parsing the data from it. It might be easiest to write the bitmap data to an intermediate file, but it's perfectly possible to write a stream decoder to supply the data as required for a procedural image data source.
Fortunately PostScript (level 3 for certain, most versions of level 2) does support Flate, so you don't have to write the decompression code in PostScript, you can use the filter directly.
You will need to specify a colour space, depending on whether the PNG uses a palette or not.
PostScript is a programming language, so this is all possible, it will take an experienced PostScript programmer a couple of days to write and debug it I should think.
NOTE! PostScript does not support transparency, so you cannot apply alpha channels from PNG files at all.
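To give an idea of what "strip the header, then read each chunk" involves, here is a minimal sketch of the same parsing done in Python rather than PostScript. It is only an illustration of the file layout: the function names are invented, CRCs are not verified, and interlacing and palettes are ignored.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_png_chunks(data):
    """Yield (chunk_type, chunk_data) for every chunk in the PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8  # skip the 8-byte signature
    while pos < len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        chunk_data = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC (CRC not checked)
        yield chunk_type, chunk_data

def png_summary(data):
    """Return (width, height, bit_depth, color_type, inflated IDAT data)."""
    width = height = bit_depth = color_type = None
    idat = b""
    for chunk_type, chunk_data in read_png_chunks(data):
        if chunk_type == b"IHDR":
            width, height, bit_depth, color_type = struct.unpack(
                ">IIBB", chunk_data[:10])
        elif chunk_type == b"IDAT":
            idat += chunk_data  # IDAT chunks together form one zlib stream
    return width, height, bit_depth, color_type, zlib.decompress(idat)
```

Note that the inflated data still carries one filter-type byte at the start of every scanline, so it is not yet a plain bitmap.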

Simple way to add an image in postscript

I am trying to write a document in postscript.
Thus far I've been able to write simple text, and work with lines and shapes.
I'm now trying to add some images to the document. After searching on-line I can't seem to find any clear way to do this.
The snippet below is a hello world:
%!PS
/Times
20 selectfont
20 800 moveto
(Hello World!) show
showpage
All I want to do is insert an image (e.g. PNG, JPG, GIF) by specifying the x and y co-ordinates.
Any help would be much appreciated.

There is a simple method, and PostScript does support the JPEG format. If you are using Ghostscript you may have to use the -dNOSAFER option to open files. Here is an example:
gsave
360 72 translate % set lower left of image at (360, 72)
175 47 scale % size of rendered image is 175 points by 47 points
500 % number of columns per row
133 % number of rows
8 % bits per color channel (1, 2, 4, or 8)
[500 0 0 -133 0 133] % transform array... maps unit square to pixel
(myJPEG500x133.jpg) (r) file /DCTDecode filter % opens the file and filters the image data
false % pull channels from separate sources
3 % 3 color channels (RGB)
colorimage
grestore
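The transform array is the least obvious part of this example, so here is a small worked check in Python. It applies the matrix [500 0 0 -133 0 133] to two corners of the unit square, using the usual PostScript matrix convention [a b c d tx ty]; the helper name `apply_matrix` is made up for this sketch.

```python
def apply_matrix(matrix, x, y):
    """Apply a PostScript matrix [a b c d tx ty] to the point (x, y)."""
    a, b, c, d, tx, ty = matrix
    return (a * x + c * y + tx, b * x + d * y + ty)

image_matrix = [500, 0, 0, -133, 0, 133]

# Lower-left corner of the unit square in user space.
bottom_left = apply_matrix(image_matrix, 0, 0)
# Upper-right corner of the unit square in user space.
top_right = apply_matrix(image_matrix, 1, 1)

print(bottom_left, top_right)
```

The lower-left corner lands on row 133 and the upper-right corner on row 0, so the first row of image data ends up at the top of the rendered image. That is the reason for the negative height in the array.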

Use a program like ImageMagick's convert to turn the image into PostScript, and then remove any extra code it generated.

You can download the PostScript Language Reference, third edition, from Adobe (this is the "bible" for PostScript). Section 4.10, Images, would be a good starting point.

This is a late answer! The problem with -dNOSAFER prevented me from using the other solutions, so I did the following:
Use Python to read the JPG file as binary and turn it into a hex string compatible with /ASCIIHexDecode:
open(filename, "rb").read().hex()  # bytes.hex() needs Python 3.5 or later
Then instead of reading and decoding the image file from the postscript file, paste the above computed string in the postscript file, and filter it, first through /ASCIIHexDecode then /DCTDecode:
(ffd8ffe000104a46494600010102002700270000ffdb004300030202020202030202020303030304060404040404080606050609080a0a090809090a0c0f0c0a0b0e0b09090d110d0e0f101011100a0c12131210130f101010ffdb00430103030304030408040408100b090b1010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010ffc00011080010001003011100021101031101ffc400160001010100000000000000000000000000060507ffc40026100002020201030207000000000000000001020304051106071221001315163132414252ffc400160101010100000000000000000000000000070403ffc4002911000201030105090100000000000000000102030004210711123151531314324142617381d1d3ffda000c03010002110311003f00de311d00e0478be19acddc79b0f8ba734aef8aa8a59a4af1c9bdc96159beef275e4efd1ccfa5f2aceea2f8e09f41e7f252a47ab4c4093ba71ceced387b7828b724e87705b588c8478ecac114e28d89e36f83d65d7643ee7eb60b03a23f1f5dff002daaacf4ae479954df1e3d33fd2b593599628d89b0071d5fae9d3bc5750b8a3f1ae3cc9cd3031b4789c689236ce568de374af543ab21b51b2b03138208076a3cef4c8b935acaf3bb05c12685036e285e550b3bccf8a41c7b2327ce78c9a6188b917b2995ab20676a8102af6dc76624c680011f9d8f0005095da5b491ccaec303f0d4f292ebba01cecf23cc57ffd9>)
/ASCIIHexDecode
filter % ascii to bytes
0 dict
/DCTDecode % jpg to explicit
filter
The above snippet replaces (myJPEG500x133.jpg) (r) file /DCTDecode filter in the otherwise very helpful answer by @Hath995.
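The two manual steps (hex-encode the JPEG, then paste it in front of the two filters) can be combined in one small helper. This is a sketch that follows the snippet above; the function name `jpeg_data_source` is invented, and very large images may exceed an interpreter's string length limit.

```python
def jpeg_data_source(jpeg_bytes):
    """Return PostScript text that decodes an inline hex-encoded JPEG."""
    hex_data = jpeg_bytes.hex()
    return "\n".join([
        f"({hex_data}>)",                  # '>' marks end of data for ASCIIHexDecode
        "/ASCIIHexDecode filter % ascii to bytes",
        "0 dict /DCTDecode filter % jpg to explicit",
    ])

# Usage with a real file would be: jpeg_data_source(open(filename, "rb").read())
print(jpeg_data_source(b"\xff\xd8\xff\xd9"))
```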
If you want something other than JPEG but still RGB (i.e. something for which PostScript has no decoder), and you can use Python to prepare your PostScript file, you can use PIL like this (it ignores the transparency byte, which is an on/off operation in PostScript):
import itertools
import PIL.Image
i = PIL.Image.open("/tmp/from-template.png")
# keep only R, G, B of every pixel (drops alpha) and hex-encode each sample
hex_string = ''.join("%02x" % g
    for g in itertools.chain.from_iterable(
        k[:3] for k in i.getdata()))
For indexed files I would not know, but it can't be difficult to work out.