GhostScript: Bug under High Sierra setting PDF Metadata? - shell

Can anyone confirm the following behaviour on MacOS 10.13 High Sierra with GhostScript? I don't get the problem when using 10.12 Sierra.
When I create PDFs in GhostScript, it always leaves the Title, Author and other metadata blank.
I know that you can set the metadata with PDFmarks within the PostScript file itself (or a secondary merged PS file that just contains PDFmarks), but that requires manually setting the field for each file.
Currently, my PDFs fail PDF-X validation until I manually add the metadata.
My PostScript does contain DSC comments, and the GS documentation implies that this should be picked up, as ParseDSCCommentsForDocInfo is true by default.
/usr/local/bin/gs \
-dPDFX \
-dNOPAUSE \
-dBATCH \
-sDEVICE=pdfwrite \
-sOutputFile="$filename" \
-dProcessColorModel=/DeviceCMYK \
-dCompatibilityLevel=1.4 \
"$f" \
/Library/PostScript/PDFX_def.ps
The source of the PostScript does not seem to be a factor: Adobe apps like InDesign, MacOS's cgpdftops, all behave similarly.
Here's a simple test PS file. Distiller takes the comments and uses them for DocInfo; GhostScript doesn't on MacOS 10.13.
%!PS-Adobe-3.0
%%Title: (MyTitle.file)
%%Creator: (MyApp: cgpdftops CUPS filter)
%%CreationDate: (Saturday, February 03 2018 10:19:01 GMT)
%%For: (User Me)
%%BoundingBox: 36 36 576 756
%%Pages: 1
%%LanguageLevel: 2
%%EndComments
%%BeginSetup
% this is where fonts would be embedded
%%EndSetup
%%Page: (1) 1
%%BeginPageSetup
% this is where page-specific features would be specified
%%EndPageSetup
% Draw a black box around the page
0 setgray
1 setlinewidth
36 36 540 720 rectstroke
% Draw a two inch blue circle in the middle of the page
0 0 1 setrgbcolor
306 396 144 0 360 arc closepath fill
% Draw two half inch yellow circles for eyes
1 1 0 setrgbcolor
252 432 36 0 360 arc closepath fill
360 432 36 0 360 arc closepath fill
% Draw the smile
1 setlinecap
18 setlinewidth
306 396 99 200 340 arc stroke
% Print it!
showpage
%%EOF

As of MacOS Mojave 10.14.3 and GhostScript version 9.26, this now fixed. I can only presume it was a bug in GhostScript or MacOS.

Related

Otool - Get file size only

I'm using Otool to look into a compiled library (.a) and I want to see what the file size of each component in the binary is. I see that
otool -l [lib.a]
will show me this information but there is also a LOT of other information I do not need. Is there a way I can just see the file size and not everything else? I can't seem to find it if there is.
The size command does that, e.g.,
size lib.a
will show the size of each object stored in the lib.a archive. For example:
$ size libasprintf.a
text data bss dec hex filename
0 0 0 0 0 lib-asprintf.o (ex libasprintf.a)
639 8 1 648 288 autosprintf.o (ex libasprintf.a)
on most systems. OS X format is a little different:
$ size libl.a
__TEXT __DATA __OBJC others dec hex
86 0 0 32 118 76 libl.a(libmain.o)
75 0 0 32 107 6b libl.a(libyywrap.o)
Oddly (though "everyone" implements it), I do not see size on the POSIX site. OS X has a manual page for it.

PDF: extracted images are sliced / tiled

Image extraction with pdfimages and mupdf/mutool works fine so far.
Images in PDFs produced with FreePDF are always sliced, so one image results in multiple image files.
Is there a trick to avoid this? How can I use the results of pdfshow?
Are there coordinates to know the position and height and width to
cut/crop the image after converting the PDF to a PNG or JPEG?
The most likely reason why your images are "sliced" after extracting them is this: they were "sliced" already before extracting them -- as their way of living within the PDF file itself.
Don't ask me why some PDF generating software does this.
MS Powerpoint is infamous for this -- background images showing some gradient often get sliced up into tens of thousands of 1x1, 1x2 or 1x8 pixels and similarly-sized mini images inside the PDF.
Update
1. Identify the scope of the problem
The image fragments of the sample PDF can be identified with the pdfimages -list command (this requires a recent version of pdfimages based on the Poppler fork, not the xpdf one!):
pdfimages -list so-28023312-test1.pdf
page num type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
------------------------------------------------------------------------------------------
1 0 image 271 271 rgb 3 8 jpeg no 18 0 163 163 26.7K 12%
1 1 image 271 271 rgb 3 8 jpeg no 19 0 163 163 21.7K 10%
1 2 image 271 271 rgb 3 8 jpeg no 30 0 163 163 22.9K 11%
1 3 image 271 271 rgb 3 8 jpeg no 31 0 163 163 21.8K 10%
1 4 image 132 271 rgb 3 8 jpeg no 32 0 162 163 9895B 9.2%
1 5 image 271 271 rgb 3 8 jpeg no 33 0 163 163 22.5K 10%
1 6 image 271 271 rgb 3 8 jpeg no 34 0 163 163 16.5K 7.7%
1 7 image 271 271 rgb 3 8 jpeg no 35 0 163 163 16.9K 7.9%
1 8 image 271 271 rgb 3 8 jpeg no 36 0 163 163 20.3K 9.4%
1 9 image 132 271 rgb 3 8 jpeg no 37 0 162 163 14.5K 14%
1 10 image 271 271 rgb 3 8 jpeg no 20 0 163 163 17.1K 8.0%
1 11 image 271 271 rgb 3 8 image no 21 0 163 163 107K 50%
1 12 image 271 271 rgb 3 8 image no 22 0 163 163 96.7K 45%
1 13 image 271 271 rgb 3 8 image no 23 0 163 163 119K 56%
1 14 image 132 271 rgb 3 8 jpeg no 24 0 162 163 10.7K 10%
1 15 image 271 99 rgb 3 8 jpeg no 25 0 163 161 7789B 9.7%
1 16 image 271 99 rgb 3 8 jpeg no 26 0 163 161 6456B 8.0%
1 17 image 271 99 rgb 3 8 jpeg no 27 0 163 161 7202B 8.9%
1 18 image 271 99 rgb 3 8 jpeg no 28 0 163 161 8241B 10%
1 19 image 132 99 rgb 3 8 jpeg no 29 0 162 161 5905B 15%
Because there are only 20 different fragments on 1 page, it is easy to...
...first extract them all and convert them to JPEGs, and
...then stitch them together again.
2. Extract the fragments as JPEGs
The following command will extract the fragments and try to save them as JPEGs (-j) 28023312:
pdfimages so-28023312-test1.pdf 28023312
There are 3 images which came out as PPM. Use ImageMagick's convert to make JPEGs from them (not strictly required, but it simplifies the 'stitching' command line:
for i in 11 12 13; do
convert 28023312-0${i}.ppm 28023312-0${i}.jpg
done
Here are the first three fragments, 280233312-000.jpg, 280233312-001.jpg and 280233312-002.jpg:
3. Stitch the 20 fragments together again
ImageMagick can stitch the 20 images together again. When looking at the PDF page as well as at the 20 JPEGs it is easy to determine the order they need to be put together:
convert \
\( 28023312-0{00,01,02,03,04}.jpg +append \) \
\( 28023312-0{05,06,07,08,09}.jpg +append \) \
\( 28023312-0{10,11,12,13,14}.jpg +append \) \
\( 28023312-0{15,16,17,18,19}.jpg +append \) \
-append \
complete.jpg
Dissecting the command:
The +append image operator appends all listed images in a horizontal order.
The \( ... \) lines indicate an 'aside' processing of the resprective part of the image stack (which needs to be separated by the escaped parentheses). The result of this horizontal appending operation will then replace the individual fragments inside the current image stack.
The final -append image operator appends the current images vertically.
Here is the resulting JPEG, fully stitched together again:
Could this be automated?
In theory we could automate this process. For this we would have to analyse the PDF source code. However, that is rather difficult, because the content stream may be compressed.
In order to uncompress all or most of the content streams and to get a nicer representation of the PDF file structure, we could use mutool clean -d, podofouncompress or qpdf --qdf.
I prefer qpdf, the 'structural, content-preserving PDF file transformer'. Here is the command:
qpdf --qdf --object-streams=disable so-28023312-test1.pdf qdf.pdf
The resulting PDF file, qdf.pdf is more easy to analyse, because most (but not all) previously binary sections are now in ASCII. When you search for the occurrences of Do inside this file, you will see where images are inserted (however, I cannot give you a complete PDF analysing tutorial here, sorry...).
The following command prints all lines where Do occurs, plus the preceding line (-B 1):
grep -a -B 1 " Do" qdf.pdf
1002 0 0 1002 236 5776.67 cm
/Im0 Do
--
1001 0 0 1002 1237 5776.67 cm
/Im1 Do
--
120.12 0 0 120.24 268.44 693.2004 cm
/Im2 Do
--
[...skipping 15 other output segments...]
--
1002 0 0 369 3237 3406.67 cm
/Im18 Do
--
490 0 0 369 4238 3406.67 cm
/Im19 Do
--
1 0 0 1 204.9037018 508.5130005 cm
/Fm0 Do
All the /ImNN Do lines insert images (the /Fm0 Do line refers to a form object not an image).
The preceding lines, for example 490 0 0 369 4238 3406.67 cm set up the current transformation matrix. From this line alone, one can sometimes conclude the position of the image and its size. In the case of this file, it is not enough -- the contents of more preceding lines would be required in order to determine the current 'drawing position'.
FreePDF uses Ghostscript and creates a 'virtual printer'. When you 'print to PDF' what actually happens is that your application prints to the Windows print pipeline, which sends the graphics primitives to the Windows PostScript printer driver, which sends the PostScript to the Port Monitor. The FreePDF Port Monitor stores this PostScript program on disk. When the output is complete, it starts up Ghostscript which interprets the PostScript and produces a PDF file.
Now, unless you are using a spectacularly old version of Ghostscript (which is possible, you should check!) this will take whatever was in the input and put it in the output. It won't slice up images.
Which means that, as Kurt and David said above, that the real reason for the problem is that the PostScript program has sliced up images in it, before Ghostscript ever saw it.
Now I know that's not generally the case, but it depends heavily on what PostScript printer driver you have installed, how its configured, what version of Windows you are using and what the application driving the printer is.
As David rightly says, Microsoft Office applications have a bad habit of drawing certain kinds of patterns this way (to get a 'translucent effect' they use a pattern where the cell is an imagemask, the 'white' pixels are transparent).
Also if you have large photographs (for example) and the PostScript printer is configured with minimal memory, the driver might split up the image in order not to exhaust the printer's memory. Obviously that's a configuration problem because on a desktop PC you would have to be using monster images to overwhelm Ghostscript.
So basically, we need a lot more information from you before we can answer this fully, but the principle is that the damage was done before it got to FreePDF. The version of Ghostscript used to create the PDF file will be in the PDF file metadata, unless FreePDF chose to erase/overwrite it.
Finally, as Kurt pointed out, you should post a link to the PDF file, and ideally the application file and intermediate PostScript file which was used to produce the PDF.

want eps file and text into single compound path using ghostscript

what I want is to add eps file into temporary ps file which has text written, then I convert my ps file to eps file using ghostscript, but when I see my eps file in AI outline mode, I see extra square around my eps file which is size box, which should not be there, It should be part of single compound box
Ghostscript version is 9.05 and before I include eps into ps I need to resize it. So resized eps file shows page border into outline mode. Which is actually not there, but when it goes to machine it will cut out that path which should not be case.
Alright, I think I have some understanding of what you're doing and where the trouble may be creeping in. As I commented, you're running the file through ghostscript multiple times. Each time it has to interpret the postscript code and construct an internal display-list representation and then recreate appropriate postscript code on the output end. So it's the clone of a clone of a clone problem. Any little hiccup can cause cascade failures.
So, enough of being preachy. The alternative is to manipulate the eps file as text.
So, if we want the image to be scaled to fill a 500x500 square, we will be guided by the number in the BoundingBox comment. I'll quote this silly file from the linked question as an example:
%!PS-Adobe-2.0 EPSF-2.0
%%BoundingBox: 72 700 127 708 %<-- modify this
%%HiResBoundingBox: 72.000000 700.000000 127.000000 707.500000 %<-- delete this
%%EndComments
% EPSF created by ps2eps 1.68
%%BeginProlog
save
countdictstack
mark
newpath
/showpage {} def
/setpagedevice {pop} def
%%EndProlog
%%Page 1 1 %<-- insert translate and scale after this line
/Times-Roman findfont
11 scalefont setfont
72 700 moveto
(This is a test)show
%%Trailer
cleartomark
countdictstack
exch sub { end } repeat
restore
%%EOF
So, the BoundingBox was %%BoundingBox: 72 700 127 708 and it needs to be 0 0 500 500 or rather (to preserve the aspect ratio) 0 0 x 500 or 0 0 500 y where x or y (whichever it happens to be) is < 500. The existing size is 127-72 x 708 - 700 = 55 x 8. So our scaling factor is 500/55. But we also want to translate the lower-left corner to the origin, and its simplest to do that first, so the scaling doesn't affect the interpretation of the numbers.
So, to take 72 700 127 708 to 0 0 500 y, first we add -72 -700 translate to the file, and modify the bounding box to 0 0 55 8, and delete that silly HiRes line: we don't really need it.
Then, we add 500 55 div dup scale (let the interpreter do the math, hee hee). So the maximum x will now be 500, but, oh, what to put for the y? A quick calculation yields 72!
So, this awk program will modify an eps file to be 500 points wide, with y scaled appropriately.
/%%BoundingBox: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*)/{x=$2;y=$3;w=$4-x;h=$5-y;print $1,0,0,500,(500/w)*h}
!/%%BoundingBox:/&&!/%%HiRes/{print}
/%%Page /{print -x,-y,"translate"; print 500,w,"div dup scale"}
Usage:
$ awk -f epsscale.awk etest.eps

Print a postscript document with CUPS and a thermal printer

I installed an epson TM-T20 in Ubuntu 12.04, using the official driver. This is a thermal printer, I'm using 80mm paper.
My problem: When I print an image (using a postscript document) it waste a lot of paper because the image uses around 5cm and the printer before the image sends out 25cm of white paper.
I use the following command to send the document to the printer:
lpr -P tm-t20 -o document.ps
The printer prints the image (a 200x200 image), but first sends out a lot of non printed paper.
The printer wasn't recognized by CUPS (using the web interface at localhost:631). Then I installed it using the following procedure:
sudo lpadmin -p tm-t20 -E -v serial:/dev/ttyUSB0 -P /usr/share/ppd/epson-tm-t20-rastertotmt.ppd
Then the printer appeared in the CUPS web interface and I configured it (baud rate, bit parity, etc).
The printer works ok when I send some text.
Here is part of the printer ppd:
*DefaultPageRegion:RP80x297
*PageRegion RP80x297/Roll Paper 80 x 297 mm: "<</PageSize[204 841.8]/ ImagingBBox null>>setpagedevice"
*PageRegion RP58x297/Roll Paper 58 x 297 mm: "<</PageSize[141.7 841.8]/ ImagingBBox null>>setpagedevice"
*CloseUI: *PageRegion
*DefaultImageableArea: RP80x297
*ImageableArea RP80x297/Roll Paper 80 x 297 mm: "0 0 204 841.8"
*ImageableArea RP58x297/Roll Paper 58 x 297 mm: "0 0 141.7 841.8"
*DefaultPaperDimension: RP80x297
*PaperDimension RP80x297/Roll Paper 80 x 297 mm: "204 841.8"
*PaperDimension RP58x297/Roll Paper 58 x 297 mm: "141.7 841.8"
I suppose that this waste of paper is because the 297mm of long that appears in the ppd file. Then I tried adding another configuration of 100mm instead of 297mm, but the problem persists.
I also tryied adding the tag %%DocumentMedia to the ps file, but the same problem:
%!PS-Adobe-3.0
%%Creator: GIMP PostScript file plugin V 1.17 by Peter Kirchgessner
%%Title: yay.ps
%%CreationDate: Thu Sep 13 13:44:26 2012
%%DocumentData: Clean7Bit
%%LanguageLevel: 2
%%Pages: 1
%%BoundingBox: 14 14 215 215
%%
%%EndComments
%%DocumentMedia: Plain 72 72 0 white Plain
%%BeginProlog
% Use own dictionary to avoid conflicts
10 dict begin
%%EndProlog
%%Page: 1 1
% Translate for offset
14.173228346456694 14.173228346456694 translate
% Translate to begin of first scanline
0 199.99999999999997 translate
199.99999999999997 -199.99999999999997 scale
% Image geometry
200 200 8
% Transformation matrix
[ 200 0 0 200 0 0 ]
% Strings to hold RGB-samples per scanline
/rstr 200 string def
/gstr 200 string def
/bstr 200 string def
{currentfile /ASCII85Decode filter /RunLengthDecode filter rstr readstring pop}
{currentfile /ASCII85Decode filter /RunLengthDecode filter gstr readstring pop}
{currentfile /ASCII85Decode filter /RunLengthDecode filter bstr readstring pop}
true 3
%%BeginData: 14759 ASCII Bytes
Any idea?
Finally after a lot of pain. I discover that the problem was the serial to USB cable (in order to connect the serial printer to an USB port). I tried with two different serial to USB cables, but the problem persists and finally I conclude that The printer works erratically if is not connect to a "real" serial port. I tested the printer under identical conditions in a PC with a serial port and it works perfect, just installing the driver provided by epson and giving chmod 777 to /dev/ttyS0. At the job list sometimes I see the error: "/usr/lib/cups/filter/pstopdf failed". But the printer prints ok, like no error occurred.
I have to chmod 777 /dev/ttyUSB0 in order to get the printer working (Even if a run the commands with sudo).
I'm getting acceptable results (text is not at the center) with the option media=B8
lp -d tm-t20 -o media=B8 document.ps
I also tried with
lp -d tm-t20 -o media=Custom.80x90mm document.ps
But the printer doesn't print and the job appears as completed at the cups web interface.
If I try with
lp -d tm-t20 -o media=Custom.200x190 document.ps
The printer prints (not correctly centered, I guess that I need to try with different values until I get the desired result). The paper dimensions in dots are in this site: http://paulbourke.net/dataformats/postscript/
The printer isn't cutting the paper, I dont know how to give that option (print and cut the paper).
The options accepted by the printer are:
lpoptions -p tm-t20 -l
PageSize/Media Size: *RP80x297 RP58x297 Custom.WIDTHxHEIGHT
Resolution/Resolution: *203x203dpi
TmtSpeed/Printing Speed: *Auto 1 2 3 4
TmtPaperReduction/Paper Reduction: Off Top *Bottom Both
TmtPaperSource/Paper Source: *DocFeedCut DocFeedNoCut DocNoFeedCut DocNoFeedNoCut PageFeedCut PageFeedNoCut PageNoFeedCut
TmtBuzzerControl/Buzzer: *Off Before After
TmtSoundPattern/Sound Pattern: *A B C D E
TmtBuzzerRepeat/Buzzer Repeat: *1 2 3 5
TmtDrawer1/Cash Drawer #1: *Off Before After
How to make the printer print and cut the paper? I need to do it from the console, to use it from a custom C++ program. If you have any other experience with this kinds of printers under Linux, please give me some advice. My goal is to use the printer from a C++ program, I didn't find a fast way to do it (sending directly ESC/POS commands to the printer, there isn't official documentation to do it under Linux), so I'm working with CUPS from the console.
Paper CUT SOLVED:
lp -d tm-t20 -o media=Custom.200x258 -o source=DocFeedCut document.ps
I don't know why it works, because as is shown in the options DocFeedCut is the default option.
Now I just will try to center correctly the text.

wkhtmltopdf with full page background

I am using wkhtmltopdf to generate a PDF file that is going to a printer and have some troubles with making the content fill up an entire page in the resulting PDF.
In the CSS I've set the width and height to 2480 X 3508 pixels (a4 300 dpi) and when creating the PDF I use 0 for margins but still end up with a small white border to the right and bottom. Also tried to use mm and percentage but with the same result.
I'd need someone to please provide an example on how to style the HTML and what options to use at command line so that the resulting PDF pages fill out the entire background. One way might be to include bleeding (this might be necessary anyway) but any tips are welcome. At the moment I am creating one big HTML page (without CSS page breaks - might help?) but if needed it would be fine to generate each page separately and then feed them all to wkhtmltopdf.
wkhtmltopdf v 0.11.0 rc2
What ended up working:
wkhtmltopdf --margin-top 0 --margin-bottom 0 --margin-left 0 --margin-right 0 <url> <output>
shortens to
wkhtmltopdf -T 0 -B 0 -L 0 -R 0 <url> <output>
Using html from stdin (Note dash)
echo "<h1>Testing Some Html</h2>" | wkhtmltopdf -T 0 -B 0 -L 0 -R 0 - <output>
Using html from stdin to stdout
echo "Testing Some Html" | wkhtmltopdf -T 0 -B 0 -L 0 -R 0 - test.pdf
echo "Testing Some Html" | wkhtmltopdf -T 0 -B 0 -L 0 -R 0 - - > test.pdf
What did not work:
Using --dpi
Using --page-width and --page-height
Using --zoom
We just solved the same problem by using the --disable-smart-shrinking option.
I realize this is old and cold, but just in case someone finds this and has the same/similar problem, here's a workaround that worked for me after some trial&error.
I created a simple filler.html as:
<!DOCTYPE html>
<html>
<head>
</head>
<body style="margin: 0; padding: 0;">
<div style="height: 30mm; background-color: #F7EBD4;">
</div>
</body>
</html>
Use valid HTML (!DOCTYPE is important) and only inline styles. Match the background color to that of the main document and use height equal or bigger than your margins.
I run version 0.12.0 with the following arguments:
wkhtmltopdf --print-media-type --orientation portrait --page-size A4
--encoding UTF-8 --T 10mm --B 10mm --L 0mm --R 0mm
--header-html filler.html --footer-html filler.html - - <file.html >file.pdf
Hoping this helps someone...
I'm using version 0.12.2.1 and setting:
body { padding: 0; margin 0; }
div.page-layout { height: 295.5mm; width: 209mm;}
worked for me.
Of course need to add 0 margins by:
wkhtmltopdf -T 0 -B 0 -L 0 -R 0
At http://code.google.com/p/wkhtmltopdf/issues/detail?id=359 I found out more people 'suffer' from this bug. The --dpi 300 workaround did not work for me, I had to set --zoom 1.045 to zoom in a bit which made the extra right and bottom border disappear...
Works fine for me with -B 0 -L 0 -R 0 -T 0 options and using your trick of setting up an A4 sized div.
Did you remember to use body {margin:0; padding:0;} in the top of your CSS?
I cannot help you with CSS page breaks as I have not trialled an errored those yet, however, you can run scripts on the page to do clever things. Here is a jQuery example of how to split content down into page size chunks based on the length of the content. If you can get that adapted to work with wkhtmltopdf then please post here!
http://www.script-tutorials.com/demos/79/index.html
What you are experiencing is a bug.
You'll need to set the --dpi option when converting the file. In you case you will probably want --dpi 300, but that can be set lower.
Solved it by increasing the DPI
I'm working with an A4 size in portrait mode. Had white space to the right.
I noticed that as the dpi is increased, the white space got thinner.
at 300 dpi the white space is not visible in chrome pdf view at (max) zoomed at 500%
In Adobe reader it's still visible. It got better at 600 DPI and at 1200 DPI it's become invisible even at 6500% zoom.
There's no disadvantage to this so far as I observed, all dpi generate the same file size and run at the same speed (tested on 1 page).
effectively my settings are as follows:
echo "<html style='padding=0;margin=0'><body style='background-color:black;padding=0;margin=0'></html>" | wkhtmltopdf -T 0 -B 0 -L 0 -R 0 --disable-smart-shrinking --orientation portrait --page-size A4 --dpi 1200 - happy.pdf
If using an unscaled PNG image (thus will be pixel perfect) the default ratio, for an A4 needs to be 120ppi thus # 210mm = 993 pixels wide x 1404 pixels high, if the source is 72 or 300 dpi it makes no difference for a default placement, its the 993 that's counted as 210 mm
No heights, no width, no stretch, nor shrink just default place image as background un-scaled.
wkhtmltopdf --enable-local-file-access -T "0mm" -L "0mm" -R "0mm" -B "0mm" test.html test.pdf
here is such an image reduced into A 4 pdf page 2 different densities same number of pixels
If you use scaling you can use different density values, but this is all that is needed by default's, since PDF works on overall pixel values not DPI as such. Note the PNG is actually smaller by insertion in a PDF than the source JPG which was over 372 KB

Resources