Ghostscript mswinpr2 prints monochrome unless BitsPerPixel=24 which creates very large spool file - ghostscript

6 July 2018
Hi Ken, thank you for all your information! I'm sorry it took a while to get back to you but based on your advice we went away and did some work. We are now down to about 10.4MB at 100dpi.
Our problem at this low resolution is fonts and barcode integrity.
So far the only way we have achieved a good spool file size is by using Adobe's AcroRd32.exe via the command line. This gives amazing sizes of around 2.5MB. Resolution seems fine and, crucially, barcodes and fonts are fine too. However, using this method for high-volume printing would not be ideal.
Do you have any idea why printing in this way creates such small spool file size? We are having some colour issues but resolution seems very good.
What makes AcroRd32.exe different to everything else we’ve tried so far? Your advice would be much appreciated.
Thank you.
Lizl
I need to print an image heavy pdf catalogue via ghostscript. If I do not reduce the resolution, the spool file becomes very big.
Ultimately we need to print the pdf files over a VPN connection which means that the file size needs to stay around 5MB or lower. We are happy with a resolution of around 300 dpi.
This command creates a 1.74 MB file:
C:\Users\admin>"c:\Program Files\gs\gs9.23\bin\gswin64c.exe" -dNOPAUSE
-dQUIET -dBATCH -c "mark /OutputFile (%printer%Pro C5100Sseries E-22B PS 1.1) /UserSettings <> (mswinpr2) finddevice
putdeviceprops setdevice" -f "myCatalogue.pdf"
This command creates an 84.7 MB file:
"c:\Program Files\gs\gs9.23\bin\gswin64c.exe" -dNOPAUSE -dQUIET
-dBATCH -c "mark /BitsPerPixel 24 /OutputFile (%printer%Pro C5100Sseries E-22B PS 1.1) /UserSettings <>
(mswinpr2) finddevice putdeviceprops setdevice" -f "myCatalogue.pdf"
The pdf prints in monochrome if I do not specify /BitsPerPixel 24. However that pushes file size up to 84.7MB.
Found this explanation online:
Some Windows device drivers erroneously return a low value
that causes the BitsPerPixel which can force us to map to monochrome, dithered even on a full color device, making -dBitsPerPixel=24 mandatory.
Is there anybody else that has experienced this problem or any suggestions on alternative ways to batch print pdf files over VPN with files sizes no more than 5MB?

The way mswinpr2 works is to render the input file to a bitmap, then blit the bitmap to the Windows device context, then tell the device context to print it. That invokes the print pipeline which uses the Windows printer driver to create a file suitable for the printer to read.
Depending on the printer, this could be PCL, PostScript, XPS, GDI or some other language proprietary to the printer manufacturer (eg ZPL for Zebra printers).
The advantage of working this way is that it leverages the vast support of Windows for specific printer types. Otherwise Ghostscript would have to have a driver for every single different printer, which long ago became an impossible task.
The disadvantage, of course, is that what gets printed is a huge bitmap. So it's big.
If you consider a 300 dpi A4 page, 8 bits per component RGB, then the image will be:
width in inches * resolution (dpi) * bytes per pixel (3, i.e. 24 bits)
8.27 * 300 * 3 = 7443 bytes per scan line
Then there are:
height in inches * resolution (dpi) scan lines on the page
11.69 * 300 = 3507
So we multiply the scan line size * the number of scan lines to get the image size:
7443 * 3507 = 26,102,601 bytes or a little under 25 MB
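The arithmetic above can be checked with a short Python sketch. The function name and the rounding to whole pixels are illustrative choices for this example, not anything Ghostscript exposes.

```python
def raw_bitmap_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Size in bytes of an uncompressed page bitmap."""
    # Round to whole pixels: 8.27 in at 300 dpi is 2481 pixels.
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_scan_line = width_px * bytes_per_pixel
    return bytes_per_scan_line * height_px


a4_at_300 = raw_bitmap_bytes(8.27, 11.69, 300)
print(a4_at_300)                                        # 26102601
print(a4_at_300 / (1024 * 1024))                        # a little under 25
print(raw_bitmap_bytes(8.27, 11.69, 600) / a4_at_300)   # doubling the dpi quadruples the size
```

This only models the raw bitmap for a single page. Whatever the Windows printer driver adds or compresses when it builds the spool file is outside the sketch.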
So your goal of an image of 5 MB would require you to compress the file and get a compression ratio at least 5:1. So one solution would be to try zipping the file and unzipping at the other end.
Now, one of the things about this device is that its properties are controlled by the printer. The Ghostscript device queries the printer and adjusts itself to the resolution of the printer. I suspect your printer is actually set up to render at 600 dpi, which is why your spool file is 4 times larger than a 300 dpi resolution would suggest.
The device also doesn't support reducing the colour quality, other than to monochrome (which is what I suspect your 1.74MB file is). So your choice is monochrome, 1 bit per component CMYK or 24-bit RGB.
You can find the documentation on the Ghostscript devices on the Ghostscript web site, and the specifics for this device here
About the only thing you can do (and I haven't tried this) is set the MaxResolution parameter. But as I've shown above, that will only get you to 25 MB. If you want lower than that you'd have to reduce the resolution still further. To get a further drop by a factor of 5 would mean more than halving the resolution.
Looks like you'd be looking at about 135 dpi.
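As a rough check of that figure, the size formula can be solved for the resolution. This is a sketch under two assumptions: the page is an uncompressed 24-bit A4 bitmap, and "5 MB" is taken as 5 * 1024 * 1024 bytes. The function name is made up for the example.

```python
import math


def max_dpi_for_budget(width_in, height_in, budget_bytes, bytes_per_pixel=3):
    """Highest resolution whose raw bitmap still fits in budget_bytes."""
    # bytes = (width_in * dpi) * (height_in * dpi) * bytes_per_pixel, solved for dpi
    return math.sqrt(budget_bytes / (width_in * height_in * bytes_per_pixel))


five_mb = 5 * 1024 * 1024
print(round(max_dpi_for_budget(8.27, 11.69, five_mb)))   # 134
```

Because the size grows with the square of the resolution, a fivefold size reduction needs the resolution divided by the square root of 5 (about 2.24), which is where "more than halving" comes from.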

Related

How can I take a pdf, and convert any jpeg2000/jpx/jp2 images in it to jpeg images?

I am using MacOS Mojave on a Mac Mini, and I am also using an old Kindle Dx which cannot read jpeg2000 images. It also has trouble with too many or too large jpeg images.
I cannot use touchscreens, so newer e-readers and tablets aren't a solution.
So far, I've found some buggy solutions--
I can use Willus's k2pdfopt with -mode copy and -dev dx, which rasterizes everything. It's a good solution for scanned pdfs. If more detail is needed, -mode copy without -dev dx will preserve higher resolution. It's something of a last resort for pdf-born-pdfs, since text can be uglier and harder to read, and file sizes can increase alarmingly.
I can also use Ghostscript with -dCompatibilityLevel=1.4, which doesn't rasterize everything. It converts jpeg2000 images to jpeg images. But it doesn't tackle some oversized or poorly-constructed images, it often creates dark rectangles which can obscure text, and it occasionally loses the ability to search or select text. [P.S. I mean it takes a pdf which had searchable text and outputs one which does not. Also if I do any kind of image downsampling or removal, it sometimes rescales everything or loses pages.]
I have experimented with options to compress images in Ghostscript, with mixed success, and with the above bugs persisting. [P.S. I think I was downsampling, yes.]
For whatever reason, MacOS Quartz filters only work if they will reduce image sizes. So they tend not to work on the buggy images.
Now my ideal solution would preserve the text itself, preferably untangling ligatures, and would compress the images like Willus's k2pdfopt. But I have no idea if that's possible or how.
Short of that-- I'm wondering if there's a way to use Ghostscript to convert the jpeg2000 images without causing the gray rectangles or losing the ability to search or select text.
or if there's a way to use Quartz filters so they work. In some older versions of MacOS they did work.
or if there's a way to batch-print these pdf files to the appropriate resolution, apparently 800x1180, reprocessing images in the process.
I don't have much programming experience. I mainly use homebrew to install command-line tools, very sloppy bash scripts, and Automator to run them.
P.S. For a minimal example of the gray rectangles in Ghostscript, using the free pdf from here: https://www.peginc.com/store/test-drive-savage-worlds-the-wild-hunt/
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -o out.pdf in.pdf
substituting that pdf for in.pdf.
For a minimal example of losing searchable text, using the free pdf from here: http://datafortress2020.com/fileproject/details.php?image_id=498
same minimal script
Compatibility Level
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dCompatibilityLevel=1.4 -o out.pdf in.pdf
Aggressive Downsampling and Grayscale
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dCompatibilityLevel=1.4 \
-g800x1080 -r150 -dPDFFitPage \
-dFastWebView -sColorConversionStrategy=Gray \
-dDownsampleColorImages=true -dDownsampleGrayImages=true -dDownsampleMonoImages=true \
-dColorImageResolution=75 -dGrayImageResolution=75 -dMonoImageResolution=150 \
-dColorImageDownsampleThreshold=1.0 -dGrayImageDownsampleThreshold=1.0 -dMonoImageDownsampleThreshold=1.0 \
-o out.pdf in.pdf
P.P.S. I can use k2pdfopt to rasterize to fit my Kindle. If the file has searchable text, this retains it, if it doesn't I can run tesseract in k2 or run ocrmypdf afterwards.
But if I want especially good graphics, or especially clear text, and the file has hundreds of pages, it will need hundreds of megs. I had blamed this on rasterizing the text, which was why my ideal solution was to keep text and rasterize images, but apparently it's an issue with the images themselves.
If you think you've found a bug, then it's helpful to report it. If you don't, it will never be fixed. You can report a bug at https://bugs.ghostscript.com; please be sure to attach an example file to reproduce the problem and state the command line used.
The Ghostscript pdfwrite device does not, ever, produce JPEG2000 images (due to patent issues). So you don't need to set the CompatibilityLevel at all, and I'd recommend that you do not. By setting the CompatibilityLevel you are limiting the output. Unless your device cannot handle later versions, don't do this.
Without seeing an example file, a command line and knowing the version and operating system it's obviously not possible for anyone to comment on your 'gray rectangles'.
You can reduce the size of images (in bytes) by downsampling (as opposed to compressing) them, you can't do anything about the number of images.
Note that searchable text depends on the construction of the PDF file, and so cannot ever be guaranteed. Searchable text (in the sense of ToUnicode CMaps) was a later addition to the PDF Reference and is always optional, because it's possible to have input from which the Unicode code points cannot be determined (without using OCR software) but a perfectly readable PDF file can still be produced.
Ghostscript itself can produce a PDF file which is a rendered representation of the original, wrapped up as a PDF. See the pdfimage* devices.
Tesseract can take images and produce PDF files with searchable text, produced by OCR'ing the images. This would seem to me to be your best option, though obviously I don't know if a single large image is going to be acceptable to your device.
Edit
I already agreed that searching text is inherently not supported in PDF, except as an optional adjunct. The bug report you pointed to talks about 'corrupting text layers'. There are no text layers in PDF, and the text is neither corrupted nor missing, it's just not encoded as ASCII any more.
The reason you shouldn't set the resolution, and the size in pixels, is because PDF is not an image format. You aren't gaining anything by doing this. All that happens is that pdfwrite divides the 'g' values by the resolution, to get a media size in inches, and writes that as the MediaBox. Simpler just to set the Media Size. If you set the resolution you are fixing anything which does get rendered at that resolution. Choose a low resolution and you get crappy output. If you use a higher resolution then the image can be downscaled and smoothed giving better output.
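To make the pixels-to-media-size step concrete, here is a small Python sketch of the conversion just described, applied to the -g800x1080 -r150 values from the question. The function name is invented for the illustration. PDF media sizes are expressed in points, at 72 to the inch.

```python
def media_size_points(width_px, height_px, resolution_dpi):
    """Convert -g pixel dimensions at a given -r resolution to points (1/72 inch)."""
    return (width_px * 72 / resolution_dpi, height_px * 72 / resolution_dpi)


print(media_size_points(800, 1080, 150))   # (384.0, 518.4)
```

So that combination simply asks for a page of about 5.33 by 7.2 inches. The same media could be requested directly as a size, leaving the resolution alone.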
It is indeed possible that your Kindle cannot handle transparency any better than the Mac, it is after all an old device. It's also possible that whoever built Ghostscript for you introduced a bug. I'm afraid we can't help you with either of those.
I did suggest, right back at the end of the original post, that you render the content to an image (Ghostscript will do that for you), then use Tesseract to convert the image back to a PDF, and at the same time OCR the text.
That will get past your problems with JPEG2000, will do a better job of creating searchable text, since even files that aren't already searchable will become so, and will allow you to specify the resolution.

Use Ghostscript 9.21 to convert text to outlines, and how to keep the resolution of the picture

I use Ghostscript to convert text to outlines with the following command:
gswin32c.exe -sDEVICE=pdfwrite -sOutputFile=output.pdf -dQUIET -dNOPAUSE -dBATCH -dNoOutputFonts -f test_new.pdf
It works, but the output file is very small: it went from 2.5 MB to 70 KB, and the picture in the PDF became blurred.
Adding -dPDFSETTINGS=/default gives the same result.
It's better with -dPDFSETTINGS=/printer or -dPDFSETTINGS=/prepress, but 300 dpi is not enough for me (or for my boss).
Is there any way to keep the original resolution of the picture?
Or how can I set a higher dpi for images in the output PDF?
The test file is here.
Thanks in advance.
The answer to your question is 'yes' (but see later). Don't use PDFSETTINGS, that sets lots of things all in one go. If you want control then you need to specify each setting individually.
Rather than use this shotgun approach you need to read the documentation, decide which controls affect areas you want to change, and alter those controls only.
However, image downsampling is not your problem. If you don't use -dPDFSETTINGS then the PDF file written by Ghostscript contains an image at exactly the same resolution as the image in the original file.
Your problem is that the image is being written with JPEG compression, and JPEG is a lossy compression, so you are losing fidelity. Note that in the original file the image is written uncompressed, which is why it's so large.
It looks like the original image was a JPEG, and the free PDF editor you are using has realised that so it saved the image uncompressed (I may be giving it too much credit here, it may save all images uncompressed). Applying JPEG to an image which has already been quantised simply amplifies the artefacts.
Instead you need to specify that you want images compressed with Flate, which is a lossless compression. The documentation for the pdfwrite controls can be found here, you need to change AutoFilterColorImages and ColorImageFilter.
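A minimal sketch of what that change might look like, written as a Python helper that builds the argument list from the question's own command line. The helper name and its defaults are illustrative. Only the two colour-image controls named above are set; greyscale images have their own equivalents (AutoFilterGrayImages and GrayImageFilter), which this sketch leaves untouched.

```python
def flate_pdfwrite_args(src, dst, gs_exe="gswin32c.exe"):
    """Argument list for pdfwrite with colour images compressed with Flate."""
    return [
        gs_exe,
        "-sDEVICE=pdfwrite",
        "-dQUIET", "-dNOPAUSE", "-dBATCH",
        "-dNoOutputFonts",
        # Turn off automatic filter selection, then name the filter explicitly.
        "-dAutoFilterColorImages=false",
        "-dColorImageFilter=/FlateEncode",
        "-sOutputFile=" + dst,
        "-f", src,
    ]


args = flate_pdfwrite_args("test_new.pdf", "output.pdf")
print(" ".join(args))
# To actually run it: import subprocess; subprocess.run(args, check=True)
```

The printed line can be pasted into a command prompt as-is, so Python is not needed to use the result.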
Note that by not applying JPEG quantisation (a second time) and DCT encoding, the compression is less than your first experience. For me the output file comes in at just over 600Kb (leaving the font in place, and the text as text, would be a couple of Kb smaller). However the image is identical, as expected.
Since you are clearly using Ghostscript in a commercial environment, can I just point you at the licence and ask you to check that your usage is compatible with the AGPL, bearing in mind that this covers software as a service usage as well.

GhostScript PDF to PostScript

I have to convert pdf files (created with jasperreports) to postscript.
I'm using ghostscript (Version 9.19) to make the conversion.
The command I'm using is:
gswin64c -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=file.ps file.pdf
The conversion is done without problem, but when I open the generated PostScript file (using GSview 5.0), the top margin is cropped by 2-3 cm, and some of the information to print is lost.
I have changed the device from ps2write to eps2write, and used the property -g<width>x<height> with the page size in pixels, but the problem persists.
The file is to be printed on preformatted paper, so I cannot use the generated PostScript to print.
Can someone help?
Thanks
It's not possible to say with great certainty, but it sounds like the PDF MediaBox is larger than the media you have specified to GSView.
You can try using the -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS along with -dFIXEDMEDIA and -dPDFFitPage, that should allow you to set up a specific media size, override the size in the PDF file and scale the result to fit the specified size.
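As an illustration only, the sketch below builds such a command line in Python, assuming the preformatted paper is A4 (210 x 297 mm); the real paper size is not stated in the question. The helper names are invented. The media size is given in points, 72 to the inch.

```python
def mm_to_points(mm):
    """Convert millimetres to whole points (1 pt = 1/72 inch, 25.4 mm per inch)."""
    return round(mm / 25.4 * 72)


def fixed_media_args(src, dst, width_mm, height_mm, gs_exe="gswin64c"):
    """ps2write argument list that forces a media size and scales pages to fit."""
    return [
        gs_exe, "-dNOPAUSE", "-dBATCH", "-sDEVICE=ps2write",
        f"-dDEVICEWIDTHPOINTS={mm_to_points(width_mm)}",
        f"-dDEVICEHEIGHTPOINTS={mm_to_points(height_mm)}",
        "-dFIXEDMEDIA", "-dPDFFitPage",
        "-sOutputFile=" + dst, src,
    ]


print(mm_to_points(210), mm_to_points(297))   # 595 842
print(" ".join(fixed_media_args("file.pdf", "file.ps", 210, 297)))
```

Note that scaling to fit will shrink the content slightly if the PDF page is larger than the paper, which may or may not be acceptable on a preformatted form.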
Perhaps you could post an example PDF file; without that it's very hard to comment sensibly.

Ghostscript: how to reduce file size of large PDFs without changing smaller PDFs

I am using GhostScript to convert large batches of PDF to PDF to reduce file size. The original PDFs vary in size and quality. Where there is a low quality, small file size (<350kb) PDF the output from Ghostscript is often poor.
Is there a way I can get GhostScript to ignore files below a certain size and just pass them through without downsampling?
Current settings:
SearchablePDFSetting=-dColorImageResolution=120 -dMonoImageResolution=38 -dMonoImageDownsampleType=/Average -dOptimize=true -dDownsampleColorImages=true -dDownsampleGrayImages=true -dDownsampleMonoImages=true -dUseCIEColor -dColorConversionStrategy=/sRGB -dFIXEDMEDIA -dDEVICEWIDTHPOINTS=596 -dDEVICEHEIGHTPOINTS=834
Thanks,
Vix
The pdfwrite device can already pass images (not files) through without downsampling, there is no way to 'pass through without changing' a file. If you want to not process files below a certain size, then don't process them.
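"Don't process them" has to happen in whatever script drives the batch. A minimal Python sketch, assuming the 350 KB cut-off from the question and a hypothetical helper name:

```python
import os


def split_by_size(paths, min_bytes=350 * 1024):
    """Return (to_process, to_copy): files at or above min_bytes, and smaller ones."""
    to_process, to_copy = [], []
    for path in paths:
        if os.path.getsize(path) >= min_bytes:
            to_process.append(path)
        else:
            to_copy.append(path)
    return to_process, to_copy
```

The files in the first list would be handed to Ghostscript as before, and those in the second copied to the output folder unchanged (for example with shutil.copy).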
To avoid further downsampling of images, you need to add the 'xxxxImageDownsampleThreshold' parameters (one each for Mono, Grey and Color). If you set this to (eg) 1.5 then images which are up to 50% higher resolution than the target resolution won't be downsampled.
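The effect of the threshold can be pictured with a small Python model. This is only an illustration of the rule as described above, not Ghostscript's actual code, and how an image exactly on the boundary is treated is an assumption here.

```python
def will_downsample(image_dpi, target_dpi, threshold=1.5):
    """True if the image is far enough above the target to be downsampled."""
    # Boundary handling (> rather than >=) is an assumption of this sketch.
    return image_dpi > target_dpi * threshold


# With a 120 dpi target and a threshold of 1.5, the cut-off is 180 dpi.
for dpi in (100, 170, 200, 300):
    print(dpi, will_downsample(dpi, target_dpi=120))
```

With these numbers the 100 and 170 dpi images are left alone and the 200 and 300 dpi images are downsampled.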
Note that you haven't (apparently) set a GrayImageResolution, you haven't set the downsample type for Color or Gray images, and a MonoImageResolution of 38 looks pretty ugly to me.
The default Gray image filter is DCT (JPEG) as is the Color filter. If the original image was DCT then applying a second round of DCT compression will result in ugly artefacts, especially if the image is not downsampled. I would suggest you change the filter type to FlateEncode.
All these options are documented in ps2pdf.htm in the Ghostscript doc folder.
Add the option:
-dPDFSETTINGS=/screen
This "selects low-resolution output similar to the Acrobat Distiller 'Screen Optimized' setting."

How to display JPEG image on microcontroller LCD?

I am recently developing some firmware on the STM3210E development board which has an ARM cortex M3 processor. It has been interfaced to a 240x320 LCD. After going through the demo firmware, I realised that images are encoded in 32 bit variables (correct me if I am wrong) stored in array as shown below.
uint32_t STM32Banner[50] = {0x6461EB7A, 0x646443BC, 0x64669BFE, 0x6468F440, 0x646B4C82,
0x646DA4C4, 0x646FFD06, 0x64725548, 0x6474AD8A, 0x647705CC,
0x64795E0E, 0x647BB650, 0x647E0E92, 0x648066D4, 0x6482BF16,
0x64851758, 0x64876F9A, 0x6489C7DC, 0x648C201E, 0x648E7860,
0x6490D0A2, 0x649328E4, 0x64958126, 0x6497D968, 0x649A31AA,
0x649C89EC, 0x649EE22E, 0x64A13A70, 0x64A392B2, 0x64A5EAF4,
0x64A84336, 0x64AA9B78, 0x64ACF3BA, 0x64AF4BFC, 0x64B1A43E,
0x64B3FC80, 0x64B654C2, 0x64B8AD04, 0x64BB0546, 0x64BD5D88,
0x64BFB5CA, 0x64C20E0C, 0x64C4664E, 0x64C6BE90, 0x64C916D2,
0x64CB6F14, 0x64CDC756, 0x64D01F98, 0x64D277DA, 0x64D4D01C};
Could you please explain how to convert a JPEG/PNG/BMP image to this format (RGB565)?
You have two choices:
Write your own set of decoders.
Use available free decoders
The first solution is only really viable for BMP (and perhaps GIF), which is quite a simple format compared to PNG and JPEG. Even so, writing a BMP decoder that handles all different versions and specialties of BMP gracefully takes quite a bit of work (I have tried it). Hacking together something that can extract the image data from the most common BMP formats is quite easy though.
The second solution is probably the way to go for the other formats. Most open-source decoders are available under LGPL or similar, so licensing shouldn't really be a problem. For JPEG images use libJPEG, for PNG use libPNG and for GIF use giflib.
Most of the decoders do not support decoding to RGB565 so you will have to write a converter to convert from RGB888 to RGB565.
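The conversion itself is just bit shuffling: keep the top 5 bits of red, the top 6 of green and the top 5 of blue. A Python sketch of the packing is below (the same expression works unchanged in C). Whether the LCD controller expects this channel order and byte order is an assumption to check against its datasheet.

```python
def rgb888_to_rgb565(r, g, b):
    """Pack 8-bit R, G, B into one 16-bit RGB565 value (5 red, 6 green, 5 blue bits)."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)


print(hex(rgb888_to_rgb565(255, 0, 0)))       # 0xf800
print(hex(rgb888_to_rgb565(0, 255, 0)))       # 0x7e0
print(hex(rgb888_to_rgb565(0, 0, 255)))       # 0x1f
print(hex(rgb888_to_rgb565(255, 255, 255)))   # 0xffff
```

This simply truncates the low bits; rounding or dithering would give slightly smoother gradients at the cost of more code.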
Use a program like GIMP to convert to an uncompressed BMP (what you normally get when you save-as BMP).
A BMP has something like a 54-byte header, then it goes into the data. Each line holds the pixels, at either three bytes (RGB) or four bytes (RGBX) per pixel. Each line is padded to a 4-byte boundary, so if you have three bytes per pixel and the width in pixels times three is not a multiple of four (say 3 pixels wide * 3 = 9 as a simple example) then there will be some padding. You know from opening the file in GIMP how wide it is; you probably want to use GIMP to adjust the image to match your LCD screen anyway. The first bytes of data after the header are the pixel in the lower left corner of the image, so you might need to flip the image in the y axis, or just start off this way and see what happens.
Knowing the size of your image, (from opening it with gimp), you can do a little math to see if the size of the file matches with what I am saying, if it is dramatically smaller then there is some compression going on and you need to save again and change the settings for the bmp.
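That "little math" might look like the following Python sketch. It assumes the common 54-byte header mentioned above and an uncompressed file; the function names are made up for the example.

```python
def bmp_row_stride(width_px, bytes_per_pixel=3):
    """Bytes per stored row, padded up to a multiple of four."""
    return (width_px * bytes_per_pixel + 3) // 4 * 4


def expected_bmp_size(width_px, height_px, bytes_per_pixel=3, header_bytes=54):
    """File size to expect for an uncompressed BMP with a 54-byte header."""
    return header_bytes + bmp_row_stride(width_px, bytes_per_pixel) * height_px


print(bmp_row_stride(3))              # 12 (9 bytes of pixels plus 3 of padding)
print(expected_bmp_size(240, 320))    # 230454
```

If the file on disk is much smaller than the figure for your image dimensions, it was probably saved with compression or at a lower bit depth.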
Once you have this figured out then write a simple program to extract the pixels from the BMP and save them in the format you desire. Even better, read the code and docs and understand how to program the LCD, and you can get from raw pixels to the LCD without having to go through their specific format/code.
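Such a program could be sketched in Python as below. It is deliberately narrow: it only accepts uncompressed, bottom-up, 24-bit BMP data with the standard header layout and rejects everything else. The function name is hypothetical, and the RGB565 packing order is an assumption about what the LCD wants.

```python
import struct


def bmp24_to_rgb565(data):
    """Return (width, height, pixels); pixels are RGB565 values, top row first."""
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    pixel_offset = struct.unpack_from("<I", data, 10)[0]   # where pixel data starts
    width, height = struct.unpack_from("<ii", data, 18)
    bits_per_pixel, compression = struct.unpack_from("<HI", data, 28)
    if bits_per_pixel != 24 or compression != 0 or height <= 0:
        raise ValueError("only uncompressed bottom-up 24-bit BMP is handled")
    stride = (width * 3 + 3) // 4 * 4       # rows are padded to 4 bytes
    pixels = []
    for row in range(height - 1, -1, -1):   # file stores the bottom row first
        start = pixel_offset + row * stride
        for x in range(width):
            b, g, r = data[start + 3 * x : start + 3 * x + 3]   # stored as B, G, R
            pixels.append(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3))
    return width, height, pixels
```

Reading a file with open(path, "rb").read() and passing the bytes in gives a flat list that can be written out as a C array initialiser for the firmware.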

Resources