Convert RGB pdf to CMYK preserve pdf - ghostscript

I am using Ghostscript 9.25 on Windows.
I am trying to convert an RGB PDF to CMYK, keeping it as a PDF, using the following command:
gswin32c.exe -dSAFER -dBATCH -dNOPAUSE -dNOCACHE -sDEVICE=pdfwrite -sColorConversionStrategy=CMYK -dProcessColorModel=/DeviceCMYK -dAutoFilterColorImages=false -dAutoFilterGrayImages=false -sOutputFile=out.pdf input.pdf
input.pdf file here
https://www.dropbox.com/s/8jfnov526nhb9m9/blank.pdf?dl=0
output.pdf file here
https://www.dropbox.com/s/ftrmm32mmixaxqh/out.pdf?dl=0
But my output is lighter than Adobe's output. I expect it to be dark: when I use Adobe's CMYK preserve option, the result is a little darker than the Ghostscript output. Am I doing anything wrong?
Should I use any icc profile?
Thanks

You say you are using ImageMagick, yet you give a Ghostscript command line....
I presume that when you say CMYL you mean CMYK.
There is nothing immediately obviously wrong with your command line, but you have given no example file, nor any reason why you expect the result to be 'dark'.
If you want to control the conversion then you will need to supply at least one and possibly up to 4 ICC profiles. You will certainly need a CIE->CMYK Output profile, and you might like to supply ICC profiles for Gray->CIE, RGB->CIE and CMYK->CIE as well, in order to override the default ones Ghostscript is using.
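As a rough sketch of how such profiles might be passed: Ghostscript's colour management options include -sOutputICCProfile for the output profile and -sDefaultRGBProfile (with -sDefaultGrayProfile and -sDefaultCMYKProfile alongside it) for the source defaults. The wrapper name to_cmyk_with_profiles and the profile file arguments are hypothetical; substitute profiles you actually have, and use gswin32c.exe or gswin64c.exe instead of gs on Windows.

```shell
# Hypothetical wrapper: convert to CMYK using caller-supplied ICC profiles.
# $1 = input PDF, $2 = output PDF,
# $3 = CMYK output profile, $4 = RGB source profile (both are assumptions).
to_cmyk_with_profiles() {
  gs -dBATCH -dNOPAUSE \
     -sDEVICE=pdfwrite \
     -sColorConversionStrategy=CMYK \
     -dProcessColorModel=/DeviceCMYK \
     -sOutputICCProfile="$3" \
     -sDefaultRGBProfile="$4" \
     -sOutputFile="$2" "$1"
}

# Usage (file names are placeholders):
#   to_cmyk_with_profiles input.pdf out.pdf my_cmyk.icc my_rgb.icc
```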
[EDIT]
The problem is nothing to do with colour conversion. Your original file contains nothing except a very large image, which is compressed with the Flate filter (lossless). It looks like this:
You've turned off auto filtering, but you haven't told Ghostscript which compression filter to use for images, so it sticks with the default, which is JPEG (DCT). The image now looks like this:
For the nature of your original image, JPEG (lossy) compression is an outstandingly bad choice. The output image compresses less well, and it loses fidelity. You should change to using Flate compression instead of JPEG for images of this kind.
By the way, the image in your original PDF file was defined in CMYK space already.
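A minimal sketch of the change suggested above, assuming you keep the rest of the original command line: with auto filtering off, name Flate explicitly as the image filter. The wrapper name to_cmyk_flate is hypothetical; on Windows replace gs with gswin32c.exe.

```shell
# Hypothetical wrapper: CMYK conversion with lossless (Flate) image compression.
# $1 = input PDF, $2 = output PDF
to_cmyk_flate() {
  gs -dSAFER -dBATCH -dNOPAUSE \
     -sDEVICE=pdfwrite \
     -sColorConversionStrategy=CMYK \
     -dProcessColorModel=/DeviceCMYK \
     -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode \
     -dAutoFilterGrayImages=false  -dGrayImageFilter=/FlateEncode \
     -sOutputFile="$2" "$1"
}

# Usage:
#   to_cmyk_flate input.pdf out.pdf
```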

Related

How can I take a pdf, and convert any jpeg2000/jpx/jp2 images in it to jpeg images?

I am using MacOS Mojave on a Mac Mini, and I am also using an old Kindle Dx which cannot read jpeg2000 images. It also has trouble with too many or too large jpeg images.
I cannot use touchscreens, so newer e-readers and tablets aren't a solution.
So far, I've found some buggy solutions--
I can use Willus's k2pdfopt with -mode copy and -dev dx, which rasterizes everything. It's a good solution for scanned pdfs. If more detail is needed, -mode copy without -dev dx will preserve higher resolution. It's something of a last resort for pdf-born-pdfs, since text can be uglier and harder to read, and file sizes can increase alarmingly.
I can also use Ghostscript with -dCompatibilityLevel=1.4, which doesn't rasterize everything. It converts jpeg2000 images to jpeg images. But it doesn't tackle some oversized or poorly-constructed images, it often creates dark rectangles which can obscure text, and it occasionally loses the ability to search or select text. [P.S. I mean it takes a pdf which had searchable text and outputs one which does not. Also if I do any kind of image downsampling or removal, it sometimes rescales everything or loses pages.]
I have experimented with options to compress images in Ghostscript, with mixed success, and with the above bugs persisting. [P.S. I think I was downsampling, yes.]
For whatever reason, MacOS Quartz filters only work if they will reduce image sizes. So they tend not to work on the buggy images.
Now my ideal solution would preserve the text itself, preferably untangling ligatures, and would compress the images like Willus's k2pdfopt. But I have no idea if that's possible or how.
Short of that-- I'm wondering if there's a way to use Ghostscript to convert the jpeg2000 images without causing the gray rectangles or losing the ability to search or select text.
or if there's a way to use Quartz filters so they work. In some older versions of MacOS they did work.
or if there's a way to batch-print these pdf files to the appropriate resolution, apparently 800x1180, reprocessing images in the process.
I don't have much programming experience. I mainly use homebrew to install command-line tools, very sloppy bash scripts, and Automator to run them.
P.S. For a minimal example of the gray rectangles in Ghostscript, using the free pdf from here: https://www.peginc.com/store/test-drive-savage-worlds-the-wild-hunt/
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -o out.pdf in.pdf
substituting that pdf for in.pdf.
For a minimal example of losing searchable text, using the free pdf from here: http://datafortress2020.com/fileproject/details.php?image_id=498
same minimal script
Compatibility Level
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dCompatibilityLevel=1.4 -o out.pdf in.pdf
Aggressive Downsampling and Grayscale
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dCompatibilityLevel=1.4 \
-g800x1080 -r150 -dPDFFitPage \
-dFastWebView -sColorConversionStrategy=Gray \
-dDownsampleColorImages=true -dDownsampleGrayImages=true -dDownsampleMonoImages=true -dColorImageResolution=75 -dGrayImageResolution=75 -dMonoImageResolution=150 -dColorImageDownsampleThreshold=1.0 -dGrayImageDownsampleThreshold=1.0 -dMonoImageDownsampleThreshold=1.0 \
-o out.pdf in.pdf
P.P.S. I can use k2pdfopt to rasterize to fit my Kindle. If the file has searchable text, this retains it, if it doesn't I can run tesseract in k2 or run ocrmypdf afterwards.
But if I want especially good graphics, or especially clear text, and the file has hundreds of pages, it will need hundreds of megs. I had blamed this on rasterizing the text, which was why my ideal solution was to keep text and rasterize images, but apparently it's an issue with the images themselves.
If you think you've found a bug, then it's helpful to report it. If you don't, it will never be fixed. You can report a bug at https://bugs.ghostscript.com; please be sure to attach an example file that reproduces the problem and state the command line used.
The Ghostscript pdfwrite device does not, ever, produce JPEG2000 images (due to patent issues). So you don't need to set the CompatibilityLevel at all, and I'd recommend that you do not. By setting the CompatibilityLevel you are limiting the output. Unless your device cannot handle later versions, don't do this.
Without seeing an example file, a command line and knowing the version and operating system it's obviously not possible for anyone to comment on your 'gray rectangles'.
You can reduce the size of images (in bytes) by downsampling (as opposed to compressing) them, you can't do anything about the number of images.
Note that searchable text depends on the construction of the PDF file, and so cannot ever be guaranteed. Searchable text (in the sense of ToUnicode CMaps) was a later addition to the PDF Reference and is always optional, because it's possible to have input from which the Unicode code points cannot be determined (without using OCR software) but a perfectly readable PDF file can still be produced.
Ghostscript itself can produce a PDF file which is a rendered representation of the original, wrapped up as a PDF. See the pdfimage* devices.
Tesseract can take images and produce PDF files with searchable text, produced by OCR'ing the images. This would seem to me to be your best option, though obviously I don't know if a single large image is going to be acceptable to your device.
Edit
I already agreed that searching text is inherently not supported in PDF, except as an optional adjunct. The bug report you pointed to talks about 'corrupting text layers'. There are no text layers in PDF, and the text is neither corrupted nor missing, it's just not encoded as ASCII any more.
The reason you shouldn't set the resolution, and the size in pixels, is that PDF is not an image format. You aren't gaining anything by doing this. All that happens is that pdfwrite divides the 'g' values by the resolution, to get a media size in inches, and writes that as the MediaBox. It's simpler just to set the media size. If you set the resolution you are fixing anything which does get rendered at that resolution. Choose a low resolution and you get crappy output. If you use a higher resolution then the image can be downscaled and smoothed, giving better output.
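The arithmetic described above (pixels divided by resolution gives inches, and PDF units are 1/72 inch) can be sketched as follows. The helper name px_to_points is hypothetical, shell integer division truncates fractional points, and -dDEVICEWIDTHPOINTS / -dDEVICEHEIGHTPOINTS are the standard Ghostscript switches for giving a media size directly in points.

```shell
# Hypothetical helper: convert a pixel dimension at a given dpi to PDF points.
# $1 = size in pixels, $2 = resolution in dpi
px_to_points() {
  echo $(( $1 * 72 / $2 ))
}

# Example with the -g800x1080 -r150 values from the question.
width_pts=$(px_to_points 800 150)     # 384
height_pts=$(px_to_points 1080 150)   # 518 (518.4 truncated)

# Print the equivalent media-size switches.
echo "-dDEVICEWIDTHPOINTS=$width_pts -dDEVICEHEIGHTPOINTS=$height_pts -dFIXEDMEDIA -dPDFFitPage"
```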
It is indeed possible that your Kindle cannot handle transparency any better than the Mac, it is after all an old device. It's also possible that whoever built Ghostscript for you introduced a bug. I'm afraid we can't help you with either of those.
I did suggest, right back at the end of the original post, that you render the content to an image (Ghostscript will do that for you), then use Tesseract to convert the image back to a PDF, and at the same time OCR the text.
That will get past your problems with JPEG2000, will do a better job of creating searchable text, since even files that aren't already searchable will become so, and will allow you to specify the resolution.
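A rough sketch of that render-then-OCR route, under some assumptions: the png16m device and 300 dpi are just one plausible choice, the function name pdf_to_ocr_pages is hypothetical, and the result is one searchable PDF per page (merging them is left out).

```shell
# Hypothetical helper: render each page to PNG, then OCR each PNG into a PDF.
# $1 = input PDF, $2 = existing working directory
pdf_to_ocr_pages() {
  # Render every page as a 24-bit PNG at 300 dpi (resolution is an assumption).
  gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -r300 \
     -sOutputFile="$2/page-%03d.png" "$1" || return 1

  for png in "$2"/page-*.png; do
    [ -e "$png" ] || continue          # nothing rendered, skip the literal glob
    # tesseract <image> <output base> pdf  writes <output base>.pdf
    tesseract "$png" "${png%.png}" pdf
  done
}

# Usage:
#   mkdir -p work && pdf_to_ocr_pages in.pdf work
```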

Ghostscript - EPS (with embedded TIFF with transparent background) to PNG conversion

I'm trying to convert an EPS file with an embedded TIFF that has a transparent background to a PNG using GhostScript. The problem that I am having is that the background of the TIFF image becomes white in the PNG. It looks like the following:
IncorrectPNG
When I export from Adobe Illustrator, it comes out correct:
CorrectPNG
I was reading that there is not transparency in EPS, only marked and unmarked areas. I was wondering if there was a call that I was missing that would create the PNG through Ghostscript similar to that of Illustrator? Or if there is any other alternative that doesn't just replace white with transparency through ImageMagick?
I am using Windows and have Ghostscript 9.25 installed. Here is the command (one of many) that I've tried:
-q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -sDEVICE=pngalpha -r300 -dEPSCrop NamePlatePNG.png NamePlate.eps
I can get the EPS file to you if needed. Any help would be appreciated, thanks!
UPDATE:
Here is the EPS file (Hopefully this link works):
https://drive.google.com/open?id=1m4HHGLoPe0jdWkx1Oghe7ttiXPldZnJs
Also, I should have mentioned that the images I uploaded were just screenshots of the PNGs open in an image editor. The checkered portion is indeed fully transparent alpha channel. I was trying to easily accentuate the difference.
Your file doesn't look like it's transparent, it looks like it's masked, possibly with a stencil mask, possibly chroma-keyed. Without seeing the file I can't tell for sure.
You are correct that PostScript (and hence EPS) doesn't support transparency, but it does support several features which have somewhat similar effects.
The color space is irrelevant, and in fact the only kind of 'transparency' supported in PostScript works when the color space is CMYK, but not when it's RGB (and certainly not sRGB, which isn't even a PostScript color space; you have to manufacture it from CIEBasedABC).
As far as I can see the command line you are using is correct, but as I say I can't tell much without seeing the actual EPS program.
[EDIT]
So the Ghostscript rendering is correct, that's what is in your EPS file, there is no transparency of any kind there. So how is Illustrator able to make a transparent PNG ? Well the answer is that Illustrator isn't using the PostScript part of the EPS file.
About 1/3 of the way through the EPS file you'll see a line which reads:
%AI9_PrivateDataBegin
What follows that is an Adobe Illustrator file format. When AI reads the file it finds that line, throws away the PostScript portion of the file, and reads the AI representation of the content from the portion of the file beginning with that comment.
Now stored somewhere in there will be the information that portions of the content are transparent. Although PostScript can't represent that, Illustrator's internal format can. So when you write a PNG file from Illustrator it knows that portion is transparent and writes it as such.
Ghostscript, however, is constrained by the PostScript portion of the file, it can't read the Illustrator native format, and so renders the image with a white background.
It 'might' be possible to save a different kind of EPS from Illustrator (level 3 instead of level 2 possibly, I notice this is a language level 2 EPS file) which duplicates the effect, but from what you have here, there isn't anything a standard PostScript interpreter can do which will give you the result you want.
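If you want to check whether a given EPS carries that Illustrator private section, a quick test along these lines should do. The helper name has_ai_private_data is hypothetical; the marker string is the comment quoted above.

```shell
# Hypothetical helper: succeeds if the file contains the Illustrator marker.
# $1 = path to the EPS file
has_ai_private_data() {
  # -a treats the file as text even if it embeds binary image data.
  grep -a -q '%AI9_PrivateDataBegin' "$1"
}

# Usage:
#   has_ai_private_data NamePlate.eps && echo "Illustrator private data present"
```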

Use Ghostscript 9.21 to convert text to outlines, and how to keep the resolution of the picture

I use Ghostscript to convert text to outlines with the following command:
gswin32c.exe -sDEVICE=pdfwrite -sOutputFile=output.pdf -dQUIET -dNOPAUSE -dBATCH -dNoOutputFonts -f test_new.pdf
It works, but the output file is very small (it shrank from 2.5 MB to 70 KB), and the picture in the PDF became blurred.
Adding -dPDFSETTINGS=/default gives the same result.
It's better with -dPDFSETTINGS=/printer or -dPDFSETTINGS=/prepress, but 300 dpi is not enough for me (or for my boss).
Is there any way to keep the original resolution of the picture, or to set a higher dpi for images in the output PDF?
The test file is here.
Thanks in advance.
The answer to your question is 'yes' (but see later). Don't use PDFSETTINGS, that sets lots of things all in one go. If you want control then you need to specify each setting individually.
Rather than use this shotgun approach you need to read the documentation, decide which controls affect areas you want to change, and alter those controls only.
However, image downsampling is not your problem. If you don't use -dPDFSETTINGS then the PDF file written by Ghostscript contains an image at exactly the same resolution as the image in the original file.
Your problem is that the image is being written with JPEG compression, and JPEG is a lossy compression, so you are losing fidelity. Note that in the original file the image is written uncompressed, which is why it's so large.
It looks like the original image was a JPEG, and the free PDF editor you are using has realised that so it saved the image uncompressed (I may be giving it too much credit here, it may save all images uncompressed). Applying JPEG to an image which has already been quantised simply amplifies the artefacts.
Instead you need to specify that you want images compressed with Flate, which is a lossless compression. The documentation for the pdfwrite controls can be found here; you need to change AutoFilterColorImages and ColorImageFilter.
Note that by not applying JPEG quantisation (a second time) and DCT encoding, the compression is less than your first experience. For me the output file comes in at just over 600Kb (leaving the font in place, and the text as text, would be a couple of Kb smaller). However the image is identical, as expected.
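For illustration, the two controls named above could be added to the original command roughly like this. It is only a sketch: the wrapper name outline_text_lossless is hypothetical, and on Windows gs would be gswin32c.exe.

```shell
# Hypothetical wrapper: outline text, keep colour images lossless (Flate).
# $1 = input PDF, $2 = output PDF
outline_text_lossless() {
  gs -sDEVICE=pdfwrite -dQUIET -dNOPAUSE -dBATCH \
     -dNoOutputFonts \
     -dAutoFilterColorImages=false \
     -dColorImageFilter=/FlateEncode \
     -sOutputFile="$2" -f "$1"
}

# Usage:
#   outline_text_lossless test_new.pdf output.pdf
```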
Since you are clearly using Ghostscript in a commercial environment, can I just point you at the licence and ask you to check that your usage is compatible with the AGPL, bearing in mind that this covers software as a service usage as well.

ghostscript how to retain pdf size while converting pdf to grayscale

I am using Ghostscript 8.x to convert a pdf to grayscale. I am using the following command:
gs -dNOPAUSE -dBATCH -q -sOutputFile=- -sDEVICE=psgray 2016-12-15-165043474.pdf | ps2pdf - output.pdf
This successfully converts my pdf to grayscale but I lose the original page size. The resulting pdf has a lot of whitespace and looks like A4 size.
My input pdf has a fixed width of 3cm (height may vary). I want the output pdf to be the same size.
Please suggest.
Don't use the psgray device! This is seriously deprecated and has been removed totally in recent versions of Ghostscript. By using this you are converting the PDF to PostScript and then converting it back to PDF. More steps than you need (with each conversion potentially introducing problems), and that's where you are getting the default media size from.
Simply use the pdfwrite device to do all the work, but you will need a reasonably recent version of Ghostscript to do it. Possibly more recent than the old version you are obviously using currently.
gs -sDEVICE=pdfwrite -sColorConversionStrategy=DeviceGray -sOutputFile=out.pdf input.pdf

GhostScript PDF to PostScript

I have to convert pdf files (created with jasperreports) to postscript.
I'm using ghostscript (Version 9.19) to make the conversion.
The command I'm using is:
gswin64c -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=file.ps file.pdf
The conversion is done without problem, but when I open the generated PostScript file (using GSview 5.0), the top margin is cropped by 2-3 cm, and some of the information to print is lost.
I have changed the device from ps2write to eps2write, and used the option -g<width>x<height> with the page size in pixels, but the problem persists.
The file is to be printed on preformatted paper, so I cannot use the generated PostScript to print.
Can someone help?
Thanks
It's not possible to say with great certainty, but it sounds like the PDF MediaBox is larger than the media you have specified to GSView.
You can try using -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS along with -dFIXEDMEDIA and -dPDFFitPage; that should allow you to set up a specific media size, override the size in the PDF file and scale the result to fit the specified size.
Perhaps you could post an example PDF file; without that it's very hard to comment sensibly.
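As a sketch of the suggestion above, the four switches could be combined with the original ps2write command like this. The wrapper name pdf_to_ps_fixed_media is hypothetical, and the media size is whatever your preprinted paper needs; A4, for instance, is 595 x 842 points. On Windows use gswin64c in place of gs.

```shell
# Hypothetical wrapper: PDF to PostScript on a fixed media size, scaled to fit.
# $1 = input PDF, $2 = output PS, $3 = width in points, $4 = height in points
pdf_to_ps_fixed_media() {
  gs -dNOPAUSE -dBATCH -sDEVICE=ps2write \
     -dDEVICEWIDTHPOINTS="$3" -dDEVICEHEIGHTPOINTS="$4" \
     -dFIXEDMEDIA -dPDFFitPage \
     -sOutputFile="$2" "$1"
}

# Usage (A4 shown only as an example size):
#   pdf_to_ps_fixed_media file.pdf file.ps 595 842
```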

Resources