Convert PDF to other formats with color profiles using Ghostscript

I'm planning to use color profiles while converting PDFs to JPG/PNG/PDF (low res/high res/RGB/CMYK). (Question 1) I could not find out how to determine whether an input document has an ICC profile and, if it does, whether I should use it in my conversion. Is there a Ghostscript command to detect ICC profiles?
I found the following gs command to convert to PDF:
gs -o cmyk-doc.pdf \
-sDEVICE=pdfwrite \
-dOverrideICC=true \
-sDefaultCMYKProfile=/path/to/mycmykprofile.icc \
-sOutputICCProfile=/path/to/mydeviceprofile.icc \
-dRenderIntent=3 \
-dDeviceGrayToK=true \
input-doc.pdf
(Question 2) If my input document has a profile, can I skip the -sDefaultCMYKProfile option and pass only the required -sOutputICCProfile?

You can't use Ghostscript to determine if a PDF file has ICC colour profiles. Note there can be numerous colour profiles in a PDF file. Each colour can be in its own space, and each space can use a different ICC colour profile. That's in addition to the OutputIntent profile. On the other hand, you don't need to care.
For rendering you are basically looking at a conversion like this:
input colour -> CIE representation -> output colour
The input colour (in PDF) can be Gray, RGB, CMYK, Separation, DeviceN or one of the CIE spaces: Lab or ICC.
The output colour, for rendering, will need to be one of Gray, RGB, CMYK or 'separated' (where each component is rendered to a separate gray image).
The OutputICCProfile controls the second half of that conversion. The Default* profiles control the first half of that conversion, when the colour space isn't already one of the CIE spaces.
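The following toy Python model makes the split between the two halves concrete. Python is used for the sketches in this thread because the thread itself only contains gs command lines. This is not Ghostscript's code. The function name and the "builtin-..." placeholder strings are hypothetical, and only Gray/RGB/CMYK and ICCBased inputs are modelled. -dOverrideICC, which makes the Default* profiles take precedence even over embedded profiles, is deliberately ignored.

```python
DEVICE_SPACES = {"DeviceGray", "DeviceRGB", "DeviceCMYK"}

def conversion_chain(input_space, embedded_profile=None,
                     default_profiles=None, output_profile="builtin-output"):
    """Toy model: return (profile for input -> CIE, profile for CIE -> output).

    default_profiles stands in for -sDefaultGrayProfile / -sDefaultRGBProfile /
    -sDefaultCMYKProfile; output_profile stands in for -sOutputICCProfile.
    The "builtin-..." strings are placeholders for Ghostscript's own defaults.
    """
    default_profiles = default_profiles or {}
    if input_space == "ICCBased":
        # The file already says how to reach CIE, so no Default* profile is consulted.
        first_half = embedded_profile
    elif input_space in DEVICE_SPACES:
        # Device colour: use the override if one was given, else the built-in default.
        first_half = default_profiles.get(input_space, "builtin-" + input_space)
    else:
        raise ValueError(f"{input_space} is not covered by this toy model")
    return first_half, output_profile
```

In this model an ICCBased colour never reaches default_profiles. Leaving out -sDefaultCMYKProfile therefore only changes what happens to plain DeviceCMYK colours.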
Taking question 2 first:
You never need to supply the DefaultCMYKProfile. It is used to override the Ghostscript default CMYK->CIE profile. You can use it if you happen to know that the input file is in CMYK space and that it was a characterised CMYK space, in which case supplying the profile for that space would do a better job of converting from CMYK to CIE than the Ghostscript default one. However, this is rare and generally only true in controlled workflows.
For question 1:
You only really care if the PDF file does not contain ICC profiles but instead defines colours in a device space such as RGB or CMYK. In that case, and assuming the PDF file was created for a colour-controlled workflow, you might assume that all the colours are in the same space. If so, I believe you would want to use the OutputIntent profile from the PDF file, or override the Ghostscript default (see below). There is extensive documentation on the use of ICC profiles in the Ghostscript colour management documentation.
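Ghostscript won't tell you whether a file has ICC profiles, but you can do a rough check outside it. The Python sketch below uses only the standard library and is a heuristic, not a real PDF parser. It counts /ICCBased colour spaces and /OutputIntents entries in the raw file and inside every stream that inflates with zlib, because PDF 1.5+ files often keep such dictionaries in compressed object streams. It does not handle encrypted files, non-Flate filters or names written with # escapes. A non-zero count only says that a profile is referenced, not that every colour uses it. The function names are hypothetical.

```python
import re
import zlib

# Stream bodies: from the 'stream' keyword (not the tail of 'endstream')
# up to the end-of-line marker that precedes 'endstream'.
STREAM_BODY = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.S)

def inflate(body: bytes) -> bytes:
    """Best-effort Flate decode; returns b'' for anything that is not zlib data."""
    try:
        # decompressobj tolerates a truncated checksum or trailing bytes.
        return zlib.decompressobj().decompress(body)
    except zlib.error:
        return b""

def icc_report(data: bytes) -> dict:
    """Count /ICCBased and /OutputIntents keys in the raw file and inflated streams."""
    chunks = [data] + [inflate(m.group(1)) for m in STREAM_BODY.finditer(data)]
    report = {"ICCBased": 0, "OutputIntents": 0}
    for chunk in chunks:
        report["ICCBased"] += len(re.findall(rb"/ICCBased\b", chunk))
        report["OutputIntents"] += len(re.findall(rb"/OutputIntents\b", chunk))
    return report
```

Call it on the bytes of the file, e.g. `icc_report(open("input-doc.pdf", "rb").read())`.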
The OutputICCProfile is used with rendering devices in order to characterise the output space. So if you are rendering to RGB output you might use a specific RGB ICC profile to convert from CIE space to RGB space. That profile might additionally then be attached to the output file (eg JPEG) and a conforming reader would be able to use that profile to convert the RGB samples back into CIE space in order to use yet another profile to convert to a different characterised space (eg the profile for your display).
Now that's for rendering, ie creating a bitmap image. The pdfwrite device, on the other hand, goes to considerable lengths to maintain input colours in their original colour space. If you want to convert them into a different space then you need to set -sColorConversionStrategy. The command line you've quoted won't do that. If you want the output converted to a characterised CMYK space then you would indeed supply the OutputICCProfile, but you also need to specify -sColorConversionStrategy=CMYK.
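Below is a sketch of a corrected invocation, written as a Python helper that returns an argument list for subprocess.run. The helper names are hypothetical and the profile paths are the question's placeholders. The options are the ones already used in this thread, with -sColorConversionStrategy=CMYK added so that pdfwrite actually converts the colours.

```python
import subprocess

def cmyk_pdfwrite_args(src, dst, output_icc, default_cmyk_icc=None, intent=None):
    """Argument list for converting a PDF's colours to a characterised CMYK space."""
    args = [
        "gs", "-o", dst,
        "-sDEVICE=pdfwrite",
        # Without a conversion strategy pdfwrite preserves the original colour spaces.
        "-sColorConversionStrategy=CMYK",
        # Characterises the CMYK space the colours are converted into.
        f"-sOutputICCProfile={output_icc}",
    ]
    if default_cmyk_icc:
        # Optional: only worth giving for DeviceCMYK input from a known, characterised space.
        args.append(f"-sDefaultCMYKProfile={default_cmyk_icc}")
    if intent is not None:
        args.append(f"-dRenderIntent={intent}")  # 0-3, as in the question's command
    args.append(src)
    return args

def convert(src, dst, output_icc, **kwargs):
    """Run Ghostscript; raises CalledProcessError if gs exits with an error."""
    subprocess.run(cmyk_pdfwrite_args(src, dst, output_icc, **kwargs), check=True)
```

For example, `convert("input-doc.pdf", "cmyk-doc.pdf", "/path/to/mydeviceprofile.icc")` leaves out -sDefaultCMYKProfile entirely, which matches the answer to question 2.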

Related

Ghostscript Converting PDF from RGB to CMYK also produces DeviceGray object

I'm trying to use Ghostscript to convert PDF files which have RGB colors into CMYK colors. The blue colors in my PDF get converted into DeviceCMYK just fine. However, pure white colors (i.e. RGB 255, 255, 255) don't show up in the CMYK separation when I do Output Preview in Adobe Acrobat. When I use Acrobat's Object Inspector, it reveals that my white colors have ColorSpace=DeviceGray and ColorValues=1.0 (i.e. white).
This is the simplest form of the command I'm using:
ghostscript\gswin32c.exe -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -sColorConversionStrategy=CMYK -sOutputFile="cmyk.pdf" "rgb.pdf"
I would like to force these white colors to be ColorSpace=DeviceCMYK and ColorValues=0, 0, 0, 0. (This is what Acrobat's Convert Colors produces). How can I do that? I tried hunting through the documentation and trying out various switches that I didn't fully understand, but the result was always the same so far.
If it matters, the version of Ghostscript is 9.21
Input and output files can be downloaded here: https://ufile.io/f/faxbb
It's quite simple, you can't do that. The pdfwrite device regards DeviceGray as a subset of CMYK (it's the K channel) and if it finds a DeviceGray colour it will retain it as such. However, true RGB colours ought to be converted to CMYK.
You haven't supplied the original file so I can't inspect it or try it.
You should upgrade: 9.21 is 3 years old, and 9.53.0 was released today.
Edit
I checked the code and this is actually an optimisation. Colours with C=M=Y=0 (whatever the K value), or with R=G=B, are converted into DeviceGray when written out (ONLY if you are using ColorConversionStrategy though!). This is because a single floating point component is smaller to write than three or four, so you get a smaller PDF file.
If any professional printing service fails to print colours in DeviceGray I'd be shocked. I'd also be looking for a new printer!
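The rule described above can be illustrated with a small Python function. It is a hypothetical sketch of the idea, not a copy of pdfwrite's C code. The function name is made up, and components are assumed to be numbers in the range 0 to 1.

```python
def collapse_to_gray(space, components):
    """Return ("DeviceGray", [g]) for neutral RGB/CMYK colours, else the colour unchanged."""
    if space == "DeviceRGB":
        r, g, b = components
        if r == g == b:
            return "DeviceGray", [r]          # one value instead of three
    elif space == "DeviceCMYK":
        c, m, y, k = components
        if c == m == y == 0:
            return "DeviceGray", [1.0 - k]    # gray 1.0 is white, while K 1.0 is black
    return space, list(components)
```

In the question, RGB white (1, 1, 1) converts to CMYK (0, 0, 0, 0). This rule then writes that colour as DeviceGray 1.0, which is what Acrobat's Object Inspector showed.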

How to embed an ICC color profile in a PDF with vector images, other than with Illustrator

In Illustrator there is an option to embed an ICC color profile in the PDF. Is there, likewise, a command line tool that can embed ICC color profiles in vector images? I have tried Ghostscript and Inkscape, but nothing helps.
There is insufficient information here to answer your question. What ICC profiles do you want to embed, and under what conditions do you want to embed them, ie for what purpose ?
(Note that 'image' in PDF means a bitmap image, so you can't have a 'vector' image)
Ghostscript will be starting with input (PostScript, PDF, XPS, PXL or PCL) which already describes the content in a particular colour space. Depending on the input format, this could be DeviceGray, DeviceRGB, DeviceCMYK, /Separation, /DeviceN, CIEBasedA, CIEBasedABC, CIEBasedDEF, CIEBasedDEFG, ICCBased, Lab, CalGray or CalRGB.
If the input is ICCBased then Ghostscript will of course emit the ICC profile describing that space. If the input is anything else, then how is Ghostscript supposed to know which ICC profile is correct? You would have to know the characteristics of the (for example) CMYK space that the application which produced the input file was using.
If you know the characteristics of the colour space that you are using then you can embed an OutputIntent profile (possibly this is what you mean, but you haven't said so). To do that with Ghostscript you will need to construct a series of pdfmark operations. There is code in ghostpdl/lib/PDFA_def.ps that does exactly that, because a PDF/A file which is not in a device-independent colour space requires an OutputIntent profile.
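Below is a sketch of that pdfmark sequence, generated from Python so the profile path and component count can be filled in. It follows the pattern used in ghostpdl/lib/PDFA_def.ps, but the function names and the object names (icc_profile, output_intent) are hypothetical. /S /GTS_PDFA1 is the PDF/A output intent subtype; other workflows (PDF/X, for example) use a different /S value. Compare against the PDFA_def.ps shipped with your Ghostscript version before relying on it.

```python
def ps_string(text: str) -> str:
    """Quote text as a PostScript literal string, escaping backslash and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"

def output_intent_pdfmarks(icc_path: str, n_components: int, identifier: str) -> str:
    """PostScript that embeds icc_path as the document's OutputIntent profile."""
    if n_components not in (1, 3, 4):
        raise ValueError("expected a Gray (1), RGB (3) or CMYK (4) profile")
    lines = [
        # A stream object holding the ICC data, read from disk by PostScript.
        "[/_objdef {icc_profile} /type /stream /OBJ pdfmark",
        f"[{{icc_profile}} <</N {n_components}>> /PUT pdfmark",
        f"[{{icc_profile}} {ps_string(icc_path)} (r) file /PUT pdfmark",
        # The OutputIntent dictionary pointing at that stream.
        "[/_objdef {output_intent} /type /dict /OBJ pdfmark",
        "[{output_intent} <<",
        "  /Type /OutputIntent",
        "  /S /GTS_PDFA1",
        "  /DestOutputProfile {icc_profile}",
        f"  /OutputConditionIdentifier {ps_string(identifier)}",
        ">> /PUT pdfmark",
        # Hook it into the document catalog.
        "[{Catalog} <</OutputIntents [ {output_intent} ]>> /PUT pdfmark",
    ]
    return "\n".join(lines) + "\n"
```

Save the result as, say, intent.ps and put it on the command line before the input file: gs -o out.pdf -sDEVICE=pdfwrite intent.ps input.pdf. The PostScript reads the profile with the file operator, so recent Ghostscript versions running in SAFER mode will probably also need read permission for the profile, e.g. via --permit-file-read.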

Non-deterministic* data in header/beginning of PNG files

I noticed that PNG files created by Gimp from the same RGB data are identical except for the very beginning. This image shows a diff of otherwise identical PNG files created with Gimp:
What is this data which changes each time and how is it encoded? Are there tools to decode it? Can you learn something from this information, e.g. can you find out when a PNG file was (probably) created by this information?
I was under the impression that PNG files are created deterministically* and don't store metadata which isn't necessary to decode the image. (Obviously the last part isn't true either, as Gimp writes its own name into the files without asking the user, although it does ask when you export something as a JPEG file.)
 * I use the word "deterministic" here to refer to things and only such which are the same on each execution/export/whatever given the same input. I'd usually use the word "functional" (i.e. like a mathematical function) but I fear this could be misunderstood by people who don't know what "functional" means in mathematics. Obviously, this is different from the usage of this word in information theory.
See the PNG header definition.
tIME stores the time that the image was last changed, so for me it's the same as the timestamp of the file you create.
bKGD gives the default background color. It is possibly the background color you are using in Gimp, or the color of the transparent pixels.
tEXt with key Comment and value Created with Gimp is just the default comment. You can change the comment for the image in Image > Properties, and you can set a default comment in Edit > Preferences > Default Image.
When I export the same PNG twice, I only see a change in tIME. In fact I can't get a bKGD item, even when exporting a PNG with transparent pixels. Are you using any specific options when exporting?
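You can inspect these chunks without extra tools by walking the file chunk by chunk. Each chunk is a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 computed over the type plus the data. The Python sketch below uses only the standard library (the function names are hypothetical). It lists the chunks and decodes tIME, tEXt and bKGD, so two exports can be compared line by line.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_chunks(data: bytes):
    """Yield (type, data) for each chunk, verifying the CRC of every chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(ctype + body) != crc:
            raise ValueError(f"bad CRC in {ctype!r} chunk")
        yield ctype.decode("ascii"), body
        pos += 12 + length

def describe(ctype: str, body: bytes) -> str:
    """Human-readable summary of the chunks discussed above."""
    if ctype == "tIME":
        # 2-byte year, then month, day, hour, minute, second (the spec says UTC).
        y, mo, d, h, mi, s = struct.unpack(">HBBBBB", body)
        return f"tIME {y:04d}-{mo:02d}-{d:02d} {h:02d}:{mi:02d}:{s:02d} UTC"
    if ctype == "tEXt":
        # Latin-1 keyword, a NUL separator, then Latin-1 text.
        key, _, text = body.partition(b"\x00")
        return f"tEXt {key.decode('latin-1')}={text.decode('latin-1')}"
    if ctype == "bKGD":
        # Raw background value; its layout depends on the image colour type.
        return f"bKGD {body.hex()}"
    return f"{ctype} ({len(body)} bytes)"
```

If you run describe over both Gimp exports and diff the output, the changing chunks should stand out. tIME in particular will differ between exports made at different times.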

Is it possible to check if a PDF is CMYK or RGB using GhostScript?

I am aware of the inkcov feature, but this just returns values in terms of CMYK (with silent conversion)?
Is the real check a check for RGB colours or RGB images within the PDF? I'm not sure whether both RGB and CMYK images can exist in the same PDF.
Images aren't the only thing that can be in a PDF file, you can also have text, linework and shadings. Also transparency blending can be specified in specific colour spaces. Colour spaces are not limited to RGB or CMYK but can also include Gray and spot (Separation) colours, as well as ICCBased colour spaces and certain specific CIE colour spaces such as Lab.
All of these colour spaces can potentially be present in a PDF file simultaneously.
Ghostscript doesn't contain any tools currently to tell you what colour spaces are used in a PDF file, though the pdf_info.ps script could be modified to do so for unusual (not grey/RGB/CMYK) spaces. You could also write a small piece of PostScript which could tell you when a colour space was used, and what kind of colour it is.
The inkcov device is a CMYK device, so all colours specified in the PDF are converted to CMYK before being 'printed' to the inkcov device which counts up the coverage. It doesn't tell you anything about the original PDF file.
My understanding is that a PDF can contain both RGB and CMYK images, so you'd need to have a tool that can review all images and report on their mode.
If GhostScript doesn't include options to do so, you may have to write a script to use a PDF library for parsing the image and reporting details on the elements it contains.
For example, this Cam::PDF module in Perl says it can parse any PDF v1.5 formatted file.
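If you'd rather not pull in a PDF library, a rough Python sketch using only the standard library can at least report the /ColorSpace of each image XObject. It relies on the rule that stream objects are never stored inside compressed object streams, so image dictionaries are visible in the raw file. It is a heuristic, not a parser. It does not handle inline images in content streams or encrypted files. A colour space given as an indirect reference is reported as written (e.g. "7 0 R") rather than resolved, and soft-mask images are counted too. It also says nothing about text, vector or shading colours. The names are hypothetical.

```python
import re
from collections import Counter

# "N G obj ... stream": an indirect object whose body is a stream. The tempered
# (?!endobj) loop stops a stream-less object from swallowing the next object.
STREAM_OBJECT = re.compile(rb"\d+\s+\d+\s+obj\b((?:(?!endobj).)*?)\bstream\r?\n", re.S)
IS_IMAGE = re.compile(rb"/Subtype\s*/Image\b")
COLOR_SPACE = re.compile(rb"/ColorSpace\s*(/\w+|\[[^\]]*\]|\d+\s+\d+\s+R)")

def image_colour_spaces(data: bytes) -> Counter:
    """Count image XObjects per /ColorSpace value as written in the file."""
    counts = Counter()
    for match in STREAM_OBJECT.finditer(data):
        header = match.group(1)
        if not IS_IMAGE.search(header):
            continue
        cs = COLOR_SPACE.search(header)
        # Image masks and JPX images may legitimately omit /ColorSpace.
        counts[cs.group(1).decode("latin-1") if cs else "(none)"] += 1
    return counts
```

A result containing both /DeviceRGB and /DeviceCMYK confirms that one PDF can mix RGB and CMYK images.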

Ghostscript: how to reduce file size of large PDFs without changing smaller PDFs

I am using Ghostscript to convert large batches of PDF to PDF to reduce file size. The original PDFs vary in size and quality. Where a PDF is low quality with a small file size (<350 KB), the output from Ghostscript is often poor.
Is there a way I can get GhostScript to ignore files below a certain size and just pass them through without downsampling?
Current settings:
SearchablePDFSetting=-dColorImageResolution=120 -dMonoImageResolution=38 -dMonoImageDownsampleType=/Average -dOptimize=true -dDownsampleColorImages=true -dDownsampleGrayImages=true -dDownsampleMonoImages=true -dUseCIEColor -dColorConversionStrategy=/sRGB -dFIXEDMEDIA -dDEVICEWIDTHPOINTS=596 -dDEVICEHEIGHTPOINTS=834
The pdfwrite device can already pass images (not files) through without downsampling; there is no way to 'pass through without changing' a file. If you don't want to process files below a certain size, then don't process them.
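As a sketch of "don't process them", the size check can live in whatever script drives the batch. The Python below assumes the PDFs sit in one directory and treats 350 KB as 350 * 1024 bytes. It takes the gs settings as a list of arguments, and the function names are hypothetical. Small files are copied unchanged and larger ones go through pdfwrite.

```python
import shutil
import subprocess
from pathlib import Path

SIZE_LIMIT = 350 * 1024  # files smaller than this are passed through untouched

def plan_batch(src_dir, dst_dir, gs_settings):
    """Return ('copy', src, dst) or ('gs', argv) for every PDF in src_dir."""
    plan = []
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        out = Path(dst_dir) / pdf.name
        if pdf.stat().st_size < SIZE_LIMIT:
            plan.append(("copy", pdf, out))
        else:
            argv = ["gs", "-o", str(out), "-sDEVICE=pdfwrite", *gs_settings, str(pdf)]
            plan.append(("gs", argv))
    return plan

def run_batch(plan):
    """Execute a plan: copy small files, run Ghostscript on the rest."""
    for action in plan:
        if action[0] == "copy":
            shutil.copyfile(action[1], action[2])
        else:
            subprocess.run(action[1], check=True)
```

Separating planning from running makes it easy to print the plan first and check which files would be touched.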
To avoid further downsampling of images, you need to add the 'xxxxImageDownsampleThreshold' parameters (one each for Mono, Grey and Color). If you set this to (eg) 1.5 then images which are up to 50% higher resolution than the target resolution won't be downsampled.
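Here is the threshold arithmetic as a tiny Python sketch. The helper is hypothetical, and treating an image exactly at the threshold as not downsampled is an assumption; check ps2pdf.htm. With ColorImageResolution=120 and a threshold of 1.5, only images above 180 dpi would be downsampled.

```python
def will_downsample(image_dpi: float, target_dpi: float, threshold: float = 1.5) -> bool:
    """True if an image is above target_dpi * threshold and so gets downsampled."""
    return image_dpi > target_dpi * threshold

# With the question's 120 dpi colour target:
for dpi in (150, 180, 300):
    print(dpi, will_downsample(dpi, 120))
```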
Note that you haven't (apparently) set a GrayImageResolution, you haven't set the downsample type for Color or Gray images, and a MonoImageResolution of 38 looks pretty ugly to me.
The default Gray image filter is DCT (JPEG), as is the Color filter. If the original image was DCT then applying a second round of DCT compression will result in ugly artefacts, especially if the image is not downsampled. I would suggest you change the filter type to FlateEncode. For the explicit filter to be used, automatic filter selection must also be switched off with -dAutoFilterColorImages=false and -dAutoFilterGrayImages=false.
All these options are documented in ps2pdf.htm in the Ghostscript doc folder.
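Putting those suggestions together, here is a sketch of the image options as a Python list, usable as gs_settings in a batch script or in a subprocess call. The resolutions, the /Bicubic and /Subsample choices and the 1.5 threshold are example values, not recommendations from the answer. The function name is hypothetical; the option names are the pdfwrite parameters discussed above.

```python
def image_settings(color_dpi=150, gray_dpi=150, mono_dpi=300, threshold=1.5):
    """pdfwrite image options: resolution, downsample type and threshold for each
    image class, plus lossless Flate instead of DCT for colour and gray images."""
    args = []
    for kind, dpi, method in (("Color", color_dpi, "/Bicubic"),
                              ("Gray", gray_dpi, "/Bicubic"),
                              ("Mono", mono_dpi, "/Subsample")):
        args += [
            f"-dDownsample{kind}Images=true",
            f"-d{kind}ImageResolution={dpi}",
            f"-d{kind}ImageDownsampleType={method}",
            f"-d{kind}ImageDownsampleThreshold={threshold}",
        ]
    # Automatic filter selection would override the explicit filter, so turn it off.
    args += [
        "-dAutoFilterColorImages=false", "-dColorImageFilter=/FlateEncode",
        "-dAutoFilterGrayImages=false", "-dGrayImageFilter=/FlateEncode",
    ]
    return args
```

Flate is lossless, so files may come out larger than with DCT. That trade-off avoids the double-JPEG artefacts described above.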
Add the option:
-dPDFSETTINGS=/screen
This "selects low-resolution output similar to the Acrobat Distiller 'Screen Optimized' setting."
