How to embed an ICC color profile in a PDF for vector artwork without Illustrator


In Illustrator there is an option to embed an ICC color profile in the PDF. Likewise, are there any command-line tools that can embed ICC color profiles in vector images? I have tried Ghostscript and Inkscape, but nothing helps.

There is insufficient information here to answer your question. Which ICC profiles do you want to embed, and under what conditions do you want to embed them, i.e. for what purpose?
(Note that 'image' in PDF means a bitmap image, so you can't have a 'vector' image.)
Ghostscript starts with input (PostScript, PDF, XPS, PXL or PCL) that already describes the content in a particular colour space. Depending on the input format this could be DeviceGray, DeviceRGB, DeviceCMYK, /Separation, /DeviceN, CIEBasedA, CIEBasedABC, CIEBasedDEF, CIEBasedDEFG, ICCBased, Lab, CalGray or CalRGB.
If the input is ICCBased then Ghostscript will of course emit the ICC profile describing that space. If the input is anything else, how is Ghostscript supposed to know which ICC profile is correct? You would have to know the characteristics of the (for example) CMYK space that the application which produced the input file was using.
If you know the characteristics of the colour space you are using then you can embed an OutputIntent profile (possibly this is what you mean, but you haven't said so). To do that with Ghostscript you need to construct a series of pdfmark operations. The file ghostpdl/lib/PDFA_def.ps does exactly that, because a PDF/A file whose colours are not in a device-independent colour space requires an OutputIntent profile.
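A minimal Python sketch of the pdfmark approach, assuming you already know which ICC profile describes your colours. It emits the same sequence of pdfmark operations that PDFA_def.ps uses (a stream object holding the profile, an OutputIntent dictionary pointing at it, and a /PUT into the document catalog). The object names {icc_profile} and {output_intent} and all function names are our own. The component count N is read from bytes 16-19 of the ICC header (the data colour space signature). /GTS_PDFA1 is the subtype PDFA_def.ps uses; a PDF/X workflow would use /GTS_PDFX instead.

```python
from pathlib import Path

# Bytes 16-19 of an ICC profile header hold the data colour space signature.
_ICC_COMPONENTS = {b"GRAY": 1, b"RGB ": 3, b"CMYK": 4}


def icc_component_count(header: bytes) -> int:
    """Return the number of colour components (N) declared by an ICC header."""
    if len(header) < 20:
        raise ValueError("ICC header is too short")
    signature = header[16:20]
    if signature not in _ICC_COMPONENTS:
        raise ValueError(f"unsupported ICC colour space {signature!r}")
    return _ICC_COMPONENTS[signature]


def _ps_string(text: str) -> str:
    """Quote text as a PostScript literal string, escaping \\, ( and )."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def output_intent_pdfmark(profile_path: str, condition: str, components: int,
                          subtype: str = "/GTS_PDFA1") -> str:
    """Build pdfmark PostScript that embeds profile_path as an OutputIntent."""
    path = _ps_string(profile_path)
    return "\n".join([
        # A stream object that will receive the ICC profile bytes.
        "[/_objdef {icc_profile} /type /stream /OBJ pdfmark",
        f"[{{icc_profile}} << /N {components} >> /PUT pdfmark",
        f"[{{icc_profile}} {path} (r) file /PUT pdfmark",
        # The OutputIntent dictionary that references that stream.
        "[/_objdef {output_intent} /type /dict /OBJ pdfmark",
        f"[{{output_intent}} << /Type /OutputIntent /S {subtype} "
        f"/DestOutputProfile {{icc_profile}} "
        f"/OutputConditionIdentifier {_ps_string(condition)} >> /PUT pdfmark",
        # Attach the OutputIntent to the document catalog.
        "[{Catalog} << /OutputIntents [ {output_intent} ] >> /PUT pdfmark",
    ])


def write_pdfmark_file(profile_path: str, condition: str, out_path: str) -> None:
    """Read the profile header, then write the pdfmark program to out_path."""
    header = Path(profile_path).read_bytes()[:128]
    program = output_intent_pdfmark(profile_path, condition, icc_component_count(header))
    Path(out_path).write_text(program)
```

You would then pass the generated file to Ghostscript before the input PDF, for example `gs -o out.pdf -sDEVICE=pdfwrite intent.ps input.pdf`. Recent Ghostscript versions run with SAFER by default, so the PostScript may not be allowed to open the profile unless you grant access (for example with `--permit-file-read=/path/to/profile.icc`). Treat this as an untested starting point and check the result with a preflight tool.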

Related

How to recognize an image file format using its contents?

If an image file is a PNG, it contains ‰PNG at the beginning of the file (when opened in a text editor).
If an image file is a BMP, it contains BM at the beginning of the file (when opened in a text editor).
As I understand it, image formats store a certain number of bytes at the beginning of the file that serve as metadata for the image. Is that right?
My questions are:
Is this behavior the same in all image file formats (or file formats in general)?
Could an image file (with no extension) be recognized just from this data?
Is there information available on how this metadata is broken down? By that I mean, which position in the metadata has what meaning?

Is this behavior same in all image file formats (or formats in general)?
For most of them, yes. There are some proprietary formats (e.g. for games) that might have very short or no metadata. Also, metadata might be in another file (e.g. animations together with XML metadata).
Could a image file (of no extension) be recognized just using this data?
Yes. In fact, most image viewers will warn you if an image file has an incorrect extension and ask you if they should fix it.
On Unix systems, the file command identifies files based on their metadata. A better tool specifically for images is identify (part of ImageMagick), which returns more detailed information on resolution, bit depth, etc.
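A minimal sketch of the first thing tools like file do: compare the leading bytes against a table of known signatures. The table below covers only a handful of formats and the function names are our own; real tools use far larger databases. Incidentally, the byte 0x89 that starts every PNG file is displayed as ‰ in the Windows-1252 code page, which is why a text editor shows ‰PNG.

```python
from typing import Optional

# Leading "magic" bytes for a few common formats, as defined by their specifications.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"BM", "BMP"),        # only two bytes, so false positives are possible
]


def sniff_format(head: bytes) -> Optional[str]:
    """Guess an image format from the first bytes of a file."""
    # WebP lives in a RIFF container: bytes 8-11 name the payload type.
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "WEBP"
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first 16 bytes of a file and guess its format."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

Because only the file contents are inspected, this works the same for a file with no extension or the wrong one.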
Is there information available on how this metadata is broken down? By that I mean, data at which position in the metadata has what meaning?
There are books about (image) file formats and for most formats, this information is available in official specifications (e.g. RFC 2083 for PNG). They list all of the (optional) file contents, describe the compressions and what a viewer/decoder/encoder can/must/should do with the data. A good starting point might be the Wikipedia list of image file formats.
Note that, based on the examples you gave, I suppose you opened the files with a text editor, which is not the ideal tool for this task. A hex editor is better. Text editors won't show most bytes (e.g. 255) by default and interpret others (e.g. tab or line feed). They may be good enough to spot magic text strings like "BM" and "PNG", but a hex editor shows both these text parts and their numerical representation, which lets you, for example, extract the image width and height. For this, a tool to convert hexadecimal values to decimal is useful; most calculators can do this.
As an example, let's look at the beginning of a PNG file with a resolution of 6146 x 14293 in both a text editor and a hex editor:
Both views show that the file is a PNG image. But the marked part in the hex editor view shows the width and height of the image (matching the PNG specification of the "IHDR" chunk): 0x00001802 is 6146 in decimal and 0x000037D5 is 14293. There's no way to do this in the text editor.
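The same arithmetic can be done programmatically. A minimal Python sketch, assuming the bytes come from a well-formed PNG: the specification requires IHDR to be the first chunk, so after the 8-byte signature, the 4-byte chunk length and the 4-byte type "IHDR", the width and height follow as big-endian 32-bit integers at offsets 16 and 20. The function name is our own.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(head: bytes) -> Tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    # Layout: signature (8) + chunk length (4) + b"IHDR" (4) + width (4) + height (4)
    if not head.startswith(PNG_SIGNATURE) or head[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    width, height = struct.unpack(">II", head[16:24])
    return width, height
```

For the file described above, this reads 0x00001802 and 0x000037D5 and returns (6146, 14293).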
Also note that even if you don't know an image format, you might be lucky just guessing that it's uncompressed data (this often works for some game image file formats, most notably Unity's "assets"). For example, if you rename files to ".raw", the image viewer IrfanView shows a dialog (see the screenshot below) where you can guess the width, height and bit depth of the image and see whether the result looks good. Interpreting the outcome takes some experience, though: if width and bit depth don't match, images will look like noise, appear warped, or have wrong colors.
This "image geometry guessing" can be improved or automated by trying different widths and computing the correlation coefficient between adjacent lines. The tool raw2tiff can do this. Quote from its site:
There is no magic, it is just a mathematical statistics, so it can be wrong in some cases. But for most ordinary images guessing method will work fine.
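A minimal Python sketch of the correlation idea, written for illustration and not taken from raw2tiff's source. It assumes headerless 8-bit single-channel data (one byte per pixel). For each candidate width it computes the mean Pearson correlation between consecutive rows and keeps the best one; since multiples of the true width also line up perfectly, near-ties go to the smallest candidate. All names are our own.

```python
from statistics import fmean
from typing import Iterable, Sequence


def _pearson(a: Sequence[int], b: Sequence[int]) -> float:
    """Pearson correlation of two equal-length rows (0.0 if either row is flat)."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def row_similarity(data: bytes, width: int) -> float:
    """Mean correlation between consecutive rows if the data had this width."""
    rows = len(data) // width  # a trailing partial row is ignored
    if rows < 2:
        return float("-inf")
    scores = [
        _pearson(data[r * width:(r + 1) * width], data[(r + 1) * width:(r + 2) * width])
        for r in range(rows - 1)
    ]
    return fmean(scores)


def guess_width(data: bytes, candidates: Iterable[int], tol: float = 1e-9) -> int:
    """Pick the candidate width whose rows line up best (smallest one on near-ties)."""
    scored = [(w, row_similarity(data, w)) for w in sorted(candidates)]
    best = max(score for _, score in scored)
    return next(w for w, score in scored if score >= best - tol)
```

For 24-bit RGB data you would first reduce each pixel to one value (or work in units of 3 bytes). As the raw2tiff quote says, this is just statistics and can be wrong for unusual images.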

Using ImageMagick, you can get that information (if available) from the "magick" bytes in the file header, for any format ImageMagick can read:
convert image -format "%m\n" info:
For example:
convert lena.png -format "%m\n" info:
PNG
convert lena.jpg -format "%m\n" info:
JPEG
convert lena.pnm -format "%m\n" info:
PPM
Even if the suffix is removed, this still works:
convert lena_copy -format "%m\n" info:
PNG

Convert PDF to other formats with color profiles using Ghostscript

I'm planning to use color profiles when converting PDFs to JPG/PNG/PDF (low-res/high-res/RGB/CMYK). (Question 1) I could not find out how to determine whether an input document has an ICC profile and, if it has, whether I should use it in the conversion. Is there a Ghostscript command to detect ICC profiles?
I found the following gs command for converting to PDF (from a linked page):
gs -o cmyk-doc.pdf \
-sDEVICE=pdfwrite \
-dOverrideICC=true \
-sDefaultCMYKProfile=/path/to/mycmykprofile.icc \
-sOutputICCProfile=/path/to/mydeviceprofile.icc \
-dRenderIntent=3 \
-dDeviceGrayToK=true \
input-doc.pdf
(Question 2) If my input document has a profile, can I skip the -sDefaultCMYKProfile option and pass only the required -sOutputICCProfile?

You can't use Ghostscript to determine if a PDF file has ICC colour profiles. Note there can be numerous colour profiles in a PDF file. Each colour can be in its own space, and each space can use a different ICC colour profile. That's in addition to the OutputIntent profile. On the other hand, you don't need to care.
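If you still want a rough answer to "does this file contain ICC profiles at all?", a crude byte-level scan can look for the /ICCBased colour space name and the /OutputIntents catalog key. This is a heuristic sketch, not a PDF parser: it searches the raw file plus every stream body that inflates with zlib (which also covers Flate-compressed object streams in PDF 1.5+ files), but it will miss encrypted files and other filters, and it cannot tell which objects are actually used on a page. The function names are our own.

```python
import re
import zlib
from typing import Dict, Iterable, Iterator

# Rough match for "stream ... endstream" bodies; a real tool should parse the PDF properly.
_STREAM_BODY = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.S)


def _searchable_chunks(pdf: bytes) -> Iterator[bytes]:
    """Yield the raw file, then every stream body that inflates as zlib/Flate."""
    yield pdf
    for match in _STREAM_BODY.finditer(pdf):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-encoded (or encrypted): skip it
        yield inflated


def count_pdf_names(pdf: bytes, names: Iterable[bytes]) -> Dict[bytes, int]:
    """Count occurrences of PDF name tokens such as b"/ICCBased"."""
    counts = {name: 0 for name in names}
    for chunk in _searchable_chunks(pdf):
        for name in counts:
            counts[name] += chunk.count(name)
    return counts
```

For example, `count_pdf_names(pathlib.Path("input.pdf").read_bytes(), [b"/ICCBased", b"/OutputIntents"])` returns a count per name; a non-zero /ICCBased count only tells you a profile is present somewhere in the file, not which content uses it.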
For rendering you are basically looking at a conversion like this:
input colour -> CIE representation -> output colour
The input colour (in PDF) can be Gray, RGB, CMYK, Separation, DeviceN or one of the CIE spaces: Lab or ICCBased.
The output colour, for rendering, will need to be one of Gray, RGB, CMYK or 'separated' (where each component is rendered to a separate gray image).
The OutputICCProfile controls the second half of that conversion, the Default* profiles control the first half of that conversion, when the colour space isn't already one of the CIE spaces.
Taking question 2 first:
You never need to supply the DefaultCMYKProfile. That is used to override the Ghostscript default CMYK->CIE profile. You can use this if you happen to know that the input file is in CMYK space, and that it was a characterised CMYK space. In which case supplying the profile for that space would do a better job of converting from CMYK to CIE than the Ghostscript default one. However this is rare and generally only true in controlled workflows.
For question 1:
You only really care if the PDF file does not contain ICC profiles but instead defines colours in a device space such as RGB or CMYK. In that case, and assuming the PDF file has been created for a colour-controlled workflow, you might assume that all the colours are in the same space, in which case I believe you would want to use the OutputIntent profile from the PDF file, or override the Ghostscript default (see below). There is extensive documentation on the use of ICC profiles in the Ghostscript documentation.
The OutputICCProfile is used with rendering devices to characterise the output space. So if you are rendering to RGB output you might use a specific RGB ICC profile to convert from CIE space to RGB space. That profile might additionally be attached to the output file (e.g. JPEG), and a conforming reader would be able to use it to convert the RGB samples back into CIE space and then use yet another profile to convert to a different characterised space (e.g. the profile for your display).
Now, that's for rendering, i.e. creating a bitmap image. The pdfwrite device, on the other hand, goes to considerable lengths to keep input colours in their original colour space. If you want to convert them into a different space then you need to set -sColorConversionStrategy; the command line you've quoted doesn't do that. If you want the output converted to a characterised CMYK space then you would indeed supply the OutputICCProfile, but you also need to specify -sColorConversionStrategy=CMYK.
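A small Python sketch that assembles a pdfwrite command line with colour conversion enabled, as an argument list suitable for subprocess.run. The function name is our own. Current Ghostscript documentation lists LeaveColorUnchanged, Gray, RGB, CMYK and UseDeviceIndependentColor as values for ColorConversionStrategy, so CMYK is used here; DefaultCMYKProfile is deliberately left out, since it only matters when you know the input's CMYK characterisation.

```python
from typing import List


def pdfwrite_cmyk_args(src: str, dst: str, output_profile: str) -> List[str]:
    """Ghostscript arguments for rewriting a PDF with its colours converted to CMYK."""
    return [
        "gs",
        "-o", dst,                               # -o also implies -dBATCH -dNOPAUSE
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=CMYK",        # without this, pdfwrite keeps original spaces
        f"-sOutputICCProfile={output_profile}",  # characterises the target CMYK space
        src,
    ]
```

You could run it with `subprocess.run(pdfwrite_cmyk_args("input-doc.pdf", "cmyk-doc.pdf", "/path/to/mydeviceprofile.icc"), check=True)`; on Windows the executable is usually named gswin64c rather than gs.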

Non-deterministic* data in header/beginning of PNG files

I noticed that PNG files created by Gimp from the same RGB data are identical except for the very beginning. This image shows a diff of otherwise identical PNG files created with Gimp:
What is this data which changes each time and how is it encoded? Are there tools to decode it? Can you learn something from this information, e.g. can you find out when a PNG file was (probably) created by this information?
I was under the impression that PNG files are created deterministically* and don't store metadata that isn't necessary to decode the image. (Obviously, the last part is not true either, as Gimp writes its own name into the files without asking the user, which it does do when you export something as a JPEG file.)
 * I use the word "deterministic" here to refer to things, and only those, that are the same on each execution/export/whatever given the same input. I'd usually use the word "functional" (i.e. like a mathematical function), but I fear this could be misunderstood by people who don't know what "functional" means in mathematics. Obviously, this is different from the usage of the word in information theory.

See the PNG header definition.
tIME stores the time that the image was last changed, so in my case it matches the timestamp of the exported file.
bKGD gives the default background color. Possibly this is the background color you are using in Gimp, or the color of the transparent pixels.
tEXt with key Comment and value Created with Gimp is just the default comment. You can change the comment for the image in Image>Properties, and you can set a default comment in Edit>Preferences>Default Image.
When I export the same PNG twice, I only see a change in tIME. In fact I can't get a bKGD item, even when exporting a PNG with transparent pixels. Are you using any specific options when exporting?
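To answer "are there tools to decode it?", here is a minimal Python sketch that walks the chunks of a PNG file and decodes the two text-like chunks discussed above: tIME (a big-endian 16-bit year followed by one byte each for month, day, hour, minute and second, in UTC according to the PNG specification) and tEXt (a Latin-1 keyword, a zero byte, then Latin-1 text). It does not verify CRCs and ignores compressed zTXt/iTXt chunks; the function names are our own.

```python
import struct
from datetime import datetime
from typing import Dict, Iterator, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(png: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (chunk_type, chunk_data) pairs; each chunk is length, type, data, CRC."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        yield ctype.decode("ascii"), png[pos + 8:pos + 8 + length]
        pos += 12 + length  # skip length + type + data + 4-byte CRC
        if ctype == b"IEND":
            break


def describe_metadata(png: bytes) -> Dict[str, object]:
    """Decode tIME and tEXt chunks into Python values."""
    info: Dict[str, object] = {}
    for ctype, data in iter_chunks(png):
        if ctype == "tIME":
            # year (2 bytes), month, day, hour, minute, second (1 byte each)
            info["tIME"] = datetime(*struct.unpack(">HBBBBB", data))
        elif ctype == "tEXt":
            key, _, text = data.partition(b"\x00")
            info.setdefault("tEXt", {})[key.decode("latin-1")] = text.decode("latin-1")
    return info
```

Comparing the decoded tIME values of two exports would show when each file was written, which is the only difference expected when no bKGD chunk is present.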

How to convert Autocad drawing to image?

I have an Autocad drawing which is a plan for land squares where each square contains a number.
I tried to convert it to an image by choosing File --> Export Data --> and the file format Bitmap (bmp). (I have Autocad 2013, Mac version.)
The file was converted to an image, but the quality is too poor: I can't see the land numbers inside the squares when I zoom in.
I also tried PostScript (PS file format); the quality is a bit better but still bad.
Is there a way to convert an Autocad file to an image while preserving its fine details?
I need to convert the file to an image because I would like to publish it on my website. Maybe there are other ways to publish an Autocad file online; if so, please advise. The catch is that I want the background of the Autocad plan to be transparent so that I can display it on top of Google Maps. If I used an Autocad plugin, I couldn't make it transparent, right?

Use the PLOT (for the current drawing) or PUBLISH (for batching) commands to produce high quality images.

Is it possible to check if a PDF is CMYK or RGB using GhostScript?

I am aware of the inkcov device, but it just returns values in terms of CMYK (with a silent conversion).
Is the real check a check for RGB colours or RGB images within the PDF? I'm not sure whether both RGB and CMYK images can exist in the same PDF.

Images aren't the only thing that can be in a PDF file, you can also have text, linework and shadings. Also transparency blending can be specified in specific colour spaces. Colour spaces are not limited to RGB or CMYK but can also include Gray and spot (Separation) colours, as well as ICCBased colour spaces and certain specific CIE colour spaces such as Lab.
All of these colour spaces can potentially be present in a PDF file simultaneously.
Ghostscript doesn't contain any tools currently to tell you what colour spaces are used in a PDF file, though the pdf_info.ps script could be modified to do so for unusual (not grey/RGB/CMYK) spaces. You could also write a small piece of PostScript which could tell you when a colour space was used, and what kind of colour it is.
The inkcov device is a CMYK device, so all colours specified in the PDF are converted to CMYK before being 'printed' to the inkcov device which counts up the coverage. It doesn't tell you anything about the original PDF file.

My understanding is that a PDF can contain both RGB and CMYK images, so you'd need a tool that can review all images and report their colour mode.
If Ghostscript doesn't include options to do so, you may have to write a script that uses a PDF library to parse the file and report details on the elements it contains.
For example, the Perl module Cam::PDF says it can parse any PDF v1.5 formatted file.
