Ghostscript color converting of selected pages but output of all pages - ghostscript

I want to convert some pages in a PDF to grayscale
I can use the ghostscript option -sPageList to select the desired pages.
But the output contains only the selected pages.
How can I get the whole PDF including the converted pages?

You cannot do this in one pass with Ghostscript and the pdfwrite device.
The colour conversion options apply to every page that is output, so start by using -sPageList to output only the pages you want colour converted. You can use the '%d' format specifier in the output file name to get each page as a separate file. You can then 'split' the pages you didn't colour convert from the original file by passing the 'opposite' -sPageList, without the colour conversion options, and using %d again to get each page in a separate file.
Now you have each page as a separate PDF file, some colour converted, some not.
Finally you can feed Ghostscript with each of the PDF files, in the desired order, to create a new PDF file which contains all the pages in the order you want.
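The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the linked gist: it only builds the Ghostscript command lines, and it extracts one page per call with -dFirstPage/-dLastPage instead of -sPageList with %d, so that each output file name carries the original page number. The function names, the "pages" working directory, and the "gs" executable name are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_commands(src, out, page_count, gray_pages, workdir="pages", gs="gs"):
    """Return one Ghostscript command per page, followed by one merge command."""
    common = [gs, "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite"]
    # Colour conversion switches, added only for the pages that should be gray.
    gray = ["-sColorConversionStrategy=Gray", "-dProcessColorModel=/DeviceGray"]
    commands, parts = [], []
    for page in range(1, page_count + 1):
        part = str(Path(workdir) / f"page_{page:04d}.pdf")
        parts.append(part)
        cmd = common + [f"-dFirstPage={page}", f"-dLastPage={page}"]
        if page in gray_pages:
            cmd += gray
        commands.append(cmd + [f"-sOutputFile={part}", src])
    # Final pass: feed the single-page files back in, in page order.
    commands.append(common + [f"-sOutputFile={out}"] + parts)
    return commands


def run(commands, workdir="pages"):
    """Create the working directory and execute the commands in order."""
    Path(workdir).mkdir(exist_ok=True)
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, run(build_commands("input.pdf", "output.pdf", 10, {2, 5})) would convert pages 2 and 5 of a ten-page file. The page count has to be supplied by the caller here.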

Thanks to @KenS for the clarification.
Here's a Python script that does all this: https://gist.github.com/michaelosthege/a6cc9556ff4e2b64d5f7d3aaee43be70
→ Determining which pages to convert
→ Extracting each page in colored/grayscale
→ Merging them all back into one PDF

Related

How to recognize an image file format using its contents?

If an image file is in .png format, it contains ‰PNG at the beginning of the file (when read in text mode).
If an image file is in .bmp format, it contains BM at the beginning of the file (when read in text mode).
I know that image formats contain data of a certain size (in bytes) at the beginning of the file, which is used as metadata of the image file.
My Questions are:-
Is this behavior same in all image file formats (or formats in general)?
Could an image file (of no extension) be recognized just using this data?
Is there information available on how this metadata is broken down? By that I mean, data at which position in the metadata has what meaning?
Is this behavior same in all image file formats (or formats in
general)?
For most of them, yes. There are some proprietary formats (e.g. for games) that might have very short or no metadata. Also, metadata might be in another file (e.g. animations together with XML metadata).
Could an image file (of no extension) be recognized just using this
data?
Yes. In fact, most image viewers will warn you if an image file has an incorrect extension and ask you if they should fix it.
On Unix systems, there's a file command that identifies files based on their metadata. There is a better tool specific for images called identify (part of ImageMagick) that returns more detailed information on resolution, bitdepth, etc.
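As a rough illustration of what such tools do, here is a minimal Python sketch that checks the first bytes of a file against a few well-known signatures. The table covers only four formats, and the function names are made up for this example. Real tools know far more signatures, and a two-byte signature like "BM" can match unrelated files by accident.

```python
# A few well-known "magic numbers" found at the very start of image files.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),
]


def sniff_bytes(head):
    """Return a format name for the leading bytes, or None if unknown."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return None


def sniff_file(path):
    """Read the first bytes of a file in binary mode and identify them."""
    with open(path, "rb") as fh:
        return sniff_bytes(fh.read(16))
```

The file is opened in binary mode ("rb") on purpose, so that no byte is dropped or reinterpreted the way a text editor would.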
Is there information available on how this metadata is broken down? By
that I mean, data at which position in the metadata has what meaning?
There are books about (image) file formats and for most formats, this information is available in official specifications (e.g. RFC 2083 for PNG). They list all of the (optional) file contents, describe the compressions and what a viewer/decoder/encoder can/must/should do with the data. A good starting point might be the Wikipedia list of image file formats.
Note that, based on the examples you gave, I suppose you opened the files with a text editor, which is not the ideal tool for that task. It's better to use a hex editor. Text editors won't show many byte values (e.g. 255) by default and interpret others (e.g. tab or line feed). They might be good enough to see magic text strings like "BM" and "PNG", but with a hex editor you can see both these text parts and their numerical representation, which lets you extract, for example, the image width and height. For this, a tool to convert hexadecimal values to decimal is useful; most calculators can do this.
As an example, let's look at the beginning of a PNG file with a resolution of 6146 x 14293 in both a text editor and a hex editor:
Both views show that the file is a PNG image. But the marked part in the hex editor view shows the width and height of the image (matching the PNG chunk specification of the "IHDR" part): 0x00001802 is 6146 in decimal and 0x000037D5 is 14293. There's no way to read these values in the text editor.
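The same values can be read programmatically. The following Python sketch assumes the standard PNG layout (an 8-byte signature, then a 4-byte chunk length, the 4-byte chunk type "IHDR", and then width and height as big-endian 32-bit integers). The function name is made up, and the sample header is built by hand from the two numbers above rather than taken from the actual file.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(head):
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if len(head) < 24 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Width and height are two big-endian unsigned 32-bit integers.
    return struct.unpack(">II", head[16:24])


# Hand-built header: signature, IHDR data length (13), chunk type, width, height.
sample = (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
          + struct.pack(">II", 0x00001802, 0x000037D5))
print(png_size(sample))  # (6146, 14293)
```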
Also note that even if you don't know an image format, you might be lucky with just guessing that it's uncompressed data (this often works for some game image file formats, most notably Unity's "assets"). E.g. if you rename files to ".raw", the image viewer IrfanView will give you a dialog (see the screenshot below) where you can guess the width, height and bit depth of the image and see if the result looks good. This requires some experience in interpreting the outcome, though: if width and bit depth don't match, images will look like noise, look warped, or have wrong colors.
This "image geometry guessing" can be improved/automated by trying different widths and computing the correlation coefficient between two lines. The tool raw2tiff can do this. Quote from the site:
There is no magic, it is just a mathematical statistics, so it can be
wrong in some cases. But for most ordinary images guessing method will
work fine.
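To make the idea concrete, here is a small Python sketch of width guessing by line correlation. It is not raw2tiff's implementation. It assumes uncompressed data with one byte per pixel and no header, splits the data into rows for each candidate width, and keeps the width whose neighbouring rows have the highest average Pearson correlation. The function names are invented for this example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def guess_width(data, candidates):
    """Return the candidate width whose adjacent rows correlate best."""
    best_width, best_score = None, float("-inf")
    for width in candidates:
        # Cut the byte stream into complete rows of this width.
        rows = [data[i:i + width]
                for i in range(0, len(data) - width + 1, width)]
        if len(rows) < 2:
            continue
        pairs = list(zip(rows, rows[1:]))
        score = sum(pearson(r0, r1) for r0, r1 in pairs) / len(pairs)
        if score > best_score:
            best_width, best_score = width, score
    return best_width
```

A call such as guess_width(raw_bytes, range(100, 2000)) would try every width in that range. Multiples of the true width also tend to score well, so the result still needs a visual check.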
Using ImageMagick, you can get the format (if available) of any file that ImageMagick can read from the "magick" signature in the file header, as follows:
convert image -format "%m\n" info:
For example:
convert lena.png -format "%m\n" info:
PNG
convert lena.jpg -format "%m\n" info:
JPEG
convert lena.pnm -format "%m\n" info:
PPM
Even if the suffix is removed, this still works:
convert lena_copy -format "%m\n" info:
PNG

The right part is lost when using ghostscript to convert .prn file to pdf

I am using ghostpcl-9.20-win32.
I have tried this:
gpcl6win32 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf input.prn
The right part of the input file is lost in the output.
input file:
https://drive.google.com/file/d/0B29492qqMUX7Zk9nUmhDYXpJRVk
output file:
https://drive.google.com/open?id=0B29492qqMUX7T2RxVDZJaE9seEE
Your PCL file (actually it appears to be simple text, not even PCL) doesn't contain a media request.
In the absence of a media size, GhostPCL (NB NOT Ghostscript, GhostPCL) uses its default media size. Depending on a number of factors that will be either A4 or US Letter, portrait.
If you want different media, then you need to tell GhostPCL what you want. You need to use -sPAPERSIZE, or -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS, or any of the other media selection switches.
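As a sketch of how the explicit-size switches could be assembled, the Python helper below converts a page size in inches to points (72 points per inch) and builds the GhostPCL command line. The helper name and the 14 x 8.5 inch example size are assumptions. The size that actually fits the right-hand part of the input depends on the file.

```python
import subprocess


def gpcl_command(src, out, width_in, height_in, exe="gpcl6win32"):
    """Build a GhostPCL command line with an explicit media size in points."""
    width_pt = round(width_in * 72)    # 1 inch = 72 points
    height_pt = round(height_in * 72)
    return [exe, "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
            f"-dDEVICEWIDTHPOINTS={width_pt}",
            f"-dDEVICEHEIGHTPOINTS={height_pt}",
            f"-sOutputFile={out}", src]


def convert(src, out, width_in, height_in):
    """Run GhostPCL; raises CalledProcessError if it exits with an error."""
    subprocess.run(gpcl_command(src, out, width_in, height_in), check=True)
```

For instance, convert("input.prn", "output.pdf", 14, 8.5) would request a wide landscape page of 1008 x 612 points.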

Convert a searchable PDF to searchable PDF/A using Ghostscript

I am using Ghostscript to convert PDF to PDF/A by command line:
gs -dPDFA -dBATCH -dNOPAUSE -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile="output.pdf" input.pdf
But the text in the output file is no longer searchable.
How can I obtain searchable PDF/A files as output?
Thanks.
You haven't supplied an input file to look at, nor mentioned which version of Ghostscript you are using.
Let me start with my standard lecture on this subject; when you take a PDF file as input, and use Ghostscript's pdfwrite device to produce a new PDF file, you are NOT 'converting', 'editing' or 'modifying' the input file.
What happens is that the PDF interpreter interprets the PDF file, and produces a series of graphics primitives, which it feeds to the graphics library. This then processes these primitives, and passes them to the device. The device then emits them to the output file. In the case of a rendering device (e.g. TIFF) it renders the operations to a bitmap and, when it reaches the end of the file, it writes the bitmap as a file. In the case of pdfwrite, it re-assembles these primitives into a brand new PDF file.
So the output PDF file has nothing in common with the input PDF file, except its appearance.
There are disadvantages to this approach (it does limit us in preserving some non-printing aspects of the input file), but there are also advantages; for instance it permits us to alter colour spaces, flatten transparency, change font encodings etc.
In addition to this you have chosen to create a PDF/A file. PDF/A limits the available features of the PDF specification, and it may be (it's impossible to tell without seeing the original file) that it simply isn't possible to represent the original PDF file as a PDF/A file without altering some aspects of it.
Again, without seeing the original file I can't tell, but it may be that you simply cannot achieve what you want, or at least not using Ghostscript.

How can I add an image to an existing PDF template page containing form fields?

I'm doing a document scanning project that involves inserting a scanned image into an existing PDF template page that contains form fields. I've used ImageMagick to process the scan, append a raster image of the form template to the bottom, and convert the result into a PDF. However, form and checkbox fields have to be added manually to the resulting PDF. Below is a sample of my ImageMagick command.
convert inputScan.jpg -resize 975x420 FormTemplate.png -append CombinedFile.pdf
Ideally, I would run a command that would take the JPG scan and the PDF template file containing fields, and output a PDF file with the scan at the top of a page and the field-containing template text below it. The closest thing I could find to a solution was here, but PHP can't be used on the computer in question.
Any help or suggestions are greatly appreciated!

image not shown in dvi after latexing

I include several images in EPS format in a LaTeX document. After running the latex command, some of the images are missing from the DVI file. I am not sure whether this is related to image size: most of the missing images are around 83 kB, while those that show up are smaller than 40 kB. After converting from DVI to PS, the images are all back. What causes the images to be missing from the DVI file?
Thanks and regards!
As far as I can remember, a DVI viewer cannot show EPS files. Just use pdflatex as the front end instead of latex and view the resulting PDF file.
Checking man xdvi reveals this:
Xdvi can show PostScript specials
by any of three methods. It will try
first to use Display PostScript,
then NeWS, then it will try to
use Ghostscript to render the images. All of these options
depend on additional software to work
properly; moreover, some of them may
not be compiled into this copy of xdvi.
So it would appear to be platform- and/or implementation-dependent.
