ghostscript option to make a pdf with flattened images - ghostscript

Is there an option to print a pdf in ghostscript as images?
I can use:
gs -dNOPAUSE -dBATCH -sDEVICE=pngalpha -r300 -sOutputFile=p%03d.png my.pdf
Then use imagemagick to make a pdf out of them with:
convert *.png new.pdf
PDF printers seem to have an option that does the same thing that is a checkbox that says "print as image". I could not find anything in the ghostscript docs that sounded like that was an option. There may be a term for it that I just don't know to look for.
It is kind of hard to explain why you would want to take a pdf document that is text and turn it into a document of images of text that is 4 times the size of the original but that is what I want to do.

Currently the only way to do that would be to start with a PDF which contains transparency operations, and select a CompatibilityLevel of 1.3 or less.
I have an idea to implement this feature, but I have not had time to work on it.
You can do it as a 2-pass approach using Ghostscript to render an image, then using the view* scripts to read the image back into Ghostscript and produce a PDF. No better than using convert of course.

Related

How can I take a pdf, and convert any jpeg2000/jpx/jp2 images in it to jpeg images?

I am using MacOS Mojave on a Mac Mini, and I am also using an old Kindle Dx which cannot read jpeg2000 images. It also has trouble with too many or too large jpeg images.
I cannot use touchscreens, so newer e-readers and tablets aren't a solution.
So far, I've found some buggy solutions--
I can use Willus's k2pdfopt with -mode copy and -dev dx, which rasterizes everything. It's a good solution for scanned pdfs. If more detail is needed, -mode copy without -dev dx will preserve higher resolution. It's something of a last resort for pdf-born-pdfs, since text can be uglier and harder to read, and file sizes can increase alarmingly.
I can also use Ghostscript with -dCompatibilityLevel=1.4, which doesn't rasterize everything. It converts jpeg2000 images to jpeg images. But it doesn't tackle some oversized or poorly-constructed images, it often creates dark rectangles which can obscure text, and it occasionally loses the ability to search or select text. [P.S. I mean it takes a pdf which had searchable pdf and outputs one which does not. Also if I do any kind of image downsampling or removal, it sometimes rescales everything or loses pages.]
I have experimented with options to compress images in Ghostscript, with mixed success, and with the above bugs persisting. [P.S. I think I was downsampling, yes.]
For whatever reason, MacOS Quartz filters only work if they will reduce image sizes. So they tend not to work on the buggy images.
Now my ideal solution would preserve the text itself, preferably untangling ligatures, and would compress the images like Willus's k2pdfopt. But I have no idea if that's possible or how.
Short of that-- I'm wondering if there's a way to use Ghostscript to convert the jpeg2000 images without causing the gray rectangles or losing the ability to search or select text.
or if there's a way to use Quartz filters so they work. In some older versions of MacOS they did work.
or if there's a way to batch-print these pdf files to the appropriate resolution, apparently 800x1180, reprocessing images in the process.
I don't have much programming experience. I mainly use homebrew to install command-line tools, very sloppy bash scripts, and Automator to run them.
P.S. For a minimal example of the gray rectangles in Ghostscript, using the free pdf from here: https://www.peginc.com/store/test-drive-savage-worlds-the-wild-hunt/
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -o out.pdf in.pdf
substituting that pdf for in.pdf.
For a minimal example of losing searchable text, using the free pdf from here: http://datafortress2020.com/fileproject/details.php?image_id=498
same minimal script
Compatibility Level
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dCompatibilityLevel=1.4 -o out.pdf in.pdf
Aggressive Downsampling and Grayscale
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dCompatibilityLevel=1.4
-g800x1080 -r150 -dPDFFitPage \
-dFastWebView -sColorConversionStrategy=Gray \
-dDownsampleColorImages=true -dDownsampleGrayImages=true -dDownsampleMonoImages=true -dColorImageResolution=75 -dGrayImageResolution=75 -dMonoImageResolution=150 -dColorImageDownsampleThreshold=1.0 -dGrayImageDownsampleThreshold=1.0 -dMonoImageDownsampleThreshold=1.0 \ -o out.pdf in.pdf
P.P.S. I can use k2pdfopt to rasterize to fit my Kindle. If the file has searchable text, this retains it, if it doesn't I can run tesseract in k2 or run ocrmypdf afterwards.
But if I want especially good graphics, or especially clear text, and the file has hundreds of pages, it will need hundreds of megs. I had blamed this on rasterizing the text, which was why my ideal solution was to keep text and rasterize images, but apparently it's an issue with the images themselves.
If you think you've found a bug, then it's helpful to report it. If you don't it will never be fixed. You can report a bug at https://bugs.ghostscript.com, please be sure to attach an example file to reproduce the problem and state the command line used.
The Ghostscript pdfwrite device does not, ever, produce JPEG2000 images (due to patent issues). So you don't need to set the CompatibilityLEvel at all, and I'd recommend that you do not. By setting the CompatibilityLevel you are limiting the output. Unless your device cannot handle later versions then don't do this.
Without seeing an example file, a command line and knowing the version and operating system it's obviously not possible for anyone to comment on your 'gray rectangles'.
You can reduce the size of images (in bytes) by downsampling (as opposed to compressing) them, you can't do anything about the number of images.
Note that searchable text depends on the construction of the PDF file, and so cannot ever be guaranteed. Searchable text (in the sense of ToUnicode CMaps) was a later addition to the PDF Reference and is always optional, because it's possible to have input from which the Unicode code points cannot be determined (without using OCR software) but a perfectly readable PDF file can still be produced.
Ghostscript itself can produce a PDF file which is a rendered representation of the original, wrapped up as a PDF. See the pdfimage* devices.
Tesseract can take images and produce PDF files with searchable text, produced by OCR'ing the images. This would seem to me to be your best option, though obviously I don't know if a single large image is going to be acceptable to your device.
Edit
I already agreed that searching text is inherently not supported in PDF, except as an optional adjunct. The bug report you pointed to talks about 'corrupting text layers'. There are no text layers in PDF, and the text is neither corrupted nor missing, ts just not encoded as ASCII any more.
The reason you shouldn't set the resolution, and the size in pixels, is because PDF is not an image format. You aren't gaining anything by doing this. All that happens is that pdfwrite divides the 'g' valuess by the resolution, to get a media size in inches, and writes that as the MediaBox. Simpler just to set the Media Size. If you set the resolution you are fixing anything which does get rendered at that resolution. Choose a low resolution and you get crappy output. If you use a higher resolution then the image can be downscaled and smoothed giving better output.
It is indeed possible that your Kindle cannot handle transparency any better than the Mac, it is after all an old device. It's also possible that whoever built Ghostscript for you introduced a bug. I'm afraid we can't help you with either of those.
I did suggest, right back at the end of the original post, that you render the content to an image (Ghostscript will do that for you), then use Tesseract to convert the image back to a PDF, and at the same time OCR the text.
That will get past your problems with JPEG2000, will do a *better job of creating searchable text, since even files that aren't already searchable will become so, and will allow you to specify the resolution.

Convert RGB pdf to CMYK preserve pdf

I am using ghostscript 9.25 windows.
I am trying to convert RGB pdf to CMYK preserve pdf using following command:
gswin32c.exe
-dSAFER -dBATCH -dNOPAUSE -dNOCACHE -sDEVICE=pdfwrite -sColorConversionStrategy=CMYK -dProcessColorModel=/DeviceCMYK -dAutoFilterColorImages=false -dAutoFilterGrayImages=false -sOutputFile=out.pdf input.pdf
input.pdf file here
https://www.dropbox.com/s/8jfnov526nhb9m9/blank.pdf?dl=0
output.pdf file here
https://www.dropbox.com/s/ftrmm32mmixaxqh/out.pdf?dl=0
but my output becomes light compared adobe output, expected result is it should be dark when i do in adobe CMYK preserve option, i am getting little dark compared to ghostscript output. Am I doing anything wrong?
Should I use any icc profile?
Thanks
You say you are using ImageMagick, yet you give a Ghostscript command line....
I presume that when you say CMYL you mean CMYK.
There is nothing immediately obviously wrong with your command line, but you have given no example file, nor any reason why you expect the result to be 'dark'.
If you want to control the conversion then you will need to supply at least one and possibly up to 4 ICC profiles. You will certainly need a CIE->CMYK Output profile, and you might like to supply ICC profiles for Gray->CIE, RGB->CIE and CMYK->CIE as well, in order to override the default ones Ghostscript is using.
[EDIT]
The problem is nothing to do with colour conversion. Your original file contains nothing except a very large image, which is compressed with the Flate filter (lossless). It looks like this:
You've turned off auto filtering, but you haven't told Ghostscript which compression filter to use for images, so it sticks with the default, which is JPEG (DCT). The image now looks like this:
For the nature of your original image, JPEG (lossy) compression is an outstandingly bad choice. The output image compresses less well, and it loses fidelity. You should change to using Flate compression instead of JPEG for images of this kind.
By the way, the image in your original PDF file was defined in CMYK space already.

how to convert PS image to PNG and fit to page or to multiple pages

I am analyzing a model (compiled with -pg option so it would generate "gmon.out") and then generated a PS file (using gprof2dot.py and piping that to "dot") that charts how much time is spent in each subroutine. When I use ghostscript to convert to a PDF it is cutting off some of the right hand side of the figure. So I tested outputting to multiple pages, but the first page still has the right side cut off and the second page is blank.
These are the 2 commands I have tried:
gs -dBATCH -dNOPAUSE -dPDFFitPage -sOutputFile=myfile.pdf -sDEVICE=pdfwrite output.ps
gs -dBATCH -dNOPAUSE -dPDFFitPage -sOutputFile=out%d.pdf -sDEVICE=pdfwrite output.ps
Please let me know if you have any suggestions. Thanks!
Your title says you want a PNG (and you render a PostScript program to PNG, not "convert a PostScript image", PostScript is a programming language not an image format) but your description says you are creating PDF files. So which is it, PNG or PDF ?
Using PDFFitPage scales the requested media size to fit an actual (already set up, fixed) media size, since you haven't set a fixed media size on the command line, no scaling will be performed.
So, what media size are you getting, and why is that not correct ? I would 'guess' that your PostScript program does not request a media size, if it did then pdfwrite would create a PDF with the same media size.
In the absence of a media request, the pdfwrite device uses the default. Depending on your system and configuration that will most likely be either A4 or US Letter. It then reproduces the PostScript drawing program by either rendering to a bitmap, or as a PDF page description, using that media size. If the original PostScript required a different media size, then bits will be clipped.
Since you have not supplied an example its not possible for me to do more than guess of course.
However, you should probably try setting -sPAPERSIZE to something like A3. Or set a specific media size using -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS.
If you supplied an example I could probably be more specific.
It would also be a decent idea to mention what version of Ghostscript you are using too.

GhostScript PDF to PostScript

I have to convert pdf files (created with jasperreports) to postscript.
I'm using ghostscript (Version 9.19) to make the conversion.
The commmand i'm using is:
gswin64c -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=file.ps file.pdf
The conversion is done without problem, but when i open the postscript file generated (using GSview 5.0), the top margin is crop by 2-3 cm, and some information to print is lost.
I have changed the device from ps2write to eps2write, used the property -g<width>x<height> with the page size in pixels, but the problem persist.
The file is to be printed in a preformated paper, so i can not use the postscript generated to print.
Can someone help?
Thanks
Its not possible to say with great certainty, but it sounds like the PDF mediaBox is larger than the media you have specified to GSView.
You can try using the -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS along with -dFIXEDMEDIA and -dPDFFitPage, that should allow you to set up a specific media size, override the size in the PDF file and scale the result to fit the specified size.
Perhaps you could post an example PDF file, without that its very hard to comment sensibly.

Is it possible to not render text using the Ghostscript command line?

I want to do some image processing in MATLAB. However, my source files are PDF's, so I'm converting them to images using the Ghostscript command line.
gs -dNOPAUSE -dBATCH -sDEVICE=pngalpha -r300 -sOutputFile=test%d.png test.pdf
However, there is some text overlaying the images I would like to process. Is it possible to disable the rendering of fonts? Ugly workarounds are acceptable too, like forcing gs to use an empty font (does such a thing exist?) or something.
If you are using a recent version of Ghostscript (and if you aren't, then upgrade) you can use:
-dFILTERTEXT
See Ghostscript documentation: Use.htm => "Other parameters".

Resources