how to convert PS image to PNG and fit to page or to multiple pages - ghostscript

I am analyzing a model (compiled with -pg option so it would generate "gmon.out") and then generated a PS file (using gprof2dot.py and piping that to "dot") that charts how much time is spent in each subroutine. When I use ghostscript to convert to a PDF it is cutting off some of the right hand side of the figure. So I tested outputting to multiple pages, but the first page still has the right side cut off and the second page is blank.
These are the 2 commands I have tried:
gs -dBATCH -dNOPAUSE -dPDFFitPage -sOutputFile=myfile.pdf -sDEVICE=pdfwrite output.ps
gs -dBATCH -dNOPAUSE -dPDFFitPage -sOutputFile=out%d.pdf -sDEVICE=pdfwrite output.ps
Please let me know if you have any suggestions. Thanks!

Your title says you want a PNG (and you render a PostScript program to PNG, not "convert a PostScript image", PostScript is a programming language not an image format) but your description says you are creating PDF files. So which is it, PNG or PDF ?
Using PDFFitPage scales the requested media size to fit an actual (already set up, fixed) media size, since you haven't set a fixed media size on the command line, no scaling will be performed.
So, what media size are you getting, and why is that not correct ? I would 'guess' that your PostScript program does not request a media size, if it did then pdfwrite would create a PDF with the same media size.
In the absence of a media request, the pdfwrite device uses the default. Depending on your system and configuration that will most likely be either A4 or US Letter. It then reproduces the PostScript drawing program by either rendering to a bitmap, or as a PDF page description, using that media size. If the original PostScript required a different media size, then bits will be clipped.
Since you have not supplied an example its not possible for me to do more than guess of course.
However, you should probably try setting -sPAPERSIZE to something like A3. Or set a specific media size using -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS.
If you supplied an example I could probably be more specific.
It would also be a decent idea to mention what version of Ghostscript you are using too.

Related

How can I take a pdf, and convert any jpeg2000/jpx/jp2 images in it to jpeg images?

I am using MacOS Mojave on a Mac Mini, and I am also using an old Kindle Dx which cannot read jpeg2000 images. It also has trouble with too many or too large jpeg images.
I cannot use touchscreens, so newer e-readers and tablets aren't a solution.
So far, I've found some buggy solutions--
I can use Willus's k2pdfopt with -mode copy and -dev dx, which rasterizes everything. It's a good solution for scanned pdfs. If more detail is needed, -mode copy without -dev dx will preserve higher resolution. It's something of a last resort for pdf-born-pdfs, since text can be uglier and harder to read, and file sizes can increase alarmingly.
I can also use Ghostscript with -dCompatibilityLevel=1.4, which doesn't rasterize everything. It converts jpeg2000 images to jpeg images. But it doesn't tackle some oversized or poorly-constructed images, it often creates dark rectangles which can obscure text, and it occasionally loses the ability to search or select text. [P.S. I mean it takes a pdf which had searchable pdf and outputs one which does not. Also if I do any kind of image downsampling or removal, it sometimes rescales everything or loses pages.]
I have experimented with options to compress images in Ghostscript, with mixed success, and with the above bugs persisting. [P.S. I think I was downsampling, yes.]
For whatever reason, MacOS Quartz filters only work if they will reduce image sizes. So they tend not to work on the buggy images.
Now my ideal solution would preserve the text itself, preferably untangling ligatures, and would compress the images like Willus's k2pdfopt. But I have no idea if that's possible or how.
Short of that-- I'm wondering if there's a way to use Ghostscript to convert the jpeg2000 images without causing the gray rectangles or losing the ability to search or select text.
or if there's a way to use Quartz filters so they work. In some older versions of MacOS they did work.
or if there's a way to batch-print these pdf files to the appropriate resolution, apparently 800x1180, reprocessing images in the process.
I don't have much programming experience. I mainly use homebrew to install command-line tools, very sloppy bash scripts, and Automator to run them.
P.S. For a minimal example of the gray rectangles in Ghostscript, using the free pdf from here: https://www.peginc.com/store/test-drive-savage-worlds-the-wild-hunt/
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -o out.pdf in.pdf
substituting that pdf for in.pdf.
For a minimal example of losing searchable text, using the free pdf from here: http://datafortress2020.com/fileproject/details.php?image_id=498
same minimal script
Compatibility Level
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dCompatibilityLevel=1.4 -o out.pdf in.pdf
Aggressive Downsampling and Grayscale
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dCompatibilityLevel=1.4
-g800x1080 -r150 -dPDFFitPage \
-dFastWebView -sColorConversionStrategy=Gray \
-dDownsampleColorImages=true -dDownsampleGrayImages=true -dDownsampleMonoImages=true -dColorImageResolution=75 -dGrayImageResolution=75 -dMonoImageResolution=150 -dColorImageDownsampleThreshold=1.0 -dGrayImageDownsampleThreshold=1.0 -dMonoImageDownsampleThreshold=1.0 \ -o out.pdf in.pdf
P.P.S. I can use k2pdfopt to rasterize to fit my Kindle. If the file has searchable text, this retains it, if it doesn't I can run tesseract in k2 or run ocrmypdf afterwards.
But if I want especially good graphics, or especially clear text, and the file has hundreds of pages, it will need hundreds of megs. I had blamed this on rasterizing the text, which was why my ideal solution was to keep text and rasterize images, but apparently it's an issue with the images themselves.
If you think you've found a bug, then it's helpful to report it. If you don't it will never be fixed. You can report a bug at https://bugs.ghostscript.com, please be sure to attach an example file to reproduce the problem and state the command line used.
The Ghostscript pdfwrite device does not, ever, produce JPEG2000 images (due to patent issues). So you don't need to set the CompatibilityLEvel at all, and I'd recommend that you do not. By setting the CompatibilityLevel you are limiting the output. Unless your device cannot handle later versions then don't do this.
Without seeing an example file, a command line and knowing the version and operating system it's obviously not possible for anyone to comment on your 'gray rectangles'.
You can reduce the size of images (in bytes) by downsampling (as opposed to compressing) them, you can't do anything about the number of images.
Note that searchable text depends on the construction of the PDF file, and so cannot ever be guaranteed. Searchable text (in the sense of ToUnicode CMaps) was a later addition to the PDF Reference and is always optional, because it's possible to have input from which the Unicode code points cannot be determined (without using OCR software) but a perfectly readable PDF file can still be produced.
Ghostscript itself can produce a PDF file which is a rendered representation of the original, wrapped up as a PDF. See the pdfimage* devices.
Tesseract can take images and produce PDF files with searchable text, produced by OCR'ing the images. This would seem to me to be your best option, though obviously I don't know if a single large image is going to be acceptable to your device.
Edit
I already agreed that searching text is inherently not supported in PDF, except as an optional adjunct. The bug report you pointed to talks about 'corrupting text layers'. There are no text layers in PDF, and the text is neither corrupted nor missing, ts just not encoded as ASCII any more.
The reason you shouldn't set the resolution, and the size in pixels, is because PDF is not an image format. You aren't gaining anything by doing this. All that happens is that pdfwrite divides the 'g' valuess by the resolution, to get a media size in inches, and writes that as the MediaBox. Simpler just to set the Media Size. If you set the resolution you are fixing anything which does get rendered at that resolution. Choose a low resolution and you get crappy output. If you use a higher resolution then the image can be downscaled and smoothed giving better output.
It is indeed possible that your Kindle cannot handle transparency any better than the Mac, it is after all an old device. It's also possible that whoever built Ghostscript for you introduced a bug. I'm afraid we can't help you with either of those.
I did suggest, right back at the end of the original post, that you render the content to an image (Ghostscript will do that for you), then use Tesseract to convert the image back to a PDF, and at the same time OCR the text.
That will get past your problems with JPEG2000, will do a *better job of creating searchable text, since even files that aren't already searchable will become so, and will allow you to specify the resolution.

Ghostscript doesn't keep landscape page orientation

Example:
Create a one sheet landscape PDF from Excel. Open it using Adobe Reader, where it opens as landscape. Print it using Ghostscript. Open printout using Adobe Reader, and it should still be landscape, but it isn't.
This issues happens whether or not the source is a PDF or Excel itself. I use a PDF source in the example because I figured that it would be easier to illustrate the issue.
I know it can probably be forced to be printed using landscape. What I don't understand is why Ghostscript is not using the orientation of the original document. How can I get Ghostscript to keep the orientation of the original document?
PDFWrite config file:
-I"C:\Program Files\gs\gs9.22\lib";"C:\Program Files\gs\fonts"
-sDEVICE=pdfwrite
-r300
-dNOPAUSE
-dSAFER
-sPAPERSIZE=A4
-dCompatibilityLevel=1.4
-dPDFSETTINGS=/prepress
Redmon config:
#"C:\Program Files\gs\pdfwrite.txt" -sOutputFile="%1" -c 30000000 setvmthreshold .setpdfwrite -f -
Printer driver:
Ghostscript PDF
You've left the AutoRotatePages switch at the default. The default is to rotate pages so that the majority of text is horizontal, if you don't want that, then set -dAutoRotatePages=/None.
Since you apparently already have a PDF file ? Why use Ghostscript to create another one ? You stand the chance of losing stuff that way.
Since the input is PDF there is no point in the -sPAPERSIZE switch, since that only sets the dfefault media, which is immediately overridden by the PDF interpreter.
If I were you I wouldn't set -dPDFSETTINGS at all, nor -r300 (the resolution setting is mostly pointless for pdfwrite)

Use gostscript 9.21 to convert text to outlines, and how to keep the resolution of the picture

I use gostscript to convert text to outlines with the following code :gswin32c.exe -sDEVICE=pdfwrite -sOutputFile=output.pdf -dQUIET -dNOPAUSE -dBATCH -dNoOutputFonts -f test_new.pdf,it works.But i got a very small output file from 2.5M to 70kb.Then i find the picture became blurred in pdf.
Add -dPDFSETTINGS=/default,This will have the same result.
I's better to use -dPDFSETTINGS=/printer or -dPDFSETTINGS=/prepress,but 300dpi is not enough for me(or for my boss).
Is there any way to keep the original resolution of the picture.
Or how to set a higher dpi for images in output pdf.
The test file is here.
Thanks in advance.
The answer to your question is 'yes' (but see later). Don't use PDFSETTINGS, that sets lots of things all in one go. If you want control then you need to specify each setting individually.
Rather than use this shotgun approach you need to read the documentation, decide which controls affect areas you want to change, and alter those controls only.
However, image downsampling is not your problem. If you don't use -dPDFSETTINGS then PDF file written by Ghostscript contains an image at exactly the same resolution as the image in the original file.
Your problem is that the image is being written with JPEG compression, and JPEG is a lossy compression, so you are losing fidelity. Note that in the original file the image is written uncompressed, which is why its so large.
It looks like the original image was a JPEG, and the free PDF editor you are using has realised that so it saved the image uncompressed (I may be giving it too much credit here, it may save all images uncompressed). Applying JPEG to an image which has already been quantised simply amplifies the artefacts.
Instead you need to specify that you want images compressed with Flate, which is a lossless compression. The documentation for the pdfwrite controls can be found here, you need to change AutoFilterColorImages and ColorImageFilter.
Note that by not applying JPEG quantisation (a second time) and DCT encoding, the compression is less than your first experience. For me the output file comes in at just over 600Kb (leaving the font in place, and the text as text, would be a couple of Kb smaller). However the image is identical, as expected.
Since you are clearly using Ghostscript in a commercial environment, can I just point you at the licence and ask you to check that your usage is compatible with the AGPL, bearing in mind that this covers software as a service usage as well.

GhostScript PDF to PostScript

I have to convert pdf files (created with jasperreports) to postscript.
I'm using ghostscript (Version 9.19) to make the conversion.
The commmand i'm using is:
gswin64c -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=file.ps file.pdf
The conversion is done without problem, but when i open the postscript file generated (using GSview 5.0), the top margin is crop by 2-3 cm, and some information to print is lost.
I have changed the device from ps2write to eps2write, used the property -g<width>x<height> with the page size in pixels, but the problem persist.
The file is to be printed in a preformated paper, so i can not use the postscript generated to print.
Can someone help?
Thanks
Its not possible to say with great certainty, but it sounds like the PDF mediaBox is larger than the media you have specified to GSView.
You can try using the -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS along with -dFIXEDMEDIA and -dPDFFitPage, that should allow you to set up a specific media size, override the size in the PDF file and scale the result to fit the specified size.
Perhaps you could post an example PDF file, without that its very hard to comment sensibly.

Fit to page size in ghostscript (with a possibly corrupt input)

I'm trying to use ghostscript to convert a .ps file to a series of .png files, largely because I don't have a tolerable ps viewer.
This is the command I've used:
gs -dBATCH -dEPSCrop -dEPSFitPage -sDEVICE=png16m -r300 -dNOPAUSE -sOutputFile=neptune_111115_ob1-2_13pca_boloplots_%d.png neptune_111115_ob1-2_13pca_boloplots.ps
(the .ps file is a multi-page postscript).
The outputs are partly off the page. I'd like the images to fit inside the page.
I can include example files, but they're pretty large - is there any particular part of the .ps file that would be helpful?
My suspicion is that the .ps file is specifying the bounding box incorrectly, but hacking the BB values didn't have any effect. The .ps file is written by IDL (ittvis' Interactive Data Language). I've also tried the above command without the -dEPS* commands without luck.
-dEPSCrop and -dEPSFitPage are mutually exclusive:
One crops the EPS to the BoundingBox specified in the comments.
The other scales up the EPS from the %%BoundingBox specified in the PS file's internal comments to fit the current media.
You can't really use both at the same time.
The file can't be an EPS file anyway, because you can't have multiple pages in an EPS file. So actually neither switch will have any effect (as you've discovered).
Either the PostScript requests a media size using setpage or setpagedevice, or it just uses whatever the currently set media is. My guess is that its just using the current media. Try setting -sPAPERSIZE=a4 and -sPAPERSIZE=letter.
If that works then the program does not request a media size. If it has no effect, then set -dFIXEDMEDIA in addition which will ignore subsequent requests to change the media size.
That should allow you to specify the correct media size, if you don't know what the media size should be then you can use the Ghostscript -sDEVICE=bbox device to find out.
Lastly, Ghostscript has a rudimentary display device which you can use to view the rendered output without first going to a PNG.

Resources