Trying to take a pdf and convert it to a tiff, but make all grayscale into pure black. In other words, if it's not white, it should be black. The tiffg3 device is 1 bit, but it's still taking gray and trying to fake it.
Example command:
gs -dQUIET -dNOPAUSE -dBATCH -r200 -sPAPERSIZE=letter -sDEVICE=tiffg3 -sOutputFile=out.tiff in.pdf
Example input:
Example output:
Desired output:
The 1-bit device uses halftoning (aka screening) to represent shades of gray using only black and white pixels. That's what it's intended to do; it's not intended to change colours at all.
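To make the difference concrete, here is a rough sketch in Python contrasting halftoning with the "anything not white becomes black" rule from the question. The 4x4 Bayer matrix is only an assumption chosen for illustration; the screens Ghostscript actually uses are different, and the function names are made up for this example.

```python
# 4x4 Bayer matrix: per-pixel thresholds for ordered dithering,
# used here as a simple stand-in for a real halftone screen (assumption).
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def halftone(gray):
    """Ordered dither: gray levels 0..255 become a pattern of 0 (black) and 1 (white)."""
    out = []
    for y, row in enumerate(gray):
        out.append([
            1 if v / 255 > (BAYER_4X4[y % 4][x % 4] + 0.5) / 16 else 0
            for x, v in enumerate(row)
        ])
    return out


def not_white_is_black(gray, white=255):
    """The rule the question asks for: only pure white stays white."""
    return [[1 if v >= white else 0 for v in row] for row in gray]


# A flat mid-gray patch: halftoning keeps about half the pixels white,
# while the hard rule turns the whole patch black.
mid_gray = [[128] * 4 for _ in range(4)]
print(halftone(mid_gray))
print(not_white_is_black(mid_gray))
```

The halftoned patch still averages out to gray when viewed from a distance, which is exactly the "faking" the question complains about.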
There are various ways you can fake what you want in PostScript (redefining setgray is one method, also setting a transfer function), but PDF is not a programming language, so this approach doesn't really work.
You could use a custom ICC profile to colour correct gray values so that they are all black. Ghostscript versions 9 and above use Little CMS as the colour management system, and have a 'default' Gray ICC profile which you can override. Presumably it would be easy enough to construct a profile which maps anything other than white to pure black. However, this is not my field. You also need to consider how this is going to affect (for example) images.
There are additional controls based on object type in Ghostscript, so you could optionally only apply this conversion to text, or linework.
Ideally you should go back to the original document, alter it there, and make a new PDF.
You can use convert from ImageMagick for that purpose, using the -level flag. For example, to convert an image to pure black and white by splitting the grayscale in half:
convert -level 50%,50% file.pdf blackwhite.pdf
and then pass it to GS. My suggestion for fax though is to leave part of the grayscale there since many documents rely on it for its legibility:
convert -density 816x784 -level 55%,95% file.pdf blackwhite.pdf
This one keeps 40% of the grayscale range: it converts every value in the range 0-55 to black (0 being black and 100 white) and everything in the range 95-100 to white. You can play with the values to get your best match. The -density flag helps later when converting to tiffg3 in GS, giving better quality in the resulting TIFF.
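As a rough model of what the two -level values do, here is a small Python sketch. It assumes a plain linear stretch with gamma 1 and gray values normalised to 0..1; the function name is made up, and ImageMagick's exact behaviour at the boundaries may differ.

```python
def level(value, black, white):
    """Linear levels adjustment: `black` and below map to 0, `white` and above to 1."""
    if value <= black:
        return 0.0
    if value >= white:
        return 1.0
    return (value - black) / (white - black)


# -level 55%,95%: dark grays are crushed to black, near-whites go to white,
# and the 40% in between is stretched over the full range.
for v in (0.30, 0.75, 0.97):
    print(v, level(v, 0.55, 0.95))

# -level 50%,50%: black point and white point coincide, so this is a hard threshold.
print(level(0.4, 0.5, 0.5), level(0.6, 0.5, 0.5))
```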
I have a ps image that I want to convert to a gif image with horizontal by vertical dimensions of 900 and 800 respectively. I have tried to use the command:
convert panel.gs -resize x800 y900 panel.gif
or also:
convert panel.gs -resize 900x800 panel.gif
Can you help me to tweak the convert commands so I can get the desired results?
.gs is not a valid suffix. Did you mean .ps?
ImageMagick will need Ghostscript as a delegate. You did not say what was wrong, nor what platform or what version of ImageMagick you are using.
If the image does not have the same aspect ratio as the final dimensions you want, you will either 1) need to distort it to fit using !, 2) resize it and then extend to the size you want filling with background color, or 3) resize it with ^ and crop it to the size you want.
convert panel.ps -resize "900x800!" panel.gif
convert panel.ps -resize 900x800 -gravity center -background white -extent 900x800 panel.gif
convert panel.ps -resize "900x800^" -gravity center -extent 900x800 panel.gif
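To see what the three variants do to the dimensions, here is a small Python sketch of the geometry arithmetic. It is a simplification based on my understanding of the geometry flags; the function name is made up and ImageMagick's exact rounding may differ.

```python
def resize_geometry(src_w, src_h, box_w, box_h, flag=""):
    """Size after -resize WxH with no flag (fit inside), '^' (fill) or '!' (distort)."""
    if flag == "!":
        # Aspect ratio is ignored: the image is stretched to the exact size.
        return box_w, box_h
    scale_w = box_w / src_w
    scale_h = box_h / src_h
    # No flag: the smaller scale keeps the whole image inside the box.
    # '^': the larger scale makes the image cover the box completely.
    scale = max(scale_w, scale_h) if flag == "^" else min(scale_w, scale_h)
    return round(src_w * scale), round(src_h * scale)


# Hypothetical 1800x800 render, target 900x800.
print(resize_geometry(1800, 800, 900, 800))       # fits inside; -extent pads the rest
print(resize_geometry(1800, 800, 900, 800, "^"))  # covers the box; -extent crops the excess
print(resize_geometry(1800, 800, 900, 800, "!"))  # distorted to the exact size
```

In the second and third commands above, -extent 900x800 is what turns the intermediate size into exactly 900x800.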
Well, firstly you haven't actually said what's wrong with the two commands that you have tried already.....
Your PostScript program probably does not contain an 'image' as such; PostScript is not a bitmap format, it's a programming language.
You can use Ghostscript to render the PostScript to an image and then use ImageMagick to resize that image. Possibly you can combine these two steps, or just perform a single conversion; it depends on what exactly you want to happen, which isn't clear.
If (for example) your PostScript program requests a media size of 9 inches by 8 then you can create a bitmap image by simply setting the resolution to 100 dpi using -r100.
If you want the image scaled differently in each direction, then you need to set a non-square resolution. For example if the PostScript program requests media of 9 inches by 4 then you need to set the resolution to 100x200 in order to get an image exactly 900 x 800 pixels. You would use -r100x200 for this.
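The arithmetic behind those two examples is just pixels divided by inches. A minimal Python sketch (the helper names are made up for illustration):

```python
def resolution_for(pixels_w, pixels_h, media_w_in, media_h_in):
    """Resolution (dpi) needed in each direction to hit an exact pixel size."""
    return pixels_w / media_w_in, pixels_h / media_h_in


def r_switch(xres, yres):
    """Format the result as a Ghostscript -r switch."""
    if xres == yres:
        return f"-r{xres:g}"
    return f"-r{xres:g}x{yres:g}"


# 9 x 8 inch media rendered to 900 x 800 pixels: square resolution.
print(r_switch(*resolution_for(900, 800, 9, 8)))
# 9 x 4 inch media rendered to 900 x 800 pixels: non-square resolution.
print(r_switch(*resolution_for(900, 800, 9, 4)))
```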
The alternative, from a PostScript point of view, is to set the media size to a given value in pixels (using -g900x800) and set -dFIXEDMEDIA, which prevents the PostScript program from changing it. You can then use -dFitPage, which will have Ghostscript scale the content to fit the page. However, it will scale the content equally in both directions, which may leave white space around the edge.
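To estimate how much white space that leaves, here is a Python sketch of a uniform fit. It only assumes that the same scale factor is applied in both directions; the function name and the example sizes are made up.

```python
def fit_page(content_w, content_h, page_w, page_h):
    """Uniform scale that fits the content on the page, plus the unused space."""
    scale = min(page_w / content_w, page_h / content_h)
    used_w, used_h = content_w * scale, content_h * scale
    return scale, page_w - used_w, page_h - used_h


# Hypothetical content of 900 x 400 pixels on a fixed 900 x 800 pixel page:
# the width already fits, so half of the page height stays empty.
scale, spare_w, spare_h = fit_page(900, 400, 900, 800)
print(scale, spare_w, spare_h)
```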
Now since Ghostscript doesn't write GIF directly, you'll need to load whatever bitmap format you select into IM in order to write it out as a GIF, so perhaps the simplest solution is just to use Ghostscript to render the PostScript to a defined resolution (eg 100 dpi) and then load that image into IM and rescale it there.
Since IM (and therefore convert) use Ghostscript to process PostScript programs, that's what's happening now so it isn't obvious to me what your problem is.
I'm new to Ghostscript. Can you let me know the Ghostscript command for finding the number of colors used on each page of a PDF file? I need to parse the results of this command from a Java program.
There is no such Ghostscript command or device. It would also be difficult to figure out; so much depends on what you mean. Do you intend to count the colour of each pixel in every image, for example? Which colour spaces are you interested in? What about ICCBased colour spaces: do you want the component values, or the CIE values?
[edit]
Yeah there's no Ghostscript equivalent, I did say that.
You would have to intercept every call to the colour operators, examine the components being supplied and see if they were not black and white. For example, if you set a CMYK colour with C=M=Y=0 and K!=0 then it's still black and white. Similar arguments apply for RGB, CIE and ICC colour spaces.
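The per-colour test itself is simple for the device spaces. A sketch in Python, with made-up function names and components assumed to be in the range 0..1 (CIE and ICC based spaces need real colour management and are not covered here):

```python
def is_neutral_cmyk(c, m, y, k):
    """C=M=Y=0 means only the black channel is used, whatever K is."""
    return c == 0 and m == 0 and y == 0


def is_neutral_rgb(r, g, b):
    """R=G=B is a gray (including black and white) in the source colour space."""
    return r == g == b


print(is_neutral_cmyk(0, 0, 0, 0.6))    # gray made from K only
print(is_neutral_cmyk(0.1, 0, 0, 0.6))  # a little cyan makes it colour
print(is_neutral_rgb(0.5, 0.5, 0.5))
```

The hard part is not this check but hooking every place a colour can be set, including the samples inside images.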
Now I bet ImageMagick doesn't do that. I suspect it simply uses Ghostscript to render a bitmap (probably RGB) and then counts the number of pixels of each colour in the output. Image manipulation tools pretty much all have to have a way to do that counting already, so it's a low cost for them.
It's also wrong.
It doesn't tell you anything about the original colour. If you render a colour object to a colour space that is different to the one it was specified in, then the rendering engine has to convert it from the colour space it was in, to the expected one. This often leads to colour shifts, especially when converting from RGB to CMYK but any conversion will potentially have this problem.
So if this is what ImageMagick is doing, it's inaccurate at best. It is possible to write PostScript to do this accurately, with some effort, but exactly what counts as 'colour' and 'black and white' is still a problem. You haven't said why you want to know if an input file is 'black and white' (you also haven't said if gray counts as black and white; it's not the same thing).
I'm guessing you intend to either charge more for colour printing, or need to divert colour input to a different printer. In which case you do need to know if the PDF uses (eg) R=G=B=1 for black, because that often will not result in C=M=Y=0 K=1 when rendered to the printer. Not only that, but the exact colour produced may not even be the same from one printer to another (colour conversion is device-dependent), so just because Ghostscript produced pure black doesn't mean that another printer would.
This is not a simple subject.
I'm trying to batch convert PDFs to PNGs. Previously, this was always done manually through GIMP by importing a PDF, then converting it to PNG.
With the script that I wrote, this should all be done automatically. But for some reason, the image quality I get from using
convert \
-density 300 \
-adaptive-resize 2048 \
-define png:compression-level=9 \
"File1" \
"File2"
Doesn't have the same "quality" compared to doing it via GIMP. See the image below for the difference in image quality.
In GIMP, I don't change much to the image. When I import the PDF, I change the resolution to 2048 pixels. When I convert and export it to PNG, I use all the default values GIMP offers, nothing fancy.
Changing the density to a higher or lower value doesn't do anything to the image. Also changing adaptive-resizing to normal resizing doesn't do much.
In the example image, both pictures are 2048 pixels wide. As you can see the lower image has a lot thicker/blurrier lines.
Example image comparison:
So, I have found a way around my problem.
Increasing the PPI kind of helped but still not as much as I would have liked it to.
Eventually I added this:
-channel A -fx "p*(p>0.2?22:0)"
Just a simple piece of code I found somewhere around here. It checks the alpha level of each pixel: if it's below a certain threshold, the pixel is made fully transparent; if it's over the threshold, the pixel is boosted to maximum visibility. Combined with the high PPI I don't get any "half pixels" anymore.
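For anyone wondering what that expression works out to, here is the same arithmetic as a Python sketch. It assumes alpha is normalised to 0..1 and that the result is clamped to that range afterwards (which I believe is what happens in a non-HDRI build); the function name is made up.

```python
def boost_alpha(p, threshold=0.2, gain=22):
    """Mirror of p*(p>0.2?22:0): zero at or below the threshold, multiplied by 22 above it."""
    value = p * gain if p > threshold else 0.0
    # Anything above the threshold is at least 0.2 * 22 = 4.4, so clamping
    # turns every surviving pixel fully opaque.
    return min(1.0, max(0.0, value))


for alpha in (0.1, 0.2, 0.21, 0.8):
    print(alpha, boost_alpha(alpha))
```

So in effect it is a hard threshold on the alpha channel at 0.2.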
I have a large number of PSD files which contain semi-transparent layers. These layers are not getting flattened correctly regardless of what flags I use via convert or mogrify
The simplest form looks as follows:
convert -background transparent source.psd -flatten output.png
Here is what the source image looks like in Photoshop. Note that this is a drop shadow layer and not a layer effect:
Here is how it comes out:
This may not be obvious from the photoshop background, so here it is in laid over a grey background:
Source:
Output:
EDIT:
I dug a bit into what is happening in the numbers. For the initial source image, the shadow is completely black and the alpha fades in. For the output image, the alpha is not as high, but it compensates by inaccurately lightening the image in a somewhat bumpy fashion. It's almost as if it's pre-multiplied, but it's taking the background as white?
Here is a straight RGB render without alpha multiplied in:
Source:
Output:
In other words, the RGB values are not being preserved at all. Alpha is being dimmed, but not distorted the way these values are. My guess would be some sort of rounding error from trying to extrapolate the color from the alpha, as though it is trying to "un-premultiply" the values. Any help is appreciated.
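To illustrate the guess (this is only a sketch of the hypothesis, not a diagnosis of what convert actually does), here is the arithmetic in Python for a single channel, with values in 0..1 and made-up function names:

```python
def premultiply(color, alpha):
    """Straight color to premultiplied color."""
    return color * alpha


def unpremultiply(pre, alpha):
    """Premultiplied color back to straight color (undefined at alpha 0, so return 0)."""
    return pre / alpha if alpha > 0 else 0.0


def over_white(color, alpha):
    """Straight color composited over a white background."""
    return color * alpha + 1.0 * (1.0 - alpha)


# A black shadow pixel at 50% alpha.
black, alpha = 0.0, 0.5
print(unpremultiply(premultiply(black, alpha), alpha))  # round trip keeps it black
print(over_white(black, alpha))                         # mixing in white lightens it
```

If the color channels had white mixed in like the second case while alpha was kept, the shadow would come out lighter than the source, which is roughly what the renders above look like.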
Short answer: it is fixed in V7 of the software (I think). I run a Mac, and the installer for V7 doesn't work well at all and appears unstable. After running it on an Ubuntu VM, it works well. I have also confirmed with another user that, on Windows, V6 has this problem and V7 does not.
I am trying to remove background color so as to improve the accuracy of OCR against images. A sample would look like below:
I'd keep all letters in the post-processed image while just removing the light purple color textured background. Is it possible to use some open source software such as Imagemagick to convert it to a binary image (black/white) to achieve this goal? What if the background has more than one color? Would the solution be the same?
Further, what if I also want to remove the purple letters (theater name) and the line so as to only keep the black color letters? Simple cropping might not work because the purple letters could appear at other places as well.
I am looking for a solution in programming, rather than via tools like Photoshop.
You can do this using GIMP (or any other image editing tool).
Open your image
Convert to grayscale
Duplicate the layer
Apply Gaussian blur using a large kernel (10x10) to the top layer
Calculate the image difference between the top and bottom layer
Threshold the image to yield a binary image
Blurred image:
Difference image:
Binary:
If you're doing it as a once-off, GIMP is probably good enough. If you expect to do this many times over, you could probably write an imagemagick script or code up your approach using something like Python and OpenCV.
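The blur, difference and threshold steps can be sketched in plain Python on a tiny grayscale grid. Note that this uses a simple box blur instead of a Gaussian to keep the code short, and the radius and threshold are arbitrary values picked for the toy data, not tuned for the sample image.

```python
def box_blur(img, radius):
    """Mean of the surrounding window, a crude stand-in for a Gaussian blur."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def binarize(img, radius=2, threshold=50):
    """1 where a pixel differs strongly from its blurred surroundings, else 0."""
    blurred = box_blur(img, radius)
    return [[1 if abs(v - b) > threshold else 0 for v, b in zip(row, brow)]
            for row, brow in zip(img, blurred)]


# Light background (200) with one dark vertical stroke (30) in column 3.
page = [[30 if x == 3 else 200 for x in range(7)] for _ in range(5)]
mask = binarize(page)
for row in mask:
    print(row)
```

In the mask, 1 marks the stroke; invert it if you want black text on white for the OCR step.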
Some problems with the above approach:
The purple text (CENTURY) gets lost because it isn't as contrasting as the other text. You could work your way around it by thresholding different parts of the image differently, or by using local histogram manipulation methods
The following shows a possible strategy for processing your image, and OCR it
The last step is doing an OCR. My OCR routine is VERY basic, so I'm sure you may get better results.
The code is Mathematica code.
Not bad at all!
In ImageMagick, you can use the -lat (local adaptive threshold) option to do that.
convert image.jpg -colorspace gray -negate -lat 50x50+5% -negate result.jpg
convert image.jpg -colorspace HSB -channel 2 -separate +channel \
-white-threshold 35% \
-negate -lat 50x50+5% -negate \
-morphology erode octagon:1 result2.jpg
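As I understand it, -lat makes a pixel white when it is lighter than the average of its surrounding window plus the offset, which is why the commands negate first (so the text is the light part) and negate back afterwards. A rough Python sketch of that rule, with values in 0..1, a made-up function name and a much smaller window than 50x50:

```python
def local_adaptive_threshold(img, half=1, offset=0.05):
    """White (1) where a pixel is lighter than its local mean plus offset, else black (0)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - half), min(h, y + half + 1))
                      for i in range(max(0, x - half), min(w, x + half + 1))]
            mean = sum(window) / len(window)
            row.append(1 if img[y][x] > mean + offset else 0)
        out.append(row)
    return out


# Already negated toy image: dark background (0.2) with a light stroke (0.9).
negated = [[0.2, 0.2, 0.9, 0.2, 0.2] for _ in range(3)]
for row in local_adaptive_threshold(negated):
    print(row)
```

Because the comparison is against the local average rather than a global value, a background that varies slowly across the page drops out.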
You can apply a strong blur to the image so that you get an almost clean estimate of the background. Then divide each color component of each pixel of the original image by the corresponding component of the pixel in the blurred background, and you will get text on a white background. Additional postprocessing can help further.
This method works if the text is darker than the background (in each color component). Otherwise you can invert the colors first and then apply this method.
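The division step for a single pixel looks like this in Python. It is a sketch with components in 0..1 and a made-up function name; the background values would come from the blurred image, and here they are just hypothetical numbers.

```python
def divide_by_background(pixel, background):
    """Per-channel division, clamped to 1.0; both arguments are (r, g, b) tuples."""
    return tuple(min(1.0, p / b) if b > 0 else 1.0
                 for p, b in zip(pixel, background))


background = (0.8, 0.7, 0.9)  # hypothetical light purple from the blurred image
print(divide_by_background((0.8, 0.7, 0.9), background))     # background pixel becomes white
print(divide_by_background((0.08, 0.07, 0.09), background))  # dark text stays dark
```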
If your image is captured as RGB, just use the green channel, or quickly convert the Bayer pattern, which is probably what @misha's convert-to-greyscale solution does.
Hope this helps someone
You can do this in essentially one line of code using OpenCV and Python:
import cv2

# Load the image as grayscale (flag 0)
im = cv2.imread('....../Downloads/Gd3oN.jpg', 0)
# Adaptive threshold: Gaussian-weighted 11x11 neighbourhood, constant C = 2
th = cv2.adaptiveThreshold(im, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
Here's the result
Here's the link for Image Thresholding in OpenCV