How to determine if an image needs to be rotated - algorithm

I am trying to find a way to determine whether an image needs to be rotated in order for the text to be horizontally aligned. And if it does need to be rotated then by how many degrees?
I am sending the images to tesseract and for tesseract to be effective, the text in the images needs to be horizontally aligned.
I'm looking for a way to do this without depending on the "Orientation" metadata in the image.
I've thought of the following ways to do this:
Rotate the image 90 degrees clockwise four times and send all four images to tesseract. This isn't ideal because each image would have to be processed four times.
Use a Hough line transform to see whether the lines are vertical or horizontal. If they are vertical, rotate the image. Even then the image might still need to be rotated 180 degrees, so I'm unsure how effective this would be.
I'm wondering if there are other ways to accomplish this using OpenCV, ImageMagick or any other image processing techniques.

If you have around 1000 images labeled as horizontal or vertical, you can resize them to 224x224 and then fine-tune a convolutional neural network, such as AlexNet or VGG, for this task. If you want to know how many right rotations to make for an image, you can set the labels to the number of clockwise rotations needed, i.e. 0, 1, 2 or 3.
http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
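As a rough illustration of that fine-tuning setup (only a sketch: it assumes TensorFlow/Keras rather than the linked Caffe example, an images/ directory whose subfolders 0, 1, 2, 3 hold images labeled by the number of clockwise rotations needed, and illustrative hyperparameters):

import tensorflow as tf

# Directory layout images/0 ... images/3 is assumed, where the folder name is the
# number of clockwise 90-degree rotations needed to make the image upright.
train = tf.keras.utils.image_dataset_from_directory(
    "images", image_size=(224, 224), batch_size=32, label_mode="int")
train = train.map(lambda x, y: (tf.keras.applications.vgg16.preprocess_input(x), y))

base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                        # first train only the new classification head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(4, activation="softmax"),   # 4 classes: 0, 1, 2 or 3 rotations
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train, epochs=5)

At prediction time the class with the highest probability tells you how many clockwise 90-degree turns to apply before sending the image to tesseract.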

Attempting OCR on all 4 orientations seems like a reasonable choice, and I doubt you will find a more reliable heuristic.
If speed is an issue, you could OCR a small part of the image first. Select a rectangular region that has a plausible amount of edge pixels and a text-like white/black ratio, then send that region to tesseract in different orientations. With a small region, you could even try steps smaller than 90°, or combine it with another heuristic like Hough.
If you remember the most likely orientation based on previous images, and stop once an orientation is successfully processed by tesseract, you probably do not even have to try most orientations in most cases.
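A minimal Python sketch of the "OCR a small patch in several orientations" idea, assuming pytesseract and Pillow are installed; the crop box and the confidence-based scoring are illustrative choices, not something prescribed above:

from PIL import Image
import pytesseract

def best_rotation(path, box=(100, 100, 500, 300)):
    # Crop a small, hopefully text-bearing region and OCR it in four orientations.
    patch = Image.open(path).convert("L").crop(box)
    scores = {}
    for k in range(4):
        rotated = patch.rotate(90 * k, expand=True)      # PIL rotates counter-clockwise
        data = pytesseract.image_to_data(rotated, output_type=pytesseract.Output.DICT)
        confs = [float(c) for c in data["conf"] if float(c) >= 0]
        scores[90 * k] = sum(confs) / len(confs) if confs else 0.0
    return max(scores, key=scores.get)                   # CCW rotation that OCRed best

The full image can then be rotated by the winning angle before running the real OCR pass.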

You can figure this out in a terminal with Tesseract's --psm option.
tesseract --psm 0 "infile" "outfile" will create outfile.osd, which contains information such as:
Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 27.93
Script: Latin
Script confidence: 6.55
man tesseract
...
--psm N
Set Tesseract to only run a subset of layout analysis and assume a certain form of image. The options for N are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR. (not implemented)
...
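If you are driving tesseract from Python anyway, pytesseract exposes the same OSD information. A small sketch (assuming pytesseract is installed, the osd language data is available, and using an illustrative file name):

from PIL import Image
import pytesseract

img = Image.open("infile.png")
osd = pytesseract.image_to_osd(img)       # same information as `tesseract --psm 0`
print(osd)                                # contains "Orientation in degrees" and "Rotate" lines
rotate = int(osd.split("Rotate:")[1].splitlines()[0])
if rotate:
    img = img.rotate(rotate, expand=True) # verify the direction convention on your own images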

Related

What is the difference between cropping, resizing and scaling an image?

I am using Perl's Image::Imlib2 package to generate thumbnails from larger images.
I've done such tasks before with several ImageMagick interfaces (PHP, Ruby, Python) and it was relatively easy. I have no prior experience with Imlib2 and it is a long time since I wrote something in Perl, so I am sorry if this seems naive!
This is what I've tried so far. It is simple, and assumes that scaling an image will keep the aspect ratio, and the generated thumbnail will be an exact miniature copy of the original image.
use strict;
use warnings;
use Image::Imlib2;

my $dir = 'imgs/*';
my @files = glob($dir);

foreach my $img (@files) {
    my $image = Image::Imlib2->load($img);
    my $cropped_image = $image->create_scaled_image(50, 50);
    $cropped_image->save($img);
}
Original image
Generated image
My first look at the image tells me that something is wrong. It may be my ignorance of cropping, resizing and scaling, but the generated image displays incorrectly on small screens.
I've read What's the difference between cropping and resizing?, and honestly didn't understand anything. Also this one Image scaling.
Could someone explain the differences between those three ideas, and if possible give examples (preferably with Perl) to achieve better results? Or at least describe what I should consider when I want to create thumbnails?
The code you use isn't preserving the aspect-ratio. From Image::Imlib2::create_scaled_image
If x or y are 0, then retain the aspect ratio given in the other.
So change the line
my $cropped_image = $image->create_scaled_image(50, 50);
to
my $scaled_image = $image->create_scaled_image(50, 0);
and the new image will be 50 pixels wide, and its height computed so to keep the original aspect-ratio.
Since this is not cropping I've changed the variable name as well.
As for the other questions, below is a basic discussion taken from the comments. Please search for tutorials on image processing; the documentation of major libraries often has short and good explanations.
This is aggregated from comments deemed helpful. Also see Borodin's short and clear answer.
Imagine that you want to draw a picture (of some nice photograph) yourself in the following way. You draw a grid of, say, 120 (horizontally) by 60 (vertically) boxes. So 120 x 60 = 7,200 boxes. These are your "pixels," each of which you may fill with only one color. If the photo you are re-drawing is "mostly" blue at some spot, you color that pixel blue. Etc. It is not easy to end up with a faithful redrawing -- the denser the pixels the better.
Now imagine that you want to draw another copy of this, just smaller. If you make it 20x20 it will look completely different, since it's a square. The best chance of getting it to "look the same" is to pick a 2-to-1 ratio (like 120x60), so say 40x20. That's "aspect-ratio." But there is still a problem, since now you have to decide all over again what color to pick for each box, so as to represent what is "mostly" on the photo at that spot. There are algorithms for that ("sampling," see your second link). That's what is involved in "resizing." The "quality" of the obtained drawing clearly must be much worse.
So "resizing" isn't all that simple. But, for us users, we mostly need to roughly know what is involved, and to find out how to use these features in a library. So read documentation. Some uses are very simple, while sometimes you'll have to decide which "algorithm" to let it use, or some such. Again, what I do is read manuals carefully.
The basic version of "cropping" is simple -- you just cut off a part of the picture. Say, remove the first and last 20 columns and the bottom and top 10 rows, and from the initial 120x60 you get a picture of 80x40. This is normally done when outer parts of an image have just white areas (or, worse, black!). So you want to "cut out" the picture itself from the whole image. Many graphics tools can do that on their own, by analyzing the image and figuring out those areas. Or, we select and hit a button.
I'm still not certain that you understand the difference between these terms
Your original image is 752 × 500 pixels
Resizing is a vague term that just means making a picture a different size somehow
Scaling is to change the size of an image proportionally. Scaling your picture down by a factor of ten would result in an image 75 × 50 (it should be 75.2 but we can't have 0.2 of a pixel). Scaling it up would make it bigger
You have scaled your picture to 50 × 50 pixels, which is a vertical scale factor of 10 (500 ÷ 50) but a horizontal scale factor of about 15 (752 ÷ 50), so it appears squashed horizontally (or stretched vertically)
Cropping is to reduce an image by removing parts of it. To crop your image to 50 × 50 you would choose a 50 × 50 rectangle out of the whole picture and remove the rest. It would be a piece about the size of your monkey's nose, but you can pick any region you wish
zdim has shown you how to call create_scaled_image with one dimension set to zero. Calling
$image->create_scaled_image(0, 50)
reduces the height, or y-dimension, to 50, while the width, or x-dimension, is scaled by the same factor. That will result in a thumbnail 75 × 50 as above
I hope that helps
As I said in my comment, there is an Image::Magick Perl module if you would prefer to be back on familiar ground
Resizing and scaling are the same thing; you just change the size of the image. You can make it smaller or bigger.
Depending on the interface, you have to give either the new dimensions or a scaling factor for the operation. A factor less than or greater than 1.0 would make the image smaller or bigger. Smaller images are created by subsampling and bigger images by interpolation.
Cropping is very simple. You select a rectangular region of an image and that's your new image. It's like using scissors.
In your code example the image is named cropped_image although it is created through scaling, or resizing.
The output image is an image of size 50 x 50 pixels. That's what you did here:
my $cropped_image = $image->create_scaled_image(50, 50);
So no matter how your image looked before, you stuff it into 50 x 50 pixels. In this case you are not only reducing the resolution but also changing the aspect ratio.
The image is not displayed improperly, it's displayed perfectly fine.
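To make the three terms concrete, here is a small Pillow sketch (the file name and the 50-pixel target are just examples, mirroring the numbers used above):

from PIL import Image

img = Image.open("monkey.jpg")                 # e.g. 752 x 500 as in the question

squashed = img.resize((50, 50))                # plain resize to 50x50: aspect ratio is lost

factor = 50 / img.height                       # scale: one factor for both dimensions
scaled = img.resize((round(img.width * factor), 50))   # roughly 75 x 50

left = (img.width - 50) // 2                   # crop: cut a 50x50 piece out of the middle
top = (img.height - 50) // 2
cropped = img.crop((left, top, left + 50, top + 50))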

Image resizing method during preprocessing for neural network

I am new to machine learning. I am trying to create an input matrix (X) from a set of images (Stanford dog set of 120 breeds) to train a convolutional neural network. I aim to resize images and turn each image into one row by making each pixel a separate column.
If I directly resize images to a fixed size, the images lose their originality due to squishing or stretching, which is not good (first solution).
I can resize by fixing either width or height and then cropping, so that all resulting images are the same size, e.g. 100x100, but critical parts of the image may be cropped away (second solution).
I am thinking of another way of doing it, but I am not sure about it. Assume I want 10000 columns per image. Instead of resizing images to 100x100, I will resize each image so that its total pixel count is around 10000. So images of size 50x200, 100x100 and 250x40 will all be converted into 10000 columns. For other sizes, like 52x198, only the first 10000 of the 10296 pixels will be considered (third solution).
The third solution seems to preserve the original shape of the image. However, it may still lose this originality when flattened into a row, since not all images have the same dimensions. I would appreciate your comments on this issue. It would also be great if you could direct me to sources where I can learn about the topic.
Solution 1 (simply resizing the input image) is a common approach. Unless you have a very different aspect ratio from the expected input shape (or your target classes have tight geometric constraints), you can usually still get good performance.
As you mentioned, Solution 2 (cropping your image) has the drawback of potentially excluding a critical part of your image. You can get around that by running the classification on multiple subwindows of the original image (i.e., classify multiple 100 x 100 sub-images by stepping over the input image horizontally and/or vertically at an appropriate stride). Then, you need to decide how to combine your multiple classification results.
Solution 3 will not work because the convolutional network needs to know the image dimensions (otherwise, it wouldn't know which pixels are horizontally and vertically adjacent). So you need to pass an image with explicit dimensions (e.g., 100 x 100) unless the network expects an array that was flattened from assumed dimensions. But if you simply pass an array of 10000 pixel values and the network doesn't know (or can't assume) whether the image was 100 x 100, 50 x 200, or 250 x 40, then the network can't apply the convolutional filters properly.
Solution 1 is clearly the easiest to implement but you need to balance the likely effect of changing the image aspect ratios with the level of effort required for running and combining multiple classifications for each image.
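For the multi-subwindow variant of Solution 2, here is a rough sketch of how the windows could be generated and the results combined. It assumes you already have a predict(window) function returning class probabilities and that images are at least 100 x 100; the window size, stride and averaging rule are illustrative:

import numpy as np

def classify_with_windows(img, predict, size=100, stride=50):
    # Slide a size x size window over the image and average the class scores.
    h, w = img.shape[:2]
    probs = []
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            window = img[top:top + size, left:left + size]
            probs.append(predict(window))
    return np.mean(probs, axis=0)

Averaging is only one way to combine the window results; taking the maximum score or a majority vote are common alternatives.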

How to decrease background noise in binary image

Here is an example of binary images, i.e. as input we have an imageByteArray with 2 possible values: 0 and 255.
Example1:
Example2:
The image contains some document edge on a background.
The task is to remove or reduce the amount of background pixels with minimal impact on the edge pixels.
The question is: what modern algorithms and techniques exist to do this?
What I do not expect as an answer: use Gaussian blur to get rid of background noise, use bitonal algorithms (Canny, Sobel, etc.) with thresholds, or use Hough (Hough linearization goes crazy on such noise no matter what options are set).
The simplest solution is to detect all contours and filter out the ones with the lowest length. This works well, but depending on the image it may also erase quite a few useful edge pixels.
Update:
As input I have a standard RGB image with a document (driver license ID, check, bill, credit card, ...) on some background. The main task is to detect the document edges. The next steps are pretty well known: greyscale, blur, Sobel binarization, probabilistic Hough, find a rectangle or trapezium (if a trapezium shape is found, then do a perspective transformation). On simple, contrasting backgrounds it all works fine. The reason I am asking about noise reduction is that I have to work with thousands of backgrounds and some of them produce noise no matter what options are used. The noise will cause additional lines no matter how Hough is configured, and those additional lines may fool the subsequent logic and seriously affect performance. (It is implemented in JavaScript, with no OpenCV or GPU support.)
It's hard to know whether this approach will work with all your images since you only provided one, but a Hough Line detection with ImageMagick and these parameters in the Terminal command-line produces this:
convert card.jpg \
\( +clone -background none -fill red -stroke red \
-strokewidth 2 -hough-lines 49x49+100 -write lines.mvg \
\) -composite hough.png
and the file lines.mvg contains 4 lines as follows:
# Hough line transform: 49x49+100
viewbox 0 0 1024 765
line 168.14,0 141.425,765 # 215
line 0,155.493 1024,191.252 # 226
line 0,653.606 1024,671.48 # 266
line 940.741,0 927.388,765 # 158
ImageMagick is installed on most Linux distros and is available for OSX and Windows from here.
I assume you did mean a binary image rather than bitonal...
Do flood-fill based segmentation:
1. Scan the image for set pixels (color = 255).
2. For each set pixel, create a mask/map of its area: flood fill the set pixels with 4- or 8-neighbor connectivity and count how many pixels you filled.
3. For each filled area, compute its bounding box.
4. Detect edge lines. Edge lines have an elongated (non-square) bounding box, so if the aspect ratio is close to square, the area is not an edge line. A bounding box that is too small also means it is not an edge line, and if the filled pixel count is too small compared to the bounding box's bigger side, the area is not an edge line either. You can make this more robust if you regress a line through the set pixels of each area and compute the average distance between the regressed line and each set pixel; if it is too high, the area is not an edge line. (A sketch of this filtering follows the notes below.)
5. Recolor the non-edge-line areas to black: either subtract their masks from the image or flood fill them with black again.
[notes]
Sometimes step 5 can mess up the inside of the document. In that case do not recolor anything; instead remember all the regressed lines for the edge areas. Then, after the whole process is done, join together all lines that are parallel and close to the same axis (infinite line); that should reduce them to 4 big lines determining the document rectangle. Now fill all outside pixels with black (by a geometric approach).
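A minimal OpenCV sketch of the bounding-box filtering described in the steps above (connected components stand in for the flood fill; all thresholds are illustrative and would need tuning per dataset):

import cv2
import numpy as np

def keep_edge_like_blobs(binary, min_area=200, max_squareness=0.5):
    # Label 8-connected white blobs; stats holds x, y, width, height, area per blob.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    out = np.zeros_like(binary)
    for i in range(1, n):                            # label 0 is the background
        x, y, w, h, area = stats[i]
        squareness = min(w, h) / max(w, h)           # near 1 -> blocky blob, not a line
        if area < min_area:
            continue                                 # too few pixels -> noise
        if squareness > max_squareness:
            continue                                 # nearly square box -> not an edge line
        if area < max(w, h):
            continue                                 # too sparse for its box -> noise
        out[labels == i] = 255                       # looks like a long, thin edge segment
    return out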
For such tasks you would usually carefully examine the input data and try to figure out what cues you can utilize. Unfortunately you have provided only one example, which makes this approach pretty useless. Besides, this representation is not really comfortable to work with - have you done some preprocessing, or is this what you get as input? In the first case, you may get better advice if you can show us the real input.
Next, if your goal is noise reduction and not document/background segmentation, you are really limited in options. Similar to what you said, I would try to detect connected components with intensity 255 (instead of detecting contours, which can be less robust) and remove the ones with small area. That may fail on certain instances.
Besides, on the image you have provided you can use local statistics to suppress areas of regular noise. This will reduce background clutter if you select the neighborhood size appropriately.
But again, if you are doing this for document detection - there may be more robust approaches.
For example, if you know the foreground object (a driver's ID), you can try to collect a dataset of ID images and calculate the 'typical' color histogram - it may be rather characteristic. After that, you can backproject this histogram onto the input image and get either a rough region of interest or maybe even a precise mask. Then you may binarize it and try to detect contours. You may try different color spaces and bin sizes to see which fits best.
If you have to work in different lighting conditions you can try to equalize histogram or do some other preprocessing to reduce color variation caused by lighting.
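A rough sketch of the histogram back-projection idea with OpenCV, assuming you have a list of example document images from which to build the "typical" hue/saturation histogram (the paths, bin counts and threshold are illustrative):

import cv2
import numpy as np

def build_document_histogram(sample_paths, bins=(30, 32)):
    hist = np.zeros(bins, dtype=np.float32)
    for p in sample_paths:
        hsv = cv2.cvtColor(cv2.imread(p), cv2.COLOR_BGR2HSV)
        hist += cv2.calcHist([hsv], [0, 1], None, list(bins), [0, 180, 0, 256])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    return hist

def rough_document_mask(img_bgr, hist):
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    back = cv2.calcBackProject([hsv], [0, 1], hist, [0, 180, 0, 256], scale=1)
    _, mask = cv2.threshold(back, 50, 255, cv2.THRESH_BINARY)   # rough region of interest
    return mask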
Strictly answering the question for the binary image (i.e. after the harm has been done):
What seems characteristic of the edge pixels, as opposed to noise, is that they form (relatively) long and smooth chains.
So far I see no better way than tracing all chains of 8-connected pixels, for instance with a contour following algorithm, and detecting the straight sections, for example by Douglas-Peucker simplification.
As the noise is only on the outside of the card, the outline of the blobs will have at least one "clean" section. Keep the sections that are long enough.
This may destroy the curved corners as well, so you should actually look for the "smooth" paths that are long enough.
Unfortunately, I cannot recommend any specific algorithm to address that. It should probably be based on graph analysis combined with geometry (enumerating long paths in a graph and checking the local/global curvature).
As far as I know (after reading thousands related articles), this is nowhere addressed in the literature.
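To give the contour-tracing idea above some shape, here is a sketch using OpenCV's contour tracing and approxPolyDP (a Douglas-Peucker simplification). The epsilon and minimum segment length are illustrative, and this keeps only straight sections, not the smooth curved corners mentioned above:

import cv2
import numpy as np

def long_straight_sections(binary, eps=3.0, min_len=100):
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    segments = []
    for c in contours:
        poly = cv2.approxPolyDP(c, eps, True).reshape(-1, 2)   # simplified chain vertices
        for a, b in zip(poly, np.roll(poly, -1, axis=0)):      # consecutive vertex pairs
            if np.hypot(*(b - a).astype(float)) >= min_len:    # keep only long sections
                segments.append((tuple(a), tuple(b)))
    return segments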
None of the previous answers would really work; the only thing that can work here is a blob filter: filter out blobs below a certain size.

Pulling non-transparent areas to the center of the transparent areas in an image

I am working on an image processing project which has a few steps, and I am stuck on one of them. Here is the thing: I have segmented an image and subtracted the foreground from the background. Now, I need to fill the background.
So far, I have tried inpainting algorithms. They don't work in my case because my background images are missing at least 40% of their content; I mean the algorithms fail when they have to complete 40% of an image. (By the way, these images give bad results even in Photoshop with the content-aware tool.)
Anyway, I've given up on inpainting and decided on something else. In my project, I don't need to complete 100% of my background. I want to illustrate my solution:
As you see in the image above, I want to pull the image to the black area (which is transparent) with minimum corruption. Any MATLAB code samples, technique, keyword and approach would be great. If you need further explanation, feel free to ask.
I can think of two crude ways to fill the hole:
use roifill: this fills gaps in a 2D image while preserving image smoothness.
Alternatively, you can use bwdist to find the nearest non-hole neighbor of each black (hole) pixel and assign it that neighbor's color:
[~, nnIdx] = bwdist( bw );           % assuming bw is true at the known (non-transparent) pixels
fillImg = img;
fillImg( ~bw ) = img( nnIdx(~bw) );  % copy each hole pixel from its nearest known pixel
although this code snippet works only for gray images, it is quite trivial to extend it to RGB color images.
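For reference, a rough Python/SciPy equivalent of the same nearest-neighbor fill (the function and variable names are illustrative; it assumes a 2-D grayscale array and a boolean hole mask):

import numpy as np
from scipy import ndimage

def nearest_fill(img, hole_mask):
    # distance_transform_edt measures distance to the nearest zero element, so the
    # holes must be nonzero and the known pixels zero in the input mask.
    idx = ndimage.distance_transform_edt(hole_mask,
                                         return_distances=False, return_indices=True)
    nearest = img[tuple(idx)]          # value of the nearest known pixel, per position
    out = img.copy()
    out[hole_mask] = nearest[hole_mask]
    return out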

How do I locate black rectangles in a grid and extract the binary code from that

I'm working on a project to recognize a bit code from an image like this, where a black rectangle represents a 0 bit and white (white space, not visible) a 1 bit.
Does anybody have an idea of how to process the image in order to extract this information? My project is written in Java, but any solution is accepted.
Thanks all for the support.
I'm not an expert in image processing. I tried to apply edge detection using a Canny edge detector implementation (a free Java implementation found here). I used this complete image [http://img257.imageshack.us/img257/5323/colorimg.png], reduced it (scale factor = 0.4) to have fast processing, and this is the result [http://img222.imageshack.us/img222/8255/colorimgout.png]. Now, how can I decode a white rectangle as a 0 bit value, and no rectangle as 1?
The image has 10 lines x 16 columns. I don't use Python, but I can try to convert it to Java.
Many thanks for the support.
This is recognising good old OMR (optical mark recognition).
The solution varies depending on the quality and consistency of the data you get, so noise is important.
Using an image processing library will clearly help.
Simple case: No skew in the image and no stretch or shrinkage
Create a horizontal and a vertical profile of the image, i.e. sum up the values in all columns and all rows and store them in arrays. For an image of MxN (width x height) you will have M cells in the horizontal profile and N cells in the vertical profile.
Use thresholding to find out which cells are white (empty) and which are black. This assumes you will get at least a couple of entries in each row or column, so the black cells will define the locations of interest (where you expect the marks).
Based on this, you can define the lozenges in the form: you get the coordinates of the lozenges (rectangles where you expect marks), then you just add up the pixel values in each lozenge and, based on that number, decide whether it has a mark or not (see the sketch after these cases).
Case 2: Skew (slant in the image)
Use a Fourier transform (FFT) to find the slant value and then transform the image to correct it.
Case 3: Stretch or shrink
Pretty much the same as the simple case, but the noise is higher and the reliability lower.
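A minimal numpy sketch of the simple (no skew, no stretch) case above; the 0.5 profile threshold and the 0.3 fill fraction are illustrative values you would tune on real forms:

import numpy as np

def mark_grid_positions(binary):
    # binary: 2-D array with marks as 1 (dark) and background as 0.
    col_profile = binary.sum(axis=0)              # M values for an M-pixel-wide image
    row_profile = binary.sum(axis=1)              # N values for an N-pixel-high image
    cols = np.where(col_profile > 0.5 * col_profile.max())[0]
    rows = np.where(row_profile > 0.5 * row_profile.max())[0]
    return rows, cols                             # rows/columns where marks are expected

def lozenge_has_mark(binary, top, bottom, left, right, min_fraction=0.3):
    cell = binary[top:bottom, left:right]
    return cell.mean() > min_fraction             # enough dark pixels -> marked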
Aliostad has made some good comments.
This is OMR and you will find it much easier to get good consistent results with a good image processing library. www.leptonica.com is a free open source 'C' library that would be a very good place to start. It could process the skew and thresholding tasks for you. Thresholding to B/W would be a good start.
Another option would be IEvolution - http://www.hi-components.com/nievolution.asp for .NET.
To be successful you will need some type of reference / registration marks to allow for skew and stretch especially if you are using document scanning or capturing from a camera image.
I am not familiar with Java, but in Python you can use the Python Imaging Library (PIL) to open the image. Then get the height and the width, and segment the image into a grid accordingly, by height/rows and width/cols. Then just look for black pixels in those regions, or whatever value PIL registers that black to be. This obviously relies on the grid-like nature of the data.
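A small Pillow sketch of that grid approach (the 10 x 16 grid comes from the question; the darkness threshold of 128 is an illustrative guess):

from PIL import Image

def read_bits(path, rows=10, cols=16, dark_threshold=128):
    img = Image.open(path).convert("L")
    cell_w, cell_h = img.width / cols, img.height / rows
    bits = []
    for r in range(rows):
        row_bits = []
        for c in range(cols):
            box = (int(c * cell_w), int(r * cell_h),
                   int((c + 1) * cell_w), int((r + 1) * cell_h))
            cell = img.crop(box)
            mean = sum(cell.getdata()) / (cell.width * cell.height)
            row_bits.append(0 if mean < dark_threshold else 1)   # black rectangle = 0 bit
        bits.append(row_bits)
    return bits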
Edit:
Doing edge detection may also be fruitful. First apply an edge detection method, like something from Wikipedia. I have used the one found at archive.alwaysmovefast.com/basic-edge-detection-in-python.html. Then convert any grayscale value less than 180 into black (if you want the boxes darker, just increase this value) and make everything else completely white. Then create bounding boxes along the lines where the pixels are all white. If the data isn't terribly skewed, this should work pretty well; otherwise you may need to do more work. See here for the results: http://imm.io/2BLd
Edit2:
Denis, how large is your dataset and how large are the images? If you have thousands of these images, then it is not feasible to manually remove the borders (the red background and yellow bars). I think this is important to know before proceeding. Also, I think Prewitt edge detection may prove more useful in this case, since there appears to be less noise:
The previous segmentation method may be applied if you binarize the image in this manner as preprocessing, in which case you only need to count the number of black or white pixels and pick a threshold after some training samples.
