I am working on a project in which I need to highlight the differences between a pair of scanned images of text.
Example images are here and here.
I am building a web app based on HTML and JS for this.
I found that OpenCV supports highlighting differences between two images.
I also saw that ImageMagick has such support.
Does OpenCV support automatic registration of images?
And is there a JS module for OpenCV?
Which one is more suited for my purpose?
1. Simplistic way:
Suppose the images are perfectly aligned and similarly illuminated: subtract one image from the other pixel by pixel, then threshold the result, filter out noisy blobs, and select the biggest ones. Good for a school project.
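A minimal sketch of that simplistic way, in plain Python on grayscale images stored as lists of rows. The function names, the threshold value and the "no neighbour" noise rule are assumptions made for illustration; a real implementation would more likely use OpenCV or ImageMagick.

```python
def diff_mask(img_a, img_b, threshold=40):
    """Return a binary mask: 1 where the two grayscale images differ by more than threshold."""
    if len(img_a) != len(img_b) or any(len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("images must have the same dimensions")
    return [
        [1 if abs(pa - pb) > threshold else 0 for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


def remove_isolated_pixels(mask):
    """Drop mask pixels with no 4-connected neighbour set (a crude noise filter)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            if any(0 <= ny < h and 0 <= nx < w and mask[ny][nx] for ny, nx in neighbours):
                out[y][x] = 1
    return out


page_a = [
    [255, 255, 255, 255],
    [255,   0,   0, 255],   # two dark "ink" pixels
    [255, 255, 255, 255],
]
page_b = [
    [255, 255, 255,  10],   # a lone noisy pixel at top right
    [255, 255, 255, 255],   # the ink pixels are missing here
    [255, 255, 255, 255],
]
mask = remove_isolated_pixels(diff_mask(page_a, page_b))
for row in mask:
    print(row)
```

Only the two adjacent ink pixels survive the noise filter; the lone pixel is discarded.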
2. A bit more complicated:
Align the images, then find a way to even out the illumination, then apply the simplistic way.
How to align:
Find the text area in the two images, as the region darker than the page color.
Find its corners
Use getPerspectiveTransform() to find the transform between images.
Use warpPerspective() to warp one image onto the other.
Another way to register the two images is by feature matching. OpenCV has quite extensive support for it, and findHomography() will estimate the transform between the two images from a bigger set of matching points.
3. Canonical answer:
Align the image.
Convert it to text with an OCR engine.
Compare the text in the two images.
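Once both pages have been turned into text, the comparison step can be sketched with Python's standard difflib module. The OCR step itself is not shown; the two strings below stand in for OCR output, and text_differences is a hypothetical helper name.

```python
import difflib


def text_differences(text_a, text_b):
    """Return (operation, words_from_a, words_from_b) for every non-matching span."""
    words_a, words_b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    changes = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, " ".join(words_a[a0:a1]), " ".join(words_b[b0:b1])))
    return changes


changes = text_differences("the quick brown fox", "the quick red fox jumps")
for change in changes:
    print(change)
```

Comparing word by word rather than character by character keeps the report readable when OCR output differs only in whitespace.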
Well, besides the great help given by vasile, you also need the web app answer.
In order to make it work on a server, you will probably need a file upload form, as well as a response from the server with the result of the applied algorithm. There are several ways you can do it, depending on the server restrictions you have. If you can run command-line programs, you would probably need to implement the highlight algorithm in OpenCV and pass the two input files and an output file to the program. A PHP script could then handle uploading the files, calling the command-line program, and returning the result to the user.
Another approach could be using java and JavaCV in a web container like Apache Tomcat, for instance.
Best regards,
Daniel
I am learning image processing and I am trying to start my first project: simple number recognition in an image.
So far I have applied thresholding to the image. Now I would like to know some algorithms by which my system can recognize the number in the image. Preferably the algorithm should be simple, and it doesn't have to be robust, as I will be generating the image in Paint using the same font.
I have looked at the similar questions here on SO and they all point to using libraries. Remember, I am trying to learn, so please don't point me to libraries.
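Since the digits always come from the same font, one library-free option is plain template matching: store one reference bitmap per digit and pick the one with the fewest mismatching pixels. The sketch below assumes the thresholded image has already been cut into one small binary glyph per digit; the 3x5 templates are made up for illustration.

```python
# Hypothetical 3x5 reference bitmaps; '#' is ink, '.' is background.
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}


def to_bits(rows):
    """Convert rows of '#'/'.' characters into rows of 1/0 integers."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def mismatch(glyph, template):
    """Count the pixels where the glyph and the template disagree."""
    return sum(g != t for grow, trow in zip(glyph, template) for g, t in zip(grow, trow))


def recognize(glyph):
    """Return the digit whose template has the fewest mismatching pixels."""
    scores = {digit: mismatch(glyph, to_bits(rows)) for digit, rows in TEMPLATES.items()}
    return min(scores, key=scores.get)


noisy_seven = to_bits(["###", "..#", ".#.", "##.", ".#."])  # one pixel flipped
print(recognize(noisy_seven))
```

This only works while the glyphs have the same size and position as the templates, which is the situation described in the question.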
Are the numbers printed or hand-written?
The Computer Vision System Toolbox includes a function called ocr, which will recognize both letters and numbers.
If you are looking for hand-written digit recognition, please take a look at this example.
I really want to learn how an image is composed (i.e. array of bits, or however, how is the color composed for each pixel, etc). Can you point me in the right direction? I'm not really sure what to search for.
Thanks a lot in advance.
So what I want to do is to be able to modify the picture programmatically, i.e. change it to black and white, scale it, crop it, etc., and for this I would really like to learn how the image is composed instead of just finding these algorithms online.
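As a starting point, a decoded raster image can be thought of as a grid of pixels, each an (R, G, B) triple of 0-255 values. The sketch below works on such a grid held in nested Python lists; the helper names are hypothetical, and the grayscale weights are the common Rec. 601 luma coefficients, which is one of several reasonable choices.

```python
def to_grayscale(image):
    """Weighted sum of R, G, B per pixel (Rec. 601 luma weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in image]


def to_black_and_white(gray, threshold=128):
    """Map every gray value to pure black (0) or pure white (255)."""
    return [[255 if value >= threshold else 0 for value in row] for row in gray]


def crop(image, left, top, width, height):
    """Keep only the rectangle starting at (left, top)."""
    return [row[left:left + width] for row in image[top:top + height]]


# A 2x2 image: red, green on the first row; blue, white on the second.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = to_grayscale(image)
print(gray)
print(to_black_and_white(gray))
print(crop(image, 1, 0, 1, 2))
```

File formats such as PNG, GIF or JPEG add compression and metadata on top of this grid, which is why decoding is usually left to a library.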
You don't always need to know the low-level mathematical details (matrices, quantisation, Fourier transform, etc.) of graphic formats to manipulate images.
For all the things you want to do you may use proper libraries.
For example, in PHP the libraries frequently used to manipulate images are:
GD - http://php.net/manual/en/book.image.php
ImageMagick - http://php.net/manual/en/book.imagick.php
It depends on the image format that you're interested in manipulating. Each format (more or less) is composed in a different manner, and based on that has a different set of capabilities for manipulating the image.
Different sets of actions on an image favor different image formats, as does the type of image you want to manipulate.
Provide more details about what you want to do with the image and I'm sure someone else will come along and tell you which formats are best and how they are handled.
With my new assignment I am looking for a method to detect the presence of text in an image. The image is a map, for example a Google map. The task is to detect where the street/city labels are placed.
I know that the OpenCV library has algorithms that can detect features (for example human faces), such as Haar classifiers or HOG (histogram of oriented gradients), but I heard that the training process for such algorithms is quite difficult.
Do you know of any algorithm, method or library that could do that (detect the presence of text in an image)?
Thanks,
John
There is a standard problem in vision called text detection in images. It is quite different from OCR. OCR concerns itself with what the text says, while text detection is about determining whether there is text in the image. Adi Shavit's third link is a method to address this problem. You can also look on Google Scholar for well-cited articles on text detection.
There are several possible approaches you can take.
Use OCR. A search for OCR on Stackoverflow will show many options. These include Tesseract and Ocropus.
If your text uses very specific fixed font, you may get away with simple template matching.
In the more general case you might want to take a look at "Detecting Text in Natural Scenes with Stroke Width Transform"
UPDATE Jan. 2017
The OpenCV 3.2 contrib module now has a text detection module.
It also includes a sample (C++, Python) of how to use it.
You need to tune this to a specific type of map images, or the problem is going to be very difficult (see the previous post about links to articles).
OCR is the way to go, and you should use an existing library. However, OCR is mainly done on text on white backgrounds. To reduce your problem to a regular OCR problem, you should attempt to work on the color space of the map. Likely the map text has a very specific color and this may be enough to find these pixels. You can then filter the detected pixels based on the size of connected regions.
If you literally only want to find the locations of text labels, you can do the above, and pretty much just skip the OCR step. If the labels are not too close, simple clustering algorithms can be used to find their respective positions.
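The colour filter plus connected-region step can be sketched in plain Python as below. The label colour, the tolerance and the minimum region size are assumptions that would have to be tuned for a real map; the function names are hypothetical.

```python
from collections import deque


def color_mask(image, target, tolerance=30):
    """True where every channel of a pixel is within tolerance of the target colour."""
    return [
        [all(abs(c - t) <= tolerance for c, t in zip(pixel, target)) for pixel in row]
        for row in image
    ]


def connected_regions(mask, min_size=2):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected regions of at least min_size pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_size:
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


B = (200, 220, 200)  # map background (assumed)
T = (20, 20, 20)     # label text colour (assumed)
tile = [
    [B, T, T, T, B, B],
    [B, B, B, B, B, T],  # a single stray dark pixel
    [B, B, B, B, B, B],
]
boxes = connected_regions(color_mask(tile, T), min_size=2)
print(boxes)
```

The stray pixel is dropped by the size filter, and the remaining box marks the label position without running OCR at all.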
I need to compare 2 same-size, nearly identical images for exact differences in the RGBs of every pixel.
I would like to find a tool that already does it... seems nowhere to be found on google, strangely.
If I could even find a tool to print out the RGB values of every pixel I could compute it by hand (the images are small enough) or load that input for my tool. Again, couldn't find anything.
Otherwise I look for a simple C library to decode GIFs and access each pixel... recommendations? I see quite a few on google, most look old and have no documentation.
I hope someone with more exposure to image processing can help me solve this somewhat trivial task in one way or another without spending too many hours!
If you have ImageMagick installed, it already does it.
What about SDL + SDL_Image (main site)?
You can easily open GIFs and load them onto SDL_Surfaces to retrieve the pixel information you need.
If you're not opposed to Python, one option would be to use the Python Imaging Library (PIL), which provides Python bindings for native decoders for many file formats, including PNG and GIF.
This past summer, I wrote a few small apps to do RGB-wise comparisons of PNG images, in C++, pure Python, and Python using PIL. It would be trivial to make the PIL code work with GIF images.
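Whichever decoder is used, the comparison itself is short once both images are available as flat sequences of (R, G, B) tuples in row-major order. This is only a sketch of that step, not the code mentioned above; rgb_differences is a hypothetical name and the pixel lists are hard-coded instead of being loaded from files.

```python
def rgb_differences(pixels_a, pixels_b, width):
    """Return ((x, y), rgb_a, rgb_b) for every pixel whose RGB values differ."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same number of pixels")
    diffs = []
    for index, (a, b) in enumerate(zip(pixels_a, pixels_b)):
        if a != b:
            diffs.append(((index % width, index // width), a, b))
    return diffs


# Two 2x2 images that differ only in the green channel of the bottom-left pixel.
image_a = [(0, 0, 0), (255, 255, 255), (10, 20, 30), (1, 2, 3)]
image_b = [(0, 0, 0), (255, 255, 255), (10, 21, 30), (1, 2, 3)]
diffs = rgb_differences(image_a, image_b, width=2)
print(diffs)
```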
If you want to roll your own, the "standard" C library for simple image manipulation is GD.
Beyond Compare will do image comparisons and highlights differences.
http://www.scootersoftware.com/
Sometimes two image files may be different on a file level, but a human would consider them perceptively identical. Given that, now suppose you have a huge database of images, and you wish to know if a human would think some image X is present in the database or not. If all images had a perceptive hash / fingerprint, then one could hash image X and it would be a simple matter to see if it is in the database or not.
I know there is research around this issue, and some algorithms exist, but is there any tool, like a UNIX command line tool or a library I could use to compute such a hash without implementing some algorithm from scratch?
edit: relevant code from findimagedupes, using ImageMagick
try $image->Sample("160x160!");
try $image->Modulate(saturation=>-100);
try $image->Blur(radius=>3,sigma=>99);
try $image->Normalize();
try $image->Equalize();
try $image->Sample("16x16");
try $image->Threshold();
try $image->Set(magick=>'mono');
($blob) = $image->ImageToBlob();
edit: Warning! ImageMagick $image object seems to contain information about the creation time of an image file that was read in. This means that the blob you get will be different even for the same image, if it was retrieved at a different time. To make sure the fingerprint stays the same, use $image->getImageSignature() as the last step.
findimagedupes is pretty good. You can run "findimagedupes -v fingerprint images" to let it print "perceptive hash", for example.
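To give a feel for what such a fingerprint does, here is a small "average hash" sketch in plain Python. It is not findimagedupes' exact pipeline (no blur, normalize or equalize steps); it only mirrors the shrink-then-threshold idea, and the grid size and helper names are assumptions.

```python
def average_hash(gray, grid=4):
    """Shrink a grayscale image to grid x grid block averages, then emit one bit
    per cell: 1 if the cell is brighter than the mean of all cells."""
    h, w = len(gray), len(gray[0])
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


original = [[0] * 4 + [200] * 4 for _ in range(8)]             # dark left, bright right
brighter = [[min(255, v + 20) for v in row] for row in original]
inverted = [[255 - v for v in row] for row in original]

print(hamming(average_hash(original), average_hash(brighter)))
print(hamming(average_hash(original), average_hash(inverted)))
```

A small Hamming distance between two fingerprints suggests the images look alike; the acceptable distance is a tuning choice. The sketch assumes the image is at least grid pixels wide and high.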
Cross-correlation or phase correlation will tell you if the images are the same, even with noise, degradation, and horizontal or vertical offsets. Using the FFT-based methods will make it much faster than the algorithm described in the question.
The usual algorithm doesn't work for images that are not the same scale or rotation, though. You could pre-rotate or pre-scale them, but that's really processor intensive. Apparently you can also do the correlation in a log-polar space and it will be invariant to rotation, translation, and scale, but I don't know the details well enough to explain that.
MATLAB example: Registering an Image Using Normalized Cross-Correlation
Wikipedia calls this "phase correlation" and also describes making it scale- and rotation-invariant:
The method can be extended to determine rotation and scaling differences between two images by first converting the images to log-polar coordinates. Due to properties of the Fourier transform, the rotation and scaling parameters can be determined in a manner invariant to translation.
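For intuition about the translation case only, the following brute-force sketch evaluates a plain (un-normalized) cross-correlation for every small shift and keeps the best one. FFT-based methods reach an equivalent correlation surface much faster; this version just makes the idea visible. The max_shift parameter and the function name are assumptions, and rotation or scale are not handled.

```python
def best_offset(reference, moved, max_shift=2):
    """Return the (dx, dy) shift of `moved` relative to `reference` that gives
    the highest cross-correlation, searching shifts up to max_shift pixels."""
    h, w = len(reference), len(reference[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0
            for y in range(h):
                for x in range(w):
                    sy, sx = y + dy, x + dx
                    if 0 <= sy < h and 0 <= sx < w:  # only the overlapping area counts
                        score += reference[y][x] * moved[sy][sx]
            if best is None or score > best[0]:
                best = (score, dx, dy)
    return best[1], best[2]


reference = [[0] * 5 for _ in range(5)]
reference[1][1] = 1          # a single bright dot at x=1, y=1
moved = [[0] * 5 for _ in range(5)]
moved[2][3] = 1              # the same dot at x=3, y=2

print(best_offset(reference, moved))
```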
Colour histogram is good for the same image that has been resized, resampled etc.
If you want to match different people's photos of the same landmark it's trickier; look at Haar classifiers. OpenCV is a great free library for image processing.
I don't know the algorithm behind it, but Microsoft Live Image Search just added this capability. Picasa also has the ability to identify faces in images, and groups faces that look similar. Most of the time, it's the same person.
Some machine learning technology like a support vector machine, neural network, naive Bayes classifier or Bayesian network would be best at this type of problem. I've written one each of the first three to classify handwritten digits, which is essentially image pattern recognition.
Resize each image to 1x1 pixel. If the two pixels match, there is a small probability that they are the same picture.
Now resize them to 2x2 pixels. If all 4 pixels match, there is a larger probability that they are the same.
Then 3x3: if all 9 pixels match, there is a good chance, and so on.
Then 4x4: if all 16 pixels match, a better chance still.
etc...
Doing it this way, you can make efficiency improvements: if the 1x1 pixel grid is off by a lot, why bother checking the 2x2 grid?
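A sketch of this coarse-to-fine check in plain Python, using block averages as the "resize" and a tolerance instead of exact equality (both are assumptions, as are the function names). It stops at the first grid size where the images disagree.

```python
def downsample(gray, size):
    """Shrink a grayscale image to size x size by averaging blocks."""
    h, w = len(gray), len(gray[0])
    small = []
    for gy in range(size):
        y0, y1 = gy * h // size, (gy + 1) * h // size
        row = []
        for gx in range(size):
            x0, x1 = gx * w // size, (gx + 1) * w // size
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def probably_same(img_a, img_b, max_size=4, tolerance=8):
    """Compare at 1x1, 2x2, ... max_size grids; return (verdict, last grid size checked)."""
    for size in range(1, max_size + 1):
        small_a, small_b = downsample(img_a, size), downsample(img_b, size)
        for row_a, row_b in zip(small_a, small_b):
            for a, b in zip(row_a, row_b):
                if abs(a - b) > tolerance:
                    return False, size   # early exit: no need to look at finer grids
    return True, max_size


left_dark = [[0] * 4 + [200] * 4 for _ in range(8)]
mirrored = [row[::-1] for row in left_dark]               # same average brightness
slightly_off = [[v + 3 for v in row] for row in left_dark]

print(probably_same(left_dark, mirrored))
print(probably_same(left_dark, slightly_off))
```

The mirrored image passes the 1x1 test, since the overall brightness is identical, and is rejected at 2x2, which is exactly the early exit the answer describes.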
If you have lots of images, a color histogram could be used to get rough closeness of images before doing a full image comparison of each image against each other one (i.e. O(n^2)).
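One possible way to do that pre-filtering is sketched below: a coarse, normalized RGB histogram per image, compared by histogram intersection (1.0 means identical colour distributions, 0.0 means nothing in common). The number of bins and the choice of intersection as the similarity measure are assumptions.

```python
def color_histogram(pixels, bins=4):
    """Normalized histogram over bins**3 coarse RGB cells (bins must divide 256)."""
    step = 256 // bins
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
    total = len(pixels)
    return [count / total for count in hist]


def histogram_intersection(hist_a, hist_b):
    """Sum of per-bin minima of two normalized histograms."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


mostly_red = [(250, 10, 10)] * 3 + [(10, 10, 250)]
upscaled = [(250, 10, 10)] * 6 + [(10, 10, 250)] * 2   # same colours, twice the pixels
all_green = [(10, 250, 10)] * 4

print(histogram_intersection(color_histogram(mostly_red), color_histogram(upscaled)))
print(histogram_intersection(color_histogram(mostly_red), color_histogram(all_green)))
```

Because the histogram is normalized, a resized copy scores the same as the original, while spatial layout is ignored entirely, so this is only a rough first pass.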
There is DPEG, "The" Duplicate Media Manager, but its code is not open. It's a very old tool - I remember using it in 2003.
You could use diff to see if they are REALLY different. I guess it will remove lots of useless comparisons. Then, for the algorithm, I would use a probabilistic approach: what are the chances that they look the same? I'd base that on the amount of RGB in each pixel. You could also use other metrics, such as luminosity.