Determining which are the text and graphic regions in an image - algorithm

I don't know whether I should post this question here, but if someone knows the answer, please reply.
What algorithms determine which regions of an image are text and which are graphics (figures or diagrams)? In other words, how can such regions be separated?

Most OCR software, e.g., Ocropus, supports layout analysis, which is what you need.
Mao, Rosenfeld & Kanungo (2003), "Document structure analysis algorithms: a literature survey", provides a fairly recent survey of layout analysis algorithms.

A first step would probably be to isolate the sharp contrast between text and image. This can be done by taking the derivative of the image, which shows the change in color. Regions with high values can then be compared against textual shapes.
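A minimal sketch of the derivative idea, assuming a grayscale image stored as a list of rows of 0-255 intensities. The `edge_density` helper and its threshold of 64 are illustrative assumptions, not part of any OCR package. The intuition is that text regions tend to contain more strong gradients per pixel than smooth graphic regions.

```python
def gradient_magnitude(gray):
    """Approximate |dI/dx| + |dI/dy| with forward differences.

    `gray` is a list of rows of intensities; the result has the same
    shape, and differences past the last row/column count as zero.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = gray[y][x + 1] - gray[y][x] if x + 1 < w else 0
            dy = gray[y + 1][x] - gray[y][x] if y + 1 < h else 0
            out[y][x] = abs(dx) + abs(dy)
    return out


def edge_density(gray, threshold=64):
    """Fraction of pixels whose gradient magnitude exceeds `threshold`."""
    grad = gradient_magnitude(gray)
    strong = sum(1 for row in grad for g in row if g > threshold)
    return strong / (len(gray) * len(gray[0]))
```

Computing `edge_density` over sliding windows and thresholding it would give a rough text/graphic map, though real layout analysis does considerably more than this.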


Logo recognition with a huge dataset

First of all, thanks for reading my question. I'm a beginner in computer vision.
I read a lot but I didn't find any solution.
I have an image and I want to detect logo/logos on it.
I also have a whole set of images with different logos, each image containing a logo and nothing more.
Can you help me with any idea of how to detect logos in an image when I have a whole training set (thousands of known logos)?
It can be done with the SURF or SIFT feature detection algorithms for a few known logos, by matching the given image against all of the others, but I have a huge dataset and can't match against every image.
Trying all the images in the dataset takes far too much time :)
Would any SDK be useful? (It can be for mobile phones or for desktop.)
Or can I combine multiple algorithms for it?
I found an interesting paper about this question with a SIGMA algorithm, but I can't find any description of the algorithm :(
I think detecting the features in the images is OK (SIFT, maybe SURF).
But I think the problem is the large number of known images/logos.
I think they should be stored in a special way.
For example, build a tree somehow from the thousands of known logos, or separate them into groups.
Is it possible to do this task?
I appreciate any help.
The thousands of training images are useful only for testing your algorithm; they will not help analyze a new image.
I did a bit of pattern recognition in the past, and I would start this way: look for sharp edges (sharp color transitions too). So, an edge filter and a statistical analysis of features all located in the same corner. The result of the algorithm will be a number that you will use with your training set.
Since you are doing original research, be prepared for a lot of work. If an SDK with a function "ImageHasLogo()" already exists, you will find it on Google.

Algorithm to detect presence of text on image

For my new assignment I am looking for a method to detect the presence of text in an image. The image is a map, for example a Google map. The task is to detect where the street/city labels are placed.
I know that the OpenCV library has algorithms that can detect features (for example human faces), such as the Haar classifier or HOG (histogram of oriented gradients), but I have heard that the training process for such algorithms is quite difficult.
Do you know of any algorithm, method or library that could do that (detect the presence of text in an image)?
There is a standard problem in vision called text detection in images. It is quite different from OCR: OCR concerns itself with what the text says, while text detection is about determining whether there is text in the image. Adi Shavit's third link is a method that addresses this problem. You can also search Google Scholar for well-cited articles on text detection.
There are several possible approaches you can take.
Use OCR. A search for OCR on Stack Overflow will show many options. These include Tesseract and Ocropus.
If your text uses a very specific fixed font, you may get away with simple template matching.
In the more general case you might want to take a look at "Detecting Text in Natural Scenes with Stroke Width Transform".
UPDATE Jan. 2017
The OpenCV 3.2 contrib module now has a text detection module.
It also includes a sample (C++, Python) of how to use it.
You need to tune this to a specific type of map images, or the problem is going to be very difficult (see the previous post about links to articles).
OCR is the way to go, and you should use an existing library. However, OCR is mainly done on text on white backgrounds. To reduce your problem to a regular OCR problem, you should attempt to work on the color space of the map. Likely the map text has a very specific color and this may be enough to find these pixels. You can then filter the detected pixels based on the size of connected regions.
If you literally only want to find the locations of text labels, you can do the above, and pretty much just skip the OCR step. If the labels are not too close, simple clustering algorithms can be used to find their respective positions.
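A rough sketch of the colour-filter-plus-connected-regions idea from this answer. It assumes the image is a list of rows of (r, g, b) tuples; the `is_text_color` predicate and the `min_size` cutoff are hypothetical and would have to be tuned for the particular map style.

```python
from collections import deque


def label_regions(image, is_text_color, min_size=3):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected regions
    whose pixels satisfy `is_text_color`, dropping regions with fewer
    than `min_size` pixels."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx] or not is_text_color(image[sy][sx]):
                continue
            # Breadth-first flood fill from this seed pixel.
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            x0 = x1 = sx
            y0 = y1 = sy
            size = 0
            while queue:
                x, y = queue.popleft()
                size += 1
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h and not seen[ny][nx]
                            and is_text_color(image[ny][nx])):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if size >= min_size:
                boxes.append((x0, y0, x1, y1))
    return boxes
```

For example, `label_regions(img, lambda p: sum(p) < 100)` would keep dark regions only. Nearby boxes could then be merged by a simple clustering step to get one position per label.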

How does Content-Aware fill work?

In the upcoming version of Photoshop there is a feature called Content-Aware fill.
This feature will fill a selection of an image based on the surrounding image - to the point it can generate bushes and clouds while being seamless with the surrounding image.
See for a preview of the Photoshop feature I'm talking about.
My question is:
How does this feature work algorithmically?
I am a co-author of the PatchMatch paper previously mentioned here, and I led the development of the original Content-Aware Fill feature in Photoshop, along with Ivan Cavero Belaunde and Eli Shechtman in the Creative Technologies Lab, and Jeff Chien on the Photoshop team.
Photoshop's Content-Aware Fill uses a highly optimized, multithreaded variation of the algorithm described in the PatchMatch paper, and an older method called "SpaceTime Video Completion." Both papers are cited on the following technology page for this feature:
You can find out more about us on the Adobe Research web pages.
I'm guessing that for the smaller holes they are grabbing similarly textured patches surrounding the area to fill it in. This is described in a paper entitled "PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing" by Connelly Barnes and others in SIGGRAPH 2009. For larger holes they can exploit a large database of pictures with similar global statistics or texture, as described in "Scene Completion Using Millions of Photographs". If they somehow could fuse the two together, I think it should work like in the video.
There has been a very similar algorithm for GIMP for quite a long time. It is called Resynthesizer, and you should be able to find the source for it (maybe at the project site).
There is also source available in the Ubuntu repository.
And here you can see processing the same images with GIMP:
Well, they are not going to tell for the obvious reasons. The general name for the technique is "inpainting", you can look this up.
Specifically, if you look at what Criminisi did while in Microsoft and what Todor Georgiev does now at Adobe, you'll be able to make a very good guess. A 90% guess, I'd say, which should be good enough.
I work on a similar problem. From what I have read, they use "PatchMatch" or "non-parametric patch sampling" in general.
PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing
As a guess (and that's all that it would be) I'd expect that it does some frequency analysis (something like a Fourier transform) of the image. By looking only at the image at the edge of the selection and ignoring the middle, it could then extrapolate back into the middle. If the designers choose the correct color planes and whatnot, they should be able to generate a texture that seamlessly blends into the image at the edges.
Edit: looking at the last example in the video, if you look at the top of the original image on either edge you see that the selection line runs right down a "gap" in the clouds and that right in the middle there is a "bump". These are the kind of artifacts I'd expect to see if my guess is correct. (OTOH, I'd also expect to see them if it was using some kind of pseudo-mirroring across the selection boundary.)
The general approach is either content-aware fill or seam-carving. Ariel Shamir's group is responsible for the seminal work here, which was presented in SIGGRAPH 2007. See:
Edit: Please see answer from the co-author of Content-Aware fill. I will be deleting this soon.

Computing the difference between images

Do you guys know of any algorithms that can be used to compute difference between images?
Take this webpage for example. You give it a link or upload an image and it finds similar images. I doubt that it compares the image in question against all of them (or maybe it does).
By "compute" I mean something like what the Levenshtein distance or the Hamming distance is for strings.
By no means do I need the correct answer for a project or anything; I just found the website and got very curious. I know Digg pays for a similar service for their website.
The very simplest measures are going to be RMS-error based approaches, for example:
Root Mean Square Deviation
Peak Signal to Noise Ratio
These probably gel with your notions of distance measures, but their results are really only meaningful if you've got two images that are very close already, like if you're looking at how well a particular compression scheme preserved the original image. Also, the same result from either comparison can mean a lot of different things, depending on what kind of artifacts there are (take a look at the paper I cite below for some example photos of how RMS/PSNR can be misleading).
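For reference, the two measures can be sketched in a few lines, assuming two equal-sized grayscale images given as lists of rows and a peak value of 255 for 8-bit data (the function names are mine):

```python
import math


def rmse(a, b):
    """Root-mean-square error between two equal-sized grayscale images."""
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return math.sqrt(sum(diffs) / len(diffs))


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = rmse(a, b)
    return math.inf if err == 0 else 20 * math.log10(peak / err)
```

A lower RMSE, or equivalently a higher PSNR, means the images are numerically closer, which is not always the same as looking closer.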
Beyond these, there's a whole field of research devoted to image similarity. I'm no expert, but here are a few pointers:
A lot of work has gone into approaches using dimensionality reduction (PCA, SVD, eigenvalue analysis, etc) to pick out the principal components of the image and compare them across different images.
Other approaches (particularly in medical imaging) use segmentation techniques to pick out important parts of images, then compare the images based on what's found.
Still others have tried to devise similarity measures that get around some of the flaws of RMS error and PSNR. There was a pretty cool paper on the spatial domain structural similarity (SSIM) measure, which tries to mimic people's perceptions of image error instead of direct, mathematical notions of error. The same guys did an improved translation/rotation-invariant version using wavelet analysis in this paper on WSSIM.
It looks like TinEye uses feature vectors with values for lots of attributes to do their comparison. If you hunt around on their site, you eventually get to the Idée Labs page, and their FAQ has some (but not too many) specifics on the algorithm:
Q: How does visual search work?
A: Idée’s visual search technology uses sophisticated algorithms to analyze hundreds of image attributes such as colour, shape, texture, luminosity, complexity, objects, and regions. These attributes form a compact digital signature that describes the appearance of each image, and these signatures are calculated by and indexed by our software. When performing a visual search, these signatures are quickly compared by our search engine to return visually similar results.
This is by no means exhaustive (it's just a handful of techniques I've encountered in the course of my own research), but if you google for technical papers or look through proceedings of recent conferences on image processing, you're bound to find more methods for this stuff. It's not a solved problem, but hopefully these pointers will give you an idea of what's involved.
One technique is to use color histograms. You can use machine learning algorithms, for example the commonly used k-means algorithm, to find similar images based on the representation you use. I have seen other solutions that analyze the vertical and horizontal lines in the image after edge detection. Texture analysis is also used.
A recent paper clustered images from picasa web. You can also try the clustering algorithm that I am working on.
Consider using lossy wavelet compression and comparing the highest relevance elements of the images.
What TinEye does is a sort of hashing over the image or parts of it (see their FAQ). It's probably not a real hash function, since they want similar "hashes" for similar (or nearly identical) images. But all they need to do is compare that hash, and probably substrings of it, to know whether the images are similar/identical or whether one is contained in another.
Here's an image similarity page, but it's for polygons. You could convert your image into a finite number of polygons based on color and shape, and run these algorithms on each of them.
Here is some code I wrote 4 years ago in Java (yikes) that does image comparisons using histograms. Don't look at any part of it other than buildHistograms().
Maybe it's helpful, at least if you are using Java.
Correlation techniques will make a match jump out. If they're JPEGs you could compare the dominant coefficients for each 8x8 block and get a decent match. This isn't exactly correlation, but it's based on a cosine transform, so it's a first cousin.

How can I measure the similarity between two images? [closed]

I would like to compare a screenshot of one application (could be a Web page) with a previously taken screenshot to determine whether the application is displaying itself correctly. I don't want an exact match comparison, because the appearance could be slightly different (in the case of a Web app, depending on the browser, some element could be at a slightly different location). It should give a measure of how similar the screenshots are.
Is there a library / tool that already does that? How would you implement it?
This depends entirely on how smart you want the algorithm to be.
For instance, here are some issues:
cropped images vs. an uncropped image
images with text added vs. another without
mirrored images
The easiest and simplest algorithm I've seen for this is just to do the following steps to each image:
scale to something small, like 64x64 or 32x32, disregard aspect ratio, use a combining scaling algorithm instead of nearest pixel
scale the color ranges so that the darkest is black and lightest is white
rotate and flip the image so that the lightest color is top left, and then top-right is next darker, bottom-left is next darker (as far as possible of course)
Edit: A combining scaling algorithm is one that, when scaling 10 pixels down to one, uses a function that takes the colors of all 10 pixels and combines them into one. This can be done with simple averaging, or with more complex methods like bicubic splines.
Then calculate the mean distance pixel-by-pixel between the two images.
To look up a possible match in a database, store the pixel colors as individual columns in the database, index a bunch of them (but not all, unless you use a very small image), and do a query that uses a range for each pixel value, i.e. every image where each pixel of the small image is within -5 to +5 of the corresponding pixel of the image you want to look up.
This is easy to implement, and fairly fast to run, but of course won't handle most advanced differences. For that you need much more advanced algorithms.
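The steps above can be sketched roughly as follows, assuming grayscale images stored as lists of rows. Block averaging stands in for the "combining" scaler, the rotate/flip normalization step is left out for brevity, and all function names are mine:

```python
def box_downscale(gray, size):
    """Shrink to size x size by averaging each source block
    (a 'combining' scaler); aspect ratio is ignored."""
    h, w = len(gray), len(gray[0])
    out = []
    for ty in range(size):
        y0 = ty * h // size
        y1 = max((ty + 1) * h // size, y0 + 1)
        row = []
        for tx in range(size):
            x0 = tx * w // size
            x1 = max((tx + 1) * w // size, x0 + 1)
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def stretch_contrast(gray):
    """Rescale so the darkest pixel is 0 and the lightest is 255."""
    lo = min(min(r) for r in gray)
    hi = max(max(r) for r in gray)
    if hi == lo:
        return [[0.0] * len(r) for r in gray]
    return [[(p - lo) * 255.0 / (hi - lo) for p in r] for r in gray]


def thumbnail_distance(a, b, size=8):
    """Mean absolute per-pixel difference of the normalized thumbnails."""
    ta = stretch_contrast(box_downscale(a, size))
    tb = stretch_contrast(box_downscale(b, size))
    total = sum(abs(pa - pb) for ra, rb in zip(ta, tb) for pa, pb in zip(ra, rb))
    return total / (size * size)
```

Because of the contrast stretch, a uniformly darker or lower-contrast copy of an image ends up at distance 0 from the original, which is the point of the second step.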
The 'classic' way of measuring this is to break the image up into some canonical number of sections (say a 10x10 grid), then compute a histogram of RGB values inside each cell and compare corresponding histograms. This type of algorithm is preferred because of both its simplicity and its invariance to scaling and (small!) translation.
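A small sketch of the grid approach, assuming images are lists of rows of (r, g, b) tuples. The 2x2 grid, the 4 bins per channel and the L1 distance are arbitrary choices for illustration; the answer does not prescribe them.

```python
def cell_histograms(image, grid=2, bins=4):
    """Split an RGB image into grid x grid cells and return, per cell,
    a normalized histogram over quantized (r, g, b) values."""
    h, w = len(image), len(image[0])
    hists = []
    for gy in range(grid):
        for gx in range(grid):
            hist = [0.0] * (bins ** 3)
            ys = range(gy * h // grid, (gy + 1) * h // grid)
            xs = range(gx * w // grid, (gx + 1) * w // grid)
            for y in ys:
                for x in xs:
                    r, g, b = image[y][x]
                    idx = ((r * bins // 256) * bins * bins
                           + (g * bins // 256) * bins
                           + b * bins // 256)
                    hist[idx] += 1
            n = len(ys) * len(xs)
            hists.append([v / n for v in hist] if n else hist)
    return hists


def grid_distance(a, b, grid=2, bins=4):
    """Mean L1 distance between corresponding cell histograms (0 to 2)."""
    ha = cell_histograms(a, grid, bins)
    hb = cell_histograms(b, grid, bins)
    per_cell = [sum(abs(p - q) for p, q in zip(ca, cb)) for ca, cb in zip(ha, hb)]
    return sum(per_cell) / len(per_cell)
```

Since each cell histogram is normalized by the cell's pixel count, the two images do not need to have the same resolution.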
Use a normalised colour histogram (read the section on applications here). They are commonly used in image retrieval/matching systems and are a standard way of matching images that is very reliable, relatively fast and very easy to implement.
Essentially a colour histogram will capture the colour distribution of the image. This can then be compared with another image to see if the colour distributions match.
This type of matching is pretty resilient to scaling (once the histogram is normalised), and to rotation/shifting/movement etc.
Avoid pixel-by-pixel comparisons: if the image is rotated or shifted slightly, they may report a large difference.
Histograms would be straightforward to generate yourself (assuming you can get access to pixel values), but if you don't feel like it, the OpenCV library is a great resource for doing this kind of stuff. Here is a PowerPoint presentation that shows you how to create a histogram using OpenCV.
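If you do generate the histogram yourself, a minimal version could look like this. It assumes images are lists of rows of (r, g, b) tuples; the 8 bins per channel and the use of histogram intersection as the comparison are my choices, not something the answer specifies.

```python
def normalized_histogram(image, bins=8):
    """Global colour histogram over quantized (r, g, b), summing to 1."""
    hist = [0.0] * (bins ** 3)
    count = 0
    for row in image:
        for r, g, b in row:
            idx = ((r * bins // 256) * bins * bins
                   + (g * bins // 256) * bins
                   + b * bins // 256)
            hist[idx] += 1
            count += 1
    return [v / count for v in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1 means identical colour distributions."""
    return sum(min(p, q) for p, q in zip(h1, h2))
```

Rotating or shifting the image only moves pixels around, so the histogram (and therefore the score) stays the same, which is the resilience described above.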
Don't video encoding algorithms like MPEG compute the difference between each frame of a video so they can just encode the delta? You might look into how video encoding algorithms compute those frame differences.
Look at this open source image search application. It describes several image similarity algorithms: ScalableColor, ColorLayout and EdgeHistogram, which are from the MPEG-7 standard, and Auto Color Correlogram.
You could use a purely mathematical approach that costs O(n^2) for n x n images, but it will be useful only if you are certain that there's no offset or something like that. (Although if you have a few objects with homogeneous coloring it will still work pretty well.)
Anyway, the idea is to compute the normalized dot-product of the two matrices.
C = (sum(Pij*Qij))^2 / (sum(Pij^2) * sum(Qij^2)).
This formula is actually the squared "cosine" of the angle between the matrices, treated as vectors (weird).
When the images are identical (Pij = Qij), C is 1. When they have nothing in common, meaning Qij is zero wherever Pij is non-zero, C is 0. Note that the measure ignores overall brightness: if Qij = 1 and Pij = 255 for every i,j, the two matrices are parallel as vectors, so C is still 1 rather than close to zero.
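A direct transcription of the formula, assuming two equal-sized grayscale images as lists of rows and reading the numerator as the square of the whole sum (the function name is mine):

```python
def normalized_correlation(p, q):
    """Squared cosine of the angle between two images viewed as vectors."""
    num = sum(a * b for rp, rq in zip(p, q) for a, b in zip(rp, rq)) ** 2
    den = (sum(a * a for row in p for a in row)
           * sum(b * b for row in q for b in row))
    return num / den if den else 0.0
```

Worth knowing when using it: a uniformly brighter copy of an image, or any two constant images, score exactly 1, because the measure only looks at direction and not at magnitude. Images whose non-zero pixels never overlap score 0.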
You'll need pattern recognition for that. To determine small differences between two images, Hopfield nets work fairly well and are quite easy to implement. I don't know any available implementations, though.
A ruby solution can be found here
From the readme:
Phashion is a Ruby wrapper around the pHash library, "perceptual hash", which detects duplicate and near duplicate multimedia files
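To give a feel for what a perceptual hash is, here is a sketch of the very simple "average hash" variant in Python. This is not the algorithm pHash or Phashion implements (pHash offers more robust hashes, for example a DCT-based one); it only illustrates the idea of similar images getting hashes that are close in Hamming distance. Images are assumed to be grayscale lists of rows.

```python
def average_hash(gray, size=8):
    """size*size-bit hash: one bit per thumbnail cell, set when the
    cell's mean intensity is above the mean of all cells."""
    h, w = len(gray), len(gray[0])
    cells = []
    for ty in range(size):
        y0 = ty * h // size
        ys = range(y0, max((ty + 1) * h // size, y0 + 1))
        for tx in range(size):
            x0 = tx * w // size
            xs = range(x0, max((tx + 1) * w // size, x0 + 1))
            total = sum(gray[y][x] for y in ys for x in xs)
            cells.append(total / (len(ys) * len(xs)))
    mean = sum(cells) / len(cells)
    bits = 0
    for c in cells:
        bits = (bits << 1) | (1 if c > mean else 0)
    return bits


def hamming_distance(h1, h2):
    """Number of differing bits between two hashes."""
    return bin(h1 ^ h2).count("1")
```

Two images would be treated as near duplicates when the Hamming distance between their hashes is below some small threshold, which has to be chosen empirically.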
How to measure similarity between two images depends entirely on what you would like to measure (for example contrast, brightness, modality, noise...), and you then choose the most suitable similarity measure. You can choose from MAD (mean absolute difference) and MSD (mean squared difference), which are good for measuring brightness differences. There is also CR (correlation coefficient), which is good at representing correlation between two images. You could also choose histogram-based similarity measures like SDH (standard deviation of the difference image histogram), or multimodality similarity measures like MI (mutual information) or NMI (normalized mutual information).
Because these similarity measures are computationally expensive, it is advisable to scale images down before applying them.
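Three of these measures, sketched for equal-sized grayscale images given as lists of rows with 0-255 values. The function names and the 4 intensity bins used for mutual information are illustrative choices only.

```python
import math


def _flat(img):
    return [p for row in img for p in row]


def mean_absolute_difference(a, b):
    fa, fb = _flat(a), _flat(b)
    return sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa)


def correlation_coefficient(a, b):
    """Pearson correlation of pixel intensities, in [-1, 1]."""
    fa, fb = _flat(a), _flat(b)
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    va = sum((x - ma) ** 2 for x in fa)
    vb = sum((y - mb) ** 2 for y in fb)
    return cov / math.sqrt(va * vb) if va and vb else 0.0


def mutual_information(a, b, bins=4):
    """Mutual information (in bits) of the quantized intensity pairs."""
    fa, fb = _flat(a), _flat(b)
    n = len(fa)
    joint = {}
    for x, y in zip(fa, fb):
        key = (x * bins // 256, y * bins // 256)
        joint[key] = joint.get(key, 0) + 1
    pa, pb = {}, {}
    for (i, j), c in joint.items():
        pa[i] = pa.get(i, 0) + c
        pb[j] = pb.get(j, 0) + c
    return sum(c / n * math.log2(c * n / (pa[i] * pb[j]))
               for (i, j), c in joint.items())
```

Note how mutual information stays high for an intensity-inverted copy of an image while the correlation coefficient flips sign; that is why MI is the usual pick when the two images come from different modalities.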
I wonder (and I'm really just throwing the idea out there to be shot down) if something could be derived by subtracting one image from the other, and then compressing the resulting image as a JPEG or GIF, and taking the file size as a measure of similarity.
If you had two identical images, you'd get a uniform box (every difference is zero), which would compress really well. The more the images differed, the more complex the result would be to represent, and hence the less compressible.
Probably not an ideal test, and probably much slower than necessary, but it might work as a quick and dirty implementation.
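A quick way to try the idea without an image library is to compress the raw difference bytes with zlib. That is a stand-in for the JPEG/GIF step suggested above, so the absolute numbers will differ, but the principle is the same. Grayscale images as lists of rows with 0-255 values are assumed.

```python
import zlib


def compressed_difference_size(a, b):
    """Size in bytes of the zlib-compressed absolute difference image."""
    diff = bytes(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return len(zlib.compress(diff, 9))
```

Smaller results mean more similar images. The sizes are only comparable between image pairs of the same dimensions.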
You might look at the code for the open source tool findimagedupes, though it appears to have been written in Perl, so I can't say how easy it will be to parse...
Reading the findimagedupes page that I linked, I see that there is a C++ implementation of the same algorithm. Presumably this will be easier to understand.
And it appears you can also use gqview.
Well, not to answer your question directly, but I have seen this happen. Microsoft recently launched a tool called PhotoSynth which does something very similar to determine overlapping areas in a large number of pictures (which could be of different aspect ratios).
I wonder if they have any available libraries or code snippets on their blog.
To expand on Vaibhav's note, Hugin is an open-source 'autostitcher' which should offer some insight into the problem.
There's software for content-based image retrieval, which does (partially) what you need. All references and explanations are linked from the project site and there's also a short text book (Kindle): LIRE
You can use a Siamese network to see whether the two images are similar or dissimilar, following this tutorial. The tutorial clusters similar images, whereas you can use the L2 distance to measure the similarity of two images.
Beyond Compare has pixel-by-pixel comparison for images, e.g.,
If this is something you will be doing on an occasional basis and doesn't need automating, you can do it in an image editor that supports layers, such as Photoshop or Paint Shop Pro (probably GIMP or Paint.Net too, but I'm not sure about those). Open both screen shots, and put one as a layer on top of the other. Change the layer blending mode to Difference, and everything that's the same between the two will become black. You can move the top layer around to minimize any alignment differences.
Well, a really basic method would be to go through every pixel colour and compare it with the corresponding pixel colour in the second image, but that's probably a very slow solution.
