Finding a image that contains a cropped image

Finding a image that contains a cropped image - image

I have a big database of pictures (say, 1 million 512x512px images) and I want to do the following query in a fast way:
Given a cropped image, find an image from the database that contains it.
(The closest question that I could find in StackOverflow is this one, which I address later in this post)
The following image illustrates what I'm trying to do.
I have the following restrictions:
(I) – The query must be fast. 10⁶ is a lot, so I don't think I can compare each image in the query to each of the others individually.
(II) – I need to work with cropped images, so solutions like simple image hashing won't do it (of course, this does not apply to crop-resistant hashes)
(III) – I don't know the proportion between the area of the queried image and the image that contains it. In the example above, the refrigerant is just a small portion of the original image, but the cat takes a lot of space in the image where it is contained. Although I estimate that the proportion is always between 10%~100%, I don't know the exact amount beforehand (suppose the images in queries are always 512x512px, for example)
I've gathered some information in my research:
Simple image hash matching isn't possible because of (II) (I'm working with cropped parts)
Reddit's RepostSleuthBot (available on GitHub) is an excellent starting point for me: It can identify if an image was already posted in an efficient way. Instead of simply matching hashes, seems like it uses the ANNOY algorithm to find similar images (so it can match images with slight modifications in text or brightness, for example). The only problem with this approach is that it isn't well adapted for cropped images. So, this addresses (I) but not (II) and (III).
In my StackOverflow searches, the closest thing I found to help in this problem is that if I knew the proportion between the cropped image and the original, I could match it using phase correlation, like this answer says.
This addresses (II), which is awesome, but then I'll have problems with (I) because I'd have to try to match with each image of the database, and it's also inviable because of (III).
A promising feature would be cropping-resistant image hashing - the paper Efficient Cropping-Resistant Robust Image Hashing, 10.1109/ares.2014.85 describes one, but seems like it isn't that performant, especially taking in consideration that I'm aiming at small crops (10%~100% of the original image) and a huge amount of images.
I got stuck after this point. Is there any other algorithm or method I should be aware of? Anything will be very appreciated.

Related

How to know if an images is similar to another (slightly different angle but same point of view)

I've checked methods like Phasher to get similar images. Basically to resize images to 8x8, grayscale, get average pixel and create a binary hash of each pixel comparing if it's above or below the average pixel.
This method is very well explained here:
http://hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html
Example working:
- image 1 of a computer on a table
- image 2, the same, but with a coin
This would work, since, using the hash of a very reduced, grayscale image, both of them will be almost the same, or even the same. So you can conclude they are similar when 90% of more of the pixels are the same (in the same place!)
My problem is in images that are taken from the same point of view but different angle, for example this ones:
In this case, the hashes "fingerprint" generated are so shifted each other, that we can not compare the hashes bit by bit, it will be very different.
The pixels are "similar", but they are not in the same place, since in this case there's more sky, and the houses "starts" more below than the first one.
So the hash comparison results in "they are different images".
Possible solution:
I was thinking about creating a larger hash for the first image, then get 10 random "sub hashes" for the second one, and try to see if the 10 sub hashes are or are not in "some place" of the first big hash (if a substring is contained into another bigger).
Problem here I think is the CPU/time when working with thousands of images, since you have to compare 1 image to 1000, and in each one, compare 10 sub hashes with a big one.
Other solutions ? ;-)

One option is to detect a set of "interesting" points for each image and store that alongside your hash. It's somewhat similar to the solution you suggested.
We want those points be unlikely to vary between images like yours that have shifts in perspective. These lecture slides give a good overview of how to find points like that with fairly straightforward linear algebra. I'm using Mathematica because it has built in functions for a lot of this stuff. ImageKeypoints does what we want here.
After we have our interesting points we need to find which ones match between the images we're comparing. If your images are very similar, like the ones in your examples, you could probably just take an 8x8 greyscale image for each interesting point and compare each from one image with the ones for the nearby interesting points on the other image. I think you could use your existing algorithm.
If you wanted to use a more advanced algorithm like SIFT you'd need to have a look at ImageKeypoint's properties like scale and orientation.
The ImageKeypoints documentation has this example you can use to get a small piece of the image for each interesting point (it uses the scale property instead of a fixed size):
MapThread[ImageTrim[img, {#1}, 2.5 #2] &,
Transpose#
ImageKeypoints[img, {"Position", "Scale"},
"KeypointStrength" -> .001]]
Finding a certain number of matching points might be enough to say that the images are similar, but if not you can use something like RANSAC to figure out the transformation you need to align your hash images (the 8x8 images you're already able to generate) enough that your existing algorithm works.
I should point out that Mathematica has ImageCorrespondingPoints, which does all of this stuff (using ImageKeypoints) much better. But I don't know how you could have it cache the intermediate results so that scales for what you're trying to do. You might want to look into its ability to constrain matching points to a perspective transform, though.
Here's a plot of the matching points for your example images to give you an idea of what parts end up matching:
So you can precalculate the interesting points for your database of images, and the greyscale hashes for each point. You'll have to compare several hash images for each image in your database, rather than just two, but it will scale to within a constant factor of your current algorithm.

You can try an upper bound if the hashes doesn't match compare how many pixels match from the 8x8 grid. Maybe you can try to match the colors like in photo mosaic:Photo Mosaic Algorithm. How to create a mosaic photo given the basic image and a list of tiles?.

Is it possible to get originals HQ images from CIFAR10 dataset?

I'm currently working on my thesis on the neural networks. I'm using the CIFAR10 as a reference dataset. Now I would like to show some example results in my paper. The problem is, that the images in the dataset are 32x32 pixels so it's really hard to recognize something on them when printed on paper.
Is there any way to get hold of the original images with higher resolution?
UPDATE: I'm not asking for image processing algorithm, but for the original images presented in CIFAR-10. I need some higher resolution samples to put in my paper.

I now have the same problem and I just found your question.
It seems that CIFAR was built from labeling the tinyimages dataset, and are kind enough to share the indexing from CIFAR to tinyimages. Now tinyimages contain metadata file with URL of the original images and a toolbox for getting for any image you wish (e.g. those included in the CIFAR index).
So one may write a mat file which does this and share the results...

They're just small:
The CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset.
You could use Google reverse image search if you're curious.

Find duplicate images of different sizes

I am wondering if there is a pre-existing algorithm/library/framework to compare two images to see if one is a re-sized version of the other? The programming language doesn't matter at this stage.
If there is nothing out there, I'd need to write something up. What I have thought of so far:
(Expensive) Resize the larger to the smaller and compare pixel by pixel.
Better yet, just resize a few random "areas" on the picture and compare. If they match, convert more, etc...
Break the image into a number of rows and columns and do some sort of parity math on the color values.
The problem I see with the first two ideas especially, is that there are different ways to re-size a picture in the first place, so the math will likely not work out the same at all. Some re-sizing adds blur, etc....
If anyone could point me to some good literature on this subject, that would be great. My googling turns up mostly shareware applications which is not what I want.
The goal is to have this running in the back of a webserver.

The best approach depends on the characteristics of the images you are comparing, what percentage of probability it is that the images are the same, and when they are different, are they typically off by a lot or could it be as minute as a single pixel difference?
If the answers to the above is that the images you need to compare will be completely random then going with the expensive solution, or some available package might be the best bet.
If it is that you know that the images are different more often than not, and that the images typically differ quite a lot, and you really want to hand-roll a solution you could implement some initial 'quick compare' steps that would be less expensive and that would quickly identify a lot of the cases where the images are different.
For example you could resize the larger image, then either compare pixel-by-pixel (or calculate a hash of the pixel values) only a 'diagonal line' of the image (top left pixel to bottom right pixel) and by doing so exclude differing images and only do the more expensive comparison for those that pass this test.
Or take a pre-set number of points at whatever is a 'good distribution' depending on the type of image and only do the more expensive comparison for those that pass this test.
If you know a lot about the images you will be comparing, they have known characteristics and they are different more often than they are the same, implementing a cheap 'quick elimination compare' along the lines of the above could be worthwhile.

You need to look into dHash algorithm for this.
I wrote a pure java library just for this few days back. You can feed it with directory path(includes sub-directory), and it will list the duplicate images in list with absolute path which you want to delete. Alternatively, you can use it to find all unique images in a directory too.
It used awt api internally, so can't be used for Android though. Since, imageIO has problem reading alot of new types of images, i am using twelve monkeys jar which is internally used.
https://github.com/srch07/Duplicate-Image-Finder-API
Jar with dependencies bundled internally can be downloaded from, https://github.com/srch07/Duplicate-Image-Finder-API/blob/master/archives/duplicate_image_finder_1.0.jar
The api can find duplicates among images of different sizes too.

What algorithm could be used to identify if images are the "same" or similar, regardless of size?

TinEye, the "reverse image search engine", allows you to upload/link to an image and it is able to search through the billion images it has crawled and it will return links to images it has found that are the same image.
However, it isn't a naive checksum or anything related to that. It is often able to find both images of a higher resolution and lower resolution and larger and smaller size than the original image you supply. This is a good use for the service because I often find an image and want the highest resolution version of it possible.
Not only that, but I've had it find images of the same image set, where the people in the image are in a different position but the background largely stays the same.
What type of algorithm could TinEye be using that would allow it to compare an image with others of various sizes and compression ratios and yet still accurately figure out that they are the "same" image or set?

These algorithms are usually fingerprint-based. Fingerprint is a reasonably small data structure, something like a long hash code. However, the goals of fingerprint function are opposite to the goals of hash function. A good hash function should generate very different codes for very similar (but not equal) objects. The fingerprint function should, on contrary, generate the same fingerprint for similar images.
Just to give you an example, this is a (not particularly good) fingerprint function: resize the picture to 32x32 square, normalize and and quantize the colors, reducing the number of colors to something like 256. Then, you have 1024-byte fingerprint for the image. Just keep a table of fingerprint => [list of image URLs]. When you need to look images similar to a given image, just calculate its fingerprint value and find the corresponding image list. Easy.
What is not easy - to be useful in practice, the fingerprint function needs to be robust against crops, affine transforms, contrast changes, etc. Construction of good fingerprint functions is a separate research topic. Quite often they are hand-tuned and uses a lot of heuristics (i.e. use the knowledge about typical photo contents, about image format / additional data in EXIF, etc.)
Another variation is to use more than one fingerprint function, try to apply each of them and combine the results. Actually, it's similar to finding similar texts. Just instead of "bag of words" the image similarity search uses a "bag of fingerprints" and finds how many elements from one bag are the same as elements from another bag. How to make this search efficient is another topic.
Now, regarding the articles/papers. I couldn't find a good article that would give an overview of different methods. Most of the public articles I know discuss specific improvement to specific methods. I could recommend to check these:
"Content Fingerprinting Using Wavelets". This article is about audio fingerprinting using wavelets, but the same method can be adapted for image fingerprinting.
PERMUTATION GROUPING:
INTELLIGENT HASH FUNCTION DESIGN FOR AUDIO & IMAGE RETRIEVAL. Info on Locality-Sensitive Hashes.
Bundling Features for Large Scale Partial-Duplicate Web Image Search. A very good article, talks about SIFT and bundling features for efficiency. It also has a nice bibliography at the end

The creator of the FotoForensics site posted this blog post on this topic, it was very useful to me, and showed algorithms that may be good enough for you and that require a lot less work than wavelets and feature extraction.
http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html
aHash (also called Average Hash or Mean Hash). This approach crushes the image into a grayscale 8x8 image and sets the 64 bits in
the hash based on whether the pixel's value is greater than the
average color for the image.
pHash (also called "Perceptive Hash"). This algorithm is similar to aHash but use a discrete cosine transform (DCT) and compares based
on frequencies rather than color values.
dHash Like aHash and pHash, dHash is pretty simple to implement and is far more accurate than it has any right to be. As an
implementation, dHash is nearly identical to aHash but it performs
much better. While aHash focuses on average values and pHash evaluates
frequency patterns, dHash tracks gradients.

It's probably based on improvements of feature extraction algorithms, taking advantage of features which are scale invariant.
Take a look at
Feature extraction
SIFT, other site
or, if you are REALLY interested, you can shell out some 70 bucks (or at least look at the Google preview) for
Feature Extraction & Image Processing

http://tineye.com/faq#how
Based on this, Igor Krivokon's answer seems to be on the mark.

The Hough Transform is a very old feature extraction algorithm, that you mind find interesting. I doubt it's what tinyeye uses, but it's a good, simple starting place for learning about feature extraction.
There are also slides to a neat talk from some University of Toronto folks about their work at astrometry.net. They developed an algorithm for matching telescoping images of the night sky to locations in star catalogs in order to identify the features in the image. It's a more specific problem than what tinyeye tries to solve, but I'd expect that a lot of the basic ideas that they talk about are applicable to the more general problem.

Check out this blog post (not mine) for a very understandable description of a very understandable algorithm which seems to get good results for how simple it is. It basically partitions the respective pictures into a very coarse grid, sorts the grid by red:blue and green:blue ratios, and checks whether the sorts were the same. This naturally works for color images only.
The pros most likely get better results using vastly more advanced algorithms. As mentioned in the comments on that blog, a leading approach seems to be wavelets.

They may well be doing a Fourier Transform to characterize the complexity of the image, as well as a histogram to characterize the chromatic distribution, paired with a region categorization algorithm to assure that similarly complex and colored images don't get wrongly paired. Don't know if that's what they're using, but it seems like that would do the trick.

What about resizing the pictures to a standard small size and checking for SSIM or luma-only PSNR values? that's what I would do.

Detecting if two images are visually identical

Sometimes two image files may be different on a file level, but a human would consider them perceptively identical. Given that, now suppose you have a huge database of images, and you wish to know if a human would think some image X is present in the database or not. If all images had a perceptive hash / fingerprint, then one could hash image X and it would be a simple matter to see if it is in the database or not.
I know there is research around this issue, and some algorithms exist, but is there any tool, like a UNIX command line tool or a library I could use to compute such a hash without implementing some algorithm from scratch?
edit: relevant code from findimagedupes, using ImageMagick
try $image->Sample("160x160!");
try $image->Modulate(saturation=>-100);
try $image->Blur(radius=>3,sigma=>99);
try $image->Normalize();
try $image->Equalize();
try $image->Sample("16x16");
try $image->Threshold();
try $image->Set(magick=>'mono');
($blob) = $image->ImageToBlob();
edit: Warning! ImageMagick $image object seems to contain information about the creation time of an image file that was read in. This means that the blob you get will be different even for the same image, if it was retrieved at a different time. To make sure the fingerprint stays the same, use $image->getImageSignature() as the last step.

findimagedupes is pretty good. You can run "findimagedupes -v fingerprint images" to let it print "perceptive hash", for example.

Cross-correlation or phase correlation will tell you if the images are the same, even with noise, degradation, and horizontal or vertical offsets. Using the FFT-based methods will make it much faster than the algorithm described in the question.
The usual algorithm doesn't work for images that are not the same scale or rotation, though. You could pre-rotate or pre-scale them, but that's really processor intensive. Apparently you can also do the correlation in a log-polar space and it will be invariant to rotation, translation, and scale, but I don't know the details well enough to explain that.
MATLAB example: Registering an Image Using Normalized Cross-Correlation
Wikipedia calls this "phase correlation" and also describes making it scale- and rotation-invariant:
The method can be extended to determine rotation and scaling differences between two images by first converting the images to log-polar coordinates. Due to properties of the Fourier transform, the rotation and scaling parameters can be determined in a manner invariant to translation.

Colour histogram is good for the same image that has been resized, resampled etc.
If you want to match different people's photos of the same landmark it's trickier - look at haar classifiers. Opencv is a great free library for image processing.

I don't know the algorithm behind it, but Microsoft Live Image Search just added this capability. Picasa also has the ability to identify faces in images, and groups faces that look similar. Most of the time, it's the same person.
Some machine learning technology like a support vector machine, neural network, naive Bayes classifier or Bayesian network would be best at this type of problem. I've written one each of the first three to classify handwritten digits, which is essentially image pattern recognition.

resize the image to a 1x1 pixle... if they are exact, there is a small probability they are the same picture...
now resize it to a 2x2 pixle image, if all 4 pixles are exact, there is a larger probability they are exact...
then 3x3, if all 9 pixles are exact... good chance etc.
then 4x4, if all 16 pixles are exact,... better chance.
etc...
doing it this way, you can make efficiency improvments... if the 1x1 pixel grid is off by a lot, why bother checking 2x2 grid? etc.

If you have lots of images, a color histogram could be used to get rough closeness of images before doing a full image comparison of each image against each other one (i.e. O(n^2)).

There is DPEG, "The" Duplicate Media Manager, but its code is not open. It's a very old tool - I remember using it in 2003.

You could use diff to see if they are REALLY different.. I guess it will remove lots of useless comparison. Then, for the algorithm, I would use a probabilistic approach.. what are the chances that they look the same.. I'd based that on the amount of rgb in each pixel. You could also find some other metrics such as luminosity and stuff like that.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio