What is currently considered the "best" algorithm for 2D point-matching? - algorithm

I have two lists containing x-y coordinates (of stars). I could also have magnitudes (brightnesses) attached to each star. Now each star has random position jiggles and there can be a few extra or missing points in each image. My question is, "What is the best 2D point matching algorithm for such a dataset?" I guess both for a simple linear (translation, rotation, scale) and non-linear (say, n-degree polynomials in the coordinates). In the lingo of the point matching field, I'm looking for the algorithms that would win in a shootout between 2D point matching programs with noise and spurious points. There may be a different "winners" depending if the labeling info is used (the magnitudes) and/or the transformation is restricted to being linear.
I am aware that there are many classes of 2D point matching algorithms and many algorithms in each class (literally probably hundreds in total) but I don't know which, if any, is the consider the "best" or the "most standard" by people in the field of computer vision. Sadly, many of the articles to papers I want to read don't have online versions and I can only read the abstract. Before I settle on a particular algorithm to implement it would be good to hear from a few experts to separate the wheat from the chaff.
I have a working matching program that uses triangles but it fails somewhat frequently (~5% of the time) such that the solution transformation has obvious distortions but for no obvious reason. This program was not written by me and is from a paper written almost 20 years ago. I want to write a new implementation that performs most robustly. I am assuming (hoping) that there have been some advances in this area that make this plausible.

If you're interested in star matching, check out the Astrometry.net blind astrometry solver and the paper on it here. They use four point quads to solve star configurations in Flickr pictures of the night sky. Check out this interview.

There is no single "best" algorithm for this. There are lots of different techniques, and each work better than others on specific datasets and types of data.
One thing I'd recommend is to read this introduction to image registration from the tutorials of the Insight Toolkit. ITK supports MANY types of image registration (which is what it sounds like you are attempting), and is very robust in many cases. Most of their users are in the medical field, so you'll have to wade through a lot of medical jargon, but the algorithms and code work with any type of image (including 1,2,3, and n dimensional images, of different types,etc).

You can consider applying your algorithm first only on the N brightest stars, then include progressively the others to refine the result, reducing the search range at the same time.
Using RANSAC for robustness to extra points is also very common.

I'm not sure it would work, but worth a try:
For each star do the circle time ray Fourier transform - centered around it - of all the other stars (note: this is not the standard Fourier transform, which is line times line).
The phase space of circle times ray is integers times line, but since we only have finite accuracy, you just get a matrix; the dimensions of the matrix depend on accuracy. Now try to pair the matrices to one another (e.g. using L_2 norm)

I saw a program on tv a while ago about how researchers were taking pictures of whales and using the spots on them (which are unique for each whale) to id each whale. It used the angles between the spots. By using the angles it didn't matter if the image was rotated or scaled or translated. That sounds similar to what you're doing with your triangles.

I think the "best" (most technical) way would to be to take the Fourier Transform of the original image and of the new linearly modified image. By doing some simple filtering, it should be easy to figure out the orientation and scale of your image with respect to the old one. There is a description of the 2d Fourier Transform here.

Related

Recognize recurring images into a larger one

Edit: this is not a duplicate of Determine if an image exists within a larger image, and if so, find it, using Python since I do not know the pattern beforehand
Suppose I have a big image (usually a picture taken with a camera so it might be a bit noisy, but let's assume it's not for now) made up of multiple smaller images all equal among themselves, something like
I need to find the contour of each one of those. The first step is recognizing that there's a recurring image (or unknown pattern) in the 2D image. How can I achieve this first step?
I did read around that I might use a FFT of the original image and search for duplicate frequencies, would that be a feasible approach?
To build a bit on the problem: I do not know the image beforehand, nor its size or how many will there be on the big image. The images can be shot from camera so they might be noisy. The images won't overlap.
You can try to use described keypoints (Sift/SURF/ORB/etc.) to find features in the image and try to detect the same features in the image.
You can see such a result in How to find euclidean distance between keypoints of a single image in opencv where 3x the same image is present and features are detected and linked between those subimages automatically.
In your image the result looks like
so you can see that the different occurances of the same pattern is indeed automatically detected and linked.
Next steps would be to group features to objects, so that the "whole" pattern can be extracted. Once you have a candidate for a pattern, you can extract a homography for each occurance of the pattern (with one reference candidate pattern) to verify that it is a pattern. One open problem is how to find such candidates. Maybe it is worth trying to find "parallel features", so keypoint matches that have parallel lines and/or same length lines (see image). Or maybe there is some graph theory approach.
All in all, this whole approach will have some advantages and disadvantes:
Advantages:
real world applicability - Sift and other keypoints are working quite well even with noise and some perspective effects, so chances are increased to find such patterns.
Disadvantages
slow
parametric (define what it means that two features are successfully
matched)
not suitable for all kind of patterns - your pattern must have some extractable keypoints
Those are some thoughts and probably not complete ;)
Unfortunately no full code yet for your concrete task, but I hope the idea is clear.
For such a clean image, it suffices to segment the patterns by blob analysis and to compare the segments or ROI that contain them. The size is a first matching criterion. The SAD, SSD or correlation similarity scores can do finer comparison.
In practice you will face more difficulties such as
not possible to segment the patterns
geometric variations in size/orientation
partial occlusion
...
Handling these is out of the scope of this answer; it makes things much harder than in the "toy" case.
The goal is to find several equal or very similar patterns which are not known before in a picture. As it is this problem is still a bit ill posed.
Are the patterns exactly equal or only similar (added noise maybe)?
Do you want to have the largest possible patterns or are smaller subpatterns okay too or are all possible patterns needed? Reason is that of course each pattern could consist of equal patterns too.
Is the background always that simple (completely white) or can it be much more difficult? What do we know about it?
Are the patterns always equally oriented, equally scaled, non-overlapping?
For the simple case of non-overlapping patterns with simple background, the answer of Yves Daoust using segmentation is well performing but fails if patterns are very close or overlapping.
For other cases the idea of the keypoints by Micka will help but might not perform well if there is noise or might be slow.
I have one alternative: look at correlations of subblocks of the image.
In pseudocode:
Divide the image in overlapping areas of size MxN for a suitable M,N (pixel width and height chosen to be approximately the size of the desired pattern)
Correlate each subblock with the whole image. Look for local maxima in the correlation. The position of these maxima denotes the position of similar regions.
Choose a global threshold on all correlations (smartly somehow) and find sets of equal patterns.
Determine the fine structure of these patterns by shanging the shape from rectangular (bounding box) to a more sophisticaed shape (maybe by looking at the shape of the peaks in the correlation)
In case the approximate size of the desired patterns is not known before, try with large values of M, N and go down to smaller ones.
To speed up the whole process start on a coarse scale (downscaled version of the image) and then process finer scales only where needed. Needs balancing of zooming in and performing correlations.
Sorry, I cannot make this a full Matlab project right now, but I hope this helps you.

Algorithms for finding a look alike face?

I'm doing a personal project of trying to find a person's look-alike given a database of photographs of other people all taken in a consistent manner - people looking directly into the camera, neutral expression and no tilt to the head (think passport photo).
I have a system for placing markers for 2d coordinates on the faces and I was wondering if there are any known approaches for finding a look alike of that face given this approach?
I found the following facial recognition algorithms:
http://www.face-rec.org/algorithms/
But none deal with the specific task of finding a look-alike.
Thanks for your time.
I believe you can also try searching for "Face Verification" rather than just "Face Recognition". This might give you more relevant results.
Strictly speaking, the 2 are actually different things in scientific literature but are sometimes lumped under face recognition. For details on their differences and some sample code, take a look here: http://www.idiap.ch/~marcel/labs/faceverif.php
However, for your purposes, what others such as Edvard and Ari has kindly suggested would work too. Basically they are suggesting a K-nearest neighbor style face recognition classifier.
As a start, you can probably try that. First, compute a feature vector for each of your face images in your database. One possible feature to use is the Local Binary Pattern (LBP). You can find the code by googling it. Do the same for your query image. Now, loop through all the feature vectors and compare them to that of your query image using euclidean distance and return the K nearest ones.
While the above method is easy to code, it will generally not be as robust as some of the more sophisticated ones because they generally fail badly when faces are not aligned (known as unconstrained pose. Search for "Labelled Faces in the Wild" to see the results for state of the art for this problem.) or taken under different environmental conditions. But if the faces in your database are aligned and taken under similar conditions as you mentioned, then it might just work. If they are not aligned, you can use the face key points, which you mentioned you are able to compute, to align the faces. In general, comparing faces which are not aligned is a very difficult problem in computer vision and is still a very active area of research. But, if you only consider faces that look alike and in the same pose to be similar (i.e. similar in pose as well as looks) then this shouldn't be a problem.
The website your gave have links to the code for Eigenfaces and Fisherfaces. These are essentially 2 methods for computing feature vectors for your face images. Faces are identified by doing a K nearest neighbor search for faces in the database with feature vectors (computed using PCA and LDA respectively) closest to that of the query image.
I should probably also mention that in the Fisherfaces method, you will need to have "labels" for the faces in your database to identify the faces. This is because Linear Discriminant Analysis (LDA), the classification method used in Fisherfaces, needs this information to compute a projection matrix that will project feature vectors for similar faces close together and dissimilar ones far apart. Comparison is then performed on these projected vectors. Here lies the difference between Face Recognition and Face Verification: for recognition, you need to have "labels" your training images in your database i.e. you need to identify them.
For verification, you are only trying to tell whether any 2 given faces are of the same person. Often, you don't need the "labelled" data in the traditional sense (although some methods might make use of auxiliary training data to help in the face verification).
The code for computing Eigenfaces and Fisherfaces are available in OpenCV in case you use it.
As a side note:
A feature vector is actually just a vector in your linear algebra sense. It is simply n numbers packed together. The word "feature" refers to something like a "statistic" i.e. a feature vector is a vector containing statistics that characterizes the object it represents. For e.g., for the task of face recognition, the simplest feature vector would be the intensity values of the grayscale image of the face. In that case, I just reshape the 2D array of numbers into a n rows by 1 column vector, each entry containing the value of one pixel. The pixel value here is the "feature", and the n x 1 vector of pixel values is the feature vector. In the LBP case, roughly speaking, it computes a histogram at small patches of pixels in the image and joins these histograms together into one histogram, which is then used as the feature vector. So the Local Binary Pattern is the statistic and the histograms joined together is the feature vector. Together they described the "texture" and facial patterns of your face.
Hope this helps.
These two would seem like the equivalent problem, but I do not work in the field. You essentially have the following two problems:
Face recognition: Take a face and try to match it to a person.
Find similar faces: Take a face and try to find similar faces.
Aren't these equivalent? In (1) you start with a picture that you want to match to the owner and you compare it to a database of reference pictures for each person you know. In (2) you pick a picture in your reference database and run (1) for that picture against the other pictures in the database.
Since the algorithms seem to give you a measure of how likely two pictures belong to the same person, in (2) you just sort the measures in decreasing order and pick the top hits.
I assume you should first analyze all the picture in your database with whatever approach you are using. You should then have a set of metrics for each picture which you can compare a specific picture with and statistically find the closest match.
For example, if you can measure the distance between the eyes, you can find faces that have the same distance. You can then find the face that has the overall closest match and return that.

Fastest method to search for a specified item on an image?

Imagine we have a simple 2D drawing, filled it with lots of non-overlapping circles and only a few stars.
If we are to find all the stars among all these circles, I can think of very few methods. Brute force is one of them. Another one is possibly reduce the image size (to the optimal point where you can still distinguish the objects apart) and then apply brute force and map to the original image. The drawback of brute force is of course, it is very time consuming. I am looking for faster methods, possibly the fastest one.
What is the fastest image processing method to search for the specified item on a simple 2D image?
One typical way of looking for an object in an image is through cross correlation. Basically, you look for the position where the cross-correlation between a mask (the object you're attempting to find) and the image is the highest. That position is the likely location of the object you're trying to find.
For the sake of simplicity, I will refer to the object you're attempting to find as a star, but in general it can be any shape.
Some problems with the above approach:
The size of the mask has to match the size of the star. If you don't know the size of the star, then you will have to try different size masks. Image pyramids are more effective than just iteratively trying different size masks, but still require extra effort.
Similarly, the orientations of the mask and the star have to match. If they don't, the cross-correlation won't work.
For these reasons, the more you know about your problem, the simpler it becomes. This is the reason why people have asked you for more information in the comments. A general purpose solution doesn't really exist, to the best of my knowledge. Maybe someone more knowledgeable can correct me on this.
As you've mentioned, reducing the size of the image will help you reduce the computational time of your approach. In my opinion, it's hardly the core element of a solution -- it's just an optional optimization step.
If the shapes are easy to segment from the background, you might be able to compute distinguishing shape/color descriptors. Depending on your problem you could choose descriptors that are invariant to scale, translation or rotation (e.g. compactness, if it is unique to each shape). I do not know if this will be faster, though.
If you already know the exact shape and have an idea about the size, you might want to have a look at the Generalized Hough Transform, which is basically a formalized description of your "brute force algorithm"
As you list a property that the shapes are not overlapping then I assume an efficient algorithm would be able to
cut out all the shapes by scanning the image in some way (I can imagine relatively efficient and simple algorithm for convex shapes)
when you are left with cut out shapes you could use cross relation misha mentioned
You should describe the problem a bit better
can the shapes be rotated or scaled (or some other transform?)
is the background uniform colour
are the shapes uniform colour
are the shapes filled
Depending on the answer on the above questions you might have more less or more simple solutions.
Also, maybe this article might be interesting.
If the shapes are very regular maybe turning them into vectors could fit your needs nicely, but it might be an overkill, really depends what you want to do later.
Step 1: Thresholding - reduce the image to 1 bit (black or white) if the general image set permits it. [For the type of example you cite, my guess is thresholding would work nicely - leaving enough details to find objects].
Step 2: Optionally do some smoothing/noise removal.
Step 3: Use some clustering approach to gather the foreground objects.
Step 4: Use an appropriate heuristic to identify the objects.
The parameters in steps 1/2 will depend a lot on the type of images as well as experimentation/observation. 3 is usually straightforward if you have worked out 1/2 correctly. 4 will depend very much on the problem (for example, in your case identifying stars - which would depend on what is the actual shape of the stars expected in the images).

Explaining the AdaBoost Algorithms to non-technical people

I've been trying to understand the AdaBoost algorithm without much success. I'm struggling with understanding the Viola Jones paper on Face Detection as an example.
Can you explain AdaBoost in laymen's terms and present good examples of when it's used?
Adaboost is an algorithm that combines classifiers with poor performance, aka weak learners, into a bigger classifier with much higher performance.
How does it work? In a very simplified manner:
Train a weak learner.
Add it to the set of weak learners trained so far (with an optimal weight)
Increase the importance of samples that are still miss-classified.
Go to 1.
There is a broad and detailed theory behind the scenes, but the intuition is just that: let each "dumb" classifier focus on the mistakes the previous ones were not able to fix.
AdaBoost is one of the most used algorithms in the machine learning community. In particular, it is useful when you know how to create simple classifiers (possibly many different ones, using different features), and you want to combine them in an optimal way.
In Viola and Jones, each different type of weak-learner is associated to one of the 4 or 5 different Haar features you can have.
AdaBoost uses a number of training sample images (such as faces) to pick a number of good 'features'/'classifiers'. For face recognition a classifiers is typically just a rectangle of pixels that has a certain average color value and a relative size. AdaBoost will look at a number of classifiers and find out which one is the best predictor of a face based on the sample images. After it has chosen the best classifier it will continue to find another and another until some threshold is reached and those classifiers combined together will provide the end result.
This part you may not want to share with non-technical people :) but it is interesting anyway. There are several mathematical tricks which make AdaBoost fast for face recognition such as the ability to add up all the color values of an image and store them in a 2 dimensional array so that the value in any position will be the sum of all the pixels up and to the left of that position. This array can be used to very quickly calculate the average color value of any rectangle within the image by subtracting the value found in the top left corner from the value found in the bottom right corner and dividing by the number of pixels in the rectangle. Using this trick you can quickly scan over an entire image looking for rectangles of different relative sizes that match or are close to a particular color.
Hope this helps.
This is understandable. Most of the papers you can find on Internet retell Viola-Jones and Freund-Shapire papers which are foundation of AdaBoost applied for face recognition in OpenCV. And they mostly consist of difficult formulas and algorithms from several mathematical areas combined.
Here is what can help you (short enough) -
1 - It is used in object and, mostly, in face detection-recognition.The most popular and quite good C++ library is OpenCV from Intel originally. I take the part of Face detection in OpenCV, as an example.
2 - First, a cascade of boosted classifiers working with sample rectangles ("features") is trained on sample of images with faces (called positive) and without faces (negative).
From some Googled paper:
"· Boosting refers to a general and provably effective method of producing a very accurate classifier by combining rough and moderately inaccurate rules of thumb.
· It is based on the observation that finding many rough rules of thumb can be a lot easier than finding a single, highly accurate classifier.
· To begin, we define an algorithm for finding the rules of thumb, which we call a weak learner.
· The boosting algorithm repeatedly calls this weak learner, each time feeding it a different distribution over the training data (in AdaBoost).
· Each call generates a weak classifier and we must combine all of these into a single classifier that, hopefully, is much more accurate than any one of the rules."
During this process the images are scanned to determine the distinctive areas corresponding to certain part of every face. The complex calculation-hypothesis based algorithms are applied (which are not so difficult to understand once you get the main idea).
This can take a week and the output is an XML file which contains the learned information on how to quickly detect the human face, say, in frontal position on any picture (it can be any object in other case).
3 - After that you supply this file to OpenCV face detection program which runs quite fast with up to 99% positive rate (depending on conditions).
As was mentioned here, the scanning speed can be increased greatly with technique known as "integral image".
And finally, these are helpful sources - Object Detection in OpenCV and
Generic Object Detection using AdaBoost from University of California, 2008.

Near-Duplicate Image Detection [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
What's a fast way to sort a given set of images by their similarity to each other.
At the moment I have a system that does histogram analysis between two images, but this is a very expensive operation and seems too overkill.
Optimally I am looking for a algorithm that would give each image a score (for example a integer score, such as the RGB Average) and I can just sort by that score. Identical Scores or scores next to each other are possible duplicates.
0299393
0599483
0499994 <- possible dupe
0499999 <- possible dupe
1002039
4995994
6004994
RGB Average per image sucks, is there something similar?
There has been a lot of research on image searching and similarity measures. It's not an easy problem. In general, a single int won't be enough to determine if images are very similar. You'll have a high false-positive rate.
However, since there has been a lot of research done, you might take a look at some of it. For example, this paper (PDF) gives a compact image fingerprinting algorithm that is suitable for finding duplicate images quickly and without storing much data. It seems like this is the right approach if you want something robust.
If you're looking for something simpler, but definitely more ad-hoc, this SO question has a few decent ideas.
I would recommend considering moving away from just using an RGB histogram.
A better digest of your image can be obtained if you take a 2d Haar wavelet of the image (its a lot easier than it sounds, its just a lot of averaging and some square roots used to weight your coefficients) and just retain the k largest weighted coefficients in the wavelet as a sparse vector, normalize it, and save that to reduce its size. You should rescale R G and B using perceptual weights beforehand at least or I'd recommend switching to YIQ (or YCoCg, to avoid quantization noise) so you can sample chrominance information with reduced importance.
You can now use the dot product of two of these sparse normalized vectors as a measure of similarity. The image pairs with the largest dot products are going to be very similar in structure. This has the benefit of being slightly resistant to resizing, hue shifting and watermarking, and being really easy to implement and compact.
You can trade off storage and accuracy by increasing or decreasing k.
Sorting by a single numeric score is going to be intractable for this sort of classification problem. If you think about it it would require images to only be able to 'change' along one axis, but they don't. This is why you need a vector of features. In the Haar wavelet case its approximately where the sharpest discontinuities in the image occur. You can compute a distance between images pairwise, but since all you have is a distance metric a linear ordering has no way to express a 'triangle' of 3 images that are all equally distant. (i.e. think of an image that is all green, an image that is all red and an image that is all blue.)
That means that any real solution to your problem will need O(n^2) operations in the number of images you have. Whereas if it had been possible to linearize the measure, you could require just O(n log n), or O(n) if the measure was suitable for, say, a radix sort. That said, you don't need to spend O(n^2) since in practice you don't need to sift through the whole set, you just need to find the stuff thats nearer than some threshold. So by applying one of several techniques to partition your sparse vector space you can obtain much faster asymptotics for the 'finding me k of the images that are more similar than a given threshold' problem than naively comparing every image against every image, giving you what you likely need... if not precisely what you asked for.
In any event, I used this a few years ago to good effect personally when trying to minimize the number of different textures I was storing, but there has also been a lot of research noise in this space showing its efficacy (and in this case comparing it to a more sophisticated form of histogram classification):
http://www.cs.princeton.edu/cass/papers/spam_ceas07.pdf
If you need better accuracy in detection, the minHash and tf-idf algorithms can be used with the Haar wavelet (or the histogram) to deal with edits more robustly:
http://cmp.felk.cvut.cz/~chum/papers/chum_bmvc08.pdf
Finally, Stanford has an image search based on a more exotic variant of this kind of approach, based on doing more feature extraction from the wavelets to find rotated or scaled sections of images, etc, but that probably goes way beyond the amount of work you'd want to do.
http://wang14.ist.psu.edu/cgi-bin/zwang/regionsearch_show.cgi
I implemented a very reliable algorithm for this called Fast Multiresolution Image Querying. My (ancient, unmaintained) code for that is here.
What Fast Multiresolution Image Querying does is split the image into 3 pieces based on the YIQ colorspace (better for matching differences than RGB). Then the image is essentially compressed using a wavelet algorithm until only the most prominent features from each colorspace are available. These points are stored in a data structure. Query images go through the same process, and the prominent features in the query image are matched against those in the stored database. The more matches, the more likely the images are similar.
The algorithm is often used for "query by sketch" functionality. My software only allowed entering query images via URL, so there was no user interface. However, I found it worked exceptionally well for matching thumbnails to the large version of that image.
Much more impressive than my software is retrievr which lets you try out the FMIQ algorithm using Flickr images as the source. Very cool! Try it out via sketch or using a source image, and you can see how well it works.
A picture has many features, so unless you narrow yourself to one, like average brightness, you are dealing with an n-dimensional problem space.
If I asked you to assign a single integer to the cities of the world, so I could tell which ones are close, the results wouldn't be great. You might, for example, choose time zone as your single integer and get good results with certain cities. However, a city near the north pole and another city near the south pole can also be in the same time zone, even though they are at opposite ends of the planet. If I let you use two integers, you could get very good results with latitude and longitude. The problem is the same for image similarity.
All that said, there are algorithms that try to cluster similar images together, which is effectively what you're asking for. This is what happens when you do face detection with Picasa. Even before you identify any faces, it clusters similar ones together so that it's easy to go through a set of similar faces and give most of them the same name.
There is also a technique called Principle Component Analysis, which lets you reduce n-dimensional data down to any smaller number of dimensions. So a picture with n features could be reduced to one feature. However, this is still not the best approach for comparing images.
There's a C library ("libphash" - http://phash.org/) that will calculate a "perceptual hash" of an image and allow you to detect similar images by comparing hashes (so you don't have to compare each image directly against every other image) but unfortunately it didn't seem to be very accurate when I tried it.
You have to decide what is "similar." Contrast? Hue?
Is a picture "similar" to the same picture upside-down?
I bet you can find a lot of "close calls" by breaking images up into 4x4 pieces and getting an average color for each grid cell. You'd have sixteen scores per image. To judge similarity, you would just do a sum of squares of differences between images.
I don't think a single hash makes sense, unless it's against a single concept like hue, or brightness, or contrast.
Here's your idea:
0299393
0599483
0499994 <- possible dupe
0499999 <- possible dupe
1002039
4995994
6004994
First of all, I'm going to assume these are decimal numbers that are R*(2^16)+G*(2^8)+B, or something like that. Obviously that's no good because red is weighted inordinately.
Moving into HSV space would be better. You could spread the bits of HSV out into the hash, or you could just settle H or S or V individually, or you could have three hashes per image.
One more thing. If you do weight R, G, and B. Weight green highest, then red, then blue to match human visual sensitivity.
In the age of web services you could try http://tineye.com
The question Good way to identify similar images? seems to provide a solution for your question.
i assumed that other duplicate image search software performs an FFT on the images, and stores the values of the different frequencies as a vectors:
Image1 = (u1, u2, u3, ..., un)
Image2 = (v1, v2, v3, ..., vn)
and then you can compare two images for equalness by computing the distance between the weight vectors of two images:
distance = Sqrt(
(u1-v1)^2 +
(u2-v2)^2 +
(u2-v3)^2 +
...
(un-vn)^2);
One solution is to perform a RMS/RSS comparison on every pair of pictures required to perform a bubble sort. Second, you could perform an FFT on each image and do some axis averaging to retrieve a single integer for each image which you would use as an index to sort by. You may consider doing whatever comparison on a resized (25%, 10%) version of the original depending on how small a difference you choose to ignore and how much speedup you require. Let me know if these solutions are interesting, and we can discuss or I can provide sample code.
Most modern approaches to detect Near duplicate image detection use interesting points detection and descriptors describing area around such points. Often SIFT is used. Then you can quatize descriptors and use clusters as visual word vocabulary.
So if we see on ratio of common visual words of two images to all visual words of these images you estimate similarity between images. There are a lot of interesting articles. One of them is Near Duplicate Image Detection: minHash and tf-idf Weighting
For example using IMMI extension and IMMI you can examine many different ways how to measure similarity between images:
http://spl.utko.feec.vutbr.cz/en/component/content/article/46-image-processing-extension-for-rapidminer-5
By defining some threshold and selecting some method you can measure similarity.

Resources