I have a small render engine written for fun. I would like to add some unit testing that automatically renders an image and then compares it to a stored reference image to check for differences. This should produce some sort of metric so I can gauge whether the image is too far off or whether the difference can be attributed to slightly different timings in animations. If it can also report the location of the differences in the image, that would be great, but it is not necessary. We can also assume that the two images are exactly the same size.
What are the classic papers/techniques for that sort of thing?
(The language is Go; probably nothing exists for it yet and I'd like to implement it myself to understand what's going on. The renderer is github.com/luxengine.)
Thank you
One idea could be to see your problem as an instance of image registration.
The following figure (taken from http://it.mathworks.com/help/images/point-mapping.html) gives a flow-chart for a method to solve the image registration problem.
Using the terms from that figure, the basic idea is:
find some interest points in the Fixed image;
find in the Moving image the same corresponding points;
estimate the transformation between the two images using the point correspondences. One of the simplest transformations is a translation, represented by a 2D vector; the magnitude of this vector is a measure of the difference between the two images, and in your case it can be related to the shift you wrote about in your comment. A richer transformation is a homography, described by a 3x3 matrix; its distance from the identity matrix is again a measure of the difference between the two images.
you can then apply the transformation back: in the case of a translation, for example, you apply it to the Moving image, and the warped image can be compared (here I am simplifying a little) pixel by pixel to the Reference image. A minimal pixel-wise comparison sketch follows below.
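To make that last step concrete, here is a minimal Go sketch (the question mentions Go) that shifts the moving image by an assumed integer offset and reports a per-pixel RMSE against the reference. The grayscale simplification and all names are illustrative, not from any particular library.

```go
package imagediff

import (
	"image"
	"math"
)

// rmseAfterShift shifts the moving image by (dx, dy), compares it to the
// reference pixel by pixel and returns the root-mean-square error over the
// overlapping region. Both images are assumed to share the same bounds.
func rmseAfterShift(ref, moving *image.Gray, dx, dy int) float64 {
	b := ref.Bounds()
	var sum float64
	var n int
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			sx, sy := x-dx, y-dy // source pixel in the moving image
			if sx < b.Min.X || sx >= b.Max.X || sy < b.Min.Y || sy >= b.Max.Y {
				continue // outside the overlap
			}
			d := float64(ref.GrayAt(x, y).Y) - float64(moving.GrayAt(sx, sy).Y)
			sum += d * d
			n++
		}
	}
	if n == 0 {
		return math.Inf(1) // no overlap at all
	}
	return math.Sqrt(sum / float64(n))
}
```

A low RMSE after applying the estimated translation would suggest the two renders differ only by that shift.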
Some more ideas are here: Image comparison - fast algorithm
Currently I am trying to figure out the signal-to-noise ratio (SNR) of a set of images as a way of gauging the performance of my deconvolution (filtering) algorithms. I have a set of images like the one below, which shows the image before and after the algorithm:
Now, I have discovered quite a few ways of judging the performance. One of these is to use the formula for the SNR of an image, where the signal is the original image and the noise is the filtered image. Another method, as described by this question, works out the SNR from a single image by itself. That way I can compare the SNR values I get for the two images and obtain a new measure altogether.
Therefore my question: the resources on the internet are confusing, and I do not know the "correct" way of measuring the SNR of these images and using it as a performance metric.
It really depends on what you are trying to compare and what you deem "signal" and "noise". In your first method you are effectively calculating the error (or difference) between image 1 and image 2, where you assume image 2 was contaminated by noise but image 1 was not (this is also a sort of signal-to-distortion ratio). This measurement is therefore relative: it measures the performance of your method of transforming the original into the target (or of the distortion technique), not the image itself. For example, a new type of encrypting filter generated image 2 from image 1 and you want to measure how different the images are to work out the performance of your filter.
In the second method, based on the link you posted, you assume that noise is present in both images, but at different levels, and you measure it against each individual image - in other words, you measure the standard deviation of each individual image, which is not relative. The second measurement is usually used to compare results generated from the same source, i.e. an experiment produces N images of the same object in a controlled environment and you want to measure, for example, the amount of noise present in the scene (you would use this method to work out the covariance of the noise so you can control the experiment environment).
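As a concrete illustration of the first measurement, here is a hedged Go sketch that treats the original image as the signal and the original-minus-filtered difference as the noise, i.e. SNR(dB) = 10 * log10(sum(s^2) / sum((s - f)^2)). The grayscale simplification and the names are mine.

```go
package snr

import (
	"image"
	"math"
)

// snrDB treats `original` as the signal and (original - filtered) as the
// noise, returning the ratio in decibels. Higher means the filtered image
// is closer to the original.
func snrDB(original, filtered *image.Gray) float64 {
	b := original.Bounds()
	var signalPower, noisePower float64
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			s := float64(original.GrayAt(x, y).Y)
			f := float64(filtered.GrayAt(x, y).Y)
			signalPower += s * s
			noisePower += (s - f) * (s - f)
		}
	}
	if noisePower == 0 {
		return math.Inf(1) // identical images
	}
	return 10 * math.Log10(signalPower/noisePower)
}
```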
I have two images of the same height/width that look similar, but they are not exactly the same pixel by pixel: one of the images is shifted to the right by a few pixels.
I am currently using the ImageMagick compare command. It reports differences because it compares pixel by pixel. I also tried its fuzz attribute.
Please suggest another tool to compare this type of image.
I don't know what you're really trying to achieve, but if you want a metric that expresses the similarity between the two images without taking the displacement into account, then maybe you should work in the frequency domain.
For instance, the magnitude part of the DFT of your images should be nearly identical (a translation only affects the phase), so if you compare the two magnitude spectra the difference should be practically zero.
In fact, according to the Fourier shift theorem, you can even get an estimate of the displacement offset by computing the inverse DFT of the normalized cross-power spectrum of the two DFTs (phase correlation).
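For illustration, here is a hedged Go sketch of the magnitude-comparison idea using a brute-force 2D DFT. It is O((W*H)^2), so only practical on heavily downscaled grayscale images; a real implementation would use an FFT library. The point is that a pure shift between the images should leave the magnitude spectra nearly identical.

```go
package freqdiff

import (
	"image"
	"math"
	"math/cmplx"
)

// magnitudeSpectrum returns |DFT(img)| as a flat slice of length w*h,
// computed with a naive O((w*h)^2) double loop.
func magnitudeSpectrum(img *image.Gray) []float64 {
	b := img.Bounds()
	w, h := b.Dx(), b.Dy()
	mag := make([]float64, w*h)
	for v := 0; v < h; v++ {
		for u := 0; u < w; u++ {
			var sum complex128
			for y := 0; y < h; y++ {
				for x := 0; x < w; x++ {
					p := float64(img.GrayAt(b.Min.X+x, b.Min.Y+y).Y)
					angle := -2 * math.Pi * (float64(u*x)/float64(w) + float64(v*y)/float64(h))
					sum += complex(p, 0) * cmplx.Exp(complex(0, angle))
				}
			}
			mag[v*w+u] = cmplx.Abs(sum)
		}
	}
	return mag
}

// spectrumRMSE compares two magnitude spectra of equally sized images;
// a pure translation between the source images should give a value near 0.
func spectrumRMSE(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum / float64(len(a)))
}
```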
I have a project where I am required to subtract an empty template image from an incoming user filled image. The document type is a normal Bank cheque.
The aim is to extract the handwritten fields from it by subtracting one image from the empty template image.
The issue I am facing is aligning these two images, as there is scaling, translation, rotation, etc.
Any ideas on how to align the template image with the incoming image?
UPDATE 1:
I am posting an example image from the Wikipedia page, converted to monochrome, since my images are in monochrome format.
When working on image processing for industrial projects, in most cases we have a fiducial. A fiducial is a mark - it can be a hole or a cross mark - that never changes and is always in the same position.
Generally two fiducials are enough to correct misalignment problems like rotation, translation and also scale. For instance, if you know the distance between the two, you can always check it to make sure the scale factor is right, or correct it based on the difference between the current distance and the expected distance.
In your case, what I would ask you is: do the template and the incoming image share any visual sign that is invariant and can easily be segmented?
If you have the answer to that question, all the rest becomes much simpler - the difference itself is a quite straightforward algorithm. A sketch of the alignment step follows below.
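Assuming you can segment the same two fiducial marks in both images, a similarity transform (scale, rotation, translation) follows directly from those two point correspondences. A minimal Go sketch, with illustrative names and no error handling:

```go
package align

import "math"

type Point struct{ X, Y float64 }

// Similarity maps a template point p to scale*R(theta)*p + (tx, ty).
type Similarity struct {
	Scale, Theta float64
	Tx, Ty       float64
}

// FromFiducials computes the transform that maps template fiducials
// (p1, p2) onto the corresponding incoming-image fiducials (q1, q2).
func FromFiducials(p1, p2, q1, q2 Point) Similarity {
	dpx, dpy := p2.X-p1.X, p2.Y-p1.Y
	dqx, dqy := q2.X-q1.X, q2.Y-q1.Y

	scale := math.Hypot(dqx, dqy) / math.Hypot(dpx, dpy)
	theta := math.Atan2(dqy, dqx) - math.Atan2(dpy, dpx)

	// Solve for the translation using the first correspondence: q1 = s*R*p1 + t.
	cos, sin := math.Cos(theta), math.Sin(theta)
	tx := q1.X - scale*(cos*p1.X-sin*p1.Y)
	ty := q1.Y - scale*(sin*p1.X+cos*p1.Y)

	return Similarity{Scale: scale, Theta: theta, Tx: tx, Ty: ty}
}

// Apply maps a template-space point into incoming-image space.
func (s Similarity) Apply(p Point) Point {
	cos, sin := math.Cos(s.Theta), math.Sin(s.Theta)
	return Point{
		X: s.Scale*(cos*p.X-sin*p.Y) + s.Tx,
		Y: s.Scale*(sin*p.X+cos*p.Y) + s.Ty,
	}
}
```

Once the transform is known, you warp the incoming image into template space and the subtraction becomes a plain pixel-wise difference.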
The basic answer is to write a function that takes two images and a 2D transform and tells you how well aligned they are once you apply the transform to the target image. The function needs to be continuous in the transform and have a local minimum of 0 where the images are aligned perfectly. This is called a cost function.
Then use any optimization algorithm over that function and its inputs -- you are optimizing the transform (translation, scale, rotation). Examples are hill climbing, genetic algorithms, simulated annealing, etc.
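A minimal Go sketch of this idea, restricted to translation for brevity (rotation and scale would simply add dimensions to the search): the cost is the mean squared pixel difference after shifting the target, and a greedy hill climb walks to a local minimum. All names are illustrative.

```go
package register

import "image"

// cost returns the mean squared pixel difference between ref and target
// shifted by (dx, dy); lower is better, 0 means perfectly aligned.
func cost(ref, target *image.Gray, dx, dy int) float64 {
	b := ref.Bounds()
	var sum float64
	var n int
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			sx, sy := x-dx, y-dy
			p := image.Point{X: sx, Y: sy}
			if !p.In(b) {
				continue // outside the overlap
			}
			d := float64(ref.GrayAt(x, y).Y) - float64(target.GrayAt(sx, sy).Y)
			sum += d * d
			n++
		}
	}
	if n == 0 {
		return 1e18 // no overlap: effectively infinite cost
	}
	return sum / float64(n)
}

// hillClimb greedily improves (dx, dy) one pixel at a time until no
// neighboring offset lowers the cost.
func hillClimb(ref, target *image.Gray) (dx, dy int) {
	best := cost(ref, target, dx, dy)
	for {
		improved := false
		for _, step := range [][2]int{{1, 0}, {-1, 0}, {0, 1}, {0, -1}} {
			c := cost(ref, target, dx+step[0], dy+step[1])
			if c < best {
				best, dx, dy = c, dx+step[0], dy+step[1]
				improved = true
			}
		}
		if !improved {
			return dx, dy
		}
	}
}
```

Swapping the greedy loop for simulated annealing or a genetic search only changes how the transform space is explored; the cost function stays the same.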
There are products that do this -- usually they are called Forms Recognition, Forms Registration, Forms Processing, etc. Some are SDKs, but there are also applications that can do it without programming.
Disclaimer: I work at Atalasoft, where we sell a Forms Processing add-on to our .NET imaging SDK.
I need to create fingerprints of many images (about 100,000 existing, 1,000 new per day; RGB, JPEG, max size 800x800) to compare every image to every other image very quickly. I can't use binary comparison methods because images that are only nearly similar should also be recognized.
An existing library would be best, but some pointers to existing algorithms would also help me a lot.
Normal hashing or CRC calculation algorithms do not work well with image data. The dimensional nature of the information must be taken into account.
If you need extremely robust fingerprinting, such that affine transformations (scaling, rotation, translation, flipping) are accounted for, you can use a Radon transformation on the image source to produce a normative mapping of the image data - store this with each image and then compare just the fingerprints. This is a complex algorithm and not for the faint of heart.
A few simple solutions are possible:
Create a luminosity histogram for the image as a fingerprint
Create scaled down versions of each image as a fingerprint
Combine techniques (1) and (2) into a hybrid approach for improved comparison quality
A luminosity histogram (especially one that is separated into RGB components) is a reasonable fingerprint for an image - and it can be implemented quite efficiently. Subtracting one histogram from another produces a new histogram which you can process to decide how similar two images are. Because histograms only evaluate the distribution and occurrence of luminosity/color information, they handle affine transformations quite well. If you quantize each color component's luminosity information down to an 8-bit value, 768 bytes of storage are sufficient for the fingerprint of an image of almost any reasonable size. Luminosity histograms produce false negatives when the color information in an image is manipulated: if you apply transformations like contrast/brightness, posterize, or color shifting, the luminosity information changes. False positives are also possible with certain types of images, such as landscapes and images where a single color dominates the others.
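A hedged Go sketch of that histogram fingerprint: one 256-bin histogram per RGB channel (3 x 256 values), normalized by pixel count so images of different sizes remain comparable, with a simple summed-absolute-difference distance. The distance threshold you accept is up to you; the names are illustrative.

```go
package fingerprint

import (
	"image"
	"math"
)

// Histogram holds one normalized 256-bin histogram per channel (R, G, B).
type Histogram [3][256]float64

// HistogramOf builds the per-channel luminosity histogram of an image.
func HistogramOf(img image.Image) Histogram {
	var h Histogram
	b := img.Bounds()
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			r, g, bl, _ := img.At(x, y).RGBA() // 16-bit channels
			h[0][r>>8]++                       // quantize to 8 bits
			h[1][g>>8]++
			h[2][bl>>8]++
		}
	}
	n := float64(b.Dx() * b.Dy())
	for c := range h {
		for i := range h[c] {
			h[c][i] /= n // normalize so image size does not matter
		}
	}
	return h
}

// Distance is 0 for identical histograms and grows as they diverge.
func Distance(a, b Histogram) float64 {
	var d float64
	for c := range a {
		for i := range a[c] {
			d += math.Abs(a[c][i] - b[c][i])
		}
	}
	return d
}
```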
Using scaled-down images is another way to reduce the information density of the image to a level that is easier to compare. Reductions below 10% of the original image size generally lose too much of the information to be of use - so an 800x800 pixel image can be scaled down to 80x80 and still provide enough information to perform decent fingerprinting. Unlike histogram data, thumbnail data requires anisotropic scaling when the source images have varying aspect ratios. In other words, reducing a 300x800 image into an 80x80 thumbnail deforms the image, so that comparing it with a very similar 300x500 image will cause false negatives. Thumbnail fingerprints also often produce false negatives when affine transformations are involved: if you flip or rotate an image, its thumbnail will be quite different from the original and may result in a false negative.
Combining both techniques is a reasonable way to hedge your bets and reduce the occurrence of both false positives and false negatives.
There is a much less ad hoc approach than the scaled-down image variants proposed here, one that retains their general flavor but gives a much more rigorous mathematical basis for what is going on.
Take a Haar wavelet of the image. Basically, the Haar wavelet is the succession of differences from the lower-resolution images to each higher-resolution image, weighted by how deep you are in the 'tree' of mipmaps. The calculation is straightforward. Once you have the Haar wavelet appropriately weighted, throw away all but the k largest coefficients (by absolute value), normalize the vector and save it.
If you take the dot product of two of those normalized vectors it gives you a measure of similarity with 1 being nearly identical. I posted more information over here.
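Here is a hedged, simplified Go sketch of that idea: a standard separable Haar transform of a grayscale image whose sides are powers of two, keeping only the k largest-magnitude coefficients, L2-normalizing them, and comparing fingerprints with a dot product (values near 1 mean very similar). The depth-dependent weighting described above is omitted for brevity, so this is a sketch of the general technique rather than the exact algorithm from the linked post.

```go
package haarprint

import (
	"image"
	"math"
	"sort"
)

// haar1D applies an in-place orthonormal Haar transform to a slice whose
// length is a power of two.
func haar1D(v []float64) {
	tmp := make([]float64, len(v))
	for length := len(v); length > 1; length /= 2 {
		half := length / 2
		for i := 0; i < half; i++ {
			tmp[i] = (v[2*i] + v[2*i+1]) / math.Sqrt2      // average
			tmp[half+i] = (v[2*i] - v[2*i+1]) / math.Sqrt2 // detail
		}
		copy(v[:length], tmp[:length])
	}
}

// Fingerprint keeps the k largest Haar coefficients as a sparse, normalized map
// from coefficient index to value. Image sides are assumed to be powers of two.
func Fingerprint(img *image.Gray, k int) map[int]float64 {
	b := img.Bounds()
	w, h := b.Dx(), b.Dy()

	m := make([][]float64, h)
	for y := range m {
		m[y] = make([]float64, w)
		for x := range m[y] {
			m[y][x] = float64(img.GrayAt(b.Min.X+x, b.Min.Y+y).Y)
		}
	}
	for y := 0; y < h; y++ { // transform rows
		haar1D(m[y])
	}
	col := make([]float64, h)
	for x := 0; x < w; x++ { // transform columns
		for y := 0; y < h; y++ {
			col[y] = m[y][x]
		}
		haar1D(col)
		for y := 0; y < h; y++ {
			m[y][x] = col[y]
		}
	}

	// Pick the k largest-magnitude coefficients.
	type coeff struct {
		idx int
		val float64
	}
	all := make([]coeff, 0, w*h)
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			all = append(all, coeff{y*w + x, m[y][x]})
		}
	}
	sort.Slice(all, func(i, j int) bool { return math.Abs(all[i].val) > math.Abs(all[j].val) })
	if k > len(all) {
		k = len(all)
	}

	fp := make(map[int]float64, k)
	var norm float64
	for _, c := range all[:k] {
		fp[c.idx] = c.val
		norm += c.val * c.val
	}
	if norm == 0 {
		return fp // blank image
	}
	norm = math.Sqrt(norm)
	for i := range fp {
		fp[i] /= norm
	}
	return fp
}

// Similarity is the dot product of two fingerprints; values near 1 mean similar.
func Similarity(a, b map[int]float64) float64 {
	var dot float64
	for i, va := range a {
		dot += va * b[i] // missing keys read as 0
	}
	return dot
}
```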
You should definitely take a look at phash.
For image comparison there is this php project :
https://github.com/kennethrapp/phasher
And my little javascript clone:
https://redaktor.me/phasher/demo_js/index.html
Unfortunately this is "bitcount"-based but will recognize rotated images.
Another approach in JavaScript was to build a luminosity histogram from the image with the help of canvas. You can visualize a polygon histogram on the canvas and compare that polygon against your database (e.g. MySQL spatial ...).
A long time ago I worked on a system that had some similar characteristics, and this is an approximation of the algorithm we followed:
Divide the picture into zones. In our case we were dealing with 4:3 resolution video, so we used 12 zones. Doing this takes the resolution of the source images out of the picture.
For each zone, calculate an overall color - the average of all pixels in the zone
For the entire image, calculate an overall color - the average of all zones
So for each image, you're storing n + 1 integer values, where n is the number of zones you're tracking.
For comparisons, you also need to look at each color channel individually.
For the overall image, compare the color channels for the overall colors to see if they are within a certain threshold - say, 10%
If the images are within the threshold, next compare each zone. If all zones also are within the threshold, the images are a strong enough match that you can at least flag them for further comparison.
This lets you quickly discard images that are not matches; you can also use more zones and/or apply the algorithm recursively to get stronger match confidence.
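A hedged Go sketch of this scheme: split the image into a gridW x gridH grid (4 x 3 gives the 12 zones mentioned above), store the average R, G, B per zone plus the overall average, and compare channel by channel against a relative threshold. The 10% threshold and all names are illustrative.

```go
package zones

import "image"

// ZoneFingerprint stores the average R, G, B of each zone plus the overall
// average of the zone averages.
type ZoneFingerprint struct {
	Zones   [][3]float64
	Overall [3]float64
}

// Compute builds the fingerprint; the image is assumed to be at least
// gridW x gridH pixels so every zone receives some pixels.
func Compute(img image.Image, gridW, gridH int) ZoneFingerprint {
	b := img.Bounds()
	fp := ZoneFingerprint{Zones: make([][3]float64, gridW*gridH)}
	counts := make([]float64, gridW*gridH)

	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			zx := (x - b.Min.X) * gridW / b.Dx()
			zy := (y - b.Min.Y) * gridH / b.Dy()
			z := zy*gridW + zx
			r, g, bl, _ := img.At(x, y).RGBA()
			fp.Zones[z][0] += float64(r >> 8)
			fp.Zones[z][1] += float64(g >> 8)
			fp.Zones[z][2] += float64(bl >> 8)
			counts[z]++
		}
	}
	for z := range fp.Zones {
		for c := 0; c < 3; c++ {
			fp.Zones[z][c] /= counts[z]
			fp.Overall[c] += fp.Zones[z][c] / float64(len(fp.Zones))
		}
	}
	return fp
}

// withinThreshold reports whether two 0..255 channel values differ by at most
// frac of the full scale (frac = 0.10 for the 10% example above).
func withinThreshold(a, b, frac float64) bool {
	diff := a - b
	if diff < 0 {
		diff = -diff
	}
	return diff <= frac*255
}

// Match first checks the overall color, then every zone, channel by channel.
func Match(a, b ZoneFingerprint, frac float64) bool {
	for c := 0; c < 3; c++ {
		if !withinThreshold(a.Overall[c], b.Overall[c], frac) {
			return false // cheap early rejection
		}
	}
	for z := range a.Zones {
		for c := 0; c < 3; c++ {
			if !withinThreshold(a.Zones[z][c], b.Zones[z][c], frac) {
				return false
			}
		}
	}
	return true
}
```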
Similar to Ic's answer - you might try comparing the images at multiple resolutions. So each image gets saved as 1x1, 2x2, 4x4 ... 800x800. If the lowest resolution doesn't match (subject to a threshold), you can immediately reject it. If it does match, you can compare them at the next higher resolution, and so on.
Also - if the images share any similar structure, such as medical images, you might be able to extract that structure into a description that is easier/faster to compare.
As of 2015 (back to the future... on this 2009 question, which now ranks high in Google), image similarity can be computed using deep learning techniques. The family of algorithms known as autoencoders can create a vector representation which is searchable for similarity. There is a demo here.
One way you can do this is to resize the image and drop the resolution significantly (to 200x200 maybe?), storing a smaller (pixel-averaged) version for doing the comparison. Then define a tolerance threshold and compare each pixel. If the RGB of all pixels are within the tolerance, you've got a match.
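A minimal Go sketch under those assumptions: box-average the image down to size x size (200 in the example above), then require every averaged pixel's R, G and B to be within a tolerance. It assumes the source is at least size x size; the names and tolerance scale are illustrative.

```go
package smallcompare

import "image"

// downscale box-averages img into a size x size grid of RGB values on a
// 0..255 scale. The source image is assumed to be at least size x size.
func downscale(img image.Image, size int) [][3]float64 {
	b := img.Bounds()
	sums := make([][3]float64, size*size)
	counts := make([]float64, size*size)
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			cx := (x - b.Min.X) * size / b.Dx()
			cy := (y - b.Min.Y) * size / b.Dy()
			i := cy*size + cx
			r, g, bl, _ := img.At(x, y).RGBA()
			sums[i][0] += float64(r >> 8)
			sums[i][1] += float64(g >> 8)
			sums[i][2] += float64(bl >> 8)
			counts[i]++
		}
	}
	for i := range sums {
		for c := 0; c < 3; c++ {
			sums[i][c] /= counts[i]
		}
	}
	return sums
}

// match reports whether every averaged pixel is within tol (0..255 scale).
func match(a, b [][3]float64, tol float64) bool {
	for i := range a {
		for c := 0; c < 3; c++ {
			d := a[i][c] - b[i][c]
			if d < 0 {
				d = -d
			}
			if d > tol {
				return false // bail out on the first out-of-tolerance pixel
			}
		}
	}
	return true
}
```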
Your initial run through is O(n^2) but if you catalog all matches, each new image is just an O(n) algorithm to compare (you only have to compare it to each previously inserted image). It will eventually break down however as the list of images to compare becomes larger, but I think you're safe for a while.
After 400 days of running, you'll have 500,000 images, which means (discounting the time to resize the image down) 200(H)*200(W)*500,000(images)*3(RGB) = 60,000,000,000 comparisons. If every image is an exact match, you're going to be falling behind, but that's probably not going to be the case, right? Remember, you can discount an image as a match as soon as a single comparison falls outside your threshold.
Do you literally want to compare every image against the others? What is the application? Maybe you just need some kind of indexing and retrieval of images based on certain descriptors? Then for example you can look at MPEG-7 standard for Multimedia Content Description Interface. Then you could compare the different image descriptors, which will be not that accurate but much faster.
So you want to do "fingerprint matching"; that's pretty different from "image matching". Fingerprint analysis has been studied in depth over the past 20 years, and several interesting algorithms have been developed to ensure the right detection rate (with respect to the FAR and FRR measures - False Acceptance Rate and False Rejection Rate).
I suggest you look at the LFA (Local Feature Analysis) class of detection techniques, mostly built on minutiae inspection. Minutiae are specific characteristics of any fingerprint and have been classified into several classes. Mapping a raster image to a minutiae map is what most public authorities actually do to file criminals or terrorists.
See here for further references
For iPhone image comparison and image similarity development check out:
http://sites.google.com/site/imagecomparison/
To see it in action, check out eyeBuy Visual Search on the iTunes AppStore.
It seems that specialised image hashing algorithms are an area of active research but perhaps a normal hash calculation of the image bytes would do the trick.
Are you seeking byte-identical images, or are you looking for images that are derived from the same source but may be in a different format or resolution (which strikes me as a rather hard problem)?