Determine distance to vehicle in image (Matlab, OpenCV)

I have 2 images:
1. The initial image, in which I detected the car.
2. The IPM image, after I transformed the image to another plane.
I don't have any information about the camera parameters.
I know the location of the car in the original image and in the IPM image, and I would like to know how I can determine the distance between the car and the camera. Can we assume the height of the camera is 1 m?
Does any formula or algorithm exist for that?

From your question, it appears that you are in the monocular case, where depth/scale information is naturally unavailable. I'm afraid there is no easy way to achieve what you want. Off the top of my head, here are a few options:
1- Using neural networks. This is the least expensive option (from a material and development-effort point of view). Plus, it will work if you do not have a video stream but only single images (although I'm guessing that is not your case). If performance is not an issue, you can take more or less any convolutional neural network and train it on depth data. Otherwise, a quick search will lead you to faster state-of-the-art networks which you can tailor to your needs. However, databases which contain depth-map ground truth are a bit scarce, and you usually have to build the depth maps for your training data yourself. In that case, many of the open-source methods listed at the bottom of here come to mind. Once you have the depth maps, you can train for monocular depth estimation.
2- You can use stereo cameras. Those naturally give you the depth, for example via simple triangulation (see the sketch after this list).
3- You have a video stream and can use IMUs or car odometry. In that case, you can use one of the many (multisensor) Simultaneous Localization And Mapping (SLAM) methods. The literature on the subject is rich, but IMU calibration is usually a bit nightmarish.
4- You can use a cheap GPS receiver (the ublox EVK* series comes to mind). In that case, you still need to use some SLAM variant (e.g. constrained bundle adjustment or any Kalman-based method). Assuming low GPS bias (since you show only peri-urban images, which are not affected by multipath), this will give you a decent approximation of the scale.
Note that methods 3 and 4 will give you an estimate of the reconstruction, so if you use sparse (i.e. feature-based) SLAM methods, you will end up having to densify the region where your car is detected (unless crude estimates are OK).
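To make option 2 concrete, here is a minimal sketch of depth from a rectified stereo pair, assuming you had a second camera. The file names, focal length, baseline and car bounding box are placeholders, not values from the question:

import cv2
import numpy as np

# Block-matching disparity on a rectified stereo pair, then depth = f * B / disparity.
left = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)    # placeholder file names
right = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0   # fixed-point -> pixels

focal_length_px = 700.0   # assumed focal length in pixels (needs real calibration)
baseline_m = 0.12         # assumed distance between the two cameras, in metres

valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = focal_length_px * baseline_m / disparity[valid]

# Median depth inside the detected car's bounding box (x, y, w, h assumed known).
x, y, w, h = 100, 200, 80, 60
roi_valid = valid[y:y+h, x:x+w]
print('Estimated distance to car: {:.1f} m'.format(np.median(depth[y:y+h, x:x+w][roi_valid])))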

Related

algorithm - warping image to another image and calculate similarity measure

I have a query on calculating the best matching point of one image to another image through intensity-based registration. I'd like to have some comments on my algorithm:
1. Compute the warp matrix at this iteration.
2. For every point of image A:
2a. Warp the image A pixel coordinates to image B using the warp matrix.
2b. Perform interpolation to get the corresponding intensity from image B if the warped point coordinate is inside image B.
2c. Calculate the similarity measure value between the warped pixel A intensity and the image B intensity.
3. Cycle through every pixel in image A.
4. Cycle through every possible rotation and translation.
Would this be okay? Is there any relevant OpenCV code we can reference?
Comments on algorithm
Your algorithm appears good, although you will have to be careful about:
Edge effects: you need to make sure that the algorithm does not favour matches where most of image A does not overlap image B. E.g. you may wish to compute the average similarity measure and constrain the transformation to make sure that at least 50% of the pixels overlap.
Computational complexity: there may be a lot of possible translations and rotations to consider, and this algorithm may be too slow in practice.
Type of warp: depending on your application, you may also need to consider perspective/lighting changes as well as translation and rotation. (A sketch of the brute-force search with the overlap constraint follows this list.)
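As a rough illustration only (not the asker's code), here is a minimal brute-force version of the loop above in OpenCV/Python, averaging an MSE-based similarity over the overlap and skipping transforms with less than 50% overlap. The file names, search ranges and step sizes are arbitrary assumptions:

import cv2
import numpy as np

# Brute-force intensity-based registration of image A onto image B (same size assumed).
A = cv2.imread('imageA.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)
B = cv2.imread('imageB.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)
h, w = B.shape

best = (None, -np.inf)
for angle in range(-10, 11, 2):               # coarse rotation search, degrees
    for tx in range(-16, 17, 4):              # coarse translation search, pixels
        for ty in range(-16, 17, 4):
            M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
            M[0, 2] += tx
            M[1, 2] += ty
            warped = cv2.warpAffine(A, M, (w, h))                  # bilinear interpolation
            mask = cv2.warpAffine(np.ones_like(A), M, (w, h)) > 0.5
            if mask.mean() < 0.5:                                  # overlap constraint
                continue
            score = -np.mean((warped - B)[mask] ** 2)              # negative MSE over overlap
            if score > best[1]:
                best = ((angle, tx, ty), score)

print('Best (angle, tx, ty):', best[0])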
Acceleration
A similar algorithm is commonly used in video encoders, although most will ignore rotations/perspective changes and just search for translations.
One approach that is quite commonly used is to do a gradient search for the best match. In other words, try tweaking the translation/rotation in a few different ways (e.g. left/right/up/down by 16 pixels) and pick the best match as your new starting point. Then repeat this process several times.
Once you are unable to improve the match, reduce the size of your tweaks and try again.
Alternative algorithms
Depending on your application you may want to consider some alternative methods:
Stereo matching. If your 2 images come from stereo camera then you only really need to search in one direction (and OpenCV provides useful methods to do this)
Known patterns. If you are able to place a known pattern (e.g. a chessboard) in both your images then it becomes a lot easier to register them (and OpenCV provides methods to find and register certain types of pattern)
Feature point matching. A common approach to image registration is to search for distinctive points (e.g. types of corner, or more general places of interest) and then try to find matching distinctive points in the two images. For example, OpenCV contains functions to detect SURF features. Google has published a great paper on using this kind of approach to remove rolling-shutter noise, which I recommend reading. (A minimal feature-matching sketch follows below.)
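A minimal sketch of the feature-point idea, using ORB rather than SURF (SURF is patented and lives in opencv-contrib); the file names are placeholders:

import cv2

# Detect ORB keypoints in both images and count cross-checked descriptor matches.
img1 = cv2.imread('imageA.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('imageB.png', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
if matches:
    print('{} cross-checked matches; best distance {}'.format(len(matches), matches[0].distance))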

image registration(non-rigid \ nonlinear)

I'm looking for some algorithm (preferably with source code available) for image registration.
The image deformation can't be described by a homography matrix (because I think the distortion is not symmetrical and not homogeneous); more specifically, the deformations are like barrel distortion and trapezoid distortion, maybe with some rotation of the image.
I want to obtain pairs of pixels of the two images, so I can obtain a representation of the "deformation field".
I googled a lot and found out that there are some algorithms based on physics ideas, but it seems that they can converge to a local maximum rather than the global one.
I can afford the program to be semi-automatic, meaning some simple user interaction.
Maybe some algorithm like SIFT would be appropriate? But I think it can't provide a "deformation field" with sufficient regular density.
If it is important: there are no scale changes.
Example of a complicated field:
http://www.math.ucla.edu/~yanovsky/Research/ImageRegistration/2DMRI/2DMRI_lambda400_grid_only1.png
What you are looking for is "optical flow". Searching for these terms will yield you numerous results.
In OpenCV, there is a function called calcOpticalFlowFarneback() (in the video module) that does what you want.
The C API does still have an implementation of the classic paper by Horn & Schunck (1981) called "Determining optical flow".
You can also have a look at this work I've done, along with some code (but be careful, there are still some mysterious bugs in the opencl memory code. I will release a corrected version later this year.): http://lts2www.epfl.ch/people/dangelo/opticalflow
Besides OpenCV's optical flow (and mine ;-), you can have a look at ITK on itk.org for complete image registration chains (mostly aimed at medical imaging).
There's also a lot of optical flow code (matlab, C/C++...) that can be found thanks to google, for example cs.brown.edu/~dqsun/research/software.html, gpu4vision, etc
-- EDIT : about optical flow --
Optical flow is divided into two families of algorithms: the dense ones and the others.
Dense algorithms give one motion vector per pixel; non-dense ones give one vector per tracked feature.
Examples of the dense family include Horn-Schunck and Farneback (to stay with OpenCV), and more generally any algorithm that minimizes some cost function over the whole images (the various TV-L1 flows, etc.).
An example of the non-dense family is the KLT, which is called Lucas-Kanade in OpenCV.
In the dense family, since the motion for each pixel is almost free, it can deal with scale changes. Keep in mind, however, that these algorithms can fail in the case of large motions / scale changes because they usually rely on linearizations (Taylor expansions of the motion and image changes). Furthermore, in the variational approach, each pixel contributes to the overall result. Hence, parts that are invisible in one image are likely to pull the algorithm away from the actual solution.
Anyway, techniques such as coarse-to-fine implementations are employed to bypass these limits, and these problems usually have only a small impact. Brutal illumination changes, or large occluded / unoccluded areas, can also be explicitly dealt with by some algorithms; see for example this paper that computes a sparse image of "innovation" alongside the optical flow field.
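For reference, a minimal sketch of computing a dense field with the Farneback function mentioned above (the parameter values and file names are assumptions, not tuned settings):

import cv2
import numpy as np

# Dense deformation field between two images: one (dx, dy) vector per pixel.
prev = cv2.imread('frame1.png', cv2.IMREAD_GRAYSCALE)
curr = cv2.imread('frame2.png', cv2.IMREAD_GRAYSCALE)

# Args: prev, next, flow, pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 4, 15, 3, 5, 1.2, 0)

# Sanity check: warping frame2 back through the field should approximate frame1.
h, w = prev.shape
grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
map_x = (grid_x + flow[..., 0]).astype(np.float32)
map_y = (grid_y + flow[..., 1]).astype(np.float32)
warped_back = cv2.remap(curr, map_x, map_y, cv2.INTER_LINEAR)   # ~ prev if the flow is good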
I found some medical-specific software; it's complicated and doesn't work with simple image formats, but it seems to do what I need.
http://www.csd.uoc.gr/~komod/FastPD/index.html
Drop - Deformable Registration using Discrete Optimization

What is an algorithm I can use to program an image compare routine to detect changes (like a person coming into the frame of a web cam)?

I have a web cam that takes a picture every N seconds. This gives me a collection of images of the same scene over time. I want to process that collection of images as they are created to identify events like someone entering into the frame, or something else large happening. I will be comparing images that are adjacent in time and fixed in space - the same scene at different moments of time.
I want a reasonably sophisticated approach. For example, naive approaches fail for outdoor applications. If you count the number of pixels that change, for example, or the percentage of the picture that has a different color or grayscale value, that will give false positive reports every time the sun goes behind a cloud or the wind shakes a tree.
I want to be able to positively detect a truck parking in the scene, for example, while ignoring lighting changes from sun/cloud transitions, etc.
I've done a number of searches, and found a few survey papers (Radke et al, for example) but nothing that actually gives algorithms that I can put into a program I can write.
Use colour spectrum analysis, without luminance: when the Sun goes down for a while, you will get a similar result, because the colours do not change (too much).
Don't go for big changes, but for quick changes. If the luminance of the image changes by -10% over 10 minutes, that's the usual evening effect. But when the change is -5%, 0, +5% within seconds, it's a quick change.
Don't forget to adjust the reference values.
Split the image into smaller regions. Then, when all the regions change in the same way, you know it's a global change, like an eclipse or whatever; but if only one region's parameters are changing, then something is happening there. (A sketch of this region idea follows at the end of this answer.)
Use masks to create smart regions. If you're watching a street, filter out the sky, the trees (blown by the wind), etc. You may set up different trigger values for different regions. The regions should overlap.
A special case of the region is the line. A line (a narrow region) contains fewer, more homogeneous pixels than a flat area. Mark, say, a green fence: it's easy to detect whether someone crosses it, because it makes a bigger change in the line than in a flat area.
If you can, change the IRL world. Repaint the fence in a strange colour to create a colour spectrum which can be identified more easily. Paint tags on the floor and wall which can be OCRed by the program, so you can detect whether something hides them.
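A minimal sketch of the region idea above: split each frame into a grid of cells, remove the scene-wide drift (slow lighting changes), and flag cells whose local mean intensity changes quickly. The grid size and threshold are assumed values:

import cv2
import numpy as np

GRID = 8              # 8x8 grid of cells
QUICK_CHANGE = 12.0   # per-cell threshold on mean-intensity change (assumed value)

def cell_means(gray, grid=GRID):
    # Mean intensity of each grid cell.
    h, w = gray.shape
    return np.array([[gray[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid].mean()
                      for j in range(grid)] for i in range(grid)])

prev = cv2.imread('frame_t0.png', cv2.IMREAD_GRAYSCALE)
curr = cv2.imread('frame_t1.png', cv2.IMREAD_GRAYSCALE)

delta = cell_means(curr) - cell_means(prev)
global_shift = np.median(delta)              # slow, scene-wide change (e.g. lighting)
local = np.abs(delta - global_shift)         # what remains is local change
changed_cells = np.argwhere(local > QUICK_CHANGE)
print('Cells with local change:', changed_cells.tolist())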
I believe you are looking for Template Matching.
Also, I would suggest you look into OpenCV; a minimal matchTemplate sketch follows.
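A small sketch, assuming you already have a reference image of the object you want to spot; the file names and the 0.8 threshold are arbitrary:

import cv2

# Search for a known object (template) inside a frame.
frame = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)
template = cv2.imread('object.png', cv2.IMREAD_GRAYSCALE)

result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
if max_val > 0.8:
    print('Object found at', max_loc, 'with score', max_val)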
We had to contend with many of these issues in our interactive installations. It's tough to not get false positives without being able to control some of your environment (sounds like you will have some degree of control). In the end we looked at combining some techniques and we created an open piece of software named OpenTSPS (Open Toolkit for Sensing People in Spaces - http://www.opentsps.com). You can look at the C++ source in github (https://github.com/labatrockwell/openTSPS/).
We use ‘progressive background relearning’ to adjust to the changing background over time. Progressive relearning is particularly useful in variable lighting conditions – e.g. if lighting in a space changes from day to night. This, in combination with blob detection, works pretty well, and the only way we have found to improve on it is to use 3D cameras like the Kinect, which cast out IR and measure it.
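openTSPS itself isn't shown here, but a rough OpenCV analogue of "adaptive background model plus blob detection" looks something like the following; MOG2 keeps updating its background model, so slow lighting drift is absorbed. The camera index and area threshold are guesses:

import cv2

cap = cv2.VideoCapture(0)                      # webcam; index is a placeholder
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)             # background model updates every frame
    mask = cv2.medianBlur(mask, 5)
    # OpenCV 4.x return signature (contours, hierarchy).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    blobs = [c for c in contours if cv2.contourArea(c) > 500]   # area threshold is a guess
    if blobs:
        print('{} moving blob(s) detected'.format(len(blobs)))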
There are other algorithms that might be relevant, like SURF (http://achuwilson.wordpress.com/2011/08/05/object-detection-using-surf-in-opencv-part-1/ and http://en.wikipedia.org/wiki/SURF) but I don't think it will help in your situation unless you know exactly the type of thing you are looking for in the image.
Sounds like a fun project. Best of luck.
The problem you are trying to solve is very interesting indeed!
I think that you would need to attack it in parts:
As you already pointed out, a sudden change in illumination can be problematic. This is an indicator that you probably need to achieve some sort of illumination-invariant representation of the images you are trying to analyze.
There are plenty of techniques lying around, one I have found very useful for illumination invariance (applied to face recognition) is DoG filtering (Difference of Gaussians)
The idea is that you first convert the image to grayscale. Then you generate two blurred versions of this image by applying a Gaussian filter, one a little bit more blurred than the other (you could use a 1.0 sigma and a 2.0 sigma in the Gaussian filter, respectively). Then you subtract the pixel intensities of the more-blurred image from the less-blurred image. This operation enhances edges and produces a similar image regardless of strong illumination intensity variations. These steps can be very easily performed using OpenCV (as others have stated). This technique has been applied and documented here.
This paper adds an extra step involving contrast equalization. In my experience this is only needed if you want to obtain "visible" images from the DoG operation (pixel values tend to be very low after the DoG filter and are viewed as black rectangles onscreen), and performing a histogram equalization is an acceptable substitution if you want to be able to see the effect of the DoG filter.
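A minimal sketch of the DoG step described above, with the sigmas taken from the suggestion; the file name is a placeholder:

import cv2

gray = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)
blur_small = cv2.GaussianBlur(gray, (0, 0), sigmaX=1.0)   # kernel size derived from sigma
blur_large = cv2.GaussianBlur(gray, (0, 0), sigmaX=2.0)

dog = cv2.subtract(blur_small, blur_large)        # less blurred minus more blurred
dog_visible = cv2.equalizeHist(dog)               # optional, only to make the result visible
cv2.imwrite('dog.png', dog_visible)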
Once you have illumination-invariant images you could focus on the detection part. If your problem can afford a static camera that can be trained for a certain amount of time, then you could use a strategy similar to alarm motion detectors. Most of them work with an average thermal image - basically they record the average temperature of the "pixels" of a room view, and trigger an alarm when the heat signature varies greatly from one "frame" to the next. Here you wouldn't be working with temperatures, but with average, light-normalized pixel values. This would allow you to build up over time which areas of the image tend to have movement (e.g. the leaves of a tree in a windy environment), and which areas are fairly stable in the image. Then you could trigger an alarm when a large number of pixels already flagged as stable show a strong variation from one frame to the next one.
If you can't afford training your camera view, then I would suggest you take a look at the TLD tracker by Zdenek Kalal. His research is focused on object tracking with a single frame as training. You could probably use the semi-static view of the camera (with no foreign objects present) as a starting point for the tracker and flag a detection when the TLD tracker (a grid of points where local motion flow is estimated using the Lucas-Kanade algorithm) fails to track a large number of grid points from one frame to the next. This scenario would probably allow even a panning camera to work, as the algorithm is very resilient to motion disturbances.
Hope these pointers are of some help. Good luck and enjoy the journey! =D
Use one of the standard measures, like Mean Squared Error, for example, to find the difference between two consecutive images. If the MSE is beyond a certain threshold, you know that there is some motion.
Also read about Motion Estimation.
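As a minimal illustration of the MSE idea, where the threshold is an assumed value you would tune for your camera and scene:

import cv2
import numpy as np

THRESHOLD = 100.0   # assumed value; tune for your setup

prev = cv2.imread('frame_t0.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)
curr = cv2.imread('frame_t1.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)

mse = np.mean((curr - prev) ** 2)
if mse > THRESHOLD:
    print('Motion detected (MSE = {:.1f})'.format(mse))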
If you know that the image will remain relatively static, I would recommend:
1) Look into neural networks. You can use them to learn what defines someone within the image or what is a non-something in the image.
2) Look into motion detection algorithms; they are used all over the place.
3) Is your camera capable of thermal imaging? If so, it may be worthwhile to look for hotspots in the images. There may be existing algorithms to turn your webcam into a thermal imager.

Looking for ways for a robot to locate itself in the house

I am hacking a vacuum cleaner robot to control it with a microcontroller (Arduino). I want to make it more efficient when cleaning a room. For now, it just goes straight and turns when it hits something.
But I have trouble finding the best algorithm or method to use to know its position in the room. I am looking for an idea that stays cheap (less than $100) and not too complex (one that doesn't require a PhD thesis in computer vision). I can add some discrete markers in the room if necessary.
Right now, my robot has:
One webcam
Three proximity sensors (around 1 meter range)
Compass (not used for now)
Wi-Fi
Its speed can vary if the battery is full or nearly empty
A netbook Eee PC is embedded on the robot
Do you have any idea for doing this? Does any standard method exist for this kind of problem?
Note: if this question belongs on another website, please move it, I couldn't find a better place than Stack Overflow.
The problem of figuring out a robot's position in its environment is called localization. Computer science researchers have been trying to solve this problem for many years, with limited success. One problem is that you need reasonably good sensory input to figure out where you are, and sensory input from webcams (i.e. computer vision) is far from a solved problem.
If that didn't scare you off: one of the approaches to localization that I find easiest to understand is particle filtering. The idea goes something like this:
You keep track of a bunch of particles, each of which represents one possible location in the environment.
Each particle also has an associated probability that tells you how confident you are that the particle really represents your true location in the environment.
When you start off, all of these particles might be distributed uniformly throughout your environment and be given equal probabilities. Here the robot is gray and the particles are green.
When your robot moves, you move each particle. You might also degrade each particle's probability to represent the uncertainty in how the motors actually move the robot.
When your robot observes something (e.g. a landmark seen with the webcam, a wifi signal, etc.) you can increase the probability of particles that agree with that observation.
You might also want to periodically replace the lowest-probability particles with new particles based on observations.
To decide where the robot actually is, you can either use the particle with the highest probability, the highest-probability cluster, the weighted average of all particles, etc.
If you search around a bit, you'll find plenty of examples: e.g. a video of a robot using particle filtering to determine its location in a small room.
Particle filtering is nice because it's pretty easy to understand. That makes implementing and tweaking it a little less difficult. There are other similar techniques (like Kalman filters) that are arguably more theoretically sound but can be harder to get your head around.
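To make the steps above concrete, here is a toy 2-D particle filter sketch. It is not a drop-in solution: the room size, noise levels and landmark position are all assumed values.

import numpy as np

rng = np.random.default_rng(0)
N = 500
particles = rng.uniform(0, 10, size=(N, 2))      # uniform over an assumed 10x10 m room
weights = np.ones(N) / N
landmark = np.array([8.0, 2.0])                  # known landmark position (assumed)

def move(particles, step):
    # Move every particle, adding noise to model motor uncertainty.
    return particles + step + rng.normal(0, 0.1, particles.shape)

def observe(particles, measured_dist, sigma=0.3):
    # Re-weight particles by how well they agree with a range measurement to the landmark.
    dist = np.linalg.norm(particles - landmark, axis=1)
    w = np.exp(-0.5 * ((dist - measured_dist) / sigma) ** 2)
    return w / w.sum()

def resample(particles, weights):
    # Replace low-probability particles by sampling proportionally to the weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

particles = move(particles, step=np.array([0.5, 0.0]))   # robot drove 0.5 m in x
weights = observe(particles, measured_dist=6.0)          # e.g. distance seen to the landmark
particles = resample(particles, weights)
print('Position estimate:', particles.mean(axis=0))      # weighted average also works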
A QR Code poster in each room would not only make an interesting Modern art piece, but would be relatively easy to spot with the camera!
If you can place some markers in the room, using the camera could be an option. If 2 known markers have an angular displacement (left to right) then the camera and the markers lie on a circle whose radius is related to the measured angle between the markers. I don't recall the formula right off, but the arc segment (on that circle) between the markers will be twice the angle you see. If you have the markers at known height and the camera is at a fixed angle of inclination, you can compute the distance to the markers. Either of these methods alone can nail down your position given enough markers. Using both will help do it with fewer markers.
Unfortunately, those methods are imperfect due to measurement errors. You get around this by using a Kalman estimator to incorporate multiple noisy measurements to arrive at a good position estimate - you can then feed in some dead reckoning information (which is also imperfect) to refine it further. This part goes pretty deep into math, but I'd say it's a requirement to do a great job at what you're attempting. You can do OK without it, but if you want an optimal solution (in terms of best position estimate for given input) there is no better way. If you actually want a career in autonomous robotics, this will play large in your future.
Once you can determine your position you can cover the room in any pattern you'd like. Keep using the bump sensor to help construct a map of obstacles and then you'll need to devise a way to scan incorporating the obstacles.
Not sure if you've got the math background yet, but here is the book:
http://books.google.com/books/about/Applied_optimal_estimation.html?id=KlFrn8lpPP0C
This doesn't replace the accepted answer (which is great, thanks!) but I might recommend getting a Kinect and using that instead of your webcam, either through Microsoft's recently released official drivers or using the hacked drivers if your EeePC doesn't have Windows 7 (presumably it does not).
That way the positioning will be improved by the 3D vision. Observing landmarks will now tell you how far away the landmark is, and not just where in the visual field that landmark is located.
Regardless, the accepted answer doesn't really address how to pick out landmarks in the visual field, and simply assumes that you can. While the Kinect drivers may already have feature detection included (I'm not sure) you can also use OpenCV for detecting features in the image.
One solution would be to use a strategy similar to "flood fill" (wikipedia). To get the controller to accurately perform sweeps, it needs a sense of distance. You can calibrate your bot using the proximity sensors: e.g. run motor for 1 sec = xx change in proximity. With that info, you can move your bot for an exact distance, and continue sweeping the room using flood fill.
Assuming you are not looking for a generalised solution, you may actually know the room's shape, size, potential obstacle locations, etc. When the bot exits the factory there is no info about its future operating environment, which kind of forces it to be inefficient from the outset.
If that's your case, you can hardcode that info and then use basic measurements (i.e. rotary encoders on wheels + compass) to precisely figure out its location in the room/house. No need for Wi-Fi triangulation or crazy sensor setups in my opinion. At least for a start.
Ever considered GPS? Every position on Earth has unique GPS coordinates - with a resolution of 1 to 3 metres, and with differential GPS you can get down to the sub-10 cm range - more info here:
http://en.wikipedia.org/wiki/Global_Positioning_System
And Arduino does have lots of options of GPS-modules:
http://www.arduino.cc/playground/Tutorials/GPS
After you have collected all the key coordinate points of the house, you can then write the routine for the Arduino to move the robot from point to point (as collected above) - assuming it will do all that obstacle avoidance stuff.
More information can be found here:
http://www.google.com/search?q=GPS+localization+robots&num=100
And inside the list I found this - specifically for your case: Arduino + GPS + localization:
http://www.youtube.com/watch?v=u7evnfTAVyM
I was thinking about this problem too. But I don't understand why you can't just triangulate? Have two or three beacons (e.g. IR LEDs of different frequencies) and a rotating IR sensor 'eye' on a servo. You could then get an almost constant fix on your position. I expect the accuracy would be in the low-cm range and it would be cheap. You can then map anything you bump into easily.
Maybe you could also use any interruption in the beacon beams to plot objects that are quite far from the robot too.
You said you have a camera? Did you consider looking at the ceiling? There is little chance that two rooms have identical dimensions, so you can identify which room you are in; your position in the room can be computed from the angular distance to the borders of the ceiling, and the direction can probably be extracted from the position of the doors.
This will require some image processing, but since the vacuum cleaner moves slowly in order to clean efficiently, it will have enough time to compute.
Good luck !
Use an ultrasonic sensor such as the HC-SR04 or similar.
As mentioned above, sense the distance to the walls with the sensors, and identify which part of the room you are in with a QR code.
When you are near a wall, turn 90 degrees, move forward by the width of your robot, then turn 90 degrees again (i.e. a 90-degree left turn) and keep moving. I think it will help :)

Algorithm to compare two images

Given two different image files (in whatever format I choose), I need to write a program to predict the chance of one being an illegal copy of the other. The author of the copy may do stuff like rotating, making a negative, or adding trivial details (as well as changing the dimensions of the image).
Do you know any algorithm to do this kind of job?
These are simply ideas I've had thinking about the problem, never tried it but I like thinking about problems like this!
Before you begin
Consider normalising the pictures: if one is a higher resolution than the other, consider the possibility that one of them is a compressed version of the other, so scaling the resolution down might provide more accurate results.
Consider scanning various prospective areas of the image that could represent zoomed portions of the image, and various positions and rotations. It starts getting tricky if one of the images is a skewed version of the other; these are the sorts of limitations you should identify and compromise on.
Matlab is an excellent tool for testing and evaluating images.
Testing the algorithms
You should test (at the minimum) against a large, human-analysed set of test data where matches are known beforehand. If, for example, your test data contains 1,000 images of which 5% match, you now have a reasonably reliable benchmark. An algorithm that finds 10% positives is not as good as one that finds 4% of positives in our test data. However, one algorithm may find all the matches but also have a large 20% false-positive rate, so there are several ways to rate your algorithms.
The test data should attempt to be designed to cover as many types of dynamics as possible that you would expect to find in the real world.
It is important to note that each algorithm to be useful must perform better than random guessing, otherwise it is useless to us!
You can then apply your software in the real world in a controlled way and start to analyse the results it produces. This is the sort of software project which can go on ad infinitum; there are always tweaks and improvements you can make. It is important to bear that in mind when designing it, as it is easy to fall into the trap of the never-ending project.
Colour Buckets
With two pictures, scan each pixel and count the colours. For example you might have the 'buckets':
white
red
blue
green
black
(Obviously you would have a higher resolution of counters.) Every time you find a 'red' pixel, you increment the red counter. Each bucket can be representative of a spectrum of colours; the higher the resolution, the more accurate, but you should experiment with an acceptable difference rate.
Once you have your totals, compare it to the totals for a second image. You might find that each image has a fairly unique footprint, enough to identify matches.
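A minimal sketch of the colour-bucket idea using OpenCV histograms; the 8x8x8 bucket resolution and the correlation metric are arbitrary starting points, and the file names are placeholders:

import cv2

def colour_signature(path, bins=(8, 8, 8)):
    # Coarse HSV histogram as a colour "footprint".
    img = cv2.imread(path)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, bins, [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

sig_a = colour_signature('imageA.png')
sig_b = colour_signature('imageB.png')
score = cv2.compareHist(sig_a, sig_b, cv2.HISTCMP_CORREL)   # 1.0 = identical footprints
print('Colour-footprint similarity: {:.3f}'.format(score))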
Edge detection
How about using Edge Detection.
(source: wikimedia.org)
With two similar pictures edge detection should provide you with a usable and fairly reliable unique footprint.
Take both pictures, and apply edge detection. Maybe measure the average thickness of the edges and then calculate the probability the image could be scaled, and rescale if necessary. Below is an example of an applied Gabor Filter (a type of edge detection) in various rotations.
Compare the pictures pixel for pixel, count the matches and the non matches. If they are within a certain threshold of error, you have a match. Otherwise, you could try reducing the resolution up to a certain point and see if the probability of a match improves.
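A rough sketch of comparing edge "footprints" with Canny; the Canny thresholds and the 90% match ratio are assumed values:

import cv2
import numpy as np

edges_a = cv2.Canny(cv2.imread('imageA.png', cv2.IMREAD_GRAYSCALE), 100, 200)
edges_b = cv2.Canny(cv2.imread('imageB.png', cv2.IMREAD_GRAYSCALE), 100, 200)

if edges_a.shape != edges_b.shape:                      # normalise sizes first
    edges_b = cv2.resize(edges_b, edges_a.shape[::-1])

match_ratio = np.mean(edges_a == edges_b)               # pixel-for-pixel agreement
print('Edge pixels agreeing: {:.1%}'.format(match_ratio))
print('Match' if match_ratio > 0.9 else 'No match')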
Regions of Interest
Some images may have distinctive segments/regions of interest. These regions probably contrast highly with the rest of the image, and are a good item to search for in your other images to find matches. Take this image for example:
(source: meetthegimp.org)
The construction worker in blue is a region of interest and can be used as a search object. There are probably several ways you could extract properties/data from this region of interest and use them to search your data set.
If you have more than 2 regions of interest, you can measure the distances between them. Take this simplified example:
(source: per2000.eu)
We have 3 clear regions of interest. The distance between region 1 and 2 may be 200 pixels, between 1 and 3 400 pixels, and 2 and 3 200 pixels.
Search other images for similar regions of interest, normalise the distance values and see if you have potential matches. This technique could work well for rotated and scaled images. The more regions of interest you have, the probability of a match increases as each distance measurement matches.
It is important to think about the context of your data set. If for example your data set is modern art, then regions of interest would work quite well, as regions of interest were probably designed to be a fundamental part of the final image. If however you are dealing with images of construction sites, regions of interest may be interpreted by the illegal copier as ugly and may be cropped/edited out liberally. Keep in mind common features of your dataset, and attempt to exploit that knowledge.
Morphing
Morphing two images is the process of turning one image into the other through a set of steps:
Note, this is different to fading one image into another!
There are many software packages that can morph images. It's traditionally used as a transitional effect; two images don't usually morph into something halfway, rather one extreme morphs into the other extreme as the final result.
Why could this be useful? Depending on the morphing algorithm you use, there may be a relationship between the similarity of the images and some parameters of the morphing algorithm.
In a grossly oversimplified example, one algorithm might execute faster when there are fewer changes to be made. We then know there is a higher probability that these two images share properties with each other.
This technique could work well for rotated, distorted, skewed, zoomed - all types of copied images. Again, this is just an idea I have had; it's not based on any researched academia as far as I am aware (I haven't looked hard, though), so it may be a lot of work for you with limited/no results.
Zipping
Ow's answer in this question is excellent, I remember reading about these sort of techniques studying AI. It is quite effective at comparing corpus lexicons.
One interesting optimisation when comparing corpora is that you can remove words considered to be too common, for example 'the', 'a', 'and', etc. These words dilute our result; we want to work out how different the two corpora are, so these can be removed before processing. Perhaps there are similar common signals in images that could be stripped before compression? It might be worth looking into.
Compression ratio is a very quick and reasonably effective way of determining how similar two sets of data are. Reading up about how compression works will give you a good idea why this could be so effective. For a fast to release algorithm this would probably be a good starting point.
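A quick sketch of the compression-ratio idea, computing a rough normalized compression distance over raw pixel bytes; the 256x256 resize and zlib are arbitrary choices, and the file names are placeholders:

import zlib
from PIL import Image

def raw_bytes(path):
    # Normalise to the same size and greyscale so only content differences remain.
    return Image.open(path).convert('L').resize((256, 256)).tobytes()

a, b = raw_bytes('imageA.png'), raw_bytes('imageB.png')
ca, cb = len(zlib.compress(a)), len(zlib.compress(b))
cab = len(zlib.compress(a + b))   # similar data compresses away when concatenated

ncd = (cab - min(ca, cb)) / max(ca, cb)    # ~0 = very similar, ~1 = unrelated
print('Normalized compression distance: {:.3f}'.format(ncd))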
Transparency
Again, I am unsure how transparency data is stored for certain image types (GIF, PNG, etc.), but it will be extractable and would serve as an effective simplified cut-out to compare with your data set's transparency.
Inverting Signals
An image is just a signal. If you play a noise from a speaker, and you play the opposite noise in another speaker in perfect sync at the exact same volume, they cancel each other out.
(source: themotorreport.com.au)
Invert one of the images, and add it onto your other image. Scale it / loop through positions repetitively until you find a resulting image where enough of the pixels are white (or black? I'll refer to it as a neutral canvas) to provide you with a positive match, or a partial match.
However, consider two images that are equal, except one of them has a brighten effect applied to it:
(source: mcburrz.com)
Inverting one of them, then adding it to the other, will not result in a neutral canvas, which is what we are aiming for. However, when comparing the pixels from both original images, we can definitely see a clear relationship between the two.
I haven't studied colour for some years now, and am unsure if the colour spectrum is on a linear scale, but if you determined the average factor of colour difference between both pictures, you can use this value to normalise the data before processing with this technique.
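A minimal sketch of the invert-and-add check, with the mean difference removed first to compensate for a uniform brighten/darken; the 10-grey-level tolerance is a guess:

import cv2
import numpy as np

a = cv2.imread('imageA.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)
b = cv2.imread('imageB.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)

residual = a + (255.0 - b) - 255.0          # a plus inverted b, recentred around zero
residual -= residual.mean()                 # normalise away a uniform brightness change
neutral_fraction = np.mean(np.abs(residual) < 10)   # tolerance of 10 grey levels (assumed)
print('{:.1%} of pixels cancel out'.format(neutral_fraction))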
Tree Data structures
At first these don't seem to fit for the problem, but I think they could work.
You could think about extracting certain properties of an image (for example colour bins) and generating a Huffman tree or similar data structure. You might be able to compare two trees for similarity. This wouldn't work well for photographic data, for example, with a large spectrum of colour, but for cartoons or other reduced-colour-set images it might work.
This probably wouldn't work, but it's an idea. The trie data structure is great at storing lexicons, for example a dictionary. It's a prefix tree. Perhaps it's possible to build an image equivalent of a lexicon (again I can only think of colours) to construct a trie. If you reduced, say, a 300x300 image into 5x5 squares, then decomposed each 5x5 square into a sequence of colours, you could construct a trie from the resulting data. If a 2x2 square contains:
FFFFFF|000000|FDFD44|FFFFFF
We have a fairly unique trie code that extends 24 levels; increasing/decreasing the levels (i.e. reducing/increasing the size of our sub-square) may yield more accurate results.
Comparing trie trees should be reasonably easy, and could possibly provide effective results.
More ideas
I stumbled across an interesting paper brief about classification of satellite imagery; it outlines:
Texture measures considered are: cooccurrence matrices, gray-level differences, texture-tone analysis, features derived from the Fourier spectrum, and Gabor filters. Some Fourier features and some Gabor filters were found to be good choices, in particular when a single frequency band was used for classification.
It may be worth investigating those measurements in more detail, although some of them may not be relevant to your data set.
Other things to consider
There are probably a lot of papers on this sort of thing, so reading some of them should help, although they can be very technical. It is an extremely difficult area in computing, with many fruitless hours of work spent by many people attempting to do similar things. Keeping it simple and building upon those ideas would be the best way to go. It is already a reasonably difficult challenge to create an algorithm with a better-than-random match rate, and improving on that really does start to get quite hard to achieve.
Each method would probably need to be tested and tweaked thoroughly, if you have any information about the type of picture you will be checking as well, this would be useful. For example advertisements, many of them would have text in them, so doing text recognition would be an easy and probably very reliable way of finding matches especially when combined with other solutions. As mentioned earlier, attempt to exploit common properties of your data set.
Combining alternative measurements and techniques, each of which can have a weighted vote (dependent on their effectiveness), would be one way you could create a system that generates more accurate results.
If you are employing multiple algorithms then, as mentioned at the beginning of this answer, one may find all the positives but have a false-positive rate of 20%; it would be of interest to study the properties/strengths/weaknesses of the other algorithms, as another algorithm may be effective in eliminating the false positives returned by the first.
Be careful to not fall into attempting to complete the never ending project, good luck!
Read the paper: Porikli, Fatih, Oncel Tuzel, and Peter Meer. "Covariance Tracking Using Model Update Based on Means on Riemannian Manifolds." IEEE Computer Vision and Pattern Recognition, 2006.
I was successfully able to detect overlapping regions in images captured from adjacent webcams using the technique presented in this paper. My covariance matrix was composed of Sobel, Canny and SUSAN aspect/edge detection outputs, as well as the original greyscale pixels.
An idea:
use keypoint detectors to find scale- and transform- invariant descriptors of some points in the image (e.g. SIFT, SURF, GLOH, or LESH).
try to align keypoints with similar descriptors from both images (like in panorama stitching), allow for some image transforms if necessary (e.g. scale & rotate, or elastic stretching).
if many keypoints align well (i.e. there exists a transform such that the keypoint alignment error is low, or the transformation "energy" is low, etc.), you likely have similar images. (A minimal sketch of this approach follows below.)
Step 2 is not trivial. In particular, you may need to use a smart algorithm to find the most similar keypoint in the other image. Point descriptors are usually very high-dimensional (like a hundred parameters), and there are many points to look through. kd-trees may be useful here; hash lookups don't work well.
Variants:
Detect edges or other features instead of points.
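A minimal sketch of steps 1-3 with OpenCV, using ORB descriptors, a ratio test and a RANSAC homography; the 0.75 ratio, the 0.5 inlier threshold and the file names are assumptions:

import cv2
import numpy as np

img1 = cv2.imread('original.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('candidate.png', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
good = []
for pair in matcher.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:   # ratio test
        good.append(pair[0])

if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)   # robust transform estimate
    ratio = inliers.sum() / len(good)
    print('Inlier ratio {:.2f} -> {}'.format(ratio, 'likely copy' if ratio > 0.5 else 'unclear'))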
It is indeed much less simple than it seems :-) Nick's suggestion is a good one.
To get started, keep in mind that any worthwhile comparison method will essentially work by converting the images into a different form -- a form which makes it easier to pick similar features out. Usually, this stuff doesn't make for very light reading ...
One of the simplest examples I can think of is simply using the color space of each image. If two images have highly similar color distributions, then you can be reasonably sure that they show the same thing. At least, you can have enough certainty to flag it, or do more testing. Comparing images in color space will also resist things such as rotation, scaling, and some cropping. It won't, of course, resist heavy modification of the image or heavy recoloring (and even a simple hue shift will be somewhat tricky).
http://en.wikipedia.org/wiki/RGB_color_space
http://upvector.com/index.php?section=tutorials&subsection=tutorials/colorspace
Another example involves something called the Hough Transform. This transform essentially decomposes an image into a set of lines. You can then take some of the 'strongest' lines in each image and see if they line up. You can do some extra work to try and compensate for rotation and scaling too -- and in this case, since comparing a few lines is MUCH less computational work than doing the same to entire images -- it won't be so bad.
http://homepages.inf.ed.ac.uk/amos/hough.html
http://rkb.home.cern.ch/rkb/AN16pp/node122.html
http://en.wikipedia.org/wiki/Hough_transform
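A rough sketch of the Hough idea: extract the strongest lines from each image and see how many have a close counterpart. The Canny/Hough thresholds and the matching tolerances are guesses:

import cv2
import numpy as np

def strongest_lines(path, n=10):
    # Strongest (rho, theta) lines; HoughLines returns lines ordered by accumulator votes.
    edges = cv2.Canny(cv2.imread(path, cv2.IMREAD_GRAYSCALE), 100, 200)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 150)
    return [] if lines is None else [tuple(l[0]) for l in lines[:n]]

lines_a = strongest_lines('imageA.png')
lines_b = strongest_lines('imageB.png')

# Count lines of A with a close counterpart in B (tolerances are rough guesses).
close = sum(any(abs(r1 - r2) < 10 and abs(t1 - t2) < np.radians(5)
                for r2, t2 in lines_b) for r1, t1 in lines_a)
print('{} of {} strong lines have a match'.format(close, len(lines_a)))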
In the form described by you, the problem is tough. Do you consider copying and pasting part of the image into another, larger image as a copy? etc.
What we loosely refer to as duplicates can be difficult for algorithms to discern.
Your duplicates can be either:
Exact Duplicates
Near-exact Duplicates. (minor edits of image etc)
Perceptual Duplicates (same content, but different view, camera, etc.)
Nos. 1 & 2 are easier to solve. No. 3 is very subjective and still a research topic.
I can offer a solution for Nos. 1 & 2.
Both solutions use the excellent imagehash perceptual hashing library: https://github.com/JohannesBuchner/imagehash
Exact duplicates
Exact duplicates can be found using a perceptual hashing measure.
The phash library is quite good at this. I routinely use it to clean training data.
Usage (from github site) is as simple as:
from PIL import Image
import imagehash

# image_fns : list of training image files
img_hashes = {}
for img_fn in sorted(image_fns):
    img_hash = imagehash.average_hash(Image.open(img_fn))
    if img_hash in img_hashes:
        print('{} duplicate of {}'.format(img_fn, img_hashes[img_hash]))
    else:
        img_hashes[img_hash] = img_fn
Near-Exact Duplicates
In this case you will have to set a threshold and compare the hash values for their distance from each other. This has to be done by trial and error for your image content.
from PIL import Image
import imagehash
from itertools import combinations

# image_fns : list of training image files
epsilon = 50
for img_fn1, img_fn2 in combinations(sorted(image_fns), 2):   # every unordered pair, once
    hash1 = imagehash.average_hash(Image.open(img_fn1))
    hash2 = imagehash.average_hash(Image.open(img_fn2))
    if hash1 - hash2 < epsilon:                               # Hamming distance between hashes
        print('{} is near duplicate of {}'.format(img_fn1, img_fn2))
If you take a step-back, this is easier to solve if you watermark the master images.
You will need to use a watermarking scheme to embed a code into the image. To take a step back, as opposed to some of the low-level approaches (edge detection etc) suggested by some folks, a watermarking method is superior because:
It is resistant to Signal processing attacks
► Signal enhancement – sharpening, contrast, etc.
► Filtering – median, low pass, high pass, etc.
► Additive noise – Gaussian, uniform, etc.
► Lossy compression – JPEG, MPEG, etc.
It is resistant to Geometric attacks
► Affine transforms
► Data reduction – cropping, clipping, etc.
► Random local distortions
► Warping
Do some research on watermarking algorithms and you will be on the right path to solving your problem.
Note: You can benchmark your method using the STIRMARK dataset. It is an accepted standard for this type of application.
This is just a suggestion, it might not work and I'm prepared to be called on this.
This will generate false positives, but hopefully not false negatives.
Resize both of the images so that they are the same size (I assume that the ratios of widths to lengths are the same in both images).
Compress a bitmap of both images with a lossless compression algorithm (e.g. gzip).
Find pairs of files that have similar file sizes. For instance, you could just sort every pair of files you have by how similar the file sizes are and retrieve the top X.
As I said, this will definitely generate false positives, but hopefully not false negatives. You can implement this in five minutes, whereas the Porikli et al. approach would probably require extensive work.
I believe if you're willing to apply the approach to every possible orientation and to negative versions, a good start to image recognition (with good reliability) is to use eigenfaces: http://en.wikipedia.org/wiki/Eigenface
Another idea would be to transform both images into vectors of their components. A good way to do this is to create a vector with x*y dimensions (x being the width of your image and y the height), where the value of each dimension is the (x, y) pixel value. Then run a variant of K-Nearest Neighbours with two categories: match and no match. If it's sufficiently close to the original image, it will fit in the match category; if not, then it won't.
K Nearest Neighbours(KNN) can be found here, there are other good explanations of it on the web too: http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
The benefit of KNN is that the more variants you're comparing to the original image, the more accurate the algorithm becomes. The downside is that you need a catalogue of images to train the system first. (A toy sketch of the pixel-vector KNN idea follows.)
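A toy sketch of the pixel-vector KNN idea with plain NumPy; the training file names, labels, 64x64 resize and k=3 are all illustrative assumptions:

import numpy as np
from PIL import Image

SIZE = (64, 64)

def to_vector(path):
    # Flatten a resized greyscale image into an x*y-dimensional vector.
    return np.asarray(Image.open(path).convert('L').resize(SIZE), dtype=np.float32).ravel()

# Labelled training set: 1 = variant of the original, 0 = unrelated image (placeholder files).
train = [('original.png', 1), ('original_rot.png', 1), ('other1.png', 0), ('other2.png', 0)]
X = np.stack([to_vector(p) for p, _ in train])
y = np.array([label for _, label in train])

query = to_vector('candidate.png')
k = 3
nearest = np.argsort(np.linalg.norm(X - query, axis=1))[:k]   # k closest training vectors
is_match = y[nearest].sum() > k / 2                           # majority vote
print('match' if is_match else 'no match')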
If you're willing to consider a different approach altogether to detecting illegal copies of your images, you could consider watermarking. (from 1.4)
...inserts copyright information into the digital object without the loss of quality. Whenever the copyright of a digital object is in question, this information is extracted to identify the rightful owner. It is also possible to encode the identity of the original buyer along with the identity of the copyright holder, which allows tracing of any unauthorized copies.
While it's also a complex field, there are techniques that allow the watermark information to persist through gross image alteration: (from 1.9)
... any signal transform of reasonable strength cannot remove the watermark. Hence a pirate willing to remove the watermark will not succeed unless they debase the document too much to be of commercial interest.
Of course, the FAQ calls implementing this approach "...very challenging", but if you succeed with it, you get a high confidence of whether the image is a copy or not, rather than a percentage likelihood.
If you're running Linux I would suggest two tools:
align_image_stack from package hugin-tools - is a commandline program that can automatically correct rotation, scaling, and other distortions (it's mostly intended for compositing HDR photography, but works for video frames and other documents too). More information: http://hugin.sourceforge.net/docs/manual/Align_image_stack.html
compare from package imagemagick - a program that can find and count the number of differing pixels in two images. Here's a neat tutorial: http://www.imagemagick.org/Usage/compare/ Using -fuzz N% you can increase the error tolerance; the higher the N, the higher the error tolerance to still count two pixels as the same.
align_image_stack should correct any offset so the compare command will actually have a chance of detecting same pixels.
