Image comparison using OpenCV in order to determine traffic density

I am working on a project that plots real-time traffic status on Google Maps and makes it available to users on an Android phone and in a web browser.
http://www.youtube.com/watch?v=tcAyMngkzjk
I need to compare 2 images in OpenCV in order to determine traffic density. Can you please guide me on how to compare the images? Should I go for histogram comparison or simple image subtraction?

One common solution is using background subtraction to track moving objects (cars) and then export an image with the moving objects marked, so you can easily extract the objects from the image. If this is not the case, you will have to detect the vehicles, and that's a more challenging task because, as carlosdc says, there are many approaches depending on the angle of the camera, the size of the vehicles, light conditions, cluttered backgrounds, etc.
If you could specify the problem a little more ...
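As a minimal sketch of the background-subtraction idea above (assuming a fixed camera and the OpenCV Python bindings; the video path, thresholds and kernel size are placeholders), the fraction of foreground pixels per frame can serve as a crude density measure:

import cv2

# Sketch only: the video source and all thresholds are placeholders.
cap = cv2.VideoCapture("traffic.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                              # 255 = moving, 127 = shadow, 0 = background
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop the shadow label
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)       # remove speckle noise
    density = cv2.countNonZero(mask) / float(mask.size)         # fraction of pixels covered by movers
    print("approximate traffic density: %.3f" % density)

cap.release()

A smoothed running average of this density over a few seconds is usually more stable than per-frame values.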

It really depends, and it would be impossible to determine without looking at your images.
Also, let me point out that it may be quite difficult to make this work adequately in all conditions: day/night, rain/shine, etc. Perhaps you should start by looking at what others have done and how well it works. One such example would be this

Try reading these two tutorials about OpenCV, covering detection/recognition and finding contours.
http://python-catalin.blogspot.ro/search/label/OpenCV
Or try to find colour changes in your image (for example, find colours that stand out against the background street).

Related

Remove background and get deer as foreground?

I want to remove background and get deer as a foreground image.
This is my source image captured by trail camera:
This is what I want to get. This output image can be a binary image or RGB.
I have worked on it and tried many methods to get a solution, but each time it failed at a specific point. So please first understand what my exact problem is.
Images are captured by a trail camera, and the camera is a motion detector: when a deer comes in front of the camera, it captures an image.
The scene changes with the weather, day and night, etc., so I can't use frame differencing or anything like that.
Segmentation may not work correctly because the foreground (deer) and the background have the same colour in many cases.
If anyone still finds any ambiguity in my question, please first ask me to clarify and then answer; it will be appreciated.
Thanks in advance.
Here's what I would do:
As was commented on your question, you can detect the deer and then perform GrabCut to segment it from the picture.
To detect the deer, I would couple a classifier with a sliding window approach. That means you'll have a classifier that, given a patch (it can be a large patch) in the image, outputs a score of how similar that patch is to a deer. The sliding window approach means that you loop over the window size and then loop over the window location. For each position of the window in the image, you apply the classifier to that window and get a score of how much that window "looks like" a deer. Once you've done that, threshold all the scores to get the "best windows", i.e. the windows that are most similar to a deer. The rationale behind this is that if a deer is present at some location in the image, the classifier will output a high score for all windows that are close to or overlap the actual deer location. We would like to merge all those locations into a single location. That can be done by applying the function groupRectangles from OpenCV:
http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html#grouprectangles
Take a look at a face detection example from OpenCV; it basically does the same thing (sliding window + classifier), where the classifier is a Haar cascade.
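A bare-bones sketch of the sliding window + groupRectangles idea; score_patch is a hypothetical stand-in for the real classifier (HOG+SVM, a CNN, a Haar cascade, ...), and the window sizes, step and score threshold are guesses:

import cv2

def score_patch(patch):
    # Placeholder for the real classifier: return a high score
    # when the patch looks like a deer.
    return 0.0

image = cv2.imread("trail_cam.jpg", cv2.IMREAD_GRAYSCALE)
candidates = []
for win in (128, 192, 256):                        # loop over window sizes
    step = win // 4
    for y in range(0, image.shape[0] - win, step):
        for x in range(0, image.shape[1] - win, step):
            if score_patch(image[y:y + win, x:x + win]) > 0.5:   # score threshold is a guess
                candidates.append([x, y, win, win])

boxes, weights = [], []
if candidates:
    # Merge overlapping high-scoring windows into one box per object.
    boxes, weights = cv2.groupRectangles(candidates, groupThreshold=2, eps=0.2)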
Now, I didn't mention what that "deer classifier" could be. You can use HOG+SVM (both are included in OpenCV) or a much more powerful approach: a deep convolutional neural network (deep CNN). Luckily, you don't need to train a deep CNN. You can use the following packages with their "off the shelf" ImageNet networks (which are very powerful and might even be able to identify a deer without further training):
Decaf - which can be used only for research purposes:
https://github.com/UCB-ICSI-Vision-Group/decaf-release/
Or Caffe - which is BSD licensed:
http://caffe.berkeleyvision.org/
There are other packages of which you can read about here:
http://deeplearning.net/software_links/
The most common ones are Theano, Cuda ConvNet and OverFeat (but that's really opinion based; you should choose the best package from the list that I linked to).
The "off the shelf" ImageNet network were trained on roughly 10M images from 1000 categories. If those categories contain "dear", that you can just use them as is. If not, you can use them to extract features (as a 4096 dimensional vector in the case of Decaf) and train a classifier on positive and negative images to build a "dear classifier".
Now, once you have detected the deer, meaning you have a bounding box around it, you can apply GrabCut:
http://docs.opencv.org/trunk/doc/py_tutorials/py_imgproc/py_grabcut/py_grabcut.html
You'll need an initial scribble on the deer to perform GrabCut. You can just take a horizontal line in the middle of the bounding box and hope that it will land on the deer's torso. A more elaborate approach would be to find the symmetry axis of the deer and use that as the scribble, but you would have to research and implement some method to extract a symmetry axis from the image.
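For the GrabCut step, a minimal sketch using OpenCV's rectangle initialisation, where the detected bounding box stands in for a manual scribble (the image path and rectangle values are placeholders):

import cv2
import numpy as np

img = cv2.imread("trail_cam.jpg")
mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)

rect = (100, 80, 300, 250)   # (x, y, w, h) from the detector; placeholder values
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Pixels marked definite or probable foreground form the segmented deer.
fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype("uint8")
result = cv2.bitwise_and(img, img, mask=fg)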
That's about it. Not straightforward, but so is the problem.
Please let me know if you have any questions.
Try OpenCV background subtraction with Mixture of Gaussians models. They should be adaptable enough for your scenes. Of course, the final performance will depend on the scenario, but it is worth trying.
Since you just want to separate the background from the foreground I think you do not need to recognize the deer. You need to recognize an object in motion in the scene. You just need to separate what is static in a significant interval of time (background) from what is not static: the deer.
There are algorithms that combine multiple frames from the same scene in order to determine the background, like THIS ONE.
You mentioned that the scene changes with the weather or between day and night, but that is across photos of different deer.
You could implement a solution where, when motion is detected, instead of taking a single photo the camera takes several with some interval of time between them.
This interval has to be long enough to get the deer in different positions or out of the scene, and at the same time short enough not to be much affected by scene variations. Perhaps you will need to deal with some brightness variation, but I think it is feasible to determine the background using these frames and finally segment the deer in the "motion frame".
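A minimal sketch of that idea, assuming a short burst of frames around the motion trigger is available: the per-pixel median approximates the static background, and differencing against it isolates the deer (file names and the threshold are placeholders):

import cv2
import numpy as np

# Hypothetical burst of frames captured around the motion trigger.
paths = ["burst_0.jpg", "burst_1.jpg", "burst_2.jpg", "burst_3.jpg", "burst_4.jpg"]
frames = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in paths]

background = np.median(np.stack(frames), axis=0).astype("uint8")   # static scene estimate
diff = cv2.absdiff(frames[2], background)                          # frame containing the deer
_, foreground = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)    # threshold is scene-dependent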

Correlation between two images (binary images)

I have two binary images like this. I have a data set with lots of pictures like the one at the bottom, but with different signs.
and
I would like to compare them in order to know whether it's the same figure or not (especially inside the triangle). I took a look at SIFT and SURF features, but they don't work well on this type of picture (they find matching points even though the two pictures are different, especially inside).
I have also heard about SVMs, but I don't know if I have to implement one for this type of problem.
Do you have an idea ?
Thank you
I think you should not use SURF features on the binary image as you have already discarded a lot of information at that stage with your edge detector.
You could also use the linear or circular Hough transform, which in this case could tell you a lot about the differences between the images.
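A rough OpenCV sketch of that comparison, counting detected lines and circles in each binary image; all parameters here are guesses that would need tuning for your pictures:

import cv2
import numpy as np

a = cv2.imread("sign_a.png", cv2.IMREAD_GRAYSCALE)
b = cv2.imread("sign_b.png", cv2.IMREAD_GRAYSCALE)

def hough_summary(binary):
    lines = cv2.HoughLinesP(binary, 1, np.pi / 180, threshold=40,
                            minLineLength=20, maxLineGap=5)
    circles = cv2.HoughCircles(binary, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                               param1=50, param2=25, minRadius=5, maxRadius=100)
    return (0 if lines is None else len(lines),
            0 if circles is None else circles.shape[1])

# Very different line/circle counts suggest different figures inside the triangle.
print(hough_summary(a), hough_summary(b))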
If you want to find two exactly identical images, simply use hash functions like MD5.
But if you want to find related (not exactly identical) images, you are running into trouble ;). Look for artificial neural network libraries...
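For the exact-duplicate case, a tiny sketch hashing the raw file bytes (this only catches byte-identical files, not re-encoded or resized copies):

import hashlib

def file_md5(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

print(file_md5("sign_a.png") == file_md5("sign_b.png"))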

Object detection + segmentation

I'm trying to find an efficient way of acceptable complexity to
detect an object in an image so I can isolate it from its surroundings
segment that object into its sub-parts and label them so I can then fetch them at will
It's been 3 weeks since I entered the image processing world, and I've read about so many algorithms (SIFT, snakes, more snakes, Fourier-related methods, etc.) and heuristics that I don't know where to start and which one is "best" for what I'm trying to achieve. Having in mind that the image dataset of interest is a pretty large one, I don't even know if I should use some algorithm implemented in OpenCV or implement one on my own.
Summarize:
Which methodology should I focus on? Why?
Should I use OpenCV for that kind of stuff or is there some other 'better' alternative?
Thank you in advance.
EDIT -- More info regarding the datasets
Each dataset consists of 80K images of products sharing the same:
concept, e.g. t-shirts, watches, shoes
size
orientation (90% of them)
background (95% of them)
All pictures in each dataset look almost identical apart from the product itself, apparently. To make things a little clearer, let's consider only the 'watch dataset':
All the pictures in the set look almost exactly like this:
(again, apart from the watch itself). I want to extract the strap and the dial. The thing is that there are lots of different watch styles and therefore shapes. From what I've read so far, I think I need a template algorithm that allows bending and stretching so as to be able to match straps and dials of different styles.
Instead of creating three distinct templates (upper part of the strap, lower part of the strap, dial), it would be reasonable to create only one and segment it into 3 parts. That way, I would be confident enough that each part was detected in the right position with respect to the others, e.g. the dial would not be detected below the lower part of the strap.
From all the algorithms/methodologies I've encountered, active shape/appearance models seem to be the most promising ones. Unfortunately, I haven't managed to find a decent implementation, and I'm not confident enough that this is the best approach to go ahead and write one myself.
If anyone could point out what I should be really looking for (algorithm/heuristic/library/etc.), I would be more than grateful. If again you think my description was a bit vague, feel free to ask for a more detailed one.
From what you've said, here are a few things that pop up at first glance:
The simplest thing to do is to binarize the image and do connected components using OpenCV or the cvBlob library. For simple images with a non-complex background this usually yields the objects.
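A quick sketch of that binarize-and-label step with OpenCV (Otsu threshold plus connectedComponentsWithStats; the image path and area cut-off are placeholders):

import cv2

img = cv2.imread("watch.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Label connected foreground blobs; stats holds x, y, w, h and area per blob.
count, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
for i in range(1, count):                      # label 0 is the background
    x, y, w, h, area = stats[i]
    if area > 500:                             # ignore small speckles; threshold is a guess
        print("object", i, "bounding box:", (x, y, w, h))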
However, looking at your sample image, texture-based segmentation techniques may work better - the watch dial, the straps and the background differ widely in texture/roughness, and this could be an ideal way to separate them.
The roughness of a region can easily be found by the Eigen transform (explained a bit on SO; check the link to the research paper provided there), and then the Mean Shift filter can be applied to the output of the Eigen transform. This will give regions clearly separated according to texture. Both the pyramidal Mean Shift and finding eigenvalues by SVD are implemented in OpenCV, so unless you can optimize your own code it is better (and easier) to use the built-in functions (if present) as far as speed and efficiency are concerned.
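The Eigen transform itself has no ready-made OpenCV function, but the pyramidal Mean Shift part does; a minimal sketch (the spatial and colour radii are guesses):

import cv2

img = cv2.imread("watch.jpg")
# Spatial and colour window radii control how aggressively regions are merged.
segmented = cv2.pyrMeanShiftFiltering(img, sp=21, sr=30)
cv2.imwrite("watch_meanshift.png", segmented)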
I think I would turn the problem around. Instead of hunting for the dial, I would use a set of robust features from the watch to 'stitch' the target image onto a template. The first watch has a set of white squares in the dial, the second watch has a number of white circles. I would, per type of watch:
1. Segment out the squares or circles in the dial. Segmentation steps can be tricky as they are usually both scale and light dependent.
2. Estimate the centers or corners of the feature areas found above. These are the new feature points.
3. Use the Hungarian algorithm to match features between the template watch and the target watch. Alternatively, one can take the surroundings of each feature point in the original image and match these using cross correlation.
4. Use the matching features between the template and the target to estimate scaling, rotation and translation.
5. Stitch the image.
As the image is now in a known form, one can extract the regions simply via preset coordinates.
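Steps 4 and 5 above have direct OpenCV support once matched point pairs exist; a small sketch assuming pts_template and pts_target came out of the matching step (coordinates and file names are placeholders):

import cv2
import numpy as np

# Matched feature points from step 3 (placeholder values).
pts_template = np.float32([[120, 80], [300, 80], [210, 400]])
pts_target = np.float32([[130, 95], [310, 90], [225, 410]])

# Similarity transform: rotation, uniform scale and translation.
matrix, inliers = cv2.estimateAffinePartial2D(pts_target, pts_template)

target = cv2.imread("watch_target.jpg")
h, w = cv2.imread("watch_template.jpg").shape[:2]
aligned = cv2.warpAffine(target, matrix, (w, h))   # target now in template coordinates

# Regions (dial, straps) can now be cropped at preset template coordinates.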

image feature identification

I am looking for a solution to do the following:
(the focus of my question is step 2)
1. take a picture of a house including the front yard
2. extract information from the picture like the dimensions and location of the house, trees, sidewalk, and car, as well as the textures and colors of the house, cars, trees, and sidewalk
3. use the extracted information to generate a model
How can I extract that information?
You could also consult Tatiana Jaworska's research on this. As I understand it, she details at least one new algorithm for feature extraction (targeted at roofs, doors, ...) by colour (RGB). More intriguingly, the most recent publication also uses parameterized objects to be identified in the house images... that might be a really good starting point for what you're trying to do.
Links to her publications:
http://www.springerlink.com/content/w518j70542780r34/
http://portal.acm.org/citation.cfm?id=1578785
http://www.ibspan.waw.pl/~jaworska/TJ_BOS2010.pdf
Yes, you can extract this information from a picture.
1. Identify these objects in the picture using some detection algorithms.
2. Measure the objects' dimensions and generate a model using the extracted information.
Well, actually, your desired goal is not so easy to achieve. First of all you'll need a good way to figure out what is what and where it is in your image. And there simply is no easy "algorithm" for detecting houses/cars/whatever in an image. There are ways to segment particular objects (like cars) from an image, but they don't work in general. Especially for houses this would be hard, since every house looks different and it's hard to find one solid criterion for "this is a house and this is not"...
Am I right in assuming that you are trying to simply photograph a house (with front yard) and build a textured 3D model out of it? This is not going to work, since you need several photos of the house to get the positions of walls/corners and everything else in 3D space (there are approaches that attempt mesh reconstruction from a single image, but they lack depth information and the results are fairly poor). So if you would like to create 3D models you will need several photos of the house from different angles.
There are several different approaches that use this kind of technique to reconstruct real-world objects as triangle meshes.
Basically they work on the following principle:
Try to find points in images taken from different viewpoints that correspond to the same point on the object. Considering you are photographing a house, these could be salient structures like corners of windows/doors or corners and edges on the walls/roof/...
Knowing where one and the same point of your house is in several different photos, and knowing the position of the camera for each photo, you can reconstruct this point in 3D space.
Doing this for a lot of corresponding points will enable you to reconstruct the shape of your house as a 3D model by triangulating the points.
Taking parts of the images as textures and mapping them onto the generated model would work as well, since you know what is where.
You should have a look at these papers:
http://www.graphicon.ru/1999/3D%20Reconstruction/Valiev.pdf
http://people.csail.mit.edu/wojciech/pubs/LabeledRec.pdf
http://people.csail.mit.edu/sparis/publi/2006/oceans/Paris_06_3D_Reconstruction.ppt
The second paper even has an example of doing exactly what you are trying to achieve, namely reconstructing a textured 3D model of a house photographed from different angles.
The third link is a PowerPoint presentation that shows how the reconstruction works and discusses the drawbacks.
So you should get familiar with these papers to see what problems you are up against... If you then want to try this on your own, have a look at OpenCV. This library provides some methods for feature extraction in images. You can then try to find salient points in each image and match them.
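A minimal OpenCV sketch of that last step (finding and matching salient points between two photos of the house), here with ORB features as a freely usable stand-in for SIFT/SURF; file names are placeholders:

import cv2

img1 = cv2.imread("house_view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("house_view2.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matcher with cross-check keeps only mutual best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

vis = cv2.drawMatches(img1, kp1, img2, kp2, matches[:50], None)
cv2.imwrite("matches.png", vis)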
Good luck on your project... If you have problems, please keep asking!
I suggest looking at this blog:
https://jwork.org/main/node/35
which shows how to identify certain features in images using a convolutional neural network. This particular blog discusses how to identify human faces in images from a large set of random images. You can adapt this example to train the neural network on some other images. Note that even in the case of human faces, the identification rate is about 85%; more complex objects can be even harder to identify.

Detecting if two images are visually identical

Sometimes two image files may be different on a file level, but a human would consider them perceptively identical. Given that, now suppose you have a huge database of images, and you wish to know if a human would think some image X is present in the database or not. If all images had a perceptive hash / fingerprint, then one could hash image X and it would be a simple matter to see if it is in the database or not.
I know there is research around this issue, and some algorithms exist, but is there any tool, like a UNIX command line tool or a library I could use to compute such a hash without implementing some algorithm from scratch?
edit: relevant code from findimagedupes, using ImageMagick
try $image->Sample("160x160!");          # downsample to a fixed 160x160 size
try $image->Modulate(saturation=>-100);  # remove colour saturation
try $image->Blur(radius=>3,sigma=>99);   # heavy blur suppresses fine detail and noise
try $image->Normalize();                 # stretch the intensity range
try $image->Equalize();                  # histogram equalization
try $image->Sample("16x16");             # shrink to a 16x16 thumbnail
try $image->Threshold();                 # binarize
try $image->Set(magick=>'mono');         # store as a 1-bit mono bitmap
($blob) = $image->ImageToBlob();         # the resulting blob is the fingerprint
edit: Warning! The ImageMagick $image object seems to contain information about the creation time of the image file that was read in. This means the blob you get will be different even for the same image if it was read at a different time. To make sure the fingerprint stays the same, use $image->getImageSignature() as the last step.
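A rough Python equivalent of that pipeline using Pillow - not the exact findimagedupes fingerprint, just the same shrink/normalize/threshold idea:

from PIL import Image, ImageFilter, ImageOps

def fingerprint(path):
    img = Image.open(path).convert("L").resize((160, 160))
    img = img.filter(ImageFilter.GaussianBlur(radius=3))
    img = ImageOps.equalize(ImageOps.autocontrast(img))
    img = img.resize((16, 16))
    # Threshold to a 256-bit black/white fingerprint.
    return [1 if p > 128 else 0 for p in img.getdata()]

def similarity(a, b):
    # Fraction of matching bits between two fingerprints.
    fa, fb = fingerprint(a), fingerprint(b)
    return sum(x == y for x, y in zip(fa, fb)) / 256.0

Two fingerprints agreeing on, say, more than 90% of their bits would then flag the images as likely duplicates; the cut-off is something to tune on your own data.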
findimagedupes is pretty good. You can run "findimagedupes -v fingerprint images" to let it print "perceptive hash", for example.
Cross-correlation or phase correlation will tell you if the images are the same, even with noise, degradation, and horizontal or vertical offsets. Using the FFT-based methods will make it much faster than the algorithm described in the question.
The usual algorithm doesn't work for images that are not the same scale or rotation, though. You could pre-rotate or pre-scale them, but that's really processor intensive. Apparently you can also do the correlation in a log-polar space and it will be invariant to rotation, translation, and scale, but I don't know the details well enough to explain that.
MATLAB example: Registering an Image Using Normalized Cross-Correlation
Wikipedia calls this "phase correlation" and also describes making it scale- and rotation-invariant:
The method can be extended to determine rotation and scaling differences between two images by first converting the images to log-polar coordinates. Due to properties of the Fourier transform, the rotation and scaling parameters can be determined in a manner invariant to translation.
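A small OpenCV sketch of phase correlation for the translation-only case (both images must be the same size; the log-polar extension for rotation/scale is not shown):

import cv2
import numpy as np

a = cv2.imread("img_a.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
b = cv2.imread("img_b.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Returns the (dx, dy) shift between the images and a peak response;
# a strong response with a near-zero shift suggests the images match.
(shift_x, shift_y), response = cv2.phaseCorrelate(a, b)
print(shift_x, shift_y, response)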
A colour histogram is good for finding the same image after it has been resized, resampled, etc.
If you want to match different people's photos of the same landmark it's trickier - look at Haar classifiers. OpenCV is a great free library for image processing.
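A short sketch of the colour histogram comparison with OpenCV, using an HSV histogram and the correlation metric (bin counts and the similarity cut-off are guesses):

import cv2

def hsv_hist(path):
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([img], [0, 1], None, [50, 60], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

score = cv2.compareHist(hsv_hist("a.jpg"), hsv_hist("b.jpg"), cv2.HISTCMP_CORREL)
print("similar" if score > 0.9 else "different", score)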
I don't know the algorithm behind it, but Microsoft Live Image Search just added this capability. Picasa also has the ability to identify faces in images, and groups faces that look similar. Most of the time, it's the same person.
Some machine learning technique like a support vector machine, neural network, naive Bayes classifier or Bayesian network would be best at this type of problem. I've written one of each of the first three to classify handwritten digits, which is essentially image pattern recognition.
Resize the image to a 1x1 pixel... if they are exactly equal, there is a small probability they are the same picture.
Now resize it to a 2x2 pixel image; if all 4 pixels are exact, there is a larger probability they are identical.
Then 3x3; if all 9 pixels are exact... good chance, etc.
Then 4x4; if all 16 pixels are exact... better chance.
etc...
Doing it this way, you can make efficiency improvements: if the 1x1 pixel grid is off by a lot, why bother checking the 2x2 grid? etc.
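A toy sketch of that coarse-to-fine idea (the per-pixel tolerance and the maximum grid size are guesses):

import cv2
import numpy as np

def probably_same(path_a, path_b, max_side=8):
    a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    for side in range(1, max_side + 1):              # 1x1, 2x2, ... grids
        ta = cv2.resize(a, (side, side), interpolation=cv2.INTER_AREA)
        tb = cv2.resize(b, (side, side), interpolation=cv2.INTER_AREA)
        if np.abs(ta.astype(int) - tb.astype(int)).max() > 10:   # tolerance is a guess
            return False                             # early exit: no point checking finer grids
    return True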
If you have lots of images, a color histogram could be used to get rough closeness of images before doing a full image comparison of each image against each other one (i.e. O(n^2)).
There is DPEG, "The" Duplicate Media Manager, but its code is not open. It's a very old tool - I remember using it in 2003.
You could use diff to see if they are REALLY different... I guess it would remove a lot of useless comparisons. Then, for the algorithm, I would use a probabilistic approach: what are the chances that they look the same? I'd base that on the amount of RGB in each pixel. You could also use some other metrics such as luminosity and the like.
