Algorithm to detect doors and windows in an image - algorithm

I'm working on an educational project where I need to detect objects, mainly doors and windows. I have tried to find specific algorithms to do this.
This is the first step in a project to detect all objects and let a user choose the object he wants. Then in the next step the system will define edges of the object accurately.
I want to detect objects by their color variety with background or with overlapping objects. I need an algorithm to start with. I started learning color spaces and I chose hvs color space. I read many papers and I know how they work, but I'm still confused and don't know what algorithm will really help.

You can use any segmentation algorithm.
You will need to find features from images to use in segmentation, a good approach for feature selection is by using any deep learning technique, i would recommend try CNN, you will find a builtin library "matconvnet" "http://www.vlfeat.org/matconvnet/" for implementing CNN in MATLAB.
you can also find few already build models for segmentation using CNN here http://www.vlfeat.org/matconvnet/pretrained/

Related

Removing skew/distortion based on known dimensions of a shape

I have an idea for an app that takes a printed page with four squares in each corner and allows you to measure objects on the paper given at least two squares are visible. I want to be able to have a user take a picture from less than perfect angles and still have the objects be measured accurately.
I'm unable to figure out exactly how to find information on this subject due to my lack of knowledge in the area. I've been able to find examples of opencv code that does some interesting transforms and the like but I've yet to figure out what I'm asking in simpler terms.
Does anyone know of papers or mathematical concepts I can lookup to get further into this project?
I'm not quite sure how or who to ask other than people on this forum, sorry for the somewhat vague question.
What you describe is very reminiscent of augmented reality marker tracking. Maybe you can start by searching these words on a search engine of your choice.
A single marker, if done correctly, can be used to identify it without confusing it with other markers AND to determine how the surface is placed in 3D space in front of the camera.
But that's all very difficult and advanced stuff, I'd greatly advise to NOT try and implement something like this, it would take years of research... The only way you have is to use a ready-made open source library that outputs the data you need for your app.
It may even not exist. In that case you'll have to buy one. Given the niché of your problem that would be perfectly plausible.
Here I give you only the programming aspect and if you want you can find out about the mathematical aspect from those examples. Most of the functions you need can be done using OpenCV. Here are some examples in python:
To detect the printed paper, you can use cv2.findContours function. The most outer contour is possibly the paper, but you need to test on actual images. https://docs.opencv.org/3.1.0/d4/d73/tutorial_py_contours_begin.html
In case of sloping (not in perfect angle), you can find the angle by cv2.minAreaRect which return the angle of the contour you found above. https://docs.opencv.org/3.1.0/dd/d49/tutorial_py_contour_features.html (part 7b).
If you want to rotate the paper, use cv2.warpAffine. https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_geometric_transformations/py_geometric_transformations.html
To detect the object in the paper, there are some methods. The easiest way is using the contours above. If the objects are in certain colors, you can detect it by using color filter. https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_colorspaces/py_colorspaces.html

Extracting trees from image without picking up background vegetation?

I do not have a background in image recognition/feature extraction, but I am in need of a way to extract trees from an image without the background vegetation.
Seen above is a small example of the kind of imagery I'm working with. I have access to multi-spectral imagery as well (though I haven't seen it yet) including NDVI, NIR, Red-edge.
From researching the problem at hand, I am aware that feature extraction is an active area of research and it seems that often supervised and unsupervised machine learning is employed in combination with statistical voodoo such as "PCA". Being able to differentiate between trees and background vegetation has been noted as an area of difficulty in some papers I skimmed over in my research.
There are notable features about the imagery I am working with. First of all, the palm trees have a very distinctive shape. Not only this, but there are obvious differences in the texture of the trees vs the texture of the background vegetation.
I am not an academic, and as such I only have access to publicly available papers for my research. I am looking for relevant algorithms that could help me extract the features of interest to me (trees) that either have an implementation (ideally in C or bindings to C, though I'm aware that it is not a commonly used language in this field) or with publicly available papers/tutorials/sites/etc. detailing the algorithm so that I could implement it myself.
Thanks in advance for any help!
Look into OpenCV, It has a lot of options for supervised/semi supervised Learning methods. As you have mentioned there is a visible texture difference between the tress and background vegetation, a good place for you would be to start would be color based segmentation and evolving it to use textures as well. OpenCV ML tutorial is a good starting point. Moreover you can also combine the NDVI data to create a stronger feature set.

Object detection + segmentation

I 'm trying to find an efficient way of acceptable complexity to
detect an object in an image so I can isolate it from its surroundings
segment that object to its sub-parts and label them so I can then fetch them at will
It's been 3 weeks since I entered the image processing world and I've read about so many algorithms (sift, snakes, more snakes, fourier-related, etc.), and heuristics that I don't know where to start and which one is "best" for what I'm trying to achieve. Having in mind that the image dataset in interest is a pretty large one, I don't even know if I should use some algorithm implemented in OpenCV or if I should implement one my own.
Summarize:
Which methodology should I focus on? Why?
Should I use OpenCV for that kind of stuff or is there some other 'better' alternative?
Thank you in advance.
EDIT -- More info regarding the datasets
Each dataset consists of 80K images of products sharing the same
concept e.g. t-shirts, watches, shoes
size
orientation (90% of them)
background (95% of them)
All pictures in each datasets look almost identical apart from the product itself, apparently. To make things a little more clear, let's consider only the 'watch dataset':
All the pictures in the set look almost exactly like this:
(again, apart form the watch itself). I want to extract the strap and the dial. The thing is that there are lots of different watch styles and therefore shapes. From what I've read so far, I think I need a template algorithm that allows bending and stretching so as to be able to match straps and dials of different styles.
Instead of creating three distinct templates (upper part of strap, lower part of strap, dial), it would be reasonable to create only one and segment it into 3 parts. That way, I would be confident enough that each part was detected with respect to each other as intended to e.g. the dial would not be detected below the lower part of the strap.
From all the algorithms/methodologies I've encountered, active shape|appearance model seem to be the most promising ones. Unfortunately, I haven't managed to find a descent implementation and I'm not confident enough that that's the best approach so as to go ahead and write one myself.
If anyone could point out what I should be really looking for (algorithm/heuristic/library/etc.), I would be more than grateful. If again you think my description was a bit vague, feel free to ask for a more detailed one.
From what you've said, here are a few things that pop up at first glance:
Simplest thing to do it binarize the image and do Connected Components using OpenCV or CvBlob library. For simple images with non-complex background this usually yeilds objects
HOwever, looking at your sample image, texture-based segmentation techniques may work better - the watch dial, the straps and the background are wisely variant in texture/roughness, and this could be an ideal way to separate them.
The roughness of a portion can be easily found by the Eigen transform (explained a bit on SO, check the link to the research paper provided there), then the Mean Shift filter can be applied on the output of the Eigen transform. This will give regions clearly separated according to texture. Both the pyramidal Mean Shift and finding eigenvalues by SVD are implemented in OpenCV, so unless you can optimize your own code its better (and easier) to use inbuilt functions (if present) as far as speed and efficiency is concerned.
I think I would turn the problem around. Instead of hunting for the dial, I would use a set of robust features from the watch to 'stitch' the target image onto a template. The first watch has a set of squares in the dial that are white, the second watch has a number of white circles. I would per type of watch:
Segment out the squares or circles in the dial. Segmentation steps can be tricky as they are usually both scale and light dependent
Estimate the centers or corners of the above found feature areas. These are the new feature points.
Use the Hungarian algorithm to match features between the template watch and the target watch. Alternatively, one can take the surroundings of each feature point in the original image and match these using cross correlation
Use matching features between the template and the target to estimate scaling, rotation and translation
Stitch the image
As the image is now in a known form, one can extract the regions simply via pre set coordinates

Face identification with opencv

I'm using the libraries OpenCV for image processing in C + + and this is my question: can you think possible to do a facial recognition (saying the name of a person based on a database of photos) by comparing the frame of videocamera with images in a database using the technique of image histograms comparison? (Note that i compare only the facial region of an image using an example included in the opecv libraries).
I'm asking this because i've just tried to do a program like above but i have a lot of problem (often i detect the wrong person)
You might want to start with compiling the Face Detection using OpenCV example. As others have pointed out, general facial recognition isn't exactly an easy problem to solve. EigenFaces is one common technique for face recognition that is fairly easy to understand and implement.
As others have stated, it's a hard problem, but this gives you a place to start.
Some method I had experience with them are
metric learning for comparing faces
naming video characters: they use SIFT descriptors computed at specific feducial points on each face. Their code worked quite well for me in the past.
A dataset and benchmark that is dedicated for this task is labeled faces in the wild. You can find there references to working methods for comparing faces after detection.
UPDATE:
I have a description of an experiment on face clustering: unsupervised face identification.
The experiment is described in Section 4.4 of my thesis.
The basic flow is as follows
Metric learning: how to determine if two faces are of the same person or not.
This part is supervised, in the sense that it requires as input face images labeled with the identity of the person who appears in each photo.
a. Detect fiducial points (eyes, corner of mouth, nose).
You may use this code, or more recent versions such as this one.
b. Extract SIFT descriptors at the detected fiducial points.
c. Construct a "face descriptor": each face is described using a single vector.
This vector is a concatenation of the sqrt of all the SIFT descriptors.
d. Use the method described here to learn a mahalanobis distance between faces of different persons.
Unsupervised face identification: Once a metric was learned, you may use new photos of new people (these people need not be part of the training set, you may use photos of unseen-before people!).
a. Repeat stages a-c to construct the same "face descriptor" vector for each input face.
b. Compare the descriptor vectors using the learned mahalanobis distance.
I suggest using an existing algorithm such as the one available in the Luxand FaceSDK: http://www.luxand.com/facesdk/ rather than trying to develop your own.
there are 3 builtin techniques for face-recognition in opencv now, pca(eigenfaces), lda(fisherfaces) and lbph.
nice example code:
https://github.com/Itseez/opencv/blob/master/samples/cpp/facerec_demo.cpp

Dilemma about image cropping algorithm - is it possible?

I am building a web application using .NET 3.5 (ASP.NET, SQL Server, C#, WCF, WF, etc) and I have run into a major design dilemma. This is a uni project btw, but it is 100% up to me what I develop.
I need to design a system whereby I can take an image and automatically crop a certain object within it, without user input. So for example, cut out the car in a picture of a road. I've given this a lot of thought, and I can't see any feasible method. I guess this thread is to discuss the issues and feasibility of achieving this goal. Eventually, I would get the dimensions of a car (or whatever it may be), and then pass this into a 3d modelling app (custom) as parameters, to render a 3d model. This last step is a lot more feasible. It's the cropping issue which is an issue. I have thought of all sorts of ideas, like getting the colour of the car and then the outline around that colour. So if the car (example) is yellow, when there is a yellow pixel in the image, trace around it. But this would fail if there are two yellow cars in a photo.
Ideally, I would like the system to be completely automated. But I guess I can't have everything my way. Also, my skills are in what I mentioned above (.NET 3.5, SQL Server, AJAX, web design) as opposed to C++ but I would be open to any solution just to see the feasibility.
I also found this patent: US Patent 7034848 - System and method for automatically cropping graphical images
Thanks
This is one of the problems that needed to be solved to finish the DARPA Grand Challenge. Google video has a great presentation by the project lead from the winning team, where he talks about how they went about their solution, and how some of the other teams approached it. The relevant portion starts around 19:30 of the video, but it's a great talk, and the whole thing is worth a watch. Hopefully it gives you a good starting point for solving your problem.
What you are talking about is an open research problem, or even several research problems. One way to tackle this, is by image segmentation. If you can safely assume that there is one object of interest in the image, you can try a figure-ground segmentation algorithm. There are many such algorithms, and none of them are perfect. They usually output a segmentation mask: a binary image where the figure is white and the background is black. You would then find the bounding box of the figure, and use it to crop. The thing to remember is that none of the existing segmentation algorithm will give you what you want 100% of the time.
Alternatively, if you know ahead of time what specific type of object you need to crop (car, person, motorcycle), then you can try an object detection algorithm. Once again, there are many, and none of them are perfect either. On the other hand, some of them may work better than segmentation if your object of interest is on very cluttered background.
To summarize, if you wish to pursue this, you would have to read a fair number of computer vision papers, and try a fair number of different algorithms. You will also increase your chances of success if you constrain your problem domain as much as possible: for example restrict yourself to a small number of object categories, assume there is only one object of interest in an image, or restrict yourself to a certain type of scenes (nature, sea, etc.). Also keep in mind, that even the accuracy of state-of-the-art approaches to solving this type of problems has a lot of room for improvement.
And by the way, the choice of language or platform for this project is by far the least difficult part.
A method often used for face detection in images is through the use of a Haar classifier cascade. A classifier cascade can be trained to detect any objects, not just faces, but the ability of the classifier is highly dependent on the quality of the training data.
This paper by Viola and Jones explains how it works and how it can be optimised.
Although it is C++ you might want to take a look at the image processing libraries provided by the OpenCV project which include code to both train and use Haar cascades. You will need a set of car and non-car images to train a system!
Some of the best attempts I've see of this is using a large database of images to help understand the image you have. These days you have flickr, which is not only a giant corpus of images, but it's also tagged with meta-information about what the image is.
Some projects that do this are documented here:
http://blogs.zdnet.com/emergingtech/?p=629
Start with analyzing the images yourself. That way you can formulate the criteria on which to match the car. And you get to define what you cannot match.
If all cars have the same background, for example, it need not be that complex. But your example states a car on a street. There may be parked cars. Should they be recognized?
If you have access to MatLab, you could test your pattern recognition filters with specialized software like PRTools.
Wwhen I was studying (a long time ago:) I used Khoros Cantata and found that an edge filter can simplify the image greatly.
But again, first define the conditions on the input. If you don't do that you will not succeed because pattern recognition is really hard (think about how long it took to crack captcha's)
I did say photo, so this could be a black car with a black background. I did think of specifying the colour of the object, and then when that colour is found, trace around it (high level explanation). But, with a black object in a black background (no constrast in other words), it would be a very difficult task.
Better still, I've come across several sites with 3d models of cars. I could always use this, stick it into a 3d model, and render it.
A 3D model would be easier to work with, a real world photo much harder. It does suck :(
If I'm reading this right... This is where AI shines.
I think the "simplest" solution would be to use a neural-network based image recognition algorithm. Unless you know that the car will look the exact same in each picture, then that's pretty much the only way.
If it IS the exact same, then you can just search for the pixel pattern, and get the bounding rectangle, and just set the image border to the inner boundary of the rectangle.
I think that you will never get good results without a real user telling the program what to do. Think of it this way: how should your program decide when there is more than 1 interesting object present (for example: 2 cars)? what if the object you want is actually the mountain in the background? what if nothing of interest is inside the picture, thus nothing to select as the object to crop out? etc, etc...
With that said, if you can make assumptions like: only 1 object will be present, then you can have a go with using image recognition algorithms.
Now that I think of it. I recently got a lecture about artificial intelligence in robots and in robotic research techniques. Their research went on about language interaction, evolution, and language recognition. But in order to do that they also needed some simple image recognition algorithms to process the perceived environment. One of the tricks they used was to make a 3D plot of the image where x and y where the normal x and y axis and the z axis was the brightness of that particular point, then they used the same technique for red-green values, and blue-yellow. And lo and behold they had something (relatively) easy they could use to pick out the objects from the perceived environment.
(I'm terribly sorry, but I can't find a link to the nice charts they had that showed how it all worked).
Anyway, the point is that they were not interested (that much) in image recognition so they created something that worked good enough and used something less advanced and thus less time consuming, so it is possible to create something simple for this complex task.
Also any good image editing program has some kind of magic wand that will select, with the right amount of tweaking, the object of interest you point it on, maybe it's worth your time to look into that as well.
So, it basically will mean that you:
have to make some assumptions, otherwise it will fail terribly
will probably best be served with techniques from AI, and more specifically image recognition
can take a look at paint.NET and their algorithm for their magic wand
try to use the fact that a good photo will have the object of interest somewhere in the middle of the image
.. but i'm not saying that this is the solution for your problem, maybe something simpler can be used.
Oh, and I will continue to look for those links, they hold some really valuable information about this topic, but I can't promise anything.

Resources