What are some line detection/recognition algorithms used in machine learning?

I've looked at the Hough Transform, but I'd like a machine learning classifier that would achieve the same purpose: detect unique lines from a given 2D vector or image. The closest I could think of was k-NN, but that would give me the neighbors around a cluster rather than the points that fall on a straight line.

The closest thing would be to train a Convolutional Neural Network (CNN). The different convolutional layers should produce feature maps that detect small portions of lines at different orientations, and when recombined together they should detect complete straight lines. As I said in my comments, this can even reconnect broken lines.
If you are also interested in curved lines (not only straight ones), you can add a transformer layer, which applies elastic deformations during training. These deformations will twist the line detectors, making them sensitive to curved lines.
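To make the intuition concrete, here is a minimal sketch of what the first convolutional layer effectively does: a small bank of oriented line kernels convolved over the image. A trained CNN would learn such filters itself; the kernels, the file name "input.png" and the variable names below are just illustrative assumptions.

import cv2
import numpy as np

# Hand-built 5x5 kernels that respond to short line segments at
# 0, 45, 90 and 135 degrees, similar to filters a CNN might learn.
k0 = np.zeros((5, 5), np.float32)
k0[2, :] = 1.0                            # horizontal segment
k90 = k0.T.copy()                         # vertical segment
k135 = np.eye(5, dtype=np.float32)        # diagonal, top-left to bottom-right
k45 = k135[::-1]                          # diagonal, bottom-left to top-right
kernels = [k - k.mean() for k in (k0, k45, k90, k135)]   # zero-mean filters

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
responses = [cv2.filter2D(img, -1, k) for k in kernels]
line_map = np.max(responses, axis=0)      # strongest orientation response per pixel

A real CNN stacks several such layers and learns the filters from labeled data, which is what lets it recombine the local responses into full (and even curved) lines.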

Related

OpenCV detect tennis court lines behind net

I'm trying to implement a tennis court detector using video recorded with my phone. I filmed from the far corner of the tennis court.
The original image is like this.
Using OpenCV Canny Edge Detection and Hough Lines transformation, I'm able to detect the lines in my own half, but not the ones behind the net. How can I improve this process and get the undetected court lines?
The processed image is as below.
Updated on 2016-08-25
Thanks guys. I understand it makes sense to derive the court lines by fitting the detected lines to the model lines. I am not going to try a combinatorial search for the best lines to fit the model. Instead, I have been trying to separate the horizontal/vertical lines in order to reduce the computational complexity. I tried RANSAC to find the vanishing points (VP) associated with the two groups of lines, but it failed, probably because of detection error(?).
The scatter plot of the line parameters in polar coordinates is below. The goal is basically to classify the points into two groups: the top points, which form a horizontal line, and the lower-left points, which also form a line with a steep slope. Is there any way to do that? Thanks
You don't need to detect the lines behind the net. You know the ground is a flat plane, you know the dimensions of each side of the court are the same - so you only need to detect the nearby lines and you can calculate where the missing lines are.
In fact you really only need to detect a single corner if you know the characteristics of the camera+lens.
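As a minimal sketch of that idea: once a few near-court landmarks are detected, a homography from the (known) court model to the image lets you project the hidden far lines. The pixel coordinates below are made-up placeholders; the court dimensions are the standard ITF singles court (8.23 m x 23.77 m).

import cv2
import numpy as np

# Hypothetical detected pixel positions of four near-court landmarks...
img_pts = np.float32([[212, 655], [1068, 648], [402, 412], [886, 409]])
# ...and the same four points in court coordinates (metres), here the two
# near baseline corners and the two net-line/sideline intersections.
court_pts = np.float32([[0, 0], [8.23, 0], [0, 11.885], [8.23, 11.885]])

H, _ = cv2.findHomography(court_pts, img_pts)

# Any court-model point can now be projected into the image, e.g. the
# far baseline corners that are hidden behind the net.
far_corners = np.float32([[[0, 23.77]], [[8.23, 23.77]]])
far_in_image = cv2.perspectiveTransform(far_corners, H)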
In addition to Martin's comments, you might try using some kind of blur on the image before running your edge/line detection. With some tuning, you should be able to remove the signal of the net and maintain the court lines.
Another approach would be to reduce the thick lines to a single pixel by scanning the image left to right (for example) to detect transitions from red/green to white and back to red/green again. When this occurs, you can estimate that the midpoint of those two transitions is the midpoint of a court line. This would give you data you could feed directly into your Hough transform. This of course requires you to classify individual pixels as either court or line, which it seems like you aren't currently doing. This process can also be performed top-to-bottom to produce a second set of midpoint estimates.
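A rough sketch of that scan, assuming you already have a boolean mask marking pixels classified as "court line" (white); the function name and mask are illustrative, not existing code:

import numpy as np

# Left-to-right scan: collapse each run of "line" pixels in a row to its
# midpoint. The same function applied to line_mask.T gives the
# top-to-bottom estimates.
def line_midpoints(line_mask):
    points = []
    for y, row in enumerate(line_mask):
        xs = np.flatnonzero(row)
        if xs.size == 0:
            continue
        # split the x positions into runs of adjacent pixels
        runs = np.split(xs, np.where(np.diff(xs) > 1)[0] + 1)
        for run in runs:
            points.append((int(run.mean()), y))   # midpoint of the run
    return points   # feed these (x, y) points into the Hough transform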
For blurring, try a bilateral filter on the grayscale image, for example:

cv2.bilateralFilter(img_gray, 30, 25, 75)

Convert polygons into mesh

I have a lot of polygons. Ideally, the polygons must not overlap one another, but they can be located adjacent to one another.
In practice, however, I have to allow for slight polygon overlap (defined by a certain tolerance), because all these polygons come from user hand-drawn input, which is not as machine-precise as I would like.
My question is: is there any software library component that:
Allows one to input a range of polygons
Checks whether the polygons overlap by more than a prespecified tolerance
If yes, stops; otherwise, continues
Creates a mesh, in terms of coordinates and elements, for the polygons by grouping common vertices and edges together?
More importantly, links the mesh edges back to the original polygons' edges?
Or has anyone tackled this issue before?
This issue is the daily "bread" of GIS applications - this is exactly what is done there. We also learned about it in a GIS course. Look into how GIS systems address this issue. E.g. ArcGIS defines so-called topology rules and has functions to check whether the edited features are topologically correct. See http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=Topology_rules
This is pretty long, only because the question is so big. I've tried to group my comments based on your bullet points.
Components to draw polygons
My guess is that you'll have limited success without providing more information - a component to draw polygons will be tightly coupled to the language and UI paradigm you are using for the rest of your project, i.e. code for a web component will look very different from code for a native component.
Perhaps an alternative is to separate this element of the process out from the rest of what you're trying to do. There are some absolutely fantastic pre-existing editors that you can use to create 2d and 3d polygons.
Inkscape is an example of a vector graphics editor that makes it easy to enter 2d polygons, and has the advantage of producing output SVG, which is reasonably easy to parse.
In three dimensions Blender is an open source editor that can be used to produce arbitrary geometries that can be exported to a number of formats.
If you can use a Google Maps API (possibly in a native HTML rendering control) and you are interested in adding spatial points on a map overlay, you may be interested in a related click-to-draw polygon question on Stack Overflow. From past experience, other map APIs like OpenLayers support similar approaches.
Check whether polygons are overlapped
Thomas T made the point in his answer that there are families of related predicates that can be used to address this and related queries. If you are literally just looking for overlaps and other set-theoretic operations (union, intersection, set difference) in two dimensions, you can use the General Polygon Clipper.
You may also need to consider the slightly more general problem of two polygons that don't overlap or share a vertex when they should. You can use a Minkowski sum to dilate (enlarge) two- and three-dimensional polygons to avoid such problems. The Computational Geometry Algorithms Library has robust implementations of these algorithms.
I think it's more likely that you are really looking for a piece of software that can perform vertex welding. Christer Ericson's book Real-Time Collision Detection includes an extensive and very readable description of the basics in this field, and also of related issues such as edge snapping, crack detection and T-junctions. However, even though code snippets are included in that book, I know of no ready-made library that addresses these problems; in particular, no complete implementation is given for anything beyond basic vertex welding.
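To give a flavour of what basic vertex welding looks like, here is a very rough sketch: vertices closer than a tolerance are merged by snapping them to a grid of that cell size. The function name and data layout are assumptions; real implementations (as Ericson describes) have to be far more careful.

# Very rough vertex welding sketch: vertices closer than `tol` are merged
# by snapping them to a grid of cell size `tol`.
def weld_vertices(polygons, tol=1e-3):
    welded = {}                                  # grid cell -> canonical vertex
    out = []
    for poly in polygons:                        # poly: list of (x, y) tuples
        new_poly = []
        for x, y in poly:
            key = (round(x / tol), round(y / tol))
            if key not in welded:
                welded[key] = (x, y)             # first vertex in this cell wins
            new_poly.append(welded[key])
        out.append(new_poly)
    return out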
Obviously, the major 3D packages (Blender, Maya, Max, Rhino) all include built-in tools to solve this problem.
Group polygons based on vertices
From past experience, this turned out to be one of the most time-consuming parts of developing software to solve problems in this area. It requires a reasonable understanding of graph theory and of algorithms to traverse boundaries. It is worth relying on a solid geometry or graph library to do the heavy lifting for you. In the past I've had success with igraph.
Link the updated polygons back to the originals.
Again, from past experience, this is just a case of careful bookkeeping, and some very careful design of your mesh classes up-front. I'd like to give more advice, but even after spending a big chunk of the last six months on this, I'm still struggling to find a "nice" way to do this.
Other Comments
If you're interacting with users, I would strongly recommend avoiding this issue where possible by using an editor that "snaps", rounding all user entered points onto a grid. This will hopefully significantly reduce the amount of work that you have to do.
Yes, you can use OGR. It has Python bindings. Specifically, the Geometry class has an Intersects method. I don't fully understand what you want in points 4 and 5.
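A small sketch of the overlap-with-tolerance check using those OGR bindings; the WKT polygons and the tolerance value are just made-up examples:

from osgeo import ogr

# Two hypothetical polygons given as WKT, overlapping by a thin sliver.
a = ogr.CreateGeometryFromWkt("POLYGON((0 0, 4 0, 4 4, 0 4, 0 0))")
b = ogr.CreateGeometryFromWkt("POLYGON((3.9 0, 8 0, 8 4, 3.9 4, 3.9 0))")

tolerance = 0.5     # maximum overlap area you are willing to accept
if a.Intersects(b):
    overlap = a.Intersection(b)
    if overlap.GetArea() > tolerance:
        raise ValueError("polygons overlap by more than the allowed tolerance")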

Object detection + segmentation

I'm trying to find an efficient way, of acceptable complexity, to
detect an object in an image so I can isolate it from its surroundings
segment that object into its sub-parts and label them, so I can then fetch them at will
It's been 3 weeks since I entered the image processing world, and I've read about so many algorithms (SIFT, snakes, more snakes, Fourier-related, etc.) and heuristics that I don't know where to start or which one is "best" for what I'm trying to achieve. Bearing in mind that the image dataset of interest is a pretty large one, I don't even know if I should use an algorithm implemented in OpenCV or implement one myself.
To summarize:
Which methodology should I focus on? Why?
Should I use OpenCV for that kind of stuff or is there some other 'better' alternative?
Thank you in advance.
EDIT -- More info regarding the datasets
Each dataset consists of 80K images of products sharing the same:
concept, e.g. t-shirts, watches, shoes
size
orientation (90% of them)
background (95% of them)
All pictures in each dataset look almost identical apart from the product itself, apparently. To make things a little clearer, let's consider only the 'watch dataset':
All the pictures in the set look almost exactly like this:
(again, apart from the watch itself). I want to extract the strap and the dial. The thing is that there are lots of different watch styles and therefore shapes. From what I've read so far, I think I need a template algorithm that allows bending and stretching, so as to be able to match straps and dials of different styles.
Instead of creating three distinct templates (upper part of the strap, lower part of the strap, dial), it would be reasonable to create only one and segment it into 3 parts. That way, I could be confident that each part was detected in the intended position relative to the others, e.g. that the dial would not be detected below the lower part of the strap.
Of all the algorithms/methodologies I've encountered, active shape/appearance models seem to be the most promising ones. Unfortunately, I haven't managed to find a decent implementation, and I'm not confident enough that it's the best approach to go ahead and write one myself.
If anyone could point out what I should be really looking for (algorithm/heuristic/library/etc.), I would be more than grateful. If again you think my description was a bit vague, feel free to ask for a more detailed one.
From what you've said, here are a few things that pop up at first glance:
The simplest thing to do is to binarize the image and run Connected Components using OpenCV or the cvBlob library. For simple images with a non-complex background this usually yields the objects.
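A minimal sketch of that pipeline in OpenCV; the file name "watch.png" and the area threshold are illustrative assumptions:

import cv2

img = cv2.imread("watch.png", cv2.IMREAD_GRAYSCALE)

# Binarize (Otsu picks the threshold automatically) and label connected
# components; each label is a candidate object blob.
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)

# stats[i] = [x, y, width, height, area]; label 0 is the background.
blobs = [stats[i] for i in range(1, num_labels) if stats[i][cv2.CC_STAT_AREA] > 100]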
However, looking at your sample image, texture-based segmentation techniques may work better - the watch dial, the straps and the background vary widely in texture/roughness, and this could be an ideal way to separate them.
The roughness of a region can easily be found with the Eigen transform (explained a bit on SO; check the link to the research paper provided there), and then the Mean Shift filter can be applied to the output of the Eigen transform. This will give regions clearly separated according to texture. Both the pyramidal Mean Shift and finding eigenvalues by SVD are implemented in OpenCV, so unless you can optimize your own code it's better (and easier) to use the built-in functions (where present) as far as speed and efficiency are concerned.
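Only the mean shift half of that recipe is built into OpenCV (the Eigen transform comes from the linked paper), so here is a sketch of that part; the file name and the spatial/colour window sizes are just assumed values:

import cv2

img = cv2.imread("watch.png")            # hypothetical file name; 8-bit BGR image

# Pyramidal mean shift filtering: spatial window 21 px, colour window 45.
# Regions with similar colour statistics collapse to flat patches, which
# makes the subsequent segmentation much easier.
segmented = cv2.pyrMeanShiftFiltering(img, 21, 45)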
I think I would turn the problem around. Instead of hunting for the dial, I would use a set of robust features from the watch to 'stitch' the target image onto a template. The first watch has a set of white squares in the dial, the second watch has a number of white circles. Per type of watch, I would (see the sketch after this list):
Segment out the squares or circles in the dial. Segmentation steps can be tricky as they are usually both scale and light dependent
Estimate the centers or corners of the above found feature areas. These are the new feature points.
Use the Hungarian algorithm to match features between the template watch and the target watch. Alternatively, one can take the surroundings of each feature point in the original image and match these using cross correlation
Use matching features between the template and the target to estimate scaling, rotation and translation
Stitch the image
As the image is now in a known form, the regions can be extracted simply via preset coordinates
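A compressed sketch of steps 3-5, assuming the feature centres from step 2 are already available as Nx2 arrays; the plain coordinate-distance cost used for the Hungarian matching is a simplification (in practice you would use descriptor or patch similarity), and the function and variable names are assumptions:

import numpy as np
import cv2
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def register_to_template(template_pts, target_pts):
    # Hungarian algorithm: match each template feature to a target feature.
    # Cost here is plain coordinate distance for brevity.
    cost = cdist(template_pts, target_pts)
    rows, cols = linear_sum_assignment(cost)
    src = np.float32(target_pts[cols])
    dst = np.float32(template_pts[rows])

    # Estimate scale + rotation + translation (similarity transform),
    # with RANSAC to discard bad matches; the result can be passed to
    # cv2.warpAffine to "stitch" the target onto the template.
    M, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    return M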

Face identification with opencv

I'm using the OpenCV library for image processing in C++, and this is my question: do you think it is possible to do facial recognition (saying the name of a person based on a database of photos) by comparing a frame from the video camera with the images in a database using histogram comparison? (Note that I compare only the facial region of an image, using an example included in the OpenCV libraries.)
I'm asking this because I've just tried to write a program like the above, but I'm having a lot of problems (I often detect the wrong person).
You might want to start with compiling the Face Detection using OpenCV example. As others have pointed out, general facial recognition isn't exactly an easy problem to solve. EigenFaces is one common technique for face recognition that is fairly easy to understand and implement.
As others have stated, it's a hard problem, but this gives you a place to start.
Some methods I have experience with are:
metric learning for comparing faces
naming video characters: they use SIFT descriptors computed at specific fiducial points on each face. Their code worked quite well for me in the past.
A dataset and benchmark dedicated to this task is Labeled Faces in the Wild. You can find there references to working methods for comparing faces after detection.
UPDATE:
I have a description of an experiment on face clustering: unsupervised face identification.
The experiment is described in Section 4.4 of my thesis.
The basic flow is as follows:
Metric learning: how to determine if two faces are of the same person or not.
This part is supervised, in the sense that it requires as input face images labeled with the identity of the person who appears in each photo.
a. Detect fiducial points (eyes, corner of mouth, nose).
You may use this code, or more recent versions such as this one.
b. Extract SIFT descriptors at the detected fiducial points.
c. Construct a "face descriptor": each face is described using a single vector.
This vector is a concatenation of the sqrt of all the SIFT descriptors.
d. Use the method described here to learn a Mahalanobis distance between faces of different persons.
Unsupervised face identification: once a metric has been learned, you may use new photos of new people (these people need not be part of the training set; you may use photos of people never seen before!).
a. Repeat stages a-c to construct the same "face descriptor" vector for each input face.
b. Compare the descriptor vectors using the learned Mahalanobis distance.
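As a tiny sketch of step 2b: assuming M is the matrix produced by the metric-learning step and d1, d2 are the concatenated face-descriptor vectors (all of these are assumed inputs, not existing variables), the comparison is just a Mahalanobis distance:

import numpy as np
from scipy.spatial.distance import mahalanobis

# M plays the role of the inverse-covariance matrix in the standard
# Mahalanobis formula: d = sqrt((d1 - d2)^T M (d1 - d2)).
def face_distance(d1, d2, M):
    return mahalanobis(d1, d2, M)

# Two faces are declared "same person" if the distance falls below a
# threshold chosen on a validation set:
# same_person = face_distance(d1, d2, M) < threshold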
I suggest using an existing algorithm such as the one available in the Luxand FaceSDK: http://www.luxand.com/facesdk/ rather than trying to develop your own.
There are three built-in techniques for face recognition in OpenCV now: PCA (Eigenfaces), LDA (Fisherfaces) and LBPH.
Nice example code:
https://github.com/Itseez/opencv/blob/master/samples/cpp/facerec_demo.cpp
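The Python equivalent of that demo is short; here is a sketch of the LBPH variant, assuming opencv-contrib is installed and that `faces` is a list of equally sized grayscale face crops, `labels` their integer identities, and `new_face` a crop to identify (all assumed variables):

import cv2
import numpy as np

# LBPH recognizer from the cv2.face module (opencv-contrib-python).
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.train(faces, np.array(labels))

# Predict the identity of a new face crop; `confidence` is a distance,
# so lower means a better match.
label, confidence = recognizer.predict(new_face)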

What is currently considered the "best" algorithm for 2D point-matching?

I have two lists containing x-y coordinates (of stars). I could also have magnitudes (brightnesses) attached to each star. Now, each star's position is randomly jiggled, and there can be a few extra or missing points in each image. My question is, "What is the best 2D point-matching algorithm for such a dataset?" I guess both for a simple linear transformation (translation, rotation, scale) and a non-linear one (say, n-degree polynomials in the coordinates). In the lingo of the point-matching field, I'm looking for the algorithms that would win a shootout between 2D point-matching programs with noise and spurious points. There may be different "winners" depending on whether the labeling info (the magnitudes) is used and/or the transformation is restricted to being linear.
I am aware that there are many classes of 2D point-matching algorithms, and many algorithms in each class (literally probably hundreds in total), but I don't know which, if any, is considered the "best" or the "most standard" by people in the field of computer vision. Sadly, many of the papers I want to read don't have online versions, and I can only read the abstracts. Before I settle on a particular algorithm to implement, it would be good to hear from a few experts to separate the wheat from the chaff.
I have a working matching program that uses triangles, but it fails somewhat frequently (~5% of the time), such that the solution transformation has obvious distortions but for no obvious reason. This program was not written by me and is from a paper written almost 20 years ago. I want to write a new implementation that performs more robustly. I am assuming (hoping) that there have been some advances in this area that make this plausible.
If you're interested in star matching, check out the Astrometry.net blind astrometry solver and the paper on it here. They use four point quads to solve star configurations in Flickr pictures of the night sky. Check out this interview.
There is no single "best" algorithm for this. There are lots of different techniques, and each works better than the others on specific datasets and types of data.
One thing I'd recommend is to read this introduction to image registration from the tutorials of the Insight Toolkit. ITK supports MANY types of image registration (which is what it sounds like you are attempting) and is very robust in many cases. Most of their users are in the medical field, so you'll have to wade through a lot of medical jargon, but the algorithms and code work with any type of image (including 1-, 2-, 3- and n-dimensional images of different types, etc.).
You can consider applying your algorithm first only to the N brightest stars, then progressively including the others to refine the result, reducing the search range at the same time.
Using RANSAC for robustness to extra points is also very common.
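A minimal sketch of the RANSAC step, assuming putative correspondences between the two star lists are already available (e.g. from a triangle matcher); the arrays pts_a and pts_b and the reprojection threshold are assumed inputs:

import numpy as np
import cv2

# pts_a and pts_b: Nx2 float32 arrays of putatively corresponding star
# positions. RANSAC estimates translation + rotation + scale while
# rejecting spurious and badly jiggled points.
M, inlier_mask = cv2.estimateAffinePartial2D(
    pts_a, pts_b, method=cv2.RANSAC, ransacReprojThreshold=2.0)
good_matches = int(inlier_mask.sum())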
I'm not sure it would work, but worth a try:
For each star, do the circle-times-ray Fourier transform, centered around it, of all the other stars (note: this is not the standard Fourier transform, which is line times line).
The phase space of circle times ray is integers times line, but since we only have finite accuracy, you just get a matrix; the dimensions of the matrix depend on the accuracy. Now try to pair the matrices with one another (e.g. using the L_2 norm).
I saw a program on TV a while ago about how researchers were taking pictures of whales and using the spots on them (which are unique to each whale) to ID each whale. It used the angles between the spots; by using angles, it didn't matter whether the image was rotated, scaled or translated. That sounds similar to what you're doing with your triangles.
I think the "best" (most technical) way would be to take the Fourier transform of the original image and of the new, linearly modified image. With some simple filtering, it should be easy to figure out the orientation and scale of your image with respect to the old one. There is a description of the 2D Fourier transform here.
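For the translation part of that idea, OpenCV's phase correlation works directly on the spectra; rotation and scale can be handled the same way after a log-polar remapping (the Fourier-Mellin approach). The file names below are placeholders, and both images must be the same size:

import cv2
import numpy as np

a = cv2.imread("frame_old.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
b = cv2.imread("frame_new.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Phase correlation recovers the (x, y) shift between the two images;
# `response` indicates how confident the peak is.
(shift_x, shift_y), response = cv2.phaseCorrelate(a, b)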
