distinguishing objects with opencv - algorithm

I want to identify Lego bricks in order to build a Lego sorting machine (I use C++ with OpenCV).
That means I have to distinguish between objects which look very similar.
The bricks arrive at my camera one at a time on a flat conveyor, but they may lie in any orientation: upside down, on their side or "normal".
My approach is to teach the sorting machine the bricks by filming them with the camera in many different positions and rotations. The features of every view are computed with the SURF algorithm.
void calculateFeatures(const cv::Mat& image,
std::vector<cv::KeyPoint>& keypoints,
cv::Mat& descriptors)
{
// detector == cv::SurfFeatureDetector(10)
detector->detect(image,keypoints);
// extractor == cv::SurfDescriptorExtractor()
extractor->compute(image,keypoints,descriptors);
}
When an unknown brick arrives (the brick that I want to sort), its features are calculated as well and matched against the known ones.
To find wrongly matched features I proceed as described in the book OpenCV 2 Cookbook:
With the matcher (cv::BFMatcher(cv::NORM_L2)), the two nearest neighbours are searched in both directions:
matcher.knnMatch(descriptorsImage1, descriptorsImage2,
matches1,
2);
matcher.knnMatch(descriptorsImage2, descriptorsImage1,
matches2,
2);
I check the ratio between the distances of the two nearest neighbours. If the two distances are very similar, the match is ambiguous and likely to be wrong.
// run once with matches = matches1 and once with matches = matches2
for (std::vector<std::vector<cv::DMatch> >::iterator it = matches.begin(); it != matches.end(); ++it)
    if (it->size() < 2 || (*it)[0].distance / (*it)[1].distance > 0.65f)
        it->clear(); // throw away the ambiguous match
Finally, only symmetrical match pairs are accepted. These are matches in which not only is n1 the nearest neighbour of feature f1, but f1 is also the nearest neighbour of n1.
for (size_t i = 0; i < matches1.size(); ++i)
    for (size_t j = 0; j < matches2.size(); ++j)
        if (!matches1[i].empty() && !matches2[j].empty() && // skip discarded entries
            matches1[i][0].queryIdx == matches2[j][0].trainIdx &&
            matches2[j][0].queryIdx == matches1[i][0].trainIdx)
            allMatches.push_back(matches1[i][0]); // good match
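To make the logic of the two filters easier to check in isolation, here is a minimal sketch in plain Python rather than OpenCV. The tuple layout (query index, train index, distance) and the function names ratio_test and symmetry_test are my own assumptions for illustration; only the 0.65 threshold comes from the code above.

```python
from typing import List, Tuple

# A match is (query_idx, train_idx, distance). Each knn entry holds the
# nearest neighbours of one query descriptor, best first.
Match = Tuple[int, int, float]


def ratio_test(knn: List[List[Match]], max_ratio: float = 0.65) -> List[Match]:
    """Keep the best match only when it is clearly better than the runner-up."""
    kept = []
    for candidates in knn:
        if len(candidates) < 2:
            continue  # ambiguity cannot be judged without a second neighbour
        best, second = candidates[0], candidates[1]
        if second[2] > 0 and best[2] / second[2] <= max_ratio:
            kept.append(best)
    return kept


def symmetry_test(m12: List[Match], m21: List[Match]) -> List[Match]:
    """Keep image1 -> image2 matches that are confirmed by image2 -> image1."""
    reverse = {(q, t) for q, t, _ in m21}
    return [m for m in m12 if (m[1], m[0]) in reverse]
```

Using a set for the reverse direction replaces the nested loop with a single pass over each list.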
Now only pretty good matches remain. To filter out some more bad matches I check which matches fit the projection of img1 on img2 using the fundamental matrix.
std::vector<uchar> inliers(points1.size(), 0);
cv::findFundamentalMat(
    cv::Mat(points1), cv::Mat(points2), // matching points
    inliers,                            // match status (non-zero = inlier)
    CV_FM_RANSAC,                       // RANSAC method
    3,                                  // max distance to the epipolar line
    0.99);                              // confidence level
std::vector<cv::DMatch> goodMatches;
// extract the surviving (inlier) matches
std::vector<uchar>::const_iterator itIn = inliers.begin();
std::vector<cv::DMatch>::const_iterator itM = allMatches.begin();
// for all matches
for ( ; itIn != inliers.end(); ++itIn, ++itM)
    if (*itIn)
        goodMatches.push_back(*itM); // it is a valid match
The result is pretty good, but with very similar bricks errors still occur.
In the picture above you can see that a similar brick is recognized well.
However in the second picture a wrong brick is recognized just as well.
Now the question is how I could improve the matching.
I had two different ideas:
First idea: the matches in the second picture do trace back to features that really fit, but only if the viewpoint is changed drastically. To recognize a brick I have to compare it in many different positions anyway (at least as shown in figure three). This means I know that only a minimal change of viewpoint is allowed. The information about how strongly the viewpoint has changed should be hidden in the fundamental matrix. How can I read from this matrix how far the position in space has changed? Rotation and strong scaling are of particular interest; whether the brick was filmed a little further to the left shouldn't matter.
Second idea:
I calculated the fundamental matrix out of 2 pictures and filtered out features that don't fit the projections - shouldn't there be a way to do the same using three or more pictures? (keyword Trifocal tensor). This way the matching should become more stable. But I neither know how to do this using OpenCV nor could I find any information on this on google.

I don't have a complete answer, but I have a few suggestions.
On the image analysis side:
It looks like your camera setup is pretty constant, so it should be easy to separate the brick from the background. I also see your system finding features in the background, which is unnecessary. Set all non-brick pixels to black to remove them from the analysis.
Once you have isolated the brick, your first step should be to filter likely candidates based on the size (i.e. number of pixels) of the brick. That way the faulty match in your example already becomes less likely.
You can take other features into account, such as the aspect ratio of the bounding box of the brick, its major and minor axes (the eigenvectors of the covariance matrix built from the second-order central moments), etc.
These simpler features will give you a reasonable first filter to limit your search space.
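As a rough illustration of such a pre-filter, here is a minimal sketch in plain Python. It assumes the brick has already been segmented into a binary mask (a list of rows of 0/1 values); the names shape_features and similar_size and the 15% tolerance are my own choices, not something from the question.

```python
import math
from typing import Dict, List


def shape_features(mask: List[List[int]]) -> Dict[str, float]:
    """Area, bounding-box aspect ratio and principal-axis variances of a mask."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask contains no object pixels")
    n = len(pts)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    cx, cy = sum(xs) / n, sum(ys) / n
    # Second-order central moments divided by the area form the covariance matrix.
    mu20 = sum((x - cx) ** 2 for x in xs) / n
    mu02 = sum((y - cy) ** 2 for y in ys) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pts) / n
    # Eigenvalues of [[mu20, mu11], [mu11, mu02]]: variance along each axis.
    half_trace = (mu20 + mu02) / 2
    root = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    return {
        "area": float(n),
        "aspect_ratio": max(width, height) / min(width, height),
        "major_var": half_trace + root,
        "minor_var": half_trace - root,
    }


def similar_size(a: Dict[str, float], b: Dict[str, float],
                 tolerance: float = 0.15) -> bool:
    """Cheap first filter: reject candidates whose pixel area differs too much."""
    return abs(a["area"] - b["area"]) <= tolerance * max(a["area"], b["area"])
```

Only the reference views that pass similar_size (and any similar checks) would then go through the expensive feature matching.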
On the mechanical side:
If the bricks are actually coming down a conveyor, you should be able to "straighten" them along a straight edge, for example with a rod that lies across the belt at an angle to the direction of travel, so that the bricks arrive at your camera more uniformly, like so.
Similar to the previous point, you could use something like a very loose brush suspended across the belt to topple bricks standing up as they pass.
Again both these points will limit your search space.

Related

Accurate (and fast) angle matching

For a hobby project I'm attempting to align photos and create 3D pictures. I basically have two cameras on a rig that I use to take pictures. I then try to align the images automatically in such a way that you get a 3D SBS (side-by-side) image.
They are high resolution images, which means a lot of pixels to process. Because I'm not really patient with computers, I want things to go fast.
Originally I've worked with code based on image stitching and feature extraction. In practice I found these algorithms to be too inaccurate and too slow. The main reason is that you have different levels of depth here, so you cannot do a 1-on-1 match of features. Most of the code already works fine, including vertical alignment.
For this question, you can assume that differences in ISO/exposure levels, color correction and vertical alignment of the images are all taken care of.
What is still missing is a good algorithm for correcting the angle of the pictures. I noticed that left-right pictures usually vary a small number of degrees (think +/- 1.2 degrees difference) in angle, which is enough to get a slight headache. As a human you can easily spot this by looking at sharp differences in color and lining them up.
The irony here is that as a human you spot immediately whether it's correct or not, but somehow I'm not able to teach this to a machine. :-)
I've experimented with edge detectors, the Hough transform and a large variety of home-brew algorithms, but so far found all of them to be both too slow and too inaccurate for my purposes. I've also attempted to iteratively align vertically while changing the angles slightly, so far without any luck.
Please note: Accuracy is perhaps more important than speed here.
I've added an example image here. It's actually both a left-eye and a right-eye image, alpha-blended. If you look closely, you can see the lamp at the top having two ellipses, and you can see how the chairs don't exactly line up at the top. It might seem negligible, but at full-screen resolution on a projector you will easily see the difference. This also shows the level of accuracy that is required; it's quite a lot.
The shift in 'x' direction will give the 3D effect. Basically, if the shift is 0, it's on the screen, if it's <0 it's behind the screen and if it's >0 it's in front of the screen. This also makes matching harder, since you're not looking for a 'stitch'.
Basically the two cameras 'look' in the same direction (perpendicular as in the second picture here: http://www.triplespark.net/render/stereo/create.html ).
The difference originates from the camera being on a slightly different angle. This means the rotation is uniform throughout the picture.
I have once used the following amateur approach.
Assume that the second image has a rotation + vertical shift mismatch. This means that we need to apply some transform for the second image which can be expressed in matrix form as
x' = a*x + b*y + c
y' = d*x + e*y + f
that is, every pixel that has coordinates (x,y) on the second image, should be moved to a position (x',y') to compensate for this rotation and vertical shift.
We have a strict requirement that a=e, b=-d and d*d+e*e=1 so that it is indeed rotation+shift, no zoom or slanting etc. Also this notation allows for horizontal shift too, but this is easy to fix after angle+vertical shift correction.
Now select several common features on both images (I did the selection by hand, as just 5-10 seemed enough; you can try to apply some automatic feature detection mechanism). Assume the i-th feature has coordinates (x1[i], y1[i]) on the first image and (x2[i], y2[i]) on the second. We expect that after our transformation the features have y-coordinates that are as equal as possible, that is we want (ideally)
y1[i]=y2'[i]=d*x2[i]+e*y2[i]+f
Having enough (>=3) features, we can determine d, e and f from this requirement. In fact, if you have more than 3 features, you will most probably not be able to find common d, e and f for them, but you can apply the least-squares method to find the d, e and f that make y2' as close to y1 as possible. You can also account for the requirement that d*d+e*e=1 while finding d, e and f, though as far as I remember, I got acceptable results even without accounting for this.
After you have determined d, e and f, you have the requirement a=e and b=-d. This leaves only c unknown, which is horizontal shift. If you know what the horizontal shift should be, you can find c from there. I used the background (clouds on a landscape, for example) to get c.
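For what it's worth, the least-squares step can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the helper names solve3, fit_row_alignment and to_rotation are made up, the fit ignores the constraint d*d+e*e=1, and to_rotation enforces that constraint afterwards by normalising.

```python
import math
from typing import List, Sequence, Tuple


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("features are degenerate (e.g. collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - s) / m[r][r]
    return x


def fit_row_alignment(p1: Sequence[Tuple[float, float]],
                      p2: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Least-squares d, e, f such that d*x2 + e*y2 + f is close to y1."""
    rows = [(x2, y2, 1.0) for x2, y2 in p2]
    targets = [y1 for _, y1 in p1]
    # Normal equations (A^T A) [d, e, f]^T = A^T y1.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    d, e, f = solve3(ata, atb)
    return d, e, f


def to_rotation(d: float, e: float) -> Tuple[float, float, float]:
    """Normalise (d, e) so that d*d + e*e = 1; also return the angle in degrees."""
    norm = math.hypot(d, e)
    return d / norm, e / norm, math.degrees(math.atan2(d, e))
```

With a = e and b = -d taken from the normalised values, only c remains to be chosen as described above.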
When you know all the parameters, you can do one pass on the image and correct it. You might also want to apply some anti-aliasing, but that's a different question.
Note also that you can in a similar way introduce quadratic correction to the formulas to account for additional distortions the camera usually has.
However, that's just a simple algorithm I came up with when I faced the same problem some time ago. I did not do much research, so I'd be glad to know if there is a better or well-established approach, or even ready-made software.

Algorithm for following the path of ridges on a 3D image

I'm trying to find an algorithm (or algorithm ideas) for following a ridge on a 3D image, derived from a digital elevation model (DEM). I've managed to get very basic program working which just iterates across each row of the image marking a ridge line wherever it finds a large change in aspect (ie. from < 180 degrees to > 180 degrees).
However, the lines this produces aren't brilliant, there are often gaps and various strange artefacts. I'm hoping to try and extend this by using some sort of algorithm to follow the ridge lines, thus producing lines that are complete (that is, no gaps) and more accurate.
A number of people have mentioned snake algorithms to me, but they don't seem to be quite what I'm looking for. I've also done a lot of searching about path-finding algorithms, but again, they don't seem to be quite the right thing.
Does anyone have any suggestions for types of algorithms or specific algorithms I should look at?
Update: I've been asked to add some more detail on the exact area I'll be applying this to. It's working with gridded elevation data of sand dunes. I'm trying to extract the crests of these sand dunes, which look similar to the boundaries between drainage basins, but can be far more complex (for example, there can be multiple sand dunes very close to each other with gradually merging crests).
You can get a good estimate of the ridges using sign changes of the curvature. Note that the curvature is near zero in flat regions, so its reciprocal (the radius of curvature) is near infinity there, while at a sharp ridge the reciprocal is close to zero. Hence possible pseudocode for a ridge detection algorithm could be:
for each face in the mesh
    compute 1/curvature
    if abs(1/curvature) < zeroTolerance
        flag face as ridge
    else
        continue
(zeroTolerance is a number near but not equal to zero, e.g. 0.003)
Also Meshlab provides a module for normal & curvature estimation on most formats. You can test the idea using it, before you code it up.
I don't know what your data is like or how much automation you need. This won't work if it consists of peaks without clear ridges (but then you probably wouldn't be asking the question).
startPoint = highest point in DEM (or on ridge)
curPoint = startPoint;
line += curPoint;
Loop
    curPoint = highest point adjacent to curPoint not in line; // (Don't backtrack)
    line += curPoint;
Repeat
Curious what the real solution turns out to be.
Edited to add: depending on the coarseness of your data set, 'point' can be a single point or a smoothed average of a local region of points.
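A minimal runnable version of that loop, in plain Python, might look like the sketch below. The DEM is assumed to be a list of rows of heights, and the stopping rule (stop once the best unvisited neighbour drops below min_height) is my own addition, since the pseudocode above does not say when to stop.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column)


def follow_ridge(dem: List[List[float]], min_height: float,
                 start: Optional[Cell] = None) -> List[Cell]:
    """Greedily walk from the start cell to the highest unvisited neighbour."""
    rows, cols = len(dem), len(dem[0])
    if start is None:
        # Default start: the highest cell of the whole DEM.
        start = max(((r, c) for r in range(rows) for c in range(cols)),
                    key=lambda rc: dem[rc[0]][rc[1]])
    line = [start]
    visited = {start}
    cur = start
    while True:
        r, c = cur
        neighbours = [(r + dr, c + dc)
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if (dr or dc)
                      and 0 <= r + dr < rows and 0 <= c + dc < cols
                      and (r + dr, c + dc) not in visited]  # don't backtrack
        if not neighbours:
            break
        best = max(neighbours, key=lambda rc: dem[rc[0]][rc[1]])
        if dem[best[0]][best[1]] < min_height:
            break  # assumed stopping rule: the ridge has run out
        visited.add(best)
        line.append(best)
        cur = best
    return line
```

Note that this only walks in one direction from the start point; for a start in the middle of a crest you would run it a second time to pick up the other half. Smoothing the DEM first corresponds to using a local average as the 'point'.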
http://en.wikipedia.org/wiki/Ridge_detection
You can treat the elevation as you would a grayscale color, then use a 2D edge recognition filter. There are lots of edge recognition methods available. The best would depend on your specific needs.

Find tunnel 'center line'?

I have some map files consisting of 'polylines' (each line is just a list of vertices) representing tunnels, and I want to try and find the tunnel 'center line' (shown, roughly, in red below).
I've had some success in the past using Delaunay triangulation but I'd like to avoid that method as it does not (in general) allow for easy/frequent modification of my map data.
Any ideas on how I might be able to do this?
An "algorithm" that works well with localized data changes.
The critic's view
The Good
The nice part is that it uses a mixture of image processing and graph operations available in most libraries, may be parallelized easily, is reasonably fast, may be tuned to use a relatively small memory footprint, and doesn't have to be recalculated outside the modified area if you store the intermediate results.
The Bad
I wrote "algorithm", in quotes, only because I developed it myself and it is surely not robust enough to cope with pathological cases. If your graph has a lot of cycles you may end up with some phantom lines. More on this and examples later.
And The Ugly
The ugly part is that you need to be able to flood fill the map, which is not always possible. I posted a comment a few days ago asking if your graphs can be flood filled, but didn't receive an answer. So I decided to post it anyway.
The Sketch
The idea is:
Use image processing to get a fine line of pixels representing the center path
Partition the image into chunks commensurate with the tunnel's thinnest passages
At each partition, represent a point at the "center of mass" of the contained pixels
Use those points as the Vertices of a Graph
Add Edges to the Graph based on a "near neighbour" policy
Remove spurious small cycles in the induced Graph
End- The remaining Edges represent your desired path
The parallelization opportunity arises from the fact that the partitions may be computed in standalone processes, and the resulting graph may itself be partitioned to find the small cycles that need to be removed. These factors also make it possible to reduce the memory needed by processing serially instead of in parallel, but I didn't go through this.
The Plot
I'll not provide pseudocode, as the difficult part is just that not covered by your libraries. Instead of pseudocode I'll post the images resulting from the successive steps.
I wrote the program in Mathematica, and I can post it if it is of some service to you.
A- Start with a nice flood filled tunnel image
B- Apply a Distance Transformation
The distance transform replaces the value of each pixel with its distance to the nearest background pixel.
You can see that our desired path is the set of local maxima within the tunnel.
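If you want to experiment with this step without Mathematica, here is a minimal sketch in plain Python. It is not the routine used for the images here: it assumes the filled tunnel is a list of rows of 0/1 values and, to keep it short, uses the city-block (4-neighbour) distance computed by breadth-first search instead of the Euclidean distance.

```python
from collections import deque
from typing import List, Optional


def distance_transform(mask: List[List[int]]) -> List[List[Optional[int]]]:
    """City-block distance from every pixel to the nearest background (0) pixel."""
    rows, cols = len(mask), len(mask[0])
    dist: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    queue = deque()
    # All background pixels are sources at distance 0.
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    # Multi-source breadth-first search: the first visit is the shortest distance.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist
```

Pixels stay None only if the mask contains no background pixel at all.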
C- Convolve the image with an appropriate kernel
The selected kernel is a Laplacian-of-Gaussian kernel of pixel radius 2. It has the magic property of enhancing the gray level edges, as you can see below.
D- Cutoff gray levels and Binarize the image
To get a nice view of the center line!
Comment
Perhaps that is enough for you, as you may know how to transform a thin line into an approximate sequence of line segments. As that is not the case for me, I continued along this path to get the desired segments.
E- Image Partition
Here is where some advantages of the algorithm show up: you may start using parallel processing or decide to process one segment at a time. You may also compare the resulting segments with the previous run and re-use the previous results.
F- Center of Mass detection
All the white points in each sub-image are replaced by only one point at the center of mass
XCM = (Σ i∈Points Xi)/NumPoints
YCM = (Σ i∈Points Yi)/NumPoints
The white pixels are difficult to see (asymptotically difficult with param "a" age), but there they are.
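The partition and center-of-mass steps are simple enough to sketch in plain Python. The function name block_centroids and the representation of the thinned line as a list of (x, y) pixel coordinates are assumptions of mine.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def block_centroids(pixels: Iterable[Tuple[int, int]],
                    block_size: int) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Replace the white pixels of each square sub-image by their center of mass."""
    groups = defaultdict(list)
    for x, y in pixels:
        # Integer division assigns every pixel to exactly one sub-image.
        groups[(x // block_size, y // block_size)].append((x, y))
    return {
        block: (sum(p[0] for p in pts) / len(pts),
                sum(p[1] for p in pts) / len(pts))
        for block, pts in groups.items()
    }
```

Because each sub-image is handled independently, only the blocks touched by a map edit need to be recomputed.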
G- Graph setup from Vertices
Form a Graph using the selected points as Vertices. Still no Edges.
H- select Candidate Edges
Using the Euclidean distance between points, select candidate edges. A cutoff is used to select an appropriate set of Edges. Here we are using 1.5 times the sub-image size.
As you can see, the resulting Graph has a few small cycles that we are going to remove in the next step.
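The edge selection can be sketched like this in plain Python (a brute-force pairwise version; the name candidate_edges is hypothetical, and the default factor of 1.5 is the cutoff mentioned above):

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def candidate_edges(vertices: Sequence[Tuple[float, float]],
                    block_size: float,
                    factor: float = 1.5) -> List[Tuple[int, int]]:
    """Connect every pair of vertices closer than factor * block_size."""
    cutoff = factor * block_size
    edges = []
    for (i, p), (j, q) in combinations(enumerate(vertices), 2):
        if math.hypot(p[0] - q[0], p[1] - q[1]) <= cutoff:
            edges.append((i, j))  # indices into the vertex list
    return edges
```

For large maps you would probably compare only vertices from neighbouring sub-images instead of all pairs.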
I- Remove Small Cycles
Using a cycle detection routine we remove the small cycles up to a certain length. The cutoff length depends on a few parameters, and you should determine it empirically for your family of graphs.
J- That's it!
You can see that the resulting center line is shifted a little bit upwards. The reason is that I'm superimposing images of different type in Mathematica ... and I gave up trying to convince the program to do what I want :)
A Few Shots
As I did the testing, I collected a few images. They are probably the most un-tunnelish things in the world, but my Tunnels-101 went astray.
Anyway, here they are. Remember that I have a displacement of a few pixels upwards ...
HTH !
Update
Just in case you have access to Mathematica 8 (I got it today) there is a new function Thinning. Just look:
This is a pretty classic skeletonization problem; there are lots of algorithms available. Some algorithms work in principle on outline contours, but since almost everyone uses them on images, I'm not sure how available such things will be. Anyway, if you can just plot and fill the sewer outlines and then use a skeletonization algorithm, you could get something close to the midline (within pixel resolution).
Then you could walk along those lines and do a binary search with circles until you hit at least two separate line segments (three if you're at a branch point). The midpoint of the two spots you first hit, or the center of a circle touching the three points you first hit, is a good estimate of the center.
In Python this is an easy task with the skimage package:
import pylab as pl
from skimage import morphology as mp
tun = 1-pl.imread('tunnel.png')[...,0] #your tunnel image
skl = mp.medial_axis(tun) #skeleton
pl.subplot(121)
pl.imshow(tun,cmap=pl.cm.gray)
pl.subplot(122)
pl.imshow(skl,cmap=pl.cm.gray)
pl.show()

Automatic tracking algorithm

I'm trying to write a simple tracking routine to track some points on a movie.
Essentially I have a series of 100-frames-long movies, showing some bright spots on dark background.
I have ~100-150 spots per frame, and they move over the course of the movie. I would like to track them, so I'm looking for an efficient routine to do that (though preferably one that is not overkill to implement).
A few more details:
the spots are a few pixels in size (e.g. 5x5)
the movements are not big. A spot generally does not move more than 5-10 pixels from its original position. The movements are generally smooth.
the "shape" of these spots is generally fixed, they don't grow or shrink BUT they become less bright as the movie progresses.
the spots don't move in a particular direction. They can move right and then left and then right again
the user will select a region around each spot and then this region will be tracked, so I do not need to automatically find the points.
As the videos are b/w, I thought I should rely on brightness. For instance, I thought I could move the region around and calculate the correlation between the region's area in the previous frame and that at various positions in the next frame. I understand that this is quite a naïve solution, but do you think it may work? Does anyone know of specific algorithms that do this? It doesn't need to be superfast; as long as it is accurate I'm happy.
Thank you
nico
Sounds like a job for Blob detection to me.
I would suggest Pearson's product-moment correlation coefficient. Given a model (which could be any template image), you can measure the correlation of the template with any section of the frame.
The result is a coefficient between -1 and 1 that measures how well the samples correlate with the template. It is readily applicable to 2D data.
It has the advantage of being independent of the absolute values of the samples, since the covariance is computed relative to the sample means and normalised by the standard deviations.
Once you detect a high correlation, you can search the following frames in the neighbourhood of the original position and select the position with the best correlation coefficient.
Note that the size and the rotation of the template do matter, but as far as I understand they do not change in your case. You can customize the detection for any shape, since the template image can represent any configuration.
Here is a single-pass implementation of the algorithm that I have used and that works correctly.
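To show the idea concretely, here is a minimal sketch in plain Python. It is not the implementation referred to above, just a straightforward (two-pass) version; frames are assumed to be lists of rows of brightness values, templates are square and flattened row by row, and the names pearson, patch and track are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a flat patch carries no shape information
    return cov / math.sqrt(va * vb)


def patch(frame: List[List[float]], top: int, left: int, size: int) -> List[float]:
    """Flatten the size x size region whose upper-left corner is (top, left)."""
    return [frame[r][c] for r in range(top, top + size)
            for c in range(left, left + size)]


def track(frame: List[List[float]], template: Sequence[float], size: int,
          prev_top: int, prev_left: int, radius: int) -> Tuple[float, int, int]:
    """Search around the previous position for the best-correlating patch."""
    rows, cols = len(frame), len(frame[0])
    best = (-2.0, prev_top, prev_left)  # any real coefficient beats -2
    for top in range(max(0, prev_top - radius),
                     min(rows - size, prev_top + radius) + 1):
        for left in range(max(0, prev_left - radius),
                          min(cols - size, prev_left + radius) + 1):
            score = pearson(template, patch(frame, top, left, size))
            if score > best[0]:
                best = (score, top, left)
    return best
```

Because the coefficient is invariant to a uniform gain and offset, a spot that merely becomes dimmer still correlates perfectly with its template.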
This has got to be a well-researched topic and I suspect there won't be any 100% accurate solution.
Some links which might be of use:
Learning patterns of activity using real-time tracking. A paper by two guys from MIT.
Kalman Filter. Especially the Computer Vision part.
Motion Tracker. A student project, which also has code and sample videos I believe.
Of course, this might be overkill for you, but hope it helps giving you other leads.
Simple is good. I'd start doing something like:
1) over a small rectangle, that surrounds a spot:
2) apply a weighted average of all the pixel coordinates in the area
3) call the averaged X and Y values the object's position
4) while scanning these pixels, do something to approximate the bounding box size
5) repeat on the next frame with a slightly enlarged bounding box so you don't clip a spot that moves
The weight for the average should go to zero for pixels below some threshold. Number 4 can be as simple as tracking the min/max position of anything brighter than the same threshold.
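Steps 1 to 5 can be sketched in plain Python as follows. The frame is assumed to be a list of rows of brightness values, boxes are (top, left, bottom, right) with inclusive bounds, and the names weighted_centroid and grow are made up for this example.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # top, left, bottom, right (inclusive)


def weighted_centroid(frame: List[List[float]], box: Box,
                      threshold: float) -> Optional[Tuple[Tuple[float, float], Box]]:
    """Brightness-weighted position and tight bounding box of the spot in box."""
    top, left, bottom, right = box
    total = sx = sy = 0.0
    xs, ys = [], []
    for y in range(max(0, top), min(len(frame), bottom + 1)):
        for x in range(max(0, left), min(len(frame[0]), right + 1)):
            w = frame[y][x]
            if w < threshold:
                continue  # weight goes to zero below the threshold
            total += w
            sx += w * x
            sy += w * y
            xs.append(x)
            ys.append(y)
    if total == 0:
        return None  # the spot was lost (too dim or outside the box)
    return (sx / total, sy / total), (min(ys), min(xs), max(ys), max(xs))


def grow(box: Box, margin: int) -> Box:
    """Enlarge the box for the next frame so a moving spot is not clipped."""
    top, left, bottom, right = box
    return (top - margin, left - margin, bottom + margin, right + margin)
```

On each frame you would call weighted_centroid with the grown box from the previous frame; since the spots fade over time, the threshold may need to be lowered as the movie progresses.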
This will of course have issues with spots that overlap or cross paths. But for some reason I keep thinking you're tracking stars with some unknown camera motion, in which case this should be fine.
I'm afraid that blob tracking is not simple, not if you want to do it well.
Start with blob detection as genpfault says.
Now you have spots on every frame and you need to link them up. If the blobs are moving independently, you can use some sort of correspondence algorithm to link them up. See for instance http://server.cs.ucf.edu/~vision/papers/01359751.pdf.
Now you may have collisions. You can use mixture of gaussians to try to separate them, give up and let the tracks cross, use any other before-and-after information to resolve the collisions (e.g. if A and B collide and A is brighter before and will be brighter after, you can keep track of A; if A and B move along predictable trajectories, you can use that also).
Or you can collaborate with a lab that does this sort of stuff all the time.

Raytracing (LoS) on 3D hex-like tile maps

Greetings,
I'm working on a game project that uses a 3D variant of hexagonal tile maps. Tiles are actually cubes, not hexes, but are laid out just like hexes (because a square can be turned to a cube to extrapolate from 2D to 3D, but there is no 3D version of a hex). Rather than a verbose description, here goes an example of a 4x4x4 map:
(I have highlighted an arbitrary tile (green) and its adjacent tiles (yellow) to help describe how the whole thing is supposed to work; but the adjacency functions are not the issue, that's already solved.)
I have a struct type to represent tiles, and maps are represented as a 3D array of tiles (wrapped in a Map class to add some utility methods, but that's not very relevant).
Each tile is supposed to represent a perfectly cubic space, and they are all exactly the same size. Also, the offset between adjacent "rows" is exactly half the size of a tile.
That's enough context; my question is:
Given the coordinates of two points A and B, how can I generate a list of the tiles (or, rather, their coordinates) that a straight line between A and B would cross?
That would later be used for a variety of purposes, such as determining Line-of-sight, charge path legality, and so on.
BTW, this may be useful: my maps use the (0,0,0) as a reference position. The 'jagging' of the map can be defined as offsetting each tile ((y+z) mod 2) * tileSize/2.0 to the right from the position it'd have on a "sane" cartesian system. For the non-jagged rows, that yields 0; for rows where (y+z) mod 2 is 1, it yields 0.5 tiles.
I'm working in C# 4 targeting the .NET Framework 4.0, but I don't really need specific code, just the algorithm to solve the weird geometric/mathematical problem. I have been trying for several days to solve this, to no avail; and trying to draw the whole thing on paper to "visualize" it didn't help either :( .
Thanks in advance for any answer
Until one of the clever SOers turns up, here's my dumb solution. I'll explain it in 2D 'cos that makes it easier to explain, but it will generalise to 3D easily enough. I think any attempt to try to work this entirely in cell index space is doomed to failure (though I'll admit it's just what I think and I look forward to being proved wrong).
So you need to define a function to map from cartesian coordinates to cell indices. This is straightforward, if a little tricky. First, decide whether point(0,0) is the bottom left corner of cell(0,0) or the centre, or some other point. Since it makes the explanations easier, I'll go with bottom-left corner. Observe that any point(x,floor(y)==0) maps to cell(floor(x),0). Indeed, any point(x,even(floor(y))) maps to cell(floor(x),floor(y)).
Here, I invent the boolean function even, which returns True if its argument is an even integer. I'll use odd next: any point(x,odd(floor(y))) maps to cell(floor(x-0.5),floor(y)).
Now you have the basics of the recipe for determining lines-of-sight.
You will also need a function to map from cell(m,n) back to a point in cartesian space. That should be straightforward once you have decided where the origin lies.
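As a rough 2D illustration of the mapping, here is a sketch in plain Python. It assumes unit-sized tiles with point(0,0) at the bottom-left corner of cell(0,0). Walking the line in small steps (cells_on_segment) is my own simplification: it is approximate and can miss a cell that the line only clips at a corner, so treat the step size as a tuning parameter.

```python
import math
from typing import List, Tuple


def point_to_cell(x: float, y: float) -> Tuple[int, int]:
    """Map a cartesian point to a cell index; odd rows are shifted right by 0.5."""
    row = math.floor(y)
    shift = 0.5 if row % 2 else 0.0
    return (math.floor(x - shift), row)


def cells_on_segment(a: Tuple[float, float], b: Tuple[float, float],
                     step: float = 0.01) -> List[Tuple[int, int]]:
    """Cells visited when walking from a to b in small steps (approximate)."""
    length = math.hypot(b[0] - a[0], b[1] - a[1])
    n = max(1, math.ceil(length / step))
    cells: List[Tuple[int, int]] = []
    for i in range(n + 1):
        t = i / n
        cell = point_to_cell(a[0] + t * (b[0] - a[0]),
                             a[1] + t * (b[1] - a[1]))
        if not cells or cells[-1] != cell:
            cells.append(cell)  # record each cell once, in order of traversal
    return cells
```

The same pattern extends to 3D by using (y+z) mod 2 instead of the row parity to decide the shift.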
Now, unless I've misplaced some brackets, I think you are on your way. You'll need to:
decide where in cell(0,0) you position point(0,0); and adjust the function accordingly;
decide where points along the cell boundaries fall; and
generalise this into 3 dimensions.
Depending on the size of the playing field you could store the cartesian coordinates of the cell boundaries in a lookup table (or other data structure), which would probably speed things up.
Perhaps you can avoid all the complex math if you look at your problem in another way:
I see that you only shift your blocks (alternating) along the first axis by half the block size. If you split up your blocks along this axis, the above example becomes (with shifts) a simple (9x4x4) cartesian coordinate system with regularly stacked blocks. Now the raytracing becomes much simpler and less error-prone.
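The bookkeeping between tiles and half-cells could look like the sketch below (plain Python; the function names are hypothetical, and the shift of (y+z) mod 2 half-cells along the first axis is taken from the question):

```python
from typing import Tuple


def half_cells_of_tile(i: int, y: int, z: int) -> Tuple[int, int]:
    """Indices of the two half-width cells covered by tile (i, y, z)."""
    start = 2 * i + (y + z) % 2  # shifted rows start one half-cell later
    return (start, start + 1)


def tile_of_half_cell(h: int, y: int, z: int) -> Tuple[int, int, int]:
    """Tile that owns half-cell h in row (y, z); may lie outside the map."""
    return ((h - (y + z) % 2) // 2, y, z)
```

For a map that is 4 tiles wide, the half-cell indices run from 0 to 8, which gives the 9 columns mentioned above; a result such as tile -1 or tile 4 simply means that half-cell is empty space at the jagged edge of the map.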

Resources