Open/closed hand recognition: a simple method - algorithm

I'm searching for a simple method to recognize whether a hand is open or closed.
I'm using C# and EmguCV, but that is not significant in this context. I only need "pseudo-code" that describes what I need to do.
The input to this algorithm is a binary image (I've already implemented the segmentation process) that represents the hand. The output must be a boolean (true for an open hand, false otherwise).
This is an input example:
I've thought about using the convex hull, or the percentage of white area, but I suspect these methods are not robust enough for this kind of problem.

In machine learning terms what you are trying to do is classification on a binary input matrix of the size of your input image (1 for white pixel, 0 for black pixel), to a single binary output (1 for open hand, 0 for closed hand).
If you build up a training set by taking lots of images of closed and open hands and hand-label them (pun not intended), then you can apply a supervised learning algorithm to create the classifier.
There are many choices for supervised learning algorithms. Perhaps the best one to try for a first shot would be a support vector machine:
http://en.wikipedia.org/wiki/Support_vector_machine
Support vector machines work by essentially calculating the "distance" between the input image and the examples provided in the training set. If the input image is "closer" on average to the examples of open hands from the training set than to the examples of closed hands, it will classify it as an open hand (and vice versa).
There are many other supervised learning algorithms:
http://en.wikipedia.org/wiki/Supervised_learning
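As a rough illustration (mine, not part of the original answer), here is a minimal sketch using scikit-learn's SVC on flattened binary masks; open_imgs and closed_imgs are hypothetical lists of equally sized 0/1 arrays you would collect and label yourself.

import numpy as np
from sklearn.svm import SVC

# Hypothetical training data: open_imgs and closed_imgs are lists of
# equally sized binary hand masks (numpy arrays of 0s and 1s).
X = np.array([img.ravel() for img in open_imgs + closed_imgs], dtype=np.float32)
y = np.array([1] * len(open_imgs) + [0] * len(closed_imgs))  # 1 = open, 0 = closed
clf = SVC(kernel='rbf').fit(X, y)

def is_open(binary_img):
    # Flatten the query mask the same way and ask the classifier.
    return bool(clf.predict(binary_img.ravel().reshape(1, -1))[0])

Note that raw pixels are sensitive to translation and scale, so in practice you would probably feed the SVM features (hull solidity, Hu moments, etc.) rather than the pixels themselves.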

The convex hull should work well: you can calculate the percentage of black area lying inside the convex hull, and if it's greater than some threshold, the hand is open.
Alternatively, you can calculate the area and perimeter of the white region and check their ratio: for open hands, area / perimeter should be smaller than for closed hands.
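A minimal sketch of the hull idea, phrased as a solidity test (contour area over hull area, i.e. one minus the black-area percentage described above), assuming OpenCV 4's Python bindings; the 0.8 threshold is a guess to tune on your data.

import cv2

def is_open_hand(mask, solidity_threshold=0.8):
    # mask: 8-bit binary image of the segmented hand.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    hand = max(contours, key=cv2.contourArea)            # largest blob is the hand
    hull = cv2.convexHull(hand)
    solidity = cv2.contourArea(hand) / cv2.contourArea(hull)
    return solidity < solidity_threshold                 # spread fingers leave gaps in the hull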

Related

Algorithm for specific 1-bit bitmap coordinate conversion

I know Stack Overflow is a populated place and so I will try to make myself brief. I am a hobbyist programmer working within a small pocket niche of the coding world; my experience outside is horribly limited.
My dilemma is this: I have a 1-bit 2D matrix consisting of arbitrary data intended to represent on/off pixels on the screen. Consider an example for the imagination:
11110111101111111100
10111111111111111111
11011111111111011111
11111010111111111111
11111111011111111111
11111111111100011111
11111011011110001101
This is truly a poor example for my exact purposes, but it serves the question O.K. As a programmer, I look at this image matrix of data and consider ways to losslessly compress this, perhaps into a smaller, 1D matrix. I realize that large portions of the elements defining "on" pixels (this distinction is rather arbitrary, a display of 'on' pixels turned 'off' to create an image is the same as a display of 'off' pixels turned 'on' to create the same image) may be recognized by one's mind's eye as being solid rectangles of "on" pixels. Consider:
11110111101111111100
10111111111111111111
11011111111111011111
xxxxx010111111111111
xxxxx111011111111111
xxxxx111111100011111
xxxxx011011110001101
Storing the coordinates of a rectangle defining a rectangular section of "on" pixels (and then storing those coordinates sequentially in a 1D array) seems like the best approach to this issue; later, to redraw the same image, I can just scan through the resulting 1D array and plot all of the rectangles.
My question is: does there exist a method or algorithm for performing this specific kind of conversion as efficiently as possible? For comparison, consider Bresenham's line algorithm, which is arguably the most efficient at what it does, among the many other algorithms out there.
In my ignorance beyond my niche, I would like to assume that there exists a plethora of small algorithmic challenges like these, solved decades ago by mad computer scientists who nowadays surround themselves with magnetic tape and vacuum tubes. This seems to be the case for a lot of algorithmic hurdles like these which I often run into; that is, they were solved way before my time.
The amount of compression is never guaranteed. The level of compression depends a lot on the image itself, so even if you cleverly "discover" a new algorithm, it will succeed on some images while doing a poor job on others.
This is due to the inherent variability in the nature of each image.
There's an old but still valid site with valuable info: CFAQ
About your "rectangle" algorithm, there's something close to it: the PCIF algorithm
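On the "rectangle" idea itself, here is a naive greedy sketch (mine, and certainly not optimal - minimal rectangle cover is a well-studied problem with better algorithms): from each unclaimed 1, grow the widest, then tallest, all-ones rectangle and clear it.

def rects_from_bitmap(grid):
    # grid: list of rows of 0/1 values; returns (top, left, height, width)
    # tuples whose union covers every 1 exactly once.
    g = [row[:] for row in grid]                  # work on a copy
    rects = []
    for r in range(len(g)):
        for c in range(len(g[0])):
            if not g[r][c]:
                continue
            w = 1                                 # grow rightwards along row r
            while c + w < len(g[0]) and g[r][c + w]:
                w += 1
            h = 1                                 # grow downwards while rows stay solid
            while r + h < len(g) and all(g[r + h][c:c + w]):
                h += 1
            for rr in range(r, r + h):            # claim the rectangle's cells
                for cc in range(c, c + w):
                    g[rr][cc] = 0
            rects.append((r, c, h, w))
    return rects

To redraw the image, just fill each (top, left, height, width) rectangle in order.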

Find tunnel 'center line'?

I have some map files consisting of 'polylines' (each line is just a list of vertices) representing tunnels, and I want to try and find the tunnel 'center line' (shown, roughly, in red below).
I've had some success in the past using Delaunay triangulation but I'd like to avoid that method as it does not (in general) allow for easy/frequent modification of my map data.
Any ideas on how I might be able to do this?
An "algorithm" that works well with localized data changes.
The critic's view
The Good
The nice part is that it uses a mixture of image processing and graph operations available in most libraries, may be parallelized easily, is reasonably fast, may be tuned to use a relatively small memory footprint, and doesn't have to be recalculated outside the modified area if you store the intermediate results.
The Bad
I wrote "algorithm", in quotes, just because I developed it, and it is surely not robust enough to cope with pathological cases. If your graph has a lot of cycles you may end up with some phantom lines. More on this, with examples, later.
And The Ugly
The ugly part is that you need to be able to flood fill the map, which is not always possible. I posted a comment a few days ago asking if your graphs can be flood filled, but didn't receive an answer. So I decided to post it anyway.
The Sketch
The idea is:
Use image processing to get a fine line of pixels representing the center path
Partition the image in chunks commensurate with the tunnel's thinnest passages
In each partition, place a point at the "center of mass" of the contained pixels
Use those points as the Vertices of a Graph
Add Edges to the Graph based on a "near neighbour" policy
Remove spurious small cycles in the induced Graph
End - the remaining Edges represent your desired path
The parallelization opportunity arises from the fact that the partitions may be computed in standalone processes, and the resulting graph may be partitioned to find the small cycles that need to be removed. These factors also allow you to reduce the memory needed by processing serially instead of doing the calculations in parallel, but I didn't go through this.
The Plot
I'll not provide pseudocode, as the difficult parts are precisely those not covered by your libraries. Instead of pseudocode I'll post the images resulting from the successive steps.
I wrote the program in Mathematica, and I can post it if it is of some service to you.
A- Start with a nice flood filled tunnel image
B- Apply a Distance Transformation
The Distance Transformation gives the distance transform of the image, where the value of each pixel is replaced by its distance to the nearest background pixel.
You can see that our desired path is the ridge of local maxima within the tunnel.
C- Convolve the image with an appropriate kernel
The selected kernel is a Laplacian-of-Gaussian kernel of pixel radius 2. It has the magic property of enhancing the gray level edges, as you can see below.
D- Cutoff gray levels and Binarize the image
To get a nice view of the center line!
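If you prefer Python to Mathematica, here is a hedged sketch of steps A-D with SciPy/scikit-image; the file name, the sigma, and the 0.3 cutoff are assumptions to tune for your maps.

from scipy import ndimage as ndi
from skimage import io

filled = io.imread('tunnel.png', as_gray=True) > 0.5  # A: flood-filled tunnel mask
dist = ndi.distance_transform_edt(filled)             # B: distance to nearest background pixel
log = ndi.gaussian_laplace(dist, sigma=2)             # C: Laplacian-of-Gaussian, radius ~2
centerline = log < 0.3 * log.min()                    # D: keep the strongly negative ridge response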
Comment
Perhaps that is enough for you, as you may know how to transform a thin line into an approximate sequence of piecewise segments. As that is not the case for me, I continued down this path to get the desired segments.
E- Image Partition
Here is where some advantages of the algorithm show up: you may start using parallel processing, or decide to process one segment at a time. You may also compare the resulting segments with those of a previous run and re-use the previous results.
F- Center of Mass detection
All the white points in each sub-image are replaced by only one point at the center of mass
X_CM = (Σ_{i ∈ Points} X_i) / NumPoints
Y_CM = (Σ_{i ∈ Points} Y_i) / NumPoints
The white pixels are difficult to see, but there they are.
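With SciPy, ndi.center_of_mass computes exactly these formulas; tile below stands for one binary sub-image from step E.

from scipy import ndimage as ndi

# Average the coordinates of the nonzero pixels of one sub-image.
y_cm, x_cm = ndi.center_of_mass(tile)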
G- Graph setup from Vertices
Form a Graph using the selected points as Vertices. Still no Edges.
H- Select Candidate Edges
Using the Euclidean distance between points, select candidate edges. A cutoff is used to select an appropriate set of Edges; here we are using 1.5 times the sub-image size.
As you can see, the resulting Graph has a few small cycles that we are going to remove in the next step.
I- Remove Small Cycles
Using a cycle detection routine we remove the small cycles up to a certain length. The cutoff length depends on a few parameters, and you should determine it empirically for your family of graphs.
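A sketch of the graph steps with NetworkX (my own approximation of the approach; the cutoff and the cycle-length limit follow the text but still need tuning for your data).

import itertools
import networkx as nx

def centerline_graph(points, cutoff, max_cycle_len=4):
    # points: list of (x, y) centers of mass; cutoff ~ 1.5 * sub-image size.
    g = nx.Graph()
    g.add_nodes_from(range(len(points)))
    for i, j in itertools.combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        if ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5 <= cutoff:  # near-neighbour edges
            g.add_edge(i, j)
    for cycle in nx.cycle_basis(g):                             # break short spurious cycles
        if len(cycle) <= max_cycle_len and g.has_edge(cycle[0], cycle[-1]):
            g.remove_edge(cycle[0], cycle[-1])                  # drop one edge of each cycle
    return g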
J- That's it!
You can see that the resulting center line is shifted a little bit upwards. The reason is that I'm superimposing images of different types in Mathematica ... and I gave up trying to convince the program to do what I want :)
A Few Shots
As I did the testing, I collected a few images. They are probably the most un-tunnelish things in the world, but my Tunnels-101 went astray.
Anyway, here they are. Remember that I have a displacement of a few pixels upwards ...
HTH !
Update
Just in case you have access to Mathematica 8 (I got it today) there is a new function Thinning. Just look:
This is a pretty classic skeletonization problem; there are lots of algorithms available. Some algorithms work in principle on outline contours, but since almost everyone uses them on images, I'm not sure how available such things will be. Anyway, if you can just plot and fill the sewer outlines and then use a skeletonization algorithm, you could get something close to the midline (within pixel resolution).
Then you could walk along those lines and do a binary search with circles until you hit at least two separate line segments (three if you're at a branch point). The midpoint of the two spots you first hit, or the center of a circle touching the three points you first hit, is a good estimate of the center.
Well, in Python, using the skimage package, it is an easy task, as follows.
import pylab as pl
from skimage import morphology as mp

tun = 1 - pl.imread('tunnel.png')[..., 0]  # your tunnel image, inverted so the tunnel is white
skl = mp.medial_axis(tun)                  # medial-axis skeleton of the tunnel

pl.subplot(121)
pl.imshow(tun, cmap=pl.cm.gray)
pl.subplot(122)
pl.imshow(skl, cmap=pl.cm.gray)
pl.show()

matching jigsaw puzzle pieces

I have nothing useful to do and was playing with a jigsaw puzzle like this:
(image: http://manual.gimp.org/nl/images/filters/examples/render-taj-jigsaw.jpg)
and I was wondering if it'd be possible to make a program that assists me in putting it together.
Imagine that I have a small puzzle, like 4x3 pieces, but the little tabs and blanks are non-uniform - different pieces have these tabs at different heights, of different shapes, of different sizes. What I'd do is take pictures of all of these pieces, let a program analyze them, and store their attributes somewhere. Then, when I pick up a piece, I could ask the program to tell me which pieces should be its 'neighbours' - or, if I have to fill in a blank, it'd tell me what the wanted puzzle piece(s) look like.
Unfortunately I've never done anything with image processing and pattern recognition, so I'd like to ask you for some pointers - how do I recognize a jigsaw piece (basically a square with tabs and holes) in a picture?
Then I'd probably need to rotate it so it's in the right position, scale it to some proportion, and then measure the tab/blank on each side, and also each side's slope, if present.
I know that it would be too time-consuming to scan/photograph 1000 puzzle pieces and use them; this would just be a pet project where I'd learn something new.
Data acquisition
(This is known as Chroma Key, Blue Screen or Background Color method)
Find a well-lit room, with the least lighting variation across the room.
Find a color (hue) that is rarely used in the entire puzzle / picture.
Get a color paper that has exactly that same color.
Place as many puzzle pieces on the color paper as will fit.
You can categorize the pieces into batches and use this as a computer hint later on.
Make sure the pieces do not overlap or touch each other.
Do not worry about orientation yet.
Take picture and download to computer.
Color calibration may be needed because the Chroma Key background may have upset the built-in color balance of the digital camera.
Acquisition data processing
Get some computer vision software
OpenCV, MATLAB, C++, Java, Python Imaging Library, etc.
Perform connected-component analysis on the chroma-key color in the image.
Ask for the contours of the holes of the connected component, which are the puzzle pieces.
Fix errors in the detected list.
Choose the indexing vocabulary (cf. Ira Baxter's post) and measure the pieces.
If the pieces are rectangular, find the corners first.
If the pieces are slightly-off quadrilaterals, the side lengths (measured corner to corner) are also a valuable signature.
Search for "Shape Context" on SO or Google or here.
Finally, get the color histogram of the piece, so that you can query pieces by color later.
To make them searchable, put them in a database, so that you can query pieces with any combinations of indexing vocabulary.
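A rough sketch of the connected-component part of these steps, assuming OpenCV and a blue background; the file name and the HSV range are placeholders to calibrate against your setup.

import cv2

img = cv2.imread('pieces_on_blue.jpg')                         # hypothetical photo of pieces
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
background = cv2.inRange(hsv, (100, 80, 80), (130, 255, 255))  # assumed blue hue range
pieces = cv2.bitwise_not(background)                           # pieces are holes in the background
count, labels = cv2.connectedComponents(pieces)                # one label per candidate piece
contours, _ = cv2.findContours(pieces, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)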
A step back to the problem itself. The problem of building a puzzle can be easy (P) or hard (NP), depending on whether the pieces fit only one neighbour or many. If there is only one fit for each edge, then you just find, for each piece/side, its neighbour, and you're done (O(#pieces * #sides)). If some pieces allow multiple fits into different neighbours, then, in order to complete the whole puzzle, you may need backtracking (because you made a wrong choice and got stuck).
However, the first problem to solve is how to represent pieces. If you want to represent arbitrary shapes, then you can probably use transparency or masks to represent which areas of a tile are actually part of the piece. If you use square shapes then the problem may be easier. In the latter case, you can consider the last row of pixels on each side of the square and match it with the most similar row of pixels that you find across all other pieces.
You can use the second approach to actually help you solve a real puzzle, despite the fact that you use square tiles. Real puzzles are normally built upon an NxM grid of pieces. When scanning the image from the box, you split it into the same NxM grid of square tiles, and get the system to solve that. The problem is then to visually map the actual squiggly piece that you hold in your hand to a tile inside the system (hard when the pieces are small and uniformly coloured). But you get the same problem if you represent arbitrary shapes internally.
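A tiny sketch of the last-row-of-pixels matching with NumPy; tiles are assumed to be equally sized H x W x 3 arrays cut from the box image.

import numpy as np

def best_right_neighbour(tile, candidates):
    # Compare this tile's rightmost pixel column with each candidate's leftmost one.
    right = tile[:, -1, :].astype(float)
    costs = [np.linalg.norm(right - c[:, 0, :].astype(float)) for c in candidates]
    return int(np.argmin(costs))  # index of the most similar candidate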

What is currently considered the "best" algorithm for 2D point-matching?

I have two lists containing x-y coordinates (of stars). I could also have magnitudes (brightnesses) attached to each star. Now each star has random position jiggles and there can be a few extra or missing points in each image. My question is, "What is the best 2D point matching algorithm for such a dataset?" I guess both for a simple linear transformation (translation, rotation, scale) and a non-linear one (say, n-degree polynomials in the coordinates). In the lingo of the point matching field, I'm looking for the algorithms that would win in a shootout between 2D point matching programs with noise and spurious points. There may be different "winners" depending on whether the labeling info (the magnitudes) is used and/or the transformation is restricted to being linear.
I am aware that there are many classes of 2D point matching algorithms, and many algorithms in each class (literally probably hundreds in total), but I don't know which, if any, is considered the "best" or the "most standard" by people in the field of computer vision. Sadly, many of the papers I want to read don't have online versions and I can only read the abstract. Before I settle on a particular algorithm to implement, it would be good to hear from a few experts to separate the wheat from the chaff.
I have a working matching program that uses triangles, but it fails somewhat frequently (~5% of the time), such that the solution transformation has obvious distortions but for no obvious reason. This program was not written by me and is from a paper written almost 20 years ago. I want to write a new implementation that performs more robustly. I am assuming (hoping) that there have been some advances in this area that make this plausible.
If you're interested in star matching, check out the Astrometry.net blind astrometry solver and the paper on it here. They use four-point quads to solve star configurations in Flickr pictures of the night sky. Check out this interview.
There is no single "best" algorithm for this. There are lots of different techniques, and each works better than the others on specific datasets and types of data.
One thing I'd recommend is to read this introduction to image registration from the tutorials of the Insight Toolkit. ITK supports MANY types of image registration (which is what it sounds like you are attempting), and is very robust in many cases. Most of their users are in the medical field, so you'll have to wade through a lot of medical jargon, but the algorithms and code work with any type of image (including 1-, 2-, 3-, and n-dimensional images of different types, etc.).
You can consider applying your algorithm first to only the N brightest stars, then progressively including the others to refine the result, reducing the search range at the same time.
Using RANSAC for robustness to extra points is also very common.
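For instance, a hedged sketch with scikit-image's ransac: src and dst are hypothetical (N, 2) arrays of tentatively matched star coordinates (e.g. from a triangle matcher), and the threshold values are placeholders.

from skimage.measure import ransac
from skimage.transform import SimilarityTransform

# Fit translation/rotation/scale while rejecting spurious matches.
model, inliers = ransac((src, dst), SimilarityTransform,
                        min_samples=3, residual_threshold=2.0, max_trials=1000)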
I'm not sure it would work, but worth a try:
For each star, do the circle-times-ray Fourier transform - centered around it - of all the other stars (note: this is not the standard Fourier transform, which is line times line).
The phase space of circle times ray is integers times line, but since we only have finite accuracy, you just get a matrix; the dimensions of the matrix depend on the accuracy. Now try to pair the matrices to one another (e.g., using the L2 norm).
I saw a program on TV a while ago about how researchers were taking pictures of whales and using the spots on them (which are unique to each whale) to ID each whale. It used the angles between the spots; by using angles, it didn't matter if the image was rotated, scaled, or translated. That sounds similar to what you're doing with your triangles.
I think the "best" (most technical) way would be to take the Fourier transform of the original image and of the new, linearly modified image. By doing some simple filtering, it should be easy to figure out the orientation and scale of your image with respect to the old one. There is a description of the 2D Fourier transform here.

How do I efficiently segment 2D images into regions/blobs of similar values?

How do I segment a 2D image into blobs of similar values efficiently? The given input is an array of integers, which encodes the hue for non-gray pixels and the brightness for gray pixels.
I am writing a virtual mobile robot using Java, and I am using segmentation to analyze the map and also the image from the camera. This is a well-known problem in Computer Vision, but when it runs on a robot, performance does matter, so I wanted some input. The algorithm is what matters, so you can post code in any language.
Wikipedia article: Segmentation (image processing)
[PPT] Stanford CS-223-B Lecture 11 Segmentation and Grouping (which says Mean Shift is perhaps the best technique to date)
Mean Shift Pictures (paper is also available from Dorin Comaniciu)
I would downsample, in colour space and in number of pixels, use a vision method (probably mean shift), and upscale the result.
This is good because downsampling also increases the robustness to noise, and makes it more likely that you get meaningful segments.
You could use flood fill to smooth edges afterwards if you need smoothness.
Some more thoughts (in response to your comment).
1) Did you blend as you downsampled, e.g. y[i] = (x[2i] + x[2i+1]) / 2? This should eliminate noise.
2) How fast do you want it to be?
3) Have you tried dynamic mean shift? (Also google for "dynamic X" for any algorithm X.)
Not sure if it is efficient enough, but you could try using a Kohonen neural network (or self-organizing map, SOM) to group the similar values, where each pixel contains the original color and position and only the color is used for the Kohonen grouping.
You should read up before you implement this though, as my knowledge of the Kohonen network goes as far as that it is used for grouping data - so I don't know what the performance/viability options are for your scenario.
There are also Hopfield Networks. They can be mangled into grouping from what I read.
What I have now:
1. Make a buffer of the same size as the input image, initialized to UNSEGMENTED.
2. For each pixel in the image where the corresponding buffer value is still UNSEGMENTED, flood the buffer using the pixel's value.
a. The border checking of the flooding is done by checking if the pixel is within EPSILON (currently set to 10) of the originating pixel's value.
b. Flood filling algorithm.
Possible issue:
Step 2.a's border checking is called many times in the flood filling algorithm. I could turn it into a lookup if I could precalculate the borders using edge detection, but that might add more time than the current check.
private boolean isValuesCloseEnough(int a_lhs, int a_rhs) {
    return Math.abs(a_lhs - a_rhs) <= EPSILON;
}
Possible Enhancement:
Instead of checking every single pixel for UNSEGMENTED, I could randomly pick a few points. If you are expecting around 10 blobs, picking random points on that order may suffice. The drawback is that you might miss a useful but small blob.
Check out Eyepatch (eyepatch.stanford.edu). It should help you during the investigation phase by providing a variety of possible filters for segmentation.
An alternative to flood fill is the connected-components algorithm. So:
Cheaply classify your pixels, e.g. divide pixels in colour space.
Run the connected-components algorithm to find the blobs.
Retain the blobs of significant size.
This approach is widely used in early vision systems. For example, see the seminal paper "Blobworld: A System for Region-Based Image Indexing and Retrieval".
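A short sketch of that recipe with scikit-image; values stands for the question's 2D integer array, and the quantization step and area cutoff are placeholders to tune.

from skimage import measure

classes = values // 10                                # cheap classification by quantization
labels = measure.label(classes, background=-1)        # neighbours with equal class share a label
blobs = [r for r in measure.regionprops(labels) if r.area >= 50]  # keep the significant blobs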
