What gesture recognition libraries (if any) exist for the Kinect? Right now I'm using OpenNI to record skeleton movements but am not sure how to go from that to triggering discrete actions.
My problem might be as simple as pose detection, but it could also be as complicated as time-based movements (i.e. detecting when the user moves a hand in a circle), depending on how difficult that is. The examples I've seen for pose detection have been very ad hoc; is this because a generic algorithm is difficult to get right?
The NITE library (on top of OpenNI) has classes for detecting swipes and other gestures, but personally I've had trouble using the base OpenNI and NITE libraries together in C# (I keep running into AccessViolationExceptions). If you're writing managed code, XnVNITE.net.dll is the assembly that has the swipe detection. It's found under the PrimeSense/NITE folder after you install NITE.
If you can do without skeleton and user recognition, there is also the ManagedNite.dll library, a redundant library shipped with the PrimeSense NITE install. ManagedNite.dll also has hand/gesture recognition but no skeleton/user detection.
Otherwise, you can certainly detect your own time-based swipe gesture, as you suggested. You should be able to detect if a series of hand points travels in a straight line with a function like this:
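static bool DetectSwipe(Point3D[] points)
{
    const int LineSize = 10;       // number of points in the array to look at
    const float MinXDelta = 300;   // required horizontal distance
    const float MaxYDelta = 100;   // max amount of vertical variation

    if (points == null || points.Length < LineSize)
        return false;

    int last = LineSize - 1;       // index of the last point examined
    float x1 = points[0].X;
    float y1 = points[0].Y;
    float x2 = points[last].X;
    float y2 = points[last].Y;

    // The hand must travel far enough horizontally...
    if (Math.Abs(x1 - x2) < MinXDelta)
        return false;
    // ...without drifting too far vertically between start and end.
    if (Math.Abs(y1 - y2) > MaxYDelta)
        return false;

    // Line through (x1, y1) and (x2, y2) written as a*x + b*y + c = 0.
    float a = y1 - y2;
    float b = x2 - x1;
    float c = x1 * y2 - x2 * y1;
    float norm = (float)Math.Sqrt(a * a + b * b);

    for (int i = 1; i < last; i++)
    {
        if (Math.Abs(points[i].Y - y1) > MaxYDelta)
            return false;
        // Perpendicular distance of the intermediate point from the line.
        float dist = Math.Abs(a * points[i].X + b * points[i].Y + c) / norm;
        if (dist > MaxYDelta)
            return false;
    }
    return true;
}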
You could enhance this code to distinguish right swipes from left swipes. I also did not include timing in the example above: you would need to look at the timestamps of the first and last points and check that the swipe was completed within a certain amount of time.
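Both enhancements can be sketched as follows, assuming each hand point carries a timestamp in seconds. HandSample, Swipe and classifySwipe are hypothetical names, and the thresholds are illustrative values, not anything defined by OpenNI or NITE. Whether positive x is the user's right depends on whether your depth image is mirrored.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sample type: a tracked hand position plus its timestamp in seconds.
struct HandSample {
    float x;
    float y;
    double t;
};

enum class Swipe { None, Left, Right };

// Classifies a window of hand samples as a left/right swipe or nothing.
// The default thresholds are illustrative, not values from any SDK.
Swipe classifySwipe(const std::vector<HandSample>& s,
                    float minXDelta = 300.0f,
                    float maxYDelta = 100.0f,
                    double maxDuration = 0.5) {
    if (s.size() < 2) return Swipe::None;
    const HandSample& first = s.front();
    const HandSample& last = s.back();

    // Reject gestures that took too long to complete.
    if (last.t - first.t > maxDuration) return Swipe::None;

    // Require enough horizontal travel and little vertical drift.
    float dx = last.x - first.x;
    if (std::fabs(dx) < minXDelta) return Swipe::None;
    for (const HandSample& p : s)
        if (std::fabs(p.y - first.y) > maxYDelta) return Swipe::None;

    // The sign of the horizontal displacement gives the direction.
    return dx > 0 ? Swipe::Right : Swipe::Left;
}
```

Keeping the window check separate from the direction check means the same samples can later feed other detectors (e.g. vertical swipes) without changing the timing logic.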
Check this out: http://kinectrecognizer.codeplex.com/
It supports 3D tracking and recognition fine-tuning, and should be easy to reuse as well.
SoftKinetic looks promising, but its SDK is not freely available just yet.
I am working on standalone skeleton detection code for the Kinect: http://code42tiger.blogspot.com
I am planning to release it for free, although it still has a long way to go before it is perfect. If your requirement is only hand position tracking, you can write it yourself without using OpenNI or any other library. If you need a simple tip, read below.
1) Background removal (explained in my blog)
2) Blob detection (to choose which person to track, also explained in my blog)
3) Hand tracking (once you have the user alone in the data, you can easily find the hand by taking the point farthest from the body)
4) Track the hand position to detect gestures (a calculation that tracks the hand every few frames will give you the geometry of the movement)
This should work, if not perfectly, about 75% of the time. Unless the user deliberately tries to break the algorithm, it should work for normal users.
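Steps 1 and 3 can be sketched on a raw depth buffer as below. This is only a sketch under loud assumptions: "background removal" is approximated by keeping pixels inside a fixed depth band (a much cruder stand-in for the method on the blog), there is a single user, and the hand is the in-band pixel farthest from the centroid in image space. Pixel and findHand are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Pixel { int x; int y; };

// depth: row-major width*height buffer in millimetres (0 = no reading).
// Pixels with depth in [nearMm, farMm] are treated as the user's body.
// Returns false if no pixel falls inside the band.
bool findHand(const std::vector<std::uint16_t>& depth, int width, int height,
              std::uint16_t nearMm, std::uint16_t farMm, Pixel& hand) {
    long long sumX = 0, sumY = 0, count = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            std::uint16_t d = depth[y * width + x];
            if (d >= nearMm && d <= farMm) { sumX += x; sumY += y; ++count; }
        }
    if (count == 0) return false;

    // Centroid of the segmented body.
    double cx = double(sumX) / count, cy = double(sumY) / count;

    // The hand is taken to be the body pixel farthest from the centroid.
    double best = -1.0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            std::uint16_t d = depth[y * width + x];
            if (d < nearMm || d > farMm) continue;
            double dx = x - cx, dy = y - cy;
            double dist2 = dx * dx + dy * dy;
            if (dist2 > best) { best = dist2; hand = {x, y}; }
        }
    return true;
}
```

Running this every frame and keeping the returned positions (with timestamps) gives the hand track that step 4 analyses.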
I am experimenting a little with shaders and computing ray-box collisions, which I do the following way:
inline bool hitsCube(in Ray ray, in Cube cube,
                     out float tMin, out float tMax,
                     out float3 signMin, out float3 signMax)
{
    float3 biggerThan0 = ray.odir > 0; // ray.odir = (1.0/ray.dir)
    float3 lessThan0 = 1.0f - biggerThan0;
    float3 tMinXYZ = cube.center + biggerThan0 * cube.minSize + lessThan0 * cube.maxSize;
    float3 tMaxXYZ = cube.center + biggerThan0 * cube.maxSize + lessThan0 * cube.minSize;
    float3 rMinXYZ = (tMinXYZ - ray.origin) * ray.odir;
    float3 rMaxXYZ = (tMaxXYZ - ray.origin) * ray.odir;
    float minV = max(rMinXYZ.x, max(rMinXYZ.y, rMinXYZ.z));
    float maxV = min(rMaxXYZ.x, min(rMaxXYZ.y, rMaxXYZ.z));
    tMin = minV;
    tMax = maxV;
    signMin = (rMinXYZ == minV) * lessThan0; // important calculation for another algorithm, but no context provided here
    signMax = (rMaxXYZ == maxV) * lessThan0;
    return maxV > minV * (minV + maxV >= 0); // last multiplication makes sure the origin of the ray is outside the cube
}
Given that this function could be called many, many times inside an HLSL shader (for some pixels, say at least 200 to 300 times), is my implementation of the collision logic inefficient?
Not really an easily answerable question, and hard to say without knowing everything else that's going on around it, but here are a few random thoughts:
a) if you're really interested in knowing what this code would look like on the GPU, I'd suggest "porting" it to a CUDA kernel, then using CUDA to generate PTX and SASS for a modern GPU (say, sm_75 for Turing or sm_86 for Ampere), and comparing two or three variants in the SASS output.
b) "converting logic to multiplications" might give you less than you think: if the logic isn't too complicated, there's a good chance you'll end up with a few predicates and not much warp divergence at all, so it might not be too bad. The only way to tell is to look at the PTX and/or SASS output (see a).
c) your formulation of tMinXYZ/tMaxXYZ is (IMHO) unnecessarily complicated: just express it with min/max operations, which are really cheap on GPUs. See also the chapter on ray/box intersection in the book Ray Tracing Gems II (which is free to download). That formulation is also more numerically stable.
d) re "lags... is my logic inefficient": the actual assembly "efficiency" will rarely have such gigantic effects; the usual culprit for noticeable "lags" is either memory stalls (hard to guess what's going on there) or something going horribly wrong for other reasons (see the next bullet).
e) just a hunch: I would check rays where some of the direction components are 0. In that case you're dividing by 0 (never a good idea), and in particular if the result gets multiplied by 0.f (which can happen in your code), you'll get NaNs; since "comparison with NaN is always false", you may end up with cases where your traversal logic always goes down instead of skipping. That's not the same as the "efficiency" of your logic, but it's something to look out for. A good fix is to change each ray.dir component that is 0.f to 1e-6f or so.
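Points c) and e) can be combined into one small routine. The sketch below is written in C++ for easy testing, but every operation maps to a componentwise min/max on float3 in HLSL. Vec3, nonZero, safeInverseDir and rayBox are hypothetical names, and the 1e-6f epsilon is the value suggested above. Unlike the question's version, this returns true when the ray origin is inside the box (tNear < 0 < tFar); to reproduce the original behaviour, additionally require tNear > 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Replaces exactly-zero direction components (including -0.0f) with a tiny
// epsilon so 1/dir is always finite and 0 * (1/dir) can never yield a NaN.
inline float nonZero(float v, float eps = 1e-6f) { return v == 0.0f ? eps : v; }

Vec3 safeInverseDir(const Vec3& dir) {
    return {1.0f / nonZero(dir.x), 1.0f / nonZero(dir.y), 1.0f / nonZero(dir.z)};
}

// Slab-method ray/AABB test expressed with min/max instead of sign masks.
// On return, [tNear, tFar] is the parametric interval the ray spends in the box.
bool rayBox(const Vec3& origin, const Vec3& invDir,
            const Vec3& boxMin, const Vec3& boxMax,
            float& tNear, float& tFar) {
    float tx0 = (boxMin.x - origin.x) * invDir.x, tx1 = (boxMax.x - origin.x) * invDir.x;
    float ty0 = (boxMin.y - origin.y) * invDir.y, ty1 = (boxMax.y - origin.y) * invDir.y;
    float tz0 = (boxMin.z - origin.z) * invDir.z, tz1 = (boxMax.z - origin.z) * invDir.z;
    // Entry is the latest per-axis entry, exit the earliest per-axis exit.
    tNear = std::max({std::min(tx0, tx1), std::min(ty0, ty1), std::min(tz0, tz1)});
    tFar  = std::min({std::max(tx0, tx1), std::max(ty0, ty1), std::max(tz0, tz1)});
    // Hit if the interval is non-empty and not entirely behind the origin.
    return tFar >= std::max(tNear, 0.0f);
}
```

Because min/max picks the entry and exit planes per axis, no sign masks (biggerThan0/lessThan0) are needed at all.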
I am writing an OpenCL app on a Mac using C++, and it crashes in certain cases depending on the work size.
The program crashes due to a SIGABRT.
Is there any way to get more information about the error?
Why is SIGABRT being raised? Can I catch it?
EDIT:
I realize that this program is a doozy, but I will try to explain it in case anyone would like to take a stab at it.
Through debugging I discovered that the cause of the SIGABRT was one of the kernels timing out.
The program is a tile-based 3D renderer. It is an OpenCL implementation of this algorithm: https://github.com/ssloy/tinyrenderer
The screen is divided into 8x8 tiles. One of the kernels (the tiler) computes which polygons overlap each tile, storing the results in a data structure called tilePolys. A subsequent kernel (the rasterizer), which runs one work item per tile, iterates over the list of polys occupying the tile and rasterizes them.
The tiler writes to an integer buffer which is a list of lists of polygon indices. Each list is of a fixed size (polysPerTile + 1 for the count) where the first element is the count and the subsequent polysPerTile elements are indices of polygons in the tile. There is one such list per tile.
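The layout just described can be modelled on the host, which makes it easy to sanity-check a buffer read back from the GPU (for example after a blocking read into a std::vector<int>). A minimal sketch: TileLists and validate are hypothetical helpers, and add() is a sequential equivalent of the kernel's increment/check/store, so it does not reproduce any GPU concurrency effects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of tilePolys: one fixed-size record per tile,
// [count, idx0, idx1, ..., idx(polysPerTile-1)].
struct TileLists {
    int polysPerTile;
    std::vector<int> data;

    TileLists(int tiles, int perTile)
        : polysPerTile(perTile), data(std::size_t(tiles) * (perTile + 1), 0) {}

    std::size_t counterIndex(int tile) const {
        return std::size_t(tile) * (polysPerTile + 1);
    }

    // Sequential equivalent of the kernel's count / bounds check / store.
    void add(int tile, int poly) {
        std::size_t c = counterIndex(tile);
        int n = data[c];
        if (n >= polysPerTile) return;  // list full: drop the polygon
        data[c + 1 + n] = poly;
        data[c] = n + 1;
    }
};

// Every count must lie in [0, polysPerTile] and every stored index in [0, nTris).
bool validate(const TileLists& t, int nTris) {
    std::size_t stride = std::size_t(t.polysPerTile) + 1;
    std::size_t tiles = t.data.size() / stride;
    for (std::size_t tile = 0; tile < tiles; ++tile) {
        std::size_t c = tile * stride;
        int n = t.data[c];
        if (n < 0 || n > t.polysPerTile) return false;
        for (int i = 0; i < n; ++i) {
            int poly = t.data[c + 1 + i];
            if (poly < 0 || poly >= nTris) return false;
        }
    }
    return true;
}
```

Running validate() right after the tiler (before the rasterizer) would show whether a bad count such as the one below already exists at that point.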
For some reason in certain cases the tiler writes a very large poly count (13172746) to one of the tile's lists in tilePolys. This causes the rasterizer to loop for a long time and time out.
The strange thing is that the index to which the large count is written is never accessed by the tiler.
The code for the tiler kernel is below:
// this kernel is executed once per polygon
// it computes which tiles are occupied by the polygon and adds the index of the polygon to the list for that tile
kernel void tiler(
    // number of polygons
    ulong nTris,
    // width of screen
    int width,
    // height of screen
    int height,
    // number of tiles in x direction
    int tilesX,
    // number of tiles in y direction
    int tilesY,
    // number of pixels per tile (tiles are square)
    int tileSize,
    // size of the polygon list for each tile
    int polysPerTile,
    // 4x4 matrix representing the viewport
    global const float4* viewport,
    // vertex positions
    global const float* vertices,
    // indices of vertices
    global const int* indices,
    // array of array-lists of polygons per tile
    // structure of list is an int representing the number of polygons covering that tile,
    // followed by [polysPerTile] integers representing the indices of the polygons in that tile
    // there are [tilesX*tilesY] such arraylists
    volatile global int* tilePolys)
{
    size_t faceInd = get_global_id(0);
    // compute vertex position in viewport space
    float3 vs[3];
    for(int i = 0; i < 3; i++) {
        // indices are vertex/uv/normal
        int vertInd = indices[faceInd*9+i*3];
        float4 vertHomo = (float4)(vertices[vertInd*4], vertices[vertInd*4+1], vertices[vertInd*4+2], vertices[vertInd*4+3]);
        vertHomo = vec4_mul_mat4(vertHomo, viewport);
        vs[i] = vertHomo.xyz / vertHomo.w;
    }
    float2 bboxmin = (float2)(INFINITY,INFINITY);
    float2 bboxmax = (float2)(-INFINITY,-INFINITY);
    // size of screen
    float2 clampCoords = (float2)(width-1, height-1);
    // compute bounding box of triangle in screen space
    for (int i=0; i<3; i++) {
        for (int j=0; j<2; j++) {
            bboxmin[j] = max(0.f, min(bboxmin[j], vs[i][j]));
            bboxmax[j] = min(clampCoords[j], max(bboxmax[j], vs[i][j]));
        }
    }
    // transform bounding box to tile space
    int2 tilebboxmin = (int2)(bboxmin[0] / tileSize, bboxmin[1] / tileSize);
    int2 tilebboxmax = (int2)(bboxmax[0] / tileSize, bboxmax[1] / tileSize);
    // loop over all tiles in bounding box
    for(int x = tilebboxmin[0]; x <= tilebboxmax[0]; x++) {
        for(int y = tilebboxmin[1]; y <= tilebboxmax[1]; y++) {
            // get index of tile
            int tileInd = y * tilesX + x;
            // get start index of polygon list for this tile
            int counterInd = tileInd * (polysPerTile + 1);
            // get current number of polygons in list
            int numPolys = atomic_inc(&tilePolys[counterInd]);
            // if list is full, skip tile
            if(numPolys >= polysPerTile) {
                // decrement the count because we will not add to the list
                atomic_dec(&tilePolys[counterInd]);
            } else {
                // otherwise add the poly to the list
                // the index is the offset + numPolys + 1 as tilePolys[counterInd] holds the poly count
                int ind = counterInd + numPolys + 1;
                tilePolys[ind] = (int)(faceInd);
            }
        }
    }
}
My theories are that either:
I have incorrectly implemented the atomic functions for reading and incrementing the count
I am using an incorrect number format causing garbage to be written into tilePolys
One of my other kernels is inadvertently writing into the tilePolys buffer
I do not think it is the last one though because if instead of writing faceInd to tilePolys, I write a constant value, the large poly count disappears.
tilePolys[counterInd+numPolys+1] = (int)(faceInd); // this is the problem line
tilePolys[counterInd+numPolys+1] = (int)(5); // this fixes the issue
It looks like your kernel is crashing on the GPU itself. You can't really get any extra diagnostics about that directly, at least not on macOS. You'll need to start narrowing down the problem. Some suggestions:
As the crash is currently happening in clFinish(), you don't know which asynchronous command is causing it. Try switching all your enqueue calls to blocking mode. This should cause it to crash in the call that's actually going wrong.
Check return/error codes on all OpenCL API calls. Sometimes, ignoring an error from an earlier call can cause problems in a later call which relies on earlier results. For example, if creating a buffer fails, passing the result of that buffer creation as a kernel argument will cause problems when trying to run the kernel.
The most likely reason for the crash is that your OpenCL kernel is accessing memory out of bounds or is otherwise misusing pointers. Re-check any array index calculations.
Check if the problem occurs with smaller work batches. Scale up from one workgroup (or work item if not using groups) and see if it only occurs beyond a certain work size. This may give you a clue about buffer sizes and array indices that might be causing the crash.
Systematically comment out parts of your kernel. If the crash goes away if you comment out a specific piece of code, there's a good chance the problem is in that code.
If you've narrowed the problem down to a small area of code but can't work out where it's coming from, start recording diagnostic output to check that variables have the values you're expecting.
Without seeing any code, I can't give you any more specific advice than that.
Note that OpenCL is deprecated on macOS, so if you're specifically targeting that platform and don't need to support Linux, Windows, etc., I recommend learning Metal Compute instead. Apple has made it clear that this is the GPU programming platform they want to support, and the tooling for it is already much better than their OpenCL tooling ever was.
I suspect Apple will eventually stop implementing OpenCL support when they release a Mac with a new type of GPU, so even if you're targeting the Mac as well as other platforms, you will probably need to switch to Metal on the Mac somewhere down the line anyway. As of macOS 10.14, the minimum system requirements of the OS already include a Metal-capable GPU, so you only need OpenCL as a fallback if you wish to support all Mac models able to run 10.13 or an even older OS version.
I am using the Point Cloud Library. I know there is a function to find lines using the RANSAC method, but I want to do the opposite: I have a point cloud and the equation of a line, and I would like to find all the points on or near (within a given threshold) the line.
Is there any function I can use to achieve this?
I would really appreciate any kind of help.
I have attempted to use PCL a few times for Kinect processing but it hasn't worked out too well for me. So I attempted to create my own algorithms to do what I want, and for the application, they work much faster than the PCL ones :)
The project I am working on is on GitHub and you can find some code that may help in the bool ConvexHull::addPoint(double newX, double newY, double newZ) found here.
This utilises a 3D plane equation generated using RANSAC and then compares each point to it, calculating the distance between the point and the plane, just like Oscee said.
Here's the juicy bit of the code which I think may help you:
// Find the distance from point to plane.
// http://mathworld.wolfram.com/Point-PlaneDistance.html
dist = newX * plane.a;
dist += newY * plane.b;
dist += newZ * plane.c;
dist += plane.d;
dist /= sqrt(pow(plane.a, 2) + pow(plane.b, 2) + pow(plane.c, 2));
dist = (dist >= 0) ? dist : -dist; // Absolute distance.
if (dist > tolerance) {
    return false; // Return false as point is outside of tolerance.
}
With this function I pass in every point from the 640*480 Kinect image that has a depth value greater than 0.
And for me, this works quite fast :)
I hope this helps.
I don't think you need any special function to do that - simply go through all your points, calculate the point-line distance and accept the ones within your threshold and reject/delete the ones outside.
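A minimal C++ sketch of that loop, assuming the line is given in parametric form (a point on the line plus a direction vector). The distance is |(p - a) x d| / |d|, where a is the point on the line and d its direction. P3, pointLineDistance and pointsNearLine are hypothetical names, and a plain struct stands in for PCL point types; for a pcl::PointCloud, the same loop can run over cloud->points.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct P3 { double x, y, z; };

static P3 sub(const P3& a, const P3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static P3 cross(const P3& a, const P3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static double norm(const P3& a) { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }

// Distance from p to the infinite line through 'point' with direction 'dir'.
double pointLineDistance(const P3& p, const P3& point, const P3& dir) {
    return norm(cross(sub(p, point), dir)) / norm(dir);
}

// Returns the indices of all points within 'threshold' of the line.
std::vector<std::size_t> pointsNearLine(const std::vector<P3>& cloud,
                                        const P3& point, const P3& dir,
                                        double threshold) {
    std::vector<std::size_t> inliers;
    for (std::size_t i = 0; i < cloud.size(); ++i)
        if (pointLineDistance(cloud[i], point, dir) <= threshold)
            inliers.push_back(i);
    return inliers;
}
```

If many lines are tested against the same cloud, dividing by norm(dir) once per line (or normalising dir up front) saves a square root per point.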
Alpha invisibility.
I currently define circular regions on some images as "hot spots". For instance, I could have my photo on screen and overlay a circle on my head. To check for interaction with my head in realtime, I would returnOverlaps and do some manipulation on all objects overlapping the circle. For debugging, I make the circle yellow with alpha 0.5, and for release I decrease alpha to 0, making the circle invisible (as it should be).
Does this slow down the program? Is there another way to make the circle itself invisible while still remaining capable of interaction? Is there some way to color it "invisible" without using a (potentially) costly alpha of 0? Cache as bitmap matrix? Or some other efficient way to solve the "hot spot" detection without using masks?
Having just a few invisible display objects should not slow it down much, but having many could. I think a cleaner option may be to handle it all in code, rather than have actual invisible display objects on the stage.
For a circle, you would define the center point and radius. Then, to check whether someone clicked on it, you could do:
var xDist:Number = circle.x - mousePoint.x;
var yDist:Number = circle.y - mousePoint.y;
if((xDist * xDist) + (yDist * yDist) <= (circle.radius * circle.radius)){
    // mousePoint is within circle
} else {
    // mousePoint is outside of circle
}
If you insist on using display objects to define these circular hit areas (sometimes it is easier to do visually than by numbers), you could also write some code that reads those display objects (and removes them from rendering) to get their positions and radii.
Added method:
// inputX and inputY are the hotspot's x and y positions, and inputRadius is the radius of the hotspot
function hitTestObj(inputA:DisplayObject, inputX:int, inputY:int, inputRadius:int):Boolean {
    var xDist:Number = inputX - inputA.x;
    var yDist:Number = inputY - inputA.y;
    var minDist:Number = inputRadius + (inputA.width / 2);
    return ((xDist * xDist) + (yDist * yDist)) <= (minDist * minDist);
}
An alpha of 0 isn't all that costly in terms of rendering, as Flash Player optimizes for it (check here for actual figures). Bitmap caching wouldn't help, as the sprite is invisible. There are other ways to perform collision detection by doing the math yourself (more relevant in games with tens or even hundreds of sprites), but that would be overkill in your case.
I think "swept" means determining whether objects will collide at some point, not just whether they are currently colliding; tell me if I'm wrong.
I have objects with axis-aligned bounding boxes. The boxes can be different sizes, but they are always rectangular.
I've tried and tried to figure out an algorithm to determine if two moving AABB objects will collide at some point, but I am having a really hard time. I read a question on here about determining the time intervals when the two objects will pass at some point, and I didn't have a problem visualizing it, but implementing it was another story. It seems like there are too many exceptions, and it doesn't seem like I am doing it correctly.
The objects can only move in straight lines along the axes (they can change direction, e.g. turn around, but they always stay on an axis; turning off the axis simply isn't supported). Their bounding boxes don't rotate or anything like that. Velocity can change, but that doesn't matter, since the point of the method is to determine whether, given the objects' current state, they are on a "collision course". If you need any more information, let me know.
If someone could provide some pseudocode (or real code), that would be great. I read a document called Intersection of Convex Objects: The Method of Separating Axes, but I didn't understand some of the pseudocode in it (what does Union mean?).
Any help is appreciated, thanks.
When a collision occurs, the boxes will touch on one side. You could check whether they would be touching for pairs of sides (LR, RL, UD, DU).
If it would simplify the problem, you could translate the boxes so the first box is at the origin and is not moving.
Something like the following code:
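// Edges: L < R horizontally, U < D vertically (y grows downward); xV, yV are velocities.
dLR = B.L - A.R;   // gap from A's right side to B's left side
dRL = A.L - B.R;   // gap from B's right side to A's left side
dUD = B.U - A.D;   // gap from A's bottom to B's top
dDU = A.U - B.D;   // gap from B's bottom to A's top
vX = A.xV - B.xV;  // velocity of A relative to B
vY = A.yV - B.yV;
hX = -(dLR + dRL); // combined width, A.width + B.width
hY = -(dUD + dDU); // combined height, A.height + B.height
// When a pair of sides meets at time t, the boxes must also overlap on the
// other axis, i.e. the signed gap there must lie strictly between -h and 0.
if (vX > 0) { t = dLR / vX;  y = dDU + vY*t; if (t > 0 && y < 0 && y > -hY) return true; }
if (vX < 0) { t = -dRL / vX; y = dDU + vY*t; if (t > 0 && y < 0 && y > -hY) return true; }
if (vY > 0) { t = dUD / vY;  x = dRL + vX*t; if (t > 0 && x < 0 && x > -hX) return true; }
if (vY < 0) { t = -dDU / vY; x = dRL + vX*t; if (t > 0 && x < 0 && x > -hX) return true; }
return false;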
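A different way to get a yes/no answer, closer to the time-interval idea mentioned in the question, is to compute for each axis the time interval during which the boxes' projections overlap and then intersect the two intervals; the boxes collide if that intersection is non-empty and lies (at least partly) in the future. A minimal C++ sketch, assuming constant velocities and boxes given by their min/max corners. MovingBox, axisInterval and willCollide are hypothetical names; touching edges are not counted as a collision, and boxes that already overlap are reported as colliding.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Axis-aligned box given by its min/max corners and a constant velocity.
struct MovingBox {
    float minX, minY, maxX, maxY;
    float vx, vy;
};

// Time interval (tEnter, tExit) during which [aMin, aMax] moving at relative
// speed v overlaps the fixed interval [bMin, bMax]. Returns false if never.
static bool axisInterval(float aMin, float aMax, float bMin, float bMax, float v,
                         float& tEnter, float& tExit) {
    const float inf = std::numeric_limits<float>::infinity();
    if (v == 0.0f) {
        // No relative motion on this axis: overlapping forever or never.
        if (aMax <= bMin || bMax <= aMin) return false;
        tEnter = -inf;
        tExit = inf;
        return true;
    }
    float t0 = (bMin - aMax) / v;  // A's max edge reaches B's min edge
    float t1 = (bMax - aMin) / v;  // A's min edge reaches B's max edge
    tEnter = std::min(t0, t1);
    tExit = std::max(t0, t1);
    return true;
}

// True if A and B, keeping their current velocities, overlap at some time in [0, horizon].
bool willCollide(const MovingBox& a, const MovingBox& b,
                 float horizon = std::numeric_limits<float>::infinity()) {
    float vx = a.vx - b.vx, vy = a.vy - b.vy;  // motion of A relative to B
    float enterX, exitX, enterY, exitY;
    if (!axisInterval(a.minX, a.maxX, b.minX, b.maxX, vx, enterX, exitX)) return false;
    if (!axisInterval(a.minY, a.maxY, b.minY, b.maxY, vy, enterY, exitY)) return false;
    // The boxes overlap only while both axes overlap: intersect the intervals.
    float tEnter = std::max(enterX, enterY);
    float tExit = std::min(exitX, exitY);
    return tEnter < tExit && tExit > 0.0f && tEnter <= horizon;
}
```

Since the boxes are axis-aligned, the x and y axes are the only separating axes that need checking, which is why two interval computations suffice here.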