Inverse Kinematics: How to Parameterize a Ball-and-Socket Joint? - animation

I'm learning about inverse kinematics, and am trying to write a human skeleton simulation. I am having trouble deciding how to parameterize the rotation of a ball-and-socket joint.
Two methods that I can think of:
The familiar axis-angle (or Euler-angle) way. The characteristics of the joint can be changed by changing the order of rotations. Rotation matrices could also be used directly.
Using two quaternion rotations: one about the axis of the bone (twist), and one to determine the direction the bone points (swing). I think this is more intuitive in terms of simulating the joint.
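A minimal sketch of the second idea, assuming quaternions stored as (w, x, y, z) and a known bone axis expressed in the same frame as the rotation. Any joint rotation q can be split into a twist about the bone axis and a swing that tilts the axis, with q = swing * twist (usually called a swing-twist decomposition). The helper names are illustrative, not from any particular library.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ])

def quat_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def axis_angle_quat(axis, angle):
    """Unit quaternion rotating by `angle` radians about `axis`."""
    axis = np.asarray(axis, float) / np.linalg.norm(axis)
    return np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])

def swing_twist(q, bone_axis):
    """Split unit quaternion q into (swing, twist) with q = swing * twist.

    twist rotates about bone_axis; swing tilts the bone axis itself."""
    q = np.asarray(q, float)
    d = np.asarray(bone_axis, float) / np.linalg.norm(bone_axis)
    proj = np.dot(q[1:], d) * d              # vector part projected onto the bone axis
    twist = np.concatenate([[q[0]], proj])
    norm = np.linalg.norm(twist)
    if norm < 1e-12:
        # 180-degree swing: the twist is undefined, so choose the identity
        twist = np.array([1.0, 0.0, 0.0, 0.0])
    else:
        twist = twist / norm
    swing = quat_mul(q, quat_conj(twist))    # q * twist^-1
    return swing, twist
```

Joint limits can then be applied to the two parts separately, for example clamping the twist angle and the cone angle of the swing.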
So which one should I use? As far as I can make out:
The Euler-angle form of the first method is prone to gimbal lock, which I can visualize.
For the other method, it is ambiguous which axes should be used when calculating the Jacobian entries, i.e. the rotation-axis vector v in the Jacobian equation for a rotational joint (source: https://www.math.ucsd.edu/~sbuss/ResearchWeb/ikmethods/iksurvey.pdf, page 5).
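As I recall the survey, for a one-degree-of-freedom rotational joint j at position p_j with unit rotation axis v_j, the Jacobian entry for end-effector position s_i is ds_i/dθ_j = v_j × (s_i − p_j). One common way around the ambiguity (an assumption here, not something the survey prescribes for quaternion joints) is to treat a ball-and-socket joint as three 1-DOF rotations about three orthonormal world-space axes at the current pose, such as the bone axis and two axes perpendicular to it, giving one column per axis. This sketch checks those columns against a finite-difference rotation.

```python
import numpy as np

def rotational_jacobian_columns(joint_pos, end_effector, axes):
    """Jacobian columns v x (s - p) for a joint at p, one per unit axis v.

    axes: unit rotation axes in world space (assumed orthonormal here, e.g. the
    bone axis plus two perpendicular axes at the current pose)."""
    p = np.asarray(joint_pos, float)
    s = np.asarray(end_effector, float)
    cols = [np.cross(np.asarray(v, float), s - p) for v in axes]
    return np.stack(cols, axis=1)            # shape (3, len(axes))

def rotate_about_axis(point, pivot, axis, angle):
    """Rodrigues' rotation of `point` about the line through `pivot` along `axis`."""
    k = np.asarray(axis, float) / np.linalg.norm(axis)
    pivot = np.asarray(pivot, float)
    v = np.asarray(point, float) - pivot
    v_rot = (v * np.cos(angle) + np.cross(k, v) * np.sin(angle)
             + k * np.dot(k, v) * (1.0 - np.cos(angle)))
    return pivot + v_rot

# Finite-difference check: a tiny rotation h about each axis should move the
# end effector by roughly h times the matching Jacobian column.
p = np.array([0.2, -0.1, 0.3])
s = np.array([1.0, 0.5, -0.2])
J = rotational_jacobian_columns(p, s, np.eye(3))
h = 1e-6
fd = np.stack([(rotate_about_axis(s, p, e, h) - s) / h for e in np.eye(3)], axis=1)
print(np.max(np.abs(J - fd)))   # small, on the order of h
```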
I'm inclined to use the second method as I can get around the problem by using CCD instead of Jacobian pseudo-inverses. But I would just like to know which of these methods is used as standard (axis-angle or quaternions), and if so, what are the particular details I need to take into account if I were to adopt it.
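Since CCD came up, here is a minimal planar CCD sketch, assuming a chain of revolute joints rooted at the origin with each joint angle measured relative to the previous bone. Each pass visits the joints from the end effector back to the root and rotates each one so the end effector lies on the ray from that joint toward the target. For a 3D ball-and-socket joint, the same step would rotate by the shortest-arc quaternion between the joint-to-effector and joint-to-target directions and then clamp to the joint limits; that part is not shown.

```python
import math

def forward_kinematics(lengths, angles):
    """Joint positions of a planar chain, from the root (0, 0) to the end effector."""
    x = y = heading = 0.0
    pts = [(x, y)]
    for length, a in zip(lengths, angles):
        heading += a                         # angles are relative to the parent bone
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        pts.append((x, y))
    return pts

def ccd_solve(lengths, angles, target, iterations=50, tol=1e-6):
    """Cyclic Coordinate Descent: returns new joint angles."""
    angles = list(angles)
    tx, ty = target
    for _ in range(iterations):
        for i in reversed(range(len(angles))):
            pts = forward_kinematics(lengths, angles)
            jx, jy = pts[i]
            ex, ey = pts[-1]
            # signed angle from (joint -> effector) to (joint -> target)
            delta = math.atan2(ty - jy, tx - jx) - math.atan2(ey - jy, ex - jx)
            delta = math.atan2(math.sin(delta), math.cos(delta))  # wrap to [-pi, pi]
            angles[i] += delta
        ex, ey = forward_kinematics(lengths, angles)[-1]
        if math.hypot(ex - tx, ey - ty) < tol:
            break
    return angles
```

CCD never forms or inverts a Jacobian, which is why it sidesteps the question of which axes to use; the price is that it tends to favour moving the joints nearest the end effector.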
Any advice would be helpful, but preferably professional, and in a non-esoteric language should you be kind enough to spare some code :-]

Related

Can you recommend a source of reference data for fundamental matrix calculation?

Specifically, I'd ideally want images with point correspondences and a 'Gold Standard' calculated value of F and the left and right epipoles. I could work with an essential matrix and intrinsic and extrinsic camera parameters too.
I know that I can construct F from two projection matrices, generate left and right projected point coordinates from actual 3D points, and add Gaussian noise. However, I'd really like to work with someone else's reference data, since I'm trying to test the efficacy of my code, and writing more code to test the first batch of (possibly bad) code doesn't seem smart.
Thanks for any help
Regards
Dave
You should work with ground-truth datasets for multi-view reconstruction. I recommend the Middlebury Multi-View Stereo datasets. Besides the image data in a lossless format, they provide camera parameters, such as camera pose and intrinsic calibration, as well as the possibility to evaluate your own multi-view reconstruction system.
The camera parameters may not have been computed with "the" Gold Standard algorithm proposed in Hartley and Zisserman's book, but you can use them to compute the fundamental matrices you require between any two views.
To compute the fundamental matrix F from two projection matrices P1 and P2, refer to the code Andrew Zisserman provides.
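For the P1/P2 route, Hartley and Zisserman give F = [e']x P2 P1^+, where P1^+ is the pseudo-inverse of P1, C is the camera centre of P1 (P1 C = 0), e' = P2 C is the epipole in the second image, and [e']x is the cross-product (skew-symmetric) matrix. Below is a minimal NumPy sketch of that formula, my own and not Zisserman's code, using the convention x2^T F x1 = 0.

```python
import numpy as np

def skew(v):
    """Matrix [v]x such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_projections(P1, P2):
    """F with x2^T F x1 = 0 for 3x4 camera matrices P1, P2 (scaled to unit norm)."""
    _, _, Vt = np.linalg.svd(P1)
    C1 = Vt[-1]                        # right null vector of P1: camera centre (homogeneous)
    e2 = P2 @ C1                       # epipole in image 2
    F = skew(e2) @ P2 @ np.linalg.pinv(P1)
    return F / np.linalg.norm(F)
```

Checking x2^T F x1 ≈ 0 on synthetic projections is a useful sanity check, but it does not replace independent reference data such as the datasets above.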

What does RiBasis, as described in the RenderMan Interface, mean?

I'm working on a plugin for 3ds Max. In this plugin, I export geometry information into a .rib file that can be rendered by a RenderMan renderer. I export a NURBS curve's data into the .rib file using RiBasis and RiCurve, with RtBsplineBasis in RiBasis, but I get the wrong result: the rendered curve is shorter than the one produced by 3ds Max's renderer. When I repeat the first and last control vertices, the curve is long enough, but its shape is a little different. Can anyone tell me why I get the wrong result, what RiBasis means, and how I can get the correct RiBasis? Thank you very much!
RiCurve draws a cubic spline. The control points do not uniquely determine the curve; you also need the basis, which is expressed as a 4x4 matrix. One matrix gives the coefficients you need for a B-spline, another for Bezier, Catmull-Rom, and so on, and of course you can also supply the matrix yourself for some kind of hybrid interpolant that isn't quite one of the standard 3 or 4. The basis determines the character of the spline: whether the curve is guaranteed to go through the control points or merely approximates them, the degree of continuity, the "tension", and so on.
There is a great discussion in one of the appendices of "The RenderMan Companion," including numeric examples of how different basis matrices affect the interpolation.
It sounds like you requested a B-spline basis, which is approximating (not interpolating) and continuous in both 1st and 2nd derivatives. Maybe that's not what you had in mind. It's hard to tell, since you didn't describe the properties of the spline that you were hoping for.
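A quick way to see why the B-spline version comes out shorter: a uniform cubic B-spline with step 1 never reaches its first and last control points. One common workaround (a sketch, not necessarily what the exporter should do) is to repeat each end control point so it appears three times; the first segment then starts and the last segment ends exactly on those points. Repeating them only once (two copies in total) moves the ends closer but not onto the points, and either way the shape near the ends changes, which may explain the "a little different" shape in the question. The helper names are hypothetical.

```python
import numpy as np

BSPLINE = np.array([[-1, 3, -3, 1],
                    [3, -6, 3, 0],
                    [-3, 0, 3, 0],
                    [1, 4, 1, 0]], float) / 6.0

def bspline_curve(cvs, samples_per_segment=8):
    """Sample a uniform cubic B-spline; each segment uses 4 consecutive CVs (step 1)."""
    cvs = np.asarray(cvs, float)
    pts = []
    for i in range(len(cvs) - 3):
        G = cvs[i:i + 4]
        for k in range(samples_per_segment + 1):
            t = k / samples_per_segment
            pts.append(np.array([t ** 3, t ** 2, t, 1.0]) @ BSPLINE @ G)
    return np.array(pts)

def clamp_ends(cvs):
    """Repeat the first and last CV so that each appears three times."""
    cvs = np.asarray(cvs, float)
    return np.vstack([cvs[:1], cvs[:1], cvs, cvs[-1:], cvs[-1:]])
```

Note that this only makes the endpoints match; it does not reproduce a NURBS curve with a clamped knot vector exactly.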
As an aside, approximating an arbitrary NURBS curve with a nonrational cubic is not always going to give you an exact match. Something else to keep in mind.

How to detect a triangle gesture with Kinect?

I am trying to implement a gesture recognition system that interprets the geometric gestures a user makes and draws them on screen.
I have some idea of how a circle can be recognized; however, I have no clue how to get started with triangle recognition.
The data I have is the X and Y coordinates of all the points the gesture passed through. I get this data by tracking the right hand.
I found something online called the Hough transform, which is used for detecting lines, but I am not sure whether it will work for discrete collections of points.
Any ideas folks?
If you already have an x,y pair for the hand, the simplest thing that comes to mind is to try the $1 Unistroke Recognizer.
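A simplified sketch of the $1 idea, assuming strokes are lists of (x, y) tuples: resample to a fixed number of evenly spaced points, rotate so the centroid-to-first-point angle is zero, scale to a square, move the centroid to the origin, then pick the template with the smallest average point-to-point distance. The full $1 recognizer also refines the rotation with a golden-section search, which is omitted here, and the templates in the test are made up for illustration.

```python
import math

def path_length(pts):
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))

def resample(pts, n=64):
    """Resample a stroke to n points spaced evenly along its path."""
    pts = [tuple(map(float, p)) for p in pts]
    interval = path_length(pts) / (n - 1)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)          # q becomes the start of the next piece
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:               # floating-point rounding can leave us one short
        out.append(pts[-1])
    return out[:n]

def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def normalize(pts, n=64, size=250.0):
    pts = resample(pts, n)
    cx, cy = centroid(pts)
    theta = -math.atan2(pts[0][1] - cy, pts[0][0] - cx)   # undo the "indicative angle"
    c, s = math.cos(theta), math.sin(theta)
    pts = [((x - cx) * c - (y - cy) * s, (x - cx) * s + (y - cy) * c) for x, y in pts]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w = max(max(xs) - min(xs), 1e-9)
    h = max(max(ys) - min(ys), 1e-9)
    pts = [(x * size / w, y * size / h) for x, y in pts]  # non-uniform scale to a square
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def path_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def recognize(stroke, templates):
    """templates: dict of name -> raw point list. Returns (best_name, distance)."""
    cand = normalize(stroke)
    scores = {name: path_distance(cand, normalize(t)) for name, t in templates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]
```

Because of the indicative-angle step, $1 is sensitive to where the stroke starts and in which direction it is drawn, so you may want several triangle templates.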
A handy thing to look at is Dynamic Time Warping (DTW). I've seen a fun Processing/SimpleOpenNI project called KineticSpace that makes use of that technique and the full skeleton. Since it's open source, it might be worth having a peek.
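For reference, classic DTW between two gesture traces (lists of (x, y) points) is a small dynamic program. This sketch is not taken from KineticSpace; it only shows why DTW tolerates gestures drawn at different speeds: matching one sample against several repeated samples costs nothing extra.

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping between two point sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],       # a advances, b repeats
                                 D[i][j - 1],       # b advances, a repeats
                                 D[i - 1][j - 1])   # both advance
    return D[n][m]
```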
I'd recommend trying the $1 Unistroke Recognizer first. You will probably need to work out a system to mimic press/release, perhaps using the sign of the hand's velocity along z (positive-to-negative and negative-to-positive transitions).
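A minimal sketch of that press/release idea, assuming the hand's z coordinate grows with distance from the sensor (so pushing toward the sensor gives negative velocity) and using a hypothetical dead-zone `threshold` to ignore jitter. It emits one event per transition rather than one per frame.

```python
def press_release_events(z_values, threshold=0.01):
    """Return [("press", frame), ("release", frame), ...] from per-frame hand z values.

    Assumes z increases away from the sensor; `threshold` is a hypothetical dead zone
    in the same units as z."""
    events = []
    pressed = False
    for i in range(1, len(z_values)):
        v = z_values[i] - z_values[i - 1]      # per-frame z velocity
        if v < -threshold and not pressed:     # started moving toward the sensor
            pressed = True
            events.append(("press", i))
        elif v > threshold and pressed:        # started moving away again
            pressed = False
            events.append(("release", i))
    return events
```

The stroke between a press and the following release would then be the point list handed to the recognizer.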
HTH
You can look at a space-filling curve. It maps the two dimensions onto one and reorders the points while preserving some spatial locality. Maybe you could then train on, or compare, the reordered 1D index with something like simulated annealing or ant colony optimization. Space-filling curves are used in map tiling programs.
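The Hilbert curve is one common space-filling curve. This sketch follows the widely published iterative conversion from a grid cell (x, y) to its distance d along the curve, for an n x n grid with n a power of two. Quantizing the gesture points to such a grid and sorting by d gives the 1D ordering described above; the grid size of 256 is an arbitrary assumption.

```python
def hilbert_index(n, x, y):
    """Distance along the Hilbert curve of cell (x, y) in an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/flip the quadrant so the sub-curve has the canonical orientation
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def order_points(points, n=256):
    """Sort (x, y) points by the Hilbert index of their cell in an n x n grid."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    span = max(max(xs) - x0, max(ys) - y0, 1e-9)
    def cell(v, lo):
        return min(int((v - lo) / span * n), n - 1)
    return sorted(points, key=lambda p: hilbert_index(n, cell(p[0], x0), cell(p[1], y0)))
```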

PointCloud with multiple Kinects

I am trying to build a point cloud of a user with multiple Kinects in Processing. I capture the user's front and back with 2 Kinects on opposite sides and generate both point clouds.
The trouble is that the point clouds' X/Y/Z coordinates are not synchronized: the sketch just puts both of them on screen, and it looks messy. Is there a way to calculate or compare them, so I can transform the second point cloud to "join" the first? I could translate the position manually, but if I move the sensors it will go off again.
Supposing all the Kinects are stationary, I guess you would have to go in this order:
1. Decide which Kinect to use as the global reference.
2. Get parameters for a 3D transformation for each of the other Kinects. I'd try to use PMatrix3D and applyMatrix(), although it may be slow.
3. Apply the transformations to each of the other Kinects' point clouds and draw the clouds.
I don't (yet) know how to get the transformation parameters for a Procrustes transformation, but assuming they won't change, you'd probably have to set up multiple reference points, maybe by displaying the point clouds from each pair of Kinects and registering the points you know are the same in both point clouds. After getting enough of them, construct a PMatrix3D and apply it inside push/popMatrix.
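For rigidly mounted sensors (rotation plus translation, no scaling), the Procrustes step has a closed-form solution, the Kabsch algorithm, once you have a few corresponding reference points. A minimal NumPy sketch, assuming `src` holds the points as seen by one Kinect and `dst` the same physical points as seen by the reference Kinect. The 16 values of the resulting 4x4 matrix could then be copied into a PMatrix3D; check the Processing reference for the argument order.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares R, t with R @ src_i + t ~= dst_i (Kabsch, no scaling).

    src, dst: (N, 3) arrays of corresponding points, N >= 3, not all collinear."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # -1 would mean a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t

def to_homogeneous(R, t):
    """Pack R and t into a 4x4 matrix acting on column vectors [x, y, z, 1]."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = t
    return M
```

With noisy Kinect data, more than three well-spread reference points give a more stable estimate.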
This is the approach used by this guy: http://www.youtube.com/watch?v=ujUNj1RDL4I
An alternative approach would be to use an Iterative Closest Point (ICP) algorithm and construct a 3D transform from its output. I'd really like an ICP or PCL library for Processing, if anyone knows a good one.
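A bare-bones point-to-point ICP sketch that reuses the same closed-form rigid fit at each iteration. It uses brute-force nearest neighbours, so it is only practical for small or subsampled clouds, and like any ICP it needs a reasonable initial alignment (for example from the manual reference-point step) to converge to the right answer.

```python
import numpy as np

def best_fit(src, dst):
    """Kabsch: least-squares R, t with R @ src_i + t ~= dst_i."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cd - R @ cs

def icp(src, dst, iterations=30, tol=1e-10):
    """Point-to-point ICP. Returns R, t mapping src onto dst."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    prev_err = np.inf
    for _ in range(iterations):
        # brute-force nearest neighbour in dst for every current source point
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        err = np.sqrt(d2[np.arange(len(cur)), nn]).mean()
        if prev_err - err < tol:
            break                                  # no further improvement
        prev_err = err
        R, t = best_fit(cur, dst[nn])
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t   # compose with earlier steps
    return R_total, t_total
```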

Where can I find a good read about bicubic interpolation and Lanczos resampling?

I want to implement the two above mentioned image resampling algorithms (bicubic and Lanczos) in C++. I know that there are dozens of existing implementations out there, but I still want to make my own. I want to make it partly because I want to understand how they work, and partly because I want to give them some capabilities not found in mainstream implementations (like configurable multi-CPU support and progress reporting).
I tried reading Wikipedia, but the stuff is a bit too dry for me. Perhaps there are some nicer explanations of these algorithms? I couldn't find anything either on SO or Google.
Added: Seems like nobody can give me a good link about these topics. Can anyone at least try to explain them here?
The basic operating principle of both algorithms is pretty simple. They're both convolution filters. For each output value, a convolution filter centers the convolution function on the output location, multiplies every input value by the value of the convolution function at that input's location, and adds the products together.
One property of convolution is that the integral of the output is the product of the integrals of the two input functions. If you consider the input and output images, the integral corresponds to the overall brightness, so if you want the brightness to remain the same, the integral of the convolution function needs to equal one.
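A quick numerical check of that property with a discrete "full" convolution, using a made-up signal and a kernel normalized to sum to 1, so the total brightness is preserved. With truncated output modes such as "same", the border samples are cut off and the sums no longer match exactly.

```python
import numpy as np

signal = np.array([0.0, 2.0, 5.0, 1.0, 3.0])
kernel = np.array([1.0, 2.0, 1.0]) / 4.0   # normalized: sums to 1
out = np.convolve(signal, kernel)          # 'full' mode keeps every product term

# sum(out) == sum(signal) * sum(kernel), so a unit-sum kernel keeps the total
print(out.sum(), signal.sum() * kernel.sum())
```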
One way to understand them is to think of the convolution function as something that shows how much the input pixels influence the output pixel, depending on their distance.
Convolution functions are usually defined so that they are zero when the distance is larger than some value so that you don't have to consider every input value for every output value.
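For Lanczos interpolation, the convolution function is based on the normalized sinc function, sinc(x) = sin(pi*x)/(pi*x), but only the first few lobes are kept, usually 3. The sinc is multiplied by a stretched copy of itself, sinc(x/3), which tapers it smoothly to zero at |x| = 3:
lanczos(x) = {
0 if abs(x) >= 3,
1 if x == 0,
else sinc(x) * sinc(x/3) = 3*sin(pi*x)*sin(pi*x/3) / (pi*pi*x*x)
}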
This function is called the filter kernel.
To resample with Lanczos, imagine you overlay the output and input on each other, with points marking the pixel locations. For each output pixel location, take a window of ±3 units around that point, where a unit is one output pixel when downsampling and one input pixel when upsampling (i.e. whichever pixel spacing is larger). For every input pixel that lies in that window, evaluate the Lanczos function with the distance from the output location, measured in those units, as the argument. Then normalize the calculated weights so that they add up to 1. Finally, multiply each input pixel value by its weight and add the results together to get the value of the output pixel.
Because the 2D Lanczos kernel is usually defined as the product of a horizontal and a vertical 1D kernel (so it is separable) and, if you are resizing, the grid is regular, you can optimize this by doing the convolution horizontally and vertically in separate passes, precalculating the vertical filter for each output row and the horizontal filter for each output column.
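A minimal NumPy sketch of the procedure above. It assumes pixel centres are aligned with the mapping center = (i + 0.5) * scale - 0.5 and that pixels beyond the border are replicated; both are choices, not the only options. The kernel is stretched by the scale factor when downsampling, and resizing is done as two separable 1D passes.

```python
import numpy as np

def lanczos_kernel(x, a=3):
    """sinc(x) * sinc(x / a) inside |x| < a, else 0 (np.sinc is sin(pi x) / (pi x))."""
    x = np.asarray(x, float)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

def resample_axis(data, out_len, axis=0, a=3):
    """Lanczos-resample `data` along one axis to out_len samples."""
    data = np.moveaxis(np.asarray(data, float), axis, 0)
    in_len = data.shape[0]
    scale = in_len / out_len                  # input pixels per output pixel
    support = max(scale, 1.0)                 # stretch the kernel when downsampling
    out = np.empty((out_len,) + data.shape[1:])
    for i in range(out_len):
        center = (i + 0.5) * scale - 0.5      # output pixel centre in input coordinates
        idx = np.arange(int(np.floor(center - a * support)),
                        int(np.ceil(center + a * support)) + 1)
        w = lanczos_kernel((idx - center) / support, a)
        w /= w.sum()                          # normalize so the weights add up to 1
        src = data[np.clip(idx, 0, in_len - 1)]   # replicate edge pixels
        out[i] = np.tensordot(w, src, axes=1)
    return np.moveaxis(out, 0, axis)

def resize(image, new_h, new_w, a=3):
    """Separable 2D Lanczos: vertical pass over columns, then horizontal pass over rows."""
    tmp = resample_axis(image, new_h, axis=0, a=a)
    return resample_axis(tmp, new_w, axis=1, a=a)
```

In a real implementation the weights for each output index would be computed once and reused for every row or column, which is the precalculation mentioned above.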
Bicubic convolution is basically the same, with a different filter kernel function.
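For bicubic, a commonly used kernel is Keys' cubic convolution kernel with a parameter a; a = -0.5 reproduces Catmull-Rom interpolation, and other values such as -0.75 are also in use. Its support is 2 instead of 3, so in a Lanczos-style resampling loop you would swap the kernel and use a window of ±2.

```python
def cubic_kernel(x, a=-0.5):
    """Keys' cubic convolution kernel; zero for |x| >= 2."""
    x = abs(x)
    if x < 1:
        return (a + 2) * x ** 3 - (a + 3) * x ** 2 + 1
    if x < 2:
        return a * x ** 3 - 5 * a * x ** 2 + 8 * a * x - 4 * a
    return 0.0
```

The kernel is 1 at 0 and 0 at the other integers, so it interpolates the samples exactly, and its weights at the four nearest samples always sum to 1.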
To get more detail, there's a pretty good and thorough explanation in the book Digital Image Processing, section 16.3.
Also, image_operations.cc and convolver.cc in Skia have a pretty well-commented implementation of Lanczos interpolation.
While what Ants Aasma says roughly describes the difference, I don't think it is particularly informative as to why you might do such a thing.
As far as links go, you are asking a very basic question in image processing, and any decent introductory textbook on the subject will describe this. If I remember correctly, Gonzalez and Woods is decent on it, but I'm away from my books and can't check.
Now, on to the particulars: it helps to think about what you are doing fundamentally. You have a square lattice of measurements for which you want to interpolate new values. In the simple case of upsampling, let's imagine you want a new measurement in between every pair you already have (e.g. double the resolution).
Now you won't get the "correct" value, because in general you don't have that information. So you have to estimate it. How to do this? A very simple way would be to linearly interpolate. Everyone knows how to do this with two points, you just draw a line between them, and read the new value off the line (in this case, at the half way point).
Now an image is two dimensional, so you really want to do this in both the left-right and up-down directions. Use the result for your estimate and voila you have "bilinear" interpolation.
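A minimal sketch of that bilinear step, assuming the fractional position (y, x) lies inside the image: interpolate along x in the two neighbouring rows, then interpolate between those two results along y.

```python
import numpy as np

def bilinear(img, y, x):
    """Sample img at fractional (y, x) by linear interpolation along x, then along y."""
    img = np.asarray(img, float)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, img.shape[0] - 1)        # clamp at the last row/column
    x1 = min(x0 + 1, img.shape[1] - 1)
    fy, fx = y - y0, x - x0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]       # along x in the upper row
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]    # along x in the lower row
    return (1 - fy) * top + fy * bottom                   # then along y between them

print(bilinear([[0.0, 1.0], [2.0, 3.0]], 0.5, 0.5))  # the half-way point of all four
```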
The main problem with this is that it isn't very accurate, although it's better (and slower) than the "nearest neighbor" approach which is also very local and fast.
To address the first problem, you want something better than a linear fit of two points: you want to fit something to more data points (pixels), and something that can be nonlinear. A good trade-off between accuracy and computational cost is something called a cubic spline. This gives you a smooth fitted curve, and again you approximate your new "measurement" by the value it takes in the middle. Do this in both directions and you've got "bicubic" interpolation.
So that's more accurate, but still computationally heavy. One way to address the speed issue is to use a convolution, which has the nice property that in the Fourier domain it's just a multiplication, so it can be implemented quite quickly. But you don't need to worry about the implementation to understand that the convolution result at any point is the integral of the product of one function (your image) and another function, called the kernel, which typically has much smaller support (the region where it is non-zero), after that kernel has been centered over that particular point. In the discrete world, these integrals are just sums of products.
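To make the "sum of products" and "multiplication in the Fourier domain" statements concrete, this small NumPy sketch computes the same 1D discrete convolution both ways. Zero-padding both inputs to the full output length avoids the wrap-around that a plain FFT product would otherwise give. For kernels as small as bicubic or Lanczos-3, direct summation is often competitive; the FFT route mainly pays off for larger kernels.

```python
import numpy as np

def direct_convolve(f, k):
    """Discrete convolution as a plain sum of products."""
    out = np.zeros(len(f) + len(k) - 1)
    for i, fi in enumerate(f):
        for j, kj in enumerate(k):
            out[i + j] += fi * kj
    return out

def fft_convolve(f, k):
    """Same result via the convolution theorem: multiply the spectra, transform back."""
    n = len(f) + len(k) - 1                   # pad to the full length: no wrap-around
    return np.fft.irfft(np.fft.rfft(f, n) * np.fft.rfft(k, n), n)
```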
It turns out that you can design a convolution kernel with properties quite like the cubic spline, and use it to get a fast "bicubic" interpolation.
Lanczos resampling is a similar thing, with slightly different properties in the kernel, which primarily means the two have different characteristic artifacts. You can look up the details of these kernel functions easily enough (I'm sure Wikipedia has them, as does any intro text). The implementations used in graphics programs tend to be highly optimized and sometimes make specialized assumptions that make them more efficient but less general.
I would suggest the article "image interpolation via convolution" for a basic understanding of different image interpolation methods. If you want to try more interpolation methods, imageresampler is a nice open-source project to begin with.
In my opinion, image interpolation can be understood from two perspectives: function fitting and convolution. For example, the spline interpolation explained in "image interpolation via convolution" is also well explained from the function-fitting perspective in "Cubic interpolation".
Additionally, image interpolation is always tied to a specific application, for example image zooming or image rotation. For a specific application, image interpolation can often be implemented in a smarter way. For example, image rotation can be implemented via the three-shear method, and a different one-dimensional interpolation algorithm can be used during each shearing operation.
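The three-shear rotation can be sketched through its matrix factorization: a rotation by θ equals an x-shear by -tan(θ/2), a y-shear by sin(θ), and another x-shear by -tan(θ/2). Each shear moves pixels along only one axis, which is why each pass needs only 1D interpolation of rows or columns. Because tan(θ/2) grows without bound as θ approaches 180°, large angles are usually handled by combining a 90° or 180° flip with a smaller sheared rotation.

```python
import numpy as np

def three_shear_matrices(theta):
    """Return (x-shear, y-shear, x-shear) whose product is a rotation by theta."""
    alpha = -np.tan(theta / 2)
    beta = np.sin(theta)
    shear_x = np.array([[1.0, alpha], [0.0, 1.0]])   # shifts each row horizontally
    shear_y = np.array([[1.0, 0.0], [beta, 1.0]])    # shifts each column vertically
    return shear_x, shear_y, shear_x
```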
