What is the difference between computeScalingRotation and computeRotationScaling - eigen

In the documentation of Eigen's Transform class, there are two member functions with almost identical signatures:
void computeRotationScaling(RotationMatrixType*, ScalingMatrixType*) const
void computeScalingRotation(ScalingMatrixType*, RotationMatrixType*) const
Both functions have identical documentation (the multiplication order is stated as rotation * scaling in both cases):
decomposes the linear part of the transformation as a product rotation x scaling, the scaling being not necessarily positive.
If either pointer is zero, the corresponding computation is skipped.
This is defined in the SVD module.
What is the difference between them?

The difference is the order of the decomposition: computeRotationScaling decomposes the linear part as rotation * scaling, while computeScalingRotation decomposes it as scaling * rotation (the identical wording in the documentation appears to be a copy-paste error). If you look closely at the implementation, the only difference is whether the scaling matrix is built from matrixV() or matrixU():
// computeRotationScaling
if(scaling) scaling->lazyAssign(svd.matrixV() * sv.asDiagonal() * svd.matrixV().adjoint());
// computeScalingRotation
if(scaling) scaling->lazyAssign(svd.matrixU() * sv.asDiagonal() * svd.matrixU().adjoint());
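For intuition, here is a minimal standalone sketch of the two decompositions (my own example, not Eigen's actual implementation, which additionally fixes the sign so the rotation has determinant +1). Given the SVD A = U * S * V^T, the rotation is U * V^T in both cases; only the scaling side differs:
#include <Eigen/Dense>
#include <iostream>

int main()
{
    // Some linear transformation whose rotation/scaling parts we want.
    Eigen::Matrix3d A = Eigen::Matrix3d::Random();

    Eigen::JacobiSVD<Eigen::Matrix3d> svd(A, Eigen::ComputeFullU | Eigen::ComputeFullV);
    Eigen::Matrix3d U = svd.matrixU();
    Eigen::Matrix3d V = svd.matrixV();
    Eigen::Vector3d s = svd.singularValues();

    Eigen::Matrix3d rotation     = U * V.transpose();                   // same for both functions
    Eigen::Matrix3d scalingRight = V * s.asDiagonal() * V.transpose();  // computeRotationScaling
    Eigen::Matrix3d scalingLeft  = U * s.asDiagonal() * U.transpose();  // computeScalingRotation

    std::cout << (rotation * scalingRight - A).norm() << "\n";  // ~0: A = rotation * scaling
    std::cout << (scalingLeft * rotation - A).norm() << "\n";   // ~0: A = scaling * rotation
}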

Related

Gradient of a function in OpenCL

I'm playing around a bit with OpenCL and I have a problem which can be simplified as follows.
I'm sure this is a common problem, but I cannot find many references or examples that show how this is usually done.
Suppose, for example, you have a function (written in C-style syntax):
float function(float x1, float x2, float x3, float x4, float x5)
{
return sin(x1) + x1*cos(x2) + x3*exp(-x3) + x4 + x5;
}
I can also implement the gradient of this function as
void functionGradient(float x1, float x2, float x3, float x4, float x5, float gradient[])
{
gradient[0] = cos(x1) + cos(x2);
gradient[1] = -sin(x2);
gradient[2] = exp(-x3) - x3*exp(-x3);
gradient[3] = 1.0f;
gradient[4] = 1.0f;
}
Now I was thinking of implementing an OpenCL C kernel function that would do the same thing, because I wanted to speed this up. The only way I have in mind to do this is to assign each work unit a component of the gradient, but then I'd need to put a bunch of if statements in the code to figure out which work unit is computing which component, which isn't good in general because of divergence.
Therefore, here is the question: how is such a problem tackled in general? I'm aware, for example, of gradient descent implementations on GPUs (see machine learning with backpropagation). So I wonder what is generally done to avoid divergence in the code.
Follow up from suggestion
I'm thinking of a possible SIMD compatible implementation as follows:
/*
Pseudo OpenCL-C code
here weight is a 5x5 array containing weights in {0,1} masking the relevant
computation
*/
__kernel void functionGradient(float x1, float x2, float x3, float x4, float x5, __global float* weight, __global float* gradient)
{
size_t threadId = get_global_id(0);
gradient[threadId] =
weight[5*threadId]*(cos(x1) + cos(x2)) +
weight[5*threadId + 1]*(-sin(x2)) +
weight[5*threadId + 2]*(exp(-x3) - x3*exp(-x3)) +
weight[5*threadId + 3] + weight[5*threadId + 4];
barrier(CLK_GLOBAL_MEM_FENCE);
}
If your gradient function only has 5 components, it does not make sense to parallelize it in a way that one thread does one component. As you mentioned, GPU parallelization does not work if the mathematical structure of each component is different (multiple instructions, multiple data, MIMD).
If you needed to compute the 5-dimensional gradient at 100k different coordinates, however, then each thread would do all 5 components for one coordinate and parallelization would work efficiently (sketched below).
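As a rough illustration of that layout, here is a sketch of such a kernel; the kernel name, the packed coordinate layout (5 consecutive floats per point) and the hard-coded gradient formulas from the question are my assumptions, not something from the original answer:
// Each work item evaluates the full 5-component gradient at its own coordinate.
__kernel void gradientAtManyPoints(__global const float* x, __global float* gradient, const uint n)
{
    const uint i = get_global_id(0);
    if (i >= n) return;                        // guard against padded global size
    const float x1 = x[5*i], x2 = x[5*i+1], x3 = x[5*i+2];
    gradient[5*i]     = cos(x1) + cos(x2);     // same formulas as in the question
    gradient[5*i + 1] = -sin(x2);
    gradient[5*i + 2] = exp(-x3) - x3*exp(-x3);
    gradient[5*i + 3] = 1.0f;
    gradient[5*i + 4] = 1.0f;
}
Every work item executes identical instructions on different data, which is exactly the SIMD pattern described above.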
In the backpropagation example, you have one gradient function with thousands of dimensions. In this case you would indeed parallelize the gradient function itself such that one thread computes one component of the gradient. However, in this case all gradient components have the same mathematical structure (with different weighting factors in global memory), so branching is not required. Each gradient component is the same equation with different numbers (single instruction, multiple data, SIMD). GPUs are designed to only handle SIMD; this is also why they are so energy efficient (~30 TFLOPs @ 300 W) compared to CPUs (which can do MIMD, ~2-3 TFLOPs @ 150 W).
Finally, note that backpropagation / neural nets are specifically designed to be SIMD. Not every new algorithm you come across can be parallelized in this manner.
Coming back to your 5-dimensional gradient example: there are ways to make it SIMD-compatible without branching, specifically bit masking. You would compute 2 cosines (for component 1, express the sine through a cosine) and one exponential, and add up all the terms with a factor in front of each; the terms that you don't need get multiplied by a factor of 0. Lastly, the factors are functions of the component ID. However, as mentioned above, this only makes sense if you have many thousands to millions of dimensions.
Edit: here is the SIMD-compatible version with bit masking:
kernel void functionGradient(const float x1, const float x2, const float x3, const float x4, const float x5, global float* gradient) {
const uint gid = get_global_id(0); // one work item per gradient component
const float cosx1 = cos(x1);
const float cosx2 = cos(x2+(gid==1)*1.5707964f); // cos(x2) for component 0, cos(x2+pi/2) = -sin(x2) for component 1
const float expmx3 = exp(-x3);
gradient[gid] = (gid==0)*cosx1 + (gid<=1)*cosx2 + (gid==2)*(expmx3-x3*expmx3) + (gid>=3); // mutually exclusive masks pick the right term
}
Note that there is no additional global/local memory access and all the (mutually exclusive) weighting factors are functions of the global ID. Each thread computes exactly the same thing (2 cosines, 1 exponential and a few multiplications/additions) without any branching. Trigonometric functions / divisions take much more time than multiplications/additions, so as few as possible should be used, which is why terms are pre-calculated.

Algorithm for automatic channel detection

I'm currently working on a spare-time project to perform automatic modulation classification (AMC) on radio signals (more precisely, I'm interested in L-band satellite channels), using SDR. I would like it to be able to discover channels in the power spectrum, along with their frequencies and bandwidths, so I can direct the application of AMC on the output of this step.
My first (naive) approach was rather simple: after every N samples (say 1024) apply a window function, perform the FFT on the last N, compute an estimation of the power spectrum and apply some exponential smoothing to reduce the noise. Then, in the smoothed power spectrum, find the maximum and minimum signal levels, calculate some threshold value based on a weighted mean of both levels and use this threshold to determine which frequency bins belong to a channel.
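For reference, a minimal sketch of that naive per-update step (the function name and the single weight parameter w are mine, just to make the threshold calculation concrete):
#include <vector>
#include <algorithm>

// Exponentially smooth the new PSD estimate, then flag bins above a threshold
// built from a weighted mean of the smoothed minimum and maximum levels.
std::vector<bool> detect_bins(const std::vector<float>& psd,  // |FFT|^2 of the last N samples
                              std::vector<float>& smoothed,   // persistent smoothed spectrum
                              float alpha,                    // smoothing factor
                              float w)                        // threshold weight in [0, 1]
{
    for (size_t i = 0; i < psd.size(); ++i)
        smoothed[i] += alpha * (psd[i] - smoothed[i]);
    const float lo = *std::min_element(smoothed.begin(), smoothed.end());
    const float hi = *std::max_element(smoothed.begin(), smoothed.end());
    const float threshold = w * hi + (1.0f - w) * lo;  // the weight that is hard to pick automatically
    std::vector<bool> is_channel(psd.size());
    for (size_t i = 0; i < psd.size(); ++i)
        is_channel[i] = smoothed[i] > threshold;
    return is_channel;
}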
This works well in my unit tests (one QPSK channel + Gaussian noise). However, in real-life scenarios I either get only a few channels or a lot of false positives. Of course I can fix this by fine-tuning the weights in the threshold calculation, but then it wouldn't be automatic anymore.
I've been doing some research on Google, but maybe I'm not using the right search keywords, or there is no real interest in this subject (which would be strange, as frequency scanners must perform this task somehow).
How could I find the appropriate values for the mean weights? Maybe there is a better approach than calculating a threshold for the whole spectrum?
Your smoothing approach seems a bit counter-productive: Why would noise in a sufficiently long DFT form something like a "sharp" shape? You're most likely suppressing narrowband carriers that way.
Then: there are really a lot of signal detectors, many simply based on energy detection in the spectrum (just like your approach).
However, the better an estimator has to be, the more information you need about the signal you're looking for. Are you looking for 20 kHz wide narrowband channels, or for dozens of megahertz of high-rate QPSK satellite downlink? Do you know anything about the channel/pulse shape? Autocorrelation?
This is a bit of a wide field, so I'd propose you look at something that already works:
gr-inspector by Sebastian Müller is a proven and pretty awesome automatic channel detector, and can, for some types of transmissions, also infer modulation parameters.
See a demo video (dubstep warning!); what you seem to be sketching looks something like one stage of what is shown there.
But: that's just one of the things it can do.
More importantly: energy-band detection based on a DFT is really just one of many things you can do to detect signals. It wouldn't, for example, detect spread-spectrum transmissions like GPS. For that, you'd need knowledge of the spreading technique, or at least a large autocorrelation-based detector.
After considering Marcus Müller's suggestion, I developed an algorithm that still relies on a global energy threshold but also on an estimation of the noise floor. It can be improved in many ways, but since even its simplest realization already provides acceptable results with real-world captures (again, L-band captures at 250 ksps), I believe it could be a good starting point for other people.
The central idea of the algorithm is to calculate the energy threshold based on a continuous estimation of the noise power, updating it with every update of the spectrogram. This estimation keeps track of the maximum and minimum levels attained by each FFT bin during all FFT runs, using them to estimate the PSD in that bin and discard outliers (i.e. channels, spurs...). The algorithm is to be executed every fixed number of samples. More formally:
Parameters of the algorithm:
int N /* spectrogram length (FFT bins) */
complex x[N] /* last N samples */
float alpha /* spectrogram smoothing factor between 0 (infinite smoothing,
spectrogram will never be updated) and 1 (no smoothing at all) */
float beta /* level decay factor between 0 (levels will never decay) and 1
(levels will be equal to the spectrogram). */
float gamma /* smoothing factor applied to updates of the current noise estimation,
between 0 (no updates allowed) and 1 (no smoothing at all) */
float SNR /* detection threshold above noise, in dB */
Recommended values for alpha, beta and gamma are 1e-2, 1e-3 and .5 respectively.
Variables:
complex dft[N] /* Fourier transform of the last N samples */
float spect[N] /* smoothed spectrogram */
float spmin[N] /* lower levels for spectrogram bins */
float spmax[N] /* upper levels for spectrogram bins */
int runs /* FFT run counter (initially zero) */
float N0 /* Current noise level estimation */
float new_N0 /* New noise level estimation */
float min_pwr /* Minimum power density found */
int min_pwr_bin /* FFT bin where min_pwr is */
int valid /* Number of valid bins for noise estimation */
int min_runs /* Minimum number of runs required to detect channels */
Algorithm:
min_runs = max(2. / alpha, 1. / beta)
dft = FFT(x);
++runs;
if (runs == 1) then /* First FFT run */
spect = dft * conj(dft) /* |FFT(x)|^2 */
spmin = spect /* Copy spect to spmin */
spmax = spect /* Copy spect to spmax */
N0 = min(spect); /* First noise estimation */
else
/* Smooth spectrogram w.r.t the previous run */
spect += alpha * (dft * conj(dft) - spect)
/* Update levels. This has to be performed element-wise */
new_N0 = 0
valid = 0
min_pwr = INFINITY
min_pwr_bin = -1
for (int i = 0; i < N; ++i)
/* Update current lower levels or raise them */
if (spect[i] < spmin[i]) then
spmin[i] = spect[i]
else
spmin[i] += beta * (spect[i] - spmin[i]);
end
/* Update current upper levels or decrease them */
if (spect[i] > spmax[i]) then
spmax[i] = spect[i]
else
spmax[i] += beta * (spect[i] - spmax[i]);
end
if (runs > min_runs) then
/* Use previous N0 estimation to detect outliers */
if (spmin[i] < N0 and N0 < spmax[i]) then
new_N0 += spect[i]
++valid
end
/* Update current minimum power */
if (spect[i] < min_pwr) then
min_pwr = spect[i]
min_pwr_bin = i
end
end
end
/*
* Check whether levels have stabilized and update noise
* estimation accordingly
*/
if (runs > min_runs) then
/*
* This is a key step: if the number of valid bins is
* 0 this means that our previous estimation was
* absolutely wrong. We reset it with a cruder estimation
* based on where the minimum value of the current
* spectrogram was found
*/
if (valid == 0) then
N0 = .5 * (spmin[min_pwr_bin] + spmax[min_pwr_bin])
else
N0 += gamma * (new_N0 / valid - N0)
end
/*
* Detect channels based on this threshold (trivial,
* not detailed; one possible realization is sketched after the pseudocode)
*/
detect_channels(spect, 10^(SNR / 10) * N0)
end
end
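The detect_channels step left out above could be realized, for instance, as follows (a sketch with my own naming; it simply groups consecutive bins whose smoothed PSD exceeds the threshold and reports each group as a channel, leaving the conversion of bin indices to center frequency and bandwidth to the caller):
#include <vector>

struct Channel { int first_bin; int last_bin; };   // convert to Hz using the bin width fs / N

std::vector<Channel> detect_channels(const std::vector<float>& spect, float threshold)
{
    std::vector<Channel> channels;
    int start = -1;
    for (int i = 0; i < (int)spect.size(); ++i) {
        if (spect[i] > threshold) {
            if (start < 0) start = i;              // a channel begins
        } else if (start >= 0) {
            channels.push_back({start, i - 1});    // the channel ends
            start = -1;
        }
    }
    if (start >= 0) channels.push_back({start, (int)spect.size() - 1});
    return channels;
}
It would be called as detect_channels(spect, pow(10.0f, SNR / 10.0f) * N0), matching the threshold used in the pseudocode.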
Even though this algorithm makes the strong assumption that the noise floor is flat (which is false in most cases, as in real-world radios the tuner output passes through a low-pass filter whose response is not flat), it works even if this condition doesn't hold. These are some of the algorithm's results for different values of alpha, N = 4096 and SNR = 3 dB. The noise estimation is marked in yellow, the channel threshold in green, the upper levels in red, the spectrogram in white and the lower levels in cyan. I also provide the evolution of the N0 estimation after every FFT run:
Results for alpha = 1e-1:
Results for alpha = 1e-2. Note how the number of valid bins has been reduced as the spectrogram got clearer:
Results for alpha = 1e-3. In this case, the levels are so tight and the noise floor so obviously non-flat that there are no valid bins from one FFT run to another. In this case we fall back to the crude estimation of looking for the bin with the lowest power density:
The min_runs calculation is critical. To prevent the noise level from drifting upwards (that is, from following a channel instead of the noise floor), we must wait at least 2. / alpha FFT runs before trusting the signal levels. This value was found experimentally: in my previous implementations I was intuitively using 1. / alpha, which failed miserably for alpha = 1e-3:
I haven't tested this yet on other scenarios (like burst transmissions) where this algorithm may not perform as well as with continuous channels because of the persistence of min/max levels, and it may fail to detect burst transmissions as outliers. However, given the kind of channels I'm working with, this is not a priority for now.

Unprojecting Screen coords to world in OpenGL es 2.0

Long time listener, first time caller.
So I have been playing around with the Android NDK and I'm at a point where I want to unproject a tap to world coordinates, but I can't make it work.
The problem is that the x and y values of both the near and far points are the same, which doesn't seem right for a perspective projection. Everything in the scene draws OK, so I'm a bit confused why it wouldn't unproject properly. Anyway, here is my code:
//x and y are the normalized screen coords
ndk_helper::Vec4 nearPoint = ndk_helper::Vec4(x, y, 1.f, 1.f);
ndk_helper::Vec4 farPoint = ndk_helper::Vec4(x, y, 1000.f, 1.f);
ndk_helper::Mat4 inverseProjView = this->matProjection * this->matView;
inverseProjView = inverseProjView.Inverse();
nearPoint = inverseProjView * nearPoint;
farPoint = inverseProjView * farPoint;
nearPoint = nearPoint *(1 / nearPoint.w_);
farPoint = farPoint *(1 / farPoint.w_);
Well, after looking at the vector/matrix math code in ndk_helper, this isn't a surprise. In short: don't use it. After scanning through it for a couple of minutes, it has some obvious mistakes that look like simple typos. And particularly the Vec4 class is mostly useless for the kind of vector operations you need for graphics. Most of the operations assume that a Vec4 is a vector in 4D space, not a vector containing homogeneous coordinates in 3D space.
If you want, you can check it out here, but be prepared for a few face palms:
https://android.googlesource.com/platform/development/+/master/ndk/sources/android/ndk_helper/vecmath.h
For example, this is the implementation of the multiplication used in the last two lines of your code:
Vec4 operator*( const float& rhs ) const
{
Vec4 ret;
ret.x_ = x_ * rhs;
ret.y_ = y_ * rhs;
ret.z_ = z_ * rhs;
ret.w_ = w_ * rhs;
return ret;
}
This multiplies a vector in 4D space by a scalar, but is completely wrong if you're operating with homogeneous coordinates. Which explains the results you are seeing.
I would suggest that you either write your own vector/matrix library that is suitable for graphics type operations, or use one of the freely available libraries that are tested, and used by others.
BTW, the specific values you are using for your test look somewhat odd. You definitely should not be getting the same results for the two vectors, but it's probably not what you had in mind anyway. For the z coordinate in your input vectors, you are using the distances of the near and far planes in eye coordinates. But then you apply the inverse view-projection matrix to those vectors, which transforms them back from clip/NDC space into world space. So your input vectors for this calculation should be in clip/NDC space, which means the z-coordinate values corresponding to the near/far plane should be at -1 and 1.
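For reference, here is a minimal sketch of the corrected unprojection using Eigen instead of ndk_helper (the function name and signature are mine; the essential points are the NDC z values of -1 and +1 and the divide by w after applying the inverse view-projection matrix):
#include <Eigen/Dense>

// x, y are normalized device coordinates in [-1, 1].
// Returns the points on the near and far planes in world space.
void unproject(float x, float y,
               const Eigen::Matrix4f& matProjection,
               const Eigen::Matrix4f& matView,
               Eigen::Vector3f& nearWorld,
               Eigen::Vector3f& farWorld)
{
    const Eigen::Matrix4f invProjView = (matProjection * matView).inverse();
    Eigen::Vector4f nearPoint = invProjView * Eigen::Vector4f(x, y, -1.f, 1.f); // near plane in NDC
    Eigen::Vector4f farPoint  = invProjView * Eigen::Vector4f(x, y,  1.f, 1.f); // far plane in NDC
    nearWorld = nearPoint.head<3>() / nearPoint.w();                            // perspective divide
    farWorld  = farPoint.head<3>()  / farPoint.w();
}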

Finding translation and scale on two sets of points to get least square error in their distance?

I have two sets of 3D points (original and reconstructed) and correspondence information about the pairs, i.e. which point from one set represents which point in the other. I need to find the 3D translation and scaling factor which transforms the reconstructed set so that the sum of squared distances is smallest (rotation would be nice too, but the points are rotated similarly, so this is not the main priority and might be omitted for the sake of simplicity and speed). And so my question is: is this solved and available somewhere on the Internet? Personally, I would use a least squares method, but I don't have much time (and although I'm somewhat good at math, I don't use it often, so it would be better for me to avoid it), so I would like to use someone else's solution if it exists. I prefer a solution in C++, for example using OpenCV, but the algorithm alone is good enough.
If there is no such solution, I will calculate it myself; I don't want to bother you too much.
SOLUTION: (from your answers)
For me it's the Kabsch algorithm.
Base info: http://en.wikipedia.org/wiki/Kabsch_algorithm
General solution: http://nghiaho.com/?page_id=671
STILL NOT SOLVED:
I also need scale. The scale values from the SVD are not understandable to me; when I need a scale of about 1-4 for all axes (estimated by me), the SVD scale is about [2000, 200, 20], which is not helping at all.
Since you are already using Kabsch algorithm, just have a look at Umeyama's paper which extends it to get scale. All you need to do is to get the standard deviation of your points and calculate scale as:
(1/sigma^2)*trace(D*S)
where D is the diagonal matrix from the SVD used in the rotation estimation and S is either the identity matrix or a diag(1, 1, -1) matrix, depending on the sign of the determinant of UV (which Kabsch uses to correct reflections into proper rotations). So if you have [2000, 200, 20], multiply the last element by +-1 (depending on the sign of the determinant of UV), sum them and divide by sigma^2, the variance of your points, to get the scale.
You can recycle the following code, which is using the Eigen library:
typedef Eigen::Matrix<double, 3, 1, Eigen::DontAlign> Vector3d_U; // microsoft's 32-bit compiler can't put Eigen::Vector3d inside a std::vector. for other compilers or for 64-bit, feel free to replace this by Eigen::Vector3d
/**
* @brief rigidly aligns two sets of poses
*
* This calculates such a relative pose <tt>R, t</tt>, such that:
*
* @code
* _TyVector v_pose = R * r_vertices[i] + t;
* double f_error = (r_tar_vertices[i] - v_pose).squaredNorm();
* @endcode
*
* The sum of squared errors in <tt>f_error</tt> for each <tt>i</tt> is minimized.
*
* @param[in] r_vertices is a set of vertices to be aligned
* @param[in] r_tar_vertices is a set of vertices to align to
*
* @return Returns a relative pose that rigidly aligns the two given sets of poses.
*
* @note This requires the two sets of poses to have the corresponding vertices stored under the same index.
*/
static std::pair<Eigen::Matrix3d, Eigen::Vector3d> t_Align_Points(
const std::vector<Vector3d_U> &r_vertices, const std::vector<Vector3d_U> &r_tar_vertices)
{
_ASSERTE(r_tar_vertices.size() == r_vertices.size());
const size_t n = r_vertices.size();
Eigen::Vector3d v_center_tar3 = Eigen::Vector3d::Zero(), v_center3 = Eigen::Vector3d::Zero();
for(size_t i = 0; i < n; ++ i) {
v_center_tar3 += r_tar_vertices[i];
v_center3 += r_vertices[i];
}
v_center_tar3 /= double(n);
v_center3 /= double(n);
// calculate centers of positions, potentially extend to 3D
double f_sd2_tar = 0, f_sd2 = 0; // only one of those is really needed
Eigen::Matrix3d t_cov = Eigen::Matrix3d::Zero();
for(size_t i = 0; i < n; ++ i) {
Eigen::Vector3d v_vert_i_tar = r_tar_vertices[i] - v_center_tar3;
Eigen::Vector3d v_vert_i = r_vertices[i] - v_center3;
// get both vertices
f_sd2 += v_vert_i.squaredNorm();
f_sd2_tar += v_vert_i_tar.squaredNorm();
// accumulate squared standard deviation (only one of those is really needed)
t_cov.noalias() += v_vert_i * v_vert_i_tar.transpose();
// accumulate covariance
}
// calculate the covariance matrix
Eigen::JacobiSVD<Eigen::Matrix3d> svd(t_cov, Eigen::ComputeFullU | Eigen::ComputeFullV);
// calculate the SVD
Eigen::Matrix3d R = svd.matrixV() * svd.matrixU().transpose();
// compute the rotation
double f_det = R.determinant();
Eigen::Vector3d e(1, 1, (f_det < 0)? -1 : 1);
// calculate determinant of V*U^T to disambiguate rotation sign
if(f_det < 0)
R.noalias() = svd.matrixV() * e.asDiagonal() * svd.matrixU().transpose();
// recompute the rotation part if the determinant was negative
R = Eigen::Quaterniond(R).normalized().toRotationMatrix();
// renormalize the rotation (not needed but gives slightly more orthogonal transformations)
double f_scale = svd.singularValues().dot(e) / f_sd2_tar;
double f_inv_scale = svd.singularValues().dot(e) / f_sd2; // only one of those is needed
// calculate the scale
R *= f_inv_scale;
// apply scale
Eigen::Vector3d t = v_center_tar3 - (R * v_center3); // R needs to contain scale here, otherwise the translation is wrong
// want to align center with ground truth
return std::make_pair(R, t); // or put it in a single 4x4 matrix if you like
}
For 3D points the problem is known as the Absolute Orientation problem. A C++ implementation is available in Eigen, http://eigen.tuxfamily.org/dox/group__Geometry__Module.html#gab3f5a82a24490b936f8694cf8fef8e60, and the paper is at http://web.stanford.edu/class/cs273/refs/umeyama.pdf.
You can use it from OpenCV by converting the matrices to Eigen with cv::cv2eigen() calls.
Start by translating both sets of points so that their centroids coincide with the origin of the coordinate system. The translation vector is just the difference between these centroids.
Now we have two sets of coordinates represented as matrices P and Q. One set of points may be obtained from the other one by applying some linear operator (which performs both scaling and rotation). This operator is represented by a 3x3 matrix X:
P * X = Q
To find proper scale/rotation we just need to solve this matrix equation, find X, then decompose it into several matrices, each representing some scaling or rotation.
A simple (but probably not numerically stable) way to solve it is to multiply both sides of the equation by the transpose of P (to get rid of the non-square matrices), then multiply both sides by the inverse of P^T * P:
P^T * P * X = P^T * Q
X = (P^T * P)^-1 * P^T * Q
Applying Singular value decomposition to matrix X gives two rotation matrices and a matrix with scale factors:
X = U * S * V
Here S is a diagonal matrix with scale factors (one scale for each coordinate), U and V are rotation matrices, one properly rotates the points so that they may be scaled along the coordinate axes, other one rotates them once more to align their orientation to second set of points.
Example (2D points are used for simplicity):
P = 1 2 Q = 7.5391 4.3455
2 3 12.9796 5.8897
-2 1 -4.5847 5.3159
-1 -6 -15.9340 -15.5511
After solving the equation:
X = 3.3417 -1.2573
2.0987 2.8014
After SVD decomposition:
U = -0.7317 -0.6816
-0.6816 0.7317
S = 4 0
0 3
V = -0.9689 -0.2474
-0.2474 0.9689
Here SVD has properly reconstructed all manipulations I performed on matrix P to get matrix Q: rotate by the angle 0.75, scale X axis by 4, scale Y axis by 3, rotate by the angle -0.25.
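To make this concrete, here is a small Eigen sketch using the 2D example data above (the use of Eigen and of an LDLT solve for the normal equations are my choices, not part of the original answer):
#include <Eigen/Dense>
#include <iostream>

int main()
{
    // One centered point per row, corresponding row by row (the example data above).
    Eigen::MatrixXd P(4, 2), Q(4, 2);
    P <<  1,  2,
          2,  3,
         -2,  1,
         -1, -6;
    Q <<   7.5391,   4.3455,
          12.9796,   5.8897,
          -4.5847,   5.3159,
         -15.9340, -15.5511;

    // Normal-equations solution of P * X = Q (simple, but not the most stable numerically).
    Eigen::MatrixXd X = (P.transpose() * P).ldlt().solve(P.transpose() * Q);

    // The SVD of X separates the per-axis scale factors from the two rotations.
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(X, Eigen::ComputeFullU | Eigen::ComputeFullV);
    std::cout << "X =\n" << X << "\n";
    std::cout << "scales = " << svd.singularValues().transpose() << "\n";
}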
If the sets of points are scaled uniformly (the scale factor is equal along each axis), this procedure may be significantly simplified.
Just use the Kabsch algorithm to get the translation/rotation values. Then apply this translation and rotation (the centroids should coincide with the origin of the coordinate system). Then, over all pairs of points (and all coordinates), estimate a linear regression; the linear regression coefficient is exactly the scale factor.
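That regression step could look like this (a sketch; it assumes the points have already been centered and the reconstructed set rotated with Kabsch, and it pools all coordinates of all pairs into a single regression through the origin):
#include <Eigen/Dense>
#include <vector>

// Least-squares slope through the origin of target vs. rotated source coordinates:
// the uniform scale that best maps 'rotated' onto 'target'.
double regression_scale(const std::vector<Eigen::Vector3d>& rotated,
                        const std::vector<Eigen::Vector3d>& target)
{
    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < rotated.size(); ++i) {
        num += rotated[i].dot(target[i]);
        den += rotated[i].dot(rotated[i]);
    }
    return num / den;
}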
A good explanation: Finding optimal rotation and translation between corresponding 3D points.
The code is in MATLAB, but it's trivial to convert to OpenCV using the cv::SVD function.
You might want to try ICP (Iterative closest point).
Given two sets of 3d points, it will tell you the transformation (rotation + translation) to go from the first set to the second one.
If you're interested in a lightweight C++ implementation, try libicp.
Good luck!
The general transformation, as well as the scale, can be retrieved via Procrustes analysis. It works by superimposing the objects on top of each other and estimating the transformation from that setting. It has been used in the context of ICP many times. In fact, your preference, the Kabsch algorithm, is a special case of this.
Moreover, Horn's alignment algorithm (based on quaternions) also finds a very good solution, while being quite efficient. A MATLAB implementation is also available.
Scale can be inferred without SVD if your points are uniformly scaled in all directions (I could not make sense of SVD's scale matrix either). Here is how I solved the same problem:
Measure distances of each point to other points in the point cloud to get a 2d table of distances, where entry at (i,j) is norm(point_i-point_j). Do the same thing for the other point cloud, so you get two tables -- one for original and the other for reconstructed points.
Divide all values in one table by the corresponding values in the other table. Because the points correspond to each other, the distances do too. Ideally, the resulting table would have all values equal to each other, and that common value is the scale.
The median value of the divisions should be pretty close to the scale you are looking for. The mean value is also close, but I chose median just to exclude outliers.
Now you can use the scale value to scale all the reconstructed points and then proceed to estimating the rotation.
Tip: If there are too many points in the point clouds to find distances between all of them, then a smaller subset of distances will work, too, as long as it is the same subset for both point clouds. Ideally, just one distance pair would work if there is no measurement noise, e.g when one point cloud is directly derived from the other by just rotating it.
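A small sketch of this median-of-ratios estimate (the function name and the use of Eigen vectors are mine):
#include <Eigen/Dense>
#include <algorithm>
#include <vector>

// Estimates the uniform scale between two corresponding point clouds as the
// median ratio of pairwise distances.
double estimate_scale(const std::vector<Eigen::Vector3d>& original,
                      const std::vector<Eigen::Vector3d>& reconstructed)
{
    std::vector<double> ratios;
    for (size_t i = 0; i < original.size(); ++i)
        for (size_t j = i + 1; j < original.size(); ++j) {
            const double d_orig = (original[i] - original[j]).norm();
            const double d_rec  = (reconstructed[i] - reconstructed[j]).norm();
            if (d_rec > 0.0)
                ratios.push_back(d_orig / d_rec);  // scale that maps reconstructed onto original
        }
    std::nth_element(ratios.begin(), ratios.begin() + ratios.size() / 2, ratios.end());
    return ratios[ratios.size() / 2];              // the median is robust to outliers
}
For large clouds, restrict the loops to a subset of the pairs, as suggested in the tip above.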
You can also use ScaleRatio ICP, proposed by BaoweiLin.
The code can be found on GitHub.

How do I rotate in object space in 3D (using Matrixes)

What I am trying to do is set up functions that can perform global- and object-space rotations, but I am having trouble understanding how to go about object-space rotations, as just multiplying a point by the rotation only works for global space. My idea was to build the rotation in object space, then multiply it by the inverse of the object's matrix, supposedly taking away all the excess rotation between object and global space and thus keeping the object-space rotation, but in global values. I was wrong in this logic, as it did not work. Here is my code, if you want to inspect it; all the functions it calls have been tested to work:
// build object space rotation
sf::Vector3<float> XMatrix (MultiplyByMatrix(sf::Vector3<float> (cosz,sinz,0)));
sf::Vector3<float> YMatrix (MultiplyByMatrix(sf::Vector3<float> (-sinz,cosz,0)));
sf::Vector3<float> ZMatrix (MultiplyByMatrix(sf::Vector3<float> (0,0,1)));
// build cofactor matrix
sf::Vector3<float> InverseMatrix[3];
CoFactor(InverseMatrix);
// multiply by the transpose of the cofactor matrix(the adjoint), to bring the rotation to world space coordinates
sf::Vector3<float> RelativeXMatrix = MultiplyByTranspose(XMatrix, InverseMatrix[0], InverseMatrix[1], InverseMatrix[2]);
sf::Vector3<float> RelativeYMatrix = MultiplyByTranspose(YMatrix, InverseMatrix[0], InverseMatrix[1], InverseMatrix[2]);
sf::Vector3<float> RelativeZMatrix = MultiplyByTranspose(ZMatrix, InverseMatrix[0], InverseMatrix[1], InverseMatrix[2]);
// perform the rotation from world space
PointsPlusMatrix(RelativeXMatrix, RelativeYMatrix, RelativeZMatrix);
The difference between rotation in world-space and object-space is where you apply the rotation matrix.
The usual way computer graphics uses matrices is to map vertex points:
from object-space, (multiply by MODEL matrix to transform)
into world-space, (then multiply by VIEW matrix to transform)
into camera-space, (then multiply by PROJECTION matrix to transform)
into projection-, or "clip"- space
Specifically, suppose points are represented as column vectors; then, you transform a point by left-multiplying it by a transformation matrix:
world_point = MODEL * model_point
camera_point = VIEW * world_point = (VIEW*MODEL) * model_point
clip_point = PROJECTION * camera_point = (PROJECTION*VIEW*MODEL) * model_point
Each of these transformation matrices may itself be the result of multiple matrices multiplied in sequence. In particular, the MODEL matrix is often composed of a sequence of rotations, translations, and scalings, based on a hierarchical articulated model, e.g.:
MODEL = STAGE_2_WORLD * BODY_2_STAGE *
SHOULDER_2_BODY * UPPERARM_2_SHOULDER *
FOREARM_2_UPPERARM * HAND_2_FOREARM
So, whether you are rotating in model-space or world-space depends on which side of the MODEL matrix you apply your rotation matrix. Of course, you can easily do both:
MODEL = WORLD_ROTATION * OLD_MODEL * OBJECT_ROTATION
In this case, WORLD_ROTATION rotates about the center of world-space, while OBJECT_ROTATION rotates about the center of object-space.
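As a small illustration of that last formula, here is a sketch using Eigen (the angles, axes and names are arbitrary, chosen only to show where each rotation is applied):
#include <Eigen/Dense>
#include <iostream>

int main()
{
    using namespace Eigen;

    // Some existing model matrix (object space -> world space): OLD_MODEL.
    Affine3f model = Affine3f::Identity();
    model.translate(Vector3f(1.f, 2.f, 3.f)).rotate(AngleAxisf(0.3f, Vector3f::UnitY()));

    AngleAxisf worldRotation(0.5f, Vector3f::UnitZ());
    AngleAxisf objectRotation(0.5f, Vector3f::UnitZ());

    // MODEL = WORLD_ROTATION * OLD_MODEL * OBJECT_ROTATION
    model.prerotate(worldRotation)  // left-multiply: rotates about the world-space origin and axes
         .rotate(objectRotation);   // right-multiply: rotates about the object's own origin and axes

    std::cout << model.matrix() << "\n";
}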

Resources