Translate - Rotate - Translate Back using XMMATRIX

If all I know is an object's World matrix (because its x/y/z position is not tracked, which would be easier), how do I go about rotating it around its center?
If I knew the location, it'd be about as simple as something like this:
XMMATRIX world = pMissile->GetWorldMatrix();
XMMATRIX matrixTranslation = XMMatrixTranslationFromVector(pMissile->GetPosition());
XMMATRIX matrixInvTranslations = XMMatrixInverse(nullptr, matrixTranslation);
float rotationAmount = (60 * XMConvertToRadians((float)fElapsedTime / 2.0f));
XMMATRIX missileWorld = world
    * matrixInvTranslations
    * XMMatrixRotationX(rotationAmount)
    * XMMatrixRotationY(rotationAmount)
    * XMMatrixRotationZ(rotationAmount)
    * matrixTranslation;
pMissile->SetWorldMatrix(missileWorld);
Unfortunately, since I don't know the position, I'm not sure what to do. Basically I need to be able to get the "Translate back to the origin" from just the world matrix. Before I start pulling elements out of the matrix, there must be a DirectX or DirectXTK function to do this, no?
Currently I'm decomposing the matrix to get it:
XMVECTOR vectorTranslation, vectorScale, rotationQuat;
XMMatrixDecompose(&vectorScale, &rotationQuat, &vectorTranslation, world);
If that's the right/best way, let me know!
Somewhat tangentially, as you can see I use an inverse of the translation to "move it back" to where it was originally before I translated it to the origin for rotation. A lot of samples skip this - is there something I'm missing in that you don't -need- to translate back at the end?

XMMatrixDecompose is the correct, fully general way to get the elements of an arbitrary transformation matrix. The computation is expensive, so most folks make assumptions about what's in the matrix, because they control it at all points. For example, avoiding non-uniform scaling can really simplify things.
Many games exclusively use rotation and translation, and avoid scaling or at least avoid non-uniform scaling. You can quickly compute the inverse of such a matrix by transposing the upper 3x3 elements and replacing the last row's x, y, and z with the negated translation rotated by that transposed 3x3 (for rotation R and translation t, the inverse has R^T in the upper 3x3 and -t*R^T in the last row).
If you know your matrix only contains a rotation and translation, and never contains scale, then the rotation matrix is just the upper 3x3 elements. As long as your matrix is homogeneous (i.e. the last column is [0 0 0 1]), you can just read the translation out of the last row: world.r[3] should be (x, y, z, 1).
If you are new to DirectXMath, you should consider using the SimpleMath wrapper in the DirectX Tool Kit. It handles the alignment complexities a bit more automatically, and includes handy helpers like Matrix::Translation which just extracts the equivalent world.r[3] x, y, and z.
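For reference, here is a minimal sketch (my own, not from the answer) of rotating about the object's own center without a full decompose, assuming the world matrix is affine with the translation in r[3] and no non-uniform scale:
#include <DirectXMath.h>
using namespace DirectX;

XMMATRIX RotateAboutOwnCenter(FXMMATRIX world, float rotationAmount)
{
    // For an affine row-major matrix, the translation lives in the last row.
    XMVECTOR position = world.r[3];

    XMMATRIX toOrigin   = XMMatrixTranslationFromVector(XMVectorNegate(position));
    XMMATRIX fromOrigin = XMMatrixTranslationFromVector(position);

    XMMATRIX rotation = XMMatrixRotationX(rotationAmount)
                      * XMMatrixRotationY(rotationAmount)
                      * XMMatrixRotationZ(rotationAmount);

    // Same order as the question's code: move to the origin, rotate, move back.
    return world * toOrigin * rotation * fromOrigin;
}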


How to use raw gyroscope data (°/s) for calculating 3D rotation?

My question may seem trivial, but the more I read about it, the more confused I get... I have started a little project where I want to roughly track the movements of a rotating object (a basketball, to be precise).
I have a 3-axis accelerometer (low-pass-filtered) and a 3-axis gyroscope measuring °/s.
I know about the issues of a gyro, but as the measurements will only last several seconds and the angles tend to be huge, I don't care about drift and gimbal lock right now.
My gyro gives me the rotation speed of all 3 axes. As I want to integrate the acceleration twice to get the position at each timestep, I wanted to convert the sensor's coordinate system into an earthbound system.
For the first try, I want to keep things simple, so I decided to go with the big standard rotation matrix.
But as my results are horrible, I wonder if this is the right way to do it. If I understood correctly, the matrix is simply 3 matrices multiplied in a certain order. As the rotation of a basketball doesn't have any "natural" order, this may not be a good idea. My sensor measures 3 angular velocities at once. If I throw them into my system "step by step", it will not be correct, since my second matrix calculates the rotation around the "new y-axis", but my sensor actually measured an angular velocity around the "old y-axis". Is that correct so far?
So how can I correctly calculate the 3D rotation?
Do I need to go for quaternions? But how do I get one from 3 different rotations? And don't I have the same issue here again?
I start with an identity matrix ((1, 0, 0)(0, 1, 0)(0, 0, 1)) multiplied with the acceleration vector to give me the first movement.
Then I want to use the rotation matrix to find out where the next acceleration is really heading, so I can simply add the accelerations together.
But right now I am just too confused to find a proper way.
Any suggestions?
btw. sorry for my poor english, I am tired and (obviously) not a native speaker ;)
Thanks,
Alex
Short answer
Yes, go for quaternions and use a first-order linearization of the rotation to calculate how the orientation changes. This reduces to the following pseudocode:
float pose_initial[4]; // quaternion describing original orientation
float g_x, g_y, g_z; // gyro rates
float dt; // time step. The smaller the better.
// quaternion with "pose increment", calculated from the first-order
// linearization of continuous rotation formula
delta_quat = {1, 0.5*dt*g_x, 0.5*dt*g_y, 0.5*dt*g_z};
// final orientation at start time + dt
pose_final = quaternion_hamilton_product(pose_initial, delta_quat);
This solution is used in PixHawk's EKF navigation filter (it is open source; check out the formulation here). It is simple, cheap, stable, and accurate enough.
The identity matrix (describing a "null" rotation) is equivalent to the quaternion [1 0 0 0]. You can get the quaternion describing other poses using a suitable conversion formula (for example, if you have Euler angles you can go for this one).
Notes:
Quaternions follow the [w, i, j, k] notation.
These equations assume angular speeds in SI units, that is, radians per second.
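For reference, a minimal C++ sketch of this update (my own illustration, not the PixHawk code), using the same [w, x, y, z] convention and rates in rad/s:
#include <cmath>

struct Quat { float w, x, y, z; };

// Hamilton product a * b
Quat QuatMultiply(const Quat& a, const Quat& b)
{
    return {
        a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z,
        a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y,
        a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x,
        a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w
    };
}

Quat QuatNormalize(Quat q)
{
    float n = std::sqrt(q.w*q.w + q.x*q.x + q.y*q.y + q.z*q.z);
    return { q.w/n, q.x/n, q.y/n, q.z/n };
}

// One integration step: orientation at t + dt given gyro rates g_x, g_y, g_z.
Quat Integrate(const Quat& pose, float g_x, float g_y, float g_z, float dt)
{
    Quat delta = { 1.0f, 0.5f*dt*g_x, 0.5f*dt*g_y, 0.5f*dt*g_z };
    // Renormalizing counters the error of the first-order approximation.
    return QuatNormalize(QuatMultiply(pose, delta));
}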
Long answer
A gyroscope describes the rotational speed of an object as a decomposition into three rotational speeds around the orthogonal local axes XYZ. However, you could equivalently describe the rotational speed as a single rate around a certain axis, either in a reference system that is local to the rotated body or in a global one.
The three rotational speeds affect the body simultaneously, continuously changing the rotation axis.
Here we have the problem of switching from the continuous-time real world to a simpler discrete-time formulation that can be easily solved using a computer. When discretizing, we are always going to introduce errors. Some approaches will lead to bigger errors, while others will be notably more accurate.
Your approach of concatenating three simultaneous rotations around orthogonal axes works reasonably well with small integration steps (let's say smaller than 1/1000 s, although it depends on the application), so that you are simulating the continuous change of rotation axis. However, this is computationally expensive, and error grows as you make time steps bigger.
As an alternative to first-order linearization, you can calculate the pose increment from the gradient induced by the angular speed (also using the quaternion representation):
quat_gyro = {0, g_x, g_y, g_z};
q_grad = 0.5 * quaternion_product(pose_initial, quat_gyro);
// Important to normalize result to get unit quaternion!
pose_final = quaternion_normalize(pose_initial + q_grad*dt);
This technique is used in the Madgwick orientation filter (there is an implementation here), and works pretty well for me.

Any faster method to move things in a circle?

Currently I'm using Math.cos and Math.sin to move objects in a circle in my game. However, after reading a bit about it, I suspect it's slow (I haven't done proper tests yet, though).
Are there any ways to calculate this faster? I've been reading that one alternative could be to have a sort of hash table with stored pre-calculated results, like people used in the old days before the computer age.
Any input is appreciated.
Expanding on my comment, if you don't have any angular acceleration (the angular velocity stays constant -- this is a requirement for the object to remain traveling in a circle with constant radius without changing the center-pointing force, e.g. via tension in a string), then you can use the following strategy:
1) Compute B = angular_velocity * time_step_size. This is how much angle change the object needs to go through in a single time step.
2) Compute sinb = sin(B) and cosb = cos(B).
3) Note that we want to change the angle from A to A+B (the object is going counterclockwise). In this derivation, the center of the circle we're orbiting is at the origin.
Since the radius of the circle is constant, we know that
r*sin(A+B) = y_new = r*sin(A)*cos(B) + r*cos(A)*sin(B) = y_old*cos(B) + x_old*sin(B)
r*cos(A+B) = x_new = r*cos(A)*cos(B) - r*sin(A)*sin(B) = x_old*cos(B) - y_old*sin(B)
We've removed the cosine and sine of anything we don't already know, so the Cartesian coordinates can be written as
x_new = x_old*cosb - y_old*sinb
y_new = x_old*sinb + y_old*cosb
No more cos or sin calls except in an initialization step which is called once. Obviously, this won't save you anything if B keeps changing for whatever reason (either angular velocity or time step size changes).
You'll notice this is the same as multiplying the position vector by a fixed rotation matrix. You can translate by the circle center and translate back if you don't want to only consider circles with a center at the origin.
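Putting the scheme together, a minimal C++ sketch (the struct and names are mine, not the answerer's):
#include <cmath>

struct CircularMotion
{
    double cx, cy;      // circle center
    double x, y;        // current position
    double cosb, sinb;  // precomputed rotation for one time step

    CircularMotion(double cx_, double cy_, double radius,
                   double angular_velocity, double dt)
        : cx(cx_), cy(cy_), x(cx_ + radius), y(cy_),
          cosb(std::cos(angular_velocity * dt)),
          sinb(std::sin(angular_velocity * dt)) {}

    void Step()
    {
        // Translate to the center, apply the fixed rotation, translate back.
        double rx = x - cx, ry = y - cy;
        double rx_new = rx * cosb - ry * sinb;
        double ry_new = rx * sinb + ry * cosb;
        x = cx + rx_new;
        y = cy + ry_new;
    }
};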
First Edit
As #user5428643 mentions, this method is numerically unstable over time due to drift in the radius. You can probably correct this by periodically renormalizing x and y (x_new = x_old * r_const / sqrt(x_old^2 + y_old^2) and similarly for y every few thousand steps -- if you implement this, save the factor r_const / sqrt(x_old^2 + y_old^2) since it is the same for both x and y). I'll think about it some more and edit this answer if I come up with a better fix.
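A small sketch of that correction (assuming coordinates taken relative to the circle center):
#include <cmath>

// Pull the point back onto the circle of radius r_const; call this only
// every few thousand steps.
void Renormalize(double& x, double& y, double r_const)
{
    double scale = r_const / std::sqrt(x*x + y*y); // same factor for x and y
    x *= scale;
    y *= scale;
}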
Second Edit
Some more comments on the numerical drift over time:
I did a couple of tests in C++ and Python. In C++ using single-precision floats, there is sizable drift even after 1 million time steps when B = 0.1 (I used a circle with radius 1). In double precision, I didn't notice any drift visually after 100 million steps, but checking the radius shows that it is contaminated in the lower few digits.
Doing the renormalization on every step (which is unnecessary if you're just doing visualization) results in an approximately 4 times slower running time versus the drifty version. However, even this version is about 2-3 times faster than using sin and cos on every iteration. I used full optimization (-O3) in g++.
In Python (using the math package) I only got a speedup of 2 between the drifty and normalized versions; the sin and cos version actually slots in between these two, almost exactly halfway in terms of run time. Renormalizing only once every few thousand steps would still make this faster, but it's not nearly as big a difference as my C++ version would indicate.
I didn't do too much scientific testing to get the timings, just a few tests with 1 million to 1 billion steps in increments of 10.
Sorry, not enough rep to comment.
The answers by #neocpp and #oliveryas01 would both be perfectly correct without roundoff error.
The answer by #oliveryas01, just using sine and cosine directly, and precalculating and storing many values if necessary, will work fine.
However, #neocpp's answer, repeatedly rotating by small angles using a rotation matrix, is numerically unstable; over time, the roundoff error in the radius will tend to grow exponentially, so if you run your programme for a long time the objects will slowly move off the circle, spiralling either inwards or outwards.
You can see this mathematically with a little numerical analysis: at each stage, the squared radius is approximately multiplied by a number which is approximately constant and approximately equal to 1, but almost certainly not exactly equal to 1 due to inexactness of floating point representations.
Of course, if you're using double precision numbers and only trying to achieve a simple visual effect, this error may not be large enough to matter to you.
I would stick with sine and cosine if I were you. They're the most efficient way to do what you're trying to do. If you really want maximum performance then you should generate an array of x and y values from the sine and cosine values, then plug that array's values into the circle's position. This way, you aren't running sine and cosine repeatedly, instead only for one cycle.
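A minimal sketch of that precomputation idea (the table layout and names are mine, not the answerer's):
#include <cmath>
#include <vector>

struct Point { float x, y; };

// Precompute one full revolution of positions around (cx, cy).
std::vector<Point> BuildCircleTable(float cx, float cy, float radius, int steps)
{
    std::vector<Point> table(steps);
    for (int i = 0; i < steps; ++i)
    {
        float angle = 2.0f * 3.14159265f * i / steps;
        table[i] = { cx + radius * std::cos(angle), cy + radius * std::sin(angle) };
    }
    return table;
}

// Per frame: position = table[frame % steps]; no sin/cos calls at runtime.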
Another possibility, completely avoiding the trig functions, would be to use a polar-coordinate model, where you set the distance and angle. For example, you could set the x coordinate to be the distance, and the rotation to be the angle, as in...
var gameBoardPin:Sprite = new Sprite();
var gameEntity:Sprite = new YourGameEntityHere();
gameBoardPin.addChild( gameEntity );
...and in your loop...
// move gameEntity relative to the center of gameBoardPin
gameEntity.x = circleRadius;
// rotate gameBoardPin from its center causes gameEntity to rotate at the circleRadius
gameBoardPin.rotation = desiredAngleForMovingObject
gameBoardPin's x,y coordinates would be set to the center of rotation for gameEntity. So, if you wanted the gameEntity to rotate with a 100 pixel tether around the center of the stage, you might...
gameBoardPin.x = stage.stageWidth / 2;
gameBoardPin.y = stage.stageHeight / 2;
gameEntity.x = 100;
...and then in the loop you might...
desiredAngleForMovingObject += 2;
gameBoardPin.rotation = desiredAngleForMovingObject
With this method you're using degrees instead of radians.

How to tilt-compensate my magnetometer? Tried a lot

I'm trying to tilt-compensate a magnetometer (BMX055) reading and have tried various approaches I found online; not a single one works.
I actually tried almost every result I found on Google.
I run this on an AVR; it would be extra awesome to find something that works without complex functions (trigonometry etc.) for angles up to 50 degrees.
I have a fused gravity vector (int16 signed in a float) from gyro+acc (1g gravity=16k).
attitude.vect_mag.x/y/z is a float but contains a 16bit integer ranging from around -250 to +250 per axis.
Currently I try this code:
float rollRadians = attitude.roll * DEG_TO_RAD / 10;
float pitchRadians = attitude.pitch * DEG_TO_RAD / 10;
float cosRoll = cos(rollRadians);
float sinRoll = sin(rollRadians);
float cosPitch = cos(pitchRadians);
float sinPitch = sin(pitchRadians);
float Xh = attitude.vect_mag.x * cosPitch + attitude.vect_mag.z * sinPitch;
float Yh = attitude.vect_mag.x * sinRoll * sinPitch + attitude.vect_mag.y * cosRoll - attitude.vect_mag.z *sinRoll * cosPitch;
float heading = atan2(Yh, Xh);
attitude.yaw = heading*RAD_TO_DEG;
The result is meaningless, but the values without tilt compensation are correct.
The uncompensated formula:
atan2(attitude.vect_mag.y,attitude.vect_mag.x);
works fine (when not tilted)
I am sort of clueless about what is going wrong: the normal atan2 returns a good result (when balanced), but using the widespread formulas for tilt compensation completely fails.
Do I have to keep the mag vector values within a specific range for the trigonometry to work ?
Any way to do the compensation without trig functions ?
I'd be glad for some help.
Update:
I found that the BMX055 magnetometer has X and Y inverted, and the Y axis is also multiplied by -1.
The sin/cos functions now seem to lead to a better result.
I am trying to implement the suggested vector algorithms, struggling so far :)
Let us see.
(First, forgive me a bit of style nagging. The keyword volatile means that the variable may change even if we do not change it ourselves in our code. This may happen with a memory position that is written by another process (interrupt request in AVR context). For the compiler volatile means that the variable has to be always loaded and stored into memory when used. See:
http://en.wikipedia.org/wiki/Volatile_variable
So, most likely you do not want to have any attributes to your floats.)
Your input:
three 12-bit (11 bits + sign) integers representing accelerometer data
three approximately 9-bit (8 bits + sign) integers representing the magnetic field
The good news (well...) is that your resolution is not that big, so you can use integer arithmetic, which is much faster. The bad news is that there is no simple magical one-liner which would solve your problem.
First of all, what would you like to have as the compass bearing when the device is tilted? Should the device act as if it was not tilted, or should it actually show the correct projection of the magnetic field lines on the screen? The latter is how an ordinary compass acts (if the needle moves at all when tilted). In that case you should not compensate for anything, and the device can show the fancy vertical tilt of the magnetic lines when rolled sideways.
In any case, try to avoid trigonometry; it takes a lot of code space and time. Vector arithmetic is much simpler, and most of the time you can make do with multiplies and adds.
Let us try to define your problem in vector terms. Actually you have two space vectors to start with, m pointing to the direction of the magnetic field, g to the direction of gravity. If I have understood your intention correctly, you need to have vector d which points along some fixed direction in the device. (If I think of a mobile phone, d would be a vector parallel to the screen left or right edges.)
With vector mathematics this looks rather simple:
g is a normal to a horizontal (truly horizontal) plane
the projection of m on this plane defines the direction a horizontal compass would show
the projection of d on the plane defines the "north" on the compass face
the angle between m and d gives the compass bearing
Now that we are not interested in the magnitude of the magnetic field, we can scale everything as we want. This reduces the need to use unit vectors, which are expensive to calculate.
So, the maths will be something along these lines:
# projection of m on g (. represents dot product)
mp := m - g (m.g) / (g.g)
# projection of d on g
dp := d - g (d.g) / (g.g)
# angle between mp and dp
cos2 := (mp.dp)^2 / (mp.mp * dp.dp)
sgn1 := sign(mp.dp)
# create a vector rotated 90 degrees from dp in the plane defined by g (x is cross product)
drot := dp x g
sin2 := (mp.drot)^2 / (mp.mp * drot.drot)
sgn2 := sign(mp.drot)
After this you will have a sin^2 and cos^2 of the compass directions. You need to create a resolving function for one quadrant and then determine the correct quadrant by using the signs. The resolving function may sound difficult, but actually you just need to create a table lookup function for sin2/cos2 or cos2/sin2 (whichever is smaller). It is relatively fast, and only a few points are required in the lookup (with bilinear approximation even fewer).
So, as you can see, there are no trig functions around, and not even square roots. Vector dots and crosses are just multiplies. The only slightly challenging trick is to scale the fixed-point arithmetic to the correct scale in each calculation.
You might notice that there is a lot of room for optimization, as the same values are used several times. The first step is to get the algorithm run on a PC with floating point with the correct results. The optimizations come later.
(Sorry, I am not going to write the actual code here, but if there is something that needs clarifying, I'll be glad to help.)
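As a starting point for the floating-point PC prototype suggested above, here is a hedged sketch of the vector steps (my own code, not the answerer's; it uses atan2 for the final step, which the AVR version would replace with the sin2/cos2 table lookup described earlier):
#include <cmath>

struct Vec3 { float x, y, z; };

float Dot(const Vec3& a, const Vec3& b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }
Vec3  Sub(const Vec3& a, const Vec3& b)   { return { a.x-b.x, a.y-b.y, a.z-b.z }; }
Vec3  Scale(const Vec3& a, float s)       { return { a.x*s, a.y*s, a.z*s }; }
Vec3  Cross(const Vec3& a, const Vec3& b)
{
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}

// m: magnetic field vector, g: gravity vector, d: fixed device "north" axis.
float CompassBearing(const Vec3& m, const Vec3& g, const Vec3& d)
{
    // Project m and d onto the horizontal plane whose normal is g.
    Vec3 mp = Sub(m, Scale(g, Dot(m, g) / Dot(g, g)));
    Vec3 dp = Sub(d, Scale(g, Dot(d, g) / Dot(g, g)));

    // dp rotated 90 degrees within that plane (scaled by |g|).
    Vec3 drot = Cross(dp, g);

    // Angle between mp and dp; the sign convention depends on axis orientation.
    return std::atan2(Dot(mp, drot) / std::sqrt(Dot(g, g)), Dot(mp, dp));
}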

Path Tracing algorithm - Need help understanding key point

So the Wikipedia page for path tracing (http://en.wikipedia.org/wiki/Path_tracing) contains a naive implementation of the algorithm with the following explanation underneath:
"All these samples must then be averaged to obtain the output color. Note this method of always sampling a random ray in the normal's hemisphere only works well for perfectly diffuse surfaces. For other materials, one generally has to use importance-sampling, i.e. probabilistically select a new ray according to the BRDF's distribution. For instance, a perfectly specular (mirror) material would not work with the method above, as the probability of the new ray being the correct reflected ray - which is the only ray through which any radiance will be reflected - is zero. In these situations, one must divide the reflectance by the probability density function of the sampling scheme, as per Monte-Carlo integration (in the naive case above, there is no particular sampling scheme, so the PDF turns out to be 1)."
The part I'm having trouble understanding is the part in bold. I am familiar with PDFs but I am not quite sure how they fit into here. If we stick to the mirror example, what would be the PDF value we would divide by? Why? How would I go about finding the PDF value to divide by if I was using an arbitrary BRDF value such as a Phong reflection model or Cook-Torrance reflection model, etc? Lastly, why do we divide by the PDF instead of multiply? If we divide, don't we give more weight to a direction with a lower probability?
Let's assume that we have only materials without color (greyscale). Then, their BRDF at each point can be expressed as a single-valued function
float BRDF(phi_in, theta_in, phi_out, theta_out, pointWhereObjWasHit);
Here, phi and theta are the azimuth and zenith angles of the two rays under consideration. For pure Lambertian reflection, this function would look like this:
float lambertBRDF(phi_in, theta_in, phi_out, theta_out, pointWhereObjWasHit)
{
return albedo*1/pi*cos(theta_out);
}
albedo ranges from 0 to 1 - this measures how much of the incoming light is reemitted. The factor 1/pi ensures that the integral of BRDF over all outgoing vectors does not exceed 1. With the naive approach of the Wikipedia article (http://en.wikipedia.org/wiki/Path_tracing), one can use this BRDF as follows:
Color TracePath(Ray r, depth) {
/* .... */
Ray newRay;
newRay.origin = r.pointWhereObjWasHit;
newRay.direction = RandomUnitVectorInHemisphereOf(normal(r.pointWhereObjWasHit));
Color reflected = TracePath(newRay, depth + 1);
return emittance + reflected*lambertBRDF(r.phi,r.theta,newRay.phi,newRay.theta,r.pointWhereObjWasHit);
}
As mentioned in the article and by Ross, this random sampling is unfortunate because it traces incoming directions (newRays) from which little light is reflected with the same probability as directions from which there is lots of light. Instead, directions from which much light is reflected to the observer should be selected preferentially, to have an equal sample rate per contribution to the final color over all directions. For that, one needs a way to generate random rays from a probability distribution. Let's say there exists a function that can do that; this function takes as input the desired PDF (which ideally should be equal to the BRDF) and the incoming ray:
vector RandomVectorWithPDF(function PDF(p_i,t_i,p_o,t_o,point x), Ray incoming)
{
// this function is responsible for creating random Rays emanating from x
// with the probability distribution PDF. Depending on the complexity of PDF,
// this might be somewhat involved. It is possible, however, to do it for Lambertian
// reflection (how exactly is math, not programming):
vector randomVector;
if(PDF==lambertBRDF)
{
float phi = uniformRandomNumber(0,2*pi);
float rho = acos(sqrt(uniformRandomNumber(0,1))); // cosine-weighted zenith angle
float theta = pi/2-rho; // elevation angle (not needed below)
randomVector = getVectorFromAzimuthZenithAndNormal(phi,rho,normal(incoming.whereObjectWasHit));
}
else
{
// deal with other PDFs
}
return randomVector;
}
The code in the TracePath routine would then simply look like this:
newRay.direction = RandomVectorWithPDF(lambertBRDF,r);
Color reflected = TracePath(newRay, depth + 1);
return emittance + reflected;
Because the bright directions are preferred in the choice of samples, you do not have to weight them again by applying the BRDF as a scaling factor to reflected. However, if PDF and BRDF are different for some reason, you would have to scale down the output whenever PDF > BRDF (if you picked too many samples from the respective direction) and enhance it when you picked too few.
In code:
newRay.direction = RandomVectorWithPDF(PDF,r);
Color reflected = TracePath(newRay, depth + 1);
return emittance + reflected*BRDF(...)/PDF(...);
The output is best, however, if BRDF/PDF is equal to 1.
The question remains: why can't one always choose the perfect PDF which is exactly equal to the BRDF? First, some random distributions are harder to compute than others. For example, if there was a slight variation in the albedo parameter, the algorithm would still do much better for the non-naive sampling than for uniform sampling, but the correction term BRDF/PDF would be needed for the slight variations. Sometimes, it might even be impossible to do it at all. Imagine a colored object with different reflective behavior for red, green, and blue: you could either render in three passes, one for each color, or use an average PDF, which fits all color components approximately, but none perfectly.
How would one go about implementing something like Phong shading? For simplicity, I still assume that there is only one color component, and that the ratio of diffuse to specular reflection is 60% / 40% (the notion of ambient light makes no sense in path tracing). Then my code would look like this:
if(uniformRandomNumber(0,1)<0.6) //diffuse reflection
{
newRay.direction=RandomVectorWithPDF(lambertBRDF,r);
reflected = TracePath(newRay,depth+1)/0.6;
}
else //specular reflection
{
newRay.direction=RandomVectorWithPDF(specularPDF,r);
reflected = TracePath(newRay,depth+1)*specularBRDF/specularPDF/0.4;
}
return emittance + reflected;
Here specularPDF is a distribution with a narrow peak around the reflected ray (theta_in=theta_out, phi_in=phi_out+pi) for which a way to create random vectors is available, and specularBRDF returns the specular intensity from Phong's model (http://en.wikipedia.org/wiki/Phong_reflection_model).
Note how the PDFs are modified by 0.6 and 0.4 respectively.
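For completeness, here is a hedged sketch (my own, not part of the answer) of one common way to generate such specular sample directions: draw from a lobe with pdf proportional to cos^n(theta) around the mirror-reflected ray R, where n is the Phong exponent.
#include <cmath>
#include <random>

struct Vec3 { float x, y, z; };

Vec3 Normalize(const Vec3& v)
{
    float len = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    return { v.x/len, v.y/len, v.z/len };
}

// R must be normalized; rng supplies the uniform random numbers.
Vec3 SamplePhongLobe(const Vec3& R, float n, std::mt19937& rng)
{
    std::uniform_real_distribution<float> uni(0.0f, 1.0f);
    float cosTheta = std::pow(uni(rng), 1.0f / (n + 1.0f));
    float sinTheta = std::sqrt(1.0f - cosTheta*cosTheta);
    float phi = 2.0f * 3.14159265f * uni(rng);

    // Build an orthonormal basis (t, b, R) around the reflected direction.
    Vec3 up = std::fabs(R.z) < 0.999f ? Vec3{0.0f, 0.0f, 1.0f} : Vec3{1.0f, 0.0f, 0.0f};
    Vec3 t = Normalize({ up.y*R.z - up.z*R.y, up.z*R.x - up.x*R.z, up.x*R.y - up.y*R.x });
    Vec3 b = { R.y*t.z - R.z*t.y, R.z*t.x - R.x*t.z, R.x*t.y - R.y*t.x };

    // Note: for grazing reflections the sample can dip below the surface;
    // real renderers reject or re-sample those cases.
    return { t.x*sinTheta*std::cos(phi) + b.x*sinTheta*std::sin(phi) + R.x*cosTheta,
             t.y*sinTheta*std::cos(phi) + b.y*sinTheta*std::sin(phi) + R.y*cosTheta,
             t.z*sinTheta*std::cos(phi) + b.z*sinTheta*std::sin(phi) + R.z*cosTheta };
}
// The matching pdf value for this direction is (n + 1) / (2*pi) * pow(cosTheta, n).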
I'm by no means an expert in ray tracing, but this seems to be classic Monte Carlo:
You have lots of possible rays, and you choose one uniformly at random and then average over lots of trials. The distribution you used to choose one of the rays was uniform (they were all equally likely), so you don't have to do any clever re-normalising.
However, perhaps there are lots of possible rays to choose from, but only a few would lead to useful results. We therefore bias towards picking those 'useful' possibilities with higher probability, and then re-normalise (we are not choosing the rays uniformly any more, so we can't just take the average). This is importance sampling.
The mirror example seems to be the following: only one possible ray will give a useful result. If we choose a ray at random, then the probability that we hit that useful ray is zero. This is a property of conditional probability on continuous spaces (it's not actually continuous, it's implicitly discretised by your computer, so it's not quite true...): the probability of hitting something specific when there are infinitely many things must be zero.
Thus we are re-normalising by something with probability zero. Standard conditional probability definitions break when we consider events with probability zero, and that is where the problem would come from.

Fastest way to take coordinates from model space to canonical coordinate space in OpenGL ES 2.0

Like many 3D graphics programs, I have a bunch of objects that have their own model coordinates (from -1 to 1 on the x, y, and z axes). Then, I have a matrix that takes them from model coordinates to world coordinates (using the location, rotation, and scale of the object being drawn). Finally, I have a second matrix to turn those world coordinates into canonical coordinates that OpenGL ES 2.0 will use to draw to the screen.
So, because one object can contain many vertices, all of which use the same transform into both world space and canonical coordinates, it's faster to calculate the product of those two matrices once and put each vertex through the resulting matrix, rather than putting each vertex through both matrices.
But, as far as I can tell, there doesn't seem to be a way in OpenGL ES 2.0 shaders to have it calculate the matrix product once and keep using it until one of the two matrices is changed via glUniformMatrix4fv() (or another function that sets a uniform). So it seems like the only way to calculate the matrix once would be to do it on the CPU and then send the result to the GPU using a uniform. Otherwise, when something like:
gl_Position = uProjection * uMV * aPosition;
is used, it will calculate the product over and over again, which seems like it would waste time.
So, which way is usually considered standard? Or is there a different way that I am completely missing? As far as I could tell, the shader used to implement the OpenGL ES 1.1 pipeline in the OpenGL ES 2.0 Programming Guide only used one matrix, so is that used more?
First, the correct OpenGL term for "canonical coordinates" is clip space.
Second, it should be this:
gl_Position = uProjection * (uMV * aPosition);
What you posted does a matrix/matrix multiply followed by a matrix/vector multiply. This version does 2 matrix/vector multiplies. That's a substantial difference.
You're using shader-based hardware; how you handle matrices is up to you. There is nothing that is "considered standard"; you do what you best need to do.
That being said, unless you are doing lighting in model space, you will often need some intermediary between model space and 4D homogeneous clip-space. This is the space you transform the positions and normals into in order to compute the light direction, dot(N, L), and so forth.
Personally, I wouldn't suggest world space for reasons that I explain thoroughly here. But whether it's world space, camera space, or something else, you will generally have some intermediate space that you need positions to be in. At which point, the above code becomes necessary, and thus there is no time wasted.
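For what it's worth, here is a minimal sketch of the CPU-side approach from the question (the uniform name uMVP and the multiply helper are illustrative, not from any particular source): concatenate the matrices once per object per frame and upload the single product.
#include <GLES2/gl2.h>

// Column-major 4x4 multiply: out = a * b.
static void MultiplyMat4(float out[16], const float a[16], const float b[16])
{
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
        {
            out[col*4 + row] = 0.0f;
            for (int k = 0; k < 4; ++k)
                out[col*4 + row] += a[k*4 + row] * b[col*4 + k];
        }
}

void UploadMVP(GLuint program, const float model[16], const float view[16],
               const float projection[16])
{
    float mv[16], mvp[16];
    MultiplyMat4(mv, view, model);
    MultiplyMat4(mvp, projection, mv);

    GLint loc = glGetUniformLocation(program, "uMVP");
    glUniformMatrix4fv(loc, 1, GL_FALSE, mvp);
}

// The vertex shader is then left with a single multiply per vertex:
//     gl_Position = uMVP * aPosition;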
