I have a rather vague understanding of how rasterization is supposed to work.
So I totally understand how vertices make up a 3D image. I've also ventured into model-to-world transformation, even though I don't understand the math behind it (I use helper libraries to multiply the matrices and have a chart showing how to apply the different transformations: rotate, scale, translate, etc.).
So it's very easy for me to build a 3D model in Blender and apply that logic to build a world matrix for each object.
But I've hit a brick wall trying to envision how the camera matrix is supposed to "look at" a specific cluster of vertices, what exactly happens to an object's world coordinates after the camera matrix is applied to its world matrix, what a camera matrix actually looks like, and how the camera's "view axis" affects its matrix (the camera could be looking along the x, y, or z axis).
I've managed to render a couple of 3D objects with various rendering engines (OpenGL, XNA, etc.), but most of that was due to following some guide on the internet or trying to interpret what some guy on YouTube was teaching. I'm still struggling to get an "intuitive" sense of how the matrices are supposed to work with respect to the camera parameters, and how the camera is supposed to alter an object's world matrix.
There are 5 steps in going from "world space" (Wx,Wy,Wz) to "screen space" (Sx,Sy): View, Clipping, Projection, Perspective Divide, Viewport. This is described pretty well here but some details are glossed over. I will try to explain the steps conceptually.
Imagine you have some vertices (what we want to render), a camera (with a position and orientation - which direction it is pointing), and a screen (a rectangular grid of WIDTHxHEIGHT pixels).
The Model Matrix I think you already understand: it scales, rotates, and translates each vertex into world coordinates (Wx,Wy,Wz,1.0). The last "1.0" (sometimes called the w component) allows us to represent translation and projection (as well as scaling and rotation) as a single 4x4 matrix.
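To make the role of that w component concrete, here is a minimal sketch in plain JavaScript (illustrative names, using a row-vector-times-matrix convention like the projection matrix shown further down): the 1.0 in w is what lets the translation row of the matrix take effect.
function transform(v, m) {  // v = [x, y, z, w], m = 4x4 matrix stored as an array of rows
    return [0, 1, 2, 3].map(col => v[0]*m[0][col] + v[1]*m[1][col] + v[2]*m[2][col] + v[3]*m[3][col]);
}
var model = [ [1, 0, 0, 0],   // no rotation or scale in this toy example,
              [0, 1, 0, 0],   // just a translation by (5, 0, -2) in the last row
              [0, 0, 1, 0],
              [5, 0, -2, 1] ];
console.log(transform([1, 2, 3, 1.0], model)); // -> [6, 2, 1, 1]: the point gets translated because w is 1.0
console.log(transform([1, 2, 3, 0.0], model)); // -> [1, 2, 3, 0]: a direction (w = 0) ignores translation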
The View Matrix (aka camera matrix) moves all the vertices to the point of view of the camera. I think of it as working in 2 steps: First it translates the entire scene (all vertices including the camera) such that in the new coordinate system the camera is at the origin. Second it rotates the entire scene such that the camera is looking from the origin in the direction of the -Z axis. There is a good description of this here. (Mathematically the rotation happens first, but I find it easier to visualize if I do the translation first.) At this point each vertex is in View coordinates (Vx,Vy,Vz,1.0). A good way to visualize this is to imagine the entire scene is embedded in ice; grab the block of ice and move it so the camera is at the origin pointing along the -z axis (and all the other objects in the world move along with the ice they are embedded in).
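Here is a minimal sketch of those two steps in plain JavaScript (the names are illustrative; right, up and back are assumed to be the camera's unit-length basis vectors in world space, with back pointing away from where the camera looks, so the camera ends up looking down the -Z axis):
function worldToView(p, camPos, right, up, back) {
    // Step 1: translate everything so the camera sits at the origin
    var t = [p[0] - camPos[0], p[1] - camPos[1], p[2] - camPos[2]];
    // Step 2: rotate so the camera's axes line up with X, Y and Z
    // (a dot product projects the translated point onto each camera axis)
    var dot = (a, b) => a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
    return [dot(t, right), dot(t, up), dot(t, back)];
}
// Camera at (0,0,10) looking toward the origin: the origin ends up 10 units down the -Z axis.
console.log(worldToView([0, 0, 0], [0, 0, 10], [1, 0, 0], [0, 1, 0], [0, 0, 1])); // -> [0, 0, -10]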
Next, the projection matrix encodes what kind of lens (wide angle vs telephoto) the camera has; in other words how much of the world will be visible on the screen. This is described well here but here is how it is calculated:
[ near/width    0              0                          0 ]
[ 0             near/height    0                          0 ]
[ 0             0              (far+near)/(far-near)      1 ]
[ 0             0              -(2*near*far)/(far-near)   0 ]
near = near plane distance (everything closer to the camera than this is clipped).
far = far plane distance (everything farther from the camera than this is clipped).
width = the widest object we can see if it is at the near plane.
height = the tallest object we can see if it is at the near plane.
It results in "clip coordinates" (Cx,Cy,Cz,Cw=Vz). Note that the view-space z coordinate (Vz) ends up in the w coordinate of the clip coordinates (Cw) (more on this below). This matrix stretches the world so that the camera's field of view is now 45 degrees up, down, left, and right. In other words, in this coordinate system, if you look from the origin (camera position) straight along the -z axis (the direction the camera is pointing) you will see what is in the center of the screen, and if you rotate your head up {down, left, right} you will see what will be at the top {bottom, left, right} of the screen. You can visualize this as a pyramid shape where the camera is at the apex of the pyramid, looking straight down inside it. (This shape is called a "frustum" once you clip the top and bottom of the pyramid off with the near and far planes - see the next paragraph.) The Cz calculation makes vertices at the near plane have Cz=-Cw and vertices at the far plane have Cz=Cw.
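As a sketch of how the matrix above gets applied (plain JavaScript, multiplying a row vector by the rows exactly as written, with near, far, width and height as defined above; here Vz is the distance in front of the camera, as it is used in the perspective-divide step below):
function projectToClip(Vx, Vy, Vz, near, far, width, height) {
    var Cx = Vx * near / width;
    var Cy = Vy * near / height;
    var Cz = Vz * (far + near) / (far - near) - (2 * near * far) / (far - near);
    var Cw = Vz;   // the view-space depth lands in the w component
    return [Cx, Cy, Cz, Cw];
}
// A vertex on the near plane comes out with Cz = -Cw, one on the far plane with Cz = +Cw:
console.log(projectToClip(0, 0, 1,   1, 100, 1, 1));
console.log(projectToClip(0, 0, 100, 1, 100, 1, 1));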
Clipping takes place in clip coordinates (which is why they are called that). Clipping means you take some scissors and clip away anything that is outside that pyramid shape. You also clip everything that is too close to the camera (the "near plane") and everything that is too far away from the camera (the "far plane"). See here for details.
Next comes the perspective divide. Remember that Cw == Vz? This is the distance from the camera to the vertex along the z axis (the direction the camera is pointing). We divide each component by this Cw value to get Normalized Projection Coordinates (NPC): Nx=Cx/Cw, Ny=Cy/Cw, Nz=Cz/Cw, Nw=Cw/Cw=1.0. All these values (Nx, Ny and Nz) will be between -1 and 1 because we clipped away anything where Cx > Cw or Cx < -Cw or Cy > Cw or Cy < -Cw or Cz > Cw or Cz < -Cw. Again see here for lots of details on this. The perspective divide is what makes things that are farther away appear smaller. The farther away from the camera something is, the larger its Cw (Vz) is, and the more its X and Y coordinates are reduced when we divide.
The final step is the viewport transform. Nx, Ny and Nz (each ranging from -1 to 1) are converted to pixel coordinates. For example, Nx=-1 is at the left of the screen and Nx=1 is at the right of the screen, so we get Sx = (Nx * WIDTH/2) + (WIDTH/2), or equivalently Sx = (Nx+1) * WIDTH/2. Similarly for Sy. You can think of Sz as the value that will be used in a depth buffer, so it needs to range from 0 for vertices at the near plane (Vz=near) to the maximum value that the depth buffer can hold (e.g. 2^24 = 16777216 for a 24-bit z buffer) for vertices at the far plane (Vz=far).
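The last two steps in plain-JavaScript form (a minimal sketch following the formulas above; a real viewport transform may also flip Y and add a viewport offset):
function clipToScreen(Cx, Cy, Cz, Cw, WIDTH, HEIGHT, DEPTH_MAX) {
    // Perspective divide: the bigger Cw (i.e. Vz) is, the more X and Y shrink toward the centre
    var Nx = Cx / Cw, Ny = Cy / Cw, Nz = Cz / Cw;   // each now between -1 and 1
    // Viewport transform: map the -1..1 range onto pixels and depth-buffer values
    var Sx = (Nx + 1) * WIDTH / 2;
    var Sy = (Ny + 1) * HEIGHT / 2;
    var Sz = (Nz + 1) * DEPTH_MAX / 2;
    return [Sx, Sy, Sz];
}
// A near-plane vertex in the centre of view, on a 640x480 screen with a 24-bit depth buffer:
console.log(clipToScreen(0, 0, -1, 1, 640, 480, 16777215)); // -> [320, 240, 0]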
The "camera matrix" as you called it sounds like a combination of two matrices: the view matrix and the projection matrix. It's possible you're only talking about one of these, but it's not clear.
View matrix: The view matrix is the inverse of what the camera's model matrix would be if you drew it in the world. In order to draw different camera angles, we actually move the entire world in the opposite direction - so there is only one camera angle.
Usually in OpenGL, the camera "really" stays at (0,0,0) and looks along the Z axis in the negative direction (towards 0,0,-∞). You can apply a rotation to the projection matrix to get a different direction, but why would you? Do all the rotation in the view matrix and your life is simpler.
So if you want your camera to be at (0,3,0) for example, instead of moving the camera up 3 units, we leave it at (0,0,0) and move the entire world down 3 units. If you want it to rotate 90 degrees, we actually rotate the world 90 degrees in the opposite direction. The world doesn't mind - it's just numbers in a computer - it doesn't get dizzy.
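As a tiny plain-JavaScript sketch of that idea (illustrative names; for a camera that is only translated, the view transform is simply the opposite translation applied to every world point):
var cameraPosition = [0, 3, 0];                  // where we think of the camera as being
function applyView(worldPoint) {
    return [ worldPoint[0] - cameraPosition[0],  // the whole world moves the opposite way,
             worldPoint[1] - cameraPosition[1],  // so the camera can stay at (0,0,0)
             worldPoint[2] - cameraPosition[2] ];
}
console.log(applyView([0, 3, -10])); // -> [0, 0, -10]: straight ahead of the camera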
We only do this when rendering. All of the game physics calculations, for example, aren't done in the rotated world. The coordinates of the stuff in the world don't get changed when we rotate the camera - except inside the rendering system. Usually, we tell the GPU the normal world coordinates of the objects, and we get the GPU to move and rotate them for us, using a matrix.
Projection matrix: You know the view frustum? It's a shape you've probably seen in diagrams before.
Everything inside the cut-off pyramid shape (frustum) is displayed on the screen. You know this.
Except the computer doesn't actually render in a frustum. It renders a cube. The projection matrix transforms the frustum into a cube.
If you're familiar with linear algebra, you may notice that a 3D matrix can't turn a frustum into a cube. That's what the 4th coordinate (w) is for. After this calculation, the x, y and z coordinates are all divided by w. By using a projection matrix that makes w depend on z, the coordinates of far-away points get divided by a larger number, so they get pushed towards the middle of the screen - that's how the frustum is able to turn into a cube.
You don't have to have a frustum - that's what you get with a perspective projection. You can also use an orthographic projection, which turns a box into a cube, by not changing w.
Unless you want to do a bunch of math yourself, I'd recommend you just use the library functions to generate projection matrices.
If you're multiplying vertices by several matrices in a row it's more efficient to combine them into one matrix, and then multiply the vertices by the combined matrix - hence you will often see MVP, MV and VP matrices used. (M = model matrix - I think it's the same thing you called a world matrix)
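A minimal plain-JavaScript sketch of that combination (illustrative names; identity matrices stand in for your real model, view and projection matrices, and the row-vector convention from the first answer is assumed, so the combined matrix is M*V*P):
function mulMat4(a, b) {   // a*b for 4x4 matrices stored as arrays of rows
    return a.map(row =>
        [0, 1, 2, 3].map(j => row[0]*b[0][j] + row[1]*b[1][j] + row[2]*b[2][j] + row[3]*b[3][j]));
}
var I = [[1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1]];
var model = I, view = I, projection = I;         // stand-ins: plug in your real matrices here
var mvp = mulMat4(mulMat4(model, view), projection);
// Now every vertex needs only one matrix multiply: clip = vertex * mvp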
I am trying to draw a line segment from a point a 3D scene to point on a HUD UI. One end of the line segment is specified in 3D e.g. (1.232, -34.12, 4.21) but the other I want to specify in 2D pixel coordinates e.g. (320, 200).
How can I convert the 2D coordinate to a 3D point and have it remain at those pixel coordinates as the (perspective) camera moves? Initially I thought of taking the 2D position and projecting it onto the near plane of the view frustum. Maybe that would work, but I wasn't sure how to do it, or whether there is a better way.
// Convert the pixel position to normalized device coordinates (-1 to +1) first,
// since unproject expects NDC; here the canvas is assumed to fill the window.
var vector = new THREE.Vector3((320 / window.innerWidth) * 2 - 1, -(200 / window.innerHeight) * 2 + 1, 0.5);
vector.unproject(camera);
will return in vector a 3D point which you can use to draw.
If you keep unprojecting as the perspective camera moves you are guaranteed that the 2D point will seem not to move in your HUD.
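A rough sketch of how that can be used for the HUD line (assuming a reasonably recent three.js where BufferGeometry.setFromPoints exists, a full-window canvas, and that scene and camera come from your existing setup):
var points = [new THREE.Vector3(1.232, -34.12, 4.21), new THREE.Vector3()];
var line = new THREE.Line(new THREE.BufferGeometry().setFromPoints(points),
                          new THREE.LineBasicMaterial({ color: 0xffff00 }));
scene.add(line);
function updateHudLine() {                  // call once per frame, before rendering
    // Re-unproject pixel (320, 200) so that end of the line stays pinned to the HUD position
    points[1].set((320 / window.innerWidth) * 2 - 1,
                  -(200 / window.innerHeight) * 2 + 1, 0.5).unproject(camera);
    line.geometry.setFromPoints(points);    // push the updated endpoint into the geometry
}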
I'm not that good at geometry. I need to calibrate a projector to project on the ground. To do this I need to use a frontal chart, as shown in this picture.
I already know the height of the projector from the ground and the distances of the projected points (A, B, C and D) from the projector. I need to know the coordinates of the points projected on the chart, which is perpendicular to the ground.
Does the triangle rule work for this problem? Or are there other techniques from the world of transformations?
Yes, the triangle rule works here for the vertical coordinates of points a, b, c and d. For example,
c.y / h = D2 / (D1+D2)
But you don't have enough information to get the horizontal positions of these points.
I'm trying to make a Rubik's Cube game in WebGL using three.js (you can try it here).
And I have problems detecting which axis I have to rotate my cube layers around, according to the orientation of the cube. For instance, if the cube is in its original position/rotation and I want to rotate the left layer from down to up, I must make a rotation on the Y axis. But if I rotate my cube 90 degrees on Y, I will then have to rotate on the Z axis to move the left layer from down to up.
I'm trying to find a way to get the correct rotation axis according to the orientation of the cube.
For the moment I check which vector of the axes of the cube's rotation matrix is most parallel with the vector (0,1,0) if I want to move a front layer from down to up. But it does not work in edge cases like this one, for instance:
I guess there is some simple way to do that, but I'm not good enough with matrices and mathematical stuff :)
An AxisHelper can show the axes of the scene, which you can use to determine the orientation.
var axishelper = new THREE.AxisHelper(40);
axishelper.position.y = 300;
scene.add(axishelper);
You could also log your cube and check the position and rotation properties with Chrome Developer Tools or Firebug.
You can store the orientation of each cube in its own 4x4 matrix (i.e. a "model" matrix) that tells you how to get from the cube's local coordinates to the world's coordinates. Now, since you want to rotate the cube around to an axis (i.e. vector) in world coordinates, you need to translate the axis into cube coordinates. This is exactly what the inverse of the model matrix yields.
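A minimal three.js sketch of that idea (assuming the cube's Object3D is called cube and a recent three.js where Matrix4.invert() exists; older versions use getInverse() instead):
cube.updateMatrixWorld();                                 // make sure matrixWorld is up to date
var worldToCube = new THREE.Matrix4().copy(cube.matrixWorld).invert();
var worldAxis = new THREE.Vector3(0, 1, 0);               // the turn you want, in world space
var localAxis = worldAxis.clone().transformDirection(worldToCube); // the same axis in cube space
// localAxis will be closest to (±1,0,0), (0,±1,0) or (0,0,±1); that tells you which of the
// cube's own axes to rotate the layer around, no matter how the whole cube is oriented.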
Let me clarify my question with a question:
Assume that I have a big cube of size 100*100*100, and inside it there are little cubes of size 10*10*10 that make up the big cube (so I have 1000 little cubes inside the big cube). Now I need to check which little cube my point (2,2,2) is in. For this question the answer is of course the 1st cube. Then, after finding the cube, I will keep track of the number of points each cube contains.
My attempt: At first I thought it would be enough to compare my point with the 8 corners of a cube. I thought that my point's coordinates must be greater than 4 corners of the cube and less than the remaining 4 corners, and then I was going to iteratively increment the coordinates of the corner point to check the other cubes. However, now I see that I am wrong.
What would be the best-suited algorithm for this problem?
Regards,
Amadeus
Note: I am using MATLAB, so if there are any built-in functions for this purpose, I can use them too.
Let's say you have a cube of size N*N*N, and you create small cubes of dimension n*n*n.
Then all the small cubes can be represented by a 3D array, such that the cube whose corner closest to the origin is at (a*n, b*n, c*n) is represented by index (a, b, c) in this 3D array.
The value stored at that index is the number of points inside that cube.
Here is pseudo code
//Pseudo code
int arrayWithCountOfPointsInsideCubes[N/n][N/n][N/n];

void countPointsForCubes(double point_x, double point_y, double point_z)
{
    // Integer division (truncation) of each coordinate by the small-cube size n
    // gives the index of the cube the point falls into.
    arrayWithCountOfPointsInsideCubes[(int)(point_x / n)][(int)(point_y / n)][(int)(point_z / n)]++;
}