How would I project 2D coordinates on to 3D?
For example I have an image (represented as a particle) that is 256px wide. If I pretend this image is centered on the origin (0,0) in 2D space then the vertical sides of the square are located at x = 128 and x = -128.
So when this image (particle) is placed in a Three.js scene at the origin (0,0) and the camera is at distance CamZ from the origin, how do I project the original 2D coordinates to 3D, which in turn will tell me the width at which the image (particle) appears on screen in three.js units?
An easy-to-understand way would be to create a geometry with vertices at (-128, 128, 0), (128, 128, 0), (-128, -128, 0) and (128, -128, 0). Then use Projector to project that geometry using the camera. It will give you an array of projected points whose coordinates should range from -1 to 1. You'll then need to scale them to the viewport size.
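To make the arithmetic behind that projection concrete, here is a minimal sketch in plain Python rather than three.js. It assumes a camera sitting on the +Z axis at (0, 0, cam_z), looking at the origin, with a vertical field of view given in degrees; the function name and parameters are hypothetical and are not part of the three.js API.

```python
import math

def project_to_pixels(point, cam_z, fov_deg, viewport_w, viewport_h):
    """Project a world-space point to pixel coordinates.

    Assumption: the camera is at (0, 0, cam_z), looks toward the origin,
    and fov_deg is the vertical field of view.
    """
    x, y, z = point
    depth = cam_z - z                        # distance in front of the camera
    f = 1.0 / math.tan(math.radians(fov_deg) / 2.0)
    aspect = viewport_w / viewport_h
    ndc_x = (f / aspect) * x / depth         # -1..1 across the viewport
    ndc_y = f * y / depth
    px = (ndc_x + 1.0) / 2.0 * viewport_w    # scale -1..1 to pixels
    py = (1.0 - ndc_y) / 2.0 * viewport_h    # pixel y grows downward
    return px, py

corners = [(-128, 128, 0), (128, 128, 0), (-128, -128, 0), (128, -128, 0)]
pixels = [project_to_pixels(c, cam_z=500, fov_deg=45,
                            viewport_w=800, viewport_h=600) for c in corners]
on_screen_width = pixels[1][0] - pixels[0][0]  # apparent width in pixels
```

The difference between the projected x values of the right and left corners is the width the particle appears on screen, and it shrinks as cam_z grows.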
There is another way to do it. The exact translation between 2D and 3D can be approximated heuristically. Often it is more difficult to implement than using three.js to project a vector, but with an exponential translation you can map many 2D/3D translations.
The idea here is to use an Exponential easing function to calculate the translation. The easing functions that most libraries use (such as jQuery UI) are the Robert Penner Easing functions.
The easeOutExpo function works surprisingly well at approximating 2D/3D translations. Generally it would look something like this:
// easeOutExpo(time, base, change, duration)
var xPosition3D = xPosition2D * expo(xPosition2D, 0, coefficient, xMax2D);
It takes a co-efficient, and the exact number will depend on the aspect ratio and focal length of the 3D camera. Usually, something like -.2 works well.
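For reference, a small sketch of what that one-liner could look like when written out. The easing function follows the usual Robert Penner easeOutExpo formula; the wrapper `approx_x3d` and its default coefficient are hypothetical, taken only from the description above, and it assumes non-negative 2D coordinates.

```python
def ease_out_expo(t, b, c, d):
    """Penner-style easeOutExpo: t = time, b = base, c = change, d = duration."""
    if t == d:
        return b + c
    return c * (1.0 - 2.0 ** (-10.0 * t / d)) + b

def approx_x3d(x2d, x_max_2d, coefficient=-0.2):
    # Mirrors the one-liner above; coefficient is a tuning value that
    # depends on the camera, not a derived constant.
    return x2d * ease_out_expo(x2d, 0.0, coefficient, x_max_2d)
```

This remains an approximation; projecting through the camera gives the exact result.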
I know this is an insufficient explanation, but hopefully it points you in the right direction.
I have a rather vague understanding of how rasterization is supposed to work.
So I totally understand how vertices make up a 3D image. I also ventured into model-to-world projection, even though I don't understand the math behind it (I use helper libraries to multiply the matrices and have a chart denoting how to apply different transformations: rotate, scale, translate, etc.).
So it's very easy for me to build some 3d model using blender and apply that logic to build a world matrix for each object.
But I've hit a brick wall trying to envision how the camera matrix is supposed to "look at" a specific cluster of vertices. What exactly happens to the object's world coordinates after the camera matrix is applied to the world matrix? What does a camera matrix look like, and how does the camera's "view axis" affect its matrix (the camera could be looking along the x, y, or z axis)?
I've managed to render a couple of 3D objects with various rendering engines (OpenGL, XNA, etc.), but most of it was due to having followed some guide on the internet or trying to interpret what some guy on YouTube is trying to teach, and I'm still struggling to get an "intuitive" sense of how matrices are supposed to work with respect to camera parameters and how the camera is supposed to alter the object's world matrix.
There are 5 steps in going from "world space" (Wx,Wy,Wz) to "screen space" (Sx,Sy): View, Projection, Clipping, Perspective Divide, Viewport. This is described pretty well here but some details are glossed over. I will try to explain the steps conceptually.
Imagine you have some vertices (what we want to render), a camera (with a position and orientation - which direction it is pointing), and a screen (a rectangular grid of WIDTHxHEIGHT pixels).
The Model Matrix I think you already understand: it scales, rotates, and translates each vertex into world coordinates (Wx,Wy,Wz,1.0). The last "1.0" (sometimes called the w component) allows us to represent translation and projection (as well as scaling and rotation) as a single 4x4 matrix.
The View Matrix (aka camera matrix) moves all the vertices to the point of view of the camera. I think of it as working in 2 steps: First it translates the entire scene (all vertices including the camera) such that in the new coordinate system the camera is at the origin. Second it rotates the entire scene such that the camera is looking from the origin in the direction of the -Z axis. There is a good description of this here. (Mathematically the rotation happens first, but I find it easier to visualize if I do the translation first.) At this point each vertex is in View coordinates (Vx,Vy,Vz,1.0). A good way to visualize this is to imagine the entire scene is embedded in ice; grab the block of ice and move it so the camera is at the origin pointing along the -z axis (and all the other objects in the world move along with the ice they are embedded in).
Next, the projection matrix encodes what kind of lens (wide angle vs telephoto) the camera has; in other words how much of the world will be visible on the screen. This is described well here but here is how it is calculated:
[ near/width ][ 0 ][ 0 ][ 0 ]
[ 0 ][ near/height ][ 0 ][ 0 ]
[ 0 ][ 0 ][(far+near)/(far-near) ][ 1 ]
[ 0 ][ 0 ][-(2*near*far)/(far-near)][ 0 ]
near = near plane distance (everything closer to the camera than this is clipped).
far = far plane distance (everything farther from the camera than this is clipped).
width = the widest object we can see if it is at the near plane.
height = the tallest object we can see if it is at the near plane.
Applying this matrix to the view coordinates results in "clip coordinates" (Cx,Cy,Cz,Cw=Vz). Note that the view-space z coordinate (Vz) ends up in the w coordinate of the clip coordinates (Cw) (more on this below). This matrix stretches the world so that the camera's field of view is now 45 degrees up, down, left, and right. In other words, in this coordinate system if you look from the origin (camera position) straight along the -z axis (direction the camera is pointing) you will see what is in the center of the screen, and if you rotate your head up {down,left,right} you will see what will be at the top {bottom,left,right} of the screen. You can visualize this as a pyramid shape where the camera is at the top of the pyramid and the camera is looking straight down inside the pyramid. (This shape is called a "frustum" once you clip the top and bottom of the pyramid off with the near and far plane - see next paragraph.) The Cz value calculation makes vertices at the near plane have Cz=-Cw and vertices at the far plane have Cz=Cw.
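The near/far behaviour of Cz can be checked numerically. The sketch below assumes the matrix is used in the row-vector convention (vertex times matrix), which is what makes Cw equal Vz with the layout shown, and it treats Vz as the positive distance in front of the camera. The helper names are made up for this example.

```python
def make_projection(near, far, width, height):
    """4x4 matrix laid out as in the text, for use as row_vector * matrix."""
    return [
        [near / width, 0.0, 0.0, 0.0],
        [0.0, near / height, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (far - near), 1.0],
        [0.0, 0.0, -(2.0 * near * far) / (far - near), 0.0],
    ]

def transform(vec, mat):
    """Multiply a 4-component row vector by a 4x4 matrix."""
    return [sum(vec[r] * mat[r][c] for r in range(4)) for c in range(4)]

proj = make_projection(near=1.0, far=100.0, width=2.0, height=1.5)
at_near = transform([0.0, 0.0, 1.0, 1.0], proj)    # expect Cz == -Cw
at_far = transform([0.0, 0.0, 100.0, 1.0], proj)   # expect Cz == Cw
```

After dividing by Cw, the near plane lands at -1 and the far plane at +1, which is the range the later steps rely on.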
Clipping takes place in clip coordinates (which is why they are called that). Clipping means you take some scissors and clip away anything that is outside that pyramid shape. You also clip everything that is too close to the camera (the "near plane") and everything that is too far away from the camera (the "far plane"). See here for details.
Next comes the perspective divide. Remember that Cw == Vz? This is the distance from the camera to the vertex along the z axis (the direction the camera is pointing). We divide each component by this Cw value to get Normalized Projection Coordinates (NPC) (Nx=Cx/Cw, Ny=Cy/Cw, Nz=Cz/Cw, Nw=Cw/Cw=1.0). All these values (Nx, Ny and Nz) will be between -1 and 1 because we clipped away anything where Cx > Cw or Cx < -Cw or Cy > Cw or Cy < -Cw or Cz > Cw or Cz < -Cw. Again see here for lots of details on this. The perspective divide is what makes things that are farther away appear smaller. The farther away from the camera something is, the larger the Cw (Vz) is, and the more its X and Y coordinates will be reduced when we divide.
The final step is the viewport transform. Nx, Ny and Nz (each ranging from -1 to 1) are converted to pixel coordinates. For example Nx=-1 is at the left of the screen and Nx=1 is at the right of the screen, so we get Sx = (Nx * WIDTH/2) + (WIDTH/2) or equivalently Sx = (Nx+1) * WIDTH/2. Similar for Sy. You can think of Sz as the value that will be used in a depth buffer, so it needs to range from 0 for vertices at the near plane (Vz=near) to the maximum value that the depth buffer can hold (e.g. 2^24 - 1 = 16777215 for a 24 bit z buffer) for vertices at the far plane (Vz=far).
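The last two steps are small enough to write out directly. This is a sketch under a few assumptions: clip coordinates arrive as a 4-tuple, Sy is not flipped (many window systems put y=0 at the top, which would need an extra flip), and the depth range is a 24-bit integer range. The function names are made up for illustration.

```python
def perspective_divide(clip):
    """Clip coordinates (Cx, Cy, Cz, Cw) -> normalized (Nx, Ny, Nz)."""
    cx, cy, cz, cw = clip
    return (cx / cw, cy / cw, cz / cw)

def viewport_transform(ndc, width, height, depth_max=2**24 - 1):
    """Normalized coordinates in [-1, 1] -> screen (Sx, Sy, Sz)."""
    nx, ny, nz = ndc
    sx = (nx * width / 2.0) + (width / 2.0)
    sy = (ny * height / 2.0) + (height / 2.0)
    sz = (nz + 1.0) / 2.0 * depth_max   # 0 at the near plane, depth_max at the far plane
    return (sx, sy, sz)

screen = viewport_transform(perspective_divide((2.0, -1.0, 3.0, 4.0)), 640, 480)
```

A vertex twice as far away has twice the Cw, so the same Cx and Cy land half as far from the screen center.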
The "camera matrix" as you called it sounds like a combination of two matrices: the view matrix and the projection matrix. It's possible you're only talking about one of these, but it's not clear.
View matrix: The view matrix is the inverse of what the camera's model matrix would be if you drew it in the world. In order to draw different camera angles, we actually move the entire world in the opposite direction - so there is only one camera angle.
Usually in OpenGL, the camera "really" stays at (0,0,0) and looks along the Z axis in the negative direction (towards 0,0,-∞) when the conventional projection matrices are used. You can apply a rotation to the projection matrix to get a different direction, but why would you? Do all the rotation in the view matrix and your life is simpler.
So if you want your camera to be at (0,3,0) for example, instead of moving the camera up 3 units, we leave it at (0,0,0) and move the entire world down 3 units. If you want it to rotate 90 degrees, we actually rotate the world 90 degrees in the opposite direction. The world doesn't mind - it's just numbers in a computer - it doesn't get dizzy.
We only do this when rendering. All of the game physics calculations, for example, aren't done in the rotated world. The coordinates of the stuff in the world don't get changed when we rotate the camera - except inside the rendering system. Usually, we tell the GPU the normal world coordinates of the objects, and we get the GPU to move and rotate them for us, using a matrix.
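The "move the world instead of the camera" idea can be sketched in a few lines. This example is deliberately limited: it assumes the camera only has a position and a yaw (rotation about the Y axis, in radians), and that in its own frame it looks along -Z. The function name is hypothetical.

```python
import math

def view_transform(point, cam_pos, cam_yaw):
    """Express a world-space point in the camera's frame.

    Applies the inverse of the camera's own transform: translate by
    -cam_pos, then rotate about the Y axis by -cam_yaw.
    """
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    c, s = math.cos(-cam_yaw), math.sin(-cam_yaw)
    return (c * x + s * z, y, -s * x + c * z)

# Camera raised 3 units: every point in the world appears 3 units lower.
lowered = view_transform((1.0, 5.0, -2.0), cam_pos=(0.0, 3.0, 0.0), cam_yaw=0.0)

# Camera turned 90 degrees to face -X: a point at x = -5 ends up straight ahead.
ahead = view_transform((-5.0, 0.0, 0.0), cam_pos=(0.0, 0.0, 0.0),
                       cam_yaw=math.pi / 2)
```

The world coordinates passed in are never modified; only the copy used for rendering is moved.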
Projection matrix: You know the view frustum? You've probably seen this shape before.
Everything inside the cut-off pyramid shape (frustum) is displayed on the screen. You know this.
Except the computer doesn't actually render in a frustum. It renders a cube. The projection matrix transforms the frustum into a cube.
If you're familiar with linear algebra, you may notice that a 3x3 matrix can't make a cube into a frustum. That's what the 4th coordinate (w) is for. After this calculation, the x, y and z coordinates are all divided by w. By using a projection matrix that makes w depend on z, the coordinates of far-away points get divided by a larger number, so they get pushed towards the middle of the screen - that's how a cube is able to turn into a frustum.
You don't have to have a frustum - that's what you get with a perspective projection. You can also use an orthographic projection, which turns a cube into a cube, by not changing w.
Unless you want to do a bunch of math yourself, I'd recommend you just use the library functions to generate projection matrices.
If you're multiplying vertices by several matrices in a row it's more efficient to combine them into one matrix, and then multiply the vertices by the combined matrix - hence you will often see MVP, MV and VP matrices used. (M = model matrix - I think it's the same thing you called a world matrix)
In my program (using MATLAB), I specified (by dragging) the pedestrian lane as my Region Of Interest (ROI) with the coordinates [7, 178, 620, 190] (xmin, ymin, width, and height respectively) using the getrect, roipoly and insertShape functions. Refer to the image below.
The video from where this snapshot is taken is in 640x480 pixels resolution (480p)
Defining a real-world space as my ROI by mouse dragging is barbaric. That's why the ROI coordinates must be derived mathematically.
What I'm getting at is using real-world measurements from the video capture site, together with the Pythagorean theorem applied from where the camera is positioned:
How do I obtain the equivalent pixel coordinates and parameters using the real-world measurements?
I'll try to split your question into 2 smaller questions.
A) How do I obtain the equivalent pixel coordinates of an interesting point? (practical question)
Your program should be able to retrieve/recognize a feature/marker that you positioned at the interesting point in the "real world". The output is a coordinate in pixels. This can be done quite easily (think about QR codes, for example).
B) What is the analytical relationship between 1 point in 3D space and its pixel coordinate in the image? (theoretical question)
This is the projection equation based on the pinhole camera model. It relates the 3D coordinates X, Y, Z to the pixel coordinates x, y.
Cool, but some details have to be explained (and there won't be any "automatic short formula").
s represents the scale factor. A single pixel in an image could be the projection of infinitely many different points, due to perspective. In your photo, a pixel containing a piece of a car (when the car is present) is the same pixel that contains a piece of the street under the car (once the car has passed).
So there is no one-to-one relationship when starting from pixel coordinates.
The matrix on the left involves the camera parameters (focal length, etc.) which are called intrinsic parameters. They have to be known to build the relationship between 3D coordinates and pixel coordinates
The matrix on the right may seem trivial: it is the combination of an identity matrix, which represents rotation, and a column of zeros, which represents translation. Something like T = [R|t].
Which rotation, which translation? You have to consider that every set of coordinates is implicitly expressed in its own reference system. So you have to determine the relationship between the reference system of your measurement and the camera reference system: not only to retrieve position of the camera in your 3D space with euclidean geometry, but also orientation of the camera (angles).
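As an illustration of the pinhole relationship, here is a minimal sketch of the commonly used form s * [x, y, 1]^T = K [R|t] [X, Y, Z, 1]^T. The intrinsic values (fx, fy, cx, cy) and the test points are made-up numbers, not measurements from the video in the question; real values would come from camera calibration.

```python
def project_pinhole(point_world, fx, fy, cx, cy, R, t):
    """Project a 3D point to pixel coordinates with the pinhole model.

    fx, fy: focal lengths in pixels; cx, cy: principal point (intrinsics).
    R (3x3) and t (3) map world coordinates into the camera frame (extrinsics).
    Returns (x, y, s) where s is the scale factor (depth along the optical axis).
    """
    # World -> camera coordinates: Xc = R * Xw + t
    xc = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i]
          for i in range(3)]
    s = xc[2]
    if s <= 0:
        raise ValueError("point is behind the camera")
    x = fx * xc[0] / s + cx
    y = fy * xc[1] / s + cy
    return x, y, s

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
near_pt = project_pinhole((1.0, 0.5, 5.0), 500.0, 500.0, 320.0, 240.0,
                          identity, [0.0, 0.0, 0.0])
far_pt = project_pinhole((2.0, 1.0, 10.0), 500.0, 500.0, 320.0, 240.0,
                         identity, [0.0, 0.0, 0.0])
```

Both points land on the same pixel and differ only in s, which is why a pixel alone cannot be mapped back to a unique 3D point without extra information such as a known ground plane.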
I am currently working on a computer-vision program that requires me to determine the "direction" of a color blob in an image. The color blob generally follows an elliptical shape and thus can be used to track direction (with respect to an initially defined/determined orientation) through time.
The means by which I figured I would calculate changes in direction are described as follows:
Quantize possible directions (360 degrees) into N directions (potentially 8, for 45 degree angle increments).
Given a stored matrix representing the initial state (t0) of the color blob, also acquire a matrix representing the current state (tn) of the blob.
Iterate through these N directions and search for the longest stretch of the color value for that given direction. (e.g. if the ellipse is rotated 45 degrees with 0 being vertical, the longest length should be attributed to the 45 degree mark / or 225 degrees).
The concept itself isn't complicated, but I'm having trouble with the following:
Calculating the longest stretch of a value at any angle in an image. This is simple for angles such as 0, 45, 90, etc. but more difficult for the in-between angles. "Quantizing" the angles is not as easy to me as it sounds.
Please do not worry about potential issues with distinguishing angles such as 0 and 90. Inertia can be used to determine the most likely direction of the color blob (in other words, based upon past orientation states).
My main concern is identifying the "longest stretch" in the matrix.
Thank you for your help!
You can use image moments as suggested here: Matlab - Image Momentum Calculation.
In MATLAB you would use regionprops with the property 'Orientation', but the wiki article in the previous answer should give you all of the information you need to code it in the language of your choice.
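For a language without regionprops, the orientation can be computed from the second-order central moments. The following is a sketch in plain Python, assuming the blob is given as a 2D list in which nonzero entries belong to the blob. The angle is measured in array coordinates (x along columns, y along rows, with rows growing downward), so its sign may be flipped relative to MATLAB's convention.

```python
import math

def blob_orientation(mask):
    """Major-axis angle (radians, within [-pi/2, pi/2]) of the nonzero pixels."""
    pts = [(x, y) for y, row in enumerate(mask)
           for x, v in enumerate(row) if v]
    n = len(pts)
    if n == 0:
        raise ValueError("empty blob")
    # Centroid
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    # Second-order central moments
    mu20 = sum((p[0] - mx) ** 2 for p in pts) / n
    mu02 = sum((p[1] - my) ** 2 for p in pts) / n
    mu11 = sum((p[0] - mx) * (p[1] - my) for p in pts) / n
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)

diagonal = [[1 if x == y else 0 for x in range(5)] for y in range(5)]
angle_deg = math.degrees(blob_orientation(diagonal))
```

This avoids quantizing directions and scanning for the longest stretch; the moments give a continuous angle in a single pass over the pixels.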
I'm trying to write a simple voxel raycaster as a learning exercise. This is purely CPU based for now until I figure out how things work exactly -- for now, OpenGL is just (ab)used to blit the generated bitmap to the screen as often as possible.
Now I have gotten to the point where a perspective-projection camera can move through the world and I can render (mostly, minus some artifacts that need investigation) perspective-correct 3-dimensional views of the "world", which is basically empty but contains a voxel cube of the Stanford Bunny.
So I have a camera that I can move up and down, strafe left and right and "walk forward/backward" -- all axis-aligned so far, no camera rotations. Herein lies my problem.
Screenshots: (1) raycasting voxels while... ...(2) the camera remains... ...(3) strictly axis-aligned.
Now I have for a few days been trying to get rotation to work. The basic logic and theory behind matrices and 3D rotations, in theory, is very clear to me. Yet I have only ever achieved a "2.5D rendering" when the camera rotates... fish-eyey, a bit like in Google Streetview: even though I have a volumetric world representation, it seems --no matter what I try-- like I would first create a rendering from the "front view", then rotate that flat rendering according to camera rotation. Needless to say, I'm by now aware that rotating rays is not particularly necessary and error-prone.
Still, in my most recent setup, with the most simplified raycast ray-position-and-direction algorithm possible, my rotation still produces the same fish-eyey flat-render-rotated style looks:
camera "rotated to the right by 39 degrees" -- note how the blue-shaded left-hand side of the cube from screen #2 is not visible in this rotation, yet by now "it really should"!
Now of course I'm aware of this: in a simple axis-aligned-no-rotation-setup like I had in the beginning, the ray simply traverses in small steps the positive z-direction, diverging to the left or right and top or bottom only depending on pixel position and projection matrix. As I "rotate the camera to the right or left" -- ie I rotate it around the Y-axis -- those very steps should be simply transformed by the proper rotation matrix, right? So for forward-traversal the Z-step gets a bit smaller the more the cam rotates, offset by an "increase" in the X-step. Yet for the pixel-position-based horizontal+vertical-divergence, increasing fractions of the x-step need to be "added" to the z-step. Somehow, none of my many matrices that I experimented with, nor my experiments with matrix-less hardcoded verbose sin/cos calculations really get this part right.
Here's my basic per-ray pre-traversal algorithm -- syntax in Go, but take it as pseudocode:
fx and fy: pixel positions x and y
rayPos: vec3 for the ray starting position in world-space (calculated as below)
rayDir: vec3 for the xyz-steps to be added to rayPos in each step during ray traversal
rayStep: a temporary vec3
camPos: vec3 for the camera position in world space
camRad: vec3 for camera rotation in radians
pmat: typical perspective projection matrix
The algorithm / pseudocode:
// 1: rayPos is for now "this pixel, as a vector on the view plane in 3d, at The Origin"
rayPos.X, rayPos.Y, rayPos.Z = ((fx / width) - 0.5), ((fy / height) - 0.5), 0
// 2: rotate around Y axis depending on cam rotation. No prob since view plane still at Origin 0,0,0
rayPos.MultMat(num.NewDmat4RotationY(camRad.Y))
// 3: a temp vec3. planeDist is -0.15 or some such -- fov-based dist of view plane from eye and also the non-normalized, "in axis-aligned world" traversal step size "forward into the screen"
rayStep.X, rayStep.Y, rayStep.Z = 0, 0, planeDist
// 4: rotate this too -- 0,zstep should become some meaningful xzstep,xzstep
rayStep.MultMat(num.NewDmat4RotationY(camRad.Y))
// set up direction vector from still-origin-based-ray-position-off-rotated-view-plane plus rotated-zstep-vector
rayDir.X, rayDir.Y, rayDir.Z = -rayPos.X - rayStep.X, -rayPos.Y, rayPos.Z + rayStep.Z
// perspective projection
rayDir.Normalize()
rayDir.MultMat(pmat)
// before traversal, the ray starting position has to be transformed from origin-relative to campos-relative
rayPos.Add(camPos)
I'm skipping the traversal and sampling parts -- as per screens #1 through #3, those are "basically mostly correct" (though not pretty) -- when axis-aligned / unrotated.
It's a lot easier if you picture the system as a pinhole camera rather than anything else. Instead of shooting rays from the surface of a rectangle representing your image, shoot the rays from a point, through the rectangle that will be your image plane, into the scene. All the primary rays should have the same point of origin, only with slightly different directions. The directions are determined using basic trig by which pixel in the image plane you want them to go through. To make the simplest example, let's imagine your point is at the camera, and your image plane is one unit along the z axis, and two units tall and wide. That way, the pixel at the upper-left corner wants to go from (0,0,0) through (-1, -1, 1). Normalize (-1, -1, 1) to get the direction. (You don't actually need to normalize the direction just to do ray intersection, but if you decide not to, remember that your directions are non-normalized before you try to compute the distance the ray has travelled or anything like that.) For every other pixel, compute the point on the plane it wants to go through the way you've already been doing, by dividing the size of the plane by the number of pixels, in each direction.
Then, and this is the most important thing, don't try to do a perspective projection. That's necessary for scan-conversion techniques, to map every vertex to a point on the screen, but in ray-tracing, your rays accomplish that just by spreading out from one point into space. The direction from your start point (camera position, the origin in this example), through your image plane, is exactly the direction you need to trace with. If you were to want an orthographic projection instead (and you almost never want this), you'd accomplish this by having the direction be the same for all the rays, and the starting positions vary across the image plane.
If you do that, you'll have a good starting point. Then you can try again to add camera rotation, either by rotating the image plane about the origin before you iterate over it to compute ray directions, or by rotating the ray directions directly. There's nothing wrong with rotating directions directly! When you bear in mind that a direction is just the position your ray goes through if it starts from the origin, it's easy to see that rotating the direction, and rotating the point it goes through, do exactly the same thing.
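A sketch of that setup, written in Python for brevity rather than Go: every primary ray starts at the camera position, passes through a point on an image plane one unit along +Z, and is rotated about the Y axis directly, with no projection matrix involved. The function name and the choice to sample pixel centers are assumptions made for this example.

```python
import math

def primary_ray(px, py, width, height, cam_pos, yaw, plane_dist=1.0):
    """Origin and unit direction of the ray through pixel (px, py).

    The image plane is 2 units wide and tall, plane_dist along +Z from the
    eye; yaw rotates the camera about the Y axis (radians).
    """
    # Point on the image plane in camera space, sampled at the pixel center
    x = ((px + 0.5) / width) * 2.0 - 1.0
    y = ((py + 0.5) / height) * 2.0 - 1.0
    z = plane_dist
    # Rotate the direction about Y; this is the only camera transform needed
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy, dz = c * x + s * z, y, -s * x + c * z
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    return cam_pos, (dx / length, dy / length, dz / length)

origin, direction = primary_ray(0, 0, 320, 240, (0.0, 0.0, -5.0),
                                math.radians(39))
```

Changing plane_dist changes the field of view: a smaller distance spreads the rays wider, a larger one narrows them.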
Where can I find algorithms for image distortions? There is so much information on blur and other classic algorithms, but so little on more complex ones. In particular, I am interested in an algorithm for the swirl-effect image distortion.
I can't find any references, but I can give a basic idea of how distortion effects work.
The key to the distortion is a function which takes two coordinates (x,y) in the distorted image, and transforms them to coordinates (u,v) in the original image. This specifies the inverse function of the distortion, since it takes the distorted image back to the original image
To generate the distorted image, one loops over x and y, calculates the point (u,v) from (x,y) using the inverse distortion function, and sets the colour components at (x,y) to be the same as those at (u,v) in the original image. One usually uses interpolation (e.g. http://en.wikipedia.org/wiki/Bilinear_interpolation ) to determine the colour at (u,v), since (u,v) usually does not lie exactly on the centre of a pixel, but rather at some fractional point between pixels.
A swirl is essentially a rotation, where the angle of rotation is dependent on the distance from the centre of the image. An example would be:
a = amount of rotation
b = size of effect
angle = a*exp(-(x*x+y*y)/(b*b))
u = cos(angle)*x + sin(angle)*y
v = -sin(angle)*x + cos(angle)*y
Here, I assume for simplicity that the centre of the swirl is at (0,0). The swirl can be put anywhere by subtracting the swirl position coordinates from x and y before the distortion function, and adding them to u and v after it.
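Putting the loop and the swirl function together, a minimal sketch might look like this. It works on a 2D list of pixel values and, to stay short, uses nearest-neighbour sampling instead of bilinear interpolation; the function name, the default centre, and the choice to leave out-of-range pixels at 0 are assumptions of this example.

```python
import math

def swirl(image, a, b, center=None):
    """Swirl a 2D list of pixel values.

    a = amount of rotation (radians at the centre), b = size of the effect.
    """
    h, w = len(image), len(image[0])
    if center is None:
        center = ((w - 1) / 2.0, (h - 1) / 2.0)
    cx, cy = center
    out = [[0] * w for _ in range(h)]
    for py in range(h):
        for px in range(w):
            # Coordinates relative to the swirl centre
            x, y = px - cx, py - cy
            angle = a * math.exp(-(x * x + y * y) / (b * b))
            # Inverse mapping: where in the original this pixel comes from
            u = math.cos(angle) * x + math.sin(angle) * y + cx
            v = -math.sin(angle) * x + math.cos(angle) * y + cy
            ui, vi = int(round(u)), int(round(v))
            if 0 <= ui < w and 0 <= vi < h:
                out[py][px] = image[vi][ui]
    return out

grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
unchanged = swirl(grid, a=0.0, b=1.0)      # zero rotation leaves the image as-is
half_turn = swirl(grid, a=math.pi, b=1e6)  # huge b: nearly uniform 180-degree turn
```

Swapping the nearest-neighbour lookup for bilinear interpolation would smooth the jagged edges this version produces on real images.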
There are various swirl effects around: some (like the above) swirl only a localised area, and have the amount of swirl decreasing towards the edge of the image. Others increase the swirling towards the edge of the image. This sort of thing can be done by playing about with the angle= line, e.g.
angle = a*(x*x+y*y)
There is a Java implementation of a lot of image filters/effects at Jerry's Java Image Filters. Maybe you can take inspiration from there.
The swirl and others like it are a matrix transformation on the pixel locations. You make a new image and get the color from a position on the image that you get from multiplying the current position by a matrix.
The matrix is dependent on the current position.
Here is a good CodeProject article showing how to do it:
http://www.codeproject.com/KB/GDI-plus/displacementfilters.aspx
There is also a new graphics library with many features:
http://code.google.com/p/picasso-graphic/
Take a look at ImageMagick. It's an image conversion and editing toolkit and has interfaces for all popular languages.
The -displace operator can create swirls with the correct displacement map.
If you are for some reason not satisfied with the ImageMagick interface, you can always take a look at the source code of the filters and go from there.