I'm following examples from an intro to webgl book (WebGL Programming Guide: Interactive 3D Graphics Programming with WebGL) and I'm having trouble understanding why an orthographic projection helps solve this specific problem.
One of the examples has us changing the 'eye point' from which we view 3 triangles by applying a matrix transformation. They show that if we move the viewpoint far enough to the right (+X), the triangle starts to disappear. Here is the exact WebGL example from the book's website (press the right arrow key to move the eye point): http://www.magic.ubc.ca/webgl-pg/uploads/examples/ch07/LookAtTrianglesWithKeys.html
The book says that this happens because "This is because you haven’t specified the visible range (the boundaries of what you can actually see) correctly."
To solve this they apply an orthographic projection matrix to each vertex first, and the problem is then solved. Why does this solve the problem? How can a matrix transformation cause something that was not visible before to become visible? Where can I find the full explanation of why WebGL chose not to display the triangle anymore?
The coordinate system in which OpenGL/WebGL renders objects (screen space) has a range of [-1, 1] for x, y, and z; anything outside that cube is clipped and not drawn.
With viewMatrix.setLookAt(g_eyeX, g_eyeY, g_eyeZ, 0, 0, 0, 0, 1, 0); your example creates a transform matrix that is used to transform the coordinates of the triangles from world space into camera space (i.e. relative to the direction from which you look at the object).
Because this transformation changes the coordinates of the triangles, they may no longer lie within [-1, 1]; in your example this happens to the z coordinate (the vertices move too far "behind" your screen), so those vertices get clipped.
To solve this issue you can use an orthographic projection to bring the z range you care about back into [-1, 1] without changing the perspective look; nothing new is created, the existing z values are simply rescaled into the visible range.
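As an illustration, here is a minimal numpy sketch of a standard glOrtho-style orthographic matrix (not the book's actual Matrix4 code; the near/far values are chosen just for this example). Widening near/far keeps the transformed z inside [-1, 1], so the vertex is no longer clipped:

    import numpy as np

    def ortho(left, right, bottom, top, near, far):
        # Maps the box [left,right] x [bottom,top] x [-near,-far] into the [-1,1] cube.
        m = np.identity(4)
        m[0, 0] = 2.0 / (right - left)
        m[1, 1] = 2.0 / (top - bottom)
        m[2, 2] = -2.0 / (far - near)
        m[0, 3] = -(right + left) / (right - left)
        m[1, 3] = -(top + bottom) / (top - bottom)
        m[2, 3] = -(far + near) / (far - near)
        return m

    proj = ortho(-1, 1, -1, 1, 0.0, 2.0)
    view_space_vertex = np.array([0.5, 0.4, -1.5, 1.0])  # z outside the default [-1,1]
    print(proj @ view_space_vertex)                       # z now lands at 0.5, inside [-1,1]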
A project I've been working on for the past few months calculates the top area of objects captured with a 3D depth camera from a top view.
Workflow of my project:
Capture an image (RGB and depth data) of a group of objects from the top view
Run instance segmentation on the RGB image
Calculate the real area of each segmented mask using the depth data (sketched below)
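To give an idea of step 3, here is a simplified sketch (the focal lengths and values are placeholders, and it is only one possible way to turn depth pixels into metric area): each pixel at depth Z covers roughly (Z/fx) * (Z/fy) square meters on a surface facing the camera, so the mask area is the sum of those footprints.

    import numpy as np

    def mask_area_m2(depth, mask, fx, fy):
        # depth: HxW depth map in meters, mask: HxW boolean segmentation mask,
        # fx, fy: focal lengths in pixels (pinhole model).
        z = depth[mask]
        z = z[z > 0]                      # drop invalid depth readings
        return np.sum((z / fx) * (z / fy))

    depth = np.full((480, 640), 1.2)      # hypothetical: everything 1.2 m away
    mask = np.zeros((480, 640), dtype=bool)
    mask[100:200, 100:300] = True         # 100 x 200 pixel mask
    print(mask_area_m2(depth, mask, fx=600.0, fy=600.0))  # ~0.08 m^2

Note that this footprint model assumes the surface faces the camera, which is exactly the assumption that breaks down near the edges of the image.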
Some problems with the project:
All of the given objects have different shapes
The side of an object, not just its top, starts to become visible as the object moves toward the edge of the image
Because of this, the segmented mask area gradually increases
As a result, the computed area of an object near the edge of the image comes out larger than that of an object in the center
In the example image, object 1 is located near the center of the field of view, so only its top is visible, but object 2 is located near the edge, so part of its top is lost and its side is visible.
Because of this, the segmented mask area is larger for objects on the periphery than for objects in the center.
I only want to find the area of the top of an object.
Example image of what I want:
Is there a way to geometrically correct the area of an object located toward the outside of the image?
I tried to correct the area by multiplying it by a factor based on the angle between Vector 1 (connecting the center of the camera lens to the center of the floor) and Vector 2 (connecting the center of the lens to the center of gravity of the target object).
However, I gave up because I couldn't logically explain how much correction was needed.
What I would do is convert your RGB and depth images into a 3D mesh (a surface with bumps) using your camera settings (FOV, focal length), something like this:
Align already captured rgb and depth images
and then project it onto the ground plane (perpendicular to the camera view direction in the middle of the screen). To obtain the ground plane, simply take three 3D positions on the ground, p0, p1, p2 (forming a triangle), and use the cross product to compute the ground normal:
n = normalize(cross(p1-p0,p2-p1))
Now your plane is defined by p0 and n, so convert each 3D coordinate by simply moving it along the normal by its signed distance to the plane; if I see it right, something like this:
p' = p - n * dot(p-p0,n)
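In code the projection could look like this (a numpy sketch; points is assumed to be the Nx3 point cloud you reconstructed from the depth image):

    import numpy as np

    def project_to_ground(points, p0, p1, p2):
        # Ground normal from three sampled floor points.
        n = np.cross(p1 - p0, p2 - p1)
        n = n / np.linalg.norm(n)
        # Signed distance of each point to the plane, then move each point onto it.
        d = (points - p0) @ n
        return points - np.outer(d, n)

    p0, p1, p2 = np.array([0., 0., 0.]), np.array([1., 0., 0.]), np.array([0., 1., 0.])
    pts = np.array([[0.2, 0.3, 0.5], [1.0, 1.0, 0.1]])
    print(project_to_ground(pts, p0, p1, p2))  # z components become 0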
That should eliminate the problem with visible sides near the edges of the FOV. However, you should also take into account that when a side is visible, part of the top is hidden as well; to remedy that you might find the axis of symmetry, use just the half of the top that is not partially hidden, and multiply the measured half area by 2.
Accurate computation is virtually hopeless, because you don't see all sides.
Assuming your depth information is available as a range image, you can consider the points inside the segmentation mask of a single chicken, estimate the vertical direction at that point, rotate and project the points to obtain the silhouette.
But as a part of the surface is occluded, you may have to reconstruct it using symmetry.
There is no way to do this accurately for arbitrary objects, since there can be parts of the object that contribute to the "top area", but which the camera cannot see. Since the camera cannot see these parts, you can't tell how big they are.
Since all your objects are known to be chickens, though, you could get a pretty accurate estimate like this:
Use Principal Component Analysis to determine the orientation of each chicken (see the sketch after this list).
Using many objects in many images, find a best-fit polynomial that estimates apparent chicken size by distance from the image center, and orientation relative to the distance vector.
For any given chicken, then, you can divide its apparent size by the estimated average apparent size for its distance and orientation, to get a normalized chicken size measurement.
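For example, the first step could be sketched like this (numpy; mask is a hypothetical boolean segmentation mask for a single chicken), with the apparent-size polynomial of the second step fit afterwards with something like np.polyfit:

    import numpy as np

    def mask_orientation_deg(mask):
        # PCA on the mask's pixel coordinates: the eigenvector of the covariance
        # matrix with the largest eigenvalue is the chicken's major axis.
        ys, xs = np.nonzero(mask)
        pts = np.column_stack([xs, ys]).astype(float)
        pts -= pts.mean(axis=0)
        cov = np.cov(pts, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        major = eigvecs[:, np.argmax(eigvals)]
        return np.degrees(np.arctan2(major[1], major[0]))

    # Synthetic elongated blob tilted about 45 degrees.
    mask = np.zeros((100, 100), dtype=bool)
    for i in range(60):
        mask[20 + i, 20 + i] = True
        mask[21 + i, 20 + i] = True
    print(mask_orientation_deg(mask))  # roughly +/-45; sign depends on eigenvector direction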
For a personal project, I've created a simple 3D engine in Python using as few libraries as possible. It does what I wanted: I am able to render simple polygons, and I have a movable camera. However, there is a problem:
I implemented a simple flat shader, but in order for it to work, I need to know the camera location (the camera is my light source). However, the problem is that I have no way of knowing the camera's location in the world space. At any point, I am able to display my view matrix, but I am unsure about how to extract the camera's location from it, especially after I rotate the camera. Here is a screenshot of my engine with the view matrix. The camera has not been rotated yet and it is very simple to extract its location (0, 1, 4).
However, upon moving the camera to a point between the X and Z axes and pointing it upwards (and staying at the same height), the view matrix changes to this:
It is obvious now that the last column cannot be taken directly to determine the camera location (it should be something like (4,1,4) on the last picture).
I have tried a lot of math, but I can't figure out the way to determine the camera x,y,z location from the view matrix. I will appreciate any and all help in solving this, as it seems to be a simple problem, yet whose solution eludes me. Thank you.
EDIT:
I was advised to transform a vertex (0,0,0,1) by my view matrix. This, however, does not work. See the example (the vertex obviously is not located at the printed coordinates):
Just take the transform of the vector (0,0,0,1) with the modelview matrix, which is simply the rightmost column of the modelview matrix.
EDIT: #ampersander: I wonder why you're trying to work with the camera location in the first place, if you assume the source of illumination to be located at the camera's position. In that case, just be aware that in OpenGL there is no such thing as a camera; what the "view" transform actually does is move everything in the world around so that the point where you assume your camera to be ends up at the coordinate origin (0,0,0).
Or in other words: after the modelview transform, the transformed vertex position is in fact the vector from the camera to the vertex, in view space. Which means that, for your assumed illumination calculation, the direction toward the light source is simply the negated vertex position. Take that, normalize it to unit length, and use it in the illumination term.
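Here is a small numpy sketch of both points, assuming a conventional world-to-view matrix applied as view @ vertex (the matrix values are a hypothetical example with the camera at (0, 1, 4) and no rotation; if your matrix is stored transposed, swap rows and columns accordingly):

    import numpy as np

    view = np.array([
        [1., 0., 0.,  0.],
        [0., 1., 0., -1.],
        [0., 0., 1., -4.],
        [0., 0., 0.,  1.],
    ])

    # World-space camera position = translation part of the *inverse* view matrix,
    # which for a rotation R and translation t is -R^T @ t.
    R = view[:3, :3]
    t = view[:3, 3]
    print(-R.T @ t)                 # [0. 1. 4.]

    # Light-at-camera shading without needing the camera position: in view space the
    # camera sits at the origin, so the direction toward the light is the negated,
    # normalized view-space vertex position.
    vertex_world = np.array([2., 0., 0., 1.])
    v = (view @ vertex_world)[:3]
    to_light = -v / np.linalg.norm(v)
    print(to_light)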
I have some images captured with a wide-angle (approximately 180 degree) camera.
I am using OpenCV 2.4.8, which gives me the camera matrix and distortion coefficients:
MatK = [537.43775285, 0, 327.61133999], [0, 536.95118778, 248.89561998], [0, 0, 1]
MatD = [-0.29741743, 0.14930169, 0, 0, 0]
I then used this information to remove the distortion.
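The undistortion step was something like this (a simplified sketch; the file names are placeholders, and K and D are the values quoted above):

    import numpy as np
    import cv2

    K = np.array([[537.43775285, 0., 327.61133999],
                  [0., 536.95118778, 248.89561998],
                  [0., 0., 1.]])
    D = np.array([-0.29741743, 0.14930169, 0., 0., 0.])

    img = cv2.imread("wide_angle_input.jpg")   # hypothetical input image
    undistorted = cv2.undistort(img, K, D)
    cv2.imwrite("undistorted.jpg", undistorted)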
But the result is not as expected.
I have attached some input images of the chessboard that I used to calibrate.
Or are there any other tools or libraries by which the distortion can be removed, even for images from a normal camera or ones captured by my smart phone?
input images
This is not an answer to the question, but a comment on the "discussion" of distortion and planarity.
In reality you have some straight lines on a pattern:
With (nearly any) lens you'll get some kind of distortion, so that those straight lines aren't straight anymore after projection into your image. This effect is much stronger for wide-angle lenses. You could expect something like this (stronger for wide angle, but similar):
But the images you provided look more like this, which can be because your pattern wasn't really planar on the ground, or because the lens has some additional "hills" on it.
The whole point of the calibration process is to tell OpenCV what a straight line looks like under distortion. A chess board is used to present a number of straight lines that are easy for OpenCV to detect. In your image, these lines are simply not straight. I'm moderately sure that OpenCV also needs square boxes.
So, use a real chess board pattern. Print it out, glue it to a piece of wood or hard plastic or whatever. But make sure it's a regular chessboard pattern on a level plane.
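For reference, the standard calibration loop looks roughly like this (a sketch; the inner-corner count and file paths are placeholders, and the board must be flat and rigid as described above):

    import glob
    import numpy as np
    import cv2

    pattern = (9, 6)  # inner corners of the printed chessboard
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

    obj_points, img_points = [], []
    for path in glob.glob("calib_*.jpg"):
        gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if found:
            obj_points.append(objp)
            img_points.append(corners)

    rms, K, D, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, gray.shape[::-1], None, None)
    print("RMS reprojection error:", rms)

If the detected lines are not actually straight in the real world, no distortion model will fit them well.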
The most common method (used by the Oculus Rift runtime, for example) draws a fine enough textured grid whose texture coordinates or grid node positions are chosen to compensate for the distortion. To obtain the grid, one normally fits a polynomial or a spline to some reference picture; the checkerboard in your images is a common calibration target, for example.
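In OpenCV terms the same precomputed-grid idea is a remap: build a per-pixel map once, then warp each frame through it (a sketch using the K and D from the question; the file names are placeholders):

    import numpy as np
    import cv2

    K = np.array([[537.43775285, 0., 327.61133999],
                  [0., 536.95118778, 248.89561998],
                  [0., 0., 1.]])
    D = np.array([-0.29741743, 0.14930169, 0., 0., 0.])

    img = cv2.imread("wide_angle_input.jpg")   # hypothetical input image
    h, w = img.shape[:2]
    map1, map2 = cv2.initUndistortRectifyMap(K, D, np.eye(3), K, (w, h), cv2.CV_32FC1)
    corrected = cv2.remap(img, map1, map2, cv2.INTER_LINEAR)
    cv2.imwrite("corrected.jpg", corrected)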
I'm creating an HTML5 canvas 3D renderer, and I'd say I've gotten pretty far without the help of SO, but I've run into a showstopper of sorts. I'm trying to implement backface culling on a cube with the help of some normals calculations. Also, I've tagged this as WebGL, as this is a general enough question that it could apply to both my use case and a 3D-accelerated one.
At any rate, as I'm rotating the cube, I've found that the wrong faces are being hidden. Example:
I'm using the following vertices:
https://developer.mozilla.org/en/WebGL/Creating_3D_objects_using_WebGL#Define_the_positions_of_the_cube%27s_vertices
The general procedure I'm using is:
Create a transformation matrix by which to transform the cube's vertices
For each face, and for each point on each face, I convert these to vec3s and multiply them by the matrix made in step 1.
I then get the surface normal of the face using Newell's method, then take the dot product of that normal with some made-up vec3, e.g., [-1, 1, 1], since I couldn't think of a good value to put in here. I've seen some folks use the position of the camera for this, but...
Skipping the usual step of using a camera matrix, I pull the x and y values from the resulting vectors to send to my line and face renderers, but only if they have a dot-product above 0. I realize it's rather arbitrary which ones I pull, really.
I'm wondering two things: whether my procedure in step 3 is correct (it most likely isn't), and whether the order of the points I'm drawing on the faces is incorrect (very likely). If the latter is true, I'm not quite sure how to visualize the problem. I've seen people say that normals aren't pertinent, that it's the direction the line is being drawn in, but... it's hard for me to wrap my head around that, or to tell whether that's the source of my problem.
It probably doesn't matter, but the matrix library I'm using is gl-matrix:
https://github.com/toji/gl-matrix
Also, the particular file in my open source codebase I'm using is here:
http://code.google.com/p/nanoblok/source/browse/nb11/app/render.js
Thanks in advance!
I haven't reviewed your entire system, but the “made-up vec3” should not be arbitrary; it should be the “out of the screen” vector, which (since your projection is ⟨x, y, z⟩ → ⟨x, y⟩) is either ⟨0, 0, -1⟩ or ⟨0, 0, 1⟩ depending on your coordinate system's handedness and screen axes. You don't have an explicit "camera matrix" (that is usually called a view matrix), but your camera (view and projection) is implicitly defined by your step 4 projection!
However, note that this approach will only work for orthographic projections, not perspective ones (consider a face on the left side of the screen, facing rightward and parallel to the view direction; the dot product would be 0 but it should be visible). The usual approach, used in actual 3D hardware, is to first do all of the transformation (including projection), then check whether the resulting 2D triangle is counterclockwise or clockwise wound, and keep or discard based on that condition.
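A sketch of that winding test in plain Python (which sign counts as front-facing depends on your vertex order and on whether screen y points up or down):

    def signed_area_2d(points):
        # Shoelace formula over the projected 2D vertices of one face.
        area2 = 0.0
        for i in range(len(points)):
            x0, y0 = points[i]
            x1, y1 = points[(i + 1) % len(points)]
            area2 += x0 * y1 - x1 * y0
        return area2

    def is_front_facing(points, front_is_ccw=True):
        a = signed_area_2d(points)
        return a > 0 if front_is_ccw else a < 0

    # Counter-clockwise triangle in a y-up coordinate system.
    print(is_front_facing([(0, 0), (1, 0), (0, 1)]))  # True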
Given the 3D vector of the direction that the camera is facing and the orientation/direction vector of a 3D object in the 3D space, how can I calculate the 2-dimensional slope that the mouse pointer must follow on the screen in order to visually be moving along the direction of said object?
Basically I'd like to be able to click on an arrow and make it move back and forth by dragging it, but only if the mouse pointer drags (roughly) along the length of the arrow, i.e. in the direction that it's pointing to.
thank you
I'm not sure I 100% understand your question. Would you mind posting a diagram?
You might find these of interest. I answered previous questions about calculating a local X, Y, Z axis given a camera direction (look-at) vector, and also a question about translating an object in a plane parallel to the camera.
Both of these examples use the vector dot product and vector cross product to compute the required vectors. In your case, the dot product can also be used to obtain the angle between two vectors once you have found them.
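For example (a small numpy sketch with made-up vectors):

    import numpy as np

    def angle_between_deg(a, b):
        cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

    print(angle_between_deg(np.array([1., 0., 0.]), np.array([0., 1., 0.])))  # 90.0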
It depends to an extent on the transformation that you are using to convert your 3d real-world coordinates to 2d screen coordinates, e.g. perspective, isometric, etc. You will typically have a forward (3d -> 2d) and backward (2d -> 3d) transformation in play, where the forward transformation loses information (i.e. going forward, each 3d point maps to a unique 2d point, but going back from that 2d point may not yield the original 3d point). You can often project the mouse point onto the object to get the missing dimension.
For mouse dragging, you typically get the user to specify an operation (translation on the plane of projection, zooming in or out, or rotating about an anchor point). Your input is the mouse coordinate at the start and end of the drag, which you transform into your 3d coordinate system to get two 3d coordinates, which will give you dx, dy, dz for dragging / translation etc...
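One way to sketch that in code, under some assumptions (mvp is your combined model-view-projection matrix, the arrow is given by a world-space origin and direction, and mouse_delta is the drag in pixels; all names are hypothetical): project two points of the arrow to the screen, take the 2D direction between them, and use its dot product with the mouse delta as the amount to move the arrow.

    import numpy as np

    def project(mvp, p_world, width, height):
        p = mvp @ np.append(p_world, 1.0)
        ndc = p[:3] / p[3]                                       # perspective divide
        return np.array([(ndc[0] * 0.5 + 0.5) * width,
                         (1.0 - (ndc[1] * 0.5 + 0.5)) * height])  # screen y points down

    def drag_amount(mvp, arrow_origin, arrow_dir, mouse_delta, width, height):
        a = project(mvp, arrow_origin, width, height)
        b = project(mvp, arrow_origin + arrow_dir, width, height)
        screen_dir = b - a
        n = np.linalg.norm(screen_dir)
        if n < 1e-6:                 # arrow points straight into/out of the screen
            return 0.0
        screen_dir /= n
        return float(np.dot(mouse_delta, screen_dir))  # pixels dragged along the arrow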