How to calculate screen coordinates after transformations? - matrix

I am trying to solve a question related to the transformation of coordinates in 3D space, but I'm not sure how to approach it.
Let's say a vertex point named P is drawn at the origin with a 4x4 transformation matrix. It's then viewed through a camera that's positioned with a model-view matrix and then through a simple projective transform matrix.
How do I calculate the new screen coordinates of P' (x,y,z)?

Before explaining the answer, you need to know how the pipeline processes a vertex to draw it on screen.
Every step in the process is just a matrix multiplied by a vector:
Model -> World -> Camera -> Projection (or Normalized Coordinates) -> Screen
First step: we call it 'model space' because (0,0,0) is defined relative to the model itself.
Next, we need to move from model space to world space, because we are going to place the model in the world. The transform is TRS * Model(Vector4), i.e. translate, rotate, scale, and each model's world transform will be different.
After that, the model is placed in the world.
Third, we need to transform into camera space, because what we see is seen through the camera. In the world, the camera also has a position, a rotation and a viewport size, so the vertex needs to be projected from the camera; see the
General Formula for Perspective Projection Matrix
After this step is done, you get normalized coordinates, which are technically coordinates in a fixed range (0 to 1, or -1 to 1 depending on the convention).
Finally, screen space. Suppose we are making a video game for mobile; mobile devices come in many different screen resolutions, so how do we handle that?
Simple: scale and translate to get the result in screen-space coordinates, because the origin and the screen size are different.
So what you are trying to do is 'step 4'.
If you want to get the screen position of P1 from world space, the formula will be "Screen Matrix * Projection Matrix * Camera Matrix * P1".
If you want to get the position from camera space, it would be "Screen Matrix * Projection Matrix * P1".
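For concreteness, here is a minimal sketch of that chain in numpy. The perspective and screen (viewport) matrices below are just illustrative placeholders for whatever your renderer uses, and note that a perspective divide by w is needed between the projection and screen steps:

import numpy as np

def perspective(fov_y_deg, aspect, near, far):
    # Standard OpenGL-style perspective projection matrix (assumed convention).
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ])

def screen_matrix(width, height):
    # Scale and translate normalized coordinates (-1..1) into pixel coordinates.
    return np.array([
        [width / 2.0, 0.0, 0.0, width / 2.0],
        [0.0, -height / 2.0, 0.0, height / 2.0],   # flip Y: pixel origin is top-left
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])

P = np.array([1.0, 2.0, -5.0, 1.0])         # point in world space (homogeneous)
camera = np.eye(4)                          # placeholder camera (view) matrix
projection = perspective(60.0, 16.0 / 9.0, 0.1, 100.0)
screen = screen_matrix(1920, 1080)

clip = projection @ camera @ P              # Projection * Camera * P
ndc = clip / clip[3]                        # perspective divide -> normalized coordinates
pixel = screen @ ndc                        # Screen Matrix * normalized point
print(pixel[:2])                            # x, y in pixels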
Here are some useful links to understand the matrices and the calculation:
https://answers.unity.com/questions/1359718/what-do-the-values-in-the-matrix4x4-for-cameraproj.html
https://www.google.com/search?q=unity+camera+to+screen+matrix&newwindow=1&rlz=1C5CHFA_enKR857KR857&source=lnms&tbm=isch&sa=X&ved=0ahUKEwjk5qfQ18DlAhUXfnAKHabECRUQ_AUIEigB&biw=1905&bih=744#imgrc=Y8AkoYg3wS4PeM:

Related

Camera Geometry: Algorithm for "object area correction"

A project I've been working on for the past few months involves calculating the top-surface area of objects captured with a 3D depth camera from a top view.
Workflow of my project:
capture an image of a group of objects (RGB and DEPTH data) from the top view
run instance segmentation on the RGB image
calculate the real area of the segmented mask with the DEPTH data
Some problems with the project:
All given objects have different shapes
The side of the object, not the top, begins to be seen as it moves to the outside of the image.
Because of this, the mask area to be segmented gradually increases.
As a result, the actual area of an object located near the edge of the image is calculated to be larger than that of an object located in the center.
In the example image, object 1 is located in the middle of the view, so only its top is visible, but object 2 is located near the edge of the view, so part of its top is lost and its side is visible.
Because of this, the mask area to be segmented is larger for objects located on the periphery than for objects located in the center.
I only want to find the area of the top of an object.
Example image of what I want:
Is there a way to geometrically correct the area of an object located near the edge of the image?
I tried to calibrate by multiplying the calculated area by a value determined from the angle between Vector 1 (connecting the center of the camera lens to the center point of the floor) and Vector 2 (connecting the center of the lens to the center of gravity of the target object).
However, I gave up because I couldn't logically explain how much correction was needed.
What I would do is convert your RGB and depth images to a 3D mesh (a surface with bumps) using your camera settings (FOV, focal length), something like this:
Align already captured rgb and depth images
and then project it onto the ground plane (perpendicular to the camera view direction in the middle of the screen). To obtain the ground plane, simply take 3 3D positions of the ground p0,p1,p2 (forming a triangle) and use the cross product to compute the ground normal:
n = normalize(cross(p1-p0,p2-p1))
Now your plane is defined by p0,n, so just convert each 3D coordinate like this:
by simply moving each point along the normal by its signed distance to the ground plane, if I see it right something like this:
p' = p - n * dot(p-p0,n)
That should eliminate the problem with visible sides at the edges of the FOV. However, you should also take into account that when a side becomes visible, part of the top is also hidden; to remedy that you might also find the axis of symmetry, use just the half of the top that is not partially hidden, and multiply the measured half-area by 2 ...
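As a rough sketch of that projection step, assuming the depth image has already been converted to an (N, 3) array of 3D points (the function names here are just illustrative):

import numpy as np

def ground_plane(p0, p1, p2):
    # Plane through three ground samples: returns (point on plane, unit normal).
    n = np.cross(p1 - p0, p2 - p1)
    return p0, n / np.linalg.norm(n)

def project_onto_plane(points, p0, n):
    # Orthogonal projection of (N, 3) points onto the plane defined by p0 and unit normal n.
    d = (points - p0) @ n                  # signed distance of each point to the plane
    return points - np.outer(d, n)         # p' = p - n * dot(p - p0, n)

# Illustrative data: three ground samples and a couple of object points.
p0, p1, p2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
pts = np.array([[0.2, 0.3, 0.5], [0.8, 0.1, 0.4]])

origin, n = ground_plane(p0, p1, p2)
print(project_onto_plane(pts, origin, n))  # projected points lie in the ground plane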
Accurate computation is virtually hopeless, because you don't see all sides.
Assuming your depth information is available as a range image, you can consider the points inside the segmentation mask of a single chicken, estimate the vertical direction at that point, rotate and project the points to obtain the silhouette.
But as a part of the surface is occluded, you may have to reconstruct it using symmetry.
There is no way to do this accurately for arbitrary objects, since there can be parts of the object that contribute to the "top area", but which the camera cannot see. Since the camera cannot see these parts, you can't tell how big they are.
Since all your objects are known to be chickens, though, you could get a pretty accurate estimate like this:
Use Principal Component Analysis to determine the orientation of each chicken (see the sketch after this list).
Using many objects in many images, find a best-fit polynomial that estimates apparent chicken size by distance from the image center, and orientation relative to the distance vector.
For any given chicken, then, you can divide its apparent size by the estimated average apparent size for its distance and orientation, to get a normalized chicken size measurement.
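A rough sketch of step 1 (PCA orientation of a segmented mask) might look like this, assuming the mask is a boolean 2D array coming out of your instance segmentation; the function name is made up for illustration:

import numpy as np

def mask_orientation(mask):
    # Angle (radians) of the mask's principal axis and its centroid, via PCA.
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(float)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)          # 2x2 covariance of the pixel coordinates
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    major = eigvecs[:, -1]                    # principal axis = largest eigenvalue
    return np.arctan2(major[1], major[0]), centroid

# Tiny illustrative mask: an elongated blob along x.
mask = np.zeros((20, 20), dtype=bool)
mask[9:11, 3:17] = True
angle, centroid = mask_orientation(mask)
print(np.degrees(angle), centroid)            # ~0 degrees: blob aligned with the x axis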

Align Pointclouds to a coordinate system

I'm currently working on 3D reconstruction from images to measure an object's size.
I have created a point cloud and now want to measure its size (in the scale of the point cloud, since I know the real size: it is the reference object).
But that's where I'm stuck. I was previously thinking about finding the outer points (it's a box) and just calculating the distances. That works, of course, but it doesn't really give the width, length, and height of the object, because the point cloud is rotated and translated.
Plotted point clouds with coordinate axes
As you can see in the image, the point clouds are rotated and translated, and my question is:
How can I calculate the rotation and translation needed to align them back with the coordinate axes?
If something was unclear, hit me up, and I will try to elaborate better.
Edit:
So I have now calculated the plane of one side of my object, which is:
0.96x - 0.03y + 0.28z - 4.11 = 0
I don't really care if it's aligned with the XY-plane or the YZ-plane, but it has to be aligned with one of them.
Now it's the calculation from the object plane to the world plane where I'm struggling.
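One possible way to do that part, sketched here with numpy under the assumption that rotating the fitted plane normal onto a world axis is enough (the example maps the normal onto the X axis, so the side becomes parallel to the YZ-plane; the translation is handled by centering the cloud):

import numpy as np

def rotation_between(a, b):
    # Rotation matrix that maps unit vector a onto unit vector b (Rodrigues' formula).
    a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
    v = np.cross(a, b)
    c = np.dot(a, b)
    if np.isclose(c, -1.0):                   # opposite vectors: 180-degree rotation
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + K + K @ K / (1.0 + c)

normal = np.array([0.96, -0.03, 0.28])        # from 0.96x - 0.03y + 0.28z - 4.11 = 0
R = rotation_between(normal, np.array([1.0, 0.0, 0.0]))

points = np.random.rand(100, 3)                 # placeholder for your point cloud (N, 3)
aligned = (points - points.mean(axis=0)) @ R.T  # center to remove translation, then rotate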

How to fit a rectangle into a space (get relative camera position) having its dimensions and shape?

Say we have a webcam image of a road. We have picked the 4 connected 2D lines of a rectangle, and we know there are 90-degree angles between them. We have set their real-life dimensions. How can one get the 3D camera position relative to that rectangle from such data?
Having 4 pairs of corresponding coordinates - the real-world ones and the coordinates on the camera sensor (in the photo's image coordinate system) - one can calculate the matrix of the perspective transformation, for example with the OpenCV function getPerspectiveTransform.
Then apply decomposeProjectionMatrix.
If you are using other means/libraries for image acquisition/processing, they might contain something similar.
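As a hedged sketch, here is the closely related direct route with OpenCV's solvePnP, which recovers the camera pose from the 4 corner correspondences in one call. This assumes the camera intrinsics are known from calibration; all numeric values below are made up:

import numpy as np
import cv2

# Rectangle corners in real-world units, lying in the Z=0 plane (order must match image_points).
object_points = np.array([
    [0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [2.0, 5.0, 0.0],
    [0.0, 5.0, 0.0],
], dtype=np.float64)

# The same corners as picked in the photo (pixels) - made-up values.
image_points = np.array([
    [421.0, 512.0],
    [742.0, 498.0],
    [880.0, 310.0],
    [305.0, 325.0],
], dtype=np.float64)

# Intrinsic matrix (focal length, principal point) - assumed known from calibration.
K = np.array([
    [800.0, 0.0, 640.0],
    [0.0, 800.0, 360.0],
    [0.0, 0.0, 1.0],
])
dist = np.zeros(5)                            # assume no lens distortion for the sketch

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
R, _ = cv2.Rodrigues(rvec)
camera_position = -R.T @ tvec                 # camera position in the rectangle's frame
print(camera_position.ravel())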

ModelView and Projection Matrix WebGL

I'm a bit confused about the differences between these matrices. I don't know if I've understood how they work.
The ModelView matrix is the combination of the Model and the View matrix, where the View matrix is the one that specifies features like the location and orientation of my camera, while the Model matrix is the one that specifies the frame in which the primitives I'm going to draw are positioned.
The projection matrix specifies other features of the camera, like the clip space, the projection method and the field of view.
Is that right?
Thanks
It's a little confusing. The View matrix moves the entire world to be relative to the camera. A camera matrix (the inverse of the view matrix) puts the camera in the world.
There are multiple ways to make a view matrix. While it's common to use a "lookAt" function that directly generates a view matrix, it's actually more common to put your camera in the world just like any other object. You'd have a scene hierarchy and put everything in the world: rocks, trees, houses, cars, people, the camera. You then compute the world matrix for the camera, which is the "camera matrix"; you then take the inverse of that and you get a "view matrix". This is how pretty much all 3D engines work: Unity, Unreal, Maya, 3DSMax, etc.
The projection matrix decides things like the field of view (a wide-angle lens or a telephoto lens). It also helps define the aspect ratio, so that you can render to a rectangular area, and it helps define what distances in front of the camera are visible.
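A tiny numpy sketch of that relationship (the values are illustrative): the camera is placed in the world like any other object, and the view matrix is just the inverse of its world matrix:

import numpy as np

def translate(x, y, z):
    m = np.eye(4)
    m[:3, 3] = [x, y, z]
    return m

camera_world = translate(0.0, 2.0, 10.0)   # camera placed in the world ("camera matrix")
view = np.linalg.inv(camera_world)         # view matrix = inverse of the camera's world matrix

model = translate(0.0, 0.0, -5.0)          # model placed in the world
model_view = view @ model                  # the combined ModelView matrix

# A point at the model's origin ends up at (0, -2, -15) relative to the camera.
print(model_view @ np.array([0.0, 0.0, 0.0, 1.0]))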
Here's an article on projection matrices. Here's another on camera and view matrices

How to Use FindPlane()

Could someone explain to me how FindPlane works? (I understand the inputs and the outputs, but not the process.) I am getting random values for the output and therefore I don't understand how it actually functions: does it raycast a normal vector from my camera according to my touch position, get the depth point that the ray hits, and derive a plane from that?
The operation is similar to a raycast, but the other way round. When you tap any point on the screen, the screen coordinates are recorded. All 3D points in the point cloud are projected onto the image plane using the camera intrinsics. The points that project close to the tapped screen coordinates are taken. The RANSAC method is used to extract the plane from those points; SVD can also be used to extract the plane normal from the inliers obtained from RANSAC. This method should be used only once per frame; the transformation operation is applied to all points in the point cloud.
This method gives random values in cases such as a sparse point cloud, reflections in the 3D point cloud, reflective surfaces, cluttered 3D space, external IR interference, etc.
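For illustration only (this is not the actual FindPlane source), a rough numpy sketch of the process described above: project the point cloud into the image with the camera intrinsics, keep the points near the tapped pixel, then fit a plane with a simple RANSAC that uses SVD for the normal. The intrinsics and thresholds are made up:

import numpy as np

def project(points, fx, fy, cx, cy):
    # Pinhole projection of (N, 3) camera-space points to pixel coordinates.
    u = fx * points[:, 0] / points[:, 2] + cx
    v = fy * points[:, 1] / points[:, 2] + cy
    return np.column_stack([u, v])

def fit_plane_svd(pts):
    # Least-squares plane through pts: returns (centroid, unit normal).
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    return centroid, vt[-1]                   # singular vector of the smallest value = normal

def find_plane_sketch(points, tap_uv, fx, fy, cx, cy,
                      pixel_radius=25.0, iters=100, dist_thresh=0.01):
    uv = project(points, fx, fy, cx, cy)
    near = points[np.linalg.norm(uv - tap_uv, axis=1) < pixel_radius]
    if len(near) < 3:
        return None                           # sparse cloud: not enough support near the tap
    rng = np.random.default_rng(0)
    best_inliers = None
    for _ in range(iters):                    # RANSAC: sample 3 points, count inliers
        sample = near[rng.choice(len(near), 3, replace=False)]
        c, n = fit_plane_svd(sample)
        d = np.abs((near - c) @ n)
        inliers = near[d < dist_thresh]
        if best_inliers is None or len(inliers) > len(best_inliers):
            best_inliers = inliers
    return fit_plane_svd(best_inliers)        # refine the plane on the best inlier set

# Illustrative usage: a synthetic flat point cloud 2 m in front of the camera.
pts = np.column_stack([np.random.uniform(-1, 1, 2000),
                       np.random.uniform(-1, 1, 2000),
                       np.full(2000, 2.0)])
print(find_plane_sketch(pts, np.array([320.0, 240.0]), 500.0, 500.0, 320.0, 240.0))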

Resources