Webot camera default parameters like pixel size and focus - camera-calibration

I am using two cameras in Webots, without a lens or any other special settings, to measure the position of an object. For the localization I need the focal length, i.e. the distance from the camera center to the center of the imaging plane, namely f. I see the focus parameter in the Camera node, but when I leave it NULL (the default), the image is still normal, so I assume this parameter is unrelated to f. In addition, I need to know the width and height of a pixel in the image, namely dx and dy respectively, but I have no idea how to obtain this information.
This is the calibration model I used, where c denotes the camera coordinate frame and w the world frame. I need to calculate xw, yw, zw from u, v. For an ideal camera, gamma is 0 and u0, v0 are just half of the resolution, so my problem comes down to fx and fy.

The first important thing to know is that in Webots pixels are square, so dx and dy are equal.
Then, in the Camera node you will find a 'fieldOfView' field giving the horizontal field of view; using the resolution of the camera you can then compute the vertical field of view too:
2 * atan(tan(fieldOfView * 0.5) / (resolutionX / resolutionY))
Finally, you can also get the near projection plane from the 'near' field of the Camera node.
Note also that Webots cameras are regular OpenGL cameras, so you can find more information about the OpenGL projection matrix here, for example: http://www.songho.ca/opengl/gl_projectionmatrix.html
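Putting the pieces together: assuming a standard pinhole model with square pixels, fx and fy can be recovered from the 'fieldOfView' field and the image resolution. A minimal sketch (the function name and example values are mine, not from Webots):

```python
import math

def webots_intrinsics(field_of_view, width, height):
    """Estimate pinhole intrinsics from a Webots Camera node.

    field_of_view: horizontal FOV in radians (the 'fieldOfView' field).
    width, height: image resolution in pixels.
    Returns (fx, fy, u0, v0) in pixel units.
    """
    # Horizontal focal length in pixels: half the image width over tan(hfov/2)
    fx = (width / 2.0) / math.tan(field_of_view / 2.0)
    # Pixels are square in Webots, so fy equals fx
    fy = fx
    # Ideal principal point: the image center
    u0, v0 = width / 2.0, height / 2.0
    return fx, fy, u0, v0

# Example: 0.785 rad (~45 deg) horizontal FOV, 640x480 image
fx, fy, u0, v0 = webots_intrinsics(0.785, 640, 480)
print(fx, fy, u0, v0)
```

With these values the f in the calibration model is fx (equivalently fy) expressed in pixels, so dx and dy never need to be known separately.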

Related

Camera Geometry: Algorithm for "object area correction"

A project I've been working on for the past few months is calculating the top-surface area of an object captured from the top view with a 3D depth camera.
The workflow of my project:
capture an image of a group of objects (RGB and depth data) from the top view
run instance segmentation on the RGB image
calculate the real area of the segmented mask using the depth data
Some problems with the project:
All given objects have different shapes.
The side of an object, not just the top, becomes visible as the object moves toward the edge of the image.
Because of this, the segmented mask area gradually increases.
As a result, the computed area of an object located near the edge of the image is larger than that of an object located in the center.
In the example image, object 1 is located in the middle of the frame, so only its top is visible, but object 2 is located at the edge of the frame, so part of its top is lost and its side is visible.
Because of this, the segmented mask area is larger for objects located on the periphery than for objects located in the center.
I only want to find the area of the top of an object.
Example of what I want:
Is there a way to geometrically correct the area of an object located near the edge of the image?
I tried to calibrate by multiplying the computed area by a specific factor based on the angle between Vector 1, connecting the center of the camera lens to the center of the floor, and Vector 2, connecting the center of the lens to the center of gravity of the target object.
However, I gave up because I couldn't logically justify how much correction was needed.
What I would do is convert your RGB and depth images into a 3D mesh (a surface with bumps) using your camera settings (FOVs, focal length), something like this:
Align already captured rgb and depth images
and then project it onto the ground plane (perpendicular to the camera view direction at the middle of the screen). To obtain the ground plane, simply take three 3D positions on the ground, p0, p1, p2 (forming a triangle), and use the cross product to compute the ground normal:
n = normalize(cross(p1-p0,p2-p1))
Now your plane is defined by p0 and n, so convert each 3D coordinate like this:
by moving it along the normal (toward the ground) by its signed distance to the plane; if I see it right, something like this:
p' = p - n * dot(p-p0,n)
That should eliminate the problem with visible sides at the edges of the FOV. However, you should also take into account that when a side is visible, part of the top is hidden. To remedy that, you might find the axis of symmetry, use just the half of the top that is not partially hidden, and multiply the measured half-area by 2.
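The plane-construction and projection steps above can be sketched in plain Python (the helper names are mine; the plane is built from three ground points exactly as described):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    m = dot(v, v) ** 0.5
    return tuple(x / m for x in v)

def project_to_plane(p, p0, n):
    """Orthogonal projection of point p onto the plane through p0 with unit normal n."""
    d = dot(sub(p, p0), n)                      # signed distance from the plane
    return tuple(pi - d * ni for pi, ni in zip(p, n))

# Ground plane from three ground points p0, p1, p2
p0, p1, p2 = (0, 0, 0), (1, 0, 0), (0, 1, 0)
n = normalize(cross(sub(p1, p0), sub(p2, p1)))  # n = normalize(cross(p1-p0, p2-p1))
px = project_to_plane((2, 3, 5), p0, n)
print(px)   # the point dropped straight onto the ground plane
```

Applying `project_to_plane` to every mesh vertex flattens the captured surface onto the ground, after which the mask area can be measured without the side-face contribution.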
Accurate computation is virtually hopeless, because you don't see all sides.
Assuming your depth information is available as a range image, you can consider the points inside the segmentation mask of a single chicken, estimate the vertical direction at that point, rotate and project the points to obtain the silhouette.
But as a part of the surface is occluded, you may have to reconstruct it using symmetry.
There is no way to do this accurately for arbitrary objects, since there can be parts of the object that contribute to the "top area", but which the camera cannot see. Since the camera cannot see these parts, you can't tell how big they are.
Since all your objects are known to be chickens, though, you could get a pretty accurate estimate like this:
Use Principal Component Analysis to determine the orientation of each chicken.
Using many objects in many images, find a best-fit polynomial that estimates apparent chicken size by distance from the image center, and orientation relative to the distance vector.
For any given chicken, then, you can divide its apparent size by the estimated average apparent size for its distance and orientation, to get a normalized chicken size measurement.
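Step 1 above (estimating each chicken's orientation with PCA) can be sketched like this, assuming the segmentation mask is available as a list of pixel coordinates; the function name and sample data are mine:

```python
import math

def principal_axis(points):
    """Dominant orientation (radians) of a 2D point set, via its covariance matrix.

    points: iterable of (x, y) mask pixel coordinates.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 covariance matrix [[cxx, cxy], [cxy, cyy]]
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Angle of the eigenvector with the largest eigenvalue
    return 0.5 * math.atan2(2 * cxy, cxx - cyy)

# Synthetic mask stretched along the 45-degree diagonal
pts = [(i, i + (0.1 if i % 2 else -0.1)) for i in range(10)]
angle = principal_axis(pts)   # close to pi/4
```

The angle, together with the distance from the image center, then feeds the best-fit polynomial of step 2.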

How to calculate screen coordinates after transformations?

I am trying to solve a question related to transformation of coordinates in 3-D space but not sure how to approach it.
Let's say a vertex P is drawn at the origin with a 4x4 transformation matrix. It is then viewed through a camera positioned with a model-view matrix, and then through a simple projective transform matrix.
How do I calculate the new screen coordinates of P' (x,y,z)?
Before explaining the pipeline, you need to know how the pipeline processes a vertex in order to draw it on screen.
Every step in between is just a matrix multiplied by a vector:
Model - World - Camera - Projection (or Normalized Coordinates) - Screen
First step: we call it 'model space' because (0,0,0) is defined relative to the model itself.
Second, we need to move from model space to world space, because we are going to place the model in the world,
so the transform is TRS (translate, rotation, scale): TRS * Model(Vector4). Each model has its own world transform.
After doing this, the model is placed in the world.
Third, we need to move to camera space, because what we see is seen through the camera. In the world the camera also has a position, a viewport size and a
rotation, so the scene must be projected through it. See
General Formula for Perspective Projection Matrix
After this is done, you get normalized device coordinates, which technically run from -1 to 1 in OpenGL.
Finally, screen space. Suppose we are making a video game for mobile; phones come in many screen resolutions, so how do we handle that?
Simple: scale and translate to get the result in screen-space coordinates, since the origin and the screen size differ.
So what you are trying to do is step 4.
If you want to get the screen position of a world-space point P1, the formula is "Screen Matrix * Projection Matrix * Camera Matrix * P1".
If you start from camera space, it is "Screen Matrix * Projection Matrix * P1".
The links below are useful for understanding the matrices and the calculation.
https://answers.unity.com/questions/1359718/what-do-the-values-in-the-matrix4x4-for-cameraproj.html
https://www.google.com/search?q=unity+camera+to+screen+matrix&newwindow=1&rlz=1C5CHFA_enKR857KR857&source=lnms&tbm=isch&sa=X&ved=0ahUKEwjk5qfQ18DlAhUXfnAKHabECRUQ_AUIEigB&biw=1905&bih=744#imgrc=Y8AkoYg3wS4PeM:
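The whole chain described above can be sketched as a series of 4x4 matrix multiplications. This follows one common OpenGL-style convention; all function names and example values here are assumptions, not from the original answer:

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 matrix (row-major nested lists) by a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def perspective(fovy, aspect, near, far):
    """A standard OpenGL-style perspective projection matrix."""
    f = 1.0 / math.tan(fovy / 2.0)
    return [[f / aspect, 0, 0, 0],
            [0, f, 0, 0],
            [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
            [0, 0, -1, 0]]

def to_screen(p_world, view, proj, width, height):
    """World point -> pixel coordinates: viewport(divide(proj * view * p))."""
    clip = mat_vec(proj, mat_vec(view, p_world + [1.0]))
    ndc = [c / clip[3] for c in clip[:3]]        # perspective divide, NDC in [-1, 1]
    sx = (ndc[0] * 0.5 + 0.5) * width            # viewport scale + translate
    sy = (1.0 - (ndc[1] * 0.5 + 0.5)) * height   # flip y for a top-left origin
    return sx, sy

identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
proj = perspective(math.radians(60), 16 / 9, 0.1, 100.0)
sx, sy = to_screen([0.0, 0.0, -5.0], identity, proj, 1920, 1080)
print(sx, sy)   # a point on the view axis lands at the screen center
```

The `to_screen` function is exactly "Screen Matrix * Projection Matrix * Camera Matrix * P1" with the screen matrix written out as the scale-and-translate of the final step.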

Rotate camera along "eye" direction in rgl

In rgl, you can set up the camera direction with rgl.viewpoint. It accepts theta and phi: polar coordinates specifying the position of the camera looking at the origin. However, there is one more degree of freedom: the angle of rotation of the camera about the "eye" vector. I.e. one can imagine two vectors associated with the camera: the "eye" vector and the "up" vector; theta and phi let you adjust the "eye" vector, but I then want to adjust the "up" vector as well. Is it possible to do this?
I guess it might be possible with the userMatrix parameter («4x4 matrix specifying user point of view»), but I found no information on how to use it.
The ?par3d help topic documents the rendering process in the "Rendering" section. It's often tricky to accomplish what you're asking for, but in this case it's not too hard:
par3d(userMatrix = rotationMatrix(20*pi/180, 0, 0, 1) %*% par3d("userMatrix"))
will rotate by 20 degrees around the user's z-axis, i.e. line of sight.

Can't center a 3D object on screen

Currently, I'm taking each corner of my object's bounding box and converting it to Normalized Device Coordinates (NDC) and I keep track of the maximum and minimum NDC. I then calculate the middle of the NDC, find it in the world and have my camera look at it.
<Determine max and minimum NDCs>
centerX = (maxX + minX) / 2;
centerY = (maxY + minY) / 2;
point.set(centerX, centerY, 0);
projector.unprojectVector(point, camera);
direction = point.sub(camera.position).normalize();
point = camera.position.clone().add(direction.multiplyScalar(distance));
camera.lookAt(point);
camera.updateMatrixWorld();
This is an approximate method, correct? I have seen it suggested in a few places. I ask because every time I center my object, the min and max NDCs should be equal in magnitude when they are calculated again (before any other change is made), but they are not. I get close but not equal numbers (ignoring the sign), and as I step closer and closer the 'error' between the numbers grows bigger and bigger. E.g. the errors for the first few centers are: 0.0022566539084770687, 0.00541687811360958, 0.011035676399427596, 0.025670088917273515, 0.06396864345885889, and so on.
Is there a step I'm missing that would cause this?
I'm using this code in a while loop to maximize and center the object on screen. (I'm programming it so that the user can enter a heading and elevation, and the camera will be positioned so that it views the object at that heading and elevation. After a few weeks I've determined that, for now, it's easier to do it this way.)
However, this seems to start falling apart the closer I move the camera to my object. For example, after a few iterations my max X NDC is 0.9989318709122867 and my min X NDC is -0.9552042384799428. When I look at the calculated point though, I look too far right and on my next iteration my max X NDC is 0.9420058636660581 and my min X NDC is 1.0128126740876888.
Your approach to this problem is incorrect. Rather than thinking about this in terms of screen coordinates, think about it terms of the scene.
You need to work out how much the camera needs to move so that a ray from it hits the centre of the object. Imagine you are standing in a field, and opposite you are two people, Alex and Burt; Burt is standing 2 meters to the right of Alex. You are currently looking directly at Alex but want to look at Burt without turning. If you know the distance and direction between them (2 meters, to the right), you merely need to move that distance in that direction, i.e. 2 meters to the right.
In a mathematical context you need to do the following:
Get the centre of the object you are focusing on in 3D space, then construct a plane through that point parallel to your camera's image plane, i.e. perpendicular to the direction the camera is facing.
Next, raycast from your camera to the plane along the direction the camera is facing; the difference between the centre point of the object and the point where the ray hits the plane is the amount you need to move the camera. This works irrespective of the direction or position of the camera and object.
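The raycast-to-plane step can be sketched like this (the helper names are mine; cam_dir is assumed to be a unit vector):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(v, s):
    return tuple(x * s for x in v)

def camera_offset(cam_pos, cam_dir, centre):
    """Translation that makes the camera's view ray pass through `centre`.

    The plane through `centre` with normal cam_dir is intersected by the
    ray cam_pos + t * cam_dir; the difference between `centre` and that
    hit point is how far the camera must move (without turning).
    """
    t = dot(sub(centre, cam_pos), cam_dir)   # ray-plane intersection distance
    hit = add(cam_pos, scale(cam_dir, t))
    return sub(centre, hit)

# Camera at the origin looking down -z; object centre is 2 right and 1 up
off = camera_offset((0, 0, 0), (0, 0, -1), (2, 1, -10))
print(off)
```

Adding the returned offset to the camera position (leaving its orientation unchanged) centres the object, with no dependence on NDC values that shift every iteration.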
You are facing a chicken-and-egg problem. Every time you change the camera attributes you effectively change where your object is projected in NDC space, so even though you think you are getting close, you will never get there.
Look at the problem from a different angle: place your camera somewhere, make it as canonical as possible (i.e. give it an aspect ratio of 1), and place your object on the camera's z-axis. Is this not possible?

3D sprites, writing correct depth buffer information

I am writing a particle engine for iOS using MonoTouch and OpenTK. My approach is to project the coordinate of each particle, and then draw a correctly scaled, textured rectangle at that screen location.
It works fine, but I have trouble calculating the correct depth value, so that the sprite will correctly overdraw, and be overdrawn by, 3D objects in the scene.
This is the code I am using today:
// d = distance to the projection plane
float d = (float)(1.0 / Math.Tan(MathHelper.DegreesToRadians(fovy / 2f)));
Vector3 screenPos;
Vector3.Transform(ref objPos, ref viewMatrix, out screenPos);
float depth = 1 - d / -screenPos.Z;
Then I draw a triangle strip at that screen coordinate, using the depth value calculated above as the z coordinate.
The results are almost correct, but not quite. I guess I need to take the near and far clipping planes into account somehow (near is 1 and far is 10000 in my case), but I am not sure how. I tried various ways and algorithms without getting accurate results.
I'd appreciate some help on this one.
What you really want to do is take your source position and pass it through the modelview and projection matrices (or whatever you've set up instead, if you're not using the fixed pipeline). Supposing you've used one of the standard calls to set up the stack, such as glFrustum, and otherwise left things at identity, you can get the relevant formula directly from the man page. Reading directly from it, you'd transform as:
z_clip = -((far + near) / (far - near)) * z_eye - ((2 * far * near) / (far - near))
w_clip = -z_eye
Then, finally:
z_device = z_clip / w_clip;
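Combining the formulas above, a sketch of the depth computation (using the asker's near/far values; this assumes the standard glFrustum convention) might look like:

```python
def device_depth(z_eye, near, far):
    """OpenGL device depth in [-1, 1] for an eye-space z.

    z_eye is negative for points in front of the camera.
    Formulas taken straight from the glFrustum matrix:
    z_clip from the third row, w_clip = -z_eye from the fourth.
    """
    z_clip = -((far + near) / (far - near)) * z_eye - (2.0 * far * near) / (far - near)
    w_clip = -z_eye
    return z_clip / w_clip

near, far = 1.0, 10000.0
print(device_depth(-near, near, far))   # the near plane maps to -1
print(device_depth(-far, near, far))    # the far plane maps to +1
```

If the depth buffer expects values in [0, 1] rather than [-1, 1], remap with (d + 1) / 2.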
EDIT: as you're working in ES 2.0, you can actually avoid the issue entirely. Supply your geometry for rendering as GL_POINTS and perform a normal transform in your vertex shader but set gl_PointSize to be the size in pixels that you want that point to be.
In your fragment shader you can then read gl_PointCoord to get a texture coordinate for each fragment that's part of your point, allowing you to draw a point sprite if you don't want just a single colour.
