misalignment error in pixels - image

I have two cameras that I calibrated assuming they are located at the same point. In reality, their positions differ slightly from what was assumed during calibration, which causes a parallax error: when I capture a point with both cameras, it appears misaligned between the two images. I want to calculate this misalignment in pixels.
I tried to calculate the misalignment in meters:
Z(measured) = Z(calib) + (Du /tan a1 + tan a2)
Z(measured) is actual distance from cam to object in m
Z(calib) is distance from camera to calibration marker point.
Du is distance between the projected point of the object captured by two cameras on image plane in meters
tan a1 = (distance between camera position during calibration and actual camera 1 position/ distance between camera position during calibration and position of calibration marker point)
tan a2 = (distance between camera position during calibration and actual camera 2 position/ distance between camera position during calibration and position of calibration marker point)
How can I convert this value of Du from meters to pixels?

If you know the ground sample distance (GSD) of your image, you can use it to determine how much distance one pixel represents, and then use that number to convert meters to pixels.
Ground sample distance is calculated as:
GSD = (D / F) * PS
GSD = Ground sample distance
D = Distance to object (from camera)
F = Focal length
PS = Pixel size (calculated as camera sensor dimension / photo dimension in pixels)
PS should be almost, if not exactly, the same whether it is computed from the width or from the height.
Given the GSD, you can then work backwards to determine the number of pixels from a distance in meters: pixels = distance / GSD (note that this means all quantities should be expressed in meters).
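As a rough sketch of the conversion described above, the following Python snippet computes the GSD and then converts a length in meters to pixels. The camera numbers (6.4 mm sensor width, 4000 px image width, 8 mm focal length, 10 m distance) are made up purely for illustration, and the function names are hypothetical. The length is assumed to be measured in the object plane at distance D.

```python
def ground_sample_distance(distance_m, focal_length_m, pixel_size_m):
    """GSD = D / F * PS: meters of scene covered by one pixel at distance D."""
    return distance_m / focal_length_m * pixel_size_m


def meters_to_pixels(length_m, gsd_m_per_px):
    """Convert a length in the object plane (meters) to a length in pixels."""
    return length_m / gsd_m_per_px


# Hypothetical camera values, chosen only for illustration.
sensor_width_m = 6.4e-3   # physical sensor width
image_width_px = 4000     # image width in pixels
pixel_size_m = sensor_width_m / image_width_px  # PS: sensor dimension / photo dimension

gsd = ground_sample_distance(distance_m=10.0, focal_length_m=8e-3,
                             pixel_size_m=pixel_size_m)
misalignment_px = meters_to_pixels(0.05, gsd)  # a 5 cm offset at 10 m

print(gsd, misalignment_px)
```

With these assumed numbers one pixel covers about 2 mm at 10 m, so a 5 cm offset corresponds to roughly 25 pixels.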

Related

Inverse Camera Intrinsic Matrix for Image Plane at Z = -1

A similar question was asked before; unfortunately I cannot comment on Samgak's answer, so I am opening a new post. Here is the link to the old question:
How to calculate ray in real-world coordinate system from image using projection matrix?
My goal is to map from image coordinates to world coordinates. In fact I am trying to do this with the Camera Intrinsics Parameters of the HoloLens Camera.
Of course this mapping will only give me a ray connecting the Camera Optical Centre and all points, which can lie on that ray. For the mapping from image coordinates to world coordinates we can use the inverse camera matrix which is:
K^-1 = [1/fx 0 -cx/fx; 0 1/fy -cy/fy; 0 0 1]
Pcam = K^-1 * Ppix;
Pcam_x = P_pix_x/fx - cx/fx;
Pcam_y = P_pix_y/fy - cy/fy;
Pcam_z = 1
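For reference, the back-projection written out above can be sketched in Python as follows. The intrinsics used in the example call (fx = fy = 500, cx = 320, cy = 240) are hypothetical values and not the HoloLens parameters; the function name is also just illustrative.

```python
def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Apply K^-1 to the homogeneous pixel (u, v, 1).

    Returns the direction of the ray through the optical centre,
    expressed in camera coordinates with the image plane at Z = +1.
    """
    x = u / fx - cx / fx
    y = v / fy - cy / fy
    return (x, y, 1.0)


# Hypothetical intrinsics, for illustration only.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0

print(pixel_to_ray(320.0, 240.0, fx, fy, cx, cy))  # principal point: ray along the optical axis
print(pixel_to_ray(820.0, 240.0, fx, fy, cx, cy))  # a pixel 500 px to the right of it
```

Every point s * (x, y, 1) with s > 0 lies on the same ray, which is why the mapping only recovers a direction and not a 3D position.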
Orientation of Camera Coordinate System and Image Plane
In this specific case the image plane is probably at Z = -1 (however, I am a bit uncertain about this). The section "Pixel to Application-specified Coordinate System" on the page HoloLens CameraProjectionTransform describes how to go from pixel coordinates to world coordinates. As far as I understand, two signs in K^-1 are flipped, so that we calculate the coordinates as follows:
Pcam_x = (Ppix_x/fx) - (cx*(-1)/fx) = P_pix_x/fx + cx/fx;
Pcam_y = (Ppix_y/fy) - (cy*(-1)/fy) = P_pix_y/fy + cy/fy;
Pcam_z = -1
Pcam = (Pcam_x, Pcam_y, -1)
CameraOpticalCentre = (0,0,0)
Ray = Pcam - CameraOpticalCentre
I do not understand how to create the Camera Intrinsics for the case of the image plane being at a negative Z-coordinate. And I would like to have a mathematical explanation or intuitive understanding of why we have the sign flip (P_pix_x/fx + cx/fx instead of P_pix_x/fx - cx/fx).
Edit: I read in another post that the third column of the camera matrix has to be negated when the camera is facing down the negative z-direction. This would explain the sign flip. However, why do we need to change the sign of the third column? I would like to have an intuitive understanding of this.
Here is the link to the post: Negation of third column
Thanks a lot in advance,
Lisa
why do we need to change the sign of the third column
To understand why we need to negate the third column of K (i.e. negate the principal points of the intrinsic matrix) let's first understand how to get the pixel coordinates of a 3D point already in the camera coordinates frame. After that, it is easier to understand why -z requires negating things.
Let's imagine a camera c and a point B in space (w.r.t. the camera coordinate frame), and let's put the camera sensor (i.e. the image) at E', as in the image below. Then f (in red) is the focal length and x' (in blue) is the x coordinate in pixels of B (measured from the center of the image). To simplify things, let's place B at the corner of the field of view (i.e. in the corner of the image).
We need to calculate the coordinates of B projected onto the sensor d (which is the same as the 2D image). Because the triangles AEB and AE'B' are similar, x'/f = X/Z, and therefore x' = X*f/Z. The product X*f is the first operation that the K matrix performs. We can multiply K*B (with B as a column vector) to check.
This will give us coordinates in pixels w.r.t. the center of the image. Let's imagine the image is size 480x480. Therefore B' will look like this in the image below. Keep in mind that in image coordinates, the y-axis increases going down and the x-axis increases going right.
In images, the pixel at coordinates 0,0 is in the top left corner, so we need to add half of the image width to the point we have: px = X*f/Z + cx, where cx is the principal point along the x-axis, usually W/2. Computing px = X*f/Z + cx is exactly the same as doing K * B / Z. In our example X*f/Z was -240; adding cx (W/2 = 480/2 = 240) gives X*f/Z + cx = 0, and the same holds for Y. The final pixel coordinates in the image are 0,0 (i.e. the top left corner).
Now in the case where we use z as negative, when we divide X and Y by Z, because Z is negative, it will change the sign of X and Y, therefore it will be projected to B'' at the opposite quadrant as in the image below.
Now the second image will instead be:
Because of this, instead of adding the principal point, we need to subtract it. That is the same as negating the last column of K.
So we have 240 - 240 = 0 (where the second 240 is the principal point in x, cx) and the same for Y. The pixel coordinates are 0,0 as in the example when z was positive. If we do not negate the last column we will end up with 480,480 instead of 0,0.
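The numbers in this example can be checked with a short Python sketch. It assumes f = cx = cy = 240 for the 480x480 image and a corner point at X = Y = -1, and it divides by the third (homogeneous) component of K*B rather than by Z directly; the helper name `project` is hypothetical.

```python
def project(K, point):
    """Multiply the 3x3 matrix K by a 3D point, then divide by the third component."""
    x, y, w = (sum(K[r][c] * point[c] for c in range(3)) for r in range(3))
    return (x / w, y / w)


f, cx, cy = 240.0, 240.0, 240.0  # assumed values for the 480x480 example

# Standard intrinsics: camera looks down +Z.
K_pos = [[f, 0.0, cx],
         [0.0, f, cy],
         [0.0, 0.0, 1.0]]

# Third column negated: camera looks down -Z.
K_neg = [[f, 0.0, -cx],
         [0.0, f, -cy],
         [0.0, 0.0, -1.0]]

corner_pos = project(K_pos, (-1.0, -1.0, 1.0))       # point in front of a +Z camera
corner_neg = project(K_neg, (-1.0, -1.0, -1.0))      # same corner for a -Z camera
corner_unfixed = project(K_pos, (-1.0, -1.0, -1.0))  # -Z point with the unmodified K

print(corner_pos, corner_neg, corner_unfixed)
```

Under these assumptions both conventions map the corner to pixel 0,0, while using the unmodified K for a point at negative Z gives 480,480.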
Hope this helped a little bit

Retrieve corner coordinates of image given resolution and center point

I'm going through an image dataset that provides, for each image, a pixel coordinate and the image resolution. Is there any way to map that information to the corner coordinates of the image?
For instance, if the image pixel coordinates are -403059.626, -12869811.372 and the image is 4168 x 3632 pixels, is it possible to extract the real-world coordinates of the four corners of the image rectangle? We can assume the pixel size is 1 unit.
Assuming p = (-403059.626, -12869811.372) is the pixel in the middle of the image, and an image of size s = (4168, 3632) pixels, and a pixel size of 1 (meaning pixels are in the same units as the location given by p), then the coordinates of the top-left corner can be computed as follows:
q = p - s/2 = ( -403059.626 - 4168/2 , -12869811.372 - 3632/2 )
The s/2 value above can be computed differently depending on what you consider the pixel in the middle of the image. Here I assume the top-left pixel has index (0,0), and the pixel in the middle has index (4168/2,3632/2).
The above assumes no rotation (i.e. the image axes are aligned with the coordinate system), and no distortion (it is possible that the camera adds distortion to the image, causing the pixel pitch to change in different parts of the image).
The bottom-right corner then has coordinates:
r = q + s-1 = p + s/2 - 1
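A minimal Python sketch of these formulas is shown below. It makes the same assumptions as the text (p is the middle pixel, pixel size 1, no rotation, no distortion) and additionally assumes that y increases downward, so "top" means the smaller y value. The function name is hypothetical.

```python
def image_corners(center, size):
    """Corner coordinates of an axis-aligned image with pixel size 1.

    center: coordinates p of the middle pixel
    size:   image size s = (width, height) in pixels
    """
    px, py = center
    w, h = size
    left, top = px - w / 2, py - h / 2          # q = p - s/2
    right, bottom = left + w - 1, top + h - 1   # r = q + s - 1
    return {
        "top_left": (left, top),
        "top_right": (right, top),
        "bottom_left": (left, bottom),
        "bottom_right": (right, bottom),
    }


corners = image_corners((-403059.626, -12869811.372), (4168, 3632))
for name, xy in corners.items():
    print(name, xy)
```

The top-right and bottom-left corners simply combine the x of one computed corner with the y of the other.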

Convert Cubemap coordinates to equivalents in Equirectangular

I have a set of coordinates of a 6-image Cubemap (Front, Back, Left, Right, Top, Bottom) as follows:
[ [160, 314], Front; [253, 231], Front; [345, 273], Left; [347, 92], Bottom; ... ]
Each image is 500x500 px, with [0, 0] at the top-left corner.
I want to convert these coordinates to their equivalents in equirectangular, for a 2500x1250 px image. The layout is like this:
I don't need to convert the whole image, just the set of coordinates. Is there any straightforward conversion for a specific pixel?
1. Convert your image + 2D coordinates to a 3D normalized vector
The point (0,0,0) must be the center of your cube map for this to work as intended. So basically you need to add the U,V direction vectors, scaled by your coordinates, to the 3D position of the texture point (0,0). The direction vectors are just unit vectors where each axis has 3 options {-1, 0, +1} and only one axis coordinate is nonzero for each vector. Each side of the cube map has one combination. Which one depends on your conventions, which we do not know as you did not share any specifics.
2. Use the Cartesian to spherical coordinate system transformation
You do not need the radius, just the two angles.
3. Convert the spherical angles to your 2D texture coordinates
This step depends on your 2D texture geometry. The simplest is a rectangular texture (I think that is what you mean by equirectangular), but there are other mappings out there with specific features, and each requires a different conversion. Here are a few examples:
Bump-map a sphere with a texture map
How to do a shader to convert to azimuthal_equidistant
For the rectangle texture you just scale the spherical angles into texture resolution size...
U = lon * Usize/(2*Pi)
V = (lat+(Pi/2)) * Vsize/Pi
plus/minus some orientation signs to match your coordinate systems.
btw. just found this (possibly duplicate QA):
GLSL Shader to convert six textures to Equirectangular projection
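The three steps can be sketched in Python as follows. Since the cube map conventions are not known, the face orientations in `face_to_direction` are an assumption (x right, y up, z forward, with the Top and Bottom faces attached to the Front face), and all function names are hypothetical. Adjust the table and the signs to match your own layout.

```python
import math


def face_to_direction(face, x, y, n=500):
    """Step 1: face name + pixel (x, y) -> 3D direction from the cube center.

    ASSUMED convention: x right, y up, z forward; [0, 0] is the top-left
    of each face; Top/Bottom are oriented as seen when tilting up/down
    from the Front face.
    """
    a = 2.0 * x / n - 1.0  # -1 at the left edge, +1 at the right edge
    b = 2.0 * y / n - 1.0  # -1 at the top edge, +1 at the bottom edge
    table = {
        "Front": (a, -b, 1.0),
        "Back": (-a, -b, -1.0),
        "Right": (1.0, -b, -a),
        "Left": (-1.0, -b, a),
        "Top": (a, 1.0, b),
        "Bottom": (a, -1.0, -b),
    }
    return table[face]


def direction_to_equirect(d, usize=2500, vsize=1250):
    """Steps 2 and 3: direction -> spherical angles -> texture coordinates."""
    x, y, z = d
    lon = math.atan2(x, z) % (2.0 * math.pi)  # 0 at the Front face center
    lat = math.atan2(y, math.hypot(x, z))     # -pi/2 .. +pi/2
    u = lon * usize / (2.0 * math.pi)
    v = (lat + math.pi / 2.0) * vsize / math.pi
    return (u, v)


def cubemap_to_equirect(face, x, y, n=500, usize=2500, vsize=1250):
    return direction_to_equirect(face_to_direction(face, x, y, n), usize, vsize)


for face, (x, y) in [("Front", (160, 314)), ("Front", (253, 231)),
                     ("Left", (345, 273)), ("Bottom", (347, 92))]:
    print(face, (x, y), cubemap_to_equirect(face, x, y))
```

With these formulas V grows with latitude, so for an image whose origin is the top-left corner you would typically use vsize - v instead; a constant offset on U likewise shifts which face ends up in the middle of the panorama. These are the orientation signs mentioned above.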

Invariant scale geometry

I am writing a mesh editor where I have manipulators with the help of which I change the vertices of the mesh. The task is to render the manipulators with constant dimensions, which would not change when changing the camera and viewport parameters. The projection matrix is perspective. I will be grateful for ideas how to implement the invariant scale geometry.
If I got it right, you want to render some markers (for example a vertex drag editing area) with the same visual size regardless of the depth they are rendered at.
There are 2 approaches for this:
1. Scale with depth
Compute the perpendicular distance to the camera view plane (a simple dot product) and scale the marker so that its visual size is invariant to depth.
If P0 is your camera position and Z is your camera view direction unit vector (usually the Z axis), then for any position P compute the depth like this:
depth = dot(P-P0,Z)
Now the scale depends on the wanted visual size size0 at some specified depth depth0. Using triangle similarity we want:
size/depth = size0/depth0
size = size0*depth/depth0
So render your marker with this size, or scale it by depth/depth0. If you use scaling, you need to scale around your target position P, otherwise your marker would shift to the sides (so translate, scale, translate back).
2. Compute the screen position and use non-perspective rendering
Transform the target coordinates the same way the graphics pipeline does until you get the screen x,y position. Remember it, and in the pass that renders your markers just use that instead of the real position. For this rendering pass either use some constant depth (distance from camera) or use a non-perspective view matrix.
For more info see Understanding 4x4 homogenous transform matrices
[Edit1] pixel size
you need to use FOVx,FOVy projection angles and view/screen resolution (xs,ys) for that. That means if depth is znear and coordinate is at half of the angle then the projected coordinate will go to edge of screen:
tan(FOVx/2) = (xs/2)*pixelx/znear
tan(FOVy/2) = (ys/2)*pixely/znear
---------------------------------
pixelx = 2*znear*tan(FOVx/2)/xs
pixely = 2*znear*tan(FOVy/2)/ys
Where pixelx,pixely is the size (per axis) that a single pixel visually represents at depth znear. If both sizes are the same (so the pixel is square) you have all you need. If they are not equal (the pixel is not square) then you need to render markers in screen-axis-aligned coordinates, so approach #2 is more suitable for such a case.
So if you choose depth0=znear then you can set size0 to n*pixelx and/or n*pixely to get a visual size of n pixels. Or use any depth0 and rewrite the computation to:
pixelx = 2*depth0*tan(FOVx/2)/xs
pixely = 2*depth0*tan(FOVy/2)/ys
Just to be complete:
size0x = size_in_pixels*(2*depth0*tan(FOVx/2)/xs)
size0y = size_in_pixels*(2*depth0*tan(FOVy/2)/ys)
-------------------------------------------------
sizex = size_in_pixels*(2*depth0*tan(FOVx/2)/xs)*(depth/depth0)
sizey = size_in_pixels*(2*depth0*tan(FOVy/2)/ys)*(depth/depth0)
---------------------------------------------------------------
sizex = size_in_pixels*(2*tan(FOVx/2)/xs)*(depth)
sizey = size_in_pixels*(2*tan(FOVy/2)/ys)*(depth)
---------------------------------------------------------------
sizex = size_in_pixels*2*depth*tan(FOVx/2)/xs
sizey = size_in_pixels*2*depth*tan(FOVy/2)/ys
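Put together, the depth computation and the final size formula might look like this in Python. This is only a sketch: the function names are hypothetical, vectors are plain tuples, and the view direction is assumed to be a unit vector. The camera setup in the example is made up for illustration.

```python
import math


def view_depth(p, p0, z_dir):
    """depth = dot(P - P0, Z): perpendicular distance along the view direction."""
    return sum((a - b) * c for a, b, c in zip(p, p0, z_dir))


def marker_size(size_in_pixels, depth, fov, resolution):
    """World-space size that covers size_in_pixels on screen at the given depth.

    fov is the full field of view (radians) along the axis whose screen
    resolution (in pixels) is given.
    """
    return size_in_pixels * 2.0 * depth * math.tan(fov / 2.0) / resolution


# Hypothetical setup: camera at (0, 0, -2) looking down -Z, 90 degree FOVx, 1000 px wide view.
camera_pos = (0.0, 0.0, -2.0)
view_dir = (0.0, 0.0, -1.0)
vertex = (1.0, 2.0, -7.0)

depth = view_depth(vertex, camera_pos, view_dir)
sizex = marker_size(10, depth, math.radians(90.0), 1000)
print(depth, sizex)
```

Here a marker that should appear 10 pixels wide at depth 5 needs a world-space width of about 0.1 units; doubling the depth doubles the required size.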

What coordinate system does a depth texture use?

I have a depth texture and I would like to know in which coordinate system are the values stored inside the depth texture. Homogeneous coordinates, camera coordinates, world coordinates or model coordinates?
I also would like to know what values are stored in the depth texture and what do they mean.
Thanks.
This should be a value in the range [min, max], where min is either -1.0 or 0.0 and max is 1.0, though what you get from the texture might simply be an integer value that needs to be transformed (e.g. from 24-bit to 32-bit). If no one confirms any of this, you will need to test it yourself.
Anyway, these values min and max should represent the clipping planes, so min = near and max = far due to the depth buffer optimisation. To get the true Z value from the texture value ZT, assuming the depth is stored linearly between the clipping planes (a standard perspective projection stores it non-linearly, so there the projection has to be inverted instead):
Z = near + ((far-near) * ((ZT-min)/(max-min)))
This Z then represents the distance from (0,0,0); from the user's perspective this is the distance between the object and the camera position.
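The formula above can be written as a small Python function. Note that it assumes the depth values are distributed linearly between the clipping planes. As a point of comparison, the second function shows the usual inversion for an OpenGL-style perspective projection with the default [0, 1] depth range; that convention is an assumption here and is not stated in the question. Both function names are hypothetical.

```python
def linear_depth(zt, near, far, zmin=0.0, zmax=1.0):
    """Z = near + (far - near) * (ZT - min) / (max - min), for linearly stored depth."""
    return near + (far - near) * ((zt - zmin) / (zmax - zmin))


def perspective_depth(zt, near, far):
    """Eye-space distance for a depth value in [0, 1].

    ASSUMES an OpenGL-style perspective projection and the default
    depth range, where the stored value is non-linear in distance.
    """
    z_ndc = 2.0 * zt - 1.0  # [0, 1] -> [-1, 1]
    return 2.0 * near * far / (far + near - z_ndc * (far - near))


near, far = 1.0, 101.0
print(linear_depth(0.5, near, far))       # halfway between the planes
print(perspective_depth(0.5, near, far))  # much closer to the near plane
```

Both mappings agree at the clipping planes (0 maps to near, 1 maps to far) but differ in between, so it is worth checking which projection produced the depth texture.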
Try looking for some literature.
