Assume I have a 2-D matrix filled with values which represents a plane. Now I want to rotate the plane about its own axis in 3-D, in the "z-direction". For a better understanding, see the following image:
I wondered if this is possible by a simple affine matrix, thus I created the following simple script:
%Create a random value matrix
A = rand*ones(200,200);
%Make a box in the image
A(50:200-50,50:200-50) = 1;
Now I can apply transformations in 2-D space simply with a transformation matrix like this:
R = [1 0 0; .5 1 0; 0 0 1];
tform = affine2d(R);
transformed = imwarp(A,tform);
However, this will not produce the desired output above, and I am not quite sure how to create the 2-D affine matrix to create such behavior.
I guess that a 3-D affine matrix can do the trick. However, if I define a 3-D affine matrix I cannot work with the 2-D representation of the matrix anymore, since MATLAB will throw the error:
The number of dimensions of the input image A must be 3 when the
specified geometric transformation is 3-D.
So how can I code the desired output with an affine matrix?
The answer from m3tho correctly addresses how you would apply the transformation you want: using fitgeotrans with a 'projective' transform, thus requiring that you specify 4 control points (i.e. 4 pairs of corresponding points in the input and output image). You can then apply this transform using imwarp.
The issue, then, is how you select these pairs of points to create your desired transformation, which in this case is to create a perspective projection. As shown below, a perspective projection takes into account that a viewing position (i.e. "camera") will have a given view angle defining a conic field of view. The scene is rendered by taking all 3-D points within this cone and projecting them onto the viewing plane, which is the plane located at the camera target which is perpendicular to the line joining the camera and its target.
Let's first assume that your image is lying in the viewing plane and that the corners are described by a normalized reference frame such that they span [-1 1] in each direction. We need to first select the degree of perspective we want by choosing a view angle and then computing the distance between the camera and the viewing plane. A view angle of around 45 degrees can mimic the sense of perspective of normal human sight, so using the corners of the viewing plane to define the edge of the conic field of view, we can compute the camera distance as follows:
camDist = sqrt(2)./tand(viewAngle./2);
Now we can use this to generate a set of control points for the transformation. We first apply a 3-D rotation to the corner points of the viewing plane, rotating around the y axis by an amount theta. This rotates them out of plane, so we now project the corner points back onto the viewing plane by defining a line from the camera through each rotated corner point and finding the point where it intersects the plane. I'm going to spare you the mathematical derivations (you can implement them yourself from the formulas in the above links), but in this case everything simplifies down to the following set of calculations:
term1 = camDist.*cosd(theta);
term2 = camDist-sind(theta);
term3 = camDist+sind(theta);
outP = [-term1./term2 camDist./term2; ...
term1./term3 camDist./term3; ...
term1./term3 -camDist./term3; ...
-term1./term2 -camDist./term2];
And outP now contains your normalized set of control points in the output image. Given an image of size s, we can create a set of input and output control points as follows:
scaledInP = [1 s(1); s(2) s(1); s(2) 1; 1 1];
scaledOutP = bsxfun(@times, outP+1, s([2 1])-1)./2+1;
And you can apply the transformation like so:
tform = fitgeotrans(scaledInP, scaledOutP, 'projective');
outputView = imref2d(s);
newImage = imwarp(oldImage, tform, 'OutputView', outputView);
The only issue you may come across is that a rotation of 90 degrees (i.e. looking end-on at the image plane) would create a set of collinear points that would cause fitgeotrans to error out. In such a case, you would technically just want a blank image, because you can't see a 2-D object when looking at it edge-on.
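To make the skipped derivation a little more concrete, here is a minimal sketch in Python (not MATLAB) that rotates the four corners about the y axis, projects them through a camera at (0, 0, camDist) onto the plane z = 0, and compares the result with the closed-form terms above. The function names and the sign convention chosen for the rotation are assumptions made for this illustration.

```python
import math

CORNERS = [(-1, 1), (1, 1), (1, -1), (-1, -1)]  # same order as the rows of outP

def camera_distance(view_angle_deg):
    # Same formula as camDist above.
    return math.sqrt(2) / math.tan(math.radians(view_angle_deg) / 2)

def control_points(theta_deg, cam_dist):
    """Rotate each corner about the y axis, then project it back onto z = 0."""
    t = math.radians(theta_deg)
    points = []
    for x, y in CORNERS:
        # Rotation about y; the sign of z is chosen so the result matches outP.
        xr, zr = x * math.cos(t), -x * math.sin(t)
        # The ray from the camera at (0, 0, cam_dist) through the rotated
        # corner hits the plane z = 0 after scaling x and y by this factor.
        scale = cam_dist / (cam_dist - zr)
        points.append((xr * scale, y * scale))
    return points

def closed_form(theta_deg, cam_dist):
    """The simplified term1/term2/term3 expressions from the answer."""
    t = math.radians(theta_deg)
    term1 = cam_dist * math.cos(t)
    term2 = cam_dist - math.sin(t)
    term3 = cam_dist + math.sin(t)
    return [(-term1 / term2, cam_dist / term2),
            (term1 / term3, cam_dist / term3),
            (term1 / term3, -cam_dist / term3),
            (-term1 / term2, -cam_dist / term2)]

cam_dist = camera_distance(45)
print(control_points(30, cam_dist))
print(closed_form(30, cam_dist))
```

At theta = 90 every projected x coordinate collapses to (numerically) zero, which is the collinear case described above.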
Here's some code illustrating the above transformations by animating a spinning image:
img = imread('peppers.png');
s = size(img);
outputView = imref2d(s);
scaledInP = [1 s(1); s(2) s(1); s(2) 1; 1 1];
viewAngle = 45;
camDist = sqrt(2)./tand(viewAngle./2);
for theta = linspace(0, 360, 360)
term1 = camDist.*cosd(theta);
term2 = camDist-sind(theta);
term3 = camDist+sind(theta);
outP = [-term1./term2 camDist./term2; ...
term1./term3 camDist./term3; ...
term1./term3 -camDist./term3; ...
-term1./term2 -camDist./term2];
scaledOutP = bsxfun(@times, outP+1, s([2 1])-1)./2+1;
tform = fitgeotrans(scaledInP, scaledOutP, 'projective');
spinImage = imwarp(img, tform, 'OutputView', outputView);
if (theta == 0)
hImage = image(spinImage);
set(gca, 'Visible', 'off');
else
set(hImage, 'CData', spinImage);
end
drawnow;
end
And here's the animation:
You can perform a projective transformation that can be estimated using the position of the corners in the first and second image.
originalP='peppers.png';
original = imread(originalP);
imshow(original);
s = size(original);
matchedPoints1 = [1 1;1 s(1);s(2) s(1);s(2) 1];
matchedPoints2 = [1 1;1 s(1);s(2) s(1)-100;s(2) 100];
transformType = 'projective';
tform = fitgeotrans(matchedPoints1,matchedPoints2,transformType);
outputView = imref2d(size(original));
Ir = imwarp(original,tform,'OutputView',outputView);
figure; imshow(Ir);
This is the result of the code above:
Original image:
Transformed image:
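A projective transform has eight free parameters, which is why four point pairs are enough to pin it down. As a rough sketch in Python, assuming a column-vector convention and a 384x512 image (both assumptions for illustration, and not a description of how fitgeotrans works internally), the 8x8 linear system can be solved directly:

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography(src, dst):
    """3x3 matrix H (with H[2][2] fixed to 1) mapping each src point to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def apply_h(h, x, y):
    """Apply H to the point (x, y) and divide by the homogeneous coordinate."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

rows, cols = 384, 512  # assumed image size, for illustration only
src = [(1, 1), (1, rows), (cols, rows), (cols, 1)]
dst = [(1, 1), (1, rows), (cols, rows - 100), (cols, 100)]
H = homography(src, dst)
print(apply_h(H, cols, rows))  # should land close to (cols, rows - 100)
```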
A similar question was asked before; unfortunately I cannot comment on Samgak's answer, so I am opening a new post. Here is the link to the old question:
How to calculate ray in real-world coordinate system from image using projection matrix?
My goal is to map from image coordinates to world coordinates. In fact I am trying to do this with the Camera Intrinsics Parameters of the HoloLens Camera.
Of course this mapping will only give me a ray connecting the Camera Optical Centre and all points, which can lie on that ray. For the mapping from image coordinates to world coordinates we can use the inverse camera matrix which is:
K^-1 = [1/fx 0 -cx/fx; 0 1/fy -cy/fy; 0 0 1]
Pcam = K^-1 * Ppix;
Pcam_x = P_pix_x/fx - cx/fx;
Pcam_y = P_pix_y/fy - cy/fy;
Pcam_z = 1
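For reference, a minimal Python sketch of this back-projection for the ordinary case of an image plane at Z = +1, together with the forward projection used to check it. The intrinsic values below are hypothetical and only serve the example.

```python
def pixel_to_ray(px, py, fx, fy, cx, cy):
    """Apply K^-1 to the homogeneous pixel (px, py, 1)."""
    return ((px - cx) / fx, (py - cy) / fy, 1.0)

def project(point, fx, fy, cx, cy):
    """Apply K to a camera-frame point and divide by its depth."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0  # hypothetical intrinsics
ray = pixel_to_ray(286.0, 260.0, fx, fy, cx, cy)

# Any point along the ray (here at depth 2.5) projects back to the same pixel.
point = tuple(2.5 * c for c in ray)
print(project(point, fx, fy, cx, cy))
```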
Orientation of Camera Coordinate System and Image Plane
In this specific case the image plane is probably at Z = -1 (however, I am a bit uncertain about this). The section Pixel to Application-specified Coordinate System on the page HoloLens CameraProjectionTransform describes how to go from pixel coordinates to world coordinates. As far as I understand, two signs in K^-1 are flipped so that we calculate the coordinates as follows:
Pcam_x = (Ppix_x/fx) - (cx*(-1)/fx) = P_pix_x/fx + cx/fx;
Pcam_y = (Ppix_y/fy) - (cy*(-1)/fy) = P_pix_y/fy + cy/fy;
Pcam_z = -1
Pcam = (Pcam_x, Pcam_y, -1)
CameraOpticalCentre = (0,0,0)
Ray = Pcam - CameraOpticalCentre
I do not understand how to create the Camera Intrinsics for the case of the image plane being at a negative Z-coordinate. And I would like to have a mathematical explanation or intuitive understanding of why we have the sign flip (P_pix_x/fx + cx/fx instead of P_pix_x/fx - cx/fx).
Edit: I read in another post that the third column of the camera matrix has to be negated when the camera is facing down the negative z-direction. This would explain the sign flip. However, why do we need to change the sign of the third column? I would like to have an intuitive understanding of this.
Here is the link to the post: Negation of third column
Thanks a lot in advance,
Lisa
why do we need to change the sign of the third column
To understand why we need to negate the third column of K (i.e. negate the principal points of the intrinsic matrix) let's first understand how to get the pixel coordinates of a 3D point already in the camera coordinates frame. After that, it is easier to understand why -z requires negating things.
Let's imagine a camera c and a point B in space (w.r.t. the camera coordinate frame), and let's put the camera sensor (i.e. the image) at E' as in the image below. Then f (in red) is the focal length and ? (in blue) is the x coordinate in pixels of B (measured from the center of the image). To simplify things, let's place B at the corner of the field of view (i.e. in the corner of the image).
We need to calculate the coordinates of B projected onto the sensor d (which is the same as the 2D image). Because the triangles AEB and AE'B' are similar, ?/f = X/Z and therefore ? = X*f/Z. X*f is the first operation the K matrix performs; we can multiply K*B (with B as a column vector) to check.
This will give us coordinates in pixels w.r.t. the center of the image. Let's imagine the image is size 480x480. Therefore B' will look like this in the image below. Keep in mind that in image coordinates, the y-axis increases going down and the x-axis increases going right.
In images, the pixel at coordinates 0,0 is in the top left corner, so we need to add half of the width of the image to the point we have. Then px = X*f/Z + cx, where cx is the principal point on the x-axis, usually W/2. px = X*f/Z + cx is exactly what K * B / Z computes. X*f/Z was -240, so if we add cx (W/2 = 480/2 = 240) we get X*f/Z + cx = 0, and the same for Y. The final pixel coordinates in the image are 0,0 (i.e. the top left corner).
Now in the case where we use z as negative, when we divide X and Y by Z, because Z is negative, it will change the sign of X and Y, therefore it will be projected to B'' at the opposite quadrant as in the image below.
Now the second image will instead be:
Because of this, instead of adding the principal point, we need to subtract it. That is the same as negating the last column of K.
So we have 240 - 240 = 0 (where the second 240 is the principal point in x, cx) and the same for Y. The pixel coordinates are 0,0 as in the example when z was positive. If we do not negate the last column we will end up with 480,480 instead of 0,0.
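The numbers in this example can be checked with a small Python sketch. It assumes f = cx = cy = 240 for the 480x480 image, and it reads "negating the third column" as negating all three entries of that column, including the bottom-right 1, so that the homogeneous divide is by -Z. Both are assumptions of this sketch.

```python
def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def project(k, point):
    """Project a camera-frame point and divide by the homogeneous coordinate."""
    x, y, w = matvec(k, point)
    return (x / w, y / w)

f, cx, cy = 240.0, 240.0, 240.0  # assumed values for the 480x480 example
K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
K_neg = [[row[0], row[1], -row[2]] for row in K]  # third column negated

print(project(K, (-1.0, -1.0, 1.0)))       # camera looks down +z: (0.0, 0.0)
print(project(K, (-1.0, -1.0, -1.0)))      # -z without the negation: (480.0, 480.0)
print(project(K_neg, (-1.0, -1.0, -1.0)))  # -z with the negation: (0.0, 0.0)
```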
Hope this helped a little bit
I am writing a mesh editor where I have manipulators with the help of which I change the vertices of the mesh. The task is to render the manipulators with constant dimensions, which would not change when changing the camera and viewport parameters. The projection matrix is perspective. I will be grateful for ideas how to implement the invariant scale geometry.
If I got it right, you want to render some markers (for example a vertex drag editing area) with the same visual size at any depth they are rendered at.
There are 2 approaches for this:
1. scale with depth
compute the perpendicular distance to the camera view plane (a simple dot product) and scale the marker so that its visual size does not depend on depth.
So if P0 is your camera position and Z is your camera view direction unit vector (usually the Z axis), then for any position P compute the depth like this:
depth = dot(P-P0,Z)
Now the scale depends on the wanted visual size0 at some specified depth0. Using triangle similarity we want:
size/depth = size0/depth0
size = size0*depth/depth0
so render your marker with size, or with scale depth/depth0. If you use scaling, you need to scale around your target position P, otherwise your marker would shift to the sides (so translate, scale, translate back).
2. compute screen position and use non-perspective rendering
transform the target coordinates the same way the graphics pipeline does until you have the screen x,y position. Remember it, and in the pass that renders your markers use it instead of the real position. For this rendering pass either use some constant depth (distance from camera) or use a non-perspective view matrix.
For more info see Understanding 4x4 homogenous transform matrices
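The depth-based scale from the first approach can be sketched in a few lines of Python. The function names and the sample camera are hypothetical, and the view direction is assumed to be a unit vector.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def marker_scale(p, p0, z_dir, depth0):
    """Scale factor that keeps a marker at p the same visual size as at depth0."""
    # Perpendicular distance from the camera along the view direction.
    depth = dot([pi - qi for pi, qi in zip(p, p0)], z_dir)
    if depth <= 0:
        raise ValueError("point is not in front of the camera")
    return depth / depth0

# Camera at the origin looking along +z; the marker was designed for depth 5.
print(marker_scale((3.0, 4.0, 10.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 5.0))  # 2.0
```

Note that only the distance along the view direction matters here; the sideways offset (3, 4) does not change the scale.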
[Edit1] pixel size
you need to use FOVx,FOVy projection angles and view/screen resolution (xs,ys) for that. That means if depth is znear and coordinate is at half of the angle then the projected coordinate will go to edge of screen:
tan(FOVx/2) = (xs/2)*pixelx/znear
tan(FOVy/2) = (ys/2)*pixely/znear
---------------------------------
pixelx = 2*znear*tan(FOVx/2)/xs
pixely = 2*znear*tan(FOVy/2)/ys
Where pixelx,pixely is the size (per axis) that a single pixel covers visually at depth znear. If both sizes are the same (so the pixel is square) you have all you need. If they are not equal (the pixel is not square) then you need to render markers in screen-axis-aligned coordinates, so approach #2 is more suitable for such a case.
So if you choose depth0=znear then you can set size0 to n*pixelx and/or n*pixely to get a visual size of n pixels. Or use any depth0 and rewrite the computation to:
pixelx = 2*depth0*tan(FOVx/2)/xs
pixely = 2*depth0*tan(FOVy/2)/ys
Just to be complete:
size0x = size_in_pixels*(2*depth0*tan(FOVx/2)/xs)
size0y = size_in_pixels*(2*depth0*tan(FOVy/2)/ys)
-------------------------------------------------
sizex = size_in_pixels*(2*depth0*tan(FOVx/2)/xs)*(depth/depth0)
sizey = size_in_pixels*(2*depth0*tan(FOVy/2)/ys)*(depth/depth0)
---------------------------------------------------------------
sizex = size_in_pixels*(2*tan(FOVx/2)/xs)*(depth)
sizey = size_in_pixels*(2*tan(FOVy/2)/ys)*(depth)
---------------------------------------------------------------
sizex = size_in_pixels*2*depth*tan(FOVx/2)/xs
sizey = size_in_pixels*2*depth*tan(FOVy/2)/ys
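The final pair of formulas translates directly into a small helper. This is a sketch in Python with hypothetical names, taking the field-of-view angles in degrees.

```python
import math

def marker_size(size_in_pixels, depth, fov_x_deg, fov_y_deg, xs, ys):
    """World-space size (per axis) that covers size_in_pixels at the given depth."""
    size_x = size_in_pixels * 2 * depth * math.tan(math.radians(fov_x_deg) / 2) / xs
    size_y = size_in_pixels * 2 * depth * math.tan(math.radians(fov_y_deg) / 2) / ys
    return size_x, size_y

# 10-pixel marker at depth 400 on an 800x600 viewport with 90 degree FOV per axis.
print(marker_size(10, 400.0, 90.0, 90.0, 800, 600))
```

With these sample values the two sizes differ (about 10 and 13.3), which is the non-square-pixel case mentioned above.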
I have initiated a PIXI js canvas:
g_App = new PIXI.Application(800, 600, { backgroundColor: 0x1099bb });
Set up a container:
container = new PIXI.Container();
g_App.stage.addChild(container);
Put a background texture (2000x2000) into the container:
var texture = PIXI.Texture.fromImage('picBottom.png');
var back = new PIXI.Sprite(texture);
container.addChild(back);
Set the global:
var g_Container = container;
I do various pivot points and rotations on container and canvas stage element:
// Set the focus point of the container
g_App.stage.x = Math.floor(400);
g_App.stage.y = Math.floor(500); // Note this one is not central
g_Container.pivot.set(1000, 1000);
g_Container.rotation = 1.5; // radians
Now I need to be able to convert a canvas pixel to the pixel on the background texture.
g_Container has an element transform, which in turn has several elements: localTransform, pivot, position, scale and skew. Similarly, g_App.stage has the same transform element.
In maths this is simple: you just have a vector point and do matrix operations on it. Then to go back the other way you just find the inverses of those matrices and multiply backwards.
So what do I do here in pixi.js?
How do I convert a pixel on the canvas and see what pixel it is on the background container?
Note: The following is written using the USA convention of using matrices. They have row vectors on the left and multiply them by the matrix on the right. (Us pesky Brits in the UK do the opposite. We have column vectors on the right and multiply it by the matrix on the left. This means UK and USA matrices to do the same job will look slightly different.)
Now I have confused you all, on with the answer.
g_Container.transform.localTransform - this matrix takes the world coords to the scaled/translated/rotated coords
g_App.stage.transform.localTransform - this matrix takes the rotated world coords and outputs screen (or more accurately) html canvas coords
So for example the Container matrix is:
MatContainer = [g_Container.transform.localTransform.a, g_Container.transform.localTransform.b, 0]
[g_Container.transform.localTransform.c, g_Container.transform.localTransform.d, 0]
[g_Container.transform.localTransform.tx, g_Container.transform.localTransform.ty, 1]
and the rotated container matrix to screen is:
MatToScreen = [g_App.stage.transform.localTransform.a, g_App.stage.transform.localTransform.b, 0]
[g_App.stage.transform.localTransform.c, g_App.stage.transform.localTransform.d, 0]
[g_App.stage.transform.localTransform.tx, g_App.stage.transform.localTransform.ty, 1]
So to get from World Coordinates to Screen Coordinates (noting our vector will be a row on the left, so the first operation matrix that acts first on the World coordinates must also be on the left), we would need to multiply the vector by:
MatAll = MatContainer * MatToScreen
So if you have a world coordinate vector vectWorld = [worldX, worldY, 1.0] (I'll explain the 1.0 at the end), then to get to the screen coords you would do the following:
vectScreen = vectWorld * MatAll
To go the other way, from screen coords back to world coords, we first need to calculate the inverse matrix of MatAll, call it invMatAll. (There are loads of places that tell you how to do this, so I will not do it here.)
So if we have screen (canvas) coordinates screenX and screenY, we need to create a vector vectScreen = [screenX, screenY, 1.0] (again I will explain the 1.0 later), then to get to world coordinates worldX and worldY we do:
vectWorld = vectScreen * invMatAll
And that is it.
So what about the 1.0?
In a 2D system you can do rotations and scaling with 2x2 matrices. Unfortunately you cannot do 2D translations with a 2x2 matrix. Consequently you need 3x3 matrices to fully describe all 2D scaling, rotations and translations. This means you need to make your vector 3D as well, and you need to put a 1.0 in the third position in order to do the translations properly. This 1.0 will still be 1.0 after any such matrix operation.
Note: If we were working in a 3D system we would need 4x4 matrices and put a dummy 1.0 in our 4D vectors for exactly the same reasons.
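The whole round trip can be sketched with plain 3x3 matrices in the row-vector convention used above. The sketch is in Python rather than JavaScript, and the a, b, c, d, tx, ty values are built by hand as illustrative stand-ins for whatever the two localTransform matrices actually hold (a rotation of 1.5 radians about the point (1000, 1000), then a shift to (400, 500)); they are assumptions, not values read from pixi.js.

```python
import math

def mat(a, b, c, d, tx, ty):
    """3x3 matrix laid out for row vectors, as in MatContainer above."""
    return [[a, b, 0.0], [c, d, 0.0], [tx, ty, 1.0]]

def matmul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(v, m):
    """Row vector times matrix."""
    return [sum(v[k] * m[k][j] for k in range(3)) for j in range(3)]

def invert_affine(m):
    """Inverse of an affine matrix whose last column is (0, 0, 1)."""
    a, b, c, d = m[0][0], m[0][1], m[1][0], m[1][1]
    tx, ty = m[2][0], m[2][1]
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return mat(ia, ib, ic, id_, -(tx * ia + ty * ic), -(tx * ib + ty * id_))

r = 1.5
a, b, c, d = math.cos(r), math.sin(r), -math.sin(r), math.cos(r)
# Translation chosen so that the pivot (1000, 1000) maps to the origin.
mat_container = mat(a, b, c, d, -(1000 * a + 1000 * c), -(1000 * b + 1000 * d))
mat_to_screen = mat(1.0, 0.0, 0.0, 1.0, 400.0, 500.0)

mat_all = matmul(mat_container, mat_to_screen)
inv_mat_all = invert_affine(mat_all)

vect_screen = apply([1000.0, 1000.0, 1.0], mat_all)  # the pivot lands near (400, 500)
vect_world = apply(vect_screen, inv_mat_all)         # and maps back near (1000, 1000)
print(vect_screen, vect_world)
```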
I am working on a problem related to camera calibration. In the image below, we consider a world coordinate system with the X-axis going leftward, the Y-axis rightward and the Z-axis upward. We select 15 points (x,y,z) distributed uniformly across the 3 planes. The distance between grid lines is 1 inch. We also obtain MATLAB coordinates for the 15 pixels (u,v). The objective is to obtain the 3x4 camera matrix (M) using homogeneous linear least squares and then project the world points (x,y,z) to the image (u',v') using M. I have written code to do this, but the coordinates I'm obtaining (u',v') seem to be very small in magnitude compared to the actual coordinates (u,v). The RMS error is too large and the projected points don't even map onto the image anywhere near the actual points. Is there any scaling that I need to do to convert it to MATLAB coordinates? I am also including my code, which isn't very well written since I am relatively new to MATLAB.
P=[];% 2nx12 matrix - 30x12 matrix
for i=1:15 %compute P
world_row = world_coords(i,:); % 3d homogeneous coordinates (x,y,z,1)
zeroelem = repelem(0,4);
image_coord = image_coords(i,:);
img_u = image_coord(1);
prod = -img_u*world_row;
row1 = [world_row,zeroelem,prod];
zeroelem = repelem(0,3);
img_v = image_coord(2);
prod = -img_v*world_row;
row2 = [0,world_row,zeroelem,prod];
P=[P;row1;row2];
end
var1 = P'*P;
[V,D] = eig(var1'); % compute the eigenvector corresponding to the smallest eigenvalue
m = V(:,1); % unit vector of norm 1
M = reshape(m,3,4); % camera matrix of 3x4 size
%get projected points
proj = M*world_coords';
U = proj (1,:);
V = proj (2,:);
W = proj (3,:);
for i=1:15
U(i) = U(i)/W(i);
V(i) = V(i)/W(i);
end
final = [U;V]; % (u',v')
I am also including the image with the 15 points I have selected. Take P1(u,v) = (286,260) and P1(x,y,z) = (4,0,3). The (u',v') I obtained for this has low values. Can anyone point out what I'm doing wrong?
It was a silly error on my part that was giving me the wrong camera matrix. I had noted down the world coordinates of one point P incorrectly ((7,0,1) instead of (1,0,1)). This led to a wrongly formed 30x12 matrix, which is the one used to set up the equation solved by homogeneous linear least squares. After correcting this mistake I obtained a calibration matrix which projects the 3D points with a low RMS error.
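Since one bad correspondence is enough to spoil the result, it can help to sanity-check the two rows each point contributes to the 30x12 matrix. Below is a minimal Python sketch, assuming the 12 unknowns are ordered row by row (first row of M, then the second, then the third) and using a made-up camera matrix purely for the check. With that ordering the solution vector has to be turned back into M row by row; in MATLAB, where reshape fills column by column, that corresponds to reshape(m,4,3)'.

```python
def dlt_rows(world, image):
    """The two rows one correspondence adds to the homogeneous system."""
    x, y, z = world
    u, v = image
    xh = [x, y, z, 1.0]
    row_u = xh + [0.0] * 4 + [-u * c for c in xh]
    row_v = [0.0] * 4 + xh + [-v * c for c in xh]
    return row_u, row_v

def project(m, world):
    """Project a world point with a 3x4 matrix given as three rows of four."""
    xh = list(world) + [1.0]
    p = [sum(a * b for a, b in zip(row, xh)) for row in m]
    return (p[0] / p[2], p[1] / p[2])

# Made-up camera matrix, only used to check the row layout.
M = [[800.0, 0.0, 320.0, 10.0],
     [0.0, 800.0, 240.0, 20.0],
     [0.0, 0.0, 1.0, 5.0]]
m_vec = M[0] + M[1] + M[2]  # row-by-row ordering of the 12 unknowns

world = (4.0, 0.0, 3.0)
image = project(M, world)
for row in dlt_rows(world, image):
    # Each row should be (numerically) orthogonal to the true solution.
    print(sum(a * b for a, b in zip(row, m_vec)))
```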
This is my situation: I have a 30x30 image and I want to calculate the radial and tangential components of the gradient at each point (pixel), taken along the straight line passing through the centre of the image (15,15) and that (i,j) point.
[dx, dy] = gradient(img);
for i=1:30
for j=1:30
pt = [dx(i, j), dy(i,j)];
line = [i-15, j-15];
costh = dot(line, pt)/(norm(line)*norm(pt));
par(i,j) = norm(costh*line);
tang(i,j) = norm(sin(acos(costh))*line);
end
end
Is this code correct?
I think there is a conceptual error in your code, I tried to get your results with a different approach, see how it compares to yours.
[dy, dx] = gradient(img);
I inverted x and y because the usual convention in matlab is to have the first dimension along the rows of a matrix while gradient does the opposite.
I created an array of the same size as img but with each pixel containing the angle of the vector from the center of the image to this point:
[I,J] = ind2sub(size(img), 1:numel(img));
theta=reshape(atan2d(I-ceil(size(img,1)/2), J-ceil(size(img,2)/2)), size(img))+180;
The function atan2d ensures that the 4 quadrants give distinct angle values.
Now the projection of the x and y components can be obtained with trigonometry:
par=dx.*sind(theta)+dy.*cosd(theta);
tang=dx.*cosd(theta)-dy.*sind(theta); % minus sign: the tangential direction is perpendicular to the radial one
Note the use of the .* to achieve point-by-point multiplication, this is a big advantage of Matlab's matrix computations which saves you a loop.
Here's an example with a well-defined input image (no gradient along the rows and a constant gradient along the columns):
img=repmat(1:30, [30 1]);
The results:
subplot(1,2,1)
imagesc(par)
subplot(1,2,2)
imagesc(tang)
colorbar
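For a single pixel the decomposition is just a projection onto the unit vector pointing away from the centre and onto its perpendicular. Here is a minimal sketch in Python with hypothetical names, working in (x, y) coordinates and taking the radial direction as positive outward; both sign conventions are assumptions of this sketch.

```python
import math

def radial_tangential(gx, gy, px, py, cx, cy):
    """Split the gradient (gx, gy) at (px, py) relative to the centre (cx, cy)."""
    rx, ry = px - cx, py - cy
    r = math.hypot(rx, ry)
    if r == 0:
        return 0.0, 0.0  # the direction is undefined at the centre itself
    ux, uy = rx / r, ry / r          # unit radial vector
    radial = gx * ux + gy * uy       # projection onto the radial direction
    tangential = -gx * uy + gy * ux  # projection onto the perpendicular
    return radial, tangential

# A gradient pointing along +x, sampled at a pixel offset (3, 4) from the centre.
rad, tan_ = radial_tangential(1.0, 0.0, 18.0, 19.0, 15.0, 15.0)
print(rad, tan_, rad ** 2 + tan_ ** 2)
```

Because the two directions are orthogonal, the squares of the two components add up to the squared gradient magnitude, which makes a convenient check.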