Screen to world coordinates matrix

I am trying to get a transformation matrix that would convert screen coordinates to world coordinates. I have a calibration process in which I can find an 8-sided die on screen (whose world dimensions I know), and I can find the corners of the die. I've never been stellar at linear algebra, but I can plow my way through; I just don't know where to begin. I've been searching for unprojection techniques, but nothing matches what I have.
Is this even possible?
Thanks!

I don't think that it's possible. Usually, mapping 3D coordinates to 2D screen coordinates involves assuming that the viewer is looking at the 3D world through a frustum. Imagine a line going straight down the center of the frustum, through the center of the "eye". Any point on that line would map to the screen coordinate at the center of the screen. So, if you have the 2D screen coordinate of the center of the screen, you couldn't know which point on that line is the 3D coordinate.
Perhaps if you know more information about the object that you're looking at (e.g., if it's an 8-sided die of a known size and you have the positions of some corners), you can use that information to determine which point on the line is correct.
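To make the ambiguity concrete, here is a minimal sketch in Python with NumPy (the intrinsic matrix K below is a made-up placeholder, not anything from the question): a single screen point unprojects only to a ray of candidate 3D points, never to a unique point.

```python
import numpy as np

# Hypothetical pinhole intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def screen_point_to_ray(u, v):
    """Return the camera-space ray direction for pixel (u, v).

    Every 3D point t * direction (t > 0) projects to the same pixel,
    which is why a lone 2D point cannot be unprojected to a unique
    3D point without extra constraints (e.g. known object geometry).
    """
    direction = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return direction / np.linalg.norm(direction)

d = screen_point_to_ray(320, 240)  # ray through the image center
print(1.0 * d, 5.0 * d)            # two different 3D points, same pixel
```

Knowing the die's real size and corner positions is exactly the extra constraint that picks one point on each ray (this is what pose-estimation routines like OpenCV's solvePnP, mentioned further down, exploit).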

Related

Looking for an algorithm to map an image to a 4-sided polygon

This is more a math question than a programming question, apart from the fact that I must implement it in Delphi inside a graphics application.
Assume I have a picture of a sheet of paper. The actual sheet of paper is of course a rectangular area. When the picture is shown on a computer screen, the rectangular area is no longer rectangular, because when the picture was taken the camera was not positioned perfectly above the sheet of paper. There are all kinds of perspective effects that result in deformation.
My application needs to tweak the image so that the original rectangular area is displayed as a rectangular area on screen.
Most photo-processing software has an interactive tool to do that: the user draws a rectangular area on screen around the rectangular object and then drags each corner to deform the displayed area until the real area appears rectangular. What I'm looking for is the algorithm to do that computation.
You need to split the problem into two steps: find the edges or corners of the sheet, then remap the pixels.
Finding the corners or edges is a really hard problem in general, since they might be invisible, outside the picture, obstructed, bent, or deformed. Assuming you have a very simple setup (black uniform background, white paper, very little distortion), you could run an edge-detection kernel over the image and then find the 4 outer edges. Once you have the edges you can intersect them to find the corners, and vice versa.
Once you have the corners, run an interpolation over the image to map the pixels onto the rectangle you want. You should be able to get the graphics engine to do this for you if you provide the corner coordinates as texture coordinates for the rectangle and map the image as a texture.
I made it sound simple, but you will encounter many parameters to set and experiment with.
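As one concrete illustration of the remapping step, here is a minimal sketch in Python using OpenCV (the file name and corner coordinates are placeholder assumptions; a real application would get the corners from the detection step described above):

```python
import cv2
import numpy as np

img = cv2.imread("sheet.jpg")  # hypothetical input photo

# Corners of the sheet as found in the image (placeholder values),
# ordered top-left, top-right, bottom-right, bottom-left.
src = np.float32([[120, 80], [940, 130], [900, 700], [90, 650]])

# Target rectangle, e.g. an A4-like aspect ratio at a chosen resolution.
w, h = 800, 1131
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

# 3x3 perspective (homography) matrix mapping the src quad to the dst rectangle.
M = cv2.getPerspectiveTransform(src, dst)

# Resample the image; interpolation is handled internally.
rectified = cv2.warpPerspective(img, M, (w, h))
cv2.imwrite("sheet_rectified.jpg", rectified)
```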
It seems (because you mentioned bilinear interpolation) that you need perspective transformations.
There is an implementation of perspective transformations (mapping an arbitrary convex quad to a rectangle and vice versa) in the Anti-Grain Geometry library (see the exe example). A Delphi port also exists.
With agg_trans_perspective you can calculate the perspective transformation matrix and then apply it to map coordinates from one quad to another.
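If you only need to map coordinates rather than resample an image, the same idea looks roughly like this; the sketch below is Python with OpenCV standing in for agg_trans_perspective (the quad coordinates are placeholder values):

```python
import cv2
import numpy as np

# Arbitrary convex source quad and destination quad (placeholder values).
quad_a = np.float32([[10, 10], [200, 30], [190, 220], [20, 200]])
quad_b = np.float32([[0, 0], [100, 0], [100, 100], [0, 100]])

# Perspective matrix mapping quad_a -> quad_b.
M = cv2.getPerspectiveTransform(quad_a, quad_b)

# Map individual points through the transformation.
pts = np.float32([[[10, 10]], [[100, 115]]])  # shape (N, 1, 2) as OpenCV expects
mapped = cv2.perspectiveTransform(pts, M)
print(mapped)  # [10, 10] lands on [0, 0]; the inverse matrix maps quad_b back to quad_a
```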

How to Use FindPlane()

Could someone explain to me how FindPlane works? (I understand the inputs and the outputs, but not the process.) I am getting random values for the output, and therefore I do not understand how it actually functions: does it cast a ray from my camera through my touch position, take the depth point that the ray hits, and derive a plane from that?
The operation is similar to a raycast, but the other way around. When you tap a point on the screen, the screen coordinates are recorded. All 3D points in the point cloud are projected onto the image plane using the camera intrinsics, and the points that project close to the tapped screen coordinates are selected. RANSAC is then used to extract plane information from those points; SVD can also be used to extract the plane normal from the inliers obtained from RANSAC. This should be done once per frame, after the frame transformation has been applied to all points in the point cloud.
This method gives unstable, seemingly random values when the point cloud is sparse, when there are reflections or reflective surfaces in the scene, when the 3D space is cluttered, when there is IR interference from outside, etc.
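A minimal sketch of the plane-fitting step described above (Python with NumPy only; the threshold and iteration count are arbitrary assumptions): fit candidate planes to random point triples, keep the one with the most inliers, then refine the normal with an SVD over the inliers.

```python
import numpy as np

def fit_plane_ransac(points, n_iters=200, threshold=0.01):
    """Fit a plane to an (N, 3) point array. Returns (normal, centroid)."""
    rng = np.random.default_rng()
    best_inliers = None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                        # degenerate (collinear) sample
            continue
        normal /= norm
        dist = np.abs((points - p0) @ normal)  # point-to-plane distances
        inliers = dist < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refine: the plane normal is the singular vector with the smallest
    # singular value of the centered inlier set (the least-squares plane).
    inlier_pts = points[best_inliers]
    centroid = inlier_pts.mean(axis=0)
    _, _, vt = np.linalg.svd(inlier_pts - centroid)
    return vt[-1], centroid
```

With few or noisy selected points (the sparse/reflective cases above), the random triples rarely agree, which is exactly why the output looks random.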

Find my camera's 3D position and orientation according to a 2D marker

I am currently building an augmented-reality application and am stuck on a problem that seems quite easy but is very hard for me... The problem is as follows:
My device's camera is calibrated and detects a 2D marker (such as a QR code). I know the focal length, the sensor's position, the distance between my camera and the center of the marker, the real size of the marker, and the coordinates of the 4 corners of the marker and of its center on the 2D image I got from the camera. See the following image:
In the image, we know the distances a, b, c, d and the coordinates of the red dots.
What I need to know is the position and the orientation of the camera relative to the marker (as represented in the image, the origin is the center of the marker).
Is there an easy and fast way to do this? I tried some methods of my own devising (using Al-Kashi's formulas, i.e. the law of cosines), but they produced too much error :(. Could someone point out a way to get me out of this?
You can find some example code for the EPnP algorithm on this webpage. This code consists of one header file and one source file, plus one file for the usage example, so it shouldn't be too hard to include in your project.
Note that this code is released for research/evaluation purposes only, as mentioned on this page.
EDIT:
I just realized that this code needs OpenCV to work. Incidentally, although it would add a fairly big dependency to your project, the current version of OpenCV has a built-in function called solvePnP, which does what you want.
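A minimal sketch of the solvePnP route in Python (the marker size, corner coordinates, and intrinsics below are placeholder assumptions; substitute your calibrated values):

```python
import cv2
import numpy as np

# 3D corners of the marker in its own coordinate system (center at origin),
# for a hypothetical 10 cm marker.
s = 0.05
object_points = np.float32([[-s,  s, 0], [ s,  s, 0],
                            [ s, -s, 0], [-s, -s, 0]])

# The 4 detected corners in the image, in the same order (placeholder values).
image_points = np.float32([[310, 200], [420, 205], [415, 310], [305, 300]])

# Calibrated intrinsics (placeholder values); no lens distortion assumed.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)

# rvec/tvec map marker coordinates into camera coordinates; invert to get
# the camera's position and orientation in the marker's coordinate system.
R, _ = cv2.Rodrigues(rvec)
camera_position = -R.T @ tvec
print(camera_position.ravel())
```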
You can compute the homography between the image points and the corresponding world points. Then from the homography you can compute the rotation and translation mapping a point from the marker's coordinate system into the camera's coordinate system. The math is described in the paper on camera calibration by Zhang.
Here's an example in MATLAB using the Computer Vision System Toolbox that does most of what you need. It uses the extrinsics function, which computes a 3D rotation and a translation from matching image and world points. The points need not come from a checkerboard.
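For completeness, here is a rough sketch of the homography route in Python (a planar marker with known intrinsics K; all values are placeholders). It follows Zhang's construction: for a plane at z = 0, H = K [r1 r2 t] up to scale, so the rotation columns come from the scaled columns of K⁻¹H.

```python
import cv2
import numpy as np

# Planar world points (z = 0) and their detected image points (placeholders).
world_pts = np.float32([[-0.05, 0.05], [0.05, 0.05], [0.05, -0.05], [-0.05, -0.05]])
image_pts = np.float32([[310, 200], [420, 205], [415, 310], [305, 300]])

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

H, _ = cv2.findHomography(world_pts, image_pts)

# Zhang: H = K [r1 r2 t] up to scale, for world points on the z = 0 plane.
B = np.linalg.inv(K) @ H
lam = 1.0 / np.linalg.norm(B[:, 0])    # scale so r1 is a unit vector
r1, r2, t = lam * B[:, 0], lam * B[:, 1], lam * B[:, 2]
r3 = np.cross(r1, r2)
R = np.column_stack([r1, r2, r3])

# Re-orthonormalize R via SVD, since noise breaks exact orthogonality.
# (You may also need to flip the sign of lam so that t has positive depth.)
u, _, vt = np.linalg.svd(R)
R = u @ vt
print(R, t)
```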

3D space: following the direction that an object is pointing towards, using the mouse pointer

Given the 3D vector of the direction that the camera is facing and the orientation/direction vector of a 3D object in the 3D space, how can I calculate the 2-dimensional slope that the mouse pointer must follow on the screen in order to visually be moving along the direction of said object?
Basically I'd like to be able to click on an arrow and make it move back and forth by dragging it, but only if the mouse pointer drags (roughly) along the length of the arrow, i.e. in the direction that it's pointing to.
thank you
I'm not sure I 100% understand your question. Would you mind posting a diagram?
You might find these of interest: I answered previous questions about calculating a local X, Y, Z axis given a camera direction (look-at) vector, and about translating an object in a plane parallel to the camera.
Both of these examples use the vector dot product and the vector cross product to compute the required vectors. In your case, the dot product can also be used to find the angle between two vectors once you have them.
It depends to an extent on the transformation you are using to convert your 3D world coordinates to 2D screen coordinates, e.g. perspective, isometric, etc. You will typically have a forward (3D -> 2D) and a backward (2D -> 3D) transformation in play, where the backward transformation loses information (going forward, each 3D point maps to a unique 2D point, but going back from that 2D point may not yield the same 3D point). You can often project the mouse point onto the object to recover the missing dimension.
For mouse dragging, you typically get the user to specify an operation (translation on the plane of projection, zooming in or out, or rotating about an anchor point). Your input is the mouse coordinate at the start and end of the drag, which you transform into your 3D coordinate system to get two 3D coordinates; these give you dx, dy, dz for dragging/translation, etc.
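As a concrete sketch of the arrow-dragging idea (Python with NumPy; the projection function and 4x4 MVP matrix are assumptions, not anything named in the question): project two points along the arrow to the screen, take their 2D difference as the drag direction, then measure the mouse delta along it with a dot product.

```python
import numpy as np

def project(mvp, p):
    """Project a 3D point to 2D normalized screen coordinates via a 4x4 MVP matrix."""
    q = mvp @ np.append(p, 1.0)
    return q[:2] / q[3]              # perspective divide

def drag_amount(mvp, arrow_origin, arrow_dir, mouse_start, mouse_end):
    # Screen-space direction of the arrow: project two points along it.
    a = project(mvp, arrow_origin)
    b = project(mvp, arrow_origin + arrow_dir)
    screen_dir = b - a
    n = np.linalg.norm(screen_dir)
    if n < 1e-9:                     # arrow points straight at the camera
        return 0.0
    screen_dir /= n
    # Component of the mouse motion along the arrow's screen direction.
    return float((mouse_end - mouse_start) @ screen_dir)
```

The returned scalar is how far along the arrow to translate the object; mouse motion perpendicular to the arrow contributes nothing, which gives the "only drags along its length" behavior you describe.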

How to determine top most object in 2d projection of 3d object?

I have a surface to which a set of 3d objects is drawn. The task is to determine an object by the given coordinates on the surface.
For example: some objects are drawn in a desktop application, and I need to determine which object the user clicked on.
Could you please advise how such a task is usually solved? Do I need to remember the top-most object for each pixel? I don't think that is the best approach.
Any thoughts are welcome!
Thanks!
The name for this task is picking (which ought to help you Google for more help on it). There are two main approaches:
Ray-casting: find the line that starts at the camera position and passes through the surface point you are interested in (the line "under the mouse", or "under your finger" for a touch screen). Depending on which 3D system you are using, there may be an API call to generate this line, for example Camera.ViewportPointToRay in Unity3D, or you may have to generate it yourself by inverting the camera transform. Find all the points of intersection between this line and the objects in your scene; the one closest to the near plane of the camera belongs to the clicked object (a minimal sketch follows this list). You can use space partitioning to speed this up.
Rendering: do an extra render pass, in which instead of writing textures to the frame buffer, you record which objects were drawn. You don't do the render pass for the whole screen, you just do it for the area (e.g. the pixel) you are interested in. (This is GL_SELECT mode in OpenGL: see the Picking Tutorial for details.)
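As referenced above, here is a minimal sketch of the ray-casting approach (Python with NumPy; objects are modeled as spheres purely for illustration, and the pick ray is assumed already generated in world space with a unit-length direction):

```python
import numpy as np

def ray_sphere_hit(origin, direction, center, radius):
    """Return the smallest positive ray parameter t of the hit, or None."""
    oc = origin - center
    b = 2.0 * direction @ oc
    c = oc @ oc - radius * radius
    disc = b * b - 4.0 * c           # quadratic discriminant (unit direction, so a = 1)
    if disc < 0.0:
        return None                  # ray misses the sphere entirely
    t = (-b - np.sqrt(disc)) / 2.0   # nearer of the two intersections
    return t if t > 0.0 else None

def pick(origin, direction, spheres):
    """spheres: list of (object_id, center, radius). Returns the closest hit object id."""
    best = (None, np.inf)
    for obj_id, center, radius in spheres:
        t = ray_sphere_hit(origin, direction, center, radius)
        if t is not None and t < best[1]:
            best = (obj_id, t)       # keep the intersection nearest the camera
    return best[0]
```

Real engines test against meshes or bounding volumes rather than spheres, and prune candidates with a spatial partition, but the closest-positive-hit logic is the same.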
If you've described the surface somehow in 3D space, then the ray defined by your point of observation and a 3D point that projects to where you clicked should intersect one or more objects in your world, if indeed you clicked on one of them.
Given the equations for the surfaces of the objects, you can determine where this ray intersects the objects, if at all, since you also know the equation for the ray in the same coordinate system.
The object that has the closest intersection point to your point of observation (assuming you're looking at the objects from above) is the winner.
