Is there any difference between the HoloLens viewport and UE4's viewport?

Our main idea is this: we take a picture with the HoloLens and get the 2D coordinates of some objects (a thermos and a printer) in that picture. We then deproject those 2D screenshot coordinates back to 3D coordinates in the Unreal world and draw a box at each resulting position.
However, as you can see, when we mark the thermos (first picture) and the printer (second picture) with a static mesh placed at the 3D coordinates computed from their 2D screenshot coordinates, the markers are clearly offset down and to the left. We suspect the problem is that our camera center is wrong.
Have you encountered or solved this kind of problem? Can you give me some advice? Thanks a lot.

We noticed that you have already posted a new question with more information on the Microsoft Q&A community platform. It seems that GetActorLocation returns the headset position, which is offset from the location of the PV camera. We therefore recommend using the GetPVCameraToWorldTransform API, declared in the HoloLensARFunctionLibrary.h header file, to find the camera position in world space. GetWorldSpaceRayFromCameraPoint can then help you find what exists in world space at a particular pixel coordinate. For more detail on how to implement this solution, please go through this section: Find Camera Positions in World Space
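To illustrate why the camera transform matters, here is a minimal, engine-independent sketch in Python/NumPy of turning a pixel into a world-space ray. It is not the Unreal API: the function name pixel_to_world_ray, the intrinsic matrix K, and the cam_to_world matrix are all assumptions standing in for whatever the PV camera actually reports, and Unreal's own axis conventions and units differ from the ones assumed here.

```python
import numpy as np

def pixel_to_world_ray(pixel, K, cam_to_world):
    """Return (origin, unit direction) of the world-space ray through a pixel.

    K is a 3x3 pinhole intrinsic matrix and cam_to_world a 4x4 rigid
    transform (both hypothetical inputs). Assumed camera convention:
    x right, y down, z forward.
    """
    u, v = pixel
    # Back-project the pixel to a direction in camera space (depth = 1).
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    R = cam_to_world[:3, :3]
    origin = cam_to_world[:3, 3]  # camera centre in world space
    d_world = R @ d_cam
    return origin, d_world / np.linalg.norm(d_world)

# Hypothetical intrinsics for a 640x480 image.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# Hypothetical pose: the camera sits 0.1 units away from the headset origin.
cam_to_world = np.eye(4)
cam_to_world[:3, 3] = [0.1, 0.0, 0.0]

origin, direction = pixel_to_world_ray((320, 240), K, cam_to_world)
```

If the ray were started at the headset position instead of the camera centre, every deprojected point would be shifted by that same offset, which is consistent with the constant displacement described in the question.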

Related

How to determine camera location from view matrix?

For a personal project, I've created a simple 3D engine in Python using as few libraries as possible. I did what I wanted: I am able to render simple polygons and have a movable camera. However, there is a problem:
I implemented a simple flat shader, but in order for it to work, I need to know the camera location (the camera is my light source). However, the problem is that I have no way of knowing the camera's location in the world space. At any point, I am able to display my view matrix, but I am unsure about how to extract the camera's location from it, especially after I rotate the camera. Here is a screenshot of my engine with the view matrix. The camera has not been rotated yet and it is very simple to extract its location (0, 1, 4).
However, upon moving the camera to a point between the X and Z axes and pointing it upwards (and staying at the same height), the view matrix changes to this:
It is obvious now that the last column cannot be taken directly to determine the camera location (it should be something like (4,1,4) on the last picture).
I have tried a lot of math, but I can't figure out the way to determine the camera x,y,z location from the view matrix. I will appreciate any and all help in solving this, as it seems to be a simple problem, yet whose solution eludes me. Thank you.
EDIT:
I was advised to transform a vertex (0,0,0,1) by my view matrix. This, however, does not work. See the example (the vertex obviously is not located at the printed coordinates):
Just take the transform of the vector (0,0,0,1) with the modelview matrix: Which is simply the rightmost column of the modelview matrix.
EDIT: @ampersander: I wonder why you're trying to work with the camera location in the first place, if you assume the source of illumination to be located at the camera's position. In that case, just be aware that in OpenGL there is no such thing as a camera, and in fact, what the "view" transform does is move everything in the world around so that where you assume your camera to be ends up at the coordinate origin (0,0,0).
Or in other words: After the modelview transform, the transformed vertex position is in fact the vector from the camera to the vertex, in view space. Which means that for your assumed illumination calculation the direction toward the light source, is the negative vertex position. Take that, normalize it to unit length and stick it into the illumination term.
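If the world-space camera position is still needed, the following sketch shows one way to recover it. It assumes a 4x4 world-to-view matrix in column-vector convention (translation in the last column) that contains only rotation and translation; for a row-vector convention the matrix would need to be transposed first. Under those assumptions the last column holds the world origin expressed in view space, and the camera position is -R^T t. The function name camera_position and the test pose are made up for illustration.

```python
import numpy as np

def camera_position(view):
    """World-space camera position from a rigid 4x4 world-to-view matrix.

    Assumes column vectors and no scaling, so the inverse of the rotation
    block is its transpose.
    """
    R = view[:3, :3]
    t = view[:3, 3]
    return -R.T @ t

# Build a test view matrix: camera at (4, 1, 4), yawed 45 degrees about Y.
angle = np.radians(45.0)
c, s = np.cos(angle), np.sin(angle)
cam_to_world = np.array([[c, 0.0, s, 4.0],
                         [0.0, 1.0, 0.0, 1.0],
                         [-s, 0.0, c, 4.0],
                         [0.0, 0.0, 0.0, 1.0]])
view = np.linalg.inv(cam_to_world)

pos = camera_position(view)
```

Equivalently, the position is the last column of the inverse view matrix. Once the camera is rotated, the last column of the view matrix itself no longer matches the camera location, which is the behaviour described in the question.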

Matching 2D image pixels in corresponding 3D point cloud

I want to match pixels of calibrated 3D lidar and 2D camera data. I will use this to train a network. Can this matching be considered labeled data? If so, can anyone help me achieve this? Any suggestions will be appreciated.
On a high level, assuming you have some transformation (rotation/translation) between your camera and your lidar, and the calibration matrix of the camera, you have a 3D image and a 2D projection of it.
That is, if you project the 3D point cloud onto the image plane of the camera, you will have an (x,y)_camera (a point in the camera frame) for every (RGB)D_world == (x,y,z)_world point.
Whether this is helpful to train on depends on what you're trying to achieve; if you're trying to find where the camera is or to calibrate it, given (RGB)D data and image(s), that could be done better with a Perspective-n-Point (PnP) algorithm (the lidar could make it easier, perhaps, if it built up a "real" view of the world to compare against). Whether it would be considered labeled data depends on how you are trying to label it. They both say very similar things.
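The projection described above can be sketched as follows, assuming a pinhole camera without lens distortion. The names project_points, K, R and t are illustrative: R and t stand for the lidar-to-camera extrinsics and K for the camera calibration matrix, and the numbers are made up.

```python
import numpy as np

def project_points(points_lidar, K, R, t):
    """Project Nx3 lidar points into the image.

    R (3x3) and t (3,) map lidar coordinates into the camera frame and K is
    the 3x3 intrinsic matrix. Returns the pixel coordinates and depths of
    the points in front of the camera, plus the boolean mask selecting them.
    """
    pts_cam = points_lidar @ R.T + t        # lidar frame -> camera frame
    depths = pts_cam[:, 2]
    in_front = depths > 0                   # drop points behind the camera
    uvw = pts_cam[in_front] @ K.T           # apply intrinsics
    pixels = uvw[:, :2] / uvw[:, 2:3]       # perspective divide
    return pixels, depths[in_front], in_front

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                               # hypothetical extrinsics
t = np.zeros(3)

points = np.array([[0.0, 0.0, 2.0],
                   [1.0, 0.0, 2.0],
                   [0.0, 0.0, -1.0]])       # last point is behind the camera

pixels, depths, mask = project_points(points, K, R, t)
```

Each surviving lidar point now has a pixel coordinate and a depth, which gives the pixel-to-point correspondence the question asks about. A real pipeline would also discard pixels that fall outside the image bounds.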

Screen to world coordinates matrix

I am trying to get a transformation matrix that would convert screen coordinates to world coordinates. I have a calibration process in which I can find an 8-sided die on screen (I know the world dimensions of the die), and I can find the corners of the die. I've never been stellar at linear algebra, but I can plow my way through. I just don't know where to begin. I've been searching for unprojection theories, but found nothing that matches what I have.
Is this even possible?
Thanks!
I don't think that it's possible. Usually, mapping 3D coordinates to 2D screen coordinates involves assuming that the viewer is looking at the 3D world through a frustum. Imagine a line going straight down the center of the frustum, through the center of the "eye". Any point on that line would map to the screen coordinate at the center of the screen. So, if you have the 2D screen coordinate of the center of the screen, you couldn't know which point on that line is the 3D coordinate.
Perhaps if you know more information about the object that you're looking at (e.g., if it's an 8-sided die of a known size and you have the positions of some corners), you can use that information to determine which point on the line is correct.
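One common form of such extra information is knowing that the point lies on a known plane (a table top, for instance). A minimal sketch, assuming the ray through the pixel has already been computed in world space: intersecting that ray with the plane selects a single 3D point on the line. The function name intersect_ray_plane and the numbers are illustrative only.

```python
import numpy as np

def intersect_ray_plane(origin, direction, plane_point, plane_normal):
    """Return the point where a ray meets a plane, or None.

    None is returned when the ray is parallel to the plane or when the
    intersection lies behind the ray origin.
    """
    denom = np.dot(direction, plane_normal)
    if abs(denom) < 1e-12:
        return None                      # ray is parallel to the plane
    s = np.dot(plane_point - origin, plane_normal) / denom
    if s < 0:
        return None                      # plane is behind the viewer
    return origin + s * direction

# Viewer 2 units above the ground plane y = 0, looking forward and down.
origin = np.array([0.0, 2.0, 0.0])
direction = np.array([0.0, -1.0, 1.0])
hit = intersect_ray_plane(origin, direction,
                          plane_point=np.zeros(3),
                          plane_normal=np.array([0.0, 1.0, 0.0]))
```

Without a constraint like this, every point along the ray maps to the same screen coordinate, which is the ambiguity described above.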

Find my camera's 3D position and orientation according to a 2D marker

I am currently building an Augmented Reality application and am stuck on a problem that seems quite easy but is very hard for me... The problem is as follows:
My device's camera is calibrated and detects a 2D marker (such as a QR code). I know the focal length, the sensor's position, the distance between my camera and the center of the marker, the real size of the marker, and the coordinates of the 4 corners of the marker and of its center in the 2D image I got from the camera. See the following image:
On the image, we know the a,b,c,d distances and the coordinates of the red dots.
What I need to know is the position and the orientation of the camera relative to the marker (as represented on the image, the origin is the center of the marker).
Is there an easy and fast way to do so? I tried a method of my own (using Al-Kashi's formulas), but it produced too many errors :(. Could someone point me to a way out of this?
You can find some example code for the EPnP algorithm on this webpage. The code consists of one header file and one source file, plus one file for the usage example, so it shouldn't be too hard to include in your project.
Note that this code is released for research/evaluation purposes only, as mentioned on this page.
EDIT:
I just realized that this code needs OpenCV to work. By the way, although this would add a pretty big dependency to your project, the current version of OpenCV has a builtin function called solvePnP, which does what you want.
You can compute the homography between the image points and the corresponding world points. Then from the homography you can compute the rotation and translation mapping a point from the marker's coordinate system into the camera's coordinate system. The math is described in the paper on camera calibration by Zhang.
Here's an example in MATLAB using the Computer Vision System Toolbox, which does most of what you need. It is using the extrinsics function, which computes a 3D rotation and a translation from matching image and world points. The points need not come from a checkerboard.
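As a rough illustration of the homography approach mentioned above, here is a NumPy-only sketch. It is not the EPnP code, OpenCV's solvePnP, or MATLAB's extrinsics function; it assumes an undistorted pinhole camera with known intrinsics K, a planar marker lying in the Z = 0 plane of its own frame, and noise-free corner detections. All names and numbers are made up for the example.

```python
import numpy as np

def homography_dlt(world_xy, image_uv):
    """Estimate H (up to scale) with H @ [X, Y, 1] ~ [u, v, 1] via the DLT."""
    rows = []
    for (X, Y), (u, v) in zip(world_xy, image_uv):
        rows.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        rows.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return Vt[-1].reshape(3, 3)          # null vector of the system

def pose_from_homography(H, K):
    """Recover R, t (marker frame -> camera frame) from H = s * K [r1 r2 t]."""
    B = np.linalg.inv(K) @ H
    scale = 1.0 / np.linalg.norm(B[:, 0])
    if B[2, 2] < 0:                      # marker must be in front: t_z > 0
        scale = -scale
    r1, r2, t = scale * B[:, 0], scale * B[:, 1], scale * B[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)          # snap to the nearest rotation matrix
    return U @ Vt, t

# Synthetic test data: a 0.1 x 0.1 marker centred on its own origin.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
half = 0.05
corners = np.array([[-half, -half], [half, -half], [half, half], [-half, half]])

a = np.radians(30.0)                     # ground-truth pose: tilt about X
R_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(a), -np.sin(a)],
                   [0.0, np.sin(a), np.cos(a)]])
t_true = np.array([0.05, -0.02, 0.5])

# Project the corners with the ground-truth pose to get image points.
pts_cam = np.column_stack([corners, np.zeros(4)]) @ R_true.T + t_true
uvw = pts_cam @ K.T
image_pts = uvw[:, :2] / uvw[:, 2:3]

H = homography_dlt(corners, image_pts)
R_est, t_est = pose_from_homography(H, K)
camera_in_marker_frame = -R_est.T @ t_est
```

The last line converts the marker-to-camera pose into the camera position expressed in the marker's coordinate system, which is what the question asks for. With real, noisy detections a library routine such as solvePnP is likely to be more robust than this sketch.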

How to determine top most object in 2d projection of 3d object?

I have a surface to which a set of 3d objects is drawn. The task is to determine an object by the given coordinates on the surface.
For example: some objects are drawn on the desktop application, I need to determine on which object user clicked.
Could you please advise how such a task is usually solved? Do I need to remember the top-most object for each pixel? I don't think that is the best approach.
Any thoughts are welcome!
Thanks!
The name for this task is picking (which ought to help you Google for more help on it). There are two main approaches:
Ray-casting: find the line that starts at the camera position and passes through the surface point you are interested in. (The line "under the mouse", or "under your finger" for a touch screen.) Depending on which 3D system you are using, there may be an API call to generate this line: for example Camera.ViewportPointToRay in Unity3D, or you may have to generate it yourself by inverting the camera transform. Find all the points of intersection between this line and the objects in your scene. Which of these points is closest to the near plane of the camera? You can use space partitioning to speed this up.
Rendering: do an extra render pass, in which instead of writing textures to the frame buffer, you record which objects were drawn. You don't do the render pass for the whole screen, you just do it for the area (e.g. the pixel) you are interested in. (This is GL_SELECT mode in OpenGL: see the Picking Tutorial for details.)
If you've described the surface somehow in 3D space, then the ray, defined by your point of observation and a 3D point that is a solution for where you clicked, should intersect one or more objects in your world, if indeed you clicked on one of them.
Given the equations for the surfaces of the objects, you can determine where this ray intersects the objects, if at all, since you also know the equation for the ray in the same coordinate system.
The object that has the closest intersection point to your point of observation (assuming you're looking at the objects from above) is the winner.
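The ray-casting approach can be sketched as follows, assuming for simplicity that every object is a sphere and that the ray through the clicked point is already known in world space. The scene, the object names, and the helper functions are made up for the example; real scenes would test against meshes or bounding volumes instead.

```python
import numpy as np

def ray_sphere_hit(origin, direction, center, radius):
    """Distance along a unit-length ray to the nearest hit, or None."""
    oc = origin - center
    b = np.dot(oc, direction)
    c = np.dot(oc, oc) - radius ** 2
    disc = b * b - c
    if disc < 0:
        return None                      # the ray misses the sphere
    root = np.sqrt(disc)
    for dist in (-b - root, -b + root):  # nearer root first
        if dist >= 0:
            return dist
    return None                          # sphere is entirely behind the ray

def pick(origin, direction, spheres):
    """Return the name of the closest sphere hit by the ray, or None."""
    direction = direction / np.linalg.norm(direction)
    best_name, best_dist = None, np.inf
    for name, center, radius in spheres:
        dist = ray_sphere_hit(origin, direction, center, radius)
        if dist is not None and dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

scene = [("near", np.array([0.0, 0.0, 5.0]), 1.0),
         ("far", np.array([0.0, 0.0, 10.0]), 2.0),
         ("side", np.array([5.0, 0.0, 5.0]), 1.0)]
eye = np.zeros(3)

picked_front = pick(eye, np.array([0.0, 0.0, 1.0]), scene)
picked_side = pick(eye, np.array([1.0, 0.0, 1.0]), scene)
picked_none = pick(eye, np.array([0.0, 1.0, 0.0]), scene)
```

The first ray passes through both "near" and "far", and the closer intersection wins. Testing every object is linear in the scene size, which is where the space partitioning mentioned above helps.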

Resources