Matching 2D image pixels in corresponding 3D point cloud - image

I want to match pixels of calibrated 3D lidar and 2D camera data. I will use this to train a network. Can the matched data be considered labeled data? If so, can anyone help me achieve this? Any suggestions will be appreciated.

At a high level, assuming you have the transformation (rotation/translation) between your camera and your lidar, and the calibration matrix of the camera, you have a 3D image and a 2D projection of it.
That is, if you project the 3D point cloud onto the image plane of the camera, you will have an (x,y)_camera (point in the camera frame) for every (RGB)D_world == (x,y,z)_world point.
Whether this is helpful to train on depends on what you're trying to achieve; if you're trying to find where the camera is or to calibrate it, given (RGB)D data and image(s), that could be done better with a Perspective-n-Point (PnP) algorithm (the lidar could make it easier, perhaps, if it built up a "real" view of the world to compare against). Whether it would be considered labeled data depends on how you are trying to label it. The point cloud and the image both say very similar things about the scene.
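To make the projection step concrete, here is a minimal NumPy sketch. It assumes R and t map lidar coordinates into the camera frame and that K is a plain pinhole calibration matrix with no lens distortion; the function name project_lidar_to_image and the example numbers are made up for illustration.

```python
import numpy as np

def project_lidar_to_image(points_lidar, R, t, K, image_size):
    """Project Nx3 lidar points into pixel coordinates.

    Assumptions: R (3x3) and t (3,) map lidar coordinates into the camera
    frame, K is the 3x3 calibration matrix, image_size is (width, height).
    Returns the (u, v) pixels and the indices of the points they came from.
    """
    points_cam = points_lidar @ R.T + t           # lidar frame -> camera frame
    idx = np.nonzero(points_cam[:, 2] > 0)[0]     # keep points in front of the camera
    uvw = points_cam[idx] @ K.T                   # apply the intrinsics
    uv = uvw[:, :2] / uvw[:, 2:3]                 # perspective divide
    width, height = image_size
    inside = ((uv[:, 0] >= 0) & (uv[:, 0] < width) &
              (uv[:, 1] >= 0) & (uv[:, 1] < height))
    return uv[inside], idx[inside]

# Hypothetical calibration: 500 px focal length, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
points = np.array([[0.0, 0.0, 5.0],     # straight ahead
                   [1.0, 0.5, 10.0],    # ahead, off-centre
                   [0.0, 0.0, -2.0],    # behind the camera, dropped
                   [100.0, 0.0, 1.0]])  # projects outside the image, dropped
pixels, indices = project_lidar_to_image(points, R, t, K, (640, 480))
```

Each returned pixel is paired with the index of the lidar point it came from, which gives the pixel-to-point matching asked about. Occlusion is not handled here: a point hidden behind a closer surface still projects into the image.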

Related

How is point cloud data acquired from the structured light 3D scanning?

I am trying to understand the 3D reconstruction of an object using a structured-light 3D scanner, and I am stuck at the point where the decoded set of camera and projector correspondences is used to reconstruct a 3D point cloud. How exactly is the 3D point cloud information acquired from those correspondences? I want to understand the mathematical implementation, not the code implementation.
Assuming you used a structured-light method that uses some sort of lines (vertical or horizontal, as in binary coding or De Bruijn patterns), the idea is as follows:
A light plane goes through the projector's perspective center and the line in the pattern.
The light plane's normal needs to be rotated with the projector's rotation matrix relative to the camera (or to the world, depending on the calibration). The rotation of the light plane can be avoided if you treat the projector's perspective center as the system origin.
Using the correspondences, you find a pixel in the image that matches the light plane. Now you need to define a vector that goes from the camera's perspective center to the pixel in the image, and then rotate this vector by the camera's rotation (relative to the projector or to the world, again depending on the calibration).
Intersect the light plane with the vector you found. How to compute that is described on Wikipedia: https://en.wikipedia.org/wiki/Line%E2%80%93plane_intersection
The mathematical problem (3D reconstruction) here is very simple, as you can see. The hard part is recognizing the projected pattern in the image (easier than regular stereo, but still hard) and calibrating (finding the relative orientation between camera and projector).
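The steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the projector's perspective center is taken as the system origin, R_cam is assumed to rotate camera-frame vectors into that frame, K is a pinhole calibration matrix, and the names pixel_ray and intersect_ray_plane are hypothetical.

```python
import numpy as np

def pixel_ray(u, v, K, R_cam):
    """Unit direction of the ray through pixel (u, v), expressed in the common frame.

    Assumption: R_cam rotates camera-frame vectors into the common frame.
    """
    d = R_cam @ np.linalg.solve(K, np.array([u, v, 1.0]))
    return d / np.linalg.norm(d)

def intersect_ray_plane(origin, direction, plane_point, plane_normal):
    """Intersect the ray origin + s * direction with the light plane."""
    denom = plane_normal @ direction
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the light plane")
    s = plane_normal @ (plane_point - origin) / denom
    return origin + s * direction

# Hypothetical setup: projector center at the origin, camera 0.2 units along x.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
projector_center = np.zeros(3)
camera_center = np.array([0.2, 0.0, 0.0])
R_cam = np.eye(3)

# Light plane x = 0.1 * z: it passes through the projector center.
plane_normal = np.array([1.0, 0.0, -0.1])

# Pixel that the decoded pattern matched to this light plane.
ray = pixel_ray(270.0, 265.0, K, R_cam)
point_3d = intersect_ray_plane(camera_center, ray, projector_center, plane_normal)
```

Repeating this for every decoded pixel gives the point cloud; the result is in whatever frame the plane and the ray were expressed in (here, the projector's).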

Find my camera's 3D position and orientation according to a 2D marker

I am currently building an Augmented Reality application and am stuck on a problem that seems quite easy but is very hard for me... The problem is as follows:
My device's camera is calibrated and detects a 2D marker (such as a QR code). I know the focal length, the sensor's position, the distance between my camera and the center of the marker, the real size of the marker, and the coordinates of the 4 corners of the marker and of its center on the 2D image I got from the camera. See the following image:
On the image, we know the a,b,c,d distances and the coordinates of the red dots.
What I need to know is the position and the orientation of the camera relative to the marker (as represented on the image, the origin is the center of the marker).
Is there an easy and fast way to do so? I tried a method I came up with myself (using Al-Kashi's formulas), but this ended with too many errors :(. Could someone point out a way to get me out of this?
You can find some example code for the EPnP algorithm on this webpage. This code consists of one header file and one source file, plus one file for the usage example, so this shouldn't be too hard to include in your code.
Note that this code is released for research/evaluation purposes only, as mentioned on this page.
EDIT:
I just realized that this code needs OpenCV to work. By the way, although this would add a pretty big dependency to your project, the current version of OpenCV has a builtin function called solvePnP, which does what you want.
You can compute the homography between the image points and the corresponding world points. Then from the homography you can compute the rotation and translation mapping a point from the marker's coordinate system into the camera's coordinate system. The math is described in the paper on camera calibration by Zhang.
Here's an example in MATLAB using the Computer Vision System Toolbox, which does most of what you need. It is using the extrinsics function, which computes a 3D rotation and a translation from matching image and world points. The points need not come from a checkerboard.
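For readers who want to see the homography route spelled out, here is a rough NumPy-only sketch. It is not the EPnP code, OpenCV's solvePnP, or the MATLAB extrinsics function; it assumes a planar marker lying in the Z = 0 plane of its own coordinate system, a pinhole camera with known K and no distortion, and noise-free correspondences. The names estimate_homography and pose_from_homography and the example numbers are made up.

```python
import numpy as np

def estimate_homography(world_xy, image_uv):
    """Direct linear transform: homography mapping marker-plane (X, Y) to pixels (u, v)."""
    rows = []
    for (X, Y), (u, v) in zip(world_xy, image_uv):
        rows.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        rows.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)            # null vector of the system, up to scale

def pose_from_homography(H, K):
    """Recover R, t (marker frame -> camera frame) from H ~ K [r1 r2 t]."""
    B = np.linalg.solve(K, H)              # proportional to [r1 r2 t]
    scale = 1.0 / np.linalg.norm(B[:, 0])  # r1 must have unit length
    if B[2, 2] < 0:                        # marker must be in front of the camera
        scale = -scale
    r1, r2, t = scale * B[:, 0], scale * B[:, 1], scale * B[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)            # snap to the nearest rotation matrix
    return U @ Vt, t

# Hypothetical ground truth used to synthesise the four corner observations.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
angle = 0.3
R_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(angle), -np.sin(angle)],
                   [0.0, np.sin(angle), np.cos(angle)]])
t_true = np.array([0.1, -0.05, 2.0])
corners = np.array([[-0.05, -0.05], [0.05, -0.05], [0.05, 0.05], [-0.05, 0.05]])

corners_cam = np.column_stack([corners, np.zeros(4)]) @ R_true.T + t_true
uvw = corners_cam @ K.T
image_uv = uvw[:, :2] / uvw[:, 2:3]

H = estimate_homography(corners, image_uv)
R_est, t_est = pose_from_homography(H, K)
camera_position = -R_est.T @ t_est         # camera center in the marker's frame
```

R_est and t_est map marker coordinates into the camera frame, so the camera's position in the marker's frame is -R_est.T @ t_est and its orientation is R_est.T. With noisy detections a library routine with refinement is likely to be more accurate than this bare decomposition.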

Camera matching application

I am trying to build a simple camera matching (or match moving) application. The functionality is the same as that in most 3d applications like 3ds Max or Maya. Given an image of a cube and a 3d model of the cube, the user selects points on the image corresponding to each vertex of the model. The application must then generate a camera view that displays the 3d cube model from the same angle as shown in the image.
Can anyone point me in the direction of an algorithm for that?
PS: The camera is calibrated and the camera calibration matrix is available to the program
You can try with the algorithm illustrated step-by-step on http://www.offbytwo.net/camera-matching/. The Octave source code is provided, too.
As a plus, you don't need to start with a cube, but just with any two edges parallel to the x axis and two in the y direction.

Camera calibration: the projection matrix

I have been working on a 3D scanner for a while now and I still have some questions about the projection matrix that I want to clear up before I continue.
I understand that this matrix describes the relation between the camera coordinate system and the world coordinate system. Yet I don't understand why all the calibration software packages give you this matrix. Does the software just pick a random world coordinate system in space and calculate the matrix afterwards?
I was thinking it would be way easier to choose the world coordinate system yourself (if that is even possible). My plan is to create a scanner where the object stands still on a static surface and where the camera + laser moves around the object in a circular movement. It would be ideal if the projection matrix could be created so that the world coordinate system is nicely placed in the middle of the static platform.
If I'm not very clear, let me know and I'll add an image.
Hopefully someone can clear things a little bit up for me so I can make some progress :).
Kind regards
Ruts
The projection matrix you get from camera calibration consists of the intrinsic and extrinsic parameters of the camera. In a stereo vision setup, calibration also gives you the relation between the two cameras, which is what lets you convert image points to a 3D coordinate system and get the depth of objects.
There are a number of videos on YouTube about 3D scanners.
http://www.youtube.com/watch?v=AYq5n7jwe40 or http://www.youtube.com/watch?v=H3WzY8EWM9s
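On the question of choosing the world coordinate system yourself, a small sketch may help. It uses the common pinhole convention P = K [R | t], which is background knowledge rather than something stated above, and assumes you know the rigid transform from your preferred frame (say, the platform center) to the frame the calibration used (in many toolboxes this is tied to the calibration target). All names and numbers are hypothetical.

```python
import numpy as np

def projection_matrix(K, R, t):
    """P = K [R | t]: maps homogeneous world points to homogeneous pixels."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, point):
    """Project a 3D point with a 3x4 projection matrix and return (u, v)."""
    x = P @ np.append(point, 1.0)
    return x[:2] / x[2]

def rebase_world(P, R_new_to_old, t_new_to_old):
    """Projection matrix that accepts coordinates in a new world frame.

    Assumption: x_old = R_new_to_old @ x_new + t_new_to_old.
    """
    T = np.eye(4)
    T[:3, :3] = R_new_to_old
    T[:3, 3] = t_new_to_old
    return P @ T

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P_old = projection_matrix(K, np.eye(3), np.array([0.0, 0.0, 2.0]))

# Put the new origin at the platform center, located at (0.3, 0, 0.5) in the old frame.
platform_center_old = np.array([0.3, 0.0, 0.5])
P_platform = rebase_world(P_old, np.eye(3), platform_center_old)

point_platform = np.array([0.1, 0.0, 0.0])          # a point in platform coordinates
point_old = point_platform + platform_center_old    # the same point in the old frame
pixel_from_platform = project(P_platform, point_platform)
pixel_from_old = project(P_old, point_old)
```

Both calls give the same pixel, which is the point: the world frame is a convention, and moving it only changes the extrinsic part of the matrix.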

How to generate one texture from N textures?

Let's say I have N pictures of an object, taken from N known positions. I also have the 3D geometry of the object, and I know all the characteristics of both the camera and the lens.
I want to generate a single giant picture from the N pictures I have, so that it can be mapped/projected onto the object surface.
Does anybody know where to start? Articles, references, books?
Not sure if it helps you directly, but these guys have some amazing demos of some related techniques: http://grail.cs.washington.edu/projects/videoenhancement/videoEnhancement.htm.
1. Generate texture-mapping coords for your geometry.
2. Generate a big blank texture.
3. For each pixel of the texture:
   a. Figure out the point on the geometry it maps to.
   b. Figure out the pixel in each image that projects onto this point.
   c. Colour the pixel with a weighted blend of all these pixels, weighted by how much the surface normal is facing the corresponding camera and ignoring those images where there's another piece of geometry between the point and the camera.
4. Apply your completed texture to the geometry.
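The blending step in that recipe might look roughly like the sketch below. It assumes the colours have already been sampled from each image at the projection of the surface point and that the occlusion test has been done elsewhere (the visible flags); the function name blend_texel and the sample values are hypothetical.

```python
import numpy as np

def blend_texel(colors, normal, point, camera_positions, visible):
    """Blend per-image colours for one texel.

    colors: (N, 3) colour sampled from each image at the projection of `point`
    normal: unit surface normal at `point`
    camera_positions: (N, 3) camera centers
    visible: (N,) bool, False where other geometry blocks the view
    Returns the blended colour, or None if no camera sees the point.
    """
    to_cam = camera_positions - point
    to_cam = to_cam / np.linalg.norm(to_cam, axis=1, keepdims=True)
    weights = np.clip(to_cam @ normal, 0.0, None)   # how directly the surface faces each camera
    weights = np.where(visible, weights, 0.0)       # ignore occluded views
    total = weights.sum()
    if total == 0.0:
        return None
    return (weights[:, None] * colors).sum(axis=0) / total

point = np.zeros(3)
normal = np.array([0.0, 0.0, 1.0])
camera_positions = np.array([[0.0, 0.0, 2.0],    # facing the surface head-on
                             [0.0, 0.0, 5.0],    # same direction, further away
                             [0.0, 0.0, -1.0],   # behind the surface, weight 0
                             [0.0, 0.0, 3.0]])   # head-on but occluded
colors = np.array([[255.0, 0.0, 0.0],
                   [0.0, 0.0, 255.0],
                   [0.0, 255.0, 0.0],
                   [255.0, 255.0, 255.0]])
visible = np.array([True, True, True, False])
texel = blend_texel(colors, normal, point, camera_positions, visible)
```

Here only the first two cameras contribute, with equal weight. Other weightings (for example, favouring closer or higher-resolution views) are equally plausible; the cosine weight is just the one the recipe describes.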
Google up "shadow mapping", as the same problem is solved during that process (images of the scene as seen from some known points are projected onto the 3D geometry in the scene). The problem is well-understood and there is plenty of code.
I'd suspect that this can be done using some variation of projection maps mixed with image reconstruction.
Have a look at cubemapping. It may be useful. You may want to project another convex shape to the cube and use the resulting texture as a conventional cubemap texture.

Resources