Using RGB images and PointCloud, how to generate depth map from the PointClouds? (python) - image

I am working on fusing Lidar and Camera images in order to perform a classification object algorithm using CNN.
I want to use the KITTI Dataset which provide synchronized lidar and rgb image data. Lidar are 3D-scanners, so the output is a 3D Point Cloud.
I want to use depth information from point cloud as a channel for the CNN. But I have never work with point cloud so I am asking for some help. Is projecting the point cloud into the camera image plane (using projection matrix provide by Kitti) will give me the depth map that I want? Is Python libray pcl useful or I should move to c++ libraries?
If you have any suggestions, thanks you in advance

I'm not sure what projection matrix provide by Kitti includes, so the answer is it depends. If this projection matrix only contains a transformation matrix, you cannot generate depth map from it. The 2D image has distortion that comes from the 2D camera and the point cloud usually doesn't have distortion, so you cannot "precisely" map point cloud to rgb image without intrinsic and extrinsic parameters.
PCL is not required to do this.
Depth map essentially is mapping depth value to rgb image. You can treat each point in point cloud(each laser of lider) as a pixel of the rgb image. Therefore, I think all you need to do is finding which point in point cloud corresponding to the first pixel(top left corner) of the rgb image. Then read the depth value from point cloud based on rgb image resolution.

You have nothing to do with camera. This is all about point cloud data. Lets say you have 10 million of points and each point has x,y,z in meters. If the data is not in meters first convert it. Then you need the position of the lidar. When you subtract position of car from all the points one by one, you will take the position of lidar to the (0,0,0) point, then you can just print the point on a white image. The rest is simple math, there may be many ways to do it. First that comes to my mind: think rgb as binary numbers. Lets say 1cm is scaled to change in 1 blue, 256cm change equals to change in 1 green and 256x256 which is 65536 cm change equals change in 1 red. We know that cam is (0,0,0) if rgb of the point is 1,0,0 then that means 256x256x1+0x256+0x1=65536 cm away from the camera. This could be done in C++. Also you can use interpolation and closest point algorithms to fill blanks if there are

Related

Camera Geometry: Algorithm for "object area correction"

A project I've been working on for the past few months is calculating the top area of ​​an object taken with a 3D depth camera from top view.
workflow of my project:
capture a group of objects image(RGB,DEPTH data) from top-view
Instance Segmentation with RGB image
Calculate the real area of ​​the segmented mask with DEPTH data
Some problem on the project:
All given objects have different shapes
The side of the object, not the top, begins to be seen as it moves to the outside of the image.
Because of this, the mask area to be segmented gradually increases.
As a result, the actual area of ​​an object located outside the image is calculated to be larger than that of an object located in the center.
In the example image, object 1 is located in the middle of the angle, so only the top of the object is visible, but object 2 is located outside the angle, so part of the top is lost and the side is visible.
Because of this, the mask area to be segmented is larger for objects located on the periphery than for objects located in the center.
I only want to find the area of ​​the top of an object.
example what I want image:
Is there a way to geometrically correct the area of ​​an object located on outside of the image?
I tried to calibrate by multiplying the area calculated according to the angle formed by Vector 1 connecting the center point of the camera lens to the center point of the floor and Vector 2 connecting the center point of the lens to the center of gravity of the target object by a specific value.
However, I gave up because I couldn't logically explain how much correction was needed.
fig 3:
What I would do is convert your RGB and Depth image to 3D mesh (surface with bumps) using your camera settings (FOVs,focal length) something like this:
Align already captured rgb and depth images
and then project it onto ground plane (perpendicul to camera view direction in the middle of screen). To obtain ground plane simply take 3 3D positions of the ground p0,p1,p2 (forming triangle) and using cross product to compute the ground normal:
n = normalize(cross(p1-p0,p2-p1))
now you plane is defined by p0,n so just each 3D coordinate convert like this:
by simply adding normal vector (towards ground) multiplied by distance to ground, if I see it right something like this:
p' = p + n * dot(p-p0,n)
That should eliminate the problem with visible sides on edges of FOV however you should also take into account that by showing side some part of top is also hidden so to remedy that you might also find axis of symmetry, and use just half of top side (that is not hidden partially) and just multiply the measured half area by 2 ...
Accurate computation is virtually hopeless, because you don't see all sides.
Assuming your depth information is available as a range image, you can consider the points inside the segmentation mask of a single chicken, estimate the vertical direction at that point, rotate and project the points to obtain the silhouette.
But as a part of the surface is occluded, you may have to reconstruct it using symmetry.
There is no way to do this accurately for arbitrary objects, since there can be parts of the object that contribute to the "top area", but which the camera cannot see. Since the camera cannot see these parts, you can't tell how big they are.
Since all your objects are known to be chickens, though, you could get a pretty accurate estimate like this:
Use Principal Component Analysis to determine the orientation of each chicken.
Using many objects in many images, find a best-fit polynomial that estimates apparent chicken size by distance from the image center, and orientation relative to the distance vector.
For any given chicken, then, you can divide its apparent size by the estimated average apparent size for its distance and orientation, to get a normalized chicken size measurement.

Find my camera's 3D position and orientation according to a 2D marker

I am currently building an Augmented Reality application and stuck on a problem that seem quite easy but is very hard to me ... The problem is as follow:
My device's camera is calibrated and detect a 2D marker (such as a QRCode). I know the focal length, the sensor's position, the distance between my camera and the center of the marker, the real size of the marker and the coordinates of the 4 corners of the marker and of it center on the 2D image I got from the camera. See the following image:
On the image, we know the a,b,c,d distances and the coordinates of the red dots.
What I need to know is the position and the orientation of the camera according to the marker (as represented on the image, the origin is the center of the marker).
Is there an easy and fast way to do so? I tried some method imagined by myself (using Al-Kashi's formulas), but this ended with too much errors :(. Could someone point out a way to get me out of this?
You can find some example code for the EPnP algorithm on this webpage. This code consists in one header file and one source file, plus one file for the usage example, so this shouldn't be too hard to include in your code.
Note that this code is released for research/evaluation purposes only, as mentioned on this page.
EDIT:
I just realized that this code needs OpenCV to work. By the way, although this would add a pretty big dependency to your project, the current version of OpenCV has a builtin function called solvePnP, which does what you want.
You can compute the homography between the image points and the corresponding world points. Then from the homography you can compute the rotation and translation mapping a point from the marker's coordinate system into the camera's coordinate system. The math is described in the paper on camera calibration by Zhang.
Here's an example in MATLAB using the Computer Vision System Toolbox, which does most of what you need. It is using the extrinsics function, which computes a 3D rotation and a translation from matching image and world points. The points need not come from a checkerboard.

How to correlate 2D mask with noisy image?

I have a matrix 'A' with two columns which contains 2D points ( coordinates 'x' and 'y'). These points earlier were projected onto the plane from 3d cloud, so they create a 2d shape of some object.
For the second I have a noisy image 'B'(4k x 4k matrix) with similar (but translated and scaled) shape. What I want to do is to correlate points from matrix 'A' and use them as a binary mask for the object on the image 'B'. Currently i dont have a slightest idea how to do it.
Thanks for all help.
Following off of what AnonSubmitter85 suggested with the pattern recognition method, something like a SIFT (Scale Invariant Feature Transformation) detector might be helpful in the case of the scaled and rotated object. Similarly, matlab has a set of functions that do SURF (Speeded Up Robust Features) detection:
http://www.mathworks.com/help/vision/examples/find-image-rotation-and-scale-using-automated-feature-matching.html
Hopefully this can stimulate some new ideas.

Map image/texture to a predefined uneven surface (t-shirt with folds, mug, etc.)

Basically I was trying to achieve this: impose an arbitrary image to a pre-defined uneven surface. (See examples below).
-->
I do not have a lot of experience with image processing or 3D algorithms, so here is the best method I can think of so far:
Predefine a set of coordinates (say if we have a 10x10 grid, we have 100 coordinates that starts with (0,0), (0,10), (0,20), ... etc. There will be 9x9 = 81 grids.
Record the transformations for each individual coordinate on the t-shirt image e.g. (0,0) becomes (51,31), (0, 10) becomes (51, 35), etc.
Triangulate the original image into 81x2=162 triangles (with 2 triangles for each grid). Transform each triangle of the image based on the coordinate transformations obtained in Step 2 and draw it on the t-shirt image.
Problems/questions I have:
I don't know how to smooth out each triangle so that the image on t-shirt does not look ragged.
Is there a better way to do it? I want to make sure I'm not reinventing the wheels here before I proceed with an implementation.
Thanks!
This is called digital image warping. There was a popular graphics text on it in the 1990s (probably from somebody's thesis). You can also find an article on it from Dr. Dobb's Journal.
Your process is essentially correct. If you work pixel by pixel, rather than trying to use triangles, you'll avoid some of the problems you're facing. Scan across the pixels in target bitmap, and apply the local transformation based on the cell you're in to determine the coordinate of the corresponding pixel in the source bitmap. Copy that pixel over.
For a smoother result, you do your coordinate transformations in floating point and interpolate the pixel values from the source image using something like bilinear interpolation.
It's not really a solution for the problem, it's just a workaround :
If you have the 3D model that represents the T-Shirt.
you can use directX\OpenGL and put your image as a texture of the t-shirt.
Then you can ask it to render the picture you want from any point of view.

Mapping corners to arbitrary positions using Direct2D

I'm using WIC and Direct2D (via SharpDX) to composite photos into video frames. For each frame I have the exact coordinates where each corner will be found. While the photos themselves are a standard aspect ratio (e.g. 4:3 or 16:9) the insertion points are not -- they may be rotated, scaled, and skewed.
Now I know in Direct2D I can apply matrix transformations to accomplish this... but I'm not exactly sure how. The examples I've seen are more about applying specific transformations (e.g. rotate 30 degrees) than trying to match an exact destination.
Given that I know the exact coordinates (A,B,C,D) above, is there an easy way to map the source image onto the target? Alternately how would I generate the matrix given the source and destination coordinates?
If Direct3D is an option, all you will need to do is to render the quadrilateral as two triangles (with the frog texture mapped onto it).
To make sure there are no artifacts, render the quad as an indexed mesh, like in the example here (note that it shares vertex 0 and vertex 2 across both triangles). Of course, you can replace the actual vertex coordinates with A, B, C and D.
To begin, you can check out these tutorials for SlimDX, an excellent set of .NET bindings to DirectX.
It is not really possible to achieve this with Direct2D. That could be possible with Direct2D1.1 (from Win8 Metro) with a custom vertex shader, but in the end, as ananthonline suggest, It will be much easier to do it with Direct3D11.
Also you can use triangle strip primitives that are easier to setup (you don't need to create an index buffer). For the coordinates, you can directly sends coordinates to a vertex shader without any transforms (the vertex shaders will copy input SV_POSITION directly to the pixel shader). You just have to map your coordinates into x [-1,1] and y[-1,1]. I suggest you to start with SharpDX MiniCubeTexture sample, change the matrix to perform an orthonormal projection (instead of the sample perspective).

Resources