Initial camera intrinsic and extrinsic matrix and 3D point coordinates for Bundle Adjustment - camera-calibration

I want to reconstruct a 3D scene using multiple RGB cameras. The input data has no camera calibration information, so I want to use a bundle adjustment algorithm (Ceres-solver) to estimate the calibration information.
I have already obtained pair-wise matched feature points, but I find that the bundle adjustment algorithm (Ceres-solver) also needs initial camera intrinsic and extrinsic matrices and 3D point coordinates as input. However, I do not have this information, and I do not know how to generate an initial guess, either.
What should I do to generate the initial camera intrinsic and extrinsic matrices and 3D point coordinates?
Thanks very much!

Initial parameters are important to help the algorithm converge to the right local minimum, and therefore to obtain a good reconstruction. You have different options for finding the intrinsics of your camera(s):
If you know the camera brand(s) used for taking the pictures you could try to find those intrinsics in a database. Important parameters for you are the CCD width and the focal length (mm). Try this one.
Check EXIF tags of your images. You can use tools like jhead or exiftool for that purpose.
You basically need the focal length in pixels and the lens distortion coefficients. To calculate the focal length in pixels you can use the following equation:
focal_pixels = res_x * (focal_mm / ccd_width_mm)
If you can't find the intrinsic parameters for your camera(s) in any case, you can use the following approximation as an initial guess:
focal_pixels = 1.2 * res_x
Don't set these parameters as fixed, so that the focal length and distortion parameters are optimized in the bundle adjustment step.
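For example, here is a minimal Python/NumPy sketch of turning these values into an initial intrinsic matrix (the function name, the centered principal point, zero skew and the example numbers are assumptions, not something prescribed by Ceres-solver):

    import numpy as np

    def initial_intrinsics(res_x, res_y, focal_mm=None, ccd_width_mm=None):
        # Focal length in pixels from EXIF/database values, or the rough prior if unknown.
        if focal_mm is not None and ccd_width_mm is not None:
            focal_pixels = res_x * (focal_mm / ccd_width_mm)
        else:
            focal_pixels = 1.2 * res_x
        # Principal point assumed at the image centre, zero skew.
        return np.array([[focal_pixels, 0.0, res_x / 2.0],
                         [0.0, focal_pixels, res_y / 2.0],
                         [0.0, 0.0, 1.0]])

    K = initial_intrinsics(4000, 3000, focal_mm=24.0, ccd_width_mm=23.5)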
On the other hand, the extrinsic parameters are the values of the R|T (roto-translation) matrix of every camera, which are calculated and optimized during the reconstruction and bundle adjustment steps. Since scale is unknown in SfM scenarios, the first reconstructed pair of cameras (the pair with the highest score in the cross-matching step) is generated from points projected at an arbitrary depth (Z towards the scene). You don't need any initial values for the extrinsics or the 3D point coordinates.
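That said, if you prefer to bootstrap the extrinsics and 3D points yourself before handing everything to Ceres-solver, a common recipe is to estimate the essential matrix of the best pair, recover the relative pose, and triangulate. A rough Python/OpenCV sketch (the function name and the assumption that both cameras share the initial K above are mine, not part of the original answer):

    import numpy as np
    import cv2

    def initial_pair_reconstruction(pts1, pts2, K):
        # pts1, pts2: Nx2 matched pixel coordinates from the best image pair.
        E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                          prob=0.999, threshold=1.0)
        _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
        P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # first camera at the origin
        P2 = K @ np.hstack([R, t])                          # second camera, arbitrary scale
        X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T) # 4xN homogeneous points
        X = (X_h[:3] / X_h[3]).T                            # initial 3D points for bundle adjustment
        return R, t, X

The scale of t is arbitrary, which is consistent with the scale ambiguity mentioned above.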

Related

Using RGB images and PointCloud, how to generate depth map from the PointClouds? (python)

I am working on fusing lidar and camera images in order to run an object classification algorithm using a CNN.
I want to use the KITTI dataset, which provides synchronized lidar and RGB image data. A lidar is a 3D scanner, so the output is a 3D point cloud.
I want to use the depth information from the point cloud as a channel for the CNN. But I have never worked with point clouds, so I am asking for some help. Will projecting the point cloud onto the camera image plane (using the projection matrix provided by KITTI) give me the depth map that I want? Is the Python library pcl useful, or should I move to C++ libraries?
If you have any suggestions, thank you in advance.
I'm not sure what the projection matrix provided by KITTI includes, so the answer is: it depends. If this projection matrix only contains a transformation matrix, you cannot generate a depth map from it. The 2D image has distortion that comes from the camera lens, while the point cloud usually has no distortion, so you cannot "precisely" map the point cloud onto the RGB image without the intrinsic and extrinsic parameters.
PCL is not required to do this.
A depth map is essentially a mapping of depth values onto the RGB image. You can treat each point in the point cloud (each laser return of the lidar) as a pixel of the RGB image. Therefore, I think all you need to do is find which point in the point cloud corresponds to the first pixel (top-left corner) of the RGB image, and then read the depth values from the point cloud according to the RGB image resolution.
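For reference, here is a rough Python/NumPy sketch of that mapping once the intrinsics K and the lidar-to-camera extrinsics R, t are known (the function name and array shapes are assumptions):

    import numpy as np

    def pointcloud_to_depth_map(points_xyz, K, R, t, width, height):
        # points_xyz: Nx3 lidar points; R, t: lidar frame -> camera frame.
        cam = points_xyz @ R.T + t
        cam = cam[cam[:, 2] > 0]            # keep points in front of the camera
        uvw = cam @ K.T                     # pinhole projection
        u = (uvw[:, 0] / uvw[:, 2]).astype(int)
        v = (uvw[:, 1] / uvw[:, 2]).astype(int)
        z = cam[:, 2]
        depth = np.full((height, width), np.inf)
        valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
        # keep the nearest depth when several points land on the same pixel
        np.minimum.at(depth, (v[valid], u[valid]), z[valid])
        depth[np.isinf(depth)] = 0.0
        return depth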
This has nothing to do with the camera; it is all about the point cloud data. Let's say you have 10 million points and each point has x, y, z in meters. If the data is not in meters, first convert it. Then you need the position of the lidar. When you subtract the position of the car from all the points one by one, you move the lidar to the origin (0, 0, 0), and then you can simply render the points onto a blank image. The rest is simple math, and there may be many ways to do it. The first that comes to my mind: treat RGB as a binary number. Let's say a 1 cm change corresponds to a change of 1 in blue, a 256 cm change corresponds to a change of 1 in green, and a 256 x 256 = 65536 cm change corresponds to a change of 1 in red. We know the camera is at (0, 0, 0), so if the RGB of a point is (1, 0, 0), that means it is 1 x 256 x 256 + 0 x 256 + 0 x 1 = 65536 cm away from the camera. This could be done in C++. You can also use interpolation and closest-point algorithms to fill blanks if there are any.
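A small sketch of the RGB encoding described above (assuming the depths have already been rendered onto an image grid and converted to centimeters; the function name is mine):

    import numpy as np

    def encode_depth_rgb(depth_cm):
        # depth_cm: HxW array of depths in centimetres.
        d = np.clip(depth_cm, 0, 2**24 - 1).astype(np.uint32)
        r = (d >> 16) & 0xFF        # 65536 cm per unit of red
        g = (d >> 8) & 0xFF         # 256 cm per unit of green
        b = d & 0xFF                # 1 cm per unit of blue
        return np.stack([r, g, b], axis=-1).astype(np.uint8)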

is it possible to obtain object movement from affine registration?

My question is theoretical - algorithmic.
I have a video of some object, taken from a static calibrated camera.
I also have the affine transformation matrix of the object from one frame to the consecutive frame.
Meaning, for each pixel in the tracked object, I have the corresponding pixel in the next frame.
Is it theoretically possible to obtain world coordinates of the tracked object using its projected affine transformation over time?
Couldn't use google to my advantage here, as I'm not sure what to search for.
I would appreciate any leads, as well as answers.
Thanks
There are some parameters you need to know before you can find the object's coordinates in the real world. If you know the camera parameters, you can calculate the object's distance using the pinhole projection formula.
I found a very good explanation of the pinhole formula here.
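As a quick illustration of that relation, a minimal sketch (the focal length in pixels and the object's real size are assumed known; the numbers are made up):

    def object_distance(focal_px, real_size_m, size_px):
        # Pinhole model: size_px / focal_px = real_size_m / Z  =>  Z = focal_px * real_size_m / size_px
        return focal_px * real_size_m / size_px

    print(object_distance(focal_px=1200.0, real_size_m=1.7, size_px=340))  # ~6 m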

Find my camera's 3D position and orientation according to a 2D marker

I am currently building an Augmented Reality application and am stuck on a problem that seems quite easy but is very hard for me ... The problem is as follows:
My device's camera is calibrated and detects a 2D marker (such as a QR code). I know the focal length, the sensor's position, the distance between my camera and the center of the marker, the real size of the marker, and the coordinates of the 4 corners of the marker and of its center in the 2D image I get from the camera. See the following image:
On the image, we know the a,b,c,d distances and the coordinates of the red dots.
What I need to know is the position and the orientation of the camera relative to the marker (as represented in the image, the origin is the center of the marker).
Is there an easy and fast way to do so? I tried a method I came up with myself (using Al-Kashi's formulas), but it ended with too many errors :(. Could someone point out a way to get me out of this?
You can find some example code for the EPnP algorithm on this webpage. The code consists of one header file and one source file, plus one file for the usage example, so it shouldn't be too hard to include in your project.
Note that this code is released for research/evaluation purposes only, as mentioned on this page.
EDIT:
I just realized that this code needs OpenCV to work. By the way, although it would add a pretty big dependency to your project, the current version of OpenCV has a built-in function called solvePnP, which does what you want.
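For illustration, a minimal Python sketch of solvePnP on the four marker corners (the marker size, corner pixel coordinates and intrinsics below are made-up placeholders, not values from the question):

    import numpy as np
    import cv2

    marker_size = 0.10   # 10 cm square marker, centred at the origin of its own frame
    object_points = np.array([[-marker_size / 2,  marker_size / 2, 0],
                              [ marker_size / 2,  marker_size / 2, 0],
                              [ marker_size / 2, -marker_size / 2, 0],
                              [-marker_size / 2, -marker_size / 2, 0]], dtype=np.float32)
    image_points = np.array([[310, 228], [422, 231], [419, 345], [307, 343]], dtype=np.float32)

    K = np.array([[800.0, 0.0, 320.0],      # assumed intrinsics from calibration
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
    dist_coeffs = np.zeros(5)

    ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist_coeffs)
    R, _ = cv2.Rodrigues(rvec)              # rotation marker -> camera
    camera_position = -R.T @ tvec           # camera centre in marker coordinates
    print(camera_position.ravel())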
You can compute the homography between the image points and the corresponding world points. Then from the homography you can compute the rotation and translation mapping a point from the marker's coordinate system into the camera's coordinate system. The math is described in the paper on camera calibration by Zhang.
Here's an example in MATLAB using the Computer Vision System Toolbox, which does most of what you need. It uses the extrinsics function, which computes a 3D rotation and a translation from matching image and world points. The points need not come from a checkerboard.
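If you are not using MATLAB, here is a rough NumPy sketch of the homography decomposition described above, assuming the homography H (mapping marker-plane coordinates in metres to pixels) and the intrinsic matrix K are already known:

    import numpy as np

    def pose_from_homography(H, K):
        A = np.linalg.inv(K) @ H
        scale = 1.0 / np.linalg.norm(A[:, 0])   # normalise so rotation columns have unit length
        r1 = scale * A[:, 0]
        r2 = scale * A[:, 1]
        r3 = np.cross(r1, r2)
        t = scale * A[:, 2]
        R = np.column_stack([r1, r2, r3])
        U, _, Vt = np.linalg.svd(R)             # re-orthogonalise, since H is noisy
        return U @ Vt, t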

How to correlate 2D mask with noisy image?

I have a matrix 'A' with two columns, which contains 2D points (coordinates 'x' and 'y'). These points were earlier projected onto a plane from a 3D point cloud, so they form the 2D shape of some object.
Secondly, I have a noisy image 'B' (a 4k x 4k matrix) with a similar (but translated and scaled) shape. What I want to do is correlate the points from matrix 'A' with the image and use them as a binary mask for the object in image 'B'. Currently I don't have the slightest idea how to do it.
Thanks for all help.
Following on from what AnonSubmitter85 suggested about pattern recognition methods, something like a SIFT (Scale-Invariant Feature Transform) detector might be helpful for a scaled and rotated object. Similarly, MATLAB has a set of functions that do SURF (Speeded-Up Robust Features) detection:
http://www.mathworks.com/help/vision/examples/find-image-rotation-and-scale-using-automated-feature-matching.html
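In Python/OpenCV, a rough sketch of the same idea could look like this (the file names are placeholders, and ORB is used instead of SIFT/SURF simply because it ships with the base OpenCV package):

    import cv2
    import numpy as np

    img_a = cv2.imread('shape_from_points.png', cv2.IMREAD_GRAYSCALE)   # rasterised matrix 'A'
    img_b = cv2.imread('noisy_image.png', cv2.IMREAD_GRAYSCALE)         # image 'B'

    orb = cv2.ORB_create(2000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)[:200]

    src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Robustly estimate translation/rotation/scale; this transform then maps
    # the mask derived from 'A' onto image 'B'.
    M, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    print(M)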
Hopefully this can stimulate some new ideas.

Camera calibration: the projection matrix

I have been working on a 3D scanner for a while now and I still have some questions about the projection matrix that I want to clear up before I continue.
I understand that this matrix describes the relation between the camera coordinate system and the world coordinate system. Yet I don't understand why all the calibration software packages give you this matrix. Does the software just pick a random world coordinate system in space and then calculate the matrix afterwards?
I was thinking it would be way easier to choose the world coordinate system yourself (if that is even possible). My plan is to create a scanner where the object stands still on a static surface and the camera + laser moves around the object in a circular motion. It would be ideal if the projection matrix could be created in such a way that the world coordinate system is nicely placed in the middle of the static platform.
If I'm not very clear, let me know and I'll add an image.
Hopefully someone can clear things a little bit up for me so I can make some progress :).
Kind regards
Ruts
The matrix you get after camera calibration gives you the relation between two cameras (in stereo vision) and consists of the camera's intrinsic and extrinsic parameters. The matrix converts your image into the 3D coordinate system and gives you the depth of objects.
There are a number of videos on YouTube about 3D scanners:
http://www.youtube.com/watch?v=AYq5n7jwe40 or http://www.youtube.com/watch?v=H3WzY8EWM9s
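To make the relation concrete, here is a minimal sketch (with made-up intrinsics and extrinsics) of how the projection matrix P = K[R|t] maps a world point, expressed in a coordinate system you choose, onto the image plane:

    import numpy as np

    # Example values: the world origin is placed at the centre of the platform,
    # and the camera looks at it from 0.5 m away.
    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 360.0],
                  [0.0, 0.0, 1.0]])
    R = np.eye(3)                       # camera axes aligned with the world axes
    t = np.array([[0.0], [0.0], [0.5]]) # world origin 0.5 m in front of the camera

    P = K @ np.hstack([R, t])           # 3x4 projection matrix

    X_world = np.array([0.05, 0.02, 0.0, 1.0])   # a point on the platform, homogeneous
    u, v, w = P @ X_world
    print(u / w, v / w)                 # pixel coordinates of the projection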
