How to perform 3D Reconstruction using ARCore?

I was trying to perform 3D reconstruction from the point cloud obtained from ARCore. However, the point cloud I was able to obtain was not accurate or dense enough for 3D reconstruction. Specifically, points that were supposed to lie on the same surface were off by a few millimetres, which made the surface look as if it had a thickness.
Am I isolating the point cloud correctly, or is this a limitation of ARCore?
What is the standard approach to isolating the point cloud and performing 3D reconstruction?
I am attaching below the point cloud obtained from a laptop.
(The file is in the .PLY format. It can be viewed on https://sketchfab.com/, http://www.meshlab.net/ or any other software capable of rendering 3D models.)
The keyboard and the screen of the laptop here look as if they have a thickness, although all the points were supposed to be at the same depth.
Please have a look at the point cloud and guide me as to what is going wrong here, since I have been stuck on it for quite some time now.
Thank you
https://drive.google.com/file/d/18KMchFgYd8KOcyi8hPpB5yEfbnJ6DxmR/view?usp=sharing
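For what it's worth, ARCore reports a confidence value alongside every feature point, and filtering out low-confidence points before reconstruction often thins out the apparent "thickness" (it will not remove it entirely, since the points remain sparse). A minimal Python sketch, assuming the per-frame points were exported as rows of x, y, z, confidence; the file name and the 0.5 threshold are illustrative choices, not ARCore defaults:

    import numpy as np

    # Each ARCore feature point is (x, y, z, confidence); assume they were
    # exported to a 4-column text file (hypothetical file name below).
    points = np.loadtxt("arcore_points.txt").reshape(-1, 4)

    CONFIDENCE_THRESHOLD = 0.5          # illustrative cut-off, tune per device
    good = points[points[:, 3] >= CONFIDENCE_THRESHOLD]

    print(f"kept {len(good)} of {len(points)} points")
    np.savetxt("points_filtered.xyz", good[:, :3])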

Related

Transforming and registering point clouds

I'm starting to develop with the Project Tango API.
I need to save the PointCloud data that I get in the OnXyzIjAvailable event; to do this, I started from your example "PointCloudJava" and wrote the PointCloud coordinates to individual files (an AsyncTask is started for this purpose).
So I have one file with the xyz coordinates for each event. On the same event I get the corresponding transformation matrix (mRenderer.getModelMatCalculator().GetPointCloudModelMatrixCopy()).
Point clouds
Then I imported all this data (the xyz point clouds with their corresponding transformation matrices; each transformation matrix is applied to its point cloud), but the point clouds don't match exactly; it seems that the point clouds are close to each other but not overlapping exactly.
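For reference, applying such a per-frame model matrix boils down to multiplying homogeneous point coordinates by the 4x4 matrix. A minimal numpy sketch, assuming an (N, 3) array of points; note that a matrix copied out of an OpenGL-style renderer is column-major and would need to be transposed first:

    import numpy as np

    def apply_model_matrix(points_xyz, model_matrix):
        """Transform an (N, 3) array of points by a 4x4 model matrix.
        Assumes row-major layout with the translation in the last column;
        transpose first if the matrix came from OpenGL (column-major)."""
        n = points_xyz.shape[0]
        homogeneous = np.hstack([points_xyz, np.ones((n, 1))])  # (N, 4)
        return (homogeneous @ model_matrix.T)[:, :3]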
My questions are:
- Why don't the individual point clouds match?
- What should I do to make them match?
I have also noticed the following, which is probably related to the above problem: using the Project Tango Explore application (area learning), I can see my position, but it is constantly in motion even though I don't move.
What is the problem? Is a calibration necessary?
Poses delivered by Tango have a non-negligible amount of drift. Here is a sample graph of pose position when my tablet was in its stand observing a static scene (ideally the traces should be flat):
When we couple this drift with tracking errors when the device is actually moving then this produces noticeable registration issues. I see this especially when the device is rolled, i.e. rotated about the view axis. The raw pose quality may be sufficient for some applications (e.g. location) but causes problems for others (e.g. 3D scanning, seamless augmented reality).
I was disappointed when I saw this. But if Tango is attempting to measure motion by using the fisheye camera to correct inertial motion prediction - and not by using stereo vision between the fisheye and color cameras - then that is a really hard problem. And the reason for doing that would be to stay within CPU/GPU/RAM/latency/battery budgets to leave something for applications. So after consideration, while I remain disappointed, I can understand it.
I am hopeful that Tango will improve their pose algorithm over time, but I suspect that applications that depend on precise tracking will still have to add their own corrections, e.g. via stereo, structure from motion, point cloud correlation, etc.
Point clouds should be viewed as statistically accurate, not exactly accurate: there is a distance estimation error range that is a function of distance and surface characteristics. A Tango fixed in a specific location will not return a constant point cloud. Rotation of the device can cause apparent drift, but it really isn't drift; the error is simply rotating along with the Tango.
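One common way to add the kind of correction suggested above is to treat the pose-based alignment as an initial guess and refine it by registering consecutive clouds, for example with point-to-plane ICP. A sketch using Open3D, assuming two roughly pre-aligned .ply clouds; the file names and the 5 cm correspondence distance are illustrative:

    import open3d as o3d

    source = o3d.io.read_point_cloud("cloud_a.ply")   # already pose-aligned
    target = o3d.io.read_point_cloud("cloud_b.ply")
    for pc in (source, target):
        pc.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))

    # Refine the residual misalignment left over from pose drift.
    result = o3d.pipelines.registration.registration_icp(
        source, target, 0.05,                         # max correspondence distance (m)
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())

    print(result.fitness, result.inlier_rmse)
    source.transform(result.transformation)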

Match 3D point cloud to CAD model

I have a point cloud of an object, obtained with a laser scanner, and a CAD surface model of that object.
How can I match the point cloud to the surface, to obtain the translation and rotation between cloud and model?
I suppose I could sample the surface and try the Iterative Closest Point (ICP) algorithm to match the resulting sampled point cloud to the scanner point cloud.
Would that actually work?
And are there better algorithms for this task?
In newer versions of OpenCV (opencv_contrib) I have implemented a surface matching module to match a 3D model to a 3D scene. No initial pose is required and the detection process is fully automatic. The pipeline also includes an ICP refinement.
To get an idea, please check out this video (though it was not generated by the OpenCV implementation):
https://www.youtube.com/watch?v=uFnqLFznuZU
The full source code is here and the documentation is here.
You mentioned that you needed to sample your CAD model. This is correct and we have given a sampling algorithm suited for point pair feature matching, such as the one implemented in OpenCV:
Birdal, Tolga, and Slobodan Ilic. A point sampling algorithm for 3D matching of irregular geometries. 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017.
http://campar.in.tum.de/pub/tbirdal2017iros/tbirdal2017iros.pdf
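The module should also be reachable from the opencv-contrib Python bindings as cv2.ppf_match_3d. A rough usage sketch modeled on the sample that ships with the module; the file names are placeholders and both clouds are expected to carry normals:

    import cv2

    # Train the point-pair-feature detector on the (sampled) CAD model.
    model = cv2.ppf_match_3d.loadPLYSimple("model.ply", 1)   # 1 = load normals
    detector = cv2.ppf_match_3d_PPF3DDetector(0.025, 0.05)   # sampling / distance steps
    detector.trainModel(model)

    # Match against the laser scan and keep the best hypotheses.
    scene = cv2.ppf_match_3d.loadPLYSimple("scan.ply", 1)
    results = detector.match(scene, 1.0 / 40.0, 0.05)

    # Refine the top poses with the built-in ICP.
    icp = cv2.ppf_match_3d_ICP(100)
    _, refined = icp.registerModelToScene(model, scene, results[:2])
    for pose in refined:
        pose.printPose()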
Yes, ICP can be applied to this problem, as you suggest with sampling the surface. It would be best if you have all available faces in your laser scan otherwise you may have to remove invisible faces from your model (depending on how many of these there are).
One way of automatically preparing a model is to compute its concave hull and use it to discard hidden faces (for example, faces that are not close to the concave hull). Depending on how involved the model is, this may or may not be necessary.
ICP works well if given a good initial guess because it ignores points that are not close with respect to the current guess. If ICP is not coming up with a good alignment you may try it with multiple random restarts to try and fix this problem, choosing the best alignment.
A more involved solution is to do local feature matching. You sample and calculate an invariant descriptor like SHOT or FPFH. You find the best matches, reject non-consistent matches, use them to come up with a good initial alignment and then refine with ICP. But you may not need this step depending on how robust and fast the random-restart ICP is.
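A sketch of that coarse-to-fine idea using recent Open3D; FPFH stands in for SHOT here, since Open3D only ships FPFH, and the voxel size and thresholds are illustrative values that scale with your data:

    import open3d as o3d

    def coarse_to_fine(source, target, voxel=0.01):
        """FPFH + RANSAC for a rough pose, then ICP refinement
        (all on downsampled copies to keep it fast)."""
        reg = o3d.pipelines.registration
        src, tgt = source.voxel_down_sample(voxel), target.voxel_down_sample(voxel)
        for pc in (src, tgt):
            pc.estimate_normals(
                o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
        feat = lambda pc: reg.compute_fpfh_feature(
            pc, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))

        coarse = reg.registration_ransac_based_on_feature_matching(
            src, tgt, feat(src), feat(tgt), True, voxel * 1.5,
            reg.TransformationEstimationPointToPoint(False), 3,
            [reg.CorrespondenceCheckerBasedOnDistance(voxel * 1.5)],
            reg.RANSACConvergenceCriteria(100000, 0.999))

        fine = reg.registration_icp(
            src, tgt, voxel * 1.5, coarse.transformation,
            reg.TransformationEstimationPointToPlane())
        return fine.transformation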
There's an open-source library for point cloud algorithms which implements registration against other point clouds. Maybe you can try some of its methods to see if any fit.
As a starter, if they don't have anything specific to fit against a polygon mesh, you can treat the mesh vertices as another point cloud and fit your point cloud against it. This is something that they definitely support.

360 degree 3D view of a room using a single rotating kinect

I am working on a research project to construct the 360 degree 3D view of a room using a single rotating kinect placed in the center.
My current approach is to capture the 3D point cloud from the Kinect after every 2 to 5 degrees of rotation and register the clouds using the Iterative Closest Point (ICP) algorithm.
Note that we need to build the view real time as the kinect rotates so we need to capture the point cloud after a small degree of rotation of kinect.
However, the ICP algorithm is computationally expensive.
I am looking for a better solution to the above problem. Any help/pointers in this direction will be appreciated.
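One cheap way to cut ICP's cost in this setup: the Kinect's motion between frames is a known small rotation about a roughly fixed vertical axis, so that rotation can be used to seed the registration instead of starting ICP from identity, leaving only a few iterations on downsampled clouds. A sketch with numpy and Open3D; the axis, angle, voxel size and file names are assumptions about the setup:

    import numpy as np
    import open3d as o3d

    def turntable_guess(angle_deg, center=np.zeros(3)):
        """4x4 initial transform for a rotation of angle_deg about the
        (assumed) vertical y-axis passing through `center`."""
        a = np.radians(angle_deg)
        R = np.array([[ np.cos(a), 0.0, np.sin(a)],
                      [ 0.0,       1.0, 0.0      ],
                      [-np.sin(a), 0.0, np.cos(a)]])
        T = np.eye(4)
        T[:3, :3] = R
        T[:3, 3] = center - R @ center
        return T

    source = o3d.io.read_point_cloud("frame_010.ply").voxel_down_sample(0.02)
    target = o3d.io.read_point_cloud("frame_011.ply").voxel_down_sample(0.02)

    # A handful of ICP iterations is enough when the initial guess is close.
    result = o3d.pipelines.registration.registration_icp(
        source, target, 0.05, turntable_guess(2.0),
        o3d.pipelines.registration.TransformationEstimationPointToPoint(),
        o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration=10))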
I'm not sure how familiar you are with the intersection of machine learning and computer vision. But recently a much harder problem has been solved with advances in machine learning: generating 3D models of large areas from an unstructured collection of images. For example, see this video on "Building Rome in a Day"; it may just blow your mind.
With your mind suitably blown, you may want to check out the machine learning techniques that allowed this computation to take place efficiently in this video.
You may want to follow up with Noah Snavely's PhD thesis and check out the algorithms that he used and other work that has been incorporated to build this system. It seems that the problem of reconstructing a single room from one rotating point must be a much easier inference problem. Then again, you may just want to check out the implementation in their software :)

Making 3D representation of an object with a webcam

Is it possible to make a 3D representation of an object by capturing many different angles using a webcam? If it is, how is it possible and how is the image-processing done?
My plan is to make a 3D representation of a person using a webcam; from the 3D representation, I will then be able to tell the person's vital statistics.
As Bart said (but did not post as an actual answer) this is entirely possible.
The research topic you are interested in is often called multi view stereo or something similar.
The basic idea revolves around using point correspondences between two (or more) images and then trying to find the best matching camera positions. When the positions are found, you can use stereo algorithms to back-project the image points into a 3D coordinate system and form a point cloud.
You can then process that point cloud further to get the measurements you are looking for.
If you are completely new to the subject you have some fascinating reading to look forward to!
Bart proposed Multiple view geometry by Hartley and Zisserman, which is a very nice book indeed.
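A small two-view sketch with OpenCV of the idea described above; the intrinsic matrix K is a made-up placeholder, and a real pipeline would calibrate the webcam and repeat this over many image pairs:

    import cv2
    import numpy as np

    img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)
    K = np.array([[700.0, 0, 320.0],      # illustrative intrinsics only
                  [0, 700.0, 240.0],
                  [0, 0, 1.0]])

    # 1. Point correspondences between the two images.
    orb = cv2.ORB_create(4000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # 2. Relative camera pose from the essential matrix.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # 3. Back-project the inlier correspondences into a sparse 3D point cloud.
    inl = mask.ravel() > 0
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1[inl].T, pts2[inl].T)
    cloud = (pts4d[:3] / pts4d[3]).T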
As Bart and Kigurai pointed out, this process has been studied under the title of "stereo" or "multi-view stereo" techniques. To be able to get a 3D model from a set of pictures, you need to do the following:
a) You need to know the "internal" parameters of a camera. This includes the focal length of the camera, the principal point of the image and account for radial distortion in the image.
b) You also need to know the position and orientation of each camera with respect to each other or a "world" co-ordinate system. This is called the "pose" of the camera.
There are algorithms to perform (a) and (b) which are described in Hartley and Zisserman's "Multiple View Geometry" book. Alternatively, you can use Noah Snavely's "Bundler" http://phototour.cs.washington.edu/bundler/ software to also do the same thing in a very robust manner.
Once you have the camera parameters, you essentially know how a 3D point (X,Y,Z) in the world maps to an image co-ordinate (u,v) on the photo. You also know how to map an image co-ordinate to the world. You can create a dense point cloud by searching for a match for each pixel on one photo in a photo taken from a different view-point. This requires a two-dimensional search. You can simplify this procedure by making the search one-dimensional. This is called "rectification": you essentially take two photos and transform them so that their rows correspond to the same line in the world (simplified statement). Now you only have to search along image rows.
An algorithm for this can be also found in Hartley and Zisserman.
Finally, you need to do the matching based on some measure. There is a lot of literature out there on "stereo matching". Another word used is "disparity estimation". This is basically searching for the match of pixel (u,v) on one photo to its match (u, v') on the other photo. Once you have the match, the difference between them can be used to map back to a 3D point.
You can use Yasutaka Furukawa's "CMVS" or "PMVS2" software to do this. Or if you want to experiment by yourself, openCV is a open-source computer vision toolbox to do many of the sub-tasks required for this.
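A condensed OpenCV sketch of the rectification and matching steps described above; the calibration values (camera matrices, distortion, R, T) are placeholder numbers standing in for the output of a real stereo calibration:

    import cv2
    import numpy as np

    imgL = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    imgR = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
    size = imgL.shape[::-1]                       # (width, height)

    # Placeholder calibration: identical pinhole cameras, 6 cm baseline.
    K1 = K2 = np.array([[700.0, 0, 320.0], [0, 700.0, 240.0], [0, 0, 1.0]])
    D1 = D2 = np.zeros(5)
    R, T = np.eye(3), np.array([[-0.06], [0.0], [0.0]])

    # Rectify so that corresponding points lie on the same image row.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
    mapLx, mapLy = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
    mapRx, mapRy = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
    rectL = cv2.remap(imgL, mapLx, mapLy, cv2.INTER_LINEAR)
    rectR = cv2.remap(imgR, mapRx, mapRy, cv2.INTER_LINEAR)

    # Disparity estimation (the 1D search along rows), then back to 3D.
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disparity = sgbm.compute(rectL, rectR).astype(np.float32) / 16.0
    points3d = cv2.reprojectImageTo3D(disparity, Q)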
This can be done with two webcams, in the same way your eyes work. It is called stereoscopic vision.
Have a look at this:
http://opencv.willowgarage.com/documentation/camera_calibration_and_3d_reconstruction.html
An affordable alternative to get 3D data would be the Kinect camera system.
Maybe not the answer you are hoping for but Microsoft's Kinect is doing that exact thing, there are some open source drivers out there that allow you to connect it to your windows/linux box.

3d model construction using multiple images from multiple points (kinect)

Is it possible to construct a 3D model of a still object if various images, along with depth data, were gathered from various angles? What I was thinking of was a sort of circular conveyor belt on which a Kinect would be placed, while the real object to be reconstructed in 3D space sits in the middle. The conveyor belt then rotates around the object in a circle and lots of images are captured (perhaps 10 images per second), which would allow the Kinect to capture an image from every angle, including the depth data; theoretically this is possible. The model would also have to be recreated with the textures.
What I would like to know is:
- whether there are any similar projects/software already available (any links would be appreciated);
- whether this is possible within perhaps 6 months;
- how I would proceed to do this, e.g. any similar algorithms you could point me to.
Thanks,
MilindaD
It is definitely possible, and there are a lot of 3D scanners out there which work on more or less the same principle of stereoscopy.
You probably know this, but just to contextualize: the idea is to get two images of the same point from different viewpoints and to use triangulation to compute the 3D coordinates of that point in your scene. Although this is quite easy, the big issue is to find the correspondence between the points in your two images, and this is where you need good software to extract and recognize similar points.
There is an open-source project called Meshlab for 3D vision, which includes 3D reconstruction algorithms. I don't know the details of the algorithms, but the software is definitely a good entry point if you want to play with 3D.
I used to know some other ones, I will try to find them and add them here:
Insight3d
Check out https://bitbucket.org/tobin/kinect-point-cloud-demo/overview, which is a code sample for the Kinect for Windows SDK that does specifically this. Currently it uses the bitmaps captured by the depth sensor and iterates through the byte array to create a point cloud in the PLY format that can be read by MeshLab. The next stage for us is to apply/refine a Delaunay triangulation algorithm to form a mesh instead of points, to which a texture can be applied. A third stage would then be a mesh-merging formula to combine multiple captures from the Kinect to form a full 3D object mesh.
This is based on some work I did in June using the Kinect for the purposes of 3D printing capture.
The .NET code in this source code repository will, however, get you started with what you want to achieve.
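For anyone who just wants the depth-bitmap-to-PLY step without the SDK, here is a small Python sketch. The intrinsics are typical Kinect-like values rather than calibrated ones, and the input file name is hypothetical:

    import numpy as np

    FX, FY, CX, CY = 580.0, 580.0, 320.0, 240.0        # assumed intrinsics

    def depth_to_points(depth_mm):
        """Back-project a depth image (millimetres) to an (N, 3) point cloud."""
        h, w = depth_mm.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth_mm.astype(np.float32) / 1000.0        # metres
        x = (u - CX) * z / FX
        y = (v - CY) * z / FY
        pts = np.dstack([x, y, z]).reshape(-1, 3)
        return pts[pts[:, 2] > 0]                       # drop zero (invalid) depths

    def write_ply(path, pts):
        with open(path, "w") as f:
            f.write("ply\nformat ascii 1.0\n")
            f.write(f"element vertex {len(pts)}\n")
            f.write("property float x\nproperty float y\nproperty float z\nend_header\n")
            np.savetxt(f, pts, fmt="%.4f")

    depth = np.load("kinect_depth.npy")                 # one H x W depth frame
    write_ply("frame.ply", depth_to_points(depth))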
Autodesk has a piece of software that will do what you are asking for; it is called "Photofly" and is currently in the Labs section. Using a series of images taken from multiple angles, the 3D geometry is created and then photo-mapped with your images to create the scene.
If you are more interested in the theoretical part of this problem (I mean, if you want to know how it works), here is a document from Microsoft Research about a moving depth camera and 3D reconstruction.
Try out VisualSfM (http://ccwu.me/vsfm/) by Changchang Wu (http://ccwu.me/)
It takes multiple images from different angles of the scene and outputs a 3D point cloud.
The algorithm is called "Structure from Motion".
Brief idea of the algorithm: it involves extracting feature points in each image, finding correspondences between them across images, building feature tracks, estimating camera matrices, and thereby recovering the 3D coordinates of the feature points.

Resources