Getting closest target image in Vuforia - opengl-es

I'm using the video playback Vuforia example for building an app. When the app recognises multiple image targets I would like to know which is the closest one to the centre of the screen (which is supposed to be my camera view). In the source code I have found this line:
const QCAR::Matrix34F & QCAR::TrackableResult::getPose()
which gives me a 3x4 pose matrix of the target. How can use this matrix for extracting this information?
thanks

This Vuforia Knowledge database article explains in detail the meaning of the pose matrix, you should probably have a look at it.
To make it short, the pose matrix is a 3x4 matrix whose last column is the translation vector <x,y,z> from the camera to the detected target. The "closest target to the center of the screen" should thus be the one with the smallest <x,y> vector.
Hope this helps!

Related

Rotation and translation in 3D reconstruction using 2D images

I am trying to do 3D model reconstruction using 2D images from different views. I am following this example code from Matlab to get the desired results:
Structure From Motion From Two Views.
Following are the test images taken from the camera:
Manually taken images of 1st and 2nd image with translation of 1cm:
Overlay with matched features of first and second image:
Manually taken images of 1st and 2nd image with translation of 2cm:
Overlay with matched features of first and second image:
These are the translation vectors and rotation matrices I get for each case:
1cm translation:
translation vector:[0.0245537412606279 -0.855696925927505 -0.516894461905255]
rotation matrix:
[0.999958322438693 0.00879926762261436 0.00243439415451741;
-0.00887800587357739 0.999365801035702 0.0344844418829408;
-0.00212941243132160 -0.0345046172211024 0.999402269855899]
2cm translation:
translation vector:[-0.215835469166982 -0.228607603749042 -0.949291111175908]
rotation matrix:
[0.999989695803078 -0.00104036790630347 -0.00441881457943975;
0.00149220346018613 0.994626852476622 0.103514238930121;
0.00428737874479779 -0.103519766069424 0.994618156086259]
In documentation, it says it is relative rotation and translation between the 2 images.
But I am unable to understand what these numbers mean and what is the unit of the above values.
Can anyone at least let me know in what units are we getting the translation and rotation or how to extract the rotation and translation which is in any way comparable to the real world values like cm/mm and radians/degrees respectively?
You can translate the rotation matrix into a axis-angle-representation where you get the angles in radians. This can be done using the vrrotmat2vec function or by implementing a translater yourself by following this if you don't have access to the package. The angle will then be in radians.
When it comes to translation however you wont get it in a unit that makes sense in the real world, since you don't know the scale. This is unfortunately a problem with structure from motion in general. It is impossible to know if you take a image close to something small or far away from something large.
When using structure from motion to construct a 3D model this is fortunately not a problem since you still get relative distances correctly. Therefore you will be able to capture the scene (by following the rest of the tutorial) but you wont be able to say if something is 2cm or 2km tall, unless you have something in the image that you know the real life size of.
Hope it helps :)

Making 3D representation of an object with a webcam

Is it possible to make a 3D representation of an object by capturing many different angles using a webcam? If it is, how is it possible and how is the image-processing done?
My plan is to make a 3D representation of a person using a webcam, then from the 3D representation, i will be able to tell the person's vital statistics.
As Bart said (but did not post as an actual answer) this is entirely possible.
The research topic you are interested in is often called multi view stereo or something similar.
The basic idea resolves around using point correspondences between two (or more) images and then try to find the best matching camera positions. When the positions are found you can use stereo algorithms to back project the image points into a 3D coordinate system and form a point cloud.
From that point cloud you can then further process it to get the measurements you are looking for.
If you are completely new to the subject you have some fascinating reading to look forward to!
Bart proposed Multiple view geometry by Hartley and Zisserman, which is a very nice book indeed.
As Bart and Kigurai pointed out, this process has been studied under the title of "stereo" or "multi-view stereo" techniques. To be able to get a 3D model from a set of pictures, you need to do the following:
a) You need to know the "internal" parameters of a camera. This includes the focal length of the camera, the principal point of the image and account for radial distortion in the image.
b) You also need to know the position and orientation of each camera with respect to each other or a "world" co-ordinate system. This is called the "pose" of the camera.
There are algorithms to perform (a) and (b) which are described in Hartley and Zisserman's "Multiple View Geometry" book. Alternatively, you can use Noah Snavely's "Bundler" http://phototour.cs.washington.edu/bundler/ software to also do the same thing in a very robust manner.
Once you have the camera parameters, you essentially know how a 3D point (X,Y,Z) in the world maps to an image co-ordinate (u,v) on the photo. You also know how to map an image co-ordinate to the world. You can create a dense point cloud by searching for a match for each pixel on one photo in a photo taken from a different view-point. This requires a two-dimensional search. You can simplify this procedure by making the search 1-dimensional. This is called "rectification". You essentially take two photos and transform then so that their rows correspond to the same line in the world (simplified statement). Now you only have to search along image rows.
An algorithm for this can be also found in Hartley and Zisserman.
Finally, you need to do the matching based on some measure. There is a lot of literature out there on "stereo matching". Another word used is "disparity estimation". This is basically searching for the match of pixel (u,v) on one photo to its match (u, v') on the other photo. Once you have the match, the difference between them can be used to map back to a 3D point.
You can use Yasutaka Furukawa's "CMVS" or "PMVS2" software to do this. Or if you want to experiment by yourself, openCV is a open-source computer vision toolbox to do many of the sub-tasks required for this.
This can be done with two webcams in the same ways your eyes work. It is called stereoscopic vision.
Have a look at this:
http://opencv.willowgarage.com/documentation/camera_calibration_and_3d_reconstruction.html
An affordable alternative to get 3D data would be the Kinect camera system.
Maybe not the answer you are hoping for but Microsoft's Kinect is doing that exact thing, there are some open source drivers out there that allow you to connect it to your windows/linux box.

Transform a set of 2d images representing all dimensions of an object into a 3d model

Given a set of 2d images that cover all dimensions of an object (e.g. a car and its roof/sides/front/read), how could I transform this into a 3d objdct?
Is there any libraries that could do this?
Thanks
These "2D images" are usually called "textures". You probably want a 3D library which allows you to specify a 3D model with bitmap textures. The library would depend on platform you are using, but start with looking at OpenGL!
OpenGL for PHP
OpenGL for Java
... etc.
I've heard of the program "Poser" doing this using heuristics for human forms, but otherwise I don't believe this is actually theoretically possible. You are asking to construct volumetric data from flat data (inferring the third dimension.)
I think you'd have to make a ton of assumptions about your geometry, and even then, you'd only really have a shell of the object. If you did this well, you'd have a contiguous surface representing the boundary of the object - not a volumetric object itself.
What you can do, like Tomas suggested, is slap these 2d images onto something. However, you still will need to construct a triangle mesh surface, and actually do all the modeling, for this to present a 3D surface.
I hope this helps.
What there is currently that can do anything close to what you are asking for automagically is extremely proprietary. No libraries, but there are some products.
This core issue is matching corresponding points in the images and being able to say, this spot in image A is this spot in image B, and they both match this spot in image C, etc.
There are three ways to go about this, manually matching (you have the photos and have to use your own brain to find the corresponding points), coded targets, and texture matching.
PhotoModeller, www.photomodeller.com, $1,145.00US, supports manual matching and coded targets. You print out a bunch of images, attach them to your object, shoot your photos, and the software finds the targets in each picture and creates a 3D object based on those points.
PhotoModeller Scanner, $2,595.00US, adds texture matching. Tiny bits of the the images are compared to see if they represent the same source area.
Both PhotoModeller products depend on shooting the images with a calibrated camera where you use a consistent focal length for every shot and you got through a calibration process to map the lens distortion of the camera.
If you can do manual matching, the Match Photo feature of Google SketchUp may do the job, and SketchUp is free. If you can shoot new photos, you can add your own targets like colored sticker dots to the object to help you generate contours.
If your images are drawings, like profile, plan view, etc. PhotoModeller will not help you, but SketchUp may be just the tool you need. You will have to build up each part manually because you will have to supply the intelligence to recognize which lines and points correspond from drawing to drawing.
I hope this helps.

3d model construction using multiple images from multiple points (kinect)

is it possible to construct a 3d model of a still object if various images along with depth data was gathered from various angles, what I was thinking was have a sort of a circular conveyor belt where a kinect would be placed and the conveyor belt while the real object that is to be reconstructed in 3d space sits in the middle. The conveyor belt thereafter rotates around the image in a circle and lots of images are captured (perhaps 10 image per second) which would allow the kinect to catch an image from every angle including the depth data, theoretically this is possible. The model would also have to be recreated with the textures.
What I would like to know is whether there are any similar projects/software already available and any links would be appreciated
Whether this is possible within perhaps 6 months
How would I proceed to do this? Such as any similar algorithm you could point me to and such
Thanks,
MilindaD
It is definitely possible and there are a lot of 3D scanners which work out there, with more or less the same principle of stereoscopy.
You probably know this, but just to contextualize: The idea is to get two images from the same point and to use triangulation to compute the 3d coordinates of the point in your scene. Although this is quite easy, the big issue is to find the correspondence between the points in your 2 images, and this is where you need a good software to extract and recognize similar points.
There is an open-source project called Meshlab for 3d vision, which includes 3d reconstruction* algorithms. I don't know the details of the algorithms, but the software is definitely a good entrance point if you want to play with 3d.
I used to know some other ones, I will try to find them and add them here:
Insight3d
(*Wiki page has no content, redirects to login for editing)
Check out https://bitbucket.org/tobin/kinect-point-cloud-demo/overview which is a code sample for the Kinect for Windows SDK that does specifically this. Currently it uses the bitmaps captured by the depth sensor, and iterates through the byte array to create a point cloud in a PLY format that can read by MeshLab. The next stage of us is to apply/refine a delanunay triangle algoirthim to form a mesh instead of points, which a texture can be applied. A third stage would then me a mesh merging formula to combine multiple caputres from the Kinect to form a full 3D object mesh.
This is based on some work I done in June using Kinect for the purposes of 3D printing capture.
The .NET code in this source code repository will however get you started with what you want to achieve.
Autodesk has a piece of software that will do what you are asking for it is called "Photofly". It is currently in the labs section. Using a series of images taken from multiple angles the 3d geometry is created and then photo mapped with your images to create the scene.
If you interested more in theoretical (i mean if you want to know how) part of this problem,
here is some document from Microsoft Research about moving depth camera and 3D reconstruction.
Try out VisualSfM (http://ccwu.me/vsfm/) by Changchang Wu (http://ccwu.me/)
It takes multiple images from different angles of the scene and outputs a 3D point cloud.
The algorithm is called "Structure from Motion".
Brief idea of the algorithm : It involves extracting feature points in each image; finding correspondences between them across images; building feature tracks, estimating camera matrices and thereby the 3D coordinates of the feature points.

Get POINT CLOUD through 360 Degree Rotation and Image Processing

My Question is as below in two parts……
QUESTION (IN SHORT):
• To generate point cloud of real-world object….
• Through 360 degree rotation of it….on rotating table
• Getting 360 images… one image at each degree (1° to 360°).
• I know how to process image and getting pixel value of it.
• See one sample image below…you can see image is black and white...because I have to deal with the objects which are much shiny (glittery)…and it is DIAMOND. So I have setting up background so that shiny object (diamond) converted in to B/W object. And so I can easily scan outer edge of object (e.g. Diamond).
• And one thing to consider is I don’t using any laser… I just using one rotating table and one camera for taking image…you can see one sample project over here… but there MATLAB hides all the things…because that guy using MATLAB’s in Built functionality.
• Actually I am looking for Math routine or Algorithm or any Technique which helping me out to how getting point cloud…….using the way I have mentioned……..
MORE ELABORATION:
I need to have point-cloud of real-world object. So, I can display it in Computer Screen.
For that I am using one rotating table. I will put my object on it and I will rotate table a complete 360° degree rotation and I will take 360 images…one image at each degree (1° to 360°).
Camera which is used for taking image is well calibrated. I have given one sample image as below. I also know how to scan image and getting pixel value of it.
Also take in consideration that my images are Silhouette type…means just black and white... No color images.
But my problem is or where I am trapped down is in...
Getting Points cloud of object…….from the data which I have getting through processing of image.
One same kind of project I found over here……..
But it just using built in MATLAB functions…I am using Microsoft Visual C#.Net so I have to build the entire algorithm myself….because MATLAB hides all the things which I want to know….
Is there any master…….who know this entire thing well and getting me out of trap...!!!!
Thanks…..
I have no experience of this but If I wanted to do something like this I would have tried this:
Use a single color light source
if Possible create a lightsource which falls on a thin verticle slice of the object.
have 360 B/W Images, those Images will be images of a verticle line having variyng intensity. If you use matlab your matrix will have a/few column with sime values.
now asume a verticle line(your Axis of rotation).
5 plot or convert (imageno, rownoOfMatrix, ValueInPopulatedColumnInSameRow)... [Assuming numbering Image from 0 to 360]
under ideal conditions A lame way To get X and Y use K1 * cos imgNo * ValInCol and K1 * sin imgNo * ValInCol, and Z will be some K2 * rowNum.. K1 and K2 can be caliberated knowing actual size of object.
I mean Something like this:
http://fab.cba.mit.edu/content/processes/structured_light/
but instead of using structured light using a single verticle light
http://www.geom.uiuc.edu/~samuelp/del_project.html This link might help in triangulation...

Resources