Transform a set of 2d images representing all dimensions of an object into a 3d model - algorithm

Given a set of 2d images that cover all dimensions of an object (e.g. a car and its roof/sides/front/read), how could I transform this into a 3d objdct?
Is there any libraries that could do this?
Thanks

These "2D images" are usually called "textures". You probably want a 3D library which allows you to specify a 3D model with bitmap textures. The library would depend on platform you are using, but start with looking at OpenGL!
OpenGL for PHP
OpenGL for Java
... etc.

I've heard of the program "Poser" doing this using heuristics for human forms, but otherwise I don't believe this is actually theoretically possible. You are asking to construct volumetric data from flat data (inferring the third dimension.)
I think you'd have to make a ton of assumptions about your geometry, and even then, you'd only really have a shell of the object. If you did this well, you'd have a contiguous surface representing the boundary of the object - not a volumetric object itself.
What you can do, like Tomas suggested, is slap these 2d images onto something. However, you still will need to construct a triangle mesh surface, and actually do all the modeling, for this to present a 3D surface.
I hope this helps.

What there is currently that can do anything close to what you are asking for automagically is extremely proprietary. No libraries, but there are some products.
This core issue is matching corresponding points in the images and being able to say, this spot in image A is this spot in image B, and they both match this spot in image C, etc.
There are three ways to go about this, manually matching (you have the photos and have to use your own brain to find the corresponding points), coded targets, and texture matching.
PhotoModeller, www.photomodeller.com, $1,145.00US, supports manual matching and coded targets. You print out a bunch of images, attach them to your object, shoot your photos, and the software finds the targets in each picture and creates a 3D object based on those points.
PhotoModeller Scanner, $2,595.00US, adds texture matching. Tiny bits of the the images are compared to see if they represent the same source area.
Both PhotoModeller products depend on shooting the images with a calibrated camera where you use a consistent focal length for every shot and you got through a calibration process to map the lens distortion of the camera.
If you can do manual matching, the Match Photo feature of Google SketchUp may do the job, and SketchUp is free. If you can shoot new photos, you can add your own targets like colored sticker dots to the object to help you generate contours.
If your images are drawings, like profile, plan view, etc. PhotoModeller will not help you, but SketchUp may be just the tool you need. You will have to build up each part manually because you will have to supply the intelligence to recognize which lines and points correspond from drawing to drawing.
I hope this helps.

Related

Augment reality like zookazam

What algorithms are used for augmented reality like zookazam ?
I think it analyze image and find planes by contrast, but i don't know how.
What topics should I read before starting with app like this?
[Prologue]
This is extremly broad topic and mostly off topic in it's current state. I reedited your question but to make your question answerable within the rules/possibilities of this site
You should specify more closely what your augmented reality:
should do
adding 2D/3D objects with known mesh ...
changing light conditions
adding/removing body parts/clothes/hairs ...
a good idea is to provide some example image (sketch) of input/output of what you want to achieve.
what input it has
video,static image, 2D,stereo,3D. For pure 2D input specify what conditions/markers/illumination/LASER patterns you have to help the reconstruction.
what will be in the input image? empty room, persons, specific objects etc.
specify target platform
many algorithms are limited to memory size/bandwidth, CPU power, special HW capabilities etc so it is a good idea to add tag for your platform. The OS and language is also a good idea to add.
[How augmented reality works]
acquire input image
if you are connecting to some device like camera you need to use its driver/framework or something to obtain the image or use some common API it supports. This task is OS dependent. My favorite way on Windows is to use VFW (video for windows) API.
I would start with some static file(s) from start instead to ease up the debug and incremental building process. (you do not need to wait for camera and stuff to happen on each build). And when your App is ready for live video then switch back to camera...
reconstruct the scene into 3D mesh
if you use 3D cameras like Kinect then this step is not necessary. Otherwise you need to distinguish the object by some segmentation process usually based on the edge detections or color homogenity.
The quality of the 3D mesh depends on what you want to achieve and what is your input. For example if you want realistic shadows and lighting then you need very good mesh. If the camera is fixed in some room you can predefine the mesh manually (hard code it) and compute just the objects in view. Also the objects detection/segmentation can be done very simply by substracting the empty room image from current view image so the pixels with big difference are the objects.
you can also use planes instead of real 3D mesh as you suggested in the OP but then you can forget about more realistic quality of effects like lighting,shadows,intersections... if you assume the objects are standing straight then you can use room metrics to obtain the distance from camera. see:
selection criteria for different projections
estimate measure of photographed things
For pure 2D input you can also use the illumination to estimate the 3D mesh see:
Turn any 2D image into 3D printable sculpture with code
render
Just render the scene back to some image/video/screen... with added/removed features. If you are not changing the light conditions too much you can also use the original image and render directly to it. Shadows can be achieved by darkening the pixels ... For better results with this the illumination/shadows/spots/etc. are usually filtered out from the original image and then added directly by rendering instead. see
White balance (Color Suppression) Formula?
Enhancing dynamic range and normalizing illumination
The rendering process itself is also platform dependent (unless you are doing it by low level graphics in memory). You can use things like GDI,DX,OpenGL,... see:
Graphics rendering
You also need camera parameters for rendering like:
Transformation of 3D objects related to vanishing points and horizon line
[Basic topics to google/read]
2D
DIP digital image processing
Image Segmentation
3D
Vector math
Homogenous coordinates
3D scene reconstruction
3D graphics
normal shading
paltform dependent
image acquisition
rendering

Lightweight 3D animation driven by external data

I'm a structural engineering master student work on a seismic evaluation of a temple structure in Portugal. For the evaluation, I have created a 3D block model of the structure and will use a discrete element code to analyze the behaviour of the structure under a variety of seismic (earthquake) records. The software that I will use for the analysis has the ability to produce snapshots of the structure at regular intervals which can then be put together to make a movie of the response. However, producing the images slows down the analysis. Furthermore, since the pictures are 2D images from a specified angle, there is no possibility to rotate and view the response from other angles without re-running the model (a process that currently takes 3 days of computer time).
I am looking for an alternative method for creating a movie of the response of the structure. What I want is a very lightweight solution, where I can just bring in the block model which I have and then produce the animation by feeding in the location and the three principal axis of each block at regular intervals to produce the animation on the fly. The blocks are described as prisms with the top and bottom planes defining all of the vertices. Since the model is produced as text files, I can modify the output so that it can be read and understood by the animation code. The model is composed of about 180 blocks with 24 vertices per block (so 4320 vertices). The location and three unit vectors describing the block axis are produced by the program and I can write them out in a way that I want.
The main issue is that the quality of the animation should be decent. If the system is vector based and allows for scaling, that would be great. I would like to be able to rotate the model in real time with simple mouse dragging without too much lag or other issues.
I have very limited time (in fact I am already very behind). That is why I wanted to ask the experts here so that I don't waste my time on something that will not work in the end. I have been using Rhino and Grasshopper to generate my model but I don't think it is the right tool for this purpose. I was thinking that Processing might be able to handle this but I don't have any experience with it. Another thing that I would like to be able to do is to maybe have a 3D PDF file for distribution. But I'm not sure if this can be done with 3D PDF.
Any insight or guidance is greatly appreciated.
Don't let the name fool you, but BluffTitler DX9, a commercial software, may be what your looking for.
It's simple interface provides a fast learning curve, may quick tutorials to either watch or dissect. Depending on how fast your GPU is, real-time previews are scalable.
Reference:
Model Layer Page
User Submitted Gallery (3D models)
Jim Merry from tetra4D here. We make the 3D CAD conversion tools for Acrobat X to generate 3D PDFs. Acrobat has a 3D javascript API that enables you to manipulate objects, i.e, you could drive translations, rotations, etc of objects from your animation information after translating your model to 3D PDF. Not sure I would recommend this approach if you are in a hurry however. Also - I don't think there are any commercial 3D PDF generation tools for the formats you are using (Rhino, Grasshopper, Processing).
If you are trying to animate geometric deformations, 3D PDF won't really help you at all. You could capture the animation and encode it as flash video and embed in a PDF, but this a function of the multimedia tool in Acrobat Pro, i.e, is not specific to 3D.

Making 3D representation of an object with a webcam

Is it possible to make a 3D representation of an object by capturing many different angles using a webcam? If it is, how is it possible and how is the image-processing done?
My plan is to make a 3D representation of a person using a webcam, then from the 3D representation, i will be able to tell the person's vital statistics.
As Bart said (but did not post as an actual answer) this is entirely possible.
The research topic you are interested in is often called multi view stereo or something similar.
The basic idea resolves around using point correspondences between two (or more) images and then try to find the best matching camera positions. When the positions are found you can use stereo algorithms to back project the image points into a 3D coordinate system and form a point cloud.
From that point cloud you can then further process it to get the measurements you are looking for.
If you are completely new to the subject you have some fascinating reading to look forward to!
Bart proposed Multiple view geometry by Hartley and Zisserman, which is a very nice book indeed.
As Bart and Kigurai pointed out, this process has been studied under the title of "stereo" or "multi-view stereo" techniques. To be able to get a 3D model from a set of pictures, you need to do the following:
a) You need to know the "internal" parameters of a camera. This includes the focal length of the camera, the principal point of the image and account for radial distortion in the image.
b) You also need to know the position and orientation of each camera with respect to each other or a "world" co-ordinate system. This is called the "pose" of the camera.
There are algorithms to perform (a) and (b) which are described in Hartley and Zisserman's "Multiple View Geometry" book. Alternatively, you can use Noah Snavely's "Bundler" http://phototour.cs.washington.edu/bundler/ software to also do the same thing in a very robust manner.
Once you have the camera parameters, you essentially know how a 3D point (X,Y,Z) in the world maps to an image co-ordinate (u,v) on the photo. You also know how to map an image co-ordinate to the world. You can create a dense point cloud by searching for a match for each pixel on one photo in a photo taken from a different view-point. This requires a two-dimensional search. You can simplify this procedure by making the search 1-dimensional. This is called "rectification". You essentially take two photos and transform then so that their rows correspond to the same line in the world (simplified statement). Now you only have to search along image rows.
An algorithm for this can be also found in Hartley and Zisserman.
Finally, you need to do the matching based on some measure. There is a lot of literature out there on "stereo matching". Another word used is "disparity estimation". This is basically searching for the match of pixel (u,v) on one photo to its match (u, v') on the other photo. Once you have the match, the difference between them can be used to map back to a 3D point.
You can use Yasutaka Furukawa's "CMVS" or "PMVS2" software to do this. Or if you want to experiment by yourself, openCV is a open-source computer vision toolbox to do many of the sub-tasks required for this.
This can be done with two webcams in the same ways your eyes work. It is called stereoscopic vision.
Have a look at this:
http://opencv.willowgarage.com/documentation/camera_calibration_and_3d_reconstruction.html
An affordable alternative to get 3D data would be the Kinect camera system.
Maybe not the answer you are hoping for but Microsoft's Kinect is doing that exact thing, there are some open source drivers out there that allow you to connect it to your windows/linux box.

3d model construction using multiple images from multiple points (kinect)

is it possible to construct a 3d model of a still object if various images along with depth data was gathered from various angles, what I was thinking was have a sort of a circular conveyor belt where a kinect would be placed and the conveyor belt while the real object that is to be reconstructed in 3d space sits in the middle. The conveyor belt thereafter rotates around the image in a circle and lots of images are captured (perhaps 10 image per second) which would allow the kinect to catch an image from every angle including the depth data, theoretically this is possible. The model would also have to be recreated with the textures.
What I would like to know is whether there are any similar projects/software already available and any links would be appreciated
Whether this is possible within perhaps 6 months
How would I proceed to do this? Such as any similar algorithm you could point me to and such
Thanks,
MilindaD
It is definitely possible and there are a lot of 3D scanners which work out there, with more or less the same principle of stereoscopy.
You probably know this, but just to contextualize: The idea is to get two images from the same point and to use triangulation to compute the 3d coordinates of the point in your scene. Although this is quite easy, the big issue is to find the correspondence between the points in your 2 images, and this is where you need a good software to extract and recognize similar points.
There is an open-source project called Meshlab for 3d vision, which includes 3d reconstruction* algorithms. I don't know the details of the algorithms, but the software is definitely a good entrance point if you want to play with 3d.
I used to know some other ones, I will try to find them and add them here:
Insight3d
(*Wiki page has no content, redirects to login for editing)
Check out https://bitbucket.org/tobin/kinect-point-cloud-demo/overview which is a code sample for the Kinect for Windows SDK that does specifically this. Currently it uses the bitmaps captured by the depth sensor, and iterates through the byte array to create a point cloud in a PLY format that can read by MeshLab. The next stage of us is to apply/refine a delanunay triangle algoirthim to form a mesh instead of points, which a texture can be applied. A third stage would then me a mesh merging formula to combine multiple caputres from the Kinect to form a full 3D object mesh.
This is based on some work I done in June using Kinect for the purposes of 3D printing capture.
The .NET code in this source code repository will however get you started with what you want to achieve.
Autodesk has a piece of software that will do what you are asking for it is called "Photofly". It is currently in the labs section. Using a series of images taken from multiple angles the 3d geometry is created and then photo mapped with your images to create the scene.
If you interested more in theoretical (i mean if you want to know how) part of this problem,
here is some document from Microsoft Research about moving depth camera and 3D reconstruction.
Try out VisualSfM (http://ccwu.me/vsfm/) by Changchang Wu (http://ccwu.me/)
It takes multiple images from different angles of the scene and outputs a 3D point cloud.
The algorithm is called "Structure from Motion".
Brief idea of the algorithm : It involves extracting feature points in each image; finding correspondences between them across images; building feature tracks, estimating camera matrices and thereby the 3D coordinates of the feature points.

Image recognition and 3d rendering

How hard would it be to take an image of an object (in this case of a predefined object), and develop an algorithm to cut just that object out of a photo with a background of varying complexity.
Further to this, a photo's object (say a house, car, dog - but always of one type) would need to be transformed into a 3d render. I know there are 3d rendering engines available (at a cost, free, or with some clause), but for this to work the object (subject) would need to be measured in all sorts of ways - e.g. if this is a person, we need to measure height, the curvature of the shoulder, radius of the face, length of each finger, etc.
What would the feasibility of solving this problem be? Anyone know any good links specialing in this research area? I've seen open source solutions to this problem which leaves me with the question of the ease of measuring the object while tracing around it to crop it out.
Thanks
Essentially I want to take a 2d image (typical image:which is easier than a complex photo containing multiple objects, etc.)
,
But effectively I want to turn that into a 3d image, so wouldn't what I want to do involve building a 3d rendering/modelling engine?
Furthermore, that link I have provided goes into 3ds max, with a few properties set, and a render is made.
It sounds like you want to do several things, all in the domain of computer vision.
Object Recognition (i.e. find the predefined object)
3D Reconstruction (make the 3d model from the image)
Image Segmentation (cut out just the object you are worried about from the background)
I've ranked them in order of easiest to hardest (according to my limited understanding). All together I would say it is a very complicated problem. I would look at the following Wikipedia links for more information:
Computer Vision Overview (Wikipedia)
The Eight Point Algorithm (for 3d reconstruction)
Image Segmentation
You're right this is an extremely hard set of problems, particularly that of inferring 3D information from a 2D image. Only a very limited understanding exists of how our visual system extrapolates 3D information from 2D images, one such approach is known as "Shape from Shading" and the linked google search shows how much (and consequently how little) we know.
Rob
This is a very difficult task. The hardest part is not recognising or segmenting the object from the image, but rather inferring the 3-D geometry of the object from the 2-D image. You will have more success if you can use a stereoscopic camera (or a laser scanner, if you have access to one ;).
For the case of 2-D images, try googling for "shape-from-shading". This is a method for inferring 3-D shape from a 2-D image. It does make assumptions about illumination conditions and surface properties (BRDF and geometry) that may fail in many cases, but if you are using it for only a predefined class of objects (e.g. human faces) it can work reasonably well.
Assuming it's possible, that would be extremely difficult, especially with only one image of the object. The rasterizer has to guess at the depth and distances of objects.
What you describe sounds very similar to Microsoft PhotoSynth.

Resources