What's the best depth map generation algorithm?

I'm working on a 2D-to-3D application project and I'm looking for a method to produce the depth map of a single input image, without any other external information. I know that's a sort of "artificial intelligence" matter, but maybe an efficient algorithm exists.
At the moment I've found this one: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.7959&rep=rep1&type=pdf but I'm wondering if there is a better method before I start implementing. Suggestions? Thanks!

I've written quite a few automatic depth map generators. I don't think there's one that's better than all others in all cases; it all depends on the stereo pair you're starting with. I personally think a depth map generator based on a local method (window or block based) with an edge-preserving smoother is probably the best all-around choice.
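To make the local approach concrete, here is a minimal sketch using OpenCV's basic block matcher, with a bilateral filter standing in for the edge-preserving smoother. This is an illustration only, not the software linked below, and the file names are placeholders:

    import cv2

    # Rectified stereo pair, loaded as grayscale (file names are placeholders).
    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # Local (block-based) matcher: numDisparities must be a multiple of 16,
    # blockSize is the odd-sized matching window.
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = matcher.compute(left, right)  # int16, scaled by 16

    # Edge-preserving smoothing of the raw disparity map, so depth stays
    # smooth inside objects but sharp at object boundaries.
    smoothed = cv2.bilateralFilter(disparity.astype("float32") / 16.0, 9, 25, 25)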
In any case, on this page:
depth map generation software
you can find depth map generator software based on optical flow, weight-based windows, graph cuts, and many other things that relate to depth map generation and lenticular creation. The best part is that it's all free.
For 2d-to-3d conversion (which is more what you are asking), there's a piece of software called DMAG4 that uses a sparsely populated depth map (typically done in Gimp with the paint brush) to indicate the main depths, and then fills the unfilled areas by interpolation while maintaining the edges of the objects (edge-preserving).
DMAG4 can be found here (it's free to use):
2d to 3d conversion software DMAG4
Another way to do 2d-to-3d conversion is to use a sculpting program like Gimpel3d or Blender, both free. Clearly, this goes beyond a depth map, since you're essentially creating a 3d scene in which you can then move around (using the camera movement in Blender). This is often referred to as "camera mapping".

Well, I have recently come upon this:
http://make3d.cs.cornell.edu/code.html
which comes together with code, although the license might be too restrictive
("Noncommercial — You may not use this work for commercial purposes").
The gallery is impressive:
http://make3d.stanford.edu/images/showall

Related

Data structure for circular sector in robot vision

I'm trying to build a model of a 360-degree view of the surrounding environment from a continuously rotating distance sensor (radar). I need a data structure supporting a quickly computable strategy that will bring a robot to the first obstacle-free point (or to the point where the obstacles are farthest away).
I thought of an array of 360 numerical elements, in which each element represents the detected distance at that degree of the circumference.
Do you know a name for this data structure (used in this way)?
Are there better representations for the situation I described?
The main language for the controller is Java.
It sounds like you are aware that your range data is effectively in polar co-ordinates.
The unique aspect of working with such 360° data is its circular, "wrap-around" nature.
Many people end up writing their own custom implementation around this data. There is a lot of theory in the robotics literature based on it for smoothing, segmenting, finding features, etc. (for example: "Line Extraction in 2D Range Images for Mobile Robotics").
Practically speaking, you might then want to consider checking out some robotics libraries, something like ARIA. Another very good place to start is to use Webots to emulate/model things - including range data - before transferring to a physical robotics platform.
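For illustration, here is a minimal sketch of such a structure (in Python for brevity, though it ports directly to Java). The window size and the clearest-bearing query are my assumptions about what "quickly computable" means here:

    class RangeScan:
        """A 360-degree scan: one distance reading per degree of bearing."""

        def __init__(self, readings):
            assert len(readings) == 360
            self.readings = list(readings)

        def __getitem__(self, bearing):
            # Wrap-around indexing: bearings -10 and 350 hit the same cell.
            return self.readings[bearing % 360]

        def clearest_bearing(self, window=5):
            # Bearing whose surrounding window has the largest minimum
            # distance, i.e. the widest obstacle-free direction.
            def clearance(b):
                return min(self[b + o] for o in range(-window, window + 1))
            return max(range(360), key=clearance)

    # Everything 1 m away except a clear corridor around 90 degrees.
    scan = RangeScan([1.0] * 360)
    for b in range(80, 101):
        scan.readings[b] = 8.0
    print(scan.clearest_bearing())  # a bearing within the corridor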

Image registration (non-rigid / nonlinear)

I'm looking for an algorithm (preferably with source code available)
for image registration.
The image deformation can't be described by a homography matrix (because I think the distortion is neither symmetrical nor
homogeneous); more specifically, the deformations are like barrel distortion and trapezoid distortion, maybe with some rotation of the image.
I want to obtain pairs of corresponding pixels between the two images, so I can build a representation of the "deformation field".
I googled a lot and found out that there are some algorithms based on physics ideas, but it seems that they can converge
to a local optimum rather than the global one.
I can afford for the program to be semi-automatic, meaning some simple user interaction.
Maybe an algorithm like SIFT would be appropriate?
But I think it can't provide a "deformation field" of sufficient, regular density.
In case it's important: there are no scale changes.
Example of a complicated field:
http://www.math.ucla.edu/~yanovsky/Research/ImageRegistration/2DMRI/2DMRI_lambda400_grid_only1.png
What you are looking for is "optical flow". Searching for these terms will yield you numerous results.
In OpenCV, there is a function called calcOpticalFlowFarneback() (in the video module) that does what you want.
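A minimal sketch of how that call might look in Python (the image file names are placeholders):

    import cv2

    # The two images to register, as grayscale (file names are placeholders).
    img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

    # Dense optical flow: one (dx, dy) vector per pixel - exactly the kind of
    # "deformation field" asked about. Parameters are pyr_scale, levels,
    # winsize, iterations, poly_n, poly_sigma, flags.
    flow = cv2.calcOpticalFlowFarneback(img1, img2, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    dx, dy = flow[..., 0], flow[..., 1]  # per-pixel displacement img1 -> img2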
The C API still has an implementation of the classic paper by Horn & Schunck (1981), "Determining optical flow".
You can also have a look at this work I've done, along with some code (but be careful: there are still some mysterious bugs in the OpenCL memory code; I will release a corrected version later this year): http://lts2www.epfl.ch/people/dangelo/opticalflow
Besides OpenCV's optical flow (and mine ;-), you can have a look at ITK on itk.org for complete image registration chains (mostly aimed at medical imaging).
There's also a lot of optical flow code (Matlab, C/C++, ...) that can be found via Google, for example cs.brown.edu/~dqsun/research/software.html, gpu4vision, etc.
-- EDIT : about optical flow --
Optical flow algorithms divide into two families: dense ones and non-dense (sparse) ones.
Dense algorithms give one motion vector per pixel; non-dense ones give one vector per tracked feature.
Examples of the dense family include Horn-Schunck and Farneback (to stay with OpenCV), and more generally any algorithm that minimizes some cost function over the whole image (the various TV-L1 flows, etc).
An example of the non-dense family is the KLT, which is called Lucas-Kanade in OpenCV.
In the dense family, since the motion of each pixel is almost unconstrained, such algorithms can deal with scale changes. Keep in mind, however, that they can fail in the case of large motions / scale changes, because they usually rely on linearizations (Taylor expansions of the motion and image changes). Furthermore, in the variational approach, each pixel contributes to the overall result, so parts that are visible in only one image are likely to pull the algorithm away from the actual solution.
Anyway, techniques such as coarse-to-fine implementations are employed to bypass these limits, and these problems usually have only a small impact. Brutal illumination changes, or large occluded/unoccluded areas, can also be dealt with explicitly by some algorithms; see for example this paper, which computes a sparse image of "innovation" alongside the optical flow field.
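To contrast the two families concretely, here is a minimal sparse (KLT) sketch; the dense Farneback call shown earlier is its per-pixel counterpart. Image names are placeholders:

    import cv2

    prev = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
    curr = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

    # Pick corners worth tracking, then follow them with pyramidal
    # Lucas-Kanade: one motion vector per tracked feature, not per pixel.
    pts = cv2.goodFeaturesToTrack(prev, maxCorners=500,
                                  qualityLevel=0.01, minDistance=7)
    new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None)

    # Keep only the features that were tracked successfully.
    ok = status.flatten() == 1
    motion = new_pts[ok] - pts[ok]  # one vector per surviving feature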
I found some medical-specific software; it's complicated and doesn't work with simple image formats, but it seems to do what I need:
http://www.csd.uoc.gr/~komod/FastPD/index.html
Drop - Deformable Registration using Discrete Optimization

Convert polygons into mesh

I have a lot of polygons. Ideally, the polygons should not overlap one another, but they can be located adjacent to one another.
In practice, however, I have to allow for slight polygon overlap (defined by a certain tolerance), because all these polygons come from user hand-drawing input, which is not as machine-precise as I want it to be.
My question is: are there any software library components that:
Allow one to input a range of polygons
Check whether the polygons overlap by more than a prespecified tolerance
If yes, stop; otherwise, continue
Create a mesh, in terms of coordinates and elements, for the polygons by grouping common vertices and edges together?
More importantly, link the mesh edges back to the original polygons' edges?
Or has anyone tackled this issue before?
This issue is the daily bread of GIS applications - this is exactly what is done there. We also learned about it in a GIS course. Look into how GIS systems address this issue. E.g. ArcGIS defines so-called topology rules and has some functions to check whether the edited features are topologically correct. See http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=Topology_rules
This is pretty long, only because the question is so big. I've tried to group my comments based on your bullet points.
Components to draw polygons
My guess is that you'll have limited success without providing more information - a component to draw polygons will be very much coupled to the language and UI paradigm you are using for the rest of your project, i.e. code for a web component will look very different to a native component.
Perhaps an alternative is to separate this element of the process out from the rest of what you're trying to do. There are some absolutely fantastic pre-existing editors that you can use to create 2d and 3d polygons.
Inkscape is an example of a vector graphics editor that makes it easy to enter 2d polygons, and has the advantage of producing output SVG, which is reasonably easy to parse.
In three dimensions Blender is an open source editor that can be used to produce arbitrary geometries that can be exported to a number of formats.
If you can use a Google Maps API (possibly in a native HTML rendering control), and you are interested in adding spatial points on a map overlay, you may be interested in a related click-to-draw polygon question on Stack Overflow. From past experience, other map APIs like OpenLayers support similar approaches.
Check whether polygons are overlapped
Thomas T made the point in his answer that there are families of related predicates that can be used to address this and related queries. If you are literally just looking for overlaps and other set-theoretic operations (union, intersection, set difference) in two dimensions, you can use the General Polygon Clipper.
You may also need to consider the slightly more generic problem of two polygons that don't overlap or share a vertex when they should. You can use a Minkowski sum to dilate (enlarge) two- and three-dimensional polygons to avoid such problems. The Computational Geometry Algorithms Library has robust implementations of these algorithms.
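As an illustration of the overlap-beyond-tolerance check, here is a minimal sketch using Shapely (an assumption about your stack; GPC or CGAL would work equally well):

    from itertools import combinations
    from shapely.geometry import Polygon

    def excessive_overlaps(polygons, tolerance):
        """Yield pairs of polygons whose overlap area exceeds the tolerance."""
        for i, j in combinations(range(len(polygons)), 2):
            overlap = polygons[i].intersection(polygons[j]).area
            if overlap > tolerance:
                yield i, j, overlap

    # Two unit squares offset by 0.9 overlap in a 0.1-wide strip (area 0.1).
    polys = [Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
             Polygon([(0.9, 0), (1.9, 0), (1.9, 1), (0.9, 1)])]
    print(list(excessive_overlaps(polys, tolerance=0.05)))  # [(0, 1, ~0.1)]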
I think it's more likely that you are really looking for a piece of software that can perform vertex welding. Christer Ericson's book Real-Time Collision Detection includes an extensive and very readable description of the basics in this field, and also of related issues such as edge snapping, crack detection, T-junctions and more. However, even though code snippets are included in that book, I know of no ready-made library that addresses these problems; in particular, no complete implementation is given for anything beyond basic vertex welding.
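The basic welding step itself is small, though. A naive sketch (quadratic in the vertex count, usually fine for hand-drawn input; a spatial hash would make it near-linear):

    def weld_vertices(polygons, epsilon):
        """Merge vertices closer than epsilon into shared, canonical vertices.

        Returns (vertices, indexed_polygons), where each polygon becomes a
        list of indices into the shared vertex table - the 'mesh' form.
        """
        vertices = []  # canonical vertex table
        indexed = []
        for poly in polygons:
            indices = []
            for (x, y) in poly:
                # Naive linear scan over existing canonical vertices.
                for i, (vx, vy) in enumerate(vertices):
                    if (x - vx) ** 2 + (y - vy) ** 2 <= epsilon ** 2:
                        indices.append(i)
                        break
                else:
                    vertices.append((x, y))
                    indices.append(len(vertices) - 1)
            indexed.append(indices)
        return vertices, indexed

    # Two triangles sharing an (approximately) common edge get welded.
    tris = [[(0, 0), (1, 0), (0, 1)], [(1.001, 0), (1, 1), (0.001, 1.001)]]
    print(weld_vertices(tris, epsilon=0.01))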
Obviously, 3D packages (Blender, Maya, Max, Rhino) all include built-in tools to solve this problem.
Group polygons based on vertices
From past experience, this turned out to be one of the most time-consuming parts of developing software to solve problems in this area. It requires a reasonable understanding of graph theory and of algorithms to traverse boundaries. It is worth relying upon a solid geometry or graph library to do the heavy lifting for you. In the past I've had success with igraph.
Link the updated polygons back to the originals.
Again, from past experience, this is just a case of careful bookkeeping, and some very careful design of your mesh classes up-front. I'd like to give more advice, but even after spending a big chunk of the last six months on this, I'm still struggling to find a "nice" way to do this.
Other Comments
If you're interacting with users, I would strongly recommend avoiding this issue where possible by using an editor that "snaps", rounding all user-entered points onto a grid. This will hopefully significantly reduce the amount of work that you have to do.
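The snapping itself is nearly a one-liner. A sketch, with the grid spacing as a made-up parameter:

    def snap_to_grid(points, grid=0.1):
        # Round every user-entered coordinate to the nearest grid line, so
        # near-coincident vertices from hand drawing become exactly equal.
        return [(round(x / grid) * grid, round(y / grid) * grid)
                for x, y in points]

    print(snap_to_grid([(0.98, 2.03), (1.02, 1.97)]))  # both -> (1.0, 2.0)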
Yes, you can use OGR. It has Python bindings. Specifically, the Geometry class has an Intersects method. I don't fully understand what you want in points 4 and 5.
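For example, a minimal sketch with hardcoded WKT squares:

    from osgeo import ogr

    # Two overlapping squares, built from WKT for brevity.
    a = ogr.CreateGeometryFromWkt("POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))")
    b = ogr.CreateGeometryFromWkt("POLYGON ((1 1, 3 1, 3 3, 1 3, 1 1))")

    print(a.Intersects(b))              # True
    print(a.Intersection(b).GetArea())  # 1.0 - the area of the overlap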

Object detection + segmentation

I'm trying to find an efficient way, of acceptable complexity, to
detect an object in an image so I can isolate it from its surroundings
segment that object into its sub-parts and label them, so I can then fetch them at will
It's been 3 weeks since I entered the image processing world, and I've read about so many algorithms (SIFT, snakes, more snakes, Fourier-related, etc.) and heuristics that I don't know where to start and which one is "best" for what I'm trying to achieve. Bearing in mind that the image dataset of interest is a pretty large one, I don't even know if I should use some algorithm implemented in OpenCV or implement one of my own.
Summarize:
Which methodology should I focus on? Why?
Should I use OpenCV for that kind of stuff or is there some other 'better' alternative?
Thank you in advance.
EDIT -- More info regarding the datasets
Each dataset consists of 80K images of products sharing the same:
concept, e.g. t-shirts, watches, shoes
size
orientation (90% of them)
background (95% of them)
All pictures in each dataset look almost identical apart from the product itself, obviously. To make things a little clearer, let's consider only the 'watch dataset':
All the pictures in the set look almost exactly like this:
(again, apart from the watch itself). I want to extract the strap and the dial. The thing is that there are lots of different watch styles, and therefore shapes. From what I've read so far, I think I need a template algorithm that allows bending and stretching, so as to be able to match straps and dials of different styles.
Instead of creating three distinct templates (upper part of the strap, lower part of the strap, dial), it would be reasonable to create only one and segment it into 3 parts. That way, I could be confident that each part was detected in the right position relative to the others, e.g. that the dial was not detected below the lower part of the strap.
Of all the algorithms/methodologies I've encountered, active shape/appearance models seem to be the most promising. Unfortunately, I haven't managed to find a decent implementation, and I'm not confident enough that it's the best approach to go ahead and write one myself.
If anyone could point out what I should be really looking for (algorithm/heuristic/library/etc.), I would be more than grateful. If again you think my description was a bit vague, feel free to ask for a more detailed one.
From what you've said, here are a few things that pop up at first glance:
The simplest thing to do is binarize the image and do connected components using OpenCV or the cvBlob library. For simple images with a non-complex background this usually yields the objects (see the sketch after this list).
However, looking at your sample image, texture-based segmentation techniques may work better - the watch dial, the straps and the background vary widely in texture/roughness, and this could be an ideal way to separate them.
The roughness of a region can easily be found with the Eigen transform (explained a bit on SO; check the link to the research paper provided there), and the Mean Shift filter can then be applied to the output of the Eigen transform. This gives regions clearly separated according to texture. Both the pyramidal Mean Shift and finding eigenvalues by SVD are implemented in OpenCV, so unless you can optimize your own code it's better (and easier) to use the built-in functions (where present) as far as speed and efficiency are concerned.
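Here is what the first suggestion (binarize + connected components) might look like in Python; the threshold choice and file name are placeholders:

    import cv2

    # Product photo on a mostly uniform background (file name is a placeholder).
    img = cv2.imread("watch.png", cv2.IMREAD_GRAYSCALE)

    # Binarize with Otsu's threshold, inverted so the product is foreground.
    _, binary = cv2.threshold(img, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Label connected foreground regions; stats holds each component's
    # bounding box and pixel count, centroids its center of mass.
    count, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)

    # Component 0 is the background; the largest remaining blob is likely
    # the product.
    largest = 1 + stats[1:, cv2.CC_STAT_AREA].argmax()
    mask = (labels == largest)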
I think I would turn the problem around. Instead of hunting for the dial, I would use a set of robust features from the watch to 'stitch' the target image onto a template. The first watch has a set of white squares in the dial, the second watch has a number of white circles. Per type of watch, I would (a rough sketch follows the list below):
Segment out the squares or circles in the dial. Segmentation steps can be tricky, as they are usually both scale and light dependent.
Estimate the centers or corners of the feature areas found above. These are the new feature points.
Use the Hungarian algorithm to match features between the template watch and the target watch. Alternatively, take the surroundings of each feature point in the original image and match them using cross-correlation.
Use the matching features between the template and the target to estimate scaling, rotation and translation.
Stitch the image.
As the image is now in a known form, extract the regions simply via preset coordinates.
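A rough sketch of steps 3 and 4 in Python (the feature centers are placeholder data; raw point distance is used as the Hungarian cost, which assumes the images are roughly aligned - the cross-correlation cost mentioned above is more robust):

    import numpy as np
    import cv2
    from scipy.optimize import linear_sum_assignment
    from scipy.spatial.distance import cdist

    # Feature centers found in the template and the target (placeholder data;
    # in practice these come from the segmentation step above).
    template_pts = np.array([[10, 10], [50, 10], [30, 60]], dtype=np.float32)
    target_pts = np.array([[32, 65], [12, 14], [53, 13]], dtype=np.float32)

    # Step 3: Hungarian algorithm - assign each template feature to a target
    # feature so that the total matching cost is minimal.
    rows, cols = linear_sum_assignment(cdist(template_pts, target_pts))

    # Step 4: estimate scale + rotation + translation, target -> template.
    transform, inliers = cv2.estimateAffinePartial2D(target_pts[cols],
                                                     template_pts[rows])

    # Steps 5-6: warp the target into the template frame, then read the dial
    # and strap regions off preset template coordinates, e.g.:
    # aligned = cv2.warpAffine(target_image, transform, (width, height))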

Image recognition and 3d rendering

How hard would it be to take an image of an object (in this case a predefined object) and develop an algorithm to cut just that object out of a photo with a background of varying complexity?
Further to this, the photo's object (say a house, a car, a dog - but always of one type) would need to be transformed into a 3d render. I know there are 3d rendering engines available (at a cost, free, or with some clause), but for this to work the object (subject) would need to be measured in all sorts of ways - e.g. if it is a person, we need to measure height, the curvature of the shoulders, the radius of the face, the length of each finger, etc.
How feasible would solving this problem be? Does anyone know any good links specializing in this research area? I've seen open-source solutions to this problem, which leaves me with the question of how easy it is to measure the object while tracing around it to crop it out.
Thanks
Essentially I want to take a 2d image (a typical image, which is easier than a complex photo containing multiple objects, etc.), but effectively I want to turn it into a 3d image. So wouldn't what I want to do involve building a 3d rendering/modelling engine?
Furthermore, the link I have provided goes into 3ds Max, where, with a few properties set, a render is made.
It sounds like you want to do several things, all in the domain of computer vision.
Object Recognition (i.e. find the predefined object)
3D Reconstruction (make the 3d model from the image)
Image Segmentation (cut out just the object you are worried about from the background)
I've ranked them in order of easiest to hardest (according to my limited understanding). Altogether I would say it is a very complicated problem. I would look at the following Wikipedia links for more information:
Computer Vision Overview (Wikipedia)
The Eight Point Algorithm (for 3d reconstruction)
Image Segmentation
You're right, this is an extremely hard set of problems, particularly that of inferring 3D information from a 2D image. Only a very limited understanding exists of how our visual system extrapolates 3D information from 2D images; one such approach is known as "Shape from Shading", and the linked Google search shows how much (and consequently how little) we know.
Rob
This is a very difficult task. The hardest part is not recognising or segmenting the object from the image, but rather inferring the 3-D geometry of the object from the 2-D image. You will have more success if you can use a stereoscopic camera (or a laser scanner, if you have access to one ;).
For the case of 2-D images, try googling for "shape-from-shading". This is a method for inferring 3-D shape from a 2-D image. It does make assumptions about illumination conditions and surface properties (BRDF and geometry) that may fail in many cases, but if you are using it for only a predefined class of objects (e.g. human faces) it can work reasonably well.
Assuming it's possible, that would be extremely difficult, especially with only one image of the object. The rasterizer has to guess at the depth and distances of objects.
What you describe sounds very similar to Microsoft PhotoSynth.
