I have spent the last few days googling about the active appearance model (AAM). I found that it has a shape model and a texture model, and now I'm trying to do some research about the active shape model (ASM), and I'm getting confused.
Are the active shape model (ASM) and the shape model in AAM the same?
An AAM involves (among other things) both a shape model and a texture model. The shape model is usually obtained by what is referred to as an active shape model (ASM). So the answer is yes, the shape model in an AAM is the active shape model (ASM).
Are the active shape model (ASM) and the shape model in AAM the same?
The answer to that is neither a straightforward yes nor no, and needs some explanation.
The shape model is the same in both the ASM and the AAM. It is a point distribution model (PDM), i.e. you take a set of rigidly aligned point sets, learn a PCA on them, and obtain a statistical model of those points, which is what one would call a shape model.
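To make the PDM idea concrete, here is a minimal pure-Python sketch. It assumes the training shapes are already rigidly aligned (e.g. by Procrustes analysis) and flattened to [x1, y1, x2, y2, ...]; it computes the mean shape and only the leading PCA mode, using power iteration. All function names are hypothetical, and a real implementation would use a linear-algebra library and keep several modes.

```python
import math


def mean_shape(shapes):
    """Mean of aligned shapes, each flattened as [x1, y1, x2, y2, ...]."""
    n = len(shapes)
    return [sum(s[i] for s in shapes) / n for i in range(len(shapes[0]))]


def covariance(shapes, mean):
    """Sample covariance matrix of the shape vectors."""
    n, d = len(shapes), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for s in shapes:
        diff = [s[i] - mean[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    return cov


def leading_mode(cov, iters=200):
    """Leading eigenvector and eigenvalue of cov via power iteration."""
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-symmetric start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v, 0.0
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigval


def synthesize(mean, mode, b):
    """New shape x = mean + b * mode, where b is the shape parameter."""
    return [m + b * p for m, p in zip(mean, mode)]


# Three aligned two-point "shapes": a horizontal segment that stretches.
shapes = [[-1, 0, 1, 0], [-2, 0, 2, 0], [-3, 0, 3, 0]]
mean = mean_shape(shapes)
mode, variance = leading_mode(covariance(shapes, mean))
```

Here the single mode captures the stretching of the segment and its eigenvalue is the variance along that mode. In Cootes' formulation, each shape parameter is typically limited to about ±3 times the square root of its eigenvalue so that generated shapes stay plausible.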
Now, what one commonly calls an Active Shape Model is the combination of a PDM and a fitting algorithm, hence the term "active". The simplest method is to search along each point's normal (the "profile direction") for the strongest edge. A slightly more elaborate method is to learn the 1D gradient profile along each point's normal from the training images. Cootes describes both methods in An Introduction to Active Shape Models.
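The simplest fitting step, moving a model point along its normal to the strongest edge, can be sketched as follows. This is a minimal illustration, assuming a grayscale image stored as a list of rows, nearest-neighbour sampling and a central-difference gradient; the function names are hypothetical, and a real implementation would interpolate and would usually use the learned-profile variant instead.

```python
def sample_profile(image, x, y, nx, ny, half_len):
    """Nearest-neighbour intensities along the normal (nx, ny), centred on (x, y).

    image is a list of rows (image[y][x]); samples outside are clamped to the border.
    """
    h, w = len(image), len(image[0])
    samples = []
    for t in range(-half_len, half_len + 1):
        px = min(max(int(round(x + t * nx)), 0), w - 1)
        py = min(max(int(round(y + t * ny)), 0), h - 1)
        samples.append(image[py][px])
    return samples


def move_to_strongest_edge(image, x, y, nx, ny, half_len=3):
    """Move a model point along its normal to the sample with the largest |gradient|."""
    prof = sample_profile(image, x, y, nx, ny, half_len)
    best_t, best_g = 0, -1.0
    for i in range(1, len(prof) - 1):
        g = abs(prof[i + 1] - prof[i - 1])  # central-difference gradient
        if g > best_g:
            best_t, best_g = i - half_len, g
    return x + best_t * nx, y + best_t * ny


# 7x7 image with a vertical step edge between columns 4 and 5.
img = [[0] * 5 + [100] * 2 for _ in range(7)]
new_point = move_to_strongest_edge(img, 2, 3, 1, 0)
```

In a full ASM iteration, every point is moved this way, the resulting shape is projected back onto the PDM (with the shape parameters clamped), and the process repeats until it converges.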
An AAM has the same PDM as an ASM to model the statistical variability of the shape, but the fitting is done differently: AAMs use the learned statistical model of the appearance for the fitting, rather than the profile search used by ASMs.
Hence, strictly speaking, the answer to your question quoted above is: No, the ASM and the shape model in an AAM are not the same. An AAM does not "contain" an ASM. It does, however, contain the PDM part of the ASM, and the shape model in both the ASM and AAM is the same; it is just fitted differently in ASMs and AAMs.
I recommend reading the paper mentioned above for more details; it is a very well-written and very easy to understand paper.
I'm developing a simple game where the user can place different modular objects (for instance tracks, roads, etc.).
My question is: how do I match and place different objects when one is placed next to another?
My first approach is to create a hidden child object (a box) for each modular object and put it on the border where another object can be placed (see my example image), so I can use its coordinates (x, y, z) to align the other object.
But I don't know if this is the best approach.
Thanks
Summary:
1. Define what a "snapping point" is
2. Define your threshold
3. Update the new game object's position
Little Explanation
1. I suppose you need a way to define which parts of the object are the "snapping points".
They are obvious in some cases, such as a cube, where every vertex could be a snapping point, but it's hard to define them for every vertex of an amorphous object.
A simple solution is the one proposed by @PierreBaret, which consists of defining on your transform component which points are the "snapping points".
The other is the one you propose: creating empty game objects that act as snapping-point locations on the game object.
2. Once you have those snapping points, when you drop your new gameObject you need to define a threshold, since you don't want every object to always snap to the nearest game object.
3. So you define a minimum distance between snapping points: if your snapping point is within that threshold of another one, you update the new object's position so that it aligns with the snapped point.
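Steps 2 and 3 can be sketched engine-agnostically as below, assuming every snapping point is already expressed in world coordinates (in Unity, for example, these could be the `position` values of the empty child objects' Transforms). The function names are hypothetical; the sketch picks the closest pair within the threshold and returns the translation that makes the two points coincide.

```python
import math


def find_snap_offset(new_points, existing_points, threshold):
    """Translation that moves the closest new/existing snapping-point pair together.

    Returns None when no pair is within `threshold`.
    """
    best_offset, best_dist = None, threshold
    for p in new_points:
        for q in existing_points:
            d = math.dist(p, q)
            if d <= best_dist:
                best_dist = d
                best_offset = tuple(qi - pi for pi, qi in zip(p, q))
    return best_offset


def translate(position, offset):
    """Shift an object's position by the snap offset."""
    return tuple(a + b for a, b in zip(position, offset))


# Snapping points of the dropped piece and of an already placed track piece (world space).
dropped = [(0.75, 0.0, 0.0), (5.0, 5.0, 5.0)]
placed = [(1.0, 0.0, 0.0)]
offset = find_snap_offset(dropped, placed, threshold=0.5)
object_position = (0.25, 0.0, 0.0)
if offset is not None:
    object_position = translate(object_position, offset)
```

Applying the same offset to the object's position keeps all of its snapping points rigidly attached to it.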
Visual representation (figure): the dark blue circle shows the threshold distance for just ONE of the four threshold checks on the square's vertices; it should be replicated three more times, once for each green snapping point of the red square.
Of course, this method can be expensive. You can improve it, for example by first setting a coarse threshold between game objects, and only if a game object is inside that threshold, checking the snapping threshold distance.
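One way to implement that coarse first threshold is a bounding-sphere check. This is a sketch under the assumption that each object stores a hypothetical center and radius enclosing all of its snapping points; objects rejected here cannot have any pair of snapping points closer than the snap threshold, so only the survivors need the per-point check.

```python
import math


def broad_phase(new_center, new_radius, others, snap_threshold):
    """Names of objects whose bounding spheres come within snap_threshold of the new one.

    `others` is a list of (name, center, radius). If every snapping point lies inside
    its object's bounding sphere, rejected objects cannot produce a snap.
    """
    candidates = []
    for name, center, radius in others:
        gap = math.dist(new_center, center) - new_radius - radius
        if gap <= snap_threshold:
            candidates.append(name)
    return candidates


scene = [("near_track", (2.0, 0.0, 0.0), 1.0), ("far_road", (10.0, 0.0, 0.0), 1.0)]
print(broad_phase((0.0, 0.0, 0.0), 1.0, scene, snap_threshold=0.5))
```

For large scenes, a spatial structure (a uniform grid or the engine's own physics queries) avoids even this linear scan.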
Hope it helps!
Approach for arbitrary objects/models and deformable models.
[A] A physical approach would consider all the surfaces of the two objects, and you might need to check that the objects don't overlap, using dot products with the surface normals. That is somewhat more expensive to compute, but nothing nasty. There is no matching involved here, but you can add matching features (see [B]). However, this is the only approach that works with non-predefined or deformable models.
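The dot-product overlap check can be illustrated for convex solids: a point lies inside a convex solid if it is behind (or on) every face plane, i.e. the dot product of (point minus a point on the face) with the outward normal is not positive. This is a minimal sketch with hypothetical names; testing only the vertices of one object is not a complete intersection test (edges can cross without any vertex inside), and non-convex models would need decomposition or a proper collision library.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def inside_convex(point, faces, eps=1e-9):
    """True if point is behind (or on) every face plane of a convex solid.

    faces: list of (point_on_face, outward_normal).
    """
    return all(dot(sub(point, p0), n) <= eps for p0, n in faces)


def any_vertex_inside(vertices, faces):
    """Cheap overlap test: does any vertex of object B lie inside convex object A?"""
    return any(inside_convex(v, faces) for v in vertices)


# Unit cube [0, 1]^3 described by its six face planes.
CUBE = [
    ((0, 0, 0), (-1, 0, 0)), ((1, 0, 0), (1, 0, 0)),
    ((0, 0, 0), (0, -1, 0)), ((0, 1, 0), (0, 1, 0)),
    ((0, 0, 0), (0, 0, -1)), ((0, 0, 1), (0, 0, 1)),
]
```

Points exactly on a face count as inside here, so two pieces that merely touch are reported as overlapping; use a small negative tolerance if touching should be allowed.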
Approaches for matching simple and complex models
[B] Snapping points are a good thing, but they are not sufficient on their own. I think each object needs:
a sparse representation (e.g., complex oriented sphere to a cube),
and place key snapping points,
tagged by polarity or color, and possibly orientation (oriented snapping points); e.g., in the case of rails, you'll want rails to snap {+} with {+} and forbid {+} with {-}. For a more complex object, or when you have several orientations (e.g., the two faces of a surface, of which only one is a candidate for matching a pair of objects), you'll need more than two polarities: three different ones per candidate matching surface or feature, hence the colors (or any enumeration). You need three different colors to guarantee a unique configuration in 3D space; you create what chemistry calls an enantiomer (a chiral arrangement that cannot be superimposed on its mirror image).
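Tagged, oriented snapping points can be checked with a small compatibility rule before any snapping happens. In this sketch the `SnapPoint` fields, the allowed-pair table and the 0.99 facing tolerance are all hypothetical choices; the table encodes the rails example ({+} with {+} allowed, {+} with {-} forbidden), and the orientation test requires the two outward normals to point roughly at each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapPoint:
    position: tuple  # (x, y, z) in world space
    normal: tuple    # unit vector pointing out of the object
    tag: str         # polarity or colour, e.g. "+", "-", "red"


# Pairs of tags allowed to snap. Following the rails example: {+} with {+} is allowed,
# {+} with {-} is forbidden simply by being absent from the set.
ALLOWED = {("+", "+")}


def compatible(a, b, allowed=ALLOWED, min_opposition=0.99):
    """Tags must form an allowed pair and the outward normals must face each other."""
    if (a.tag, b.tag) not in allowed and (b.tag, a.tag) not in allowed:
        return False
    facing = sum(x * y for x, y in zip(a.normal, b.normal))
    return facing <= -min_opposition
```

Only pairs that pass this test would then go through the distance threshold check.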
You can also use point pair features, which describe the relative position and orientation of two oriented points, when an oriented surface is not appropriate.
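Drost et al. (listed in the references) define the point pair feature of two oriented points (m1, n1) and (m2, n2) as F = (|d|, angle(n1, d), angle(n2, d), angle(n1, n2)) with d = m2 - m1. A minimal sketch of computing it, with hypothetical function names; in their method these features are additionally quantized and stored in a hash table for matching.

```python
import math


def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))


def point_pair_feature(p1, n1, p2, n2):
    """F = (|d|, angle(n1, d), angle(n2, d), angle(n1, n2)) with d = p2 - p1; needs p1 != p2."""
    d = tuple(b - a for a, b in zip(p1, p2))
    return (math.sqrt(sum(x * x for x in d)), angle(n1, d), angle(n2, d), angle(n1, n2))


# Two points one unit apart on a flat surface, both normals pointing up.
f = point_pair_feature((0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 0, 1))
```

Because F depends only on relative geometry, it is invariant to rigid motions of the pair, which is what makes it usable for matching.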
References
Some are computer vision papers or book extracts, but they present algorithms and concepts for achieving what I described in my answer.
Model Globally, Match Locally: Efficient and Robust 3D Object Recognition, Drost et al.
3D Models and Matching
How do I get the Kinect face-tracking mesh?
This is the mesh: http://imgur.com/TV6dHBC
I have tried several ways, but could not make it work.
e.g.: http://msdn.microsoft.com/en-us/library/jj130970.aspx
3D Face Model Provided by IFTModel Interface
The Face Tracking SDK also tries to fit a 3D mask to the user’s face.
The 3D model is based on the Candide3 model (http://www.icg.isy.liu.se/candide/):
Note: This model is not returned directly at each call to the Face Tracking SDK, but can be computed from the AUs and SUs.
There is no direct functionality to do that. You have to use the triangle and vertex data to generate the required vertex and index lists.
The GetTriangles method gives you the faces (the indices of each triangle's vertices, in clockwise order); you then use these indices into the vertex array to build the 3D model. The vertex array has to be reconstructed every frame from the AUs (animation units) and SUs (shape units) with the Get3DShape or GetProjectedShape (2D) functions.
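The SDK calls themselves are C++/COM and are not reproduced here. Assuming you have already copied the Get3DShape output into a list of (x, y, z) vertices and the GetTriangles output into 0-based index triples, the sketch below shows how the two combine into a mesh, written out as Wavefront OBJ text. The function name and the `flip_winding` option are hypothetical; flip the winding only if your renderer expects counter-clockwise front faces.

```python
def mesh_to_obj(vertices, triangles, flip_winding=False):
    """Wavefront OBJ text for a triangle mesh.

    vertices: list of (x, y, z), e.g. copied from the Get3DShape output.
    triangles: list of (i, j, k) 0-based vertex indices, e.g. from GetTriangles.
    flip_winding: swap two indices to turn clockwise triangles counter-clockwise.
    """
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    for i, j, k in triangles:
        if flip_winding:
            j, k = k, j
        lines.append(f"f {i + 1} {j + 1} {k + 1}")  # OBJ face indices are 1-based
    return "\n".join(lines) + "\n"


obj_text = mesh_to_obj([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
```

Since the vertex array changes every frame while the triangle list does not, only the `v` lines would need regenerating per frame.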
For more, look up IFTModel (http://msdn.microsoft.com/en-us/library/jj130970.aspx) and visualizeFaceModel (sample code that helps in understanding the input parameters of Get3DShape).
(This sample uses GetProjectedShape, but the input parameters of the two functions are nearly identical.)
If I make a human model and import it into a game engine, does the engine know the coordinates of every point on the model and rotate each one? Models consist of millions of points; if I rotate a model 90 degrees, does the engine calculate the new location of millions of points? How does it work? Thanks
This is a bit of a vague question since each game engine will work differently, but in general the game engine will not touch the model coordinates.
Models are usually loaded with model space (or local space) coordinates - this simply means that each vertex is defined with a location relative to the origin of that model. The origin is defined as (0,0,0) and is the point around which rotations take place.
Now the game engine loads and keeps the model in this coordinate space. Then you provide your transformations (such as translation and rotation matrices) to place that model somewhere in your "world" (i.e. the global coordinate space shared by all objects). You also specify how you want to view this world with various other transforms, such as view and projection matrices.
The game engine then takes all of these transformations and passes them to the GPU (or a software renderer, in some cases); it also sets up other state such as textures. These are usually set once per frame (or once per object per frame).
Finally, it then passes each vertex that needs to be processed to the renderer. Each vertex is then transformed by the renderer using all the transformations specified to get a final vertex position - first in world space and then in screen space - which it can use to render pixels based on various other information (such as textures and lighting).
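The per-vertex step can be illustrated with a small CPU-side sketch. It builds one world matrix per object (rotate 90 degrees about Y, then translate) and applies it to each model-space vertex, leaving the stored vertices untouched. The helper names are hypothetical; in a real engine this multiplication runs on the GPU in a vertex shader, with view and projection matrices multiplied in the same way.

```python
import math


def mat_mul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rotation_y(degrees):
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def transform(m, v):
    """Apply a 4x4 matrix to a 3D point (w = 1) and return the 3D result."""
    x, y, z = v
    out = [m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3] for i in range(4)]
    return tuple(c / out[3] for c in out[:3])


# Model-space vertices stay untouched; one world matrix is built per object per frame.
model_vertices = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
world = mat_mul(translation(5, 0, 0), rotation_y(90))  # rotate first, then translate
world_vertices = [transform(world, v) for v in model_vertices]
```

So rotating the model only changes a handful of matrix entries on the engine side; the per-vertex work is the renderer's job, and GPUs are built to do it for millions of vertices per frame.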
So the point is, in most cases, the engine really has nothing to do with the rotation of the model/vertices. It is simply a way to manage the model and the various settings that apply to it.
Of course, the engine can rotate the model and modify its vertices, but this is usually done only during loading, for example if the model needs to be converted between different coordinate spaces.
There is a lot more going on, and this is a very basic description of what actually happens. There are many many sources that describe this process in great detail, so I won't even try to duplicate it. Hopefully this gives you enough detail to understand the basics.
I'm working on a computer vision project (OpenCV 2.4 in C++). In this project I'm trying to detect certain features to build a map (an internal representation) of the surrounding world.
The information I have available is the camera pose (a 6D vector with 3 position and 3 angular values), calibration values (focal length, distortion, etc.) and the features detected on the object being tracked (these features are basically the contour of the object, but that doesn't really matter).
Since the camera pose, the position of the features and other variables are subject to errors, I want to model the object as a 3D probability density function (giving the probability of finding the "object" at a given 3D point in space). This is important because each contour has an associated probability of being an actual object contour rather than a noise contour (bear with me).
Example:
If the object were a sphere, I would detect a circle (contour). Since I know the camera pose but have no depth information, the internal representation of that object should be a fuzzy cylinder (or a cone, if the camera's perspective is taken into account, but that's not relevant here). When new information is available (new images from a different location), a new contour would be detected, and its own fuzzy cylinder would be merged with the previous data. Now we have a region where the probability of finding the object is higher in some areas and lower elsewhere. As new information arrives, the model should converge to the original object's shape.
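The "fuzzy cylinder/cone" for one contour point comes from back-projecting its pixel into a ray. This sketch assumes an ideal pinhole camera, an already undistorted pixel (e.g. after cv::undistortPoints in OpenCV), R as the camera-to-world rotation and C as the camera centre in world coordinates; conventions vary (OpenCV's solvePnP, for instance, returns the world-to-camera transform, which would need inverting). The function names are hypothetical.

```python
import math


def pixel_ray(u, v, fx, fy, cx, cy, R, C):
    """World-space ray (origin, unit direction) through undistorted pixel (u, v).

    R: 3x3 camera-to-world rotation (list of rows); C: camera centre in world coordinates.
    """
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)  # pinhole back-projection at z = 1
    d = [sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    n = math.sqrt(sum(x * x for x in d))
    return C, tuple(x / n for x in d)


def point_on_ray(origin, direction, s):
    """Candidate 3D position of the contour point at distance s along the ray."""
    return tuple(o + s * d for o, d in zip(origin, direction))


I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
origin, direction = pixel_ray(820, 240, 500, 500, 320, 240, I3, (0.0, 0.0, 0.0))
```

Every distance s is a candidate for the unknown depth; spreading probability around this ray according to the pose and feature uncertainty gives the fuzzy cylinder or cone described above.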
I hope the idea is clear now.
This model should be able to:
Grow dynamically if needed.
Update efficiently as new observations are made (strengthening the probability in areas observed multiple times and weakening it elsewhere). Ideally the system should be able to update in real time.
Now the question:
How can I represent this kind of fuzzy information computationally, in such a way that I can perform these tasks on it?
Any suitable algorithm, data structure, c++ library or tool would help.
I'll answer with the computer vision equivalent of Monty Python: "SLAM, SLAM, SLAM, SLAM!" :-) (SLAM: simultaneous localization and mapping.) I'd suggest starting with Sebastian Thrun's tome.
However, there's older work on the Bayesian side of active computer vision that's directly relevant to your question of geometry estimation, e.g. Whaite and Ferrie's seminal IEEE paper on uncertainty modeling (Whaite, P. and Ferrie, F. (1991). From uncertainty to visual exploration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10):1038–1049.). For a more general (and perhaps mathematically neater) view on this subject, see also chapter 4 of D.J.C. MacKay's Ph.D. thesis.
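One representation common in the probabilistic mapping literature that fits the stated requirements (grows dynamically, cheap incremental updates) is a sparse log-odds occupancy grid. This is a sketch, not the method of the works cited above: it assumes a 0.5 prior, treats observations as independent, and uses a hypothetical `p_hit` per observation, which could come from the contour's "object vs. noise" probability. Repeated evidence adds up; evidence below 0.5 weakens a voxel.

```python
import math


class LogOddsGrid:
    """Sparse 3D grid of occupancy log-odds; voxels are created only when observed."""

    def __init__(self, voxel_size, clamp=10.0):
        self.voxel_size = voxel_size
        self.clamp = clamp   # bound on |log-odds| so the map can still change later
        self.cells = {}      # (i, j, k) -> log-odds; a missing key means prior p = 0.5

    def _key(self, point):
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def update(self, point, p_hit):
        """Fuse one observation saying `point` is occupied with probability p_hit (0 < p_hit < 1)."""
        k = self._key(point)
        l = self.cells.get(k, 0.0) + math.log(p_hit / (1.0 - p_hit))
        self.cells[k] = max(-self.clamp, min(self.clamp, l))

    def probability(self, point):
        l = self.cells.get(self._key(point), 0.0)
        return 1.0 / (1.0 + math.exp(-l))


grid = LogOddsGrid(voxel_size=0.1)
for _ in range(2):
    grid.update((0.05, 0.05, 0.05), p_hit=0.7)   # seen as object twice
grid.update((1.0, 0.0, 0.0), p_hit=0.3)          # weak, noise-like evidence
```

Each update is a dictionary lookup and an addition, which is what makes real-time fusion feasible; in practice you would update every voxel along the back-projected ray (or inside its uncertainty cone) for each contour point.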
How hard would it be to take an image of an object (in this case a predefined object) and develop an algorithm to cut just that object out of a photo with a background of varying complexity?
Further to this, the photo's object (say a house, car or dog, but always of one type) would need to be transformed into a 3D render. I know there are 3D rendering engines available (paid, free, or with some licensing clause), but for this to work the object (subject) would need to be measured in all sorts of ways; e.g. if it is a person, we need to measure the height, the curvature of the shoulders, the radius of the face, the length of each finger, etc.
How feasible would solving this problem be? Does anyone know any good links specializing in this research area? I've seen open-source solutions to this problem, which leaves me with the question of how easy it is to measure the object while tracing around it to crop it out.
Thanks
Essentially, I want to take a 2D image (a typical image, which is easier than a complex photo containing multiple objects, etc.) and effectively turn it into a 3D image. So wouldn't what I want to do involve building a 3D rendering/modelling engine?
Furthermore, the link I provided goes into 3ds Max, where a few properties are set and a render is made.
It sounds like you want to do several things, all in the domain of computer vision.
Object Recognition (i.e. find the predefined object)
3D Reconstruction (make the 3d model from the image)
Image Segmentation (cut out just the object you are worried about from the background)
I've ranked them in order from easiest to hardest (according to my limited understanding). Altogether, I would say it is a very complicated problem. I would look at the following Wikipedia links for more information:
Computer Vision Overview (Wikipedia)
The Eight Point Algorithm (for 3d reconstruction)
Image Segmentation
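As a taste of the segmentation part, here is region growing, one of the simplest segmentation techniques: starting from a seed pixel on the object, it collects 4-connected pixels whose intensity is within a tolerance of the seed's. This is a sketch assuming a grayscale image as a list of rows and an object that is roughly uniform and distinct from its background; photos with complex backgrounds need much stronger methods (OpenCV's cv::grabCut is one example). The function name and parameters are hypothetical.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Set of (row, col) pixels 4-connected to seed with |I - I(seed)| <= tolerance."""
    h, w = len(image), len(image[0])
    r0, c0 = seed
    ref = image[r0][c0]
    mask = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w and (nr, nc) not in mask
                    and abs(image[nr][nc] - ref) <= tolerance):
                mask.add((nr, nc))
                queue.append((nr, nc))
    return mask


# A bright 2x2 "object" in the top-left corner of a dark 4x4 image.
image = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
object_pixels = region_grow(image, (0, 0), tolerance=20)
```

The resulting mask is exactly what you would use to cut the object out, and its extent gives crude 2D measurements (bounding box, area), though not the 3D ones the question asks about.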
You're right, this is an extremely hard set of problems, particularly inferring 3D information from a 2D image. Only a very limited understanding exists of how our visual system extrapolates 3D information from 2D images. One such approach is known as "shape from shading", and a Google search for it shows how much (and consequently how little) we know.
Rob
This is a very difficult task. The hardest part is not recognising or segmenting the object from the image, but rather inferring the 3-D geometry of the object from the 2-D image. You will have more success if you can use a stereoscopic camera (or a laser scanner, if you have access to one ;).
For the case of 2-D images, try googling for "shape-from-shading". This is a method for inferring 3-D shape from a 2-D image. It does make assumptions about illumination conditions and surface properties (BRDF and geometry) that may fail in many cases, but if you are using it for only a predefined class of objects (e.g. human faces) it can work reasonably well.
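The core constraint that shape-from-shading inverts can be shown with the Lambertian reflectance model, where intensity is albedo times the cosine between the surface normal and the light direction. This sketch only illustrates that constraint (it is not a shape-from-shading solver) and assumes a single distant light, known albedo and a Lambertian surface; the function names are hypothetical.

```python
import math


def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def lambertian_intensity(normal, light, albedo=1.0):
    """Image intensity I = albedo * max(0, n . l) for a Lambertian surface."""
    return albedo * max(0.0, sum(a * b for a, b in zip(unit(normal), unit(light))))


def angle_to_light(intensity, albedo=1.0):
    """Angle between normal and light implied by one intensity value."""
    return math.acos(max(0.0, min(1.0, intensity / albedo)))


light = (0.0, 0.0, 1.0)
i = lambertian_intensity((1.0, 0.0, 1.0), light)
recovered = angle_to_light(i)
```

A single intensity only fixes the angle to the light, so every normal on a cone around the light direction is consistent with it; this ambiguity is why shape-from-shading needs extra assumptions such as surface smoothness and boundary conditions, and why it works best on a known class of objects.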
Assuming it's possible, that would be extremely difficult, especially with only one image of the object. The algorithm has to guess at the depth of the objects and the distances between them.
What you describe sounds very similar to Microsoft PhotoSynth.