Is there a CAD format which allows describing mechanical bonds between objects (axes, joints, springs, etc.) and/or the motion of objects?
I don't have any personal CAD experience, but 3D objects can be defined using edges and nodes/vertices, which reduces the problem to a graph-representation issue. 3D models defined using other techniques (such as composed functions or a filter pipeline, e.g. B-spline > lathe or extrude > transformation > material/texture) will have considerably different means of serialization to file.
Storing motion is a separate issue entirely, and it is also complicated by the many different ways to represent animation and by whether your system uses any kind of skeletal basis or solid-body physics/animation. Then there are keyframed and parameterised animations. Are you using an inverse-kinematics system too?
You need to be more specific about why you need such a CAD format.
Without really knowing what you are after, I think the closest thing to what you are asking for is finite element analysis programs like ANSYS, ABAQUS, etc. They work in a CAD-like environment and allow adding many types of elements to simulate dynamic (motion) analyses and many other types of analysis. They include a huge variety of elements such as linear springs, rotational springs, etc.
Could a CNN tell the difference between different size ranges of the same organism? Like a puppy vs. an adult, or a child vs. an adult? Or more like a large fly vs. a small fly, where they look identical but one is just larger than the other?
This is a tricky question to answer, but in theory a CNN is able to do this. It depends mainly on the training data itself. In the case of a child vs. an adult, you can gather a dataset that includes a lot of variation in sizes and ages in order to make sure that you end up with a CNN model that is able to find patterns and generalize. In the end, the CNN will learn many other features that make the classification scale- or size-invariant (independent of size), such as shapes, colors, clothes, facial features, etc. Such intra-class classification problems are not easily tackled with traditional supervised learning, and therefore some researchers apply an approach called "Deep Metric Learning".
Metric learning is the task of learning a distance function over objects. A metric or distance function has to obey four axioms: non-negativity, identity of indiscernibles, symmetry and subadditivity (or the triangle inequality). In practice, metric learning algorithms ignore the condition of identity of indiscernibles and learn a pseudo-metric. (Wikipedia definition)
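As a rough illustration of the deep metric learning idea, a triplet loss pulls an anchor embedding towards a same-class example and pushes it away from a different-class example. This is just a minimal NumPy sketch under that assumption; the embedding network itself is omitted and the margin value is arbitrary:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on embedding vectors.

    anchor/positive share a class; negative comes from another class.
    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = np.linalg.norm(anchor - positive)   # distance to same-class sample
    d_neg = np.linalg.norm(anchor - negative)   # distance to other-class sample
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings (in practice these would come from a CNN embedding head)
a = np.array([0.1, 0.9])
p = np.array([0.2, 0.8])
n = np.array([0.9, 0.1])
print(triplet_loss(a, p, n))
```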
It would help to separate the two notions you mention in the question: recognizing age and recognizing size are different tasks.
Regarding age: yes, it is doable. For a deep learning-based approach, you will need appropriate data. For a non-training-based approach (old-school image processing), you would need to create some metrics for each object based on age (counting wrinkles, white hairs, etc. for humans).
Regarding size: unfortunately, it is still under research and it is not clear whether it is properly doable or not. Whenever we talk about object size recognition from a single image, there are more things to consider. The first is perspective. If the object found in the image is large relative to the image coordinates, is it close to the camera (so that, even though its actual size is tiny, it appears large), or is it really huge but far away from the camera? Such a problem may be overcome by knowing the object geometry in advance and by developing an algorithm based on that geometry along with deep learning. However, current deep learning technology is not yet accurate enough to distinguish dimensions and location, and hence object geometry, precisely.
Another alternative would be to control the environment. For example, if you know that both objects lie on the same plane in the real world (e.g. on a table, next to each other), the rest is a fairly trivial geometry problem (see the sketch below).
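To illustrate the controlled-environment idea: with a pinhole camera model, an object's apparent height in pixels is roughly focal_length * real_height / depth, so two objects at the same known depth can be compared directly from their pixel heights. A small sketch under those assumptions (the focal length and depth values below are made up):

```python
def apparent_height_px(real_height_m, depth_m, focal_length_px=1000.0):
    """Pinhole-camera approximation of the projected height in pixels."""
    return focal_length_px * real_height_m / depth_m

# Two flies sitting next to each other on the same table (same depth)
depth = 0.5  # metres from the camera, assumed known from the setup
large_fly_px = apparent_height_px(0.012, depth)
small_fly_px = apparent_height_px(0.006, depth)

# At equal depth, the pixel ratio equals the real size ratio
print(large_fly_px / small_fly_px)  # -> 2.0
```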
I have pairs of images (input-output), but I don't know the transformation for going from A (input) to B (output). I want to record image A and get image B. Physically I can change the setup to get A or B, but I want to do it in software.
If I understood correctly, a trained artificial neural network is able to do that: given an input, it can produce the corresponding output. Is that right?
Is there any software/ANN that, just by "training" it with a number of input-output pairs, will be able to provide the correct output when the input is a new (but similar to the others) image?
Thanks
If you have a reasonable number of image pairs (input/output pairs) and you don't know the transformation between input and output, you can train an ANN on that training set to imitate the unknown transformation. You will only be able to train your ANN well if you have a sufficient number of training image pairs, and it can be practically impossible when the unknown transformation is complicated.
For example, if the transformation simply increases the intensity values of the pixels in the input image by a given amount, the ANN will learn to imitate that behavior very quickly; but if the unknown transformation is some complicated convolution, a few convolutions in series, or something even more complex, it will be very hard, nearly impossible, to train the ANN to imitate it. So, a more complex transformation will need a bigger training set and a more complex ANN design.
There are plenty of free, open-source ANN libraries implemented in many languages. You could start, for example, with this tutorial: http://www.codeproject.com/Articles/13091/Artificial-Neural-Networks-made-easy-with-the-FANN
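As a toy sketch of the idea (using scikit-learn's MLPRegressor instead of FANN, purely as an assumption for illustration): generate input/output pairs with a pixel transformation the model never sees explicitly, flatten the images, and fit a regressor.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy 8x8 "images"; the hidden transformation just brightens every pixel by 0.2
X = rng.random((500, 8 * 8))
Y = np.clip(X + 0.2, 0.0, 1.0)

# Small fully-connected network learning the pixel-wise mapping
model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
model.fit(X, Y)

# Apply the learned "transformation" to a new image
new_image = rng.random((1, 8 * 8))
predicted_output = model.predict(new_image)
```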
What you are asking is possible in principle -- in theory, an ANN with sufficiently many hidden units can learn an arbitrary function to map inputs to outputs. However, as the comments and other answers have mentioned, there may be many technical issues with your particular problem that could make it impractical. I would classify these problems as (a) mapping complexity, (b) model complexity, (c) scaling complexity, and (d) implementation complexity. They are all somewhat related, but hopefully this is a useful way to break things down.
Mapping complexity
As mentioned by Springfield762, there are many possible functions that map from one image to another image. If the relationship between your input images and your output images is relatively simple -- like increasing the intensity of each pixel by a constant amount -- then an ANN would be able to learn this mapping without much difficulty. There are probably many more transformations that would be similarly easy to learn, such as skewing, flipping, rotating, or translating an image -- basically any affine transformation would be easy to learn. Other, nonlinear transformations could also be feasible, such as squaring the intensity of each pixel.
As a general rule, the more complicated the relationship between your input and output images, the more difficult it will be to get a model to learn this mapping for you.
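To make the "easy" mappings above concrete, here are a few of them expressed directly in code (a hypothetical illustration; an ANN would have to recover these from data alone):

```python
import numpy as np
from scipy.ndimage import affine_transform

img = np.random.random((64, 64))  # stand-in for a grayscale input image

brighter = np.clip(img + 0.1, 0.0, 1.0)   # constant intensity increase
flipped = img[:, ::-1]                    # horizontal flip
squared = img ** 2                        # simple nonlinear pixel-wise map

# A generic affine transformation: a small rotation about the origin
theta = np.deg2rad(10)
matrix = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
rotated = affine_transform(img, matrix)
```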
Model complexity
The more complex the mapping from inputs to outputs, the more complex your ANN model will need to be to capture it. Models with many hidden layers have been shown in the past 10 years to perform quite well on tasks that people had previously thought impossible, but often these state-of-the-art models have millions or even billions of parameters and take weeks to train on GPU hardware. A simple model can capture many simple mappings, but if you have a complex input-output map to learn, you'll need a large, complex model.
Scaling complexity
Yves mentioned in the comments that it can be difficult to scale models up to typical image sizes. If your images are relatively small (currently the state of the art is to model images on the order of 100x100 pixels), then you can probably just throw a bunch of raw pixel data at an ANN model and see what happens. But if you're using 6000x4000 images from your shiny Nikon DSLR, it's going to be quite difficult to process those in a reasonable amount of time. You'd be better off compressing your image data somehow (PCA is a common technique) and then trying to learn the mapping in the compressed space.
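For the PCA route just mentioned, a minimal scikit-learn sketch (the number of components and image size are arbitrary assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

# Flatten each image into a row vector: (n_images, height * width)
images = np.random.random((200, 100 * 100))

pca = PCA(n_components=50)                # keep 50 principal components
compressed = pca.fit_transform(images)    # learn the mapping in this space

# After predicting in the compressed space, map back to pixel space
reconstructed = pca.inverse_transform(compressed)
```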
In addition, larger images will have a larger space of possible mappings between them, so you'll need more of your larger images as training data than you would if you had small images.
Springfield762 also mentioned this: If the mapping between your input and output images is simple, then you'll only need a few examples to learn the mapping successfully. But if you have a complicated mapping, then you'll need much more training data to have a chance at learning the mapping properly.
Implementation complexity
It's unlikely that a tool already exists that would let you just throw image data into an ANN model and have a mapping appear. Most likely you'll need, at a minimum, to implement some code that will pre-process your image data. In addition, if you have lots of large images you'll probably need to write code to handle loading data from disk, etc. (There are a lot of "big data" tools for things like this, but they all require some amount of work to get set up.)
There are many, many open source ANN toolkits out there nowadays. FANN (already mentioned) is a popular one in C++ with bindings in other languages. Caffe is quite popular, and is also implemented in C++ with bindings. There seem to be many toolkits that use Python and Theano or some other GPU acceleration library -- Keras, Lasagne, Hebel, Pylearn2, neon, and Theanets (I wrote this one). Many people use Torch, written in Lua. Matlab has at least one neural network toolbox. I'm less familiar with other ecosystems, but Java seems to have Deeplearning4j, C# has Accord, and even R has darch.
But with any of these neural network toolkits, you're going to have to write some code to load the data, process it into the appropriate input format, construct (or load) a network model, train the model, etc.
The problem you're trying to solve is a canonical classification problem that neural networks can help you solve. You treat the B images as a set of labels that you match to A, and once trained, the neural network will be able to match the B images to new inputs based on where it locates those inputs in a high-dimensional vector space. I assume you'd use some combination of convolutional layers to create your features, and softmax for multinomial classification on the output layer. More here: http://deeplearning4j.org/convolutionalnets.html
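A minimal sketch of the conv-features-plus-softmax setup described above, written here with Keras as an assumption (layer sizes, input shape and the number of classes are placeholders, not part of the original answer):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 10  # one class per distinct "B" label (assumed)

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(16, 3, activation="relu"),   # convolutional feature extraction
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(num_classes, activation="softmax"),  # multinomial output
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=10)
```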
Since this was written, there has been a lot of work in the realm of cGANs (conditional generative adversarial networks); please refer to:
https://arxiv.org/pdf/1611.07004.pdf
I am trying to detect vehicles. For that I used the SURF/SIFT and BOW with SVM approach, but vehicles come in different types, and I just read that SURF/SIFT is for detecting one specific object, like a USB stick, a mobile phone, etc. Does that also mean it affects the detection of different types of cars, like Toyota vs. BMW, or of different vehicles like trucks vs. cars?
If we provide a large dataset of 10-15 different vehicles to SURF/SIFT, won't it give a decent result for detecting different types of vehicles with a real-time approach?
SURF/SIFT are spatial local features. If the dataset is large enough, the result should be good. Even for different vehicles, only certain characteristic structures of the vehicle need to be visible in a scene image.
However, false positives might creep in if a similar non-vehicle structure is present (e.g. a distorted image of a small rectangular house with a fence). So, some global feature like road detection might increase accuracy.
So, I think SIFT/SURF features of vehicles with a single-class SVM should help, as long as such false positives are not present in the images of your application.
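A rough sketch of the SIFT + bag-of-words + SVM pipeline being discussed (OpenCV for SIFT, scikit-learn for clustering and the SVM; the vocabulary size, file names and labels are hypothetical, and a real setup needs many more training images):

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

sift = cv2.SIFT_create()

def sift_descriptors(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(gray, None)
    return desc

# Hypothetical training image paths and labels (1 = vehicle, 0 = not)
train_paths = ["car1.jpg", "truck1.jpg", "house1.jpg"]
train_labels = [1, 1, 0]

# 1) Build a visual vocabulary by clustering all descriptors
all_desc = np.vstack([sift_descriptors(p) for p in train_paths])
vocab = KMeans(n_clusters=100, random_state=0).fit(all_desc)

# 2) Represent each image as a histogram over the visual words
def bow_histogram(path):
    words = vocab.predict(sift_descriptors(path))
    return np.bincount(words, minlength=100).astype(float)

X = np.array([bow_histogram(p) for p in train_paths])

# 3) Train the SVM on the bag-of-words histograms
clf = SVC(kernel="rbf").fit(X, train_labels)
```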
It seems that the features, and possibly the processing method, are chosen incorrectly. The features should be representative of all classes of cars, and thus cannot be particular interest points whose descriptors represent just some peculiarities of the gradient around the point, with matching that looks for an exact spatial configuration of points relative to other points.
The processing with SVM means you classify cars against all other objects; I am not sure how you are going to get support vectors for "all other objects", though. Much more sensible features are the same ones humans seem to use to detect cars; try the HOG paper, for example, and the detectors that use a mixture of deformable parts. This is closer to the state of the art and has won multiple awards, so there is no need to reinvent the wheel.
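For reference, HOG features are easy to try with scikit-image (the parameter values below are the commonly used defaults, taken here as an assumption; the file name is hypothetical):

```python
from skimage import io, color
from skimage.feature import hog

# Load an image and compute its HOG descriptor
image = color.rgb2gray(io.imread("car.jpg"))
features = hog(image,
               orientations=9,
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# `features` could then be fed to a linear SVM, sliding-window style,
# instead of the SIFT/BOW representation discussed above.
```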
I am simulating a NAO robot that has published physical properties for its links and joints (such as dimensions, link mass, center of mass, mass moment of inertia about that COM, etc).
The upper torso will be static, and I would like to get the lumped physical properties for the static upper torso. I have the math lined out (inertia tensors with rotation and parallel axis theorem), but I am wondering what the best method is to structure the data.
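For reference, the math I have lined out looks roughly like this: rotate each link's inertia tensor into the torso frame, then apply the parallel axis theorem about the combined center of mass. The sketch below is in Python/NumPy purely to illustrate the formulas; the Mathematica version follows the same structure:

```python
import numpy as np

def lump_rigid_bodies(masses, coms, inertias, rotations):
    """Combine several rigid links into one lumped body.

    masses[i]    : scalar mass of link i
    coms[i]      : 3-vector, link COM expressed in the common (torso) frame
    inertias[i]  : 3x3 inertia tensor about the link COM, in the link frame
    rotations[i] : 3x3 rotation from the link frame to the torso frame
    """
    m_total = sum(masses)
    com_total = sum(m * c for m, c in zip(masses, coms)) / m_total

    I_total = np.zeros((3, 3))
    for m, c, I, R in zip(masses, coms, inertias, rotations):
        I_torso = R @ I @ R.T            # rotate tensor into the torso frame
        d = c - com_total                # offset from the lumped COM
        # Parallel axis theorem: shift the tensor to the lumped COM
        I_total += I_torso + m * (np.dot(d, d) * np.eye(3) - np.outer(d, d))
    return m_total, com_total, I_total
```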
Currently, I am just defining everything as rules, a method I got from looking at data Import[]'d from structs in a MAT file. I refer to attributes/properties with strings so that I don't have to worry about symbols being defined. Additionally, it makes it easier to generate the names for the different degrees of freedom.
Here is an example of how I am defining this:
http://pastebin.com/VNBwAVbX
I am also considering using some OOP package for Mathematica, but I am unsure of how to easily define it.
For this task, I had taken a look at Lichtblau's slides but could not figure out a simple way to apply them toward defining my data structure. I ended up using MathOO, which got the job done, since efficiency was not too much of a concern and it was more or less a one-time deal.
I have four sets of algorithms that I want to set up as modules, but I need all algorithms executed at the same time within each module. I'm a complete noob and have no programming experience. I do, however, know how to prove my models are decidable and have already done so (I know applied logic).
The models are sensory parsers. I know how to create the state spaces for the modules, but I don't know how to program driver access to my webcam from Prolog (I have a Toshiba Satellite laptop with a built-in webcam). I also don't know how to link the input from the webcam to the variables in the algorithms I've written. The variables I use, when combined and identified with functions, are meant to identify unknown input using a probabilistic database search for the best match after a breadth-first search. The parsers aren't holistic, which is why I want to run them either in parallel or as needed.
How should I go about this?
"I also don't know how to link the input from the web cam to the variables in the algorithms I've written."
I think the most common way to do this is the machine learning approach: first compute features from your video stream (like the position of color blobs, optical flow, the amount of green in the image, whatever you like). Then you use supervised learning on labeled data to train models like HMMs, SVMs or ANNs to recognize the labels from the features. The labels are usually higher-level things like faces, a smile or waving hands.
Depending on the nature of your "variables", they may already be covered at the feature level, i.e. they can be computed from the data in a known way. If this is the case, you can get away without training/learning.
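As a concrete starting point for the feature-extraction step (not the Prolog side), grabbing a frame from the built-in webcam and computing a simple color-blob feature can be done with OpenCV in Python; the HSV range below is an arbitrary assumption for "green":

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)  # default built-in webcam

ret, frame = cap.read()
if ret:
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Rough HSV range for green pixels (assumed values)
    mask = cv2.inRange(hsv, (40, 40, 40), (80, 255, 255))
    green_fraction = float(np.count_nonzero(mask)) / mask.size

    # Centroid of the green blob, if any; numbers like these could be
    # passed on to the parsing algorithms as feature "variables"
    ys, xs = np.nonzero(mask)
    centroid = (xs.mean(), ys.mean()) if xs.size else None
    print(green_fraction, centroid)

cap.release()
```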