How can I make a neural net give a position in an image?

I understand how to do classification problems and am starting to understand convolutional networks, which I think are the answer to some extent. I'm a bit confused about how to set up a network to give me an output position.
Let's say you have the position of the end point of noses for a data set of faces. To find the end point, do you just do a 'classification'-type problem where your output layer is something like 64x64 = 4096 points, and if the nose is at row 43 and column 20 of your grid, you set the output to all zeros except at element 43*64 + 20 = 2772, where you set it equal to 1? Then you just map it back to your image dimensions.
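To make the idea concrete, the grid encoding described above could be sketched like this (the 64x64 grid size is just the example figure from the question):

```python
import numpy as np

GRID = 64  # 64x64 output grid, as in the example

def encode_position(row, col, grid=GRID):
    # One-hot target: all zeros except the cell containing the nose tip.
    target = np.zeros(grid * grid)
    target[row * grid + col] = 1.0
    return target

def decode_position(target, grid=GRID):
    # Map the largest activation back to (row, col) grid coordinates.
    idx = int(np.argmax(target))
    return idx // grid, idx % grid

t = encode_position(43, 20)   # element 43*64 + 20 = 2772 is set to 1
```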
I can't find much info on how this part of identification works, and this is my best guess. I'm working towards a project at the moment with this methodology, but it is going to be a lot of work and I want to know if I'm at least on the right track. This seems to be a solved problem, but I just can't seem to find how people do it.

Although what you describe could feasibly work, generally neural networks (convolutional and otherwise) are not used to determine the position of a feature in an image. In particular, Convolutional Neural Networks (CNNs) are specifically designed to be translation invariant so that they will detect features regardless of their position in the input image - this is sort of the inverse of what you're looking for.
One common and effective solution for the kind of problem you're describing is a cascade classifier. They have some limitations, but for the kind of application you're describing, it would probably work quite well. In particular, cascade classifiers are designed to provide good performance owing to the staged approach in which most sections of the input image are very quickly dismissed by the first couple stages.
Don't get me wrong, it may be interesting to experiment with using the approach you described; just be aware that it may prove difficult to get it to scale well.
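To illustrate the staged early-rejection idea behind a cascade classifier (the classifiers and thresholds here are toy placeholders, not a real trained cascade):

```python
def cascade_classify(window, stages):
    # stages: (classifier, threshold) pairs, ordered cheapest first.
    for clf, threshold in stages:
        if clf(window) < threshold:
            return False   # early rejection: most windows exit here cheaply
    return True            # survived every stage -> candidate detection

# Toy stages whose "scores" are just scaled copies of a scalar input.
toy_stages = [(lambda w: w, 0.5), (lambda w: 2 * w, 1.5)]
```

Most of the speed comes from the first line of the loop: the vast majority of windows fail an early, cheap stage, so the expensive later stages rarely run.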

Related

How can I isolate and recolor a specific color range?

Given an image of the region containing the lips and other "noise" (teeth, skin), how can we isolate and recolor only the lips (simulating a "lipstick" effect)?
Attached is a photo describing the lips/mouth states.
What we have tried so far is a three-part process:
Color matching the lips using a stable point on the lips (provided by internal API).
Use this color as the base color for the lips isolation.
Recolor the lips (lipstick behavior)
We tried a few algorithms such as hue difference, HSV difference, and ΔE after converting to the CIE Lab color space. Unfortunately, nothing has panned out; everything produced artifacts due to the skin's relative similarity in color to the lips and the discoloration from shadows cast by the nose and mouth.
What are we missing? Is there a better way to approach it?
We are looking for a solution/direction based on a classic Computer Vision color algorithm, not a solution from the Machine Learning/Deep Learning domain. Thanks!
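For reference, the ΔE comparison step mentioned above might look like this minimal sketch (CIE76 distance, assuming the pixels have already been converted to Lab; the reference color and threshold below are made up):

```python
import numpy as np

def delta_e76(lab_image, lab_ref):
    # Euclidean distance in CIE Lab (Delta E 1976) between each pixel
    # and a reference lip color; small values mean "close to lip color".
    return np.sqrt(((lab_image - np.asarray(lab_ref)) ** 2).sum(axis=-1))

lab = np.array([[[40.0, 55.0, 30.0],      # toy 1x2 "image" in Lab space
                 [70.0, 10.0, 15.0]]])
lip_mask = delta_e76(lab, (42.0, 52.0, 28.0)) < 10.0   # arbitrary threshold
```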
You probably won't like this answer, but your question is ill-posed (there is no measurably best solution, there are only people's opinions).
In this case, the best answer you can hope for then is usually:
Ask an expert for a large set of examples that would be acceptable in practice.
Your problem can easily be solved by an appropriate artist (whom you trust to produce usable results) with access to the right tools (for example, Photoshop), but a single artist (or even a group of them) can't possibly scale to millions (or whatever large number you care about) of examples.
To address the short-coming of the artist-based solution, you can use the following strategy:
Collect a sufficiently large set of before and after images created by artists, who you deem trustworthy.
Apply your favorite machine learning algorithm to learn a mapping from the before to the after images. There are many possible choices, and it almost doesn't matter which you choose as long as you know how to use it well.
Note that the above two steps are usually not one-and-done. In practice, once the product is in use you will come across pathological or badly behaved examples that your ML solution handles poorly. The key is to collect these examples, pass them through the artist, and retrain or update your ML model. Repeat this enough times and you will produce a state-of-the-art solution to your problem.
Whether you have the funding, time, motivation and resources to accomplish this is another matter.
You could also try semantic segmentation techniques; they should give you very good results and generalize well.

When should these methods be used to calculate blob orientation?

In image processing, each of the following methods can be used to get the orientation of a blob region:
Using second order central moments
Using PCA to find the axis
Using distance transform to get the skeleton and axis
Other techniques, like fitting the contour of the region with an ellipse.
When should I consider using a specific method? How do they compare, in terms of accuracy and performance?
I'll give you a vague general answer, and I'm sure others will give you more details. This issue comes up all the time in image processing: there are N ways to solve my problem, so which one should I use? The answer is: start with the simplest one that you understand the best. For most people, that's probably 1 or 2 in your list. In most cases, they will be nearly identical and sufficient. If for some reason the techniques don't work on your data, you have now learned for yourself a case where the techniques fail. Now you need to start exploring other techniques. This is where the hard work of being an image processing practitioner comes in. There are no silver bullets; there's a grab bag of techniques that work in specific contexts, which you have to learn and figure out. When you learn this for yourself, you will become god-like among your peers.
For this specific example, if your data is roughly ellipsoidal, all these techniques will give similar results. As your data moves away from ellipsoidal (say, spider-like), the PCA / second-order moments / contour methods will start to give poor results. The skeleton approaches are more robust, but mapping a complex skeleton to a single axis / orientation can become a very difficult problem, and may require more a priori knowledge about the blob.
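As a concrete sketch of method 1 (second-order central moments; for a binary blob this gives the same principal axis as PCA on the pixel coordinates, method 2):

```python
import numpy as np

def blob_orientation(mask):
    # Orientation of the blob's major axis from second-order central
    # moments, measured from the x-axis in image coordinates (radians).
    ys, xs = np.nonzero(mask)
    x0, y0 = xs.mean(), ys.mean()
    mu20 = ((xs - x0) ** 2).mean()
    mu02 = ((ys - y0) ** 2).mean()
    mu11 = ((xs - x0) * (ys - y0)).mean()
    return 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
```

For a diagonal line of pixels (`np.eye(50)` as a mask), this returns pi/4, i.e. 45 degrees in image coordinates.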

How to choose inputs to an artificial network?

I have to implement, as part of an assignment, a learning method that enables minesweepers to avoid colliding with mines. I have been given the choice between a supervised, unsupervised, or reinforcement learning algorithm.
I remember that in one of my lectures on artificial neural networks, the lecturer mentioned ALVINN.
Since the behaviour I'm looking for is more or less similar to ALVINN's, I want to implement an ANN. I've implemented an ANN before for solving the 3-parity XOR problem; here's my solution. I've never really understood the intuition behind ANNs.
I was wondering: what could the inputs to my ANN be? In the case of the 3-parity XOR problem it was obvious.
When it comes to frameworks for ANNs, each person will have their own preferences. I recently used the Encog framework for an image processing project and found it very easy to work with.
Now, coming to your problem statement: "a learning method that enables minesweepers to avoid colliding with mines" has a very wide scope. What exactly is going to be the input to your ANN? You will have to decide your input based on whether it is going to be implemented on a real robot or in a simulation environment.
It can be clearly inferred that unsupervised learning can be ruled out if you are trying to implement something like ALVINN.
In a simulation environment, the best option is if you can somehow form a grid map of the environment based on the simulated sensor data. Then the occupancy grid surrounding the robot can form a good input to the robot's ANN.
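A minimal sketch of that occupancy-grid input (the grid, radius, and cell values are illustrative; 0 = free, 1 = occupied):

```python
import numpy as np

def local_grid_input(occupancy, robot_r, robot_c, radius=2):
    # Crop the (2*radius+1)^2 occupancy cells around the robot and
    # flatten them into an ANN input vector.
    window = occupancy[robot_r - radius: robot_r + radius + 1,
                       robot_c - radius: robot_c + radius + 1]
    return window.astype(float).ravel()   # 25 inputs for radius=2

grid = np.zeros((10, 10))
grid[4, 5] = 1.0                         # one occupied cell (a mine)
inputs = local_grid_input(grid, 4, 4)    # robot at row 4, col 4
```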
If you can't form a grid map (if the data is insufficient), then you should try to feed all the available and relevant sensor data to the ANN. However, it might have to be pre-processed, depending on the sensor noise modelled by your simulation environment. If you have a camera feed (like the ALVINN model), then you may directly follow in their footsteps and train your ANN likewise.
If this is a real robot, then the choices vary considerably, depending upon the robustness and accuracy requirements. I really hope you do not want to build a robust, field-ready minesweeper single-handedly. :) For a small, controlled environment, your options will be very similar to those for a simulated environment; however, sensor noise would be nastier and you would have to factor various special cases into your mission planner. Still, it would be advisable to fuse a few other sensors (LRF, ultrasound, etc.) with vision sensors and use that as the input to your planner. If nothing else is available, replicate the ALVINN system with only a front camera input.
The ANN training methodology will be similar (if using only vision). The output will be right/left/straight, etc. Try 5-7 hidden-layer nodes first, since that is what ALVINN uses; increase up to 8-10 at most. That should work. Choose your activation functions carefully.
Given its success in the real world, ALVINN seems like a good system to base yours on! As the page you linked to discusses, ALVINN essentially receives an image of the road ahead as its input. On a low level, this is achieved through 960 input nodes representing a 30x32 pixel image. The input value for each node is the color saturation of the pixel that node represents (with 0 being a completely white image and 1 being a completely black image, or something along those lines). I'm pretty sure the picture is greyscale, although maybe they're using color now, which could be achieved, for instance, by using three input nodes per pixel: one representing red saturation, one green, and one blue. Is there a reason you don't think this would be a good input for your system too?
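A minimal sketch of that input layer, assuming an 8-bit greyscale image (the exact normalization in the real ALVINN papers may differ):

```python
import numpy as np

def image_to_input_vector(gray_image):
    # Flatten a 30x32 greyscale patch into 960 input activations in
    # [0, 1], one per pixel, roughly as ALVINN feeds the road image in.
    assert gray_image.shape == (30, 32)
    return gray_image.astype(float).ravel() / 255.0

v = image_to_input_vector(np.zeros((30, 32), dtype=np.uint8))
```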
For more low level details, see the original paper.

Matlab - distinguish overlapping low contrast objects in a RGB or Grayscale Image

I have a big problem detecting objects within an image. I know this topic has already been discussed extensively in many forums, but I spent the last 4 days searching for an answer and was not able to find one.
In fact: I have a picture of a branch (http://cl.ly/image/343Y193b2m1c). My goal is to count every single needle in this picture. So I have to face several problems:
Separate the branch with its needles from the background (which in this case is no problem).
Select the borders of the needles. This is a huge problem; I tried different ways, including all the edge() functions, but the problem is always the same: the borders around the needles are not closed, which leads to the last problem:
Needles are overlapping! This results in "squares" between the needles which, if I use imfill() or a similar function, are filled in instead of the needles. And the places where the needles are concentrated (many needles in one place) are nearly impossible to distinguish.
I tried watershed, enhancing the contrast, k-means clustering, and imerode, imdilate and related functions with subsequent edge detection. I also tried filtering and smoothing the picture a bit in order to "unsharpen" the needles, so that not every small change in color is recognized as a border (which is another problem).
I am relatively new to MATLAB, so I don't know what to look for. I tried to follow the MATLAB tutorial used for nuclei detection, but with that I can only get all the green objects (all needles at once).
I hope this question did not come up before; if it did, I apologize for the double post. If anybody has an idea of what to do or what methods to use, it would be awesome and would save this really bad beginning of the week.
Thank you very much in advance,
Phillip
Distinguishing overlapping objects is very, very hard, particularly if you do not know how many objects you have to distinguish. Your brain is much better at distinguishing overlapping objects than any segmentation algorithm I'm aware of, since it is able to integrate a lot of information that is difficult to encode. Therefore: If you're not able to distinguish some of the features yourself, forget about doing it via code.
Having said that, there may be a way for you to be able to get an approximate count of the needles: If you can segment the image pixels into two classes: "needle" versus "not needle", and you know how much area in your picture is covered by a needle (it may help to include a ruler when you take the picture), you can then divide number of "needle"-pixels by the number of pixels covered by a single needle to estimate the total number of needles in the image. This will somewhat underestimate the needle count due to overlaps, and it will underestimate more the denser the needles are (due to more overlaps), but it should allow you to compare automatically between branches with lots of needles and branches with few needles, as well as to identify changes in time, should that be one of your goals.
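The arithmetic of that estimate, with made-up numbers (both counts would come from your own segmentation and ruler calibration):

```python
# Area-based needle-count estimate; all numbers here are illustrative.
needle_pixels = 120_000     # pixels classified as "needle" in the whole mask
pixels_per_needle = 800     # area of one isolated needle (via the ruler)

# Overlaps hide area, so this is a lower bound on the true count.
estimated_count = needle_pixels / pixels_per_needle
```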
I agree with @Jonas: you've got yourself one HUGE problem.
Let me make a few suggestions.
First, along @Jonas' direction: instead of getting an accurate count, another way of getting a rough estimate is to count the tips of the needles. Obviously, not all the tips are clearly visible. But if you can get a clean mask of the branch, it might be relatively easy to identify the tips of the needles using some of the morphological operations you mentioned yourself.
Second, is there any way you can get more information? For example, if you could have depth information it might help a little in distinguishing the needles from one another (it will not completely solve the task but it may help). You may get depth information from stereo - that is, taking two pictures of the branch while moving the camera a bit. If you have a Kinect device at your disposal (or some other range-camera) you can get a depth map directly...

Help to learn Image Search algorithm

I am a beginner in image processing. I want to write an application in C++ or in C# for
Searching an image in a list of images
Searching for a particular feature (for e.g. face) in a list of images.
Can anybody suggest where I should start?
What should I learn before doing this?
Where can I find the right information regarding this?
In terms of the second one, you should start off with learning how to solve the decision problem of whether a square patch contains a face (or whatever kind of object you are interested in). For that, I suggest you study a little bit of machine learning, the AdaBoost algorithm, Haar features, and Viola-Jones.
Once you know how to do that, the trick is really just to take a sliding window across your image, feeding the contents of that window into your detector. Then you shrink your main input image and repeat the process until your input image has gotten smaller than the minimum size input for your detector. There are, of course, several clever ways to parallelize the computation and speed it up, but the binary detector is really the interesting part of the process.
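The sliding-window-plus-shrinking loop could be sketched like this (the detector here is a mean-intensity placeholder, not a real trained cascade, and a real pyramid would use a finer scale step than halving):

```python
import numpy as np

def detect_multiscale(image, detector, win=24, step=4):
    # Slide a win x win window over the image, then halve the image and
    # repeat until it is smaller than the detector's minimum input size.
    detections = []
    scale = 1
    img = image
    while min(img.shape[:2]) >= win:
        h, w = img.shape[:2]
        for y in range(0, h - win + 1, step):
            for x in range(0, w - win + 1, step):
                if detector(img[y:y + win, x:x + win]):
                    # Map back to coordinates in the original image.
                    detections.append((x * scale, y * scale, win * scale))
        img = img[::2, ::2]   # crude pyramid level: half resolution
        scale *= 2
    return detections

# Toy usage: find a bright square on a dark background.
img = np.zeros((64, 64))
img[8:32, 8:32] = 255.0
hits = detect_multiscale(img, lambda patch: patch.mean() > 200)
```

A real implementation would also merge the overlapping hits (non-maximum suppression), since nearby windows fire on the same object.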
You may find some of the material linked from the CSE 517: Machine Learning - Syllabus helpful in getting into machine learning and understanding AdaBoost. You will certainly find the Viola-Jones paper of interest.