I'm new to computer vision and I hope you can help me with some fundamental questions regarding CNN architectures.
I know some of the most well-known ones are:
VGGNet
ResNet
DenseNet
Inception
Xception
They usually expect input images of around 224x224x3, and I have also seen 32x32x3.
Regarding my specific problem, my goal is to train on biomedical images of size 80x80 for a 4-class classification, so at the end I'll have a dense layer of 4 units. Also, my dataset is quite small (1000 images), so I wanted to use transfer learning.
Could you please help me with the following questions? It seems to me that there is no single correct answer to them, but I need to understand the correct way of thinking about them. I would appreciate any pointers as well.
Should I upscale my images, e.g. to 224x224? Or go the other way and shrink them to 32x32 inputs?
Should I change the input size of the CNNs to 80x80? Which parameters would I mainly need to change? Is there a specific ratio to keep between the kernel sizes and the other parameters?
There is also another issue: the input requires 3 channels (RGB), but I'm working with grayscale images. Will that change the results much?
Instead of scaling, should I just pad the surroundings (between 80x80 and 224x224) with background? Should the images be centered in that case?
Do you have any recommendations regarding what architecture to choose?
I've seen some adaptations of these architectures to 3D/volume inputs instead of 2D/images. I have a problem similar to the one I described here but with 3D inputs. Is there any common reasoning when choosing a 3D CNN architecture instead of a 2D one?
Thanks in advance!
I am assuming you have basic know-how in using CNNs for classification.
Answering questions 1-3:
You scale your images for several reasons. The smaller the image, the faster the training and inference. However, you lose important information in the process of shrinking an image. There is no single right answer; it all depends on your application. Is real-time processing important? If the answer is no, always stick to the original size.
You will also need to resize your images to fit the input size of predefined models if you plan to retrain them. However, since your images are grayscale, you will need to either find models trained on grayscale data or create a 3-channel image by copying the same value into the R, G and B channels. This is not efficient, but it lets you reuse the high-quality models trained by others.
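As a minimal sketch of that resize-and-stack step (assuming OpenCV and NumPy; 224x224 is just the common ImageNet input size, and the file name is a placeholder):

```python
import cv2
import numpy as np

def to_rgb_input(gray_img, target_size=(224, 224)):
    """Resize a single-channel image and copy it into 3 identical channels."""
    resized = cv2.resize(gray_img, target_size, interpolation=cv2.INTER_LINEAR)
    # Repeat the same values into R, G and B so pretrained RGB models accept it
    return np.repeat(resized[:, :, np.newaxis], 3, axis=2)

# Usage (placeholder file name):
# gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
# x = to_rgb_input(gray)
```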
The best way I see for you to handle this problem is to train everything from scratch. 1000 images can seem like a small amount of data, but since your domain is specific and only requires 4 classes, training from scratch doesn't seem that bad.
Question 4
When the size is different, always scale. Padding the surroundings will cause the model to learn the empty space, and that is not what we want.
Also make sure the input size and format during inference is the same as the input size and format during training.
Question 5
If processing time is not a problem, ResNet. If processing time is important, then MobileNet.
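As a hedged sketch of what that looks like in practice (assuming TensorFlow/Keras with ImageNet weights; the choice of ResNet50 and the hyperparameters are illustrative, and MobileNet can be swapped in the same way), this loads a pretrained backbone with an 80x80x3 input and a 4-unit softmax head:

```python
import tensorflow as tf

# Pretrained backbone without its ImageNet classification head; most Keras
# applications accept custom input sizes above a small minimum (32x32 for ResNet50).
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(80, 80, 3))
base.trainable = False  # freeze the backbone for the first round of transfer learning

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),  # 4-class output
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```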
Question 6
Depends on your input data. If you have 3D data, then you can use it. More input data usually helps with classification, but 2D will be enough to solve certain problems. If you can classify the images by looking at the 2D images, most probably 2D images will be enough to complete the task.
I hope this clears up some of your problems and points you toward a proper solution.
I am studying neural networks, or more specifically image classification at the moment. While I was reading, I was wondering if the following has ever been done/ is doable. If anybody could point me to some sources or ideas, I'd appreciate it!
In a traditional neural network, you have a training data set of images and the weights of the neurons in the network. The goal is to optimize the weights so that classification is accurate on the training data and generalizes as well as possible to new images.
I was wondering if you could reverse this:
Given a neural network and the weights of its neurons, generate a set of images corresponding to the classes that the network separates, i.e., a prototype of the kinds of images this specific network is able to classify well.
In my mind it would work as follows (I'm sure this is not quite achievable, but just to get the idea across):
Imagine a neural network that is able to classify images containing labels cat, dog and neither of those.
What I want is the "inverse", i.e. an image of a cat, an image of a dog and one that is "furthest away" from the other two classes.
I think that this could be done by generating images and minimizing the loss function for one specific class, while maximizing it for all other classes at the same time.
Is this kind of how Google Deep Dream visualizes what it is "dreaming"?
I hope it is clear what I mean, if not I will answer any questions.
Is this kind of how Google Deep Dream visualizes what it is "dreaming"?
Pretty much, it seems, at least that's how the people behind it explain it:
One way to visualize what goes on [in a neural network layer] is to turn the network upside down and ask it to enhance an input image in such a way as to elicit a particular interpretation. Say you want to know what sort of image would result in “Banana.” Start with an image full of random noise, then gradually tweak the image towards what the neural net considers a banana (see related work [...]). By itself, that doesn’t work very well, but it does if we impose a prior constraint that the image should have similar statistics to natural images, such as neighboring pixels needing to be correlated.
Source - The whole blog post is worth reading.
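To make that concrete, here is a rough sketch of the "start from noise and tweak towards a class" idea, assuming TensorFlow, a trained classifier called `model`, and a 224x224 input size; the total-variation penalty is only a crude stand-in for the "similar statistics to natural images" prior mentioned in the quote:

```python
import tensorflow as tf

def visualize_class(model, class_idx, steps=200, lr=0.1, smooth_weight=1e-3):
    """Gradient ascent on the input image to maximize one class score."""
    img = tf.Variable(tf.random.uniform((1, 224, 224, 3)))  # start from random noise
    for _ in range(steps):
        with tf.GradientTape() as tape:
            score = model(img)[0, class_idx]
            # crude prior: penalize differences between neighbouring pixels
            smoothness = tf.reduce_mean(tf.image.total_variation(img))
            loss = score - smooth_weight * smoothness
        grads = tape.gradient(loss, img)
        img.assign_add(lr * grads / (tf.norm(grads) + 1e-8))  # normalized ascent step
        img.assign(tf.clip_by_value(img, 0.0, 1.0))           # keep a valid image
    return img[0].numpy()
```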
I think you can understand the mainstream approach from Karpathy's blog:
http://karpathy.github.io/2015/03/30/breaking-convnets/
Normal ConvNet training: "What happens to the score of the correct class when I wiggle this parameter?"
Creating fooling images: "What happens to the score of (whatever class you want) when I wiggle this pixel?"
Fooling the classifier with an image is very close to what you ask. For your goal you need to add some regularization to your loss function to avoid fully misleading results - the image with the absolute minimum loss can be a very distorted picture.
First of all, thanks for reading my question. I'm a beginner in computer vision.
I have read a lot, but I haven't found a solution.
I have an image and I want to detect logo/logos on it.
Also, I have a whole set of images with different logos, each image containing a logo and nothing more.
Can you give me any ideas on how to detect logos in an image when I have a large training set (thousands) of known logos?
It could be done with the SURF or SIFT feature detection algorithms for a few known logos, by matching the given image against all of the others, but I have a huge dataset and can't match against every image.
Trying all the images in the dataset takes far too much time :)
Would any SDK be useful? (It can be for mobile phones or for desktop.)
Or could I combine multiple algorithms for it?
I found an interesting paper about this question using a SIGMA algorithm, but I can't find any description of the algorithm (http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5495345).
I think detecting the features in the images is OK (SIFT, maybe SURF).
But I think the problem is the large number of known images/logos.
I think they should be stored in a special way.
E.g. build a tree somehow from the thousands of known logos, or separate them into groups (a rough sketch of what I mean is below).
Is it possible to do this task?
I appreciate any help.
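For the "store them in a special way" idea: a minimal OpenCV sketch where a FLANN (approximate nearest-neighbour, KD-tree) index is built once over the SIFT descriptors of all known logos, and each new photo then needs only one query against that index instead of a pairwise match with every logo. The file names, the number of logos, and the thresholds are placeholders:

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()

# Offline step: describe every known logo once.
logo_paths = ["logo_0.png", "logo_1.png"]   # in practice: thousands of logo images
descriptors = []
for path in logo_paths:
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    descriptors.append(desc)

# FLANN index (KD-trees) over all logo descriptors.
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
flann.add(descriptors)
flann.train()

# Online step: one approximate nearest-neighbour query per descriptor of the photo.
query = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
_, qdesc = sift.detectAndCompute(query, None)
votes = np.zeros(len(logo_paths))
for pair in flann.knnMatch(qdesc, k=2):
    if len(pair) < 2:
        continue
    best, second = pair
    if best.distance < 0.7 * second.distance:   # Lowe's ratio test
        votes[best.imgIdx] += 1                 # imgIdx = which logo the match came from
print("best candidate logo:", logo_paths[int(votes.argmax())])
```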
The thousands of training images are useful only to test your algorithm; they will not help to analyze a new image.
I did a bit of pattern recognition in the past; I would start this way: look for sharp edges (sharp color transitions too). So an edge filter, plus statistical analysis of the features located in the same corner. The result of the algorithm will be a number that you can use with your training set.
Since you are doing original research, be prepared for long work. If an SDK with a function "ImageHasLogo()" exists, you will find it on Google.
Recently I've been messing about with algorithms on images, partly for fun and partly to keep my programming skills sharp.
I've just implemented a 'nearest-neighbour' algorithm that picks n random pixels in an image, and then converts the colour of each other pixel in the image to the colour of its nearest neighbour in the set of n chosen pixels. The result is a kind of "frosted glass" effect on the image, for a reasonably large value of n (if n is too small then the image gets blocky).
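In code it looks roughly like this (a NumPy/SciPy sketch of the same idea, not my exact implementation):

```python
import numpy as np
from scipy.spatial import cKDTree

def frosted_glass(img, n=500, seed=0):
    """Recolour each pixel with the colour of the nearest of n randomly chosen pixels."""
    h, w = img.shape[:2]
    rng = np.random.default_rng(seed)
    ys = rng.integers(0, h, size=n)
    xs = rng.integers(0, w, size=n)
    tree = cKDTree(np.column_stack([ys, xs]))      # positions of the n chosen pixels
    coords = np.mgrid[0:h, 0:w].reshape(2, -1).T   # (h*w, 2) coordinates of every pixel
    _, nearest = tree.query(coords)                # index of the nearest chosen pixel
    out = img[ys[nearest], xs[nearest]]            # take that pixel's colour
    return out.reshape(img.shape)
```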
I'm just wondering if anyone has any other good/fun algorithms on images that might be interesting to implement?
Tom
This book, Digital Image Processing, is one of the most commonly used books in image processing classes, and it will teach you a lot of basic techniques that will help you understand other algorithms better, like the ones Ants Aasma suggested.
Try making an Andy Warhol print. It's pretty easy in Java. For more ideas, just look at the filters available in GIMP or a similar program.
Marching Squares is a computer vision algorithm. Try using that to convert black and white raster images to object based scenes.
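A hedged starting point, assuming scikit-image, whose `find_contours` is an implementation of marching squares; the circle raster below is just a stand-in input:

```python
import numpy as np
from skimage import measure

# Placeholder binary raster: a filled circle on a black background.
yy, xx = np.mgrid[0:200, 0:200]
raster = ((yy - 100) ** 2 + (xx - 100) ** 2 < 60 ** 2).astype(float)

# Marching squares: trace iso-contours at level 0.5, giving sub-pixel polygons
# that can serve as vector objects in a scene.
contours = measure.find_contours(raster, 0.5)
for contour in contours:
    print(f"object with {len(contour)} boundary points")
```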
Turns the image into a pizza
Take N images, relate them via an MC-Escher-style painting
"Explode" an image from the inside out
Convert the image into single-colour blocks (Piet-style) based on all the colours within.
How about tie-dye algorithm?
Fun to toy with and easy to code filters are:
kaleidoscope
lens
twirl
There are a lot of other filters, but the kaleidoscope in particular gives a lot of bang for the buck. I have made my own graphics editor with lots of filters and am also looking for inspiration.
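The twirl, for instance, is just a coordinate remap; a minimal NumPy/OpenCV sketch (the strength and radius values are arbitrary):

```python
import cv2
import numpy as np

def twirl(img, strength=3.0, radius=None):
    """Rotate each pixel around the centre by an angle that falls off with distance."""
    h, w = img.shape[:2]
    radius = radius or min(h, w) / 2
    cy, cx = h / 2, w / 2
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float32)
    dy, dx = yy - cy, xx - cx
    r = np.sqrt(dx ** 2 + dy ** 2)
    theta = np.arctan2(dy, dx) + strength * np.exp(-r / radius)  # extra twist near centre
    map_x = (cx + r * np.cos(theta)).astype(np.float32)
    map_y = (cy + r * np.sin(theta)).astype(np.float32)
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```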
Instead of coding image filters, I personally would love to code Diffusion Curves, but unfortunately have little time for fun.
If you want to try something more challenging look for SIGGRAPH papers on the web. There are some really nifty image algorithms presented at that conference. Seam carving is one cool example that is reasonably straightforward to implement.
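Seam carving fits in surprisingly little code; here is a rough sketch (gradient-magnitude energy, one vertical seam removed per call), meant as a starting point rather than a polished implementation:

```python
import numpy as np

def remove_vertical_seam(img):
    """Remove the lowest-energy 8-connected vertical seam from an H x W x 3 image."""
    gray = img.mean(axis=2)
    # simple energy: sum of absolute x/y gradients
    energy = np.abs(np.gradient(gray, axis=0)) + np.abs(np.gradient(gray, axis=1))
    h, w = energy.shape

    # dynamic programming: cumulative minimum energy to reach each pixel
    cost = energy.copy()
    for y in range(1, h):
        left = np.concatenate(([np.inf], cost[y - 1, :-1]))
        right = np.concatenate((cost[y - 1, 1:], [np.inf]))
        cost[y] += np.minimum(np.minimum(left, cost[y - 1]), right)

    # backtrack the cheapest seam from bottom to top
    seam = np.zeros(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))

    # drop exactly one pixel per row
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return img[mask].reshape(h, w - 1, 3)
```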
If you want something more challenging, try to complete the symmetry of broken objects.
I am building a web application using .NET 3.5 (ASP.NET, SQL Server, C#, WCF, WF, etc) and I have run into a major design dilemma. This is a uni project btw, but it is 100% up to me what I develop.
I need to design a system whereby I can take an image and automatically crop out a certain object within it, without user input. So for example, cut out the car in a picture of a road. I've given this a lot of thought, and I can't see any feasible method. I guess this thread is to discuss the issues and feasibility of achieving this goal. Eventually, I would get the dimensions of the car (or whatever the object may be), and then pass these into a 3D modelling app (custom) as parameters, to render a 3D model. This last step is a lot more feasible; it's the cropping that is the real issue. I have thought of all sorts of ideas, like getting the colour of the car and then tracing the outline around that colour. So if the car is yellow, for example, find a yellow pixel in the image and trace around it. But this would fail if there are two yellow cars in the photo.
Ideally, I would like the system to be completely automated. But I guess I can't have everything my way. Also, my skills are in what I mentioned above (.NET 3.5, SQL Server, AJAX, web design) as opposed to C++ but I would be open to any solution just to see the feasibility.
I also found this patent: US Patent 7034848 - System and method for automatically cropping graphical images
Thanks
This is one of the problems that needed to be solved to finish the DARPA Grand Challenge. Google video has a great presentation by the project lead from the winning team, where he talks about how they went about their solution, and how some of the other teams approached it. The relevant portion starts around 19:30 of the video, but it's a great talk, and the whole thing is worth a watch. Hopefully it gives you a good starting point for solving your problem.
What you are talking about is an open research problem, or even several research problems. One way to tackle this is image segmentation. If you can safely assume that there is one object of interest in the image, you can try a figure-ground segmentation algorithm. There are many such algorithms, and none of them is perfect. They usually output a segmentation mask: a binary image where the figure is white and the background is black. You would then find the bounding box of the figure and use it to crop. The thing to remember is that none of the existing segmentation algorithms will give you what you want 100% of the time.
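To give a concrete feel for that pipeline, here is a hedged sketch using OpenCV's GrabCut as the figure-ground step; it assumes the object of interest lies roughly inside a given rectangle, and the file name and rectangle are placeholders:

```python
import cv2
import numpy as np

img = cv2.imread("road_with_car.jpg")              # placeholder input photo
mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)

# rough rectangle believed to contain the object (would come from elsewhere)
rect = (50, 50, img.shape[1] - 100, img.shape[0] - 100)
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# binary figure/ground mask, then the bounding box of the figure
figure = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype(np.uint8)
x, y, w, h = cv2.boundingRect(figure)
crop = img[y:y + h, x:x + w]
cv2.imwrite("cropped.png", crop)
```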
Alternatively, if you know ahead of time what specific type of object you need to crop (car, person, motorcycle), then you can try an object detection algorithm. Once again, there are many, and none of them are perfect either. On the other hand, some of them may work better than segmentation if your object of interest is on very cluttered background.
To summarize, if you wish to pursue this, you would have to read a fair number of computer vision papers and try a fair number of different algorithms. You will also increase your chances of success if you constrain your problem domain as much as possible: for example, restrict yourself to a small number of object categories, assume there is only one object of interest in an image, or restrict yourself to certain types of scenes (nature, sea, etc.). Also keep in mind that even the accuracy of state-of-the-art approaches to this type of problem has a lot of room for improvement.
And by the way, the choice of language or platform for this project is by far the least difficult part.
A method often used for face detection in images is through the use of a Haar classifier cascade. A classifier cascade can be trained to detect any objects, not just faces, but the ability of the classifier is highly dependent on the quality of the training data.
This paper by Viola and Jones explains how it works and how it can be optimised.
Although it is C++, you might want to take a look at the image processing libraries provided by the OpenCV project, which include code to both train and use Haar cascades. You will need a set of car and non-car images to train such a system!
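A minimal usage sketch with OpenCV's Python bindings; "cars_cascade.xml" is a placeholder for a cascade you would train yourself or obtain elsewhere:

```python
import cv2

# Placeholder: a cascade file trained on car / non-car images
cascade = cv2.CascadeClassifier("cars_cascade.xml")

img = cv2.imread("road.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# scaleFactor / minNeighbors control the speed vs. false-positive trade-off
detections = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in detections:
    cropped = img[y:y + h, x:x + w]                           # crop each detected object
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)  # or just mark it
```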
Some of the best attempts I've seen at this use a large database of images to help understand the image you have. These days you have Flickr, which is not only a giant corpus of images but is also tagged with meta-information about what each image shows.
Some projects that do this are documented here:
http://blogs.zdnet.com/emergingtech/?p=629
Start with analyzing the images yourself. That way you can formulate the criteria on which to match the car. And you get to define what you cannot match.
If all cars have the same background, for example, it need not be that complex. But your example states a car on a street. There may be parked cars. Should they be recognized?
If you have access to MatLab, you could test your pattern recognition filters with specialized software like PRTools.
When I was studying (a long time ago :) I used Khoros Cantata and found that an edge filter can simplify the image greatly.
But again, first define the conditions on the input. If you don't do that, you will not succeed, because pattern recognition is really hard (think about how long it took to crack captchas).
I did say photo, so this could be a black car against a black background. I did think of specifying the colour of the object, and then, when that colour is found, tracing around it (a high-level explanation). But with a black object against a black background (no contrast, in other words), it would be a very difficult task.
Better still, I've come across several sites with 3d models of cars. I could always use this, stick it into a 3d model, and render it.
A 3D model would be easier to work with, a real world photo much harder. It does suck :(
If I'm reading this right... This is where AI shines.
I think the "simplest" solution would be to use a neural-network based image recognition algorithm. Unless you know that the car will look the exact same in each picture, then that's pretty much the only way.
If it IS exactly the same, then you can just search for the pixel pattern, get the bounding rectangle, and set the image border to the inner boundary of the rectangle.
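If the object really is pixel-identical, this is plain template matching; a short OpenCV sketch (the file names and the threshold are placeholders):

```python
import cv2

scene = cv2.imread("photo.png")
template = cv2.imread("car_template.png")
th, tw = template.shape[:2]

# normalized cross-correlation; the peak of the result map is the best match
result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
if max_val > 0.9:                       # arbitrary confidence threshold
    x, y = max_loc
    crop = scene[y:y + th, x:x + tw]    # bounding rectangle of the match
```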
I think that you will never get good results without a real user telling the program what to do. Think of it this way: how should your program decide when there is more than one interesting object present (for example, two cars)? What if the object you want is actually the mountain in the background? What if nothing of interest is in the picture, and there is thus nothing to select as the object to crop out? Etc., etc.
With that said, if you can make assumptions like: only 1 object will be present, then you can have a go with using image recognition algorithms.
Now that I think of it: I recently attended a lecture about artificial intelligence in robots and in robotic research techniques. Their research was about language interaction, evolution, and language recognition. But in order to do that, they also needed some simple image recognition algorithms to process the perceived environment. One of the tricks they used was to make a 3D plot of the image, where x and y were the normal x and y axes and the z axis was the brightness of that particular point; then they used the same technique for red-green values, and blue-yellow. And lo and behold, they had something (relatively) easy they could use to pick out the objects from the perceived environment.
(I'm terribly sorry, but I can't find a link to the nice charts they had that showed how it all worked).
Anyway, the point is that they were not interested (that much) in image recognition, so they created something that worked well enough; they used something less advanced and thus less time consuming, so it is possible to create something simple for this complex task.
Also, any good image editing program has some kind of magic wand that will select, with the right amount of tweaking, the object of interest you point it at; maybe it's worth your time to look into that as well.
So, it basically will mean that you:
have to make some assumptions, otherwise it will fail terribly
will probably best be served with techniques from AI, and more specifically image recognition
can take a look at paint.NET and their algorithm for their magic wand
try to use the fact that a good photo will have the object of interest somewhere in the middle of the image
.. but I'm not saying that this is the solution to your problem; maybe something simpler can be used.
Oh, and I will continue to look for those links, they hold some really valuable information about this topic, but I can't promise anything.