I'm currently working on my thesis on neural networks, using CIFAR-10 as a reference dataset. I would now like to show some example results in my paper. The problem is that the images in the dataset are 32x32 pixels, so it is really hard to recognize anything in them when printed on paper.
Is there any way to get hold of the original images with higher resolution?
UPDATE: I'm not asking for an image processing algorithm, but for the original images behind CIFAR-10. I need some higher-resolution samples to put in my paper.
I now have the same problem and I just found your question.
It seems that CIFAR was built by labeling the tinyimages dataset, and the authors are kind enough to share the index mapping from CIFAR to tinyimages. The tinyimages dataset contains a metadata file with the URLs of the original images and a toolbox for retrieving any image you wish (e.g. those included in the CIFAR index).
So one may write a mat file which does this and share the results...
They're just small:
The CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset.
You could use Google reverse image search if you're curious.
I have a big database of pictures (say, 1 million 512x512px images) and I want to do the following query in a fast way:
Given a cropped image, find an image from the database that contains it.
(The closest question that I could find in StackOverflow is this one, which I address later in this post)
The following image illustrates what I'm trying to do.
I have the following restrictions:
(I) – The query must be fast. 10⁶ images is a lot, so I don't think I can compare the query image against every image in the database individually.
(II) – I need to work with cropped images, so solutions like simple image hashing won't do it (of course, this does not apply to crop-resistant hashes)
(III) – I don't know the proportion between the area of the queried image and the image that contains it. In the example above, the refrigerant is just a small portion of the original image, but the cat takes a lot of space in the image where it is contained. Although I estimate that the proportion is always between 10%~100%, I don't know the exact amount beforehand (suppose the images in queries are always 512x512px, for example)
I've gathered some information in my research:
Simple image hash matching isn't possible because of (II) (I'm working with cropped parts)
Reddit's RepostSleuthBot (available on GitHub) is an excellent starting point for me: it can efficiently identify whether an image has already been posted. Instead of simply matching hashes, it seems to use the ANNOY algorithm to find similar images (so it can match images with slight modifications in text or brightness, for example). The only problem with this approach is that it isn't well adapted to cropped images. So this addresses (I) but not (II) and (III).
In my StackOverflow searches, the closest thing I found to help in this problem is that if I knew the proportion between the cropped image and the original, I could match it using phase correlation, like this answer says.
This addresses (II), which is awesome, but then I'll have problems with (I) because I'd have to try to match against each image in the database, and it's also infeasible because of (III).
A promising approach would be cropping-resistant image hashing. The paper Efficient Cropping-Resistant Robust Image Hashing (10.1109/ares.2014.85) describes one, but it doesn't seem to perform that well, especially considering that I'm aiming at small crops (10%~100% of the original image) and a huge number of images.
I got stuck after this point. Is there any other algorithm or method I should be aware of? Any pointers would be much appreciated.
I have a set of synthetically noisy images. Example is shown below:
I have also their corresponding clean text images as my ground truth data. Example below:
Both images are 4918 x 5856 pixels. Is this an appropriate size for training a Convolutional Neural Network that will perform image denoising? If not, what should I do: resize or crop? Thanks.
This resolution really is overkill. You can start off with 1/64 of the pixel count (each side divided by 8, roughly 615 x 732), which is already pretty big.
I was facing this problem recently as well. I learned that you need to crop the image into patches, each of about 500x500. Then you need to denoise each patch and put it all together. This usually gets the most accurate results. Let me know if you need anything else!
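The patch approach can be sketched as follows. This is a minimal NumPy illustration, not a full denoising pipeline: the function names `split_into_patches` and `merge_patches` are made up for this example, the tiles do not overlap, and the image is padded by repeating edge pixels so that it divides evenly (all of these are assumptions).

```python
import numpy as np

def split_into_patches(img, size=500):
    """Pad a 2-D image to a multiple of `size`, then cut it into size x size tiles."""
    h, w = img.shape
    pad_h = (-h) % size  # rows needed to reach the next multiple of `size`
    pad_w = (-w) % size  # columns needed to reach the next multiple of `size`
    padded = np.pad(img, ((0, pad_h), (0, pad_w)), mode="edge")
    rows, cols = padded.shape[0] // size, padded.shape[1] // size
    # Result has shape (rows, cols, size, size): one tile per grid cell.
    patches = padded.reshape(rows, size, cols, size).swapaxes(1, 2)
    return patches, (h, w)

def merge_patches(patches, original_shape):
    """Reassemble tiles produced by split_into_patches and crop off the padding."""
    rows, cols, size, _ = patches.shape
    full = patches.swapaxes(1, 2).reshape(rows * size, cols * size)
    h, w = original_shape
    return full[:h, :w]
```

With the 4918 x 5856 images from the question and 500 x 500 patches, this would give a 10 x 12 grid, i.e. 120 patches per image. Each patch would be denoised separately before calling `merge_patches`.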
I'm new to computer vision and I hope you can help me with some fundamental questions regarding CNN architectures.
I know some of the most well-known ones are:
VGG Net
ResNet
Dense Net
Inception Net
Xception Net
They usually need an input of images around 224x224x3 and I also saw 32x32x3.
Regarding my specific problem, my goal is to train on biomedical images of size 80x80 for a 4-class classification, so at the end I'll have a dense layer of 4 units. My dataset is also quite small (1000 images) and I would like to use transfer learning.
Could you please help me with the following questions? It seems to me that there is no single correct answer to them, but I need to understand the right way of thinking about them. I would appreciate it if you could give me some pointers as well.
Should I scale my images up? Or do the opposite and shrink them to 32x32 inputs?
Should I change the input of the CNNs to 80x80? What parameters should I change mainly? Any specific ratio for the kernel and the parameters?
Also I have another problem, the input requires 3 channels (RGB) but I'm working with grayscale images. Will it change the results a lot?
Instead of scaling should I just fill the surroundings (between the 80x80 and 224x224) as background? Should the images be centered in this case?
Do you have any recommendations regarding what architecture to choose?
I've seen some adaptations of these architectures to 3D/volumes inputs instead of 2D/images. I have a similar problem to the one I described here but with 3D inputs. Is there any common reasoning when choosing a 3D CNN architecture instead of a 2D?
Thanks in advance!
I am assuming you have basic know-how in using CNNs for classification.
Answering question 1~3
You scale your images for several purposes. The smaller the image, the faster the training and inference. However, you lose important information when shrinking the image. There is no one right answer; it all depends on your application. Is real-time processing important? If not, stick to the original size.
You will also need to resize your images to fit the input size of predefined models if you plan to retrain them. However, since your images are grayscale, you will need to either find models trained on grayscale images or create a 3-channel image by copying the same values into the R, G and B channels. This is not efficient, but it lets you reuse high-quality models trained by others.
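Copying the grayscale values into three channels is a one-liner in NumPy. A minimal sketch, assuming the image is held as an H x W array (the helper name `gray_to_rgb` is made up for this example):

```python
import numpy as np

def gray_to_rgb(gray):
    """Copy an H x W grayscale image into three identical channels (H x W x 3)."""
    return np.repeat(gray[..., np.newaxis], 3, axis=-1)
```

The result has the channel layout that RGB-pretrained models expect, although all three channels carry the same information.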
The best way I see for you to handle this problem is to train everything from scratch. 1000 images may seem like a small dataset, but since your domain is specific and only requires 4 classes, training from scratch doesn't seem that bad.
Question 4
When the size is different, always scale. Filling the surroundings with background will cause the model to learn the empty space, and that is not what we want.
Also make sure the input size and format during inference is the same as the input size and format during training.
Question 5
If processing time is not a problem, use ResNet. If processing time is important, use MobileNet.
Question 6
It depends on your input data. If you have 3D data, then you can use it. More input data usually helps classification. But 2D will be enough to solve certain problems. If you can classify the images by looking at the 2D images, most probably 2D images will be enough to complete the task.
I hope this will clear some of your problems and direct you to a proper solution.
I'm not really new to MATLAB, just new to this whole Machine Learning thing.
I have to do a simple binary image classification. I don't care if it's a toolbox or just code, I just need to do it. I tried a couple of classification codes I found online on Github or on other sites, but most of them worked randomly and some of them worked for pre-defined images.
Those that worked on pre-defined images were neat (e.g.: http://www.di.ens.fr/willow/events/cvml2011/materials/practical-classification/), but I had issues applying on a new set of images, just because there were some .txt files (vectors of the name of the images, which was easy to replicate) and some .mat files (with both name and histogram).
I had issues creating the name and histogram in the same order, the piece of code that I use is:
for K = 1 : 4
    filename = sprintf('image_%04d.jpg', K);
    I = imread(filename);
    IGray = rgb2gray(I);
    H = hist(IGray(:), 32);
end
save('ImageDatabase.mat', 'I', 'H');
But for one reason or another, only the name and path of the last image remains stored (e.g. in this case, only image_0004 is stored in the name slot).
Another code that I found and it seemed easy was: https://github.com/rich-hart/SVM-Classifier , but the output is really random (for me) so if someone could explain to me what is happening I'd be grateful. There are 19 training images and 20 for test. Yet, if I remove one of the test images, 2 entries disappear from the Support Vector Structure?
Anyway, if you have a toolbox, or a more easy to adapt code or some explanations to the above codes, I'd be grateful.
Cheers!
EDIT:
I tried following the example of this code: http://dipwm.blogspot.ro/2013/01/svm-support-vector-machine-with-matlab.html
And even though I got 30 images of 100x100 I keep getting this error:
Error using svmtrain (line 253)
Y and TRAINING must have the same number of rows.
Error in Untitled (line 74)
SVMStruct = svmtrain(Training_Set , train_label, 'kernel_function', 'linear');
There is no way to train any classifier on raw 100x100 images, when you only have ~40 data points for training, testing and validation. So recommending a Matlab toolbox wouldn't really help your problem.
The answer is: Get more data
For completeness here are two approaches you could try:
Feature extraction
Maybe there are some very obvious features in your pictures (some pictures are darker, have a white corner, etc.) that you can extract before training. With 3-4 features you could try training a classifier on your data set. In this case I would try fitcensemble, as it is very easy to use without understanding the inner workings of the algorithm.
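To make the idea of hand-picked features concrete, here is a minimal NumPy sketch. The three features (overall brightness, contrast, brightness of the top-left corner), the 10-pixel corner size and the name `simple_features` are assumptions chosen only for illustration; which features actually separate the two classes depends on the pictures.

```python
import numpy as np

def simple_features(gray, corner=10):
    """Return [mean brightness, contrast, top-left corner brightness] for a 2-D image."""
    gray = np.asarray(gray, dtype=float)
    return np.array([
        gray.mean(),                    # overall brightness
        gray.std(),                     # contrast
        gray[:corner, :corner].mean(),  # brightness of the top-left corner
    ])
```

Stacking one such vector per image gives a small feature matrix with one row per picture, which is the kind of input a classifier needs.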
Using a pre-trained classifier
You can use GoogLeNet; maybe your pictures fit one of the ImageNet categories. Try transfer learning if your images do not match any category.
Sometimes two image files may be different on a file level, but a human would consider them perceptually identical. Given that, now suppose you have a huge database of images, and you wish to know if a human would think some image X is present in the database or not. If all images had a perceptual hash / fingerprint, then one could hash image X and it would be a simple matter to see if it is in the database or not.
I know there is research around this issue, and some algorithms exist, but is there any tool, like a UNIX command line tool or a library I could use to compute such a hash without implementing some algorithm from scratch?
edit: relevant code from findimagedupes, using ImageMagick
try $image->Sample("160x160!");
try $image->Modulate(saturation=>-100);
try $image->Blur(radius=>3,sigma=>99);
try $image->Normalize();
try $image->Equalize();
try $image->Sample("16x16");
try $image->Threshold();
try $image->Set(magick=>'mono');
($blob) = $image->ImageToBlob();
edit: Warning! ImageMagick $image object seems to contain information about the creation time of an image file that was read in. This means that the blob you get will be different even for the same image, if it was retrieved at a different time. To make sure the fingerprint stays the same, use $image->getImageSignature() as the last step.
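The general shape of such a fingerprint (shrink, threshold, compare bits) can be sketched in NumPy. This is not the findimagedupes algorithm: it skips the blur, normalize and equalize steps, thresholds at the median, and the names `fingerprint` and `hamming` are made up for this example.

```python
import numpy as np

def fingerprint(gray, grid=16):
    """Shrink a 2-D grayscale image to grid x grid block means, then threshold at the median."""
    gray = np.asarray(gray, dtype=float)
    rows = np.array_split(np.arange(gray.shape[0]), grid)
    cols = np.array_split(np.arange(gray.shape[1]), grid)
    small = np.array([[gray[np.ix_(r, c)].mean() for c in cols] for r in rows])
    # One bit per block: is it brighter than the median block?
    return (small > np.median(small)).ravel()

def hamming(f1, f2):
    """Number of differing bits between two fingerprints."""
    return int(np.count_nonzero(f1 != f2))
```

Two images would then be treated as probable duplicates when the Hamming distance between their fingerprints falls below some threshold, which has to be tuned on real data.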
findimagedupes is pretty good. You can run "findimagedupes -v fingerprint images" to make it print the perceptual hash, for example.
Cross-correlation or phase correlation will tell you if the images are the same, even with noise, degradation, and horizontal or vertical offsets. Using the FFT-based methods will make it much faster than the algorithm described in the question.
The usual algorithm doesn't work for images that are not the same scale or rotation, though. You could pre-rotate or pre-scale them, but that's really processor intensive. Apparently you can also do the correlation in a log-polar space and it will be invariant to rotation, translation, and scale, but I don't know the details well enough to explain that.
MATLAB example: Registering an Image Using Normalized Cross-Correlation
Wikipedia calls this "phase correlation" and also describes making it scale- and rotation-invariant:
The method can be extended to determine rotation and scaling differences between two images by first converting the images to log-polar coordinates. Due to properties of the Fourier transform, the rotation and scaling parameters can be determined in a manner invariant to translation.
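For the translation-only case, phase correlation fits in a few lines of NumPy. A minimal sketch, assuming two equally sized grayscale arrays and a circular (wrap-around) shift; the function name is made up for this example, and the log-polar extension for rotation and scale is not covered:

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Estimate (dy, dx) such that b is approximately np.roll(a, (dy, dx), axis=(0, 1))."""
    Fa = np.fft.fft2(a)
    Fb = np.fft.fft2(b)
    cross = Fb * np.conj(Fa)
    # Keep only the phase; the small floor avoids division by zero.
    cross /= np.maximum(np.abs(cross), 1e-12)
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts.
    shift = tuple(int(p) if p <= s // 2 else int(p) - s
                  for p, s in zip(peak, corr.shape))
    return shift, float(corr.max())
```

The returned peak height is close to 1 when the two images really are shifted copies of each other and much lower for unrelated images, so it can double as a match score.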
Colour histogram is good for the same image that has been resized, resampled etc.
If you want to match different people's photos of the same landmark it's trickier - look at Haar classifiers. OpenCV is a great free library for image processing.
I don't know the algorithm behind it, but Microsoft Live Image Search just added this capability. Picasa also has the ability to identify faces in images, and groups faces that look similar. Most of the time, it's the same person.
Some machine learning technology like a support vector machine, neural network, naive Bayes classifier or Bayesian network would be best at this type of problem. I've written one each of the first three to classify handwritten digits, which is essentially image pattern recognition.
Resize both images to 1x1 pixel... if the pixels match, there is a small probability they are the same picture...
Now resize them to 2x2 pixels; if all 4 pixels match, there is a larger probability they are the same...
Then 3x3; if all 9 pixels match... good chance, etc.
Then 4x4; if all 16 pixels match... better chance.
etc...
Doing it this way, you can make efficiency improvements... if the 1x1 pixel grid is off by a lot, why bother checking the 2x2 grid? etc.
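This coarse-to-fine check with early exit can be sketched in NumPy. Some assumptions: the images are 2-D grayscale arrays of the same size, "resizing" is done by averaging blocks, and the tolerance `tol` and maximum grid size are arbitrary values picked for the example (an exact match is rarely useful on real images).

```python
import numpy as np

def shrink(img, n):
    """Average a 2-D grayscale image down to an n x n grid of block means."""
    rows = np.array_split(np.arange(img.shape[0]), n)
    cols = np.array_split(np.arange(img.shape[1]), n)
    return np.array([[img[np.ix_(r, c)].mean() for c in cols] for r in rows])

def probably_same(a, b, max_grid=8, tol=2.0):
    """Compare at 1x1, 2x2, ... and stop at the first grid that differs by more than tol."""
    for n in range(1, max_grid + 1):
        if np.abs(shrink(a, n) - shrink(b, n)).max() > tol:
            return False  # a coarse level already disagrees, no need to go finer
    return True
```

A horizontal and a vertical gradient, for instance, have the same 1x1 average but already differ at 2x2, so the loop stops after two cheap comparisons.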
If you have lots of images, a color histogram could be used to get rough closeness of images before doing a full image comparison of each image against each other one (i.e. O(n^2)).
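A rough version of this histogram pre-filter in NumPy, assuming H x W x 3 uint8 images. The bin count and the use of histogram intersection as the similarity score are choices made for this sketch, and both function names are made up:

```python
import numpy as np

def color_histogram(img, bins=8):
    """Normalized per-channel histogram of an H x W x 3 uint8 image."""
    hist = [np.histogram(img[..., ch], bins=bins, range=(0, 256))[0]
            for ch in range(3)]
    hist = np.concatenate(hist).astype(float)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """1.0 for identical color distributions, lower values for dissimilar ones."""
    return float(np.minimum(h1, h2).sum())
```

Only pairs whose intersection exceeds some threshold would then go on to the expensive full comparison. The histograms themselves are computed once per image, so the costly step is no longer run on all pairs.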
There is DPEG, "The" Duplicate Media Manager, but its code is not open. It's a very old tool - I remember using it in 2003.
You could use diff to see if they are REALLY different. I guess that would eliminate lots of useless comparisons. Then, for the algorithm, I would use a probabilistic approach: what are the chances that they look the same? I'd base that on the amount of RGB in each pixel. You could also use other metrics such as luminosity.