I am working on an image classification problem, and the dataset comes in the following format:
class_1_folder: contains an images_folder and a mask_folder
class_2_folder: contains an images_folder and a mask_folder
I am quite new to the field and would like your advice on how mask folders are generally used in classification tasks.
I did some research but I cannot find an understandable answer.
My thought is that I could use them to improve my classifier's metrics by helping it focus on specific parts of the image. Is this valid? If so, should I consider convolving the image with its mask before feeding it to a classifier?
Any help is much appreciated. Thank you.
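If the masks are used, the usual operation is pixel-wise: masking is normally an element-wise multiplication of the image by a binary mask rather than a convolution. Alternatively, the mask can be given to the network as an extra input channel. The NumPy sketch below assumes each mask has the same height and width as its image and marks the region of interest with non-zero values; `apply_mask` and `stack_mask_as_channel` are hypothetical helper names. Either variant needs a mask at prediction time too, so it only helps if masks will also be available (or predicted) for new images.

```python
import numpy as np

def apply_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out pixels outside the mask (element-wise product, not convolution).

    Assumption: `image` is HxW or HxWxC, `mask` is HxW, and any non-zero
    mask value (1 or 255) means "keep this pixel".
    """
    m = (mask > 0).astype(image.dtype)
    if image.ndim == 3:
        m = m[..., None]  # broadcast the mask over the colour channels
    return image * m

def stack_mask_as_channel(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep the full image and append the binary mask as one extra channel."""
    if image.ndim == 2:
        image = image[..., None]  # HxW -> HxWx1
    m = (mask > 0).astype(image.dtype)[..., None]
    return np.concatenate([image, m], axis=-1)
```

Multiplying discards everything outside the mask, while stacking lets the classifier still see the context around the region; which works better is an empirical question for the dataset at hand.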
I'm new to the computer vision area, and I hope you can help me with some fundamental questions regarding CNN architectures.
I know some of the most well-known ones are:
VGGNet
ResNet
DenseNet
Inception
Xception
They usually expect input images of around 224x224x3, though I have also seen 32x32x3.
Regarding my specific problem, my goal is to train on biomedical images of size 80x80 for 4-class classification; at the end I'll have a dense layer with 4 units. My dataset is also quite small (1000 images), so I wanted to use transfer learning.
Could you please help me with the following questions? It seems to me that there is no single correct answer to them, but I need to understand the right way of thinking about them. I would appreciate any pointers as well.
1) Should I upscale my images? Or, the opposite, shrink them to 32x32 inputs?
2) Should I change the input of the CNNs to 80x80? Which parameters should I mainly change? Is there any specific ratio between the kernel size and the other parameters?
3) Also, the input requires 3 channels (RGB), but I'm working with grayscale images. Will that change the results a lot?
4) Instead of scaling, should I just fill the surroundings (the area between 80x80 and 224x224) with background? Should the images be centered in this case?
5) Do you have any recommendations regarding which architecture to choose?
6) I've seen some adaptations of these architectures to 3D/volume inputs instead of 2D images. I have a similar problem to the one described here, but with 3D inputs. Is there any common reasoning for choosing a 3D CNN architecture over a 2D one?
Thanks in advance!
I am assuming you have basic know-how in using CNNs for classification.
Questions 1-3
You scale your images for several purposes. The smaller the image, the faster the training and inference. However, you will lose important information when shrinking the image. There is no single right answer; it all depends on your application. Is real-time processing important? If not, always stick to the original size.
You will also need to resize your images to fit the input size of predefined models if you plan to retrain them. However, since your images are grayscale, you will need to either find models trained on grayscale images or create a 3-channel image by copying the same values into the R, G and B channels. This is not efficient, but it lets you reuse high-quality models trained by others.
The best approach I see for your problem is to train everything from scratch. 1000 images may seem like a small dataset, but since your domain is specific and only requires 4 classes, training from scratch doesn't seem that bad.
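Copying the grayscale values into all three channels is a one-line array operation. A minimal NumPy sketch, assuming channels-last single images (HxW) and channels-first batches (N, 1, H, W); the helper names are illustrative, not from any library:

```python
import numpy as np

def gray_to_rgb(gray: np.ndarray) -> np.ndarray:
    """Replicate an HxW (or HxWx1) grayscale image into HxWx3 with R = G = B."""
    if gray.ndim == 3 and gray.shape[-1] == 1:
        gray = gray[..., 0]
    return np.stack([gray, gray, gray], axis=-1)

def gray_batch_to_rgb_nchw(batch: np.ndarray) -> np.ndarray:
    """Same idea for a channels-first batch: (N, 1, H, W) -> (N, 3, H, W)."""
    return np.repeat(batch, 3, axis=1)
```

If the pretrained weights expect a particular per-channel normalisation, it would be applied after this replication, exactly as for ordinary RGB inputs.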
Question 4
When the size is different, always scale. Filling in the surroundings will cause the model to learn the empty spaces, which is not what we want.
Also make sure the input size and format during inference are the same as during training.
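One way to guarantee identical input at training and inference time is to route both through a single preprocessing function. The sketch below assumes a 224x224 target size and per-image standardisation; both are illustrative choices, not requirements. It uses a dependency-free nearest-neighbour resize to stay self-contained; in practice bilinear interpolation (for example `cv2.resize(img, (width, height), interpolation=cv2.INTER_LINEAR)`, note that OpenCV takes width first) is the more common choice.

```python
import numpy as np

TARGET_SIZE = (224, 224)  # assumed (height, width) of the pretrained model's input

def resize_nearest(img: np.ndarray, size: tuple) -> np.ndarray:
    """Nearest-neighbour resize of an HxW or HxWxC array to size = (H, W)."""
    out_h, out_w = size
    in_h, in_w = img.shape[:2]
    rows = (np.arange(out_h) * in_h) // out_h  # source row for each output row
    cols = (np.arange(out_w) * in_w) // out_w  # source column for each output column
    return img[rows[:, None], cols[None, :]]

def preprocess(img: np.ndarray) -> np.ndarray:
    """The single preprocessing path used for BOTH training and inference."""
    x = img.astype(np.float32)
    x = resize_nearest(x, TARGET_SIZE)      # scale (not pad) to the model input size
    x = (x - x.mean()) / (x.std() + 1e-8)  # per-image standardisation (one possible choice)
    return x
```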
Question 5
If processing time is not a problem, ResNet. If processing time is important, MobileNet.
Question 6
It depends on your input data. If you have 3D data, you can use it; more input data usually helps classification. But 2D will be enough to solve certain problems: if you can classify the images by looking at the 2D images yourself, 2D images will most probably be enough to complete the task.
I hope this clears up some of your problems and points you to a proper solution.
I have a very low-quality image:
I was wondering if there are any suggestions to help me improve its quality so I can read the number plate in the image (not 100%, but as much as possible) using OpenCV?
Thanks in advance!
Since I am not able to comment, I'll suggest an answer here.
First of all, you should use a noise-removal filter such as a median filter, followed by a Laplacian, Canny, or other edge-detection filter to enhance the edges a little. You should choose suitable parameters for both filters. Remember that this is subjective and varies from person to person, so the parameters depend mainly on you.
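The median-then-Laplacian idea can be sketched in plain NumPy: a 3x3 median filter removes isolated noisy pixels, and subtracting the Laplacian from the smoothed image boosts the contrast across edges. This is a minimal illustration assuming an 8-bit grayscale image; the helper names and the `strength` parameter are hypothetical. In OpenCV the building blocks are `cv2.medianBlur` and `cv2.Laplacian`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median3x3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter with edge padding (removes salt-and-pepper noise)."""
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))  # shape (H, W, 3, 3)
    return np.median(windows, axis=(-2, -1))

def laplacian(img: np.ndarray) -> np.ndarray:
    """4-neighbour Laplacian: up + down + left + right - 4 * centre."""
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * p[1:-1, 1:-1]

def denoise_and_sharpen(img: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Median-filter first, then sharpen by subtracting the Laplacian."""
    smooth = median3x3(img)
    sharpened = smooth - strength * laplacian(smooth)
    return np.clip(sharpened, 0, 255).astype(np.uint8)
```

`strength` is the kind of subjective parameter the answer mentions: larger values give crisper but noisier character edges.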
I'm working on a small program for optical mark recognition.
The processing of the scanned form consists of two steps:
1) Find the form in the scanned image, deskew it and crop the borders.
2) With this "normalized" form, I can simply search the marks by using coordinates from the original document and so on.
For the first step, I'm currently using the homography functions from OpenCV and a perspective transform to map the points. I also tried the SurfDetector.
However, both algorithms are quite slow and do not really meet the speed requirements when scanning forms from a document scanner.
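For reference, the perspective-transform step itself amounts to estimating a 3x3 homography from point correspondences and pushing the template's mark coordinates through it. A minimal NumPy sketch of the four-point direct linear transform (DLT), assuming the four form corners have already been located in the scan (locating them is not shown); `homography_from_4_points` and `map_points` are illustrative names. OpenCV's `cv2.getPerspectiveTransform` solves the same four-point problem.

```python
import numpy as np

def homography_from_4_points(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Solve dst ~ H @ src for a 3x3 H from four (x, y) correspondences (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h1 x + h2 y + h3) / (h7 x + h8 y + h9), and likewise for v.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=np.float64)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)  # the null-space vector of A holds the 9 entries of H
    return H / H[2, 2]

def map_points(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply H to an (N, 2) array of points, including the perspective divide."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

Once H is known, mapping every mark coordinate from the original document is a single matrix product.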
Can anyone point me to an alternative algorithm/solution for this specific problem?
Thanks in advance!
Try with ORB or FAST detector: they should be faster than SURF (documentation here).
If those don't meet your speed requirement, you should probably use a different approach. Do you need scale and rotation invariance? If not, you could try cross-correlation.
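Without scale or rotation, locating a known pattern (such as a corner marker on the form) reduces to sliding the template over the image and scoring each position. A brute-force zero-mean normalised cross-correlation sketch in NumPy, assuming grayscale arrays and a template smaller than the image; `match_template_ncc` is a hypothetical name. OpenCV's optimised equivalent is `cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)`, which will be far faster on full scans.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def match_template_ncc(image: np.ndarray, template: np.ndarray):
    """Return ((row, col), score) of the best zero-mean normalised cross-correlation match."""
    img = image.astype(np.float64)
    tpl = template.astype(np.float64)
    th, tw = tpl.shape
    t = tpl - tpl.mean()
    t_norm = np.sqrt((t ** 2).sum())
    windows = sliding_window_view(img, (th, tw))  # (H-th+1, W-tw+1, th, tw)
    w = windows - windows.mean(axis=(-2, -1), keepdims=True)
    num = (w * t).sum(axis=(-2, -1))
    den = np.sqrt((w ** 2).sum(axis=(-2, -1))) * t_norm
    # Flat windows (zero variance) get a score of 0 instead of dividing by zero.
    scores = np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)
    r, c = np.unravel_index(np.argmax(scores), scores.shape)
    return (int(r), int(c)), float(scores[r, c])
```

A score of 1.0 means a perfect match up to brightness and contrast changes, which is why the normalised form suits scans with varying exposure.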
The Viola-Jones cascade classifier is pretty quick. It is used in OpenCV for face detection, but you can train it for a different purpose. Depending on the appearance of what you call your "form", you can use simpler algorithms such as cross-correlation, as Muffo suggested.
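Much of Viola-Jones' speed comes from the integral image: after one cumulative-sum pass, the sum over any rectangle costs four lookups, so Haar-like features (differences of rectangle sums) are cheap to evaluate at every position. A minimal NumPy sketch of that building block only, not of cascade training (in OpenCV that is done with the `opencv_traincascade` tool, and detection with `cv2.CascadeClassifier.detectMultiScale`); the helper names are illustrative.

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    """ii[y, x] = sum of img[:y, :x]; a leading zero row/column avoids edge cases."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii: np.ndarray, y: int, x: int, h: int, w: int) -> int:
    """Sum of img[y:y+h, x:x+w] in O(1) using four lookups."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

def haar_left_minus_right(ii: np.ndarray, y: int, x: int, h: int, w: int) -> int:
    """Two-rectangle Haar-like feature: left half minus right half (w assumed even)."""
    half = w // 2
    return box_sum(ii, y, x, h, half) - box_sum(ii, y, x + half, h, half)
```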
I would like some help from the aficionados of OpenCV here.
I would like to know which direction to take (and get some advice or a piece of code) on how to morph two faces together with a ratio, say 10% of the first and 90% of the second.
I have seen functions like cvWarpAffine and cvMakeScanlines but I am not sure how to use them.
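For orientation, a weighted face morph is usually built from four steps: find corresponding landmarks in both faces, interpolate the landmark positions with the chosen ratio, warp each face onto the interpolated shape (for example piecewise-affine over a triangulation, which is where an affine warp such as `cvWarpAffine` would be applied per triangle), and finally cross-dissolve the two warped images. The NumPy sketch below covers only the interpolation and cross-dissolve steps and assumes the warping has already been done; the helper names are hypothetical.

```python
import numpy as np

def blend_landmarks(pts_a: np.ndarray, pts_b: np.ndarray, alpha: float) -> np.ndarray:
    """Intermediate (N, 2) landmark positions; alpha is the weight of face A."""
    return alpha * pts_a + (1.0 - alpha) * pts_b

def cross_dissolve(img_a: np.ndarray, img_b: np.ndarray, alpha: float) -> np.ndarray:
    """Pixel blend of two 8-bit images ALREADY warped to the same intermediate shape."""
    out = alpha * img_a.astype(np.float64) + (1.0 - alpha) * img_b.astype(np.float64)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

With alpha = 0.1 the result is 10% of the first face and 90% of the second. Skipping the warp and only cross-dissolving gives the "ghosting" effect of two overlaid faces rather than a true morph.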
So if somebody could help me here, I'll be very grateful.
Thanks in advance.
Unless the images compared are the exact same images, you would not go very far with this.
This is an artificial intelligence problem and needs to be solved as such. Typical solution involves:
Normalising the data (removing noise, skew, ...) from the images
Feature extraction (turn the image into a smaller set of data)
Use machine learning (typically a classifier) trained on your matches
Test the result
Refine previous processes according to the results until you get good recognition
The choice of OpenCV functions used depends on your feature extraction method. Have a look at Eigenface.
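As a rough illustration of the feature-extraction and classification steps with Eigenfaces: PCA turns each face image into a short coefficient vector, and a simple classifier (here 1-nearest-neighbour) works on those vectors. A minimal NumPy sketch, assuming the face images are already normalised (aligned, same size, grayscale); function names are illustrative. OpenCV also ships an `EigenFaceRecognizer` in the opencv-contrib `face` module.

```python
import numpy as np

def fit_eigenfaces(train_imgs: np.ndarray, n_components: int):
    """PCA on flattened images; returns the mean face and the top principal axes."""
    X = train_imgs.reshape(len(train_imgs), -1).astype(np.float64)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]  # each row of vt is one "eigenface"

def project(imgs: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Feature extraction: each image becomes an n_components-long vector."""
    X = imgs.reshape(len(imgs), -1).astype(np.float64)
    return (X - mean) @ components.T

def predict_nearest(train_feats, train_labels, query_feats):
    """1-nearest-neighbour classification in eigenface space."""
    d = ((query_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(axis=-1)
    return train_labels[np.argmin(d, axis=1)]
```

The "test and refine" steps then amount to measuring accuracy on held-out faces and adjusting the normalisation or `n_components`.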
What thresholding technique should I apply to the image in order to highlight the bright regions inside it as well as the outer boundary?
The im2bw function does not give a good result.
Help!!
Edit: Most of my images have the following histogram
Edit: Found a triangle threshold method that suits my work :)
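The triangle method suits histograms with one dominant peak and a long tail: it draws a line from the peak to the far end of the longer tail and picks the grey level lying furthest below that line. A minimal NumPy sketch, assuming an 8-bit (uint8) grayscale image; `triangle_threshold` is an illustrative name, and the exact tie-breaking and side conventions may differ from other implementations (OpenCV exposes the method as the `cv2.THRESH_TRIANGLE` flag of `cv2.threshold`).

```python
import numpy as np

def triangle_threshold(img: np.ndarray) -> int:
    """Triangle threshold for a uint8 grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    nonzero = np.flatnonzero(hist)
    first, last = int(nonzero[0]), int(nonzero[-1])
    peak = int(np.argmax(hist))
    # Work on whichever side of the peak has the longer tail.
    end = first if (peak - first) > (last - peak) else last
    if end == peak:
        return peak  # degenerate histogram: a single grey level
    lo, hi = sorted((peak, end))
    idx = np.arange(lo, hi + 1)
    # Height of the peak-to-tail line at each bin.
    line = hist[peak] + (hist[end] - hist[peak]) * (idx - peak) / (end - peak)
    # For a fixed line, perpendicular distance is proportional to this vertical gap.
    gap = line - hist[idx]
    return int(idx[np.argmax(gap)])
```

For a dark background with a bright tail, `img > triangle_threshold(img)` would then select the bright regions.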
Your question isn't very easy to answer, since you don't really define what an ideal solution should accomplish.
Have you tried im2bw(yourImage, 0.1);? I.e., using a threshold for which parts should be black and which shouldn't. I got decent results with that (depending on the purpose, of course). Try it, and if it isn't good enough, tell us in what way you need to improve it and I will try to help with some more advanced techniques!
EDIT: Using thresholds of 0.1 and 0.01 respectively; perhaps something around 0.05 would be good?
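For readers outside MATLAB, `im2bw(I, level)` marks pixels whose intensity is greater than `level` (given on a 0 to 1 scale) as white. A rough NumPy analogue, assuming a grayscale image that is either uint8 or float already scaled to [0, 1]; `im2bw_like` is a hypothetical name, handy for comparing candidate levels such as 0.1, 0.05 and 0.01.

```python
import numpy as np

def im2bw_like(img: np.ndarray, level: float) -> np.ndarray:
    """Pixels brighter than `level` (0..1 scale) become True, all others False."""
    if img.dtype == np.uint8:
        return img > level * 255  # rescale the level to the 0..255 range
    return img > level  # assumes float images already lie in [0, 1]

# Example: how much of an image each candidate level keeps.
# for level in (0.1, 0.05, 0.01):
#     print(level, im2bw_like(image, level).mean())
```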
It sounds like what you want to do is ''image segmentation'' (see http://en.wikipedia.org/wiki/Segmentation_(image_processing) ).
Many methods are based on the Chan-Vese model, which identifies the region of interest by solving an optimization problem involving a level set function. Since you're using MATLAB, this code: http://www.stanford.edu/~tagoldst/Tom_Goldstein/Split_Bregman.html should do a good job of finding the regions you are interested in.