The dataset I used to train my facial emotion recognition model contains 48x48 images. The images I want to make predictions on are mostly much smaller (20x15, 18x12, etc.), and I think the predictions will be wrong because they are much less detailed than the images in the dataset I trained the model on. Do you think I should eliminate pictures whose size is below a certain level? What should my criteria be here?
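For context, here is roughly how I imagine filtering (a sketch only; the 24-pixel minimum side, half the training resolution, is just a guess rather than a validated criterion):

from PIL import Image

# Keep only images whose shorter side is at least min_side pixels;
# the 24px value (half of 48) is an arbitrary guess, not a validated cutoff.
min_side = 24
kept = [p for p in image_paths  # image_paths: hypothetical list of file paths
        if min(Image.open(p).size) >= min_side]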
I am trying to build a Keras model to implement the approach explained in this paper.
Context of my implementation:
I have two different kinds of data representing the same set of classes (labels) that need to be classified. The first kind is image data, and the second kind is EEG data (a time-series sequence).
I know that to classify image data we can use CNN models like this:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Activation, Flatten, Dense, Dropout, BatchNormalization

model = Sequential()
model.add(Conv2D(filters=256, kernel_size=(11, 11), strides=(1, 1), padding='valid',
                 input_shape=(227, 227, 3)))  # input shape assumed (AlexNet-style)
model.add(Activation('relu'))
model.add(Flatten())  # flatten the feature maps before the dense layers
model.add(Dense(1000))
model.add(Activation('relu'))
model.add(Dropout(0.4))
# Batch normalisation
model.add(BatchNormalization())
# Output layer: one neuron per class
model.add(Dense(40))
model.add(Activation('softmax'))
And to classify sequence data we can use LSTM models like this:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Flatten, Dense

timesteps, n_features = 128, 5  # assumed shape of one EEG sequence

model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(timesteps, n_features)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(40, activation='softmax'))
But the approach of the paper above shows that EEG feature vectors can be mapped to image vectors through regression, like this:
The first approach is to train a CNN to map images to corresponding EEG feature vectors. Typically, the first layers of CNN attempt to learn the general (global) features of the images, which are common between many tasks, thus we initialize the weights of these layers using pre-trained models, and then learn the weights of the last layers from scratch in an end-to-end setting. In particular, we used the pre-trained AlexNet CNN, and modified it by replacing the softmax classification layer with a regression layer (containing as many neurons as the dimensionality of the EEG feature vectors), using Euclidean loss as the objective function.
The second approach consists of extracting image features using pre-trained CNN models and then employ regression methods to map image features to EEG feature vectors. We used our fine-tuned AlexNet as feature extractors by reading the output of the last fully connected layer, and then applied several regression methods (namely, k-NN regression, ridge regression, random forest regression) to obtain the predicted feature vectors.
I am not able to comprehend how to code the above two approaches. I have never used a regressor for feature mapping followed by classification. Any leads on this are much appreciated.
In my understanding, the training data consists of (eeg_signal, image, class_label) triplets.
1. Train the LSTM model with input=eeg_signal, output=class_label. Loss is cross-entropy.
2. Peel off the last layer of the LSTM model. Let's say the pre-last layer's output is a vector of size 20. Let's call it eeg_representation.
3. Run this truncated model on all your eeg_signal inputs and save the eeg_representation outputs. You will get a tensor of shape [batch, 20].
4. Take the AlexNet mentioned in the paper (or any other image classifier) and peel off its last layer. Let's say the pre-last layer's output is a vector of size 30. Let's call it image_representation.
5. Stitch a linear layer onto the end of the previous layer. This layer converts image_representation to eeg_representation, so it has 20 x 30 weights.
6. Train the stitched model on (image, eeg_representation) pairs. Loss is the Euclidean distance.
7. And now the fun part: stitch together the model trained in step 6 and the layer peeled off in step 2. If you input an image, you will get class predictions.
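A minimal Keras sketch of steps 2-7, assuming already-trained lstm_model and image_model objects (pre-last layer sizes 20 and 30, as above) and arrays eeg_signals and images holding the training data:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense

# Step 2: truncate the trained LSTM after its pre-last layer (size 20).
eeg_encoder = Model(lstm_model.input, lstm_model.layers[-2].output)

# Step 3: compute the eeg_representation target for every EEG signal.
eeg_representation = eeg_encoder.predict(eeg_signals)  # shape [batch, 20]

# Step 4: truncate the image classifier after its pre-last layer (size 30).
image_encoder = Model(image_model.input, image_model.layers[-2].output)

# Step 5: stitch a linear (no activation) 30 -> 20 layer on top.
stitched = Model(image_encoder.input, Dense(20)(image_encoder.output))

# Step 6: train on (image, eeg_representation) pairs; MSE stands in for
# the paper's Euclidean loss.
stitched.compile(optimizer='adam', loss='mse')
stitched.fit(images, eeg_representation, epochs=10)

# Step 7: reattach the LSTM's peeled-off softmax layer, so an image now
# maps straight to class predictions.
image_to_class = Model(stitched.input, lstm_model.layers[-1](stitched.output))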
This sounds like no big deal (because we do image classification all the time), but if this really works, it means that this is a "prediction that is running through our brains" :)
Thank you for bringing up this question and linking the paper.
I feel I just repeated what's in your question and in the paper.
It would be beneficial to have a toy dataset to be able to provide full code examples.
Here's a TensorFlow tutorial on how to "peel off" the last layer of a pretrained image classification model.
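For the paper's second approach (pre-trained CNN features plus a classic regressor), a hedged scikit-learn sketch reusing image_encoder and eeg_representation from the code above; ridge is one of the three regressors the paper names, and alpha=1.0 is just an assumed default:

from sklearn.linear_model import Ridge

# Extract fixed image features with the truncated CNN, then regress
# them onto the EEG feature vectors.
image_features = image_encoder.predict(images)
reg = Ridge(alpha=1.0)
reg.fit(image_features, eeg_representation)
predicted_eeg = reg.predict(image_encoder.predict(test_images))  # test_images assumed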
I'm training a classifier that will be tested on underwater images. I'm wondering whether feeding the model drawings of a certain class, in addition to real images, can affect the results. Has there been a study on this? Or are there any past experiences anyone could share to help?
To give a bit of context: I'm fairly new to machine learning; I've read up on and watched some educational videos about how CNNs work.
I've tried two models so far: a random person's CNN model and Google's Inception v3 model. I could understand that random person's CNN model and what's happening in there. What I don't understand is how to make it work with different input sizes that are not just a different scale or rotation. Let me explain what I'm doing:
I basically want to be able to classify a picture (containing a logo) by brand. For example, you give me a picture that contains the Starbucks logo, and the model will tell you it's Starbucks. There is going to be only one logo in every picture (in my case). The first try was with the Inception model: 20,000 iterations with 2,000 Starbucks receipt pictures, 2,000 Walmart receipt pictures, and 2,000 random pictures not related to Starbucks or Walmart, so I could also classify a picture as 'Neither'. I got 88% accuracy, which is not good enough, and the cross-entropy doesn't drop below 0.4. Then I tried cropping the logo out of those pictures and tried again. This time it worked like a charm on the cropped pictures, but it failed miserably on bigger pictures containing the Starbucks (or Walmart) logo.
Same thing with DeepLogo's approach: https://github.com/satojkovic/DeepLogo
It works well with 32x32 pictures, but once I change the input size, it fails.
How can I overcome this?
EDIT: I'm using this for retraining on top of the Inception model: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/image_retraining
Pooling layer?
From my understanding, a pooling layer improves statistical efficiency and adds some translation invariance. Most importantly, in your case, it can be used with images of various sizes.
Maybe you could do some research on that. The book "Deep Learning" by Goodfellow would be my recommendation.
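For example, a network that ends in global average pooling accepts inputs of any spatial size. A minimal sketch (the layer sizes are made up, and the 3-class head matches your Starbucks/Walmart/Neither setup):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, GlobalAveragePooling2D, Dense

# None for height/width lets the model accept variable-sized images;
# global average pooling collapses the feature maps to a fixed 64-dim
# vector, so the dense head works for any input size.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(None, None, 3)),
    Conv2D(64, (3, 3), activation='relu'),
    GlobalAveragePooling2D(),
    Dense(3, activation='softmax'),  # Starbucks / Walmart / Neither
])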
I'm working on a diabetic retinopathy detection problem where I've been given retina images [Image1] with score labels. My job is to build a classification model that can detect and score retinopathy given unlabeled retina images.
The first step, which I'm currently working on, is extracting features from these images to build the input vector for my classification algorithm. I have basic knowledge of image processing; I've tried cropping my images to the edges [Image2], converting them to grayscale, and using the histogram as an input vector, but it seems I still end up with a large representation per image. In addition, I may have lost some essential features that were encoded in the RGB image.
Image1:
Image2:
Pre-processing medical images is not a trivial task. To improve performance on diabetic retinopathy, you need to highlight the blood vessels, and there are several pre-processing techniques suitable for this. I am sending a link that may be useful:
https://github.com/tfs4/IDRID_hierarchical_combination/blob/master/preprocess.py
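As an illustration (a sketch only, not necessarily what the linked script does), a common vessel-highlighting step is CLAHE applied to the green channel with OpenCV; the file names are hypothetical:

import cv2

# The green channel carries most of the vessel contrast in fundus
# images; CLAHE boosts that contrast locally.
img = cv2.imread('retina.jpg')  # hypothetical input file
green = img[:, :, 1]  # OpenCV loads BGR, so index 1 is green
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
cv2.imwrite('retina_enhanced.png', clahe.apply(green))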
I have a hard problem to solve: automatic image keywording. You can assume that I have a database of 100,000+ keyworded, low-quality JPEG images for training (low quality = low resolution of about 300x300 px + low compression ratio). Each image has about 40 mostly accurate keywords (the data may contain slight "noise"). I can also extract some data on keyword correlations.
Given a color image and a keyword, I want to determine the probability that the keyword is related to this image.
I need a creative, understandable solution that I could implement on my own in about a month or less (I plan to use Python). What I have found so far is machine learning, neural networks, and genetic algorithms. I was also thinking about generating some kind of signature for each keyword which I could then check against not-yet-seen images.
Crazy/novel ideas are appreciated as well, if they are practicable. I'm also open to using other Python libraries.
My current algorithm is extremely complex and computationally heavy. It suggests keywords instead of calculating probabilities, and 50% of the suggested keywords are not accurate.
Given the hard requirements of the application, only gross and brainless solutions can be proposed.
For every image, use some segmentation method and keep, say, the four largest segments. Distinguish one or two of them as background (those extending to the image borders) and the others as foreground, or items of interest.
Characterize the segments in terms of dominant color (using a very rough classification based on color primaries) and in terms of shape (size relative to the image, circularity, number of holes, dominant orientation, and a few others).
Then, for every keyword, you can build a classifier that decides whether a given image has that keyword. After training, the classifiers will tell you whether an image has the keyword(s); if you use fuzzy classification, you get a "probability".
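A rough Python sketch of the per-keyword classifiers (features and labels are assumed to come from the segmentation and characterization steps above; all names are hypothetical):

from sklearn.ensemble import RandomForestClassifier

# One binary classifier per keyword over the hand-crafted segment
# features (dominant colors, relative size, circularity, ...).
# features: array [n_images, n_features]; labels[kw]: 0/1 per image.
classifiers = {kw: RandomForestClassifier(n_estimators=100).fit(features, labels[kw])
               for kw in keywords}

def keyword_probability(image_features, kw):
    # predict_proba supplies the "fuzzy" score that acts as a probability
    return classifiers[kw].predict_proba(image_features.reshape(1, -1))[0, 1]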