I'm trying to input an image and get a continuous number as an output.
I built a NN that takes an image and has only a single node with a linear activation function in its last layer. However, the model predicts the same number regardless of the input.
Hence I would like to use the Inception network for this problem, based on a recent paper by Google.
Link: https://arxiv.org/pdf/1904.06435.pdf
x = Dense(1, activation="linear")(x)
This is absolutely possible! The example from the Keras documentation on pre-trained models should help you with your endeavor. Make sure to adjust the output layer and the loss of your new model.
Edit: a code example for your specific case:
from keras.applications.inception_v3 import InceptionV3
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)
# add a global spatial average pooling layer (turns the 4D feature maps into one vector per image)
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a linear output layer with a single unit for the continuous target
prediction = Dense(1, activation='linear')(x)
# this is the model we will train
model = Model(inputs=base_model.input, outputs=prediction)
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
    layer.trainable = False
# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='mean_squared_error')
# train the model on the new data for a few epochs
# (fit_generator is deprecated in TF 2.x; fit also accepts generators)
model.fit(...)
This only trains the new top layers. If you would like to fine-tune the lower layers as well, have a look at the example in the documentation.
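To make "train only the top layers" concrete without depending on Keras, here is a minimal NumPy sketch under stated assumptions: random vectors stand in for the frozen, pooled InceptionV3 features, and only a single linear output unit (the equivalent of Dense(1, activation='linear')) is trained with the mean-squared-error loss. The names `features` and `targets` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 200 images already passed through the frozen base model,
# each reduced to a 64-dimensional pooled feature vector. These never change.
features = rng.normal(size=(200, 64))
true_w = rng.normal(size=64)
targets = features @ true_w + 0.5  # one continuous target per image

# Only the head is trainable: one linear unit with weights w and bias b.
w = np.zeros(64)
b = 0.0
lr = 0.05
for epoch in range(1000):
    err = features @ w + b - targets
    # gradients of the mean squared error with respect to w and b
    grad_w = 2.0 * features.T @ err / len(targets)
    grad_b = 2.0 * err.mean()
    w -= lr * grad_w
    b -= lr * grad_b

mse = np.mean((features @ w + b - targets) ** 2)
print(f"training MSE: {mse:.6f}")
```

Because the base features are fixed, this is just linear regression on top of them; fine-tuning would additionally update the feature extractor itself.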
Related
I use ResNet50, which was trained on 224x224 images. Why doesn't it raise an error when I pass tensors (images) of a different size?
import torch
from timm.models.resnet import resnet50
model_resnet50 = resnet50()  # randomly initialized; pass pretrained=True for ImageNet weights
y_pred = model_resnet50(torch.rand(4, 3, 224, 224))  # OK
y_pred = model_resnet50(torch.rand(4, 3, 537, 537))  # Also OK. Why? ResNet50 was not trained on this size
I assume that it applies the convolutions across the whole image, so it produces a different number of features for different image sizes (after forward_features). The Global Average Pooling layer then reduces everything to a one-dimensional vector. Therefore, the image size only affects the number of features produced before the pooling and Dense layers. Is that right?
Then what image size is better for training?
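The reasoning in the question can be checked with a small NumPy sketch of global average pooling. Random arrays stand in for real activations, and the 7x7 and 17x17 feature-map sizes assume ResNet-50's overall stride of 32 for 224x224 and 537x537 inputs; the 2048 channels are those of ResNet-50's last stage.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average over the spatial axes of a (batch, channels, height, width) array."""
    return feature_maps.mean(axis=(2, 3))

rng = np.random.default_rng(0)
num_channels = 2048  # channel count of the last ResNet-50 stage

# Placeholder feature maps: spatial size depends on the input size, channels do not.
small = rng.normal(size=(4, num_channels, 7, 7))     # from a 224x224 input
large = rng.normal(size=(4, num_channels, 17, 17))   # from a 537x537 input

pooled_small = global_average_pool(small)
pooled_large = global_average_pool(large)
print(pooled_small.shape, pooled_large.shape)  # (4, 2048) (4, 2048)

# The final fully connected layer only depends on the channel count,
# so the same weight matrix accepts both pooled outputs.
fc_weight = rng.normal(size=(num_channels, 1000))
print((pooled_small @ fc_weight).shape, (pooled_large @ fc_weight).shape)
```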
I am trying to build a Keras model to implement the approach explained in this paper.
Context of my implementation:
I have two different kinds of data representing the same set of classes (labels) that need to be classified. The first kind is image data, and the second kind is EEG data (a time-series sequence).
I know that to classify image data we can use CNN models like this:
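from keras.models import Sequential
from keras.layers import Conv2D, Activation, Flatten, Dense, Dropout, BatchNormalization
model = Sequential()
model.add(Conv2D(filters=256, kernel_size=(11,11), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
# flatten the feature maps before the fully connected layers
model.add(Flatten())
model.add(Dense(1000))
model.add(Activation('relu'))
model.add(Dropout(0.4))
# Batch Normalisation
model.add(BatchNormalization())
# Output Layer
model.add(Dense(40))
model.add(Activation('softmax'))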
And to classify sequence data we can use LSTM models like this:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Flatten, Dense
model = Sequential()
model.add(LSTM(units = 50, return_sequences = True))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(40, activation='softmax'))
But the paper above shows that image feature vectors can be mapped to EEG feature vectors through regression, as follows:
The first approach is to train a CNN to map images to corresponding
EEG feature vectors. Typically, the first layers of CNN attempt to
learn the general (global) features of the images, which are common
between many tasks, thus we initialize the weights of these layers
using pre-trained models, and then learn the weights of the last
layers from scratch in an end-to-end setting. In particular, we used
the pre-trained AlexNet CNN, and modified it by replacing the
softmax classification layer with a regression layer (containing as
many neurons as the dimensionality of the EEG feature vectors),
using Euclidean loss as the objective function.
The second approach consists of extracting image features using
pre-trained CNN models and then employ regression methods to map
image features to EEG feature vectors. We used our fine-tuned
AlexNet as feature extractors by
reading the output of the last fully connected layer, and then
applied several regression methods (namely, k-NN regression, ridge
regression, random forest regression) to obtain the predicted
feature vectors
I am not able to work out how to code the above two approaches. I have never used a regressor for feature mapping followed by classification. Any leads on this are much appreciated.
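For the second approach (regression from extracted image features to EEG feature vectors), a minimal sketch of one of the named methods, ridge regression, is shown below in closed form with NumPy. The data is synthetic: the 30-dimensional image features and 20-dimensional EEG features are hypothetical sizes, and `fit_ridge` is an illustrative helper, not the paper's code.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form multi-output ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) W = Xc^T Yc for the weight matrix W.
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b

rng = np.random.default_rng(0)
# Hypothetical data: 500 images with 30-dim CNN features, paired with 20-dim EEG features.
image_features = rng.normal(size=(500, 30))
true_map = rng.normal(size=(30, 20))
eeg_features = image_features @ true_map + 0.1 * rng.normal(size=(500, 20))

W, b = fit_ridge(image_features, eeg_features, alpha=1.0)
predicted_eeg = image_features @ W + b
print(predicted_eeg.shape)  # (500, 20)
```

The predicted EEG vectors can then be fed into whatever classifier was trained on real EEG feature vectors.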
In my understanding, the training data consists of (eeg_signal, image, class_label) triplets.
1. Train the LSTM model with input=eeg_signal, output=class_label. The loss is cross-entropy.
2. Peel off the last layer of the LSTM model. Let's say the pre-last layer's output is a vector of size 20. Let's call it eeg_representation.
3. Run this truncated model on all your eeg_signal inputs and save the eeg_representation outputs. You will get a tensor of shape [num_samples, 20].
4. Take the AlexNet mentioned in the paper (or any other image classifier) and peel off the last layer. Let's say the pre-last layer's output is a vector of size 30. Let's call it image_representation.
5. Stitch a linear layer to the end of the truncated image model. This layer converts image_representation to eeg_representation. It has 20 x 30 weights.
6. Train the stitched model on (image, eeg_representation) pairs. The loss is the Euclidean distance.
7. And now the fun part: stitch together the model trained in step 6 and the layer peeled off the model trained in step 1. If you input an image, you will get class predictions.
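The stitching described above can be sketched with NumPy arrays standing in for the trained layers. Assumptions: the peeled-off last layer of the EEG model is a dense softmax layer with weights `W_cls` (20 x 40, for the 40 classes used in the question's models), and the linear layer `W_map` is fitted by least squares, which minimizes the squared Euclidean loss. All arrays are random placeholders, not trained models.

```python
import numpy as np

rng = np.random.default_rng(1)
num_samples, image_dim, eeg_dim, num_classes = 300, 30, 20, 40

# Placeholder outputs of the two truncated models (random, not trained).
image_representation = rng.normal(size=(num_samples, image_dim))
hidden_map = rng.normal(size=(image_dim, eeg_dim))
eeg_representation = (image_representation @ hidden_map
                      + 0.1 * rng.normal(size=(num_samples, eeg_dim)))

# Linear layer (30 x 20 weights plus 20 biases) fitted by least squares,
# i.e. minimizing the squared Euclidean distance to eeg_representation.
design = np.hstack([image_representation, np.ones((num_samples, 1))])
coef, *_ = np.linalg.lstsq(design, eeg_representation, rcond=None)
W_map, b_map = coef[:-1], coef[-1]

# Peeled-off last layer of the EEG classifier (hypothetical placeholder weights).
W_cls = rng.normal(size=(eeg_dim, num_classes))
b_cls = np.zeros(num_classes)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def image_to_class_probs(image_repr):
    """Image features -> predicted EEG features -> class probabilities."""
    predicted_eeg = image_repr @ W_map + b_map
    return softmax(predicted_eeg @ W_cls + b_cls)

probs = image_to_class_probs(image_representation[:5])
print(probs.shape)  # (5, 40)
print(probs.argmax(axis=1))  # predicted class index per image
```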
This sounds like no big deal (because we do image classification all the time), but if it really works, it means that this is a "prediction that is running through our brains" :)
Thank you for bringing up this question and linking the paper.
I feel I just repeated what's in your question and in the paper.
It would be beneficial to have a toy dataset to be able to provide code examples.
Here's a TensorFlow tutorial on how to "peel off" the last layer of a pretrained image classification model.
I've finally gotten back to working on my project and have found my next hurdle.
I have an enclosed manifold that I can have my system drive on like a normal car:
[Image: the enclosed manifold]
Also, here's an example trajectory in the game I'm modelling:
[Image: example trajectory]
I'm curious what the best way to incorporate this type of constraint in GEKKO would be. The manifold looks like a cube with rounded edges and corners. My current thought is to create an MLP (multilayer perceptron) to approximate the normal vector of the manifold at each point on the surface. I tried using the GEKKO brain model to do this, but it ended up being very slow, so I moved to a Keras model. I now have a Keras model that is about 89% accurate at mapping positions to normal vectors (which might be enough).
So my first question is: how can I incorporate the Keras model into my GEKKO equations? If I'm also able to calculate the derivative of the neural network output at each point, would it be possible to black-box the model, such that GEKKO passes in a position and the black-box function returns a normal vector and that normal vector's derivative, to ultimately calculate optimal trajectories?
If this is not possible, do you think I could easily model this manifold as a B-spline? And how should I approach making the manifold surface a constraint for the system while it's in the driving state? My thought was to take the system's current velocity vector and dot it with the normal vector of the manifold at the system's position to get how much the velocity vector rotates along the manifold. I already see some problems, for example large time steps missing the curvature of the manifold and causing the system to drive off the surface. I think the typical way of doing this is to project the system's velocity onto the tangent space of the manifold, compute the future state in the tangent space, and then map back to the manifold using a retraction. I'm still fairly new to topology and manifolds, so correct me if I've made a mistake in the theory.
I don't have much code for this yet, as I'm stuck figuring out how to use the Keras model in an equation. I do have a simpler problem available: instead of driving on this complex manifold, I just drive on a circle in R2. I've modelled this circle in R2 using a Keras model as well. I plan to start with the simpler version, if I'm able to use Keras in equations, before I jump into driving on the manifold in R3.
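The tangent-space idea from the question can be sketched on the simpler circle problem. This is a minimal NumPy illustration under assumptions: the circle's normal is computed analytically (standing in for the Keras model's prediction), the dynamics are reduced to a plain Euler step, and the retraction is a radial projection back onto the circle. All names are hypothetical.

```python
import numpy as np

def circle_normal(p):
    """Unit outward normal of a circle centered at the origin
    (analytic stand-in for a learned normal-vector model)."""
    return p / np.linalg.norm(p)

def project_to_tangent(v, n):
    """Tangent-space projection: remove the component of v along the unit normal n."""
    return v - np.dot(v, n) * n

def retract_to_circle(p, radius):
    """Retraction: map a point that has left the circle radially back onto it."""
    return radius * p / np.linalg.norm(p)

radius, dt, steps = 2.0, 0.1, 100

# With retraction: project the velocity, step in the tangent space, map back.
p = np.array([radius, 0.0])
v = np.array([0.0, 1.0])  # hypothetical initial velocity
for _ in range(steps):
    v = project_to_tangent(v, circle_normal(p))
    p = retract_to_circle(p + dt * v, radius)

# Without retraction: the same tangent steps slowly drift off the circle.
q = np.array([radius, 0.0])
w = np.array([0.0, 1.0])
for _ in range(steps):
    w = project_to_tangent(w, circle_normal(q))
    q = q + dt * w

print(np.linalg.norm(p), np.linalg.norm(q))  # p stays at the radius, q ends up outside
```

Note that each projection discards the normal component of the velocity, so the speed slowly decays in this sketch; a real model would get the speed from its own dynamics rather than from the previous step.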
Are there any examples doing something similar to this that I could learn from?
Thank you! Excited to get back into this project.
This path-planning optimization application may be better suited to a shooting approach, where the model is a "black box" simulator that the optimizer calls repeatedly. One of the challenges is that the equations change when the vehicle is interacting with the ground versus when it is in the air. If you do want to model both ground and air, an if3 statement would allow the switching; otherwise, use slack variables.
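A shooting approach can be sketched without GEKKO: the decision variables are a piecewise-constant control sequence, and every objective evaluation runs a black-box simulator from the initial state. In this minimal sketch the simulator is a hypothetical 1D point mass (s'' = u, semi-implicit Euler), and plain gradient descent with finite-difference gradients stands in for a real NLP solver.

```python
import numpy as np

def simulate(u, dt=0.1):
    """Hypothetical black-box simulator: 1D point mass with s'' = u,
    integrated with semi-implicit (symplectic) Euler."""
    s, v = 0.0, 0.0
    for uk in u:
        v += uk * dt
        s += v * dt
    return s, v

def objective(u, target=1.0):
    """Reach position `target` at rest, with a small penalty on control effort."""
    s, v = simulate(u)
    return (s - target) ** 2 + v ** 2 + 1e-3 * np.sum(u ** 2)

def finite_difference_grad(f, u, eps=1e-6):
    """Central differences: two simulator runs per control variable."""
    grad = np.zeros_like(u)
    for i in range(len(u)):
        step = np.zeros_like(u)
        step[i] = eps
        grad[i] = (f(u + step) - f(u - step)) / (2 * eps)
    return grad

u = np.zeros(20)  # 20 control intervals of length dt
for _ in range(300):
    u -= 1.0 * finite_difference_grad(objective, u)

s_final, v_final = simulate(u)
print(f"final position {s_final:.3f}, final velocity {v_final:.3f}")
```

The optimizer never sees the equations inside `simulate`, which is what allows the simulator to switch between ground and air dynamics internally; the price is many simulator calls per iteration.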
For the boundary constraint, there may be a simpler way to start modeling it, such as simple inequality constraints that form a box. You could then add more inequality constraints for the edges to model the curvature.
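As an illustration of the box idea, the six inequalities -h <= x, y, z <= h describe a plain box. One common way to also account for rounded edges and corners (an assumption here, not a GEKKO feature) is the signed distance to a rounded box, which is negative inside, zero on the surface, and positive outside. `half_size` and `radius` are hypothetical parameters of the arena.

```python
import numpy as np

def inside_box(p, half_size):
    """Plain box: the six inequality constraints -h_i <= p_i <= h_i."""
    return bool(np.all(np.abs(p) <= half_size))

def rounded_box_sdf(p, half_size, radius):
    """Signed distance to a box whose edges and corners are rounded with `radius`."""
    q = np.abs(p) - (np.asarray(half_size) - radius)
    outside = np.linalg.norm(np.maximum(q, 0.0))  # distance from the shrunken box
    inside = min(np.max(q), 0.0)                  # negative depth when inside it
    return outside + inside - radius

half_size = np.array([1.0, 1.0, 1.0])
radius = 0.2
center = np.zeros(3)
face = np.array([1.0, 0.0, 0.0])
corner = np.array([1.0, 1.0, 1.0])

print(rounded_box_sdf(center, half_size, radius))  # about -1.0: inside
print(rounded_box_sdf(face, half_size, radius))    # about 0.0: on a face
# The sharp corner satisfies the plain box constraints but lies outside the rounded box.
print(inside_box(corner, half_size), rounded_box_sdf(corner, half_size, radius) > 0)
```

In an optimization, staying inside the arena would correspond to sdf <= 0 and driving on the surface to sdf == 0. The max/min terms are not smooth, so a gradient-based solver may need a smoothed version.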
Below is a related application with a rocket launch that is applicable to the air dynamics. You would need to extend this to 3D.
import numpy as np
import matplotlib.pyplot as plt
from gekko import GEKKO
# create GEKKO model
m = GEKKO()
# scale 0-1 time with tf
m.time = np.linspace(0,1,101)
# options
m.options.NODES = 6
m.options.SOLVER = 3
m.options.IMODE = 6
m.options.MAX_ITER = 500
m.options.MV_TYPE = 0
m.options.DIAGLEVEL = 0
# final time
tf = m.FV(value=1.0,lb=0.1,ub=100)
tf.STATUS = 1
# force
u = m.MV(value=0,lb=-1.1,ub=1.1)
u.STATUS = 1
u.DCOST = 1e-5
# variables
s = m.Var(value=0)
v = m.Var(value=0,lb=0,ub=1.7)
mass = m.Var(value=1,lb=0.2)
# differential equations scaled by tf
m.Equation(s.dt()==tf*v)
m.Equation(mass*v.dt()==tf*(u-0.2*v**2))
m.Equation(mass.dt()==tf*(-0.01*u**2))
# specify endpoint conditions
m.fix(s, pos=len(m.time)-1,val=10.0)
m.fix(v, pos=len(m.time)-1,val=0.0)
# minimize final time
m.Obj(tf)
# Optimize launch
m.solve()
print('Optimal Solution (final time): ' + str(tf.value[0]))
# scaled time
ts = m.time * tf.value[0]
# plot results
plt.figure(1)
plt.subplot(4,1,1)
plt.plot(ts,s.value,'r-',linewidth=2)
plt.ylabel('Position')
plt.legend(['s (Position)'])
plt.subplot(4,1,2)
plt.plot(ts,v.value,'b-',linewidth=2)
plt.ylabel('Velocity')
plt.legend(['v (Velocity)'])
plt.subplot(4,1,3)
plt.plot(ts,mass.value,'k-',linewidth=2)
plt.ylabel('Mass')
plt.legend(['m (Mass)'])
plt.subplot(4,1,4)
plt.plot(ts,u.value,'g-',linewidth=2)
plt.ylabel('Force')
plt.legend(['u (Force)'])
plt.xlabel('Time')
plt.show()
Here is one more application, the landing of a reusable rocket, with source files. They developed a surrogate model of the rocket dynamics to apply the model in predictive control.
This is an example of a 3D rocket application, but they didn't have the complication of ground interaction with changing dynamic equations.
I have a fairly large dataset of images. They have been taken by 'x' photographers, and each image falls into one of 'y' themes. How would I go about making a train, valid, test split if I want no photographer overlap between the splits and as little theme overlap as possible (i.e., theme overlap between valid and train is okay, but not with test)?
Some themes are not captured by some photographers. I've tried first splitting the set by photographer and then combining these groups with minimal theme overlap, but there's a lot of trial and error, and I was wondering if there's a better way.
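The "split by photographer first" idea can be automated instead of done by hand. The sketch below is a plain-Python illustration under assumptions: records are hypothetical (image_id, photographer, theme) tuples, whole photographers are assigned to splits, and the manual trial and error is replaced by a seeded random search that keeps the split with the fewest test themes also seen in train/valid. scikit-learn's `GroupShuffleSplit` also splits by a `groups` array, but it does not look at themes.

```python
import random

def split_by_photographer(records, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole photographers to train/valid/test so none spans two splits.

    `records` is a list of (image_id, photographer, theme) tuples (hypothetical layout).
    """
    photographers = sorted({photographer for _, photographer, _ in records})
    random.Random(seed).shuffle(photographers)
    n_train = round(fractions[0] * len(photographers))
    n_valid = round(fractions[1] * len(photographers))
    assignment = {}
    for i, photographer in enumerate(photographers):
        if i < n_train:
            assignment[photographer] = "train"
        elif i < n_train + n_valid:
            assignment[photographer] = "valid"
        else:
            assignment[photographer] = "test"
    splits = {"train": [], "valid": [], "test": []}
    for record in records:
        splits[assignment[record[1]]].append(record)
    return splits

def theme_overlap_with_test(splits):
    """Number of test themes that also appear in train or valid (lower is better)."""
    seen = {theme for name in ("train", "valid") for _, _, theme in splits[name]}
    return len({theme for _, _, theme in splits["test"]} & seen)

def best_split(records, n_trials=200):
    """Try many seeded photographer splits and keep the least overlapping one."""
    candidates = (split_by_photographer(records, seed=s) for s in range(n_trials))
    return min(candidates, key=theme_overlap_with_test)

# Hypothetical data: 20 photographers, each shooting two of 8 themes.
rng = random.Random(42)
records = []
for image_id in range(500):
    p = rng.randrange(20)
    theme = (p + rng.randrange(2)) % 8
    records.append((image_id, f"photographer_{p}", f"theme_{theme}"))

splits = best_split(records)
for name, items in splits.items():
    print(name, len(items), "images")
print("test themes also in train/valid:", theme_overlap_with_test(splits))
```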
Well, you can use the train_test_split function from the scikit-learn library to split your dataset into train and test sets, like below:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = 0.2, random_state=42)
where X holds the features and Y holds the labels.
Then you can use the cross_validate function, which splits the training data into cv folds and, in turn, uses each fold as validation data while training the passed algorithm on the remaining folds, like below:
from sklearn.model_selection import cross_validate
cv_results = cross_validate(algorithm, x_train, y_train, cv=3)
This way, your test data and training data won't overlap.
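For reference, passing an integer `cv` to `cross_validate` uses k-fold splitting (stratified k-fold when the estimator is a classifier), which does not shuffle by default. The plain-Python helper below is a hypothetical sketch of how 3 unshuffled folds partition the training indices, not scikit-learn's implementation. Like `train_test_split`, it splits rows (images), not photographers.

```python
def kfold_indices(n_samples, n_folds=3):
    """Yield (train_idx, valid_idx) pairs for contiguous, unshuffled folds."""
    # The first n_samples % n_folds folds get one extra sample.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, valid_idx
        start += size

for train_idx, valid_idx in kfold_indices(10, n_folds=3):
    print(len(train_idx), valid_idx)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```

Each sample is used for validation exactly once, and the validation fold never overlaps the training indices of the same round.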
Is it possible to classify more than 1000 objects using the Inception model in TensorFlow? I want to classify more than 1000 objects with a transfer learning model using TensorFlow image classification.
Popular image classification models can be viewed as a convolutional feature extractor with a classifier on top. The bottom part takes your [208, 208, 3] image and turns it into a column of 2048 features, [1, 1, 2048] (all numbers are just for example). Typically, a softmax classifier follows. The classifier is a fully connected layer with a single neuron for each object class. If you have 1000 classes, it has 1000*(2048+1) parameters. Note that only the classifier depends on the number of classes.
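The parameter count above can be checked with a few lines of NumPy. This is a sketch with placeholder arrays: the 2048-dimensional features and the 1500-class head are example sizes, and `classifier_param_count` is an illustrative helper.

```python
import numpy as np

def classifier_param_count(num_features, num_classes):
    """Dense softmax layer: (num_features weights + 1 bias) per class."""
    return num_classes * (num_features + 1)

print(classifier_param_count(2048, 1000))  # 2049000
print(classifier_param_count(2048, 1500))  # 3073500

# Only the classifier's shape changes with the class count; the features do not.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 2048))  # placeholder [batch, 2048] feature vectors
W = rng.normal(scale=0.01, size=(2048, 1500))
b = np.zeros(1500)
logits = features @ W + b
print(logits.shape)  # (8, 1500)
```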
When doing transfer learning, one typically discards the existing classifier layer and retrains it from scratch. If the feature extractor is trained as well, it is called fine-tuning. While retraining the classifier, you can choose an arbitrary number of classes.
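Retraining only the classifier on frozen features amounts to fitting a new softmax layer. The NumPy sketch below assumes synthetic "frozen" features (one Gaussian cluster per new class) and small sizes so it runs quickly; in a real model the features would come from the pretrained extractor, and the class count could be anything, including more than 1000.

```python
import numpy as np

rng = np.random.default_rng(0)
num_features, num_classes, per_class = 32, 5, 40

# Placeholder "frozen" features: one Gaussian cluster per new class.
centers = rng.normal(scale=3.0, size=(num_classes, num_features))
labels = np.repeat(np.arange(num_classes), per_class)
features = centers[labels] + rng.normal(size=(len(labels), num_features))

# New classifier head, initialized from scratch for the new number of classes.
W = np.zeros((num_features, num_classes))
b = np.zeros(num_classes)
one_hot = np.eye(num_classes)[labels]

for _ in range(200):
    logits = features @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    # Gradient of the mean cross-entropy; only W and b are updated, features stay fixed.
    grad_logits = (probs - one_hot) / len(labels)
    W -= 0.01 * features.T @ grad_logits
    b -= 0.01 * grad_logits.sum(axis=0)

accuracy = np.mean((features @ W + b).argmax(axis=1) == labels)
print(f"training accuracy: {accuracy:.2f}")
```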
In short: you are free to do transfer learning on any new number of object classes.