Can we use Genetic Algorithm for selecting the optimal network model and parameter?
For example, can we use it to decide which will give the optimal performance among the following network structure and configuration:
Network Types:
-Convolutional Neural Network with auto encoder/decoder variants
-Deep Neural Networks
Network Parameters:
-Learning Rate
-Momentum
-Number of hidden layers
-Number of hidden nodes
I am not sure how widely this approach is used anymore, but to answer your question - Yes. It is possible to use a genetic algorithm to decide which network architecture is best suitable for a given problem.
You may want to look at this paper. It talks about how you can use genetic algorithms to come up with the best network architecture.
I am a beginner in convolution networks. I use digits to implement them and facing with few doubts.
While trying out a basic classification problem of images, how do we decide on the number of layers - how many conv layers/ fully connected layer, etc.
In digits we have 3 standard papers implemented, for a particular dataset is there any way to find out which architecture to use – or when should we use our own architecture.
How can the hidden layers be helpful in solving the problems – i.e. what possible decisions can we take by looking at the results in the hidden layer
Deciding on how many layers or neurons is needed or the best architecture for building neural network was never clear or possible. the main procedure was taken before is to try building on some parameters and then measure the performance on training set and testing set not bias or to over fit the data and decide on the best parameters, or try some other algorithm like genetic algorithm.
conclusion either you start from scratch every time to measure the network performance or apply other algorithms which doesn't need to start from scratch and can build incrementally by applying transfer learning and fine tuning on the network architecture.
The core philosophy that makes deep learning so democratic and amazing is simple "Don't be a Hero".
What it means is that in most cases the best deep learning models take millions of data points and weeks to train, something most of us cannot achieve with our low performance PC's (yes a single GPU system is low performance). So why would you want to waste your time in building and training NN architectures. Simple you don't.
Transfer learning is your solution!! try to find models that are trained on data similar to your problem and use their pre-trained weights to fine tune your data set. Doing this not only do you get an already proven NN architecture but also a major head start in training.
The best place to find pre-trained models is the caffe model zoo so go have a look at it.
What kind of filter should I use to extract feature maps in convolutional NN?
I been reading about convolutional NN recently and I understood that we use a set of filters to generate a set of feature maps in each convolution layer by convoluting those filters over outputs from the previous layer.
1)How do we get these filters?
2)Do we pick filters randomly and do some 'trial and error'?
3)How do we find perfect filters for our project?
Thank you.
1) You don't directly get it, you let the network get it by showing some examples (training data)
2) We do initialise the filters randomly so that they can learn something useful and different to each.
3) By providing lots of data that are relevant to your project.
Neural networks are trained using backpropagation algorithm, you can get familiar with it here. Simply saying it computes the derivatives of the weights(filter values in your case) with respect to the loss function which measures the perfomance of the network. The backpropagation algorithm is a bit ore complicated for convolutional neural networks, but it's still intuitive.
I'm learning about CNN's (Convolutional Neural Networks) and the wiki page says, "Convolutional networks were inspired by ... and are variations of multilayer perceptrons which are designed to use minimal amounts of preprocessing."
So could one say that a characteristic of the variations of multilayer perceptrons leads to minimal amounts of preprocessing?
If so, how much resources does preprocessing usually take up?
Does using this technique of data mining introduce a new model for minimizing preprocessing?
I realize that this is probably a very niche question, but has anyone had experience with working with continuous neural networks? I'm specifically interested in what a continuous neural network may be useful for vs what you normally use discrete neural networks for.
For clarity I will clear up what I mean by continuous neural network as I suppose it can be interpreted to mean different things. I do not mean that the activation function is continuous. Rather I allude to the idea of a increasing the number of neurons in the hidden layer to an infinite amount.
So for clarity, here is the architecture of your typical discreet NN:
(source: garamatt at sites.google.com)
The x are the input, the g is the activation of the hidden layer, the v are the weights of the hidden layer, the w are the weights of the output layer, the b is the bias and apparently the output layer has a linear activation (namely none.)
The difference between a discrete NN and a continuous NN is depicted by this figure:
(source: garamatt at sites.google.com)
That is you let the number of hidden neurons become infinite so that your final output is an integral. In practice this means that instead of computing a deterministic sum you instead must approximate the corresponding integral with quadrature.
Apparently its a common misconception with neural networks that too many hidden neurons produces over-fitting.
My question is specifically, given this definition of discrete and continuous neural networks, I was wondering if anyone had experience working with the latter and what sort of things they used them for.
Further description on the topic can be found here:
http://www.iro.umontreal.ca/~lisa/seminaires/18-04-2006.pdf
I think this is either only of interest to theoreticians trying to prove that no function is beyond the approximation power of the NN architecture, or it may be a proposition on a method of constructing a piecewise linear approximation (via backpropagation) of a function. If it's the latter, I think there are existing methods that are much faster, less susceptible to local minima, and less prone to overfitting than backpropagation.
My understanding of NN is that the connections and neurons contain a compressed representation of the data it's trained on. The key is that you have a large dataset that requires more memory than the "general lesson" that is salient throughout each example. The NN is supposedly the economical container that will distill this general lesson from that huge corpus.
If your NN has enough hidden units to densely sample the original function, this is equivalent to saying your NN is large enough to memorize the training corpus (as opposed to generalizing from it). Think of the training corpus as also a sample of the original function at a given resolution. If the NN has enough neurons to sample the function at an even higher resolution than your training corpus, then there is simply no pressure for the system to generalize because it's not constrained by the number of neurons to do so.
Since no generalization is induced nor required, you might as well just memorize the corpus by storing all of your training data in memory and use k-nearest neighbor, which will always perform better than any NN, and will always perform as well as any NN even as the NN's sampling resolution approaches infinity.
The term hasn't quite caught on in the machine learning literature, which explains all the confusion. It seems like this was a one off paper, an interesting one at that, but it hasn't really led to anything, which may mean several things; the author may have simply lost interest.
I know that Bayesian neural networks (with countably many hidden units, the 'continuous neural networks' paper extends to the uncountable case) were successfully employed by Radford Neal (see his thesis all about this stuff) to win the NIPS 2003 Feature Selection Challenge using Bayesian neural networks.
In the past I've worked on a few research projects using continuous NN's. Activation was done using a bipolar hyperbolic tan, the network took several hundred floating point inputs and output around one hundred floating point values.
In this particular case the aim of the network was to learn the dynamic equations of a mineral train. The network was given the current state of the train and predicted speed, inter-wagon dynamics and other train behaviour 50 seconds into the future.
The rationale for this particular project was mainly about performance. This was being targeted for an embedded device and evaluating the NN was much more performance friendly then solving a traditional ODE (ordinary differential equation) system.
In general a continuous NN should be able to learn any kind of function. This is particularly useful when its impossible/extremely difficult to solve a system using deterministic methods. As opposed to binary networks which are often used for pattern recognition/classification purposes.
Given their non-deterministic nature NN's of any kind are touchy beasts, choosing the right kinds of inputs/network architecture can be somewhat a black art.
Feed forward neural networks are always "continuous" -- it's the only way that backpropagation learning actually works (you can't backpropagate through a discrete/step function because it's non-differentiable at the bias threshold).
You might have a discrete (e.g. "one-hot") encoding of the input or target output, but all of the computation is continuous-valued. The output may be constrained (i.e. with a softmax output layer such that the outputs always sum to one, as is common in a classification setting) but again, still continuous.
If you mean a network that predicts a continuous, unconstrained target -- think of any prediction problem where the "correct answer" isn't discrete, and a linear regression model won't suffice. Recurrent neural networks have at various times been a fashionable method for various financial prediction applications, for example.
Continuous neural networks are not known to be universal approximators (in the sense of density in $L^p$ or $C(\mathbb{R})$ for the topology of uniform convergence on compacts, i.e.: as in the universal approximation theorem) but only universal interpolators in the sense of this paper:
https://arxiv.org/abs/1908.07838