Can we use Genetic Algorithm for selecting the optimal Network Model and Parameters? - genetic-algorithm

Can we use Genetic Algorithm for selecting the optimal network model and parameter?
For example, can we use it to decide which will give the optimal performance among the following network structure and configuration:
Network Types:
-Convolutional Neural Network with auto encoder/decoder variants
-Deep Neural Networks
Network Parameters:
-Learning Rate
-Momentum
-Number of hidden layers
-Number of hidden nodes

I am not sure how widely this approach is used anymore, but to answer your question - Yes. It is possible to use a genetic algorithm to decide which network architecture is best suitable for a given problem.
You may want to look at this paper. It talks about how you can use genetic algorithms to come up with the best network architecture.

Related

Temporal Difference (Reinforcement Learning) by h2o

I am wondering to know if h2o is capable of implementing Temporal Difference (reinforcement learning)?
I know TensorFlow has the capability.
H2O's list of available algorithms can be found here. The short answer to your question is no. Details on the Deep Learning algorithm that is available here and I will repost for your convenience:
H2O-3's Deep Learning algorithm is based on a multi-layer feedforward artificial neural network that is trained with stochastic gradient descent using back-propagation. The network can contain a large number of hidden layers consisting of neurons with tanh, rectifier, and maxout activation functions. Advanced features such as adaptive learning rate, rate annealing, momentum training, dropout, L1 or L2 regularization, checkpointing, and grid search enable high predictive accuracy. Each compute node trains a copy of the global model parameters on its local data with multi-threading (asynchronously) and contributes periodically to the global model via model averaging across the network.
A feedforward artificial neural network (ANN) model, also known as deep neural network (DNN) or multi-layer perceptron (MLP), is the most common type of Deep Neural Network and the only type that is supported natively in H2O-3. Several other types of DNNs are popular as well, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). MLPs work well on transactional (tabular) data; however if you have image data, then CNNs are a great choice. If you have sequential data (e.g. text, audio, time-series), then RNNs are a good choice.

Community Detection in complete and weighted networks

I do have a complete network graph where every vertex is connected with each other and they only differ in form of their different weights. A example network would be: a trade network, where every country is connected with each other somehow and only differ in form of different trading volumina.
Now the question is how I could perform a community detection in that form of network. The usual suspects (algorithm) are only able to perform in either unweighted or incomplete networks well. The main problem is that the geodesic is everywhere the same.
Two option came into my mind:
Cut the network into smaller pieces by cutting them at a certain "weight-threshold-level"
Or use a hierarchical cluster algorithm to turn the whole network into a blockmodel. But I think the problem "no variance in geodesic terms" will remain.
Several methods were suggested.
One simple yet effective method was suggested in Fast unfolding of communities in large networks (Blondel et al., 2008). It supports weighted networks. Quoting from the abstract:
We propose a simple method to extract the community structure of large
networks. Our method is a heuristic method that is based on modularity
optimization. It is shown to outperform all other known community
detection method in terms of computation time. Moreover, the quality
of the communities detected is very good, as measured by the so-called
modularity.
Quoting from the paper:
We now introduce our algorithm that finds high modularity partitions
of large networks in short time and that unfolds a complete
hierarchical community structure for the network, thereby giving
access to different resolutions of community detection.
So it supposed to work well for complete graph, but you should better check it.
A C++ implementation is available here (now maintained here).
Your other idea - using weight-threshold - may prove as a good pre-processing step, especially for algorithms which won't partition complete graphs. I believe it is best to set it to some percentile (e.g. to the median) of the weights.

Is tabu search a learning algorithm? (CVRP)

I've been set the assignment of producing a solution for the capacitated vehicle routing problem using any algorithm that learns. From my brief search of the literature, tabu search variants seem to be the most successful. Can they be classed as learning algorithms though or are they just variants on local search?
Search methods are not "learning". Learning, in cotenxt of computer science is a term for learning machines - which improve their quality over training (experience). Metaheuristics, which simply search through some space do not "learn", they simply browse all possible solutions (in heuristically guided manner) in order to optimize some function. In other words - optimization techniques are used to train some models, but these optimizers themselves don't "learn". Although this is purely linguistic manner, but I would distinguish between methods that learns - in the sense - are trying to generalize knowledge from some set of examples, from algorithms which simply are searching for best parameters for arbitrary given function. The core idea of machine learning (which distinguishes it from optimization itself) is that the aim is to actually maximize the quality of our model on unknown data, while in optimization (and in particular tabu search) we are simply looking for the best quality on exactly known, and well defined data (function).

Matching two Social Media Profiles

How can you check if two profiles from two different Social Media sites are the same?
What algorithms exist to accomplish this and thereby assigning a weight measure for the match?
Let's say that I have a profile from LinkedIn and another profile from Facebook. I know the properties of these two profiles. What algorithm can I implement to find the matching distance between these two profile.
Thanks
Abhishek S
You can try machine learning algorithms, specifically classification
For simplicity, let's assume you want a binary answer: yes or not (this can be later improved).
What you have to do:
Extract the features you have from the two profile and create a
single instance for two combined profiles. This will be an instance
needed to be classified
Create a training set. A training set is a set of "instances" which you know the classification for (from manually labeling them usually).
Run a classification algorithm, given the training set - that will "guess" the classification for the unclassified instances you will later get.
Some algorithms you might want to use are:
SVM - which is considered by many the best classification algorithm exists today.
Decision Trees - especially C4.5 - Very intuitive classifier (human readable!) and simple to use, also - very short classification time.
K Nearest Neighbor - intuitive and simple to use, but behaves badly when the number of features is big.
You can also use cross validation to evaluate how good your results are.
For java - there is an open source project called Weka that implement these classification algorithms and more.

Continuous vs Discrete artificial neural networks

I realize that this is probably a very niche question, but has anyone had experience with working with continuous neural networks? I'm specifically interested in what a continuous neural network may be useful for vs what you normally use discrete neural networks for.
For clarity I will clear up what I mean by continuous neural network as I suppose it can be interpreted to mean different things. I do not mean that the activation function is continuous. Rather I allude to the idea of a increasing the number of neurons in the hidden layer to an infinite amount.
So for clarity, here is the architecture of your typical discreet NN:
(source: garamatt at sites.google.com)
The x are the input, the g is the activation of the hidden layer, the v are the weights of the hidden layer, the w are the weights of the output layer, the b is the bias and apparently the output layer has a linear activation (namely none.)
The difference between a discrete NN and a continuous NN is depicted by this figure:
(source: garamatt at sites.google.com)
That is you let the number of hidden neurons become infinite so that your final output is an integral. In practice this means that instead of computing a deterministic sum you instead must approximate the corresponding integral with quadrature.
Apparently its a common misconception with neural networks that too many hidden neurons produces over-fitting.
My question is specifically, given this definition of discrete and continuous neural networks, I was wondering if anyone had experience working with the latter and what sort of things they used them for.
Further description on the topic can be found here:
http://www.iro.umontreal.ca/~lisa/seminaires/18-04-2006.pdf
I think this is either only of interest to theoreticians trying to prove that no function is beyond the approximation power of the NN architecture, or it may be a proposition on a method of constructing a piecewise linear approximation (via backpropagation) of a function. If it's the latter, I think there are existing methods that are much faster, less susceptible to local minima, and less prone to overfitting than backpropagation.
My understanding of NN is that the connections and neurons contain a compressed representation of the data it's trained on. The key is that you have a large dataset that requires more memory than the "general lesson" that is salient throughout each example. The NN is supposedly the economical container that will distill this general lesson from that huge corpus.
If your NN has enough hidden units to densely sample the original function, this is equivalent to saying your NN is large enough to memorize the training corpus (as opposed to generalizing from it). Think of the training corpus as also a sample of the original function at a given resolution. If the NN has enough neurons to sample the function at an even higher resolution than your training corpus, then there is simply no pressure for the system to generalize because it's not constrained by the number of neurons to do so.
Since no generalization is induced nor required, you might as well just memorize the corpus by storing all of your training data in memory and use k-nearest neighbor, which will always perform better than any NN, and will always perform as well as any NN even as the NN's sampling resolution approaches infinity.
The term hasn't quite caught on in the machine learning literature, which explains all the confusion. It seems like this was a one off paper, an interesting one at that, but it hasn't really led to anything, which may mean several things; the author may have simply lost interest.
I know that Bayesian neural networks (with countably many hidden units, the 'continuous neural networks' paper extends to the uncountable case) were successfully employed by Radford Neal (see his thesis all about this stuff) to win the NIPS 2003 Feature Selection Challenge using Bayesian neural networks.
In the past I've worked on a few research projects using continuous NN's. Activation was done using a bipolar hyperbolic tan, the network took several hundred floating point inputs and output around one hundred floating point values.
In this particular case the aim of the network was to learn the dynamic equations of a mineral train. The network was given the current state of the train and predicted speed, inter-wagon dynamics and other train behaviour 50 seconds into the future.
The rationale for this particular project was mainly about performance. This was being targeted for an embedded device and evaluating the NN was much more performance friendly then solving a traditional ODE (ordinary differential equation) system.
In general a continuous NN should be able to learn any kind of function. This is particularly useful when its impossible/extremely difficult to solve a system using deterministic methods. As opposed to binary networks which are often used for pattern recognition/classification purposes.
Given their non-deterministic nature NN's of any kind are touchy beasts, choosing the right kinds of inputs/network architecture can be somewhat a black art.
Feed forward neural networks are always "continuous" -- it's the only way that backpropagation learning actually works (you can't backpropagate through a discrete/step function because it's non-differentiable at the bias threshold).
You might have a discrete (e.g. "one-hot") encoding of the input or target output, but all of the computation is continuous-valued. The output may be constrained (i.e. with a softmax output layer such that the outputs always sum to one, as is common in a classification setting) but again, still continuous.
If you mean a network that predicts a continuous, unconstrained target -- think of any prediction problem where the "correct answer" isn't discrete, and a linear regression model won't suffice. Recurrent neural networks have at various times been a fashionable method for various financial prediction applications, for example.
Continuous neural networks are not known to be universal approximators (in the sense of density in $L^p$ or $C(\mathbb{R})$ for the topology of uniform convergence on compacts, i.e.: as in the universal approximation theorem) but only universal interpolators in the sense of this paper:
https://arxiv.org/abs/1908.07838

Resources