How can I encode double values in genetic algorithm? - algorithm

I want to use neural network to learning cars riding on the racetrack. Imo best way to learning net is using genetic algorithm, but in each tutorial the genotype is encode by 0 and 1 (binary values). In my net weights are double values, so genotype looks like 3,12; 9,12; 0,83, -0,73 etc.
So my question is:
Should I encode each weight to binary value ? I think I can use double values but I don't know how can I mutate this ? Binary value I can inverse from 0 to 1 and from 1 to 0 but double ?

From a theoretical point of view yes, you can.
The condition anyhow is you correctly define all the operations (like crossover, mutation, etc.) for continuous values too.
The answer then is yes, if your software implementation enables you to do so.
Let me draw a simplified example.
If the algorithm aims at identifying the best fit for the sine function, and you can use shapes [triangle, square, half-circle], y magnitude and x displacement, you can have a chromosome of let's say N shapes to be summed together.
In such a case x and y must be both double: you can mutate them e.g. by adding a random number in a sensate range, and perform crossover by exchanging x or y with the partner, or even collecting a full tuple (shape-x-y).
I would say that the mantra is to keep things coherent, and let individuals mutate in a sensate way for your model (a bad choice would be cross x with y).

Related

How is the decision boundary piloted after the parameters theta are updated

I have been learning about Machine learning algorithms this semester but I cant seem to understand how the parameters theta are used once Gradient decent is ran and they are updated, specifically in Logistic regression, In short my question is how is the decision boundary piloted after the parameters theta are updated.
After you use gradient descent to estimate your parameters theta, you can use those calculated parameters to make predictions.
For any input x, you can now calculate an predicted outcome y.
Ultimately the goal of machine learning is to make predictions.
So you take a whole bunch of observations x and y. Where x is your input and y is your output. In case of logistic regression, y is one of two values. For example, take a bunch of emails (x) that are labeled spam or no spam (y is 1 for spam and 0 for no spam). Or take a bunch of medical images that are labeled healthy or non healthy. ...
Feed all that data in your machine learning algorithm. Your algorithm (gradient descent for example), will calculate the theta coefficients.
Now you can use these theta coefficient to make predictions for new values of x. For example a new email that the system has never seen, using the theta coefficient, you can predict whether it is spam or not.
As far a plotting the decision boundary. This is probably feasible when you have two dimensions for x. You can have one dimension on each axis. And the resulting dots in your graph would be your y values. You could color them differently or show a different shape whether the result is one way or the other (i.e. your y is 0 or 1).
In practicality, these plots are useful during a lecture to get a general gist of what you're trying to do or accomplish. In reality, every input X would probably be a vector of many values (way more than 2). And thus it becomes impossible to plot a decision boundary.
Typically, logistic regression is parametrized in a following way:
cl(x|theta) = 1 / (1 + exp(-SUM_{i=1}^d theta_i x_i + theta_0 )) ) > 0.5
which is equivalent to
cl(x|theta) = sign(SUM_{i=1}^d theta_i x_i + theta_0 )
so once you get your theta, you use it to make a prediction by computing a simple weighted sum of your data representation and you check the sign of such number.

Evolving a matrix using a genetic algorithm

I recently discovered genetic algothims and after doing a little research I can't find any example on how to evolve structures more complex than a vector or a string.
Let's say that I'm using a covariance matrix for a certain computation (to compute a mahalanobis distance for example) and I want to look for a better matrix to do the job and linimize a certain criteria, are there any classic examples on how to evolve the matrix and which crossover operators to use ?
Thanks !
Any structure of fixed size and shape that is made of numbers (or any other elements) can be rewritten to a 1-D vector and back. You can then use any operator you like which works on vectors.
If you wanted to work with matrices (or any other structures) directly you can always design your own operators, but a matrix basically is a vector, just written in a different way. For the matrix case there are a number of possibilites of operators (crossover):
Swap rows/columns (between the parents)
Swap submatrices (generalization of the above)
Continuous-space crossover methos like BLX-alpha, PCX, arithmetic crossover... These all are designed for vectors but you will just treat the matrix as a vector (it's really not that different).
Mutation is probably going to be more or less identical to the vector-like - you just mutate the elements (or some of them).

Neural Network-like Data Structure

So I'm working on a little side-project for the purpose of experimenting with genetic algorithms. The project involves two classes, Critters and Food. The critters gain hunger each tick and lose hunger when they find food. The Critters can move, and the Food is stationary. Each Critter has a genome which is just a string of randomly generated characters. The goal is that after several generations, the Critters will evolve specialized movement patterns that result in optimal food consumption.
Right now, each Critter is governed by a neural network. The neural network is initialized with weights and biases derived from the Critter's genome. The first input into the neural network is [0,0]. The neural network produces two outputs which dictate the direction of the Critter's x and y movement respectively. This output is used as the input for the neural network at the next tick. For example:
1: [0,0]->NN->[.598.., -.234...] // Critter moves right and up
2: [.598...,-.234...]->NN->[-.409...,-.232...] // Critter moves left and up
3: [-.409...,-.232...]->NN-> etc.
The problem is that, regardless of how the weights are initialized, the neural network is finding a sort of "fixed point." That is, after two or three iterations the output and input are practically the same so the Critter always moves in the same direction. Now I'm not training the neural net and I don't really want to. So what I'm looking for is an alternative method of generating the output.
More specifically, let's say I have n random weights generated by the genome. I need a relation determined by those n weights that can map (in the loosest sense of the word) 2 inputs in the range [-1,1] to two outputs in the same range. The main thing is that I want the weights to have a significant impact on the behavior of the function. I don't want it to be something like y=mx+b where we're only changing y and b.
I know that's a pretty vague description. At first I thought the Neural Network would be perfect, but it seems as though the inputs have virtually no affect on the outputs without training (which is fair since Neural Networks are meant to be trained).
Any advice?
Just an idea.
You have f(genome) -> (w_1, w_2, ..., w_n), where f generates w based on the genome.
You could for example use a hash function h and compute [h(w_1, ..., w_(n/2)), h(w_(n/2+1), ..., w_n))].
Normally a hash function should give very distinct outputs for small change in inputs. But not always. You can look for hash functions that are continuous (small change in input, small change in output). This kind of function would be used for similarity search, http://en.wikipedia.org/wiki/Locality_sensitive_hashing might provide some ideas. This way you could actually use the hash directly on the genome.
Otherwise, you could try to split the genome or the weights and give the splits different purposes. Let's say n = 4.
(w_1, w_2) affects x
(w_3, w_4) affects y
You could then compute x as (w_1 + w_2*Random_[-1,1])/2, where Random_[-1,1] is a random number from the interval [-1,1] and assuming that w_i \in [-1,1] for all i.
Similarly for y.
Your genetic algorithm would then optimize how fast and how randomly the Critters move in order to optimally find food. If you have more weights (or longer genome), you could try to come up with a fancier function in a similar spirit.
This actually shows that with genetic algorithms the problem solving shifts to finding a good genome representation and a good fitness function so don't worry if you get stuck on it a bit.

Converting SVM hyperplane distance (response) to Likelihood

I am trying to use SVM to train some image models. However SVM is not a probabilistic framework so it outputs distance between hyperplanes as a whole number.
Platt converted the output of SVM to likelihood by using some optimisation function but I fail to understand that, does the method assumes one class has same probability I.E for binary classifier if all training sets are even and proportional, then for label 1 or -1 it occurs every time with 50% probability.
Secondly, in some papers I read that for binary SVM classifier they convert -1 and 1 label to range of 0 to 1 and compute the likelihood. But they do not mention anything about how to convert the SVM distance to probability.
Sorry for my english. I would welcome any suggestion and comment. Thank you.
link to paper
Well as far as I can tell that paper is proposing a mapping from the SVM output to a range of [0,1] using a sigmoid function.
From a simplified point of view, it would be something like Sigmoid(RAWSVM(X)) in [0,1], so there is not an explicit "weight" to the labels. The idea is that you take one label (let's say Y=+1) and then you take the output of the SVM and see how close is the prediction for that pattern to that label, if it is close then the sigmoid would give you a number close to 1, otherwise will give you a number close to 0. And hence you have a sense of probability.
Secondly, in some papers I read that for binary SVM classifier they convert -1 and 1 label to range of 0 to 1 and compute the likelihood. But they do not mention anything about how to convert the SVM distance to probability.
Yes, you are correct and some implementations works in the realm of [0,1] instead of [-1,+1], some even maps the label to a factor depending on the value of C. In any case, that shouldn't affect the method proposed in the paper since they would map any range to [0,1]. Keep in mind that this "probabilistic" distribution is just a map from any range to [0,1] assuming uniformity. I am oversimplifying this but the effect is the same.
One last thing, that sigmoid map is not static but data-driven, which means that there would be some training using the dataset to parametrize the sigmoid to adjust it to the data. In other words, for two different datasets you would probably get two different mapping functions.

Mapping arbitrary strings to RGB values

I have a huge set of arbitrary natural language strings. For my tool to analyze them I need to convert each string to unique color value (RGB or other). I need color contrast to depend on string similarity (the more string is different from other, the more their respective colors should be different). Would be perfect if I would always get same color value for the same string.
Any advice on how to approach this problem?
Update on distance between strings
I probably need "similarity" defined as a Levenstein-like distance. No natural language parsing is required.
That is:
"I am going to the store" and
"We are going to the store"
Similar.
"I am going to the store" and
"I am going to the store today"
Similar as well (but slightly less).
"I am going to the store" and
"J bn hpjoh up uif tupsf"
Quite not similar.
(Thanks, Welbog!)
I probably would know exactly what distance function I need only when I'll see program output. So lets start from simpler things.
Update on task simplification
I've removed my own suggestion to split task into two — absolute distance calculation and color distribution. This would not work well as at first we're reducing dimensional information to a single dimension, and then trying to synthesize it up to three dimensions.
You need to elaborate more on what you mean by "similar strings" in order to come up with an appropriate conversion function. Are the strings
"I am going to the store" and
"We are going to the store"
considered similar? What about the strings
"I am going to the store" and
"J bn hpjoh up uif tupsf"
(all of the letters in the original +1), or
"I am going to the store" and
"I am going to the store today"
? Based on what you mean by "similar", you might consider different functions.
If the difference can be based solely on the values of the characters (in Unicode or whatever space they are from), then you can try summing the values up and using the result as a hue for HSV space. If having a longer string should cause the colours to be more different, you might consider weighing characters by their position in the string.
If the difference is more complex, such as by the occurrences of certain letters or words, then you need to identify this. Maybe you can decide red, green and blue values based on the number of Es, Ss and Rs in a string, if your domain has a lot of these. Or pick a hue based on the ratio of vowels to consonents, or words to syllables.
There are many, many different ways to approach this, but the best one really depends on what you mean by "similar" strings.
It sounds like you want a hash of some sort. It doesn't need to be secure (so nothing as complicated as MD5 or SHA) but something along the lines of:
char1 + char2 + char3 + ... + charN % MAX_COLOUR_VALUE
would work as a simple first step. You could also do fancier things along the lines of having each character act as an 'amplitude' for R,G and B (e could be +1R, +2G and -4B, etc.) and then simply add up all the values in a string... clamp them at the end and you have a method of turning arbitrary length strings into colours as a 'colour hash' sort of process.
First, you'll need to pick a way to measure string similarity. Minimal edit distance is traditional, but is not sufficient to well-order the strings, which is what you will need if you want to allocate the same colours to the same strings every time - perhaps you could weight the edit costs by alphabetic distance. Also minimal edit distance by itself may not be very useful if what you are after is similarity in speech rather than in written form (if so, consider a stemming/soundex pass first), or some other sense of "similarity".
Then you need to pick a way of traversing the visible colour space based on that metric. It may be helpful to consider using HSL or HSV colour representation - the algorithm could then become as simple as picking a starting hue and walking the sorted corpus, assigning the current hue to each string before offsetting it by the string's difference from the previous one.
How important is it that you never end up with two dissimilar strings having the same colour?
If it's not that important then maybe this could work?
You could pick a 1 dimensional color space that is "homotopic" to the circle: Say the color function c(x) is defined for x between 0 and 1. Then you'd want c(0) == c(1).
Now you take the sum of all character values modulo some scaling factor and wrap this back to the color space:
c( (SumOfCharValues(word) modulo ScalingFactor) / ScalingFactor )
This might work even better if you defined a "wrapping" color space of higher dimensions and for each dimension pick different SumOfCharValues function; someone suggested alternating sum and length.
Just a thought... HTH
Here is my suggestion (I think there is a general name for this algorithm, but I'm too tired to remember it):
You want to transform each string to a 3D point node(r, g, b) (you can scale the values so that they fit your range) such that the following error is minimized:
Error = \sum_i{\sum_j{(dist(node_i, node_j) - dist(str_i, str_j))^2}}
You can do this:
First assign each string a random color (r, g, b)
Repeat until you see fit (eg. error is adjusted less than \epsilon = 0.0001):
Pick a random node
Adjust it's position (r, g, b) such that the error is minimized
Scale the coordinate system such that each nodes coordinates are in the range [0., 1.) or [0, 256]
You can use something like MinHash or some other LSH method and define similarity as intersection between sets of shingles measured by Jaccard coefficient.
There is a good description in Mining of Massive data sets, Ch.3 by Rajaraman and Ullman.
I would maybe define some delta between two strings. I don't know what you define as the difference (or "unequality") of two strings, but the most obvious thing I could think about would be string length and the number of occurences of particular letters (and their index in the string). It should not be tricky to implement it such that it returns the same color code in equal strings (if you do an equal first, and return before further comparison).
When it comes to the actual RGB value, I would try to convert the string data into 4 bytes (RGBA), or 3 bytes if you only use the RGB. I don't know if every string would fit into them (as that may be language specific?).
Sorry, but you can't do what you're looking for with levenshtein distance or similar. RGB and HSV are 3-dimensional geometric spaces, but levenshtein distance describes a metric space - a much looser set of contstraints with no fixed number of dimensions. There's no way to map a metric space into a fixed number of dimensions while always preserving locality.
As far as approximations go, though, for single terms you could use a modification of an algorithm like soundex or metaphone to pick a color; for multiple terms, you could, for example, apply soundex or metaphone to each word individually, then sum them up (with overflow).

Resources