I am trying to implement a genetic algorithm for maximizing a function of n variables. However, the fitness values can be negative, and I am not sure how to handle negative values during selection. I read this article: Linear fitness scaling in Genetic Algorithm produces negative fitness values
but it's not clear to me how the negative fitness values were taken care of and how the scaling factors a and b were calculated.
Also, from the article I know that roulette wheel selection only works for positive fitness values. Is it the same for tournament selection as well?
When you have negative values, you could try to find the smallest fitness value in your population and add its opposite to every value. This way you will no longer have negative values, while the differences between fitness values will remain the same.
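A minimal sketch of that shift, assuming the fitness values are in a plain Python list (the function name `shift_to_non_negative` is made up for illustration):

```python
def shift_to_non_negative(fitness):
    """Subtract the smallest value so the minimum becomes 0; differences are unchanged."""
    lowest = min(fitness)
    if lowest >= 0:
        return list(fitness)  # no negative values, nothing to fix
    return [f - lowest for f in fitness]

print(shift_to_non_negative([3, -7, 12, -2]))  # [10, 0, 19, 5]
```

Note that the worst individual ends up with a fitness of exactly 0, so a purely proportional selection will never pick it.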
Tournament selection is not affected by this problem. It simply compares the fitness values of a uniformly sampled subset of size n of the population and takes the one with the best value. Still of course this means that, if you sample without repetition then the worst n-1 individuals will never get selected. If you sample with repetition they have a chance of being selected.
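For illustration, tournament selection without repetition could look roughly like this in Python (the function name and the `rng` parameter are assumptions of this sketch, not part of any library):

```python
import random

def tournament_select(fitness, k, rng=random):
    """Return the index of the fittest among k individuals sampled without repetition."""
    contestants = rng.sample(range(len(fitness)), k)
    # Only comparisons are used, so negative fitness values are harmless here.
    return max(contestants, key=lambda i: fitness[i])

fitness = [-12.0, -45.0, 0.0, 72.1, -32.3]
winner = tournament_select(fitness, k=3)
print(winner, fitness[winner])
```

With k=3 and sampling without repetition, the two worst individuals (indices 1 and 4 here) can never win a tournament.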
As for proportional selection: it doesn't work with negative fitness values. You have to apply "windowing" or "scaling" to your fitness values first, in which case it works again.
I once programmed some sampling methods as extension methods for C#'s IEnumerable; among them are SampleProportional and SampleProportionalWithoutRepetition. They're part of HeuristicLab, under the GPL license.
Okay, it's late to answer, but still someone could google it.
First of all: yes, you can use negative fitness. But I strongly suggest you don't, because I have done it and ran into a lot of problems (it's still doable, but not recommended). Here's the explanation:
Say you have a population of N creatures. After simulation they all have some fitness values f(n), where f(n) is the fitness and n is the creature number. After this you want to build some probability distribution to determine which creatures should be killed (of course you can just delete, say, the worst 40% of creatures, but it would be better to use a distribution). How do you build such a distribution? Say f(a) = 50 and f(b) = 100, so creature b is 2 times better than creature a, so you probably want to make the survival probability of creature b 2 times higher than that of creature a (this makes sense if your fitness value is linear). In case you wonder how to do it:
Let's say that sum( f(n) ) is the sum of all fitness values. Then the survival probability p(a) of creature a is:
p(a) = f(a) / sum( f(n) )
This will do the trick.
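As a quick sketch of that formula in Python (the function name is illustrative, and it assumes every fitness value is positive):

```python
def survival_probabilities(fitness):
    """p(a) = f(a) / sum(f(n)); only meaningful when all fitness values are positive."""
    total = sum(fitness)
    return [f / total for f in fitness]

probs = survival_probabilities([50, 100])
print(probs)  # creature b ends up twice as likely to survive as creature a
```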
But now let's allow negative fitness. Say f(a) = 50, f(b) = 100, f(c) = -1000. b is again 2 times better than a, which makes sense, but is it -10 times better than c? That doesn't make sense. The gentleman above suggested adding the opposite of the worst fitness value, which can kind of "fix" your situation, but really it doesn't (I made the same mistake before). Okay, let's add 1000 to all fitness values:
f(a) = 1050, f(b) = 1100, f(c) = 0, so the survival probability of c is now zero. Okay, we can accept that. But let's compare a and b now:
b is now only about 1.05 times better than a, which means that the fitness of a and b is almost the same. That is unacceptable, because b clearly was 2 times better than a (assuming that fitness is linear, but this will mess up nonlinear fitness as well)! You can't escape this problem; it will constantly get in your way, because a probability can't be negative. So you can either remove the probability from evolution (which is not a very good thing to do) or resort to some exceptions and tricks.
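The arithmetic of this example can be checked with a few lines of Python (the dictionary layout is just for illustration):

```python
fitness = {"a": 50, "b": 100, "c": -1000}
offset = -min(fitness.values())                       # 1000
shifted = {name: f + offset for name, f in fitness.items()}

ratio_before = fitness["b"] / fitness["a"]            # 2.0
ratio_after = shifted["b"] / shifted["a"]             # 1100 / 1050
print(shifted, ratio_before, round(ratio_after, 3))
```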
Since it was too late in my scenario to remove negative fitness, here is how I fixed things:
Once again, you have a population of N creatures. Say neg(N) gives you all creatures with negative fitness and pos(N) all creatures with positive fitness (it's your call whether zero counts as negative or positive; it doesn't matter in this case). And let's say you need D creatures to die. Now here's the trick:
the higher f(c) (c is a pos creature), the better the creature is, so we can use its fitness to determine the probability of survival. But the lower (more negative) f(m) (m is a neg creature), the worse the creature is, so we can use its fitness to determine the probability of dying.
Now, if D is at least the number of creatures in neg(N), then all of neg(N) will die, and the remaining (D minus that number) deaths are drawn from pos(N) using a probability distribution based on the fitness of all positive creatures (probability of survival p(a) = f(a) / sum( pos(n) ), where sum( pos(n) ) is the sum of fitness over all positive creatures). But if D is smaller than the number of creatures in neg(N), then all of pos(N) will survive, and D of the neg(N) creatures will die using a probability distribution based on the fitness of all negative creatures (probability of dying p(a) = f(a) / sum( neg(n) ); f(a) will be negative, but sum( neg(n) ) will be negative as well, so the probability will be positive).
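Here is a rough Python sketch of that scheme. The helper names, the choice to treat zero as positive, and the uniform fallback when all weights are zero are my own assumptions for illustration; draws are made one at a time without repetition.

```python
import random

def draw_without_repetition(items, weights, count, rng):
    """Draw `count` distinct items; each draw is proportional to the remaining weights."""
    items, weights = list(items), list(weights)
    chosen = []
    for _ in range(min(count, len(items))):
        if sum(weights) > 0:
            pick = rng.choices(range(len(items)), weights=weights, k=1)[0]
        else:
            pick = rng.randrange(len(items))  # all weights zero: fall back to uniform
        chosen.append(items.pop(pick))
        weights.pop(pick)
    return chosen

def choose_deaths(fitness, deaths, rng=random):
    """Return the set of indices that die, following the neg/pos split described above."""
    neg = [i for i, f in enumerate(fitness) if f < 0]
    pos = [i for i, f in enumerate(fitness) if f >= 0]
    if deaths >= len(neg):
        # Every negative creature dies; positives are drawn as survivors by fitness.
        n_survivors = len(pos) - (deaths - len(neg))
        survivors = draw_without_repetition(pos, [fitness[i] for i in pos], n_survivors, rng)
        return set(neg) | (set(pos) - set(survivors))
    # All positives survive; negatives die with probability proportional to |fitness|.
    return set(draw_without_repetition(neg, [-fitness[i] for i in neg], deaths, rng))

print(choose_deaths([50, 100, -1000, -5, 20], deaths=3))
```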
I know this question has been here for a long time, but in case newcomers want a way to deal with negative values when the problem is a minimization problem, here is the code for it.
from numpy import min, sum, ptp, array
from numpy.random import uniform

list_fitness1 = array([-12, -45, 0, 72.1, -32.3])
list_fitness2 = array([0.5, 6.32, 988.2, 1.23])

def get_index_roulette_wheel_selection(list_fitness=None):
    """ It can handle negative also. Make sure your list fitness is 1D-numpy array"""
    scaled_fitness = (list_fitness - min(list_fitness)) / ptp(list_fitness)
    minimized_fitness = 1.0 - scaled_fitness
    total_sum = sum(minimized_fitness)
    r = uniform(low=0, high=total_sum)
    for idx, f in enumerate(minimized_fitness):
        r = r + f
        if r > total_sum:
            return idx
Make sure your fitness list is a 1D NumPy array.
Scale the fitness list to the range [0, 1].
Transform the maximization problem into a minimization problem with 1.0 - scaled_fitness.
Draw a random number between 0 and sum(minimized_fitness).
Keep adding elements of the minimized fitness list until the running value is greater than the total sum.
If a fitness is small, it has a bigger value in minimized_fitness, so it has a bigger chance of pushing the running value past the total sum.
I think that the main issue people are running into here is that they're treating the fitness score improperly. Let's think about an example fitness score as the temperature inside of a truck shipping frozen goods. The truck's internal temperature should be -2 C... but that's also 28.4 F. They are the same exact fitness relative to the food staying frozen, but 2 * -2 = -4, and 2 * 28.4 is 56.8. "Two times colder" doesn't really make any sense here (-4 C != 14.2 F either). Same with fitness scores.
In the case of -1000 in Volot's example, the difference between 50 and 100 is actually comparatively low: the important thing is that you'd pick either / both of those over the -1000, which you will definitely do if you just subtract -1000 from everything. Then the next generation of children may have fitness scores of 50, 100, 200, and 10, let's say. Now the difference between 50 and 100 is much more pronounced, and 50 will have a much lower chance of getting picked. Remember genetic algorithms are iterative. It also reminds me of a saying: You don’t have to run faster than the bear to get away. You just have to run faster than the guy next to you. 50 just needs to outrun -1000 to survive to reproduce.
The problem of subtracting the min resulting in 0 can also be avoided. When estimating probability distributions, people will add 1 occurrence to every (known) possible outcome so that extremely rare events are still captured. That gets somewhat trickier with fitness scores. You can't just add 1. What if your fitness scores are 0.01, 0.02, and -0.01? After subtracting the min and adding 1 you get 1.02, 1.03, and 1.00, which will result in picking a low relative fitness a lot. You can instead add the lowest non-zero shifted value to everything, resulting in 0.04, 0.05, and 0.02. For the -1000 case (50, 100, -1000), it results in 2100, 2150, and 1050 (so everything that used to be 0 will always be half as likely to get picked as the next lowest fitness).
Still, to make things as consistent as possible with what is a more typical GA sampling method, I would only subtract the min and add back in a small amount of fitness when there are negative values. When everything is positive, there's no reason to do it.
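A small Python sketch of this "subtract the min, then add back the lowest non-zero value" idea (the function name and the fallback of 1.0 for an all-equal population are my own assumptions):

```python
def shift_with_floor(fitness):
    """Subtract the minimum, then add the smallest non-zero shifted value to everything."""
    lowest = min(fitness)
    if lowest >= 0:
        return list(fitness)            # no negative values: leave the scores alone
    shifted = [f - lowest for f in fitness]
    positives = [s for s in shifted if s > 0]
    floor = min(positives) if positives else 1.0   # assumption: all-equal population
    return [s + floor for s in shifted]

print(shift_with_floor([50, 100, -1000]))  # [2100, 2150, 1050]
```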
I have a problem statement which says: if you have an array of elements {x1,x2,x3,...x10}, find the combination of elements such that it just sums up above a threshold value (say the threshold value is 100).
So if there exists a combination like x2+x5+x8 = 105, x3+x5+x8=103, and x4+x5 = 101, then the algorithm should output X4, X5.
The knapsack algorithm emits a value that is near but on the lesser side of the threshold (which is 100 here). I want the opposite, that is the smallest sum of selected elements that is greater than 100.
Is there any set of algorithms or any special case of any algorithm which might solve this problem?
I'll start out by noting that you are asking for the smallest value strictly greater than some target. In general "strictly greater than" and "strictly less than" constraints are much harder than "greater than or equal to" or "less than or equal to" constraints. If you have all integer values, then you could simply translate your constraint "the sum exceeds 100" to "the sum is greater than or equal to 101". I'll assume that you've made such a transformation for the rest of the problem.
One approach would be to treat this as an integer optimization problem, in which the binary decision variable y_i for each number is whether or not we include it. Then our goal is to minimize the sum of the numbers, which can be modeled as:
min x_1*y_1 + x_2*y_2 + ... + x_n*y_n
The constraint in this case is that the sum of the numbers is at least 100:
x_1*y_1 + x_2*y_2 + ... + x_n*y_n >= 100
In general this is a hard problem (note that it is at least as hard as the subset sum problem, which is NP-complete). However modern optimization solvers may be efficient enough for your problem instances.
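For tiny instances (like the 10 elements in the question), the model above can be checked by brute force. This Python sketch simply tries every 0/1 assignment of the y_i; the function name and sample numbers are made up, and it is exponential in n, so it is only meant to illustrate the formulation:

```python
from itertools import product

def min_subset_at_least(values, target):
    """Try every 0/1 assignment y and keep the feasible one with the smallest sum."""
    best = None
    for y in product((0, 1), repeat=len(values)):
        total = sum(x * yi for x, yi in zip(values, y))
        if total >= target and (best is None or total < best[0]):
            best = (total, [i for i, yi in enumerate(y) if yi])
    return best  # None when even the full set falls short of the target

print(min_subset_at_least([40, 35, 30, 61, 41], 101))  # (101, [0, 3])
```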
To test the scalability of a free solver for this problem, consider the following implementation with the lpSolve package in R (it returns the selected subset if the problem is feasible and NA otherwise):
library(lpSolve)
min.subset <- function(x, min.sum) {
  mod <- lp("min", x, matrix(x, nrow=1), ">=", min.sum, all.bin=TRUE)
  if (mod$status == 0) {
    which(mod$solution >= 0.999)
  } else {
    NA
  }
}
min.subset(1:10, 43.5)
# [1] 2 3 4 5 6 7 8 9
min.subset(1:10, 88)
# [1] NA
To test the scalability, I'll select n elements randomly from [1, 2, ..., 1000], setting the target sum to be half the sum of the elements. The runtimes were:
With n=100, it ran in 0.01 seconds
With n=1000, it ran in 0.1 seconds
With n=10000, it ran in 8.7 seconds
It appears you can solve this problem for more than 10k elements (with the selected distribution) without too many computational challenges. If your problem is too big for the free solver I've used here, you might consider Gurobi or cplex, two commercial solvers that are free for academic use but otherwise not free.
Suppose X is the sum of all x_i. Then equivalently, you are asking for a subset of your x_i with the maximum sum that is at most X - 100 (the complement of these x_i will be the optimum solution to your problem). So all Knapsack theory can be applied here.
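To make the complement idea concrete, here is a small Python sketch that assumes non-negative integer values and a "greater than or equal" target. It tracks reachable subset sums in the bits of one integer, which is a plain subset-sum DP and not the Nemhauser-Ullman approach; the function name is made up.

```python
def min_sum_at_least(values, target):
    """Smallest achievable subset sum >= target (non-negative integers), else None."""
    total = sum(values)
    if total < target:
        return None
    capacity = total - target            # the left-out elements may sum to at most this
    reachable = 1                        # bit s is set <=> some subset sums to s
    for v in values:
        reachable |= reachable << v
    reachable &= (1 << (capacity + 1)) - 1        # keep only sums 0..capacity
    best_left_out = reachable.bit_length() - 1    # largest reachable sum <= capacity
    return total - best_left_out

print(min_sum_at_least([40, 35, 30, 61, 41], 101))  # 101
```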
In practice (really large instances), I'd suggest this form of Nemhauser-Ullman generalization which can solve instances with millions of objects.
How would you implement a function that returns a random number from the interval 1..1000
when there is a number N that determines the chance of getting higher or lower numbers?
It should behave as follows:
if N = 0 and we generate the random number many times, we get an equilibrium (every number in the interval 1..1000 has an equal chance).
if N = 2321 (I call it a positive factor), it will be very hard to get a small number (numbers > 900 will be generated often, numbers near 500 sometimes, and numbers < 100 rarely). The higher the positive factor, the higher the probability of high numbers.
if N = -2321 (negative factor), this will be the opposite of the positive factor.
It's clear that, for a given N, the generated numbers will follow a certain characteristic curve. Could you advise me how to achieve this and what curves I can create? What possibilities do I have here? How would you limit the positive and negative factors, etc.?
thank you for help
If you generate a uniform random number in [0, 1] and then raise it to a power > 1, it will get smaller, but stay in the range [0, 1]. If you raise it to a power greater than 0 but less than 1, it will get larger, but stay in the range [0, 1].
So you can use the exponent to control the bias when generating your random numbers.
import random

def biased_random(scale, bias):
    return random.random() ** bias * scale

sum(biased_random(1000, 2.5) for x in range(100)) / 100
# 291.59652962214676 (average less than 500)
max(biased_random(1000, 2.5) for x in range(100))
# 963.81166161355998 (but still occasionally generates large numbers)
sum(biased_random(1000, .3) for x in range(100)) / 100
# 813.90199860117821 (average > 500)
min(biased_random(1000, .3) for x in range(100))
# 265.25040459294883 (but still occasionally generates small numbers)
This problem is severely underspecified. There are a million ways to solve it as it is mentioned.
Instead of arbitrary positive and negative values, try to think about the meaning behind them. IMHO, the beta distribution is the one you should consider. By selecting the parameters \alpha and \beta you should be able to modulate the behavior of your distribution appropriately.
See what shapes you can get with certain \alpha and \beta http://en.wikipedia.org/wiki/Beta_distribution#Shapes
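One possible way to try this in Python is `random.betavariate`. How N is mapped to \alpha and \beta below (1 + |N|/1000 on one side, 1 on the other) is purely an assumption of this sketch; any other mapping that keeps both parameters positive would do.

```python
import random

def beta_biased(n, low=1, high=1000, rng=random):
    """n > 0 skews toward `high`, n < 0 toward `low`, n == 0 is uniform (Beta(1, 1))."""
    strength = 1.0 + abs(n) / 1000.0          # hypothetical scaling of the factor N
    alpha, beta = (strength, 1.0) if n >= 0 else (1.0, strength)
    u = rng.betavariate(alpha, beta)          # value in [0, 1]
    return min(high, low + int(u * (high - low + 1)))

samples = [beta_biased(2321) for _ in range(10)]
print(samples)
```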
To begin with, let's pick numbers from [0,1], because it makes things simpler.
n is the number that controls the distribution (0, 2321 or -2321 as in the example).
We only need a solution for n > 0, because for n < 0 you can take the positive version of n and subtract the result from 1.
One simple idea for a PDF on the interval [0,1] is x^n (or at least this kind of shape).
The CDF is then the integral of x^n, which is x^(n+1)/(n+1).
Because the CDF must be 1 at the end of the interval (in our case at 1), our final, properly normalized CDF is x^(n+1).
To generate numbers with this distribution, we must calculate the quantile function.
The quantile function is just the inverse of the CDF, which in our case is x^(1/(n+1)).
And that is it. Your QF is x^(1/(n+1)).
To generate numbers from [0,1], pick a uniformly distributed random number from [0,1] (the most common random function in programming languages)
and raise it to the power 1/(n+1).
The only problem I see is that it can be hard to calculate 1-x^(1/(-n+1)) accurately when n < 0, but I think that you can use log1p,
so it becomes exp(log1p(-x^(1/(-n+1)))) if n<0.
Conclusion with normalization:
if n>=0: (x^(1/(n/1000+1)))*1000
if n<0: exp(log1p(-(x^(1/(-(n/1000)+1)))))*1000
where x is a uniformly distributed random value in the interval [0,1]
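A short Python sketch of these two formulas. Mapping the result to integers 1..1000 and writing the n < 0 case directly as 1 - x^(...) instead of going through exp/log1p are my simplifications; the function name is made up.

```python
import random

def biased_int(n, rng=random):
    """Push a uniform x in [0, 1) through the quantile functions above, giving 1..1000."""
    x = rng.random()
    k = abs(n) / 1000.0 + 1.0
    if n >= 0:
        u = x ** (1.0 / k)            # skews toward 1
    else:
        u = 1.0 - x ** (1.0 / k)      # mirrored: skews toward 0
    return min(1000, int(u * 1000) + 1)

print([biased_int(2321) for _ in range(5)], [biased_int(-2321) for _ in range(5)])
```

For n = 0 the exponent is 1, so the output is uniform over 1..1000.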
I have got this assignment question on HMM and I have solved it. I would like to know if I am correct. The problem is:
Suppose a dishonest dealer has two coins, one fair and one biased; the biased coin
has heads probability 1/4. Assume that the dealer never switches the coins. Which
coin is more likely to have generated the sequence HTTTHHHTTTTHTHHTT? It may
be useful to know that log2(3) = 1.585
I calculated P for the fair coin and for the biased coin.
P for the fair coin is 7.6*10^-6, whereas P for the biased coin is 3.43*10^-6. I didn't use the log term, which can be used if I solve it the other way. So, I concluded that it is more likely that the given sequence was generated by the fair coin.
Am I right?
Any help is greatly appreciated.
So you are given the following.
P(H|Fake) = 1/4 P(T|Fake) = 3/4
P(H|Fair) = 1/2 P(T|Fair) = 1/2
P(Fair) = 1/2 P(Fake) = 1/2
To answer the question you need to compute P(Fake|HTTTHHHTTTTHTHHTT) and P(Fair|HTTTHHHTTTTHTHHTT), for which you need to apply Bayes' rule (X denotes the observed sequence):
P(Fake|X) = (P(X|Fake) * P(Fake)) / P(X)
P(Fair|X) = (P(X|Fair) * P(Fair)) / P(X)
P(X) = P(X|Fake) * P(Fake) + P(X|Fair) * P(Fair)
P(X) = (3.43710e-6 * 0.5) + (7.629e-6 * 0.5) = 5.533e-6
And therefore
P(Fake|X) = (3.43710e-6 * 0.5) / 5.533e-6 = 0.3106
P(Fair|X) = (7.629e-6 * 0.5) / 5.533e-6 = 0.6894
So therefore, it is more likely that the coin used is the FAIR one. Even though intuitively one might think that the selected coin is the fake one, this is not the case. The observed frequencies are closer to 0.5 tails / 0.5 heads than to 0.25 heads / 0.75 tails. For example, the fraction of tails, 10/17, is about 0.59, which is closer to P(T|Fair) = 0.5 than to P(T|Fake) = 0.75.
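The numbers can be reproduced with a few lines of Python. The equal priors are the assumption made above; the last line also shows the log2 shortcut hinted at in the assignment, since log2 of the likelihood ratio biased/fair reduces to 10*log2(3) - 17.

```python
from math import log2

seq = "HTTTHHHTTTTHTHHTT"
heads, tails = seq.count("H"), seq.count("T")      # 7 heads, 10 tails

lik_fair = 0.5 ** len(seq)
lik_biased = 0.25 ** heads * 0.75 ** tails

evidence = 0.5 * lik_fair + 0.5 * lik_biased       # equal priors, as assumed above
post_fair = 0.5 * lik_fair / evidence
post_biased = 0.5 * lik_biased / evidence

# Negative log-ratio means the biased coin is the less likely explanation.
log_ratio = log2(lik_biased) - log2(lik_fair)
print(round(post_fair, 4), round(post_biased, 4), round(log_ratio, 2))  # 0.6894 0.3106 -1.15
```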
HMM is a bit of an overkill for this example. The number of heads is binomially distributed, with p = 0.5 for the fair coin and p = 0.25 for the other one. For both of them, the number of trials is n = 17 (if my counting is correct). From the 17 samples you got 7 successes (7 heads). Using Wolfram Alpha, the probability of the fair coin generating this many heads is approx 0.15, as opposed to approx 0.07 for the unfair coin. Note I did not bother calculating the exact numbers, just looked at the plots. The formula is there for you to work with if you want to.
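If you would rather compute the two binomial probabilities than read them off a plot, a minimal Python version (using `math.comb`, available since Python 3.8) is:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

print(round(binomial_pmf(7, 17, 0.5), 3), round(binomial_pmf(7, 17, 0.25), 3))  # 0.148 0.067
```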
If you absolutely must use a HMM, set the set of hidden states to be {fair; unfair} . The transition probabilities are: from a hidden state "fair" to a hidden state "fair"= 1, from a fair to unfair 0, etc, because the dealer is not allowed to change coins halfway through the trial. The emission probability from a hidden state "fair" are 0.5 for observable state "heads" and 0.5 for observable state "tails" (0.25 and 0.75 from "unfair"). You can assume at time t=0 hidden state "fair" and "unfair" are equally likely.
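A sketch of the forward algorithm for exactly this HMM in Python (the dictionary layout and function name are my own; because the transition matrix is the identity, the result is just prior times likelihood for each coin):

```python
def forward_likelihoods(seq):
    """Return P(observations, final hidden state) for the two-state coin HMM."""
    start = {"fair": 0.5, "unfair": 0.5}
    trans = {"fair": {"fair": 1.0, "unfair": 0.0},
             "unfair": {"fair": 0.0, "unfair": 1.0}}
    emit = {"fair": {"H": 0.5, "T": 0.5},
            "unfair": {"H": 0.25, "T": 0.75}}
    alpha = {s: start[s] * emit[s][seq[0]] for s in start}
    for obs in seq[1:]:
        alpha = {s: sum(alpha[r] * trans[r][s] for r in alpha) * emit[s][obs]
                 for s in alpha}
    return alpha

alpha = forward_likelihoods("HTTTHHHTTTTHTHHTT")
print(alpha, max(alpha, key=alpha.get))
```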
I have a GA with a fitness function that can evaluate to negative or positive values. For the sake of this question let's assume the function
u = 5 - (x^2 + y^2)
x in [-5.12 .. 5.12]
y in [-5.12 .. 5.12]
Now in the selection phase of GA I am using simple roulette wheel. Since to be able to use simple roulette wheel my fitness function must be positive for concrete cases in a population, I started looking for scaling solutions. The most natural seems to be linear fitness scaling. It should be pretty straightforward, for example look at this implementation. However, I am getting negative values even after linear scaling.
For example for the above mentioned function and these fitness values:
-9.734897 -7.479017 -22.834280 -9.868979 -13.180669 4.898595
after linear scaling I am getting these values
-9.6766040 -11.1755111 -0.9727897 -9.5875139 -7.3870793 -19.3997490
Instead, I would like to scale them to positive values, so I can do roulette wheel selection in the next phase.
I must be doing something fundamentally wrong here. How should I approach this problem?
The main mistake was that the input to linear scaling must already be positive (by definition), whereas I was also feeding it negative values.
The talk about negative values is not about input to the algorithm, but about output (scaled values) from the algorithm. The check is to handle this case and then correct it so as not to produce negative scaled values.
if(p->min > (p->scaleFactor * p->avg - p->max)/
   (p->scaleFactor - 1.0)) { /* if nonnegative smin */
    d = p->max - p->avg;
    p->scaleConstA = (p->scaleFactor - 1.0) * p->avg / d;
    p->scaleConstB = p->avg * (p->max - (p->scaleFactor * p->avg))/d;
} else { /* if smin becomes negative on scaling */
    d = p->avg - p->min;
    p->scaleConstA = p->avg/d;
    p->scaleConstB = -p->min * p->avg/d;
}
In other words, if the scaled minimum f'min would become negative, the else clause handles that case.
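For experimenting, the same prescale logic can be written as a small Python function. This is a sketch that mirrors the C snippet above; the function name, the default scale factor of 2 and the guard for a flat population are my assumptions.

```python
def linear_scale(fitness, scale_factor=2.0):
    """Linear scaling f' = a*f + b; expects non-negative raw fitness values."""
    f_min, f_max = min(fitness), max(fitness)
    f_avg = sum(fitness) / len(fitness)
    if f_max == f_avg:                      # flat population: nothing to scale
        return list(fitness)
    if f_min > (scale_factor * f_avg - f_max) / (scale_factor - 1.0):
        d = f_max - f_avg                   # scaled minimum stays non-negative
        a = (scale_factor - 1.0) * f_avg / d
        b = f_avg * (f_max - scale_factor * f_avg) / d
    else:
        d = f_avg - f_min                   # scaled minimum would go negative
        a = f_avg / d
        b = -f_min * f_avg / d
    return [a * f + b for f in fitness]

print(linear_scale([1, 2, 3, 10]))
print(linear_scale([5, 9, 10, 8]))
```

In both branches the average is preserved; the first maps the maximum to scale_factor times the average, the second maps the minimum to 0.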
Well, the solution is then to prescale the above-mentioned function so that it gives only positive values. As Hyperboreus suggested, this can be done by subtracting the smallest possible value of u (it is negative, so this adds its absolute value):
u = 5 - (2*5.12^2) = -47.4288
It is best if we separate real fitness values that we are trying to maximize from scaled fitness values that are input to selection phase of GA.
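A minimal Python sketch of that separation, keeping the raw objective `u` apart from the shifted value handed to selection (the names `U_MIN` and `selection_fitness` are made up for illustration):

```python
def u(x, y):
    """Raw fitness that the GA is trying to maximize."""
    return 5 - (x ** 2 + y ** 2)

# Smallest possible value of u on the domain, reached at the corners (about -47.43).
U_MIN = 5 - 2 * 5.12 ** 2

def selection_fitness(x, y):
    """Raw fitness shifted so that it is never negative on [-5.12, 5.12] x [-5.12, 5.12]."""
    return u(x, y) - U_MIN

print(selection_fitness(0, 0), selection_fitness(5.12, -5.12))
```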
I agree with the previous answer. Linear scaling by itself tries to preserve the average fitness value, so it needs to be offset if the function is negative. For more details, please have a look in Goldberg's Genetic Algorithms book (1989), Chapter 7, pp. 76-79.
Your smallest possible value for u is 5 - (2*5.12^2). Why not just subtract this from your u (it is negative, so this adds its absolute value)?
You have a biased random number generator that produces a 1 with probability p and a 0 with probability (1-p). You do not know the value of p. Using this, make an unbiased random number generator which produces 1 with probability 0.5 and 0 with probability 0.5.
Note: this problem is an exercise from Introduction to Algorithms by Cormen, Leiserson, Rivest, Stein (CLRS).
The events (p)(1-p) and (1-p)(p) are equiprobable. Taking them as 0 and 1 respectively and discarding the other two pairs of results you get an unbiased random generator.
In code this is as easy as:
int UnbiasedRandom()
{
    int x, y;
    do {
        x = BiasedRandom();
        y = BiasedRandom();
    } while (x == y);
    return x;
}
The procedure to produce an unbiased coin from a biased one was first attributed to Von Neumann (a guy who has done enormous work in math and many related fields). The procedure is super simple:
Toss the coin twice.
If the results match, start over, forgetting both results.
If the results differ, use the first result, forgetting the second.
The reason this algorithm works is because the probability of getting HT is p(1-p), which is the same as getting TH (1-p)p. Thus two events are equally likely.
I am also reading this book and it asks for the expected running time. The probability that two tosses are not equal is z = 2*p*(1-p), therefore the expected number of rounds (pairs of tosses) is 1/z.
The previous example looks encouraging (after all, if you have a biased coin with a bias of p=0.99, you will need about 50 rounds, i.e. roughly 100 throws, which is not that many). So you might think that this is an optimal algorithm. Sadly it is not.
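The expected cost is easy to tabulate; a tiny Python helper (the name is made up) that counts individual tosses, i.e. two per round:

```python
def expected_tosses(p):
    """Expected number of biased tosses per unbiased bit for the pair-and-discard scheme."""
    z = 2 * p * (1 - p)        # probability that the two tosses of a round differ
    return 2 / z               # 1/z rounds on average, two tosses per round

print(round(expected_tosses(0.5), 1), round(expected_tosses(0.99), 1))  # 4.0 101.0
```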
Here is how it compares with the Shannon's theoretical bound (image is taken from this answer). It shows that the algorithm is good, but far from optimal.
You can come up with an improvement if you consider that HHTT will be discarded by this algorithm, but in fact it has the same probability as TTHH. So you can also stop here and return H. The same goes for HHHHTTTT and so on. Using these cases improves the expected running time, but does not make it theoretically optimal.
And in the end - python code:
import random

def biased(p):
    # create a biased coin
    return 1 if random.random() < p else 0

def unbiased_from_biased(p):
    n1, n2 = biased(p), biased(p)
    while n1 == n2:
        n1, n2 = biased(p), biased(p)
    return n1

p = random.random()
print(p)
tosses = [unbiased_from_biased(p) for i in range(1000)]
n_1 = sum(tosses)
n_2 = len(tosses) - n_1
print(n_1, n_2)
It is pretty self-explanatory, and here is an example result:
505 495
As you can see, even though we had a bias of 0.097, we got approximately the same number of 1s and 0s.
The trick attributed to von Neumann of getting two bits at a time, having 01 correspond to 0 and 10 to 1, and repeating for 00 or 11 has already come up. The expected number of bits you need to extract to get a single bit using this method is 1/(p(1-p)), which can get quite large if p is especially small or large, so it is worthwhile to ask whether the method can be improved, especially since it is evident that it throws away a lot of information (all 00 and 11 cases).
Googling for "von neumann trick biased" produced this paper that develops a better solution for the problem. The idea is that you still take bits two at a time, but if the first two attempts produce only 00s and 11s, you treat a pair of 0s as a single 0 and a pair of 1s as a single 1, and apply von Neumann's trick to these pairs. And if that doesn't work either, keep combining similarly at this level of pairs, and so on.
Further on, the paper develops this into generating multiple unbiased bits from the biased source, essentially using two different ways of generating bits from the bit-pairs, and sketches an argument that this is optimal in the sense that it produces as many bits as the original sequence contained entropy.
You need to draw pairs of values from the RNG until you get a sequence of different values, i.e. zero followed by one or one followed by zero. You then take the first value (or last, doesn't matter) of that sequence. (i.e. Repeat as long as the pair drawn is either two zeros or two ones)
The math behind this is simple: a 0 then 1 sequence has the very same probability as a 1 then zero sequence. By always taking the first (or the last) element of this sequence as the output of your new RNG, we get an even chance to get a zero or a one.
Besides the von Neumann procedure given in other answers, there is a whole family of techniques, called randomness extraction (also known as debiasing, deskewing, or whitening), that serve to produce unbiased random bits from random numbers of unknown bias. They include Peres's (1992) iterated von Neumann procedure, as well as an "extractor tree" by Zhou and Bruck (2012). Both methods (and several others) are asymptotically optimal, that is, their efficiency (in terms of output bits per input) approaches the optimal limit as the number of inputs gets large (Pae 2018).
For example, the Peres extractor takes a list of bits (zeros and ones with the same bias) as input and is described as follows:
Create two empty lists named U and V. Then, while two or more bits remain in the input:
If the next two bits are 0/0, append 0 to U and 0 to V.
Otherwise, if those bits are 0/1, append 1 to U, then write a 0.
Otherwise, if those bits are 1/0, append 1 to U, then write a 1.
Otherwise, if those bits are 1/1, append 0 to U and 1 to V.
Run this algorithm recursively, reading from the bits placed in U.
Run this algorithm recursively, reading from the bits placed in V.
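The steps above can be sketched in Python as follows. This is my own transcription of the description, not code from Peres's paper, and it returns the output bits grouped by recursion level rather than in streaming order.

```python
def peres(bits):
    """Iterated von Neumann extractor: returns a list of unbiased output bits."""
    if len(bits) < 2:
        return []
    out, u, v = [], [], []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a == b:
            u.append(0)        # 0/0 or 1/1: append 0 to U and the repeated bit to V
            v.append(a)
        else:
            u.append(1)        # 0/1 writes a 0, 1/0 writes a 1
            out.append(a)
    return out + peres(u) + peres(v)

print(peres([0, 1, 1, 0]), peres([0, 0, 1, 1]))  # [0, 1] [0]
```

The second call shows that a block like 0011, which plain von Neumann discards, still yields one output bit.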
This is not to mention procedures that produce unbiased random bits from biased dice or other biased random numbers (not just biased bits); see, e.g., Camion (1974).
I discuss more on randomness extractors in a note on randomness extraction.
Peres, Y., "Iterating von Neumann's procedure for extracting random bits", Annals of Statistics 20(1), 1992, pp. 590-597.
Zhou, H. and Bruck, J., "Streaming algorithms for optimal generation of random bits", arXiv:1209.0730 [cs.IT], 2012.
Pae, S., "Binarization Trees and Random Number Generation", arXiv:1602.06058v2 [cs.DS].
Camion, P., "Unbiased die rolling with a biased die", North Carolina State University, Dept. of Statistics, 1974.
Here's one way, probably not the most efficient. Chew through a bunch of random numbers until you get a sequence of the form [0..., 1, 0..., 1] (where 0... is one or more 0s). Count the number of 0s. If the first sequence is longer, generate a 0, if the second sequence is longer, generate a 1. (If they're the same, try again.)
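A possible Python reading of that idea, where `biased` is assumed to be any zero-argument function returning 0 or 1 (both function names are made up). It works because the two run lengths are independent and identically distributed, so either one is equally likely to be the longer.

```python
import random

def zeros_before_one(biased):
    """Count how many 0s the source produces before its next 1."""
    count = 0
    while biased() == 0:
        count += 1
    return count

def unbiased_bit(biased):
    """Compare two consecutive runs of 0s; retry on empty or equal-length runs."""
    while True:
        first, second = zeros_before_one(biased), zeros_before_one(biased)
        if first >= 1 and second >= 1 and first != second:
            return 0 if first > second else 1

source = lambda: 1 if random.random() < 0.3 else 0   # example biased source
print([unbiased_bit(source) for _ in range(10)])
```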
This is like what HotBits does to generate random numbers from radioactive particle decay:
Since the time of any given decay is random, then the interval between two consecutive decays is also random. What we do, then, is measure a pair of these intervals, and emit a zero or one bit based on the relative length of the two intervals. If we measure the same interval for the two decays, we discard the measurement and try again
HotBits: How It Works
I'm just illustrating the already proposed solutions with some running code. This solution stays unbiased no matter how many times we change the probability. In a heads-or-tails toss, the two mixed outcomes (heads then tails, or tails then heads) are always equally likely.
import random

def biased_toss(probability):
    if random.random() > probability:
        return 1
    return 0

def unbiased_toss(probability):
    x = biased_toss(probability)
    y = biased_toss(probability)
    while x == y:
        x = biased_toss(probability)
        y = biased_toss(probability)
    return x

# results will contain counts of heads '0' and tails '1'
results = {'0':0, '1':0}
for i in range(1000):
    # on every call we are changing the probability
    p = random.random()
    results[str(unbiased_toss(p))] += 1
# it still returns an unbiased result