Fitness Proportionate Selection when some fitnesses are 0 - algorithm

I have a question about what to do with fitnesses that are 0 when computing the fitness proportionate probabilities. Should the container for the members be sorted by highest fitness first, and then should I run code similar to this:
for all members of population
    sum += fitness of this individual
end for

for all members of population
    probability = sum of probabilities + (fitness / sum)
    sum of probabilities = probability
end for

loop until new population is full
    do this twice (once for each parent)
        number = random between 0 and 1
        for all members of population
            if number >= previous member's probability and number < this member's probability then this member is selected
        end for
    end
    create offspring
end loop
The problem I am seeing, as I go through one iteration by hand with randomly generated members, is that some members have a fitness of 0, and when I compute their probabilities they keep the same partial sum as the last non-zero member. Is there a way I can separate the non-zero probabilities from the zero probabilities? I was thinking that even if I sort by highest fitness, the last non-zero member would still share its partial sum with the zero-fitness members.

Consider this example:

individual   fitness(i)   probability(i)   partial_sum(i)
1            10           10/20 = 0.50     0.50
2             3            3/20 = 0.15     0.50 + 0.15 = 0.65
3             2            2/20 = 0.10     0.50 + 0.15 + 0.10 = 0.75
4             0            0/20 = 0.00     0.50 + 0.15 + 0.10 + 0.00 = 0.75
5             5            5/20 = 0.25     0.50 + 0.15 + 0.10 + 0.00 + 0.25 = 1.00
------
Sum          20
Now, if number = random in [0, 1), we are going to pick individual i if:

individual   condition
1            0.00 <= number < partial_sum(1) = 0.50
2            0.50 = partial_sum(1) <= number < partial_sum(2) = 0.65
3            0.65 = partial_sum(2) <= number < partial_sum(3) = 0.75
4            0.75 = partial_sum(3) <= number < partial_sum(4) = 0.75
5            0.75 = partial_sum(4) <= number < partial_sum(5) = 1.00
If an individual has fitness 0 (e.g. I4) it cannot be selected because of its selection condition (e.g. I4 has the associated condition 0.75 <= number < 0.75).
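To see this concretely, here is a minimal Python sketch of the partial-sum selection described above (the name select_index is my own, not from the question). A zero-fitness member owns an empty interval, so it can never be returned:

import random

def select_index(fitnesses):
    # Build the partial sums: member i owns the half-open interval
    # [partial_sums[i-1], partial_sums[i]).
    total = sum(fitnesses)
    partial_sums = []
    running = 0.0
    for f in fitnesses:
        running += f / total
        partial_sums.append(running)
    number = random.random()  # uniform in [0, 1)
    for i, ps in enumerate(partial_sums):
        if number < ps:
            return i
    return len(fitnesses) - 1  # guard against floating-point rounding at 1.0

fitnesses = [10, 3, 2, 0, 5]
counts = [0] * len(fitnesses)
for _ in range(10000):
    counts[select_index(fitnesses)] += 1
print(counts)  # index 3 (the zero-fitness member) stays at 0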

A variant of the Knapsack algorithm

I have a list of items, a, b, c,..., each of which has a weight and a value.
The 'ordinary' Knapsack algorithm will find the selection of items that maximises the value of the selected items, whilst ensuring that the weight is below a given constraint.
The problem I have is slightly different. I wish to minimise the value (easy enough by using the reciprocal of the value), whilst ensuring that the weight is at least the given constraint, not less than or equal to it.
I have tried re-routing the idea through the ordinary Knapsack algorithm, but this can't be done. I was hoping there is another combinatorial algorithm that I am not aware of that does this.
In the German wiki it's formalized as:

U: finite set of objects
w: U -> R (weight function)
v: U -> R (value function)
B in R (constraint right-hand side)

Find a subset K of U subject to:
    sum of w(u) over all u in K <= B
such that:
    sum of v(u) over all u in K is maximized
So there is no restriction like nonnegativity.
Just use negative weights, negative values and a negative B.
The basic concept is:
sum of w(u) over all u in K <= B
    <->
sum of -w(u) over all u in K >= -B

So in your case:

classic constraint:  x0 + x1 <=  B  |  3 + 7  <=  12  Y  |  3 + 10  <=  12  N
becomes:            -x0 - x1 <= -B  | -3 - 7  <= -12  N  | -3 - 10  <= -12  Y
So for a given implementation it depends on the software if this is allowed. In terms of the optimization-problem, there is no problem. The integer-programming formulation for your case is as natural as the classic one (and bounded).
Python Demo based on Integer-Programming
Code
import numpy as np
import scipy.sparse as sp
from cylp.cy import CyClpSimplex
np.random.seed(1)
""" INSTANCE """
weight = np.random.randint(50, size = 5)
value = np.random.randint(50, size = 5)
capacity = 50
""" SOLVE """
n = weight.shape[0]
model = CyClpSimplex()
x = model.addVariable('x', n, isInt=True)
model.objective = value # MODIFICATION: default = minimize!
model += sp.eye(n) * x >= np.zeros(n) # could be improved
model += sp.eye(n) * x <= np.ones(n) # """
model += np.matrix(-weight) * x <= -capacity # MODIFICATION
cbcModel = model.getCbcModel()
cbcModel.logLevel = True
status = cbcModel.solve()
x_sol = np.array(cbcModel.primalVariableSolution['x'].round()).astype(int) # assumes existence
print("INSTANCE")
print(" weights: ", weight)
print(" values: ", value)
print(" capacity: ", capacity)
print("Solution")
print(x_sol)
print("sum weight: ", x_sol.dot(weight))
print("value: ", x_sol.dot(value))
Small remarks
This code is just a demo using a somewhat low-level library; there are other tools available which might be better suited (e.g. on Windows: pulp)
it's the classic integer-programming formulation from the wiki, modified as mentioned above
it will scale very well, as the underlying solver is pretty good
as written, it solves the 0-1 knapsack (only the variable bounds would need to be changed)
Small look at the core-code:
# create model
model = CyClpSimplex()
# create one variable for each how-often-do-i-pick-this-item decision
# variable needs to be integer (or binary for 0-1 knapsack)
x = model.addVariable('x', n, isInt=True)
# the objective value of our IP: a linear-function
# cylp only needs the coefficients of this function: c0*x0 + c1*x1 + c2*x2...
# we only need our value vector
model.objective = value # MODIFICATION: default = minimize!
# WARNING: typically one should always use variable-bounds
# (cylp problems...)
# workaround: express bounds lower_bound <= var <= upper_bound as two constraints
# a constraint is an affine-expression
# sp.eye creates a sparse-diagonal with 1's
# example: sp.eye(3) * x >= 5
# 1 0 0 -> 1 * x0 + 0 * x1 + 0 * x2 >= 5
# 0 1 0 -> 0 * x0 + 1 * x1 + 0 * x2 >= 5
# 0 0 1 -> 0 * x0 + 0 * x1 + 1 * x2 >= 5
model += sp.eye(n) * x >= np.zeros(n) # could be improved
model += sp.eye(n) * x <= np.ones(n) # """
# cylp somewhat outdated: need numpy's matrix class
# apart from that it's just the weight-constraint as defined at wiki
# same affine-expression as above (but only a row-vector-like matrix)
model += np.matrix(-weight) * x <= -capacity # MODIFICATION
# internal type conversion needed to treat it as an IP (or else it would be an LP)
cbcModel = model.getCbcModel()
cbcModel.logLevel = True
status = cbcModel.solve()
# type-casting
x_sol = np.array(cbcModel.primalVariableSolution['x'].round()).astype(int)
Output
Welcome to the CBC MILP Solver
Version: 2.9.9
Build Date: Jan 15 2018
command line - ICbcModel -solve -quit (default strategy 1)
Continuous objective value is 4.88372 - 0.00 seconds
Cgl0004I processed model has 1 rows, 4 columns (4 integer (4 of which binary)) and 4 elements
Cutoff increment increased from 1e-05 to 0.9999
Cbc0038I Initial state - 0 integers unsatisfied sum - 0
Cbc0038I Solution found of 5
Cbc0038I Before mini branch and bound, 4 integers at bound fixed and 0 continuous
Cbc0038I Mini branch and bound did not improve solution (0.00 seconds)
Cbc0038I After 0.00 seconds - Feasibility pump exiting with objective of 5 - took 0.00 seconds
Cbc0012I Integer solution of 5 found by feasibility pump after 0 iterations and 0 nodes (0.00 seconds)
Cbc0001I Search completed - best objective 5, took 0 iterations and 0 nodes (0.00 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cuts at root node changed objective from 5 to 5
Probing was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Result - Optimal solution found
Objective value: 5.00000000
Enumerated nodes: 0
Total iterations: 0
Time (CPU seconds): 0.00
Time (Wallclock seconds): 0.00
Total time (CPU seconds): 0.00 (Wallclock seconds): 0.00
INSTANCE
weights: [37 43 12 8 9]
values: [11 5 15 0 16]
capacity: 50
Solution
[0 1 0 1 0]
sum weight: 51
value: 5
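For comparison, the remarks above mention pulp as a higher-level alternative. Here is a minimal sketch of the same modified model in pulp, using the instance from the output; this is my own illustration (assuming pulp and its bundled CBC solver are installed), not part of the original answer:

from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

weights = [37, 43, 12, 8, 9]
values = [11, 5, 15, 0, 16]
capacity = 50
n = len(weights)

# Minimize total value subject to total weight >= capacity
prob = LpProblem("min_value_covering_knapsack", LpMinimize)
x = [LpVariable(f"x{i}", cat=LpBinary) for i in range(n)]
prob += lpSum(values[i] * x[i] for i in range(n))                # objective
prob += lpSum(weights[i] * x[i] for i in range(n)) >= capacity   # MODIFICATION

prob.solve()
chosen = [int(v.value()) for v in x]
print(chosen)                                       # expect [0, 1, 0, 1, 0]
print(sum(w * c for w, c in zip(weights, chosen)))  # 51
print(value(prob.objective))                        # 5.0

Note that pulp accepts the >= constraint directly, so the sign-flipping workaround is only needed for tools restricted to the classic <= form.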

What is the probability of the survival of a tribble?

You have a population of k Tribbles. This particular species of Tribbles live for exactly one day and then die. Just before death, a single Tribble has the probability P_i of giving birth to i more Tribbles. What is the probability that after m generations, every Tribble will be dead?
Is my analysis right? If it is, why is it not matching the expected output?
Case 1:
Number of tribbles: k = 1
Number of generations: m = 1
Probabilities: P_0 = 0.33 P_1 = 0.34 P_2 = 0.33
The probability that after 1 generation every Tribble would be dead = P_0 = 0.33
Case 2:
Number of tribbles: k = 1
Number of generations: m = 2
Probabilities: P_0 = 0.33 P_1 = 0.34 P_2 = 0.33
Each tribble can have either 0 or 1 or 2 children.
At the end of the first generation there has to be at least one tribble, to ensure that there are tribbles in the second generation.
So the tribble of the first generation should have 1 or 2 children, and the number of tribbles at the end of the first generation would be either 1 or 2, with probabilities P_1 = 0.34 and P_2 = 0.33 respectively.
If there are to be no tribbles after the second generation, none of these children should have children of their own.
If there is 1 child in the second generation, the probability it would have no children is P_0=0.33
If there are 2 children in the second generation, the probability that none of them would have children is (P_0)^2=(0.33)^2=0.1089
The probability that after 2 generations every tribble would be dead is the probability of there being 1 child times the probability of it not having children, plus the probability of there being 2 children times the probability of none of them having children = 0.34 × 0.33 + 0.33 × 0.1089 = 0.148137
You missed the first-generation 0-children case.
The correct equation is
P_0 × 1 + P_1 × P_0 + P_2 × P_0^2
= 0.33 + 0.34 × 0.33 + 0.33 × (0.33)^2
= 0.478137
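This recurrence generalizes to any m and k: if q(m) is the probability that a single tribble's line is extinct after m generations, then q(1) = P_0, q(m) = sum over i of P_i × q(m-1)^i, and k independent starting tribbles give q(m)^k. A small Python sketch of this (my own illustration, not from the thread):

def extinction_probability(k, m, P):
    # P[i] = probability that a tribble has i children
    q = 0.0  # extinction probability for one tribble's line after 0 generations
    for _ in range(m):
        # a line dies out iff every child's line dies out in one generation fewer
        q = sum(p * q**i for i, p in enumerate(P))
    return q**k  # the k starting tribbles reproduce independently

P = [0.33, 0.34, 0.33]
print(extinction_probability(1, 1, P))  # 0.33
print(extinction_probability(1, 2, P))  # 0.478137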

In MATLAB, generate n random numbers between 0 and 1 whose sum is less than or equal to one

I want to generate n random numbers between 0 and 1 whose sum is less than or equal to one:
sum(n random numbers between 0 and 1) <= 1
For example, 3 random numbers between 0 and 1:
0.2, 0.3, 0.4
0.2 + 0.3 + 0.4 = 0.9 <= 1
It sounds like you would need to generate the numbers one at a time while keeping track of the running total. We'll use your example:
Generate the first number between 0 and 1, e.g. 0.2
1.0 - 0.2 = 0.8: generate the next number between 0 and 0.8, e.g. 0.3
0.8 - 0.3 = 0.5: generate the next number between 0 and 0.5, e.g. 0.4
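A minimal sketch of this sequential approach, in Python rather than MATLAB (my own illustration; the same loop works in MATLAB with rand scaled by the remaining budget). Note that it biases earlier draws toward larger values rather than sampling uniformly over all valid tuples:

import random

def bounded_random_numbers(n):
    # Each draw is limited to the budget left by the previous draws,
    # so the total can never exceed 1.
    remaining = 1.0
    numbers = []
    for _ in range(n):
        x = random.uniform(0, remaining)
        numbers.append(x)
        remaining -= x
    return numbers

values = bounded_random_numbers(3)
print(values, sum(values))  # the sum is guaranteed to be <= 1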

How to implement branch selection based on probability?

I want the program to choose something with a set probability. For example, there is a 0.312 probability of choosing path A and a 0.688 probability of choosing path B. The only way I can think of is the naive one: pick a random number in the interval 0-1 and check whether it is <= 0.312. Is there some better approach that extends to more than 2 elements?
Following is a way to do it more efficiently than multiple if-else statements:
Suppose
a = 0.2, b = 0.35, c = 0.15, d = 0.3.
Make an array where p[0] corresponds to a, p[1] corresponds to b, and so on.
Run a loop evaluating the cumulative sums of the probabilities:
p[0] = 0.2
p[1] = 0.2 + 0.35 = 0.55
p[2] = 0.55 + 0.15 = 0.70
p[3] = 0.70 + 0.30 = 1
Generate a random number in [0,1]. Do a binary search on p for the random number. The interval the search returns is your branch (see the Python sketch after the example below).
eg.
random no = 0.6
result = binarySearch(0.6)
result = 2 using above intervals
2 => branch c
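Here is a minimal Python sketch of this cumulative-sum plus binary-search approach (my own illustration, using the standard bisect module and the four probabilities from the example above):

import bisect
import random

branches = ["a", "b", "c", "d"]
probabilities = [0.2, 0.35, 0.15, 0.3]

# Cumulative sums: [0.2, 0.55, 0.70, 1.0]
prefix = []
total = 0.0
for p in probabilities:
    total += p
    prefix.append(total)

def pick_branch():
    r = random.random()  # uniform in [0, 1)
    # Index of the first cumulative sum strictly greater than r
    return branches[bisect.bisect_right(prefix, r)]

print(pick_branch())
# e.g. r = 0.6 falls in [0.55, 0.70), giving index 2 => branch "c"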

Looking for better algorithm to solve this kind of a probability/combinatorics game

Let's say we have the 10 integers from 1 to 10.
We also have some players, each of whom is given a different random number from this set.
Now players start revealing information about their numbers by saying: my number is in a given subset of the initial 1-to-10 set. For example: my number is 8, 9 or 10.
We want to make assumptions about the players who haven't said anything yet (of course it's the same assumption for each silent player, given the initial information).
Let's say we have 5 players, and the first 3 players said, one by one:
mine is 8, 9 or 10
mine is 7 or 6
mine is 7, 6, 9 or 10
Now we need to calculate the odds (probability) that the next player has a specific number, e.g. what are the odds that the next player has a 7?
It's just an example of course, and information can be given in any form by each player (like "1 or 10", "1 through 10", etc.)
Is this some kind of well-known problem, or does someone see a good approach?
I really want this to be performant, so brute-forcing isn't good. I am thinking it could be directly connected to Bayes' theorem, but I'm not 100% sure it can be applied here.
EXAMPLE:
A simple case: 2 players and the numbers 1 through 5. The first player says he has 4 or 5.
Then the second player has a 25% chance of holding 1, but only a 12.5% chance of holding 4, because there are 2 possible remaining pools after the first player gives information about his hand:
{1,2,3,4} or {1,2,3,5}. We can see that 1 appears in both pools, so P(1) = (1/4 × 2) / 2 = 1/4, while 4 appears in only one, so P(4) = (1/4 × 1) / 2 = 1/8.
This is what I call the brute-force solution: compute all possible combinations and derive each number's probability by analyzing them.
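That brute force is easy to state in a few lines of Python (my own sketch, not from the question): enumerate every assignment of distinct numbers that is consistent with the statements, and count how often each number lands with the silent player.

from itertools import permutations
from collections import Counter

numbers = [1, 2, 3, 4, 5]
# constraints[i] = set of numbers player i may hold (None = said nothing)
constraints = [{4, 5}, None]

counts = [Counter() for _ in constraints]
total = 0
for assignment in permutations(numbers, len(constraints)):
    if all(c is None or a in c for a, c in zip(assignment, constraints)):
        total += 1
        for i, a in enumerate(assignment):
            counts[i][a] += 1

for v in numbers:
    print(v, counts[1][v] / total)  # silent player: 1,2,3 -> 0.25; 4,5 -> 0.125

As the question says, this scales poorly: with p players drawing from n numbers there are n!/(n-p)! assignments to check.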
UPDATE
The solution suggested by Mr.Wizard works.
Here is the code if you're curious how it looks:
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        int length = 5;
        double[][] matrix = new double[length][];
        for (int i = 0; i < length; i++) {
            matrix[i] = new double[length];
        }
        for (int i = 0; i < length; i++) {
            for (int j = 0; j < length; j++) {
                matrix[i][j] = 1;
            }
        }
        matrix[0] = new double[] { 0, 0, 0, 1, 1 };
        matrix[1] = new double[] { 0, 0, 1, 1, 0 };
        matrix[2] = new double[] { 0, 0, 0, 0, 1 };
        DumpMatrix(matrix);
        while (true)
        {
            NormalizeColumns(matrix);
            DumpMatrix(matrix);
            NormalizeRows(matrix);
            DumpMatrix(matrix);
            Console.ReadLine();
        }
    }

    private static void NormalizeRows(double[][] matrix)
    {
        for (int i = 0; i < matrix.Length; i++)
        {
            double sum = matrix[i].Sum();
            for (int j = 0; j < matrix.Length; j++) {
                matrix[i][j] = matrix[i][j] / sum;
            }
        }
    }

    private static void NormalizeColumns(double[][] matrix)
    {
        for (int j = 0; j < matrix.Length; j++)
        {
            double columnSum = 0;
            for (int i = 0; i < matrix.Length; i++)
            {
                columnSum += matrix[i][j];
            }
            for (int i = 0; i < matrix.Length; i++) {
                matrix[i][j] = matrix[i][j] / columnSum;
            }
        }
    }

    private static void DumpMatrix(double[][] matrix)
    {
        for (int i = 0; i < matrix.Length; i++) {
            for (int j = 0; j < matrix.Length; j++) {
                Console.Write(matrix[i][j].ToString("0.#####").PadRight(8));
            }
            Console.WriteLine();
        }
        Console.WriteLine();
    }
}
Although from this example it's pretty clear that it approaches the final result rather slowly.
Here player 3 has exactly 5, and players 1 and 2 can have 4 or 5 and 3 or 4 respectively, which means that player 1 must have 4 (because player 3 has 5) and player 2 must have 3 (because player 1 has 4). But the matching columns only approach 1 for players 1 and 2 after many, many iterations.
Try constructing a graph with players on one side and numbers on the other. There is an edge between a player and a number if and only if the player could have that number based on what they've said. You want, for each edge, the probability that a uniform random perfect matching contains that edge.
Unfortunately, if this problem has an exact polynomial-time algorithm, then #P, a class which contains NP (and in fact the entire polynomial hierarchy, by Toda's theorem), is equal to P.
It is possible, in theory at least, to estimate the probability, via a complicated algorithm due to Jerrum, Sinclair, and Vigoda. I'm not sure anyone has ever implemented that algorithm.
You should build a probability tree diagram.
For your example:
__
 |___ 0.5 A=4 __
 |              |___ 0.25 B=1
 |              |___ 0.25 B=2
 |              |___ 0.25 B=3
 |              |___ 0.25 B=5
 |___ 0.5 A=5 __
                |___ 0.25 B=1
                |___ 0.25 B=2
                |___ 0.25 B=3
                |___ 0.25 B=4
The tree represents statements such as p(B=1|A=4) = 0.25.
So
p(B=1|A=4 or A=5) = 0.5 × p(B=1|A=4) + 0.5 × p(B=1|A=5) = 0.5 × 0.25 + 0.5 × 0.25 = 0.25
and
p(B=4|A=4 or A=5) = 0.5 × p(B=4|A=4) + 0.5 × p(B=4|A=5) = 0.5 × 0 + 0.5 × 0.25 = 0.125
You can dynamically expand the tree at any stage of the game and calculate the probability of each assumption accordingly.
I believe that for the general case there are no shortcuts.
I may be well off the mark here, but I think that a process of repetitive normalization of each row and column will converge on the correct values.
If you start with an n*n matrix with zeros representing what cannot be, and ones representing what can, for your example:
0 0 0 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
Meaning that row #1, representing player #1, can only be 4 or 5, and nothing else is known. Then, if we normalize each row so that it sums to 1, we get:
0. 0. 0. 0.5 0.5
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
0.2 0.2 0.2 0.2 0.2
And then, for each column:
0. 0. 0. 0.384615 0.384615
0.25 0.25 0.25 0.153846 0.153846
0.25 0.25 0.25 0.153846 0.153846
0.25 0.25 0.25 0.153846 0.153846
0.25 0.25 0.25 0.153846 0.153846
Repeat this process 15 times, and we get:
0. 0. 0. 0.5 0.5
0.25 0.25 0.25 0.125 0.125
0.25 0.25 0.25 0.125 0.125
0.25 0.25 0.25 0.125 0.125
0.25 0.25 0.25 0.125 0.125
If the original parameters are not impossible, then each row and column in the final matrix should sum to ~= 1.
I offer no proof that this is correct, but if you already have a working brute force implementation it should be easy to check for correlation of results.
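The iteration is only a few lines with numpy (my own sketch, reproducing the matrices above; the C# program in the update above performs the same iteration):

import numpy as np

# 1 = "this player (row) could hold this number (column)", per the example
m = np.array([[0., 0., 0., 1., 1.],
              [1., 1., 1., 1., 1.],
              [1., 1., 1., 1., 1.],
              [1., 1., 1., 1., 1.],
              [1., 1., 1., 1., 1.]])

for _ in range(15):
    m /= m.sum(axis=1, keepdims=True)  # normalize rows
    m /= m.sum(axis=0, keepdims=True)  # normalize columns

print(m.round(3))  # rows 2-5 converge to 0.25 / 0.125 as shown above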
I'm not sure about the exact calculation right now, but I think you can do it more easily than with a complete tree.
In the easy example:
There are 5 numbers and you want to know the probability of B having a 3:
There are always 4 possible numbers, because one is already taken.
In no case can the 3 be taken, so whatever A does, the 3 is always one of the 4 numbers.
From these statements we can directly say that the probability is 1/4 = 25%.
For a 1 and 2 it is the same, and for 4 and 5 you have only a 50% chance of the number being in the pool, so it reduces to 0.25 × 0.5 = 0.125.
For the bigger example:
1 to 10, 5 players as you stated above.
Now say you want to know the probability of a 6.
Both players that did not say anything have the same probabilities.
One speaker said he has a 6 with 25%, and another said he has a 6 with 50%.
I'm not sure right now how exactly that is done, but you can now calculate the probability that "one of them has a 6". As one has 50% and the other 25%, it should be around 60% or so (you can't just add them... two 50% chances are a good chance, but not a sure hit).
Let's just assume it is 60% for this example. Now we have 10 numbers, of which 3 are taken, which leaves us 7 to choose from = 1/7 ≈ 14%.
So for any number which is available we have 14%. But the 6 is in the pool only 40% of the time, so I think we have 0.14 × 0.4 = 0.056, which means a 5.6% chance that we have a 6.
Whatever information you have, you can calculate the probability that the number you are asking about is taken, and the probability of hitting exactly one of the X numbers left, and multiply them.
