I need code for the ranking selection method in a genetic algorithm.
I have created roulette and tournament selection methods, but now I need ranking and I am stuck.
My roulette code is here (I am using an atom struct for genetic atoms):
const int roulette (const atom *f)
{
    int i;
    double sum, sumrnd;

    sum = 0;
    for (i = 0; i < N; i++)
        sum += f[i].fitness + OFFSET;
    sumrnd = rnd () * sum;
    sum = 0;
    for (i = 0; i < N; i++) {
        sum += f[i].fitness + OFFSET;
        if (sum > sumrnd)
            break;
    }
    return i;
}
Where atom is:
typedef struct atom
{
    int geno[VARS];
    double pheno[VARS];
    double fitness;
} atom;
Rank selection is easy to implement once you already know roulette wheel selection. Instead of using the fitness as the probability of getting selected, you use the rank. So for a population of N solutions the best solution gets rank N, the second best rank N-1, etc. The worst individual has rank 1. Now use the roulette wheel and start selecting.
The probability for the best individual to be selected is N / (N*(N+1)/2), or roughly 2/N; for the worst individual it is 1 / (N*(N+1)/2) = 2 / (N*(N+1)), or roughly 2/N^2.
This is called linear rank selection, because the ranks form a linear progression. You can also have the ranks form a geometric progression, e.g. selection probability 1/2^n where n ranges from 1 for the best individual to N for the worst. This of course gives a much higher probability to the best individual.
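Concretely, adapting the roulette code above, a linear rank selection in the same C style could look like the following sketch (my illustration, not tested against the asker's project: it assumes f[0..N-1] is already sorted by ascending fitness so that individual i has rank i+1, and it reuses the asker's N and rnd()):

/* Linear rank selection: assumes f[] is sorted by ascending fitness,
 * so individual i has rank i+1 (worst rank 1, best rank N). */
const int rank_select (const atom *f)
{
    int i;
    double sum, sumrnd;

    (void) f;   /* fitness values are not needed, only the sorted order */

    /* total of all ranks: 1 + 2 + ... + N = N*(N+1)/2 */
    sum = (double) N * (N + 1) / 2.0;
    sumrnd = rnd () * sum;

    sum = 0;
    for (i = 0; i < N; i++) {
        sum += i + 1;              /* rank of individual i */
        if (sum > sumrnd)
            break;
    }
    return i < N ? i : N - 1;      /* guard the rnd() == 1.0 edge case */
}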
You can look at the implementation of some selection methods in HeuristicLab.
My rank selection code in MATLAB:
NewFitness = sort(Fitness);
NewPop = round(rand(PopLength, IndLength));
for i = 1:PopLength
    for j = 1:PopLength
        if(NewFitness(i) == Fitness(j))
            NewPop(i, 1:IndLength) = CurrentPop(j, 1:IndLength);
            break;
        end
    end
end
CurrentPop = NewPop;
ProbSelection = zeros(PopLength, 1);
CumProb = zeros(PopLength, 1);

% rank i out of 1..PopLength; normalize by the sum of ranks
% PopLength*(PopLength+1)/2 so the probabilities sum to 1
for i = 1:PopLength
    ProbSelection(i) = i / (PopLength * (PopLength + 1) / 2);
    if i == 1
        CumProb(i) = ProbSelection(i);
    else
        CumProb(i) = CumProb(i-1) + ProbSelection(i);
    end
end

SelectInd = rand(PopLength, 1);

for i = 1:PopLength
    flag = 0;
    for j = 1:PopLength-1   % CumProb(j+1) is accessed, so stop at PopLength-1
        if(CumProb(j) < SelectInd(i) && CumProb(j+1) >= SelectInd(i))
            SelectedPop(i, 1:IndLength) = CurrentPop(j+1, 1:IndLength);
            flag = 1;
            break;
        end
    end
    if(flag == 0)
        SelectedPop(i, 1:IndLength) = CurrentPop(1, 1:IndLength);
    end
end
I've made a template genetic-algorithm class in C++.
My genetic algorithm library is split into GeneticAlgorithm and GAPopulation. Both are template classes, so you can see their full source code in the API documents.
Here are the source code and API documents:
http://samchon.github.io/framework/api/cpp/d5/d28/classsamchon_1_1library_1_1GeneticAlgorithm.html
http://samchon.github.io/framework/api/cpp/d8/dcd/classsamchon_1_1library_1_1GAPopulation.html
My problem is as follows:
Given 2n points, I can calculate the distances between all points
and get a symmetric matrix.
Can you create n pairs of points, so that the sum of the distances over all pairs is
minimal?
EDIT: Every point has to be in one of the pairs, and every point is only
allowed to be in one pair.
I have naively tried to use the Hungarian algorithm and hoped that it may give me an assignment, so that the assignments are symmetrical. But that obviously did not work, as I do not have a bipartite graph.
After a search, I found the Stable roommates problem, which seems to be similar to my problem, but the difference is, that it just tries to find a matching, but not to try to minimize some kind of distance.
Does anyone know a similar problem or even a solution? Did I miss something? The problem does not actually seem that difficult, but I just could not come up with an optimal solution.
There's a primal-dual algorithm due to Edmonds (the Blossom algorithm), which you really don't want to implement yourself if possible. Vladimir Kolmogorov has an implementation that may be suitable for your purposes.
Try network flow. The max flow is the number of pairs you want to create; then calculate the min cost of that flow.
Now, this isn't a guarantee, just a hunch:
you can find the closest pair, match them, remove them from the set,
and recurse until you have no points left.
It is clearly sub-optimal, but I have a hunch that the ratio of this greedy solution to the absolute optimum can be bounded. The hope is to use some submodularity argument and bound it to something like a (1 - 1/e) fraction of the global optimum, but I wasn't able to do it. Maybe someone could take a stab at it.
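For what it's worth, here is a rough C++ sketch of that greedy idea (my own illustration: it assumes the points come as (x, y) coordinates rather than a distance matrix, and it runs in O(n^3); with the matrix formulation you would replace the hypot call with a lookup dist[i][j]):

#include <cmath>
#include <utility>
#include <vector>
using namespace std;

// Greedy sketch: repeatedly match the closest remaining pair.
// Sub-optimal in general, as noted above.
vector<pair<int,int>> greedy_pairs(const vector<pair<double,double>> &pts) {
    vector<bool> used(pts.size(), false);
    vector<pair<int,int>> pairs;
    for (size_t round = 0; round < pts.size() / 2; ++round) {
        double best = 1e300;
        int bi = -1, bj = -1;
        for (size_t i = 0; i < pts.size(); ++i) {
            if (used[i]) continue;
            for (size_t j = i + 1; j < pts.size(); ++j) {
                if (used[j]) continue;
                double d = hypot(pts[i].first - pts[j].first,
                                 pts[i].second - pts[j].second);
                if (d < best) { best = d; bi = (int)i; bj = (int)j; }
            }
        }
        used[bi] = used[bj] = true;     // remove the matched pair from the set
        pairs.push_back({bi, bj});
    }
    return pairs;
}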
There is a C++ memoization implementation in Competitive Programming 3 as follows (note that the maximum N was 8):
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <cstring>
using namespace std;
int N, target;
double dist[20][20], memo[1<<16];
double matching(int bitmask)
{
    if (memo[bitmask] > -0.5)    // already computed? then return the cached result
        return memo[bitmask];
    if (bitmask == target)       // if all points are already matched, the cost is zero
        return memo[bitmask] = 0;
    double ans = 2000000000.0;   // "infinity" would also work
    int p1, p2;
    for (p1 = 0; p1 < 2*N; ++p1)         // find the first unmatched point
        if (!(bitmask & (1 << p1)))
            break;
    for (p2 = p1 + 1; p2 < 2*N; ++p2)    // and try pairing it with every other unmatched point
        if (!(bitmask & (1 << p2)))
            ans = min(ans, dist[p1][p2] + matching(bitmask | (1 << p1) | (1 << p2)));
    return memo[bitmask] = ans;
}
and then the main method (driver code):
int main()
{
    int i, j, caseNo = 1, x[20], y[20];
    while (scanf("%d", &N), N) {
        for (i = 0; i < 2*N; ++i)
            scanf("%d %d", &x[i], &y[i]);
        for (i = 0; i < 2*N - 1; ++i)
            for (j = i + 1; j < 2*N; ++j)
                dist[i][j] = dist[j][i] = hypot(x[i] - x[j], y[i] - y[j]);
        // use DP to solve min weighted perfect matching on a small general graph
        for (i = 0; i < (1 << 16); ++i) memo[i] = -1;
        target = (1 << (2*N)) - 1;
        printf("Case %d: %.2lf\n", caseNo++, matching(0));
    }
    return 0;
}
I have been thinking about how my binary search can be optimized. The code follows.
What I have done so far:
All I could think of was in terms of handling different inputs:
Optimized a worst case (one of the worst cases) where the element being searched for is out of bounds, i.e. searching for a number lower than the lowest or higher than the highest. This saves O(log n) comparisons when it is guaranteed the value won't be found in the input.
int input[15] = {1,2,2,3,4,5,5,5,5,6,7,8,8,9,10};

/*
 * Returns index p if input[p] == value, else a negative int.
 * value is the value being searched for in input.
 */
int
binary_search (int * input, int low, int high, int value)
{
    int mid = 0;
    if (!input) return -1;
    if (low > high) return -1;

    /* optimize worst case: value not in input */
    if ((value < input[low]) || (value > input[high]))
        { return -2; }

    mid = low + (high - low)/2;   /* written this way so (low + high) cannot overflow */
    if (input[mid] == value) {
        return mid;
    }
    if (input[mid] > value) {
        return binary_search(input, low, mid - 1, value);
    } else {
        return binary_search(input, mid + 1, high, value);
    }
}
Another worst case I can think of is when the value being searched for is next to the mid of the input, or is the first element; more generally, when it sits at the low/high bounds of the range passed to each call of binary_search. Such cases force the algorithm to take the full log n comparisons.
Any other suggestions on what other areas I can focus on improving? I don't need the code, but a direction would be helpful. Thanks.
Jon Bentley's Programming Pearls has a nice chapter on optimizing binary search. See Chapter 4 in http://www.it.iitb.ac.in/~deepak/deepak/placement/Programming_pearls.pdf
One of the variants is amazingly efficient (see page 87 in the chapter on "Code Tuning"):
# Search a 1000-element array
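# (note: x is indexed 1..1000 here, i.e. 1-based)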
l = 0
if x[512] < t: l = 1000 + 1 - 512
if x[l+256] < t: l += 256
if x[l+128] < t: l += 128
if x[l+64] < t: l += 64
if x[l+32] < t: l += 32
if x[l+16] < t: l += 16
if x[l+8] < t: l += 8
if x[l+4] < t: l += 4
if x[l+2] < t: l += 2
if x[l+1] < t: l += 1
p = l + 1
if p > 1000 or x[p] != t:
p = 0 # Not Found
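The speed of this version comes from fully unrolling the search: the probe offsets are fixed powers of two, so there is no midpoint computation or loop bookkeeping at each step.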
An optimization of the sort you're considering -- handling a special case -- will inevitably make you spend more time in the OTHER cases. Your "worst case" optimizations have turned those cases into best cases, but at the cost of creating other worst cases: you've made two cases into "best cases" and n/2 cases into "worst cases" which previously were not. You've slowed everything else down.
(Especially in this instance, because you're checking for too low / too high on every single recursion.)
If you actually expect -- in your particular use case -- that the search will mostly be searching for values that are too low or too high, this might be a good idea. As a general rule of thumb, though, the fastest implementation is the simplest one.
Let's assume that we have a pair of numbers (a, b). We can get a new pair (a + b, b) or (a, a + b) from the given pair in a single step.
Let the initial pair of numbers be (1,1). Our task is to find number k, that is, the least number of steps needed to transform (1,1) into the pair where at least one number equals n.
I solved it by finding all the possible pairs and then returning the minimum number of steps in which the given number is formed, but it was taking quite a long time to compute. I guess this must somehow be related to finding the gcd. Can someone please help, or provide me a link for the concept?
Here is a program that solves the issue, but it is not clear to me...
#include <iostream>
using namespace std;

#define INF 1000000000

int n, r = INF;

// Steps to reduce the pair (a, b) back to (1, 1) by undoing the moves:
// a - a/b*b is a % b, and a/b counts how many subtractions were batched.
int f(int a, int b){
    if (b <= 0) return INF;              // dead end: (a, 0) is unreachable from (1, 1)
    if (a > 1 && b == 1) return a - 1;   // (1,1) -> (2,1) -> ... -> (a,1)
    return f(b, a - a/b*b) + a/b;
}

int main(){
    cin >> n;
    for (int i = 1; i <= n/2; i++){
        r = min(r, f(n, i));
    }
    cout << (n == 1 ? 0 : r) << endl;
}
My approach to such problems (one I got from projecteuler.net) is to calculate the first few terms of the sequence and then search OEIS for a sequence with the same terms. This can yield solutions an order of magnitude faster. In your case the sequence is probably http://oeis.org/A178031, but unfortunately it has no easy-to-use formula.
As the constraint for n is relatively small, you can do a DP on the minimum number of steps required to get to the pair (a,b) from (1,1). You take a two-dimensional array that stores the answer for a given pair and then do a recursion with memoization:
#include <iostream>
#include <cstring>
#include <algorithm>
using namespace std;

const int INF = 1000000000;
int mem[5001][5001];

int solve(int a, int b) {
    if (a > b) {
        swap(a, b);
    }
    if (a == 0) {
        return INF;        // a pair containing 0 is unreachable from (1,1)
    }
    if (a == 1) {
        return b - 1;      // (1,1) -> (1,2) -> ... -> (1,b)
    }
    if (mem[a][b] != -1) { // already calculated
        return mem[a][b];
    }
    // undo b/a steps of the form (a, b) <- (a, b - a) at once
    int res = solve(a, b % a);
    if (res < INF) {
        res += b / a;
    }
    return mem[a][b] = res;
}

int main() {
    memset(mem, -1, sizeof(mem));
    int n;
    cin >> n;
    int best = -1;
    for (int i = 1; i <= n; ++i) {
        int temp = solve(n, i);
        if (best == -1 || temp < best) {
            best = temp;
        }
    }
    cout << best << endl;
}
In fact in this case there is not much difference between dp and BFS, but this is the general approach to such problems. Hope this helps.
EDIT: return a big enough value in the dp if a is zero
You can use the breadth-first search algorithm to do this. At each step you generate all possible NEXT pairs that you haven't seen before. If the set of next pairs contains the result, you're done; if not, repeat. The number of times you repeat this is the minimum number of transformations.
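A minimal C++17 sketch of that BFS (my own illustration; it prunes pairs whose entries exceed n, since the values only ever grow):

#include <queue>
#include <set>
#include <utility>
using namespace std;

// BFS over pairs reachable from (1, 1); the first time a pair
// containing n is dequeued, its depth is the minimum step count.
int min_steps(int n) {
    if (n == 1) return 0;
    set<pair<int,int>> seen = {{1, 1}};
    queue<pair<pair<int,int>, int>> q;   // ((a, b), steps)
    q.push({{1, 1}, 0});
    while (!q.empty()) {
        auto [p, steps] = q.front();
        q.pop();
        auto [a, b] = p;
        if (a == n || b == n) return steps;
        for (auto next : {make_pair(a + b, b), make_pair(a, a + b)}) {
            // values only grow, so anything past n can never reach it
            if (next.first <= n && next.second <= n && seen.insert(next).second)
                q.push({next, steps + 1});
        }
    }
    return -1;   // not reached (cannot happen for n >= 1)
}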
First of all, the maximum number you can get after k-3 steps is the k-th Fibonacci number. Let t be the magic ratio (the golden ratio, φ ≈ 1.618).
Now, for n start with (n, upper(n/t)).
If x > y:
    NumSteps(x, y) = NumSteps(x-y, y) + 1
Else:
    NumSteps(x, y) = NumSteps(x, y-x) + 1
Iteratively calculate NumSteps(n, upper(n/t)).
PS: Using upper(n/t) might not always provide the optimal solution. You can do some local search around this value for the optimal result. To ensure optimality you can try ALL the values from 0 to n-1, in which case the worst-case complexity is O(n^2). But if the optimal value results from a value close to upper(n/t), the solution is O(n log n).
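One possible C++ reading of that recurrence (a sketch: it adds the base cases the pseudocode leaves implicit, batches runs of subtractions with division, and returns a sentinel for dead ends; the name num_steps is made up):

#include <algorithm>

// Reverse of the (a+b, b) / (a, a+b) moves: count the subtraction steps
// needed to reduce (x, y) to (1, 1). UNREACHABLE flags dead ends, which
// happen exactly when gcd(x, y) != 1.
const int UNREACHABLE = 1000000000;

int num_steps(int x, int y) {
    if (x < 1 || y < 1) return UNREACHABLE;
    if (x == 1) return y - 1;   // (1,1) -> (1,2) -> ... -> (1,y)
    if (y == 1) return x - 1;
    int steps = (x > y) ? num_steps(x % y, y) : num_steps(x, y % x);
    if (steps >= UNREACHABLE) return UNREACHABLE;
    return steps + (x > y ? x / y : y / x);
}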
I came across this problem on an interview forum:
Given an int array which might contain duplicates, find the largest subset of it which forms a sequence of consecutive integers.
E.g. {1,6,10,4,7,9,5}
then the answer is 4,5,6,7.
Sorting is an obvious solution. Can this be done in O(n) time?
My take on the problem is that this cannot be done in O(n) time, and the reason is that if we could do this in O(n) time, we could also sort in O(n) time (without knowing the upper bound), since a random array can contain all the elements in sequence but in random order.
Does this sound like a plausible explanation? Your thoughts?
I believe it can be solved in O(n) if you assume you have enough memory to allocate an uninitialized array of a size equal to the largest value, and that the allocation can be done in constant time. The trick is to use a lazy array, which gives you the ability to create a set of items in linear time with a constant-time membership test.
Phase 1: Go through each item and add it to the lazy array.
Phase 2: Go through each undeleted item, and delete all contiguous items.
In phase 2, you determine the range and remember it if it is the largest so far. Items can be deleted in constant time using a doubly-linked list.
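The lazy-array trick on its own fits in a few lines; here is a minimal C++ sketch of it (my own illustration, assuming values in [0, maxValue] and treating allocation of uninitialized memory as O(1)):

#include <cstdlib>

// Lazy array: O(1) insert and O(1) membership test on top of
// uninitialized memory. lazy[v] claims a slot in the compact index[]
// array; a slot is trusted only if index[] confirms it.
struct LazySet {
    int *lazy;     // uninitialized on purpose: lazy[v] = slot for value v
    int *index;    // index[slot] = value that validated the slot
    int size;      // number of validated slots

    LazySet(int maxValue) : size(0) {
        lazy  = (int *)malloc((maxValue + 1) * sizeof(int));
        index = (int *)malloc((maxValue + 1) * sizeof(int));
    }
    ~LazySet() { free(lazy); free(index); }

    bool contains(int v) const {
        int l = lazy[v];   // may be garbage; validated against index[]
        return l >= 0 && l < size && index[l] == v;
    }
    void insert(int v) {
        if (!contains(v)) { lazy[v] = size; index[size++] = v; }
    }
};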
Here is some incredibly kludgy code that demonstrates the idea:
#include <cstdio>
#include <cstdlib>

int main(int argc, char **argv)
{
    static const int n = 8;
    int values[n] = {1,6,10,4,7,9,5,5};
    int index[n];
    int lists[n];
    int prev[n];
    int next_existing[n];
    int prev_existing[n];
    int index_size = 0;
    int n_lists = 0;

    // Find largest value
    int max_value = 0;
    for (int i=0; i!=n; ++i) {
        int v = values[i];
        if (v>max_value) max_value = v;
    }

    // Allocate a lazy array (deliberately left uninitialized;
    // +2 so the upward scan can probe one past the top value)
    int *lazy = (int *)malloc((max_value+2)*sizeof(int));

    // Set items in the lazy array and build the lists of indices for
    // items with a particular value.
    for (int i=0; i!=n; ++i) {
        next_existing[i] = i+1;
        prev_existing[i] = i-1;
        int v = values[i];
        int l = lazy[v];
        if (l>=0 && l<index_size && index[l]==v) {
            // already there, add it to the list
            prev[n_lists] = lists[l];
            lists[l] = n_lists++;
        }
        else {
            // not there -- create a new list
            l = index_size;
            lazy[v] = l;
            index[l] = v;
            ++index_size;
            prev[n_lists] = -1;
            lists[l] = n_lists++;
        }
    }

    // Go through each contiguous range of values and delete them, determining
    // what the range is.
    int max_count = 0;
    int max_begin = -1;
    int max_end = -1;
    int i = 0;
    while (i<n) {
        // Start by searching backwards for a value that isn't in the lazy array
        int dir = -1;
        int v_mid = values[i];
        int v = v_mid;
        int begin = -1;
        for (;;) {
            int l = lazy[v];
            if (l<0 || l>=index_size || index[l]!=v) {
                // Value not in the lazy array
                if (dir==1) {
                    // Hit the end
                    if (v-begin>max_count) {
                        max_count = v-begin;
                        max_begin = begin;
                        max_end = v;
                    }
                    break;
                }
                // Hit the beginning
                begin = v+1;
                dir = 1;
                v = v_mid+1;
            }
            else {
                // Remove all the items with value v from the doubly-linked list
                int k = lists[l];
                while (k>=0) {
                    if (k!=i) {
                        if (prev_existing[k]>=0)
                            next_existing[prev_existing[k]] = next_existing[k];
                        if (next_existing[k]<n)
                            prev_existing[next_existing[k]] = prev_existing[k];
                    }
                    k = prev[k];
                }
                v += dir;
            }
        }
        // Go to the next existing item
        i = next_existing[i];
    }

    // Print the largest range
    for (int i=max_begin; i!=max_end; ++i) {
        if (i!=max_begin) fprintf(stderr,",");
        fprintf(stderr,"%d",i);
    }
    fprintf(stderr,"\n");

    free(lazy);
}
I would say there are ways to do it. The algorithm is the one you already describe, but using an O(n) sorting algorithm. Since such algorithms exist for certain inputs (bucket sort, radix sort), this works (and this also goes hand in hand with your argument about why it should not work in general).
Vaughn Cato's suggested implementation works like this (it works like a bucket sort, with the lazy array acting as buckets-on-demand); see also the sketch below.
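As a C++ sketch of that idea (a counting-sort style pass; it assumes non-negative integers with a known bound maxValue, and the names are mine):

#include <utility>
#include <vector>
using namespace std;

// Mark which values occur, then scan for the longest run of
// consecutive marked values. O(n + maxValue) time and space.
pair<int,int> longest_run(const vector<int> &a, int maxValue) {
    vector<bool> present(maxValue + 2, false);   // +2: scan one past the top
    for (int v : a) present[v] = true;
    int bestLen = 0, bestEnd = -1, curLen = 0;
    for (int v = 0; v <= maxValue + 1; ++v) {
        if (v <= maxValue && present[v]) {
            ++curLen;
        } else {                                 // a run (if any) ended at v-1
            if (curLen > bestLen) { bestLen = curLen; bestEnd = v - 1; }
            curLen = 0;
        }
    }
    return {bestEnd - bestLen + 1, bestEnd};     // inclusive [start, end]
}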
As shown by M. Ben-Or in "Lower bounds for algebraic computation trees" (Proc. 15th ACM Symposium on Theory of Computing, pp. 80-86, 1983), cited by J. Erickson in the paper "Finding Longest Arithmetic Progressions", this problem cannot be solved in less than O(n log n) time (even if the input is already sorted into order) in the algebraic decision tree model of computation.
Earlier, I posted the following example in a comment to illustrate that sorting the numbers does not provide an easy answer to the question: Suppose the array is given already sorted into ascending order. For example, let it be (20 30 35 40 47 60 70 80 85 95 100). The longest sequence found in any subsequence of the input is 20,40,60,80,100 rather than 30,35,40 or 60,70,80.
Regarding whether an O(n) algebraic decision tree solution to this problem would provide an O(n) algebraic decision tree sorting method: As others have pointed out, a solution to this subsequence problem for a given multiset does not provide a solution to a sorting problem for that multiset. As an example, consider set {2,4,6,x,y,z}. The subsequence solver will give you the result (2,4,6) whenever x,y,z are large numbers not in arithmetic sequence, and it will tell you nothing about the order of x,y,z.
What about this? Populate a hash table so each value stores the start of the range seen so far for that number, except for the head element, which stores the end of the range. O(n) time, O(n) space. A tentative Python implementation (you could do it with one traversal keeping some state variables, but this way seems clearer):
def longest_subset(xs):
    table = {}
    for x in xs:
        start = table.get(x-1, x)
        end = table.get(x+1, x)
        if x+1 in table:
            table[end] = start
        if x-1 in table:
            table[start] = end
        table[x] = (start if x-1 in table else end)
    start, end = max(table.items(), key=lambda pair: pair[1]-pair[0])
    return list(range(start, end+1))

print(longest_subset([1, 6, 10, 4, 7, 9, 5]))
# [4, 5, 6, 7]
Here is an un-optimized O(n) implementation; maybe you will find it useful:
hash_tb = {}
A = [1, 6, 10, 4, 7, 9, 5]
for i in range(0, len(A)):
    if not hash_tb.has_key(A[i]):
        hash_tb[A[i]] = A[i]

max_sq = []; cur_seq = []
for i in range(0, max(A)+1):   # include max(A) itself
    if hash_tb.has_key(i):
        cur_seq.append(i)
    else:
        if len(cur_seq) > len(max_sq):
            max_sq = cur_seq
        cur_seq = []
# the longest run may end at max(A), so compare once more
if len(cur_seq) > len(max_sq):
    max_sq = cur_seq
print max_sq
Can anyone provide some pseudocode for a roulette selection function? How would I implement this:
I don't really understand how to read this math notation. I never took any probability or statistics.
It's been a few years since I've done this myself, but the following pseudocode was found easily enough on Google.
for all members of population
    sum += fitness of this individual
end for

for all members of population
    probability = sum of probabilities + (fitness / sum)
    sum of probabilities += probability
end for

loop until new population is full
    do this twice
        number = Random between 0 and 1
        for all members of population
            if number > probability but less than next probability
                then you have been selected
        end for
    end
    create offspring
end loop
The site where this came from can be found here if you need further details.
Lots of correct solutions already, but I think this code is clearer.
import random

def select(fs):
    p = random.uniform(0, sum(fs))
    for i, f in enumerate(fs):
        if p <= f:
            break
        p -= f
    return i
In addition, if you accumulate the fs, you can produce a more efficient solution.
import bisect

cfs = [sum(fs[:i+1]) for i in xrange(len(fs))]

def select(cfs):
    return bisect.bisect_left(cfs, random.uniform(0, cfs[-1]))
This is both faster and extremely concise. The C++ STL has a similar bisection algorithm available if that's the language you're using.
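For reference, a C++ analogue of that bisection approach might look like this (a sketch using std::lower_bound over a cumulative table; the function and parameter names are mine, and in real use you would cache the cumulative vector across selections):

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>
using namespace std;

// Binary-search a cumulative fitness table, like bisect_left above.
int select_bisect(const vector<double>& fs, mt19937& gen) {
    vector<double> cfs(fs.size());
    partial_sum(fs.begin(), fs.end(), cfs.begin());   // running totals
    uniform_real_distribution<double> dist(0.0, cfs.back());
    return (int)(lower_bound(cfs.begin(), cfs.end(), dist(gen)) - cfs.begin());
}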
The pseudocode posted contained some unclear elements, and it adds the complexity of generating offspring instead of performing pure selection. Here is a simple Python implementation of that pseudocode:
import random

def roulette_select(population, fitnesses, num):
    """ Roulette selection, implemented according to:
    <http://stackoverflow.com/questions/177271/roulette
    -selection-in-genetic-algorithms/177278#177278>
    """
    total_fitness = float(sum(fitnesses))
    rel_fitness = [f/total_fitness for f in fitnesses]
    # Generate probability intervals for each individual
    probs = [sum(rel_fitness[:i+1]) for i in range(len(rel_fitness))]
    # Draw new population
    new_population = []
    for n in xrange(num):
        r = random.random()
        for (i, individual) in enumerate(population):
            if r <= probs[i]:
                new_population.append(individual)
                break
    return new_population
This is called roulette-wheel selection via stochastic acceptance:
/// \param[in] f_max maximum fitness of the population
///
/// \return index of the selected individual
///
/// \note Assuming positive fitness. Greater is better.
unsigned rw_selection(double f_max)
{
    for (;;)
    {
        // Select randomly one of the individuals
        unsigned i(random_individual());

        // The selection is accepted with probability fitness(i) / f_max
        if (uniform_random_01() < fitness(i) / f_max)
            return i;
    }
}
The average number of attempts needed for a single selection is
τ = f_max / avg(f)
where f_max is the maximum fitness of the population and avg(f) is the average fitness.
τ doesn't depend explicitly on the number of individuals in the population (N), but the ratio can change with N.
However, in many applications (where the fitness remains bounded and the average fitness doesn't diminish to 0 for increasing N), τ doesn't increase unboundedly with N, and thus a typical complexity of this algorithm is O(1) (roulette wheel selection using search algorithms has O(N) or O(log N) complexity).
The probability distribution of this procedure is indeed the same as in the classical roulette-wheel selection.
For further details see:
Roulette-wheel selection via stochastic acceptance (Adam Lipowski, Dorota Lipowska - 2011)
Here is some code in C:
// Find the sum of fitnesses. The function fitness(i) should
// return the fitness value of member i.
float sumFitness = 0.0f;
for (int i = 0; i < nmembers; i++)
    sumFitness += fitness(i);

// Get a floating point number in the interval 0.0 ... sumFitness
float randomNumber = ((float)(rand() % 10000) / 9999.0f) * sumFitness;

// Translate this number to the corresponding member
int memberID = -1;
float partialSum = 0.0f;
do
{
    memberID++;
    partialSum += fitness(memberID);
} while (randomNumber > partialSum);

// We have just found the member of the population using the roulette algorithm.
// It is stored in the "memberID" variable.
// Repeat this procedure as many times as needed to pick random members of the population.
From the above answer, I got the following, which was clearer to me than the answer itself.
To give an example:
Random(sum) :: Random(12)
Iterating through the population, we check the following: random < sum
Let us choose 7 as the random number.

Index | Fitness | Sum | 7 < Sum
  0   |    2    |  2  | false
  1   |    3    |  5  | false
  2   |    1    |  6  | false
  3   |    4    | 10  | true
  4   |    2    | 12  | ...

Through this example, the fittest individual (index 3) has the highest chance of being chosen (4/12 ≈ 33%), as the random number only has to land within 6..10 for it to be selected.
double sum = 0;
for (unsigned int i = 0; i < sets.size(); i++) {
    sum += sets[i].eval();
}

// don't name this variable "rand": that would shadow rand() inside its own initializer
double r = ((double)rand() / (double)RAND_MAX) * sum;

sum = 0;
for (unsigned int i = 0; i < sets.size(); i++) {
    sum += sets[i].eval();
    if (r < sum) {
        // breed i
        break;
    }
}
Prof. Thrun of the Stanford AI lab also presented a fast(er?) re-sampling algorithm in Python during his Udacity course CS373. Google search results led to the following link:
http://www.udacity-forums.com/cs373/questions/20194/fast-resampling-algorithm
Hope this helps.
Here's a compact Java implementation I wrote recently for roulette selection, hopefully of use.
public static gene rouletteSelection()
{
    float totalScore = 0;
    float runningScore = 0;
    for (gene g : genes)
    {
        totalScore += g.score;
    }

    float rnd = (float) (Math.random() * totalScore);

    for (gene g : genes)
    {
        if (rnd >= runningScore &&
            rnd <= runningScore + g.score)
        {
            return g;
        }
        runningScore += g.score;
    }

    return null;
}
Roulette Wheel Selection in MatLab:
TotalFitness = sum(Fitness);
ProbSelection = zeros(PopLength, 1);
CumProb = zeros(PopLength, 1);

for i = 1:PopLength
    ProbSelection(i) = Fitness(i) / TotalFitness;
    if i == 1
        CumProb(i) = ProbSelection(i);
    else
        CumProb(i) = CumProb(i-1) + ProbSelection(i);
    end
end

SelectInd = rand(PopLength, 1);

for i = 1:PopLength
    flag = 0;
    for j = 1:PopLength-1   % CumProb(j+1) is accessed, so stop at PopLength-1
        if(CumProb(j) < SelectInd(i) && CumProb(j+1) >= SelectInd(i))
            SelectedPop(i, 1:IndLength) = CurrentPop(j+1, 1:IndLength);
            flag = 1;
            break;
        end
    end
    if(flag == 0)
        SelectedPop(i, 1:IndLength) = CurrentPop(1, 1:IndLength);
    end
end
Okay, so there are two methods for roulette wheel selection implementation: the usual algorithm and stochastic acceptance.
Usual algorithm:
# there will be some amount of repeating organisms here.
mating_pool = []

all_organisms_in_population.each do |organism|
  organism.fitness.times { mating_pool.push(organism) }
end

# [very_fit_organism, very_fit_organism, very_fit_organism, not_so_fit_organism]
return mating_pool.sample #=> random, likely fit, parent!
Stochastic Acceptance algorithm:
max_fitness_in_population = all_organisms_in_population.max_by(&:fitness).fitness

loop do
  random_parent = all_organisms_in_population.sample
  probability = random_parent.fitness.fdiv(max_fitness_in_population) * 100
  # if random_parent's fitness is 90% of the maximum,
  # it's very likely that rand(100) is smaller than it.
  if rand(100) < probability
    return random_parent #=> random, likely fit, parent!
  else
    next #=> or let's keep on searching for one.
  end
end
You can choose either; they return equivalent results.
Useful resources:
http://natureofcode.com/book/chapter-9-the-evolution-of-code - a beginner-friendly and clear chapter on genetic algorithms. It explains roulette wheel selection as a bucket of wooden letters (the more A's you put in, the greater the chance of picking an A; the usual algorithm).
https://en.wikipedia.org/wiki/Fitness_proportionate_selection - describes Stochastic Acceptance algorithm.
Based on my research, here is another implementation in C#, if there is a need for it:
// those with higher fitness get selected with a larger probability
// returns the index of the selected individual
private int RouletteSelection()
{
    double randomFitness = m_random.NextDouble() * m_totalFitness;
    int idx = -1;
    int mid;
    int first = 0;
    int last = m_populationSize - 1;
    mid = (first + last) / 2;

    // ArrayList's BinarySearch is for exact values only
    // so do this by hand.
    while (idx == -1 && first <= last)
    {
        if (randomFitness < (double)m_fitnessTable[mid])
        {
            last = mid;
        }
        else   // treat an exact hit like "greater" so the interval always shrinks
        {
            first = mid;
        }
        mid = (first + last) / 2;
        // lies between i and i+1
        if ((last - first) == 1)
            idx = last;
    }
    return idx;
}
This Swift 4 array extension implements weighted random selection, a.k.a. roulette selection, from its elements:
public extension Array where Element == Double {
    /// Consider the elements as weight values and return a weighted random selection by index.
    /// a.k.a. Roulette wheel selection.
    func weightedRandomIndex() -> Int {
        var selected: Int = 0
        var total: Double = self[0]
        for i in 1..<self.count { // start at 1
            total += self[i]
            if Double.random(in: 0...1) <= (self[i] / total) { selected = i }
        }
        return selected
    }
}
For example, given the two-element array:
[0.9, 0.1]
weightedRandomIndex() will return zero 90% of the time and one 10% of the time.
Here is a more complete test:
let weights = [0.1, 0.7, 0.1, 0.1]
var results = [Int: Int]()
let n = 100000
for _ in 0..<n {
    let index = weights.weightedRandomIndex()
    results[index] = results[index, default: 0] + 1
}
for (key, val) in results.sorted(by: { a, b in weights[a.key] < weights[b.key] }) {
    print(weights[key], Double(val)/Double(n))
}
output:
0.1 0.09906
0.1 0.10126
0.1 0.09876
0.7 0.70092
This answer is basically the same as Andrew Mao's answer here:
https://stackoverflow.com/a/15582983/74975
Here is the code in Python. This code can also handle negative fitness values.
from numpy import min, sum, ptp, array
from numpy.random import uniform

list_fitness1 = array([-12, -45, 0, 72.1, -32.3])
list_fitness2 = array([0.5, 6.32, 988.2, 1.23])

def get_index_roulette_wheel_selection(list_fitness=None):
    """ It can handle negative values also. Make sure the fitness list is a 1D numpy array. """
    scaled_fitness = (list_fitness - min(list_fitness)) / ptp(list_fitness)
    minimized_fitness = 1.0 - scaled_fitness
    total_sum = sum(minimized_fitness)
    r = uniform(low=0, high=total_sum)
    for idx, f in enumerate(minimized_fitness):
        r = r + f
        if r > total_sum:
            return idx

get_index_roulette_wheel_selection(list_fitness1)
get_index_roulette_wheel_selection(list_fitness2)
Make sure your fitness list is a 1D numpy array.
Scale the fitness list to the range [0, 1].
Transform the maximization problem into a minimization problem by taking 1.0 - scaled_fitness_list.
Draw a random number between 0 and sum(minimized_fitness_list).
Keep adding elements of the minimized fitness list until the accumulated value becomes greater than the total sum.
You can see that if the fitness is small, it has a bigger value in minimized_fitness, so it has a bigger chance of pushing the accumulated value past the total sum.
I wrote a version in C# and am really looking for confirmation that it is indeed correct:
(roulette_selector is a random number generator producing values in the range 0.0 to 1.0)
private Individual Select_Roulette(double sum_fitness)
{
    Individual ret = new Individual();
    bool loop = true;

    while (loop)
    {
        // this will give us a double within the range 0.0 to total fitness
        double slice = roulette_selector.NextDouble() * sum_fitness;

        double curFitness = 0.0;

        foreach (Individual ind in _generation)
        {
            curFitness += ind.Fitness;
            if (curFitness >= slice)
            {
                loop = false;
                ret = ind;
                break;
            }
        }
    }

    return ret;
}