Genetic alghorithm / Linear-rank selector , the problem with meaning of the variables - genetic-algorithm

I have the equation of probability of choicing an induvidual in Linear-rank selector.
P(i) = (1/N) * (n− + (n+ − n− )* ( i−1 / N−1) )
tell me please , what is n- and n+ ?

n-/N is the probability of the worst individual to be selected and n+/N the probability of the best individual to be selected. As the population size is held constant, the conditions n+ = 2 - n- and n- >= 0 must be fulfilled. Note that all individuals get a different rank, i.e., a different selection probability, even if the have the same fitness value.
See A Comparison of Selection Schemes used in Evolutionary Algorithms (1997)

Related

Big-O complexity of calculation: Drawing a non-colliding subset of k elements from n total with dumb algorithm

I'm trying to understand the computational complexity of this pseudocode:
values is a set of n unique elements
subset is an empty set
for 0 ... k
X: randomly select a value from values
if value is in subset
goto X
else
insert value into subset
This is of course a (poor) algorithm for selecting a unique random subset of k elements from n, and I'm aware of the better choices, but I wanted to understand the computational complexity of this.
I can see easily that this is O(n) when duplicates are allowed because the conditional test is eliminated from the pseudocode and k choices are made each time.
When you have to account for duplicates, there is a probability that a re-test will be required which increases with each iteration. Depending on the values of n and k, this is a non-negligible fact, but I'm not certain how it affects the big-O complexity in a generalized way. Could someone explain this to me?
The probability of inserting value into subset for each value of k is (n-k)/n
The number of iterations of each k loop would be inversely proportional to that probability
Therefore Big O notation for each value of k would be O((n/(n-K)) + 1) where 1 would be 'insert value into subset'.
You have to calculate the summation of ((n/(n-K)) + 1) for each value of k ----final answer would be
O(((n/(n-K)) + 1) for k from 1 through k)
Disclaimer - this is assuming if Big(o) is applicable for functions that use random number generating algorithms (since X is random)

Algorithm for making two histograms proportional, minimizing units removed

Imagine you have two histograms with an equal number of bins. N observations are distributed among the bins. Each bin now has between 0 and N observations.
What algorithm would be appropriate for determining the minimum number of observations to remove from both histograms in order to make them proportional? They do not need to be equal in absolute number, only proportional to each other. That is, there must be a common factor by which all the bins in one histogram can be multiplied in order to make it equal to the other histogram.
For example, imagine the following two histograms, where the item i in each histogram refers to the number of observations in bin i for the respective histogram.
Histogram 1: 4, 7, 4, 9
Histogram 2: 2, 0, 2, 1
For these histograms, the solution would be to remove from histogram 1 all 7 observations in bin 2 and another 7 observations from bin 4, such that (histogram 1)*2 = histogram 2.
But what general algorithm could be used to find the subsets of the two histograms that maximized the number of total observations between them while making them proportional? You can drop observations from both histograms or just one.
Thanks!
Seems to me that the problem is equivalent (if you consider each histogram as a N-dimensional vector), to minimizing the Manhattan length |R|, where R=xA-B, A and B are your 'vectors' and x is your proportional scale.
|R| has a single minimum (not necessarily an integer) so you can find it fairly rapidly using a simple bisection algorithm (or something akin to Newton's method).
Then, assuming you want a solution where the proportion is an integer, test the two cases ceil(x), and floor(x), to find which has the smallest Manhattan length (and that is the number of observations you need to remove).
Proof that the problem is not NP-hard:
Consider an inefficient 'solution' whereby you removed all N observations from all the bins. Now both A and B are equal to the 'zero' histogram 0 = (0,0,0,...). The two histograms are equal and thus proportional as 0 = s * 0 for all proportional values s, so a hard maximum for the number of observations to remove is N.
Now assume a more efficient solution exists with assitions/removals < N and a proportional scale s > 2*N (i.e after removal of some observations A = N * B or B=N * A ). If both A = 0 and B = 0, we have the previous solution with N removals (which contradicts the assumption that there are less than N removals). If A = 0 and B ≠ 0 then there is no s <> 0 such that 0 = s * B and no s such that s * 0 = B (with a similar argument for B = 0 and S ≠ 0). So it must be the case that both A ≠ 0 and B ≠ 0. Assume for a moment that A is the histogram to be scaled (so A * s = B), A must have at least one non-zero entry A[i] with minimum value 1 (after removal of extra observations), so when scaled it will have minimum value ≥. Therefore the equivalent entry B[i] must also have at least 2*N observations. But the total number of observations was initially N, so we have needed to add at least N observations to B[i], which contradicts the assumption that the improved solution had less than N additions/removals. So no 'efficient' solution requires a proportional scale greater than N.
So to find an efficient solution requires, at worst, testing the 'best fit' solution for scaling factors in the range 0-N.
The 'best fit' solution for scaling factor s in A = s * B, where A and B have M bins each requires
Sum(i=1 to M) of { Abs(A[i]- s * B[i]) mod s + Abs(A[i]- s * B[i]) div s } additions/removals.
This is an order M operation, so to test for each scaling factor in the range 0-N will be an algorithm of order O(M*N)
I am fairly certain (but haven't got a formal proof), that the scale factor cannot exceed the number of observations in the most filled bin. In practice it is typically very much smaller. For two histograms with two hundred bins and randomly chosen 30-300 observations per bin: if there were Na > Nb total observations in all the bins of A and B respectively the scaling factor was either almost always found in the range Na/Nb-4 < s < Na/Nb + 4, (or s = 0 if Na >> Nb).

Arrangement of sequence of 'n' numbers

In how many ways can you arrange a sequence of 'n' (1 to n) numbers such that no number occurs at the index represent by its value?
For eg
1 can not be at first position
2 can not be at second position
.
.
n can not be at nth position
Please give a general solution. Also solve it for n=6.
Its not a homework.
You want fixed point free permutations, also known as derangements. The formula for their number is slightly more complicated than for the number of permutations that may have fixed points.
Let P(n) be the number of such arrangements for n numbers.
For 123456....n
Cases are of the form
2*****
3*****
4*****
5*****
.
.
n*****
Now 1 can be anywhere at the rest (n-1) positions.
If 1 is put at the position of the number replacing it...
21****
3*1***
4**1**
.
.
n****1
then first and the replaced numbers are fixed.
Then total cases = (n-1) * P(n-2)
Else if
1 is also restricted not to be at a particular position (positions in above cases)
Then total cases = (n-1) * P(n-1)
So
P(n) = (P(n-1) + P(n-2)) * (n-1)
with P(1) = 0
and P(2) = 1
The number of derangements (fixed point free permutations) of n things is round(n!/e) where e is the base of natural logarithms. Here round means nearest integer function. This is described in the Wikipedia article, but in a manner that could stand clarification.
For n = 6 one easily calculates there are round(264.87...) = 265 derangements.
In effect you've asked a frequently covered question from MathSE.

What is the probability that all priorities are unique for Permute-By-Sorting algorithm?

I hope someone can help me answer the following question. Thanks!
Here is a pseudo code of Permute-By-Sorting algorithm:
Permute-By-Sorting (A)
n = A.length
let P[1..n] be a new array
for i = 1 to n
P[i] = Random (1,n^3)
sort A, using P as sort keys
In the above algorithm, the array P represents the priorities of the elements in array A. Line 4 chooses a random number between 1 and n^3.
The question is what is the probability that all priorities in P are unique? and how do I get the probability?
To reconcile the answers already given: for choice i = 0, ..., n - 1, given that no duplicates have been chosen yet, there are n^3 - i non-duplicate choices of n^3 total for the ith value. Thus the probability is the product for i = 0, ..., n - 1 of (1 - i/n^3).
sdcwc is using a union bound to lowerbound this probability by 1 - O(1/n). This estimate turns out to be basically right. The proof sketch is that (1 - i/n^3) is exp(-i/n^3 + O(i^2/n^6)), so the product is exp(-O(n^2)/n^3 + O(n^-3)), which is greater than or equal to 1 - O(n^2)/n^3 + O(n^-3) = 1 - O(1/n). I'm sure the fine folks on math.SE would be happy to do this derivation "properly" for you.
Others have given you the probability calculation, but I think you may be asking the wrong question.
I assume the reason you're asking about the probability of the priorities being unique, and the reason for choosing n^3 in the first place, is because you're hoping they will be unique, and choosing a large range relative to n seems to be a reasonable way of achieving uniqueness.
It is much easier to ensure that the values are unique. Simply populate the array of priorities with the numbers 1 .. n and then shuffle them with the Fisher-Yates algorithm (aka algorithm P from The Art of Computer Programming, volume 2, Seminumerical Algorithms, by Donald Knuth).
The sort would then be carried out with known unique priority values.
(There are also other ways of going about getting a random permutation. It is possible to generate the nth lexicographic permutation of a sequence using factoradic numbers (or, the factorial number system), and so generate the permutation for a randomly chosen value in [1 .. n!].)
You are choosing n numbers from 1...n^3 and asking what is the probability that they are all unique.
There are (n^3) P n = (n^3)!/(n^3-n)! ways to choose the n numbers uniquely, and (n^3)^n ways to choose the n-numbers total.
So the probability of the numbers being unique is just the first equation divided by the second, which gives
n3!
--------------
(n3-n)! n3n
Let Aij be the event: i-th and j-th elements collide. Obviously P(Aij)=1/n3.
There is at most n2 pairs, therefore probability of at least one collision is at most 1/n.
If you are interested in exact thing, see BlueRaja's answer, but in randomized algorithms it is usually enough to give this type of bound.
So the sort part is irrelevant
Assuming the "Random" is real random, the probability is just
n^3!
----------------
(n^3-n)!n^(3n)

algorithm - How to sort a 0/1 array with 2n/3 comparisons?

In Algorithm Design Manual, there is such an excise
4-26 Consider the problem of sorting a sequence of n 0’s and 1’s using
comparisons. For each comparison of two values x and y, the algorithm
learns which of x < y, x = y, or x > y holds.
(a) Give an algorithm to sort in n − 1 comparisons in the worst case.
Show that your algorithm is optimal.
(b) Give an algorithm to sort in 2n/3 comparisons in the average case
(assuming each of the n inputs is 0 or 1 with equal probability). Show
that your algorithm is optimal.
For (a), I think it is fairly easy. I can choose a[n-1] as pivot, then do something like in quicksort partition, scan 0 to n - 2, find the middle point where left side is all 0 and right side is all 1, this take n - 1 comparisons.
But for (b), I can't get a clue. It says "each of the n inputs is 0 or 1 with equal probability", so I guess I can assume the numbers of 0 and 1 equal? But how can I get a result which is related to 1/3? divide the whole array into 3 groups?
Thanks
"0 or 1 with equal probability" is the condition for "average" case. Other cases may have worse timing.
Hint 1: 2/3 = 1/2 + 1/8 + 1/32 + 1/128 + ...
Hint 2: Consider the sequence as a sequence of pairs and compare the items in each pair. Half will return equal; half will not. Of the half that are unequal you know which item in the pair is 0 and which is 1, so those need no more comparisons.
No it means that at any position, you have the same chance (probability) of the input value being 0 or 1. this give you a first clue : your algorithm will be randomized.
The runtime will depend on some random variable, and you need to take the expected value to obtain the average complexity case. Note that in this case, you have to detail during complexity analysis, as they require a precise constant (2/3n rather than simply O(n))
Edit:
Hint. In the sorted array (the one you get at the end), what is only thing which varies, knowing you have only 2 possible elements.

Resources