algorithm to find consecutive wins (adjusted binomial distribution) - algorithm

Imagine I have a roulette wheel and I want to feed my algorithm three parameters
p := probability to win one game
m := number of times the wheel is spun
k := number of consecutive wins I am interested in
and I am interested in the probability P that after spinning the wheel m times I win at least k consecutive games.
Let's go through an example where m = 5 and k = 3 and let's say 1 is a win and 0 a loss.
1 1 1 1 1
0 1 1 1 1
1 1 1 1 0
1 1 1 0 0
0 1 1 1 0
0 0 1 1 1
So, as I understand it, these would be all the outcomes with at least 3 consecutive wins. For every k, I have (m-k+1) possible winning outcomes.
First question: is this true? Or would 1 1 1 0 1 and 1 0 1 1 1 also be possible solutions?
Next, what would a handy computation for this problem look like? First, I thought about using the binomial distribution, where I just iterate over all k:
\textstyle {n \choose k}\,p^{k}(1-p)^{n-k}
But this somehow doesn't guarantee to have consecutive wins. Is the binomial distribution somehow adjustable to produce the output P I am looking for?

The following is an option you might want to consider:
Generate an array with entries 0 ... m, where entry i holds the probability that after i spins you have seen at least k consecutive 1s.
All slots before k have probability 0: there is no chance of k consecutive wins yet.
Slot k has probability p^k.
All later positions are computed with a dynamic programming approach: each position i from k+1 onward is the value at position i-1 plus p^k * (1-p) * (1 - the probability at position i-1-k). The added term is the probability that the first run of k wins ends exactly at spin i: the last k spins are wins, the spin before them is a loss, and no run occurred in the first i-1-k spins.
This way you iterate through the array, and the last position holds the probability of at least k consecutive 1s.
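The recurrence above could be sketched in Python roughly as follows. The function name `prob_run_at_least` is made up for illustration, and the sketch assumes independent spins with the same win probability p and k >= 1.

```python
def prob_run_at_least(p, m, k):
    """Probability of at least k consecutive wins in m independent spins.

    Hypothetical helper: p is the win probability of a single spin, k >= 1.
    """
    if k > m:
        return 0.0  # not enough spins for a run of length k
    # probs[i] = probability of having seen a run of k wins within the first i spins
    probs = [0.0] * (m + 1)
    probs[k] = p ** k
    for i in range(k + 1, m + 1):
        # First run ends exactly at spin i: k wins, preceded by a loss,
        # and no run in the first i-1-k spins.
        probs[i] = probs[i - 1] + p ** k * (1 - p) * (1 - probs[i - 1 - k])
    return probs[m]
```

For p = 0.5, m = 5 and k = 3 this gives 0.25, which corresponds to 8 of the 32 equally likely outcomes.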

Or would also 1 1 1 0 1 and 1 0 1 1 1 be possible solutions?
Yes, they would, given that the goal is to win at least k consecutive games.
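For small m, a brute-force enumeration is an easy way to check which outcomes count. The sketch below uses a hypothetical helper `outcomes_with_run` and simply lists every win/loss string that contains k consecutive 1s.

```python
from itertools import product


def outcomes_with_run(m, k):
    """List every 0/1 string of length m that contains k consecutive 1s."""
    target = "1" * k
    hits = []
    for outcome in product("01", repeat=m):
        s = "".join(outcome)
        if target in s:
            hits.append(s)
    return hits
```

For m = 5 and k = 3 it returns 8 strings: the six listed in the question plus 11101 and 10111.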
First, I thought about the binomial distribution to solve this problem, where I just iterate over all k: \textstyle {n \choose k}\,p^{k}(1-p)^{n-k}
That might work if you combine it with an acceptance-rejection method: generate an outcome from the binomial, check for at least k wins, accept it if so, and drop it otherwise.
Is the binomial distribution somehow adjustable to produce the output P I am looking for?
Frankly, I would take a look at the geometric distribution, which describes the number of wins before a loss (or vice versa).

Related

Naive shuffling algorithm probability analysis [duplicate]

I implemented the shuffling algorithm as:
import random
a = list(range(1, n + 1))  # a contains the elements 1 to n
for i in range(n):
    j = random.randint(0, n - 1)  # j is drawn from the whole array at every step
    a[i], a[j] = a[j], a[i]
This algorithm is biased. For any n (n ≤ 17), is it possible to find which permutation has the highest probability of occurring and which has the lowest, out of all n! possible permutations? If yes, which permutations are they?
For example n=3:
a = [1,2,3]
There are 3^3 = 27 possible shuffle
No. occurence of different permutations:
1 2 3 = 4
3 1 2 = 4
3 2 1 = 4
1 3 2 = 5
2 1 3 = 5
2 3 1 = 5
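For small n, counts like these can be reproduced by enumerating all n^n equally likely sequences of swap targets. This is only a sketch with a made-up function name, `naive_shuffle_counts`, and it is exponential in n, so it does not scale to n = 17.

```python
from collections import Counter
from itertools import product


def naive_shuffle_counts(n):
    """Count how often each permutation results from the naive shuffle."""
    counts = Counter()
    # Each tuple is one possible sequence of random indices j, one per step i.
    for swaps in product(range(n), repeat=n):
        a = list(range(1, n + 1))
        for i, j in enumerate(swaps):
            a[i], a[j] = a[j], a[i]
        counts[tuple(a)] += 1
    return counts
```

Calling `naive_shuffle_counts(3)` yields 27 outcomes in total, split 4/4/4/5/5/5 as in the table above.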
P.S. I am not so good with maths.
This is not a proof by any means, but you can quickly estimate the distribution of placement probabilities by running the biased algorithm a million times. It will look like this picture from Wikipedia:
An unbiased distribution would have 14.3% in every field.
To get the most likely distribution, I think it's safe to just pick the highest percentage for each index. This means it's most likely that the entire array is moved down by one and the first element will become the last.
Edit: I ran some simulations and this result is most likely wrong. I'll leave this answer up until I can come up with something better.

Binary search over 2d array to find a local maximum? What's wrong with this algorithm?

This is the classic finding a local maximum (just one) in a matrix.
My algorithm is:
Choose the number in the center of the matrix.
Check if the number is a peak. If yes, return.
If not, check the numbers to the left and right. If one of them is greater than our current number, choose that half of the matrix. If both are greater, we can choose either half.
Repeat with the numbers to the top and bottom. This will leave us with one quadrant of the matrix to continue checking.
Since this is a binary search over an n x n matrix, which has n^2 elements, it should take O(log(n^2)) = O(2*log(n)) = O(log(n)).
I'm pretty sure this is not correct, but where is my mistake?
This algorithm isn't guaranteed to find the local maximum. Consider, for example, the case where you need to follow a winding path of ascending values through the matrix to get to the peak. If that path crosses back and forth between quadrants, your algorithm will not find it.
13 1 1 1 1
12 1 1 1 1
11 1 1 2 3
10 1 1 1 4
9 8 7 6 5
Or, here's a simpler example:
3 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
You start in the middle; how do you find the '3'? Your algorithm doesn't describe what to do when faced with a flat region.
Consider reading Find a peak element in a 2D array, which describes a brute-force approach as well as an efficient method with a time complexity of O(rows * log(columns)), in your case O(n log n).
The algorithm is based on Binary Search, thus the logn term you had in your complexity too:
Consider the mid column and find the maximum element in it.
Let the index of the mid column be 'mid', the value of the maximum element in the mid column be 'max', and the maximum element be at 'mat[max_index][mid]'.
If max >= mat[max_index][mid-1] and max >= mat[max_index][mid+1], max is a peak; return max.
If max < mat[max_index][mid-1], recur for the left half of the matrix.
If max < mat[max_index][mid+1], recur for the right half of the matrix.
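The column-based steps above might look like this in Python. It is a sketch under a few assumptions: the function name `find_peak_2d` is invented, the matrix is a non-empty list of equal-length rows, and a peak is any element that is not smaller than its four neighbours, so ties count.

```python
def find_peak_2d(mat):
    """Return (row, col) of an element that is >= its four neighbours."""
    rows, cols = len(mat), len(mat[0])
    lo, hi = 0, cols - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        # Row of the largest element in the middle column.
        max_row = max(range(rows), key=lambda r: mat[r][mid])
        value = mat[max_row][mid]
        left = mat[max_row][mid - 1] if mid > 0 else float("-inf")
        right = mat[max_row][mid + 1] if mid < cols - 1 else float("-inf")
        if value >= left and value >= right:
            return max_row, mid  # column maximum that also beats both sides
        if left > value:
            hi = mid - 1  # a peak must exist in the left half
        else:
            lo = mid + 1  # otherwise one must exist in the right half
    raise ValueError("matrix must be non-empty")
```

Each iteration scans one column in O(rows) and halves the column range, which is where the O(rows * log(columns)) bound comes from.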
However, your algorithm won't work in all cases and might fail to find a local maximum, since you only look at the neighboring elements of the current center, which does not guarantee that you will find the maximum (the elements are not sorted, of course). Example:
1 1 1 1 1
1 1 1 1 1
1 2 1 1 1
1 1 1 1 1
1 1 1 1 10
You start from the center, you pick the wrong submatrix, you are doomed not to find the local maximum.

Maximum OR value of 2D Array of bits

Suppose there is a 2D array (m x n) of bits.
For example:
1 0 0 1 0
1 0 1 0 0
1 0 1 1 0
0 0 0 0 1
here, m = 4, n = 5.
I can flip (0 becomes 1, 1 becomes 0) the bits in any row. When you flip the bits in a particular row, you flip all the bits.
My goal is to get the max OR value between a given pair of rows.
That is, if the given pair of rows is (r1, r2), then I can flip any number of rows between r1 and r2, and I should find the maximum possible OR value of all the rows between r1 and r2.
In the above example (consider arrays with 1-based index), if r1 = 1 and r2 = 4, I can flip the 1st row to get 0 1 1 0 1. Now, if I find the OR value of all the rows from 1 to 4, I get the value 31 as the maximum possible OR value (there can be other solutions).
Also, it would be nice to compute the answer for (r1, r1), (r1, r1+1), (r1, r1+2), ... , (r1, r2-1) while calculating the same for (r1, r2).
Constraints
1 <= m x n <= 10^6
1 <= r1 <= r2 <= m
A simple brute force solution would have a time complexity of O(2^m).
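As a baseline, that brute force could be sketched as below. The name `brute_force_max_or` and the encoding are assumptions made for illustration: each row is passed as an integer whose n low bits are the row read left to right.

```python
from itertools import product


def brute_force_max_or(rows, n):
    """Try every flip pattern and return the largest OR of the rows."""
    full = (1 << n) - 1  # mask with n one-bits
    best = 0
    for flips in product((False, True), repeat=len(rows)):
        acc = 0
        for row, flip in zip(rows, flips):
            acc |= (row ^ full) if flip else row  # XOR with the mask flips the row
        best = max(best, acc)
    return best
```

For the example matrix, `brute_force_max_or([0b10010, 0b10100, 0b10110, 0b00001], 5)` evaluates to 31.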
Is there a faster way to compute this?
Since A <= A | B, the value of a number A will only go up as we OR more numbers to A.
Therefore, we can use binary search.
We can use a function to get the maximum between two rows and save the ORed result as a third row. Then compare two of these third rows to get a higher-level row, and then compare two of these higher-level rows, and so on, until only one is left.
Using your example:
array1 = 1 0 0 1 0 [0]
1 0 1 0 0 [1]
1 0 1 1 0 [2]
0 0 0 0 1 [3]
array2 = 1 1 0 1 1 <-- [0] | ~[1]
1 1 1 1 0 <-- [2] | ~[3]
array3 = 1 1 1 1 1 <-- [0] | [1]
And obviously you can truncate branches as necessary when m is not a power of 2.
So this would be O(m) time. And keep in mind that for large numbers of rows, there are likely not unique solutions. More than likely, the result would be 2 ^ n - 1.
An important optimization: if m >= n, then the output must be 2^n - 1. Suppose we have two numbers A and B. If B has k missing bits, then either A or ~A is guaranteed to fill at least one of those bits. By a similar token, if m >= log n, then the output must also be 2^n - 1, since either A or ~A is guaranteed to fill at least half of the unfilled bits in B.
Using these shortcuts, you can get away with a brute-force search if you want. I'm not 100% sure the binary search algorithm works in every single case.
Consider the problem of flipping rows in the entire matrix and then OR-ing them together to get as many 1s as possible. I claim this is tractable when the number of columns is less than 2^m, where m is the number of rows. Consider the rows one by one. At stage i, counting from 0, you have fewer than 2^(m-i) zeros to fill. Because flipping a row turns 0s into 1s and vice versa, either the current row or the flipped row will fill in at least half of those zeros. When you have worked through all the rows, you will have fewer than one zero left to fill, so this procedure is guaranteed to provide a perfect answer.
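The row-by-row procedure can be sketched as a greedy loop. As before, the function name `greedy_max_or` and the integer encoding of rows are assumptions for illustration. The all-ones guarantee only applies when the number of columns is less than 2^m; outside that case this is just a heuristic.

```python
def greedy_max_or(rows, n):
    """For each row, keep whichever of row / flipped row fills more missing bits."""
    full = (1 << n) - 1
    acc = 0
    for row in rows:
        missing = full & ~acc   # bit positions that are still 0 in the result
        flipped = row ^ full    # the row with every bit flipped
        filled_by_row = bin(row & missing).count("1")
        filled_by_flip = bin(flipped & missing).count("1")
        # One of the two orientations fills at least half of the missing bits.
        acc |= row if filled_by_row >= filled_by_flip else flipped
    return acc
```

On the 4 x 5 example from the question (5 columns, 2^4 = 16) it reaches 31 after the third row.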
I claim this is tractable when the number of columns is at least 2^m, where m is the number of rows. There are 2^m possible patterns of flipped rows, but this is only O(N) where N is the number of columns. So trying all possible patterns of flipped rows gives you an O(N^2) algorithm in this case.

Shuffle sort - Fisher Yates, why cant I find random number between 0 and N-1?

While studying shuffle sort, I learnt the Fisher-Yates solution. It loops from 0 to the array length and picks a random number between 0 (inclusive) and the loop index (inclusive), NOT between 0 and N-1. Picking a random number between 0 and N-1 does not give a uniformly random result, but I couldn't find the reason why.
public static void sort(Comparable[] a) {
    for (int i = 0; i < a.length; i++) {
        int r = StdRandom.uniform(i + 1);
        // why can't this be a.length?
        exch(a, i, r);
    }
}
StdRandom.uniform(i+1) returns a random number between 0 and i (both inclusive).
That's because you can't generate each sequence with equal probability with your approach of choosing r between 0 to n-1.
Example:
Consider n=3 for set {a,b,c}
Total possible outcomes = 3! = 6
Now, assuming a perfect random generator, here are some of the swap sequences it can generate, with the shuffled set each one produces:
0 1 2 {a,b,c}
0 2 1 {a,b,c}
1 0 2 {a,b,c}
1 2 0 {a,c,b}
2 0 1 {b,a,c}
2 1 0 {a,b,c}
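These swap sequences clearly do not cover all outcomes equally, which is not the case with an actual implementation of Fisher-Yates. More generally, choosing r between 0 and n-1 at every step gives n^n = 27 equally likely swap sequences for n = 3, and 27 is not divisible by 3! = 6, so the six permutations cannot all be equally likely.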
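By contrast, drawing r from 0..i as in the question's code gives exactly n! choice sequences, and each one produces a different permutation. A small enumeration can check this for small n; the helper name `fisher_yates_counts` is made up for this sketch.

```python
from collections import Counter
from itertools import product


def fisher_yates_counts(n):
    """Count the permutations produced by every choice of r in 0..i at step i."""
    counts = Counter()
    # Step i has i + 1 possible values of r, so there are n! sequences in total.
    for choices in product(*(range(i + 1) for i in range(n))):
        a = list(range(n))
        for i, r in enumerate(choices):
            a[i], a[r] = a[r], a[i]
        counts[tuple(a)] += 1
    return counts
```

For n = 3 this yields six permutations, each counted once, so every permutation has probability 1/6.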
If you only select from 0 to N-1, then you can NEVER choose the last number in the array. It's therefore not completely random.
You can predict that the last number WILL DEFINITELY be out of place at least.
I know that that doesn't mean you can predict the entire sequence, but it does mean that there isn't complete randomness in the system.

Rectangular region in an array

Given an N*N matrix containing only 1s and 0s, and an integer k, what is the best method to find a rectangular region that contains exactly k 1s?
I can do it in O(N^3*log(N)), but I am sure the best solution is faster. First, create another N*N matrix B (the initial matrix is A). The logic of B is the following:
B[i][j] is the number of ones in the rectangle of A with corners (0,0) and (i,j).
You can compute B in O(N^2) by dynamic programming: B[i][j] = B[i-1][j] + B[i][j-1] - B[i-1][j-1] + A[i][j].
Now it is easy to solve the problem in O(N^4) by iterating over every bottom-right corner (i=1..N, j=1..N, O(N^2)), every left column (z=1..j, O(N)), and every top row (t=1..i, O(N)), and getting the number of ones in that rectangle with the help of B:
sum_of_ones = B[i][j] - B[i][z-1] - B[t-1][j] + B[t-1][z-1].
If you get exactly k, that is, k == sum_of_ones, then output the result.
To make it O(N^3*log(N)), find the top row by binary search instead of iterating over all possible cells. This works because the entries are non-negative, so for fixed i, j, and z the sum can only shrink as t increases.
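Here is a sketch of the O(N^4) version built on the prefix-sum matrix B. The function names are invented for illustration, B is padded with a zero row and column so that the i-1 and j-1 lookups never go out of range, and the code accepts non-square matrices as well.

```python
def build_prefix(A):
    """B[i][j] = number of ones in the first i rows and first j columns of A."""
    rows, cols = len(A), len(A[0])
    B = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            B[i][j] = B[i - 1][j] + B[i][j - 1] - B[i - 1][j - 1] + A[i - 1][j - 1]
    return B


def find_rectangle_bruteforce(A, k):
    """Return 1-based (top, left, bottom, right) of a rectangle with k ones, or None."""
    rows, cols = len(A), len(A[0])
    B = build_prefix(A)
    for i in range(1, rows + 1):              # bottom row
        for j in range(1, cols + 1):          # right column
            for t in range(1, i + 1):         # top row
                for z in range(1, j + 1):     # left column
                    ones = B[i][j] - B[i][z - 1] - B[t - 1][j] + B[t - 1][z - 1]
                    if ones == k:
                        return (t, z, i, j)
    return None
```

The binary-search refinement would replace the loop over t, relying on the sum being monotone in t.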
Consider this simpler problem:
Given a vector of size N containing only the values 1 and 0, find a subsequence that contains exactly k values of 1 in it.
Let A be the given vector and S[i] = A[1] + A[2] + A[3] + ... + A[i], meaning how many 1s there are in the subsequence A[1..i].
For each i, we are interested in the existence of a j <= i such that S[i] - S[j-1] == k.
We can find this in O(n) with a hash table by using the following relation:
S[i] - S[j-1] == k => S[j-1] = S[i] - k
let H = a hash table containing only S[0] = 0
for i = 1 to N do
    if H.Contains(S[i] - k) then your sequence ends at i
    else
        H.Add(S[i])
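A runnable sketch of this hash-table idea follows. The function name `find_subarray_with_sum` is hypothetical. The dictionary starts with the empty prefix (sum 0 at position 0) so that subsequences beginning at position 1 are found, and it remembers the latest position of each prefix sum.

```python
def find_subarray_with_sum(values, k):
    """Return 1-based (start, end) of a contiguous run summing to k, or None."""
    last_pos = {0: 0}  # prefix sum -> latest position where it occurred
    running = 0        # plays the role of S[i]
    for i, v in enumerate(values, start=1):
        running += v
        if running - k in last_pos:
            # S[i] - S[j-1] == k with j - 1 = last_pos[running - k]
            return last_pos[running - k] + 1, i
        last_pos[running] = i
    return None
```

For example, `find_subarray_with_sum([0, 3, 3, 3, 3, 0], 12)` returns `(2, 5)`.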
Now we can use this to solve your given problem in O(N^3): for each sequence of rows in your given matrix (there are O(N^2) sequences of rows), consider that sequence to represent a vector and apply the previous algorithm on it. The computation of S is a bit more difficult in the matrix case, but it's not that hard to figure out. Let me know if you need more details.
Update:
Here's how the algorithm would work on the following matrix, assuming k = 12:
0 1 1 1 1 0
0 1 1 1 1 0
0 1 1 1 1 0
Consider the first row alone:
0 1 1 1 1 0
Consider it to be the vector 0 1 1 1 1 0 and apply the algorithm for the simpler problem on it: we find that there's no subsequence adding up to 12, so we move on.
Consider the first two rows:
0 1 1 1 1 0
0 1 1 1 1 0
Consider them to be the vector 0+0 1+1 1+1 1+1 1+1 0+0 = 0 2 2 2 2 0 and apply the algorithm for the simpler problem on it: again, no subsequence that adds up to 12, so move on.
Consider the first three rows:
0 1 1 1 1 0
0 1 1 1 1 0
0 1 1 1 1 0
Consider them to be the vector 0 3 3 3 3 0 and apply the algorithm for the simpler problem on it: we find the sequence starting at position 2 and ending at position 5 to be the solution. From this we can get the entire rectangle with simple bookkeeping.
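Putting the walkthrough together, the whole search might be sketched as follows. The name `find_rectangle` is an assumption, and the "simple bookkeeping" is taken to mean remembering the top and bottom rows of the current band of rows. The column sums are updated incrementally, so each pair of rows costs O(N).

```python
def find_rectangle(A, k):
    """Return 1-based (top, left, bottom, right) of a rectangle with k ones, or None."""
    rows, cols = len(A), len(A[0])
    for top in range(rows):
        col_sums = [0] * cols  # column-wise sums of rows top..bottom
        for bottom in range(top, rows):
            for c in range(cols):
                col_sums[c] += A[bottom][c]
            # One-dimensional search on the collapsed vector.
            last_pos = {0: 0}
            running = 0
            for c, v in enumerate(col_sums, start=1):
                running += v
                if running - k in last_pos:
                    return (top + 1, last_pos[running - k] + 1, bottom + 1, c)
                last_pos[running] = c
    return None
```

On the 3 x 6 example with k = 12 it returns `(1, 2, 3, 5)`: rows 1 to 3, columns 2 to 5.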

Resources