Algorithm for the "pick the number up" game

I'm struggling with this problem and have no idea how to approach it.
RobotA and RobotB start with a permutation of N numbers. RobotA picks first, and they pick alternately. On each turn, a robot removes any one remaining number from the permutation. When the remaining numbers form an increasing sequence, the game finishes. The robot who made the last pick (after which the sequence becomes increasing) wins the game.
Assuming both play optimally, who wins?
Example 1:
The original sequence is 1 7 3.
RobotA wins by picking 7, after which the remaining sequence 1 3 is increasing.
Example 2:
The original sequence is 8 5 3 1 2.
RobotB wins by selecting the 2, preventing any increasing sequence.
Is there a known algorithm to solve this? Any tips or pointers on where to look would be greatly appreciated!

Given a sequence w of distinct numbers, let N(w) be the length of w and let L(w) be the length of the longest increasing subsequence in w. For example, if
w = 3 5 8 1 4
then N(w) = 5 and L(w) = 3.
The game ends when L(w) = N(w), or, equivalently, N(w) - L(w) = 0.
Working the game backwards, if on RobotX's turn N(w) - L(w) = 1, then the optimal play is to remove the unique number not in a longest increasing subsequence, thereby winning the game.
For example, if w = 1 7 3, then N(w) = 3 and L(w) = 2 with a longest increasing subsequence being 1 3. Removing the 7 results in an increasing sequence, ensuring that the player who removed the 7 wins.
Going back to the previous example, w = 3 5 8 1 4, if either 1 or 4 is removed, then for the resulting permutation u we have N(u) - L(u) = 1, so the player who removed the 1 or 4 will certainly lose to a competent opponent. However, any other play results in a victory since it forces the next player to move to a losing position. Here, the optimal play is to remove any of 3, 5, or 8, after which N(u) - L(u) = 2, but after the next move N(v) - L(v) = 1.
Further analysis along these lines should lead to an optimal strategy for either player.
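If you want to experiment with this analysis, here is a small Python sketch (my own; the function names are mine) that computes the deficiency N(w) - L(w) using the standard O(N log N) longest-increasing-subsequence technique:

from bisect import bisect_left

def lis_length(w):
    # patience-style LIS: tails[k] is the smallest possible tail
    # of an increasing subsequence of length k+1
    tails = []
    for x in w:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

def deficiency(w):
    # N(w) - L(w); the game is over exactly when this reaches 0
    return len(w) - lis_length(w)

print(deficiency([1, 7, 3]))        # 1
print(deficiency([3, 5, 8, 1, 4]))  # 2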
The nearest mathematical game that I do know is the Monotone Sequence Game. In a monotonic sequence game, two players alternately choose elements of a sequence from some fixed ordered set (e.g. 1, 2, ..., N). The game ends when the resulting sequence contains either an ascending subsequence of length A or a descending one of length D. This game has its origins in a theorem of Erdős and Szekeres, and a nice exposition can be found in MONOTONIC SEQUENCE GAMES; this slide presentation by Bruce Sagan is also a good reference.
If you want to know more about game theory in general, or these sorts of games in particular, then I strongly recommend Winning Ways for Your Mathematical Plays by Berlekamp, Conway and Guy. Volume 3, I believe, addresses these sorts of games.

Looks like a Minimax problem.

I suspect there is a faster solution for this task; I will keep thinking about it. But I can give you an idea of a solution with O(N! * N^2) complexity.
At first, note that picking a number from an N-permutation is equivalent to the following:
Pick a number from the N-permutation. Say it was the number X.
Relabel the remaining numbers using the rule:
1 -> 1
2 -> 2
...
X-1 -> X-1
X -> nothing, it's gone
X+1 -> X
...
N -> N-1
And you get a permutation of N-1 numbers.
Example:
1 5 6 4 2 3
Pick 2
1 5 6 4 3
Reassign
1 4 5 3 2
Let's use this relabeling as a move, instead of just picking. It's easy to see that the games are equivalent: player A wins in this game for some permutation if and only if he wins in the original.
Let's give a code to all permutations of N numbers, of N-1 numbers, ..., of 2 numbers.
Define F(x) -> {0, 1} (where x is a permutation code) as the function which is 1 when the current player wins and 0 when the current player loses. It is easy to see that F(1 2 ... K-1 K) = 0.
F(x) = 1 if there is at least one move which transforms x into some y with F(y) = 0.
F(x) = 0 if every move which transforms x into some y gives F(y) = 1.
So you can use recursion with memoization to compute it:
Boolean F(X)
{
    Let K be the length of the permutation with code X.
    If F(X) has already been computed, return the previously calculated result.
    If X is the code of (1 2 ... K), return 0.
    Boolean result = 0;
    for i = 1 to K do
    {
        Y = code of the permutation obtained from X by picking the number at position i;
        if (F(Y) == 0)
        {
            result = 1;
            break;
        }
    }
    Store result as F(X);
    return result;
}
For each argument we will compute this function only once. There are 1! permutations of length 1, 2! permutations of length 2, ..., N! permutations of length N. For a permutation of length K we need O(K) operations, so the complexity will be O(1*1! + 2*2! + ... + N*N!) <= O(N! * N^2).

Here is Python code for Wisdom's Wind's algorithm. It prints out the wins for RobotA.
import itertools

def moves(p):
    # yield every position reachable from p by removing one number,
    # with the remaining numbers relabeled down to 0..len(p)-2;
    # a sorted p is terminal, so it has no moves
    if tuple(sorted(p)) == p:
        return
    for i in p:
        yield tuple(j - (j > i) for j in p if j != i)

# a position is winning iff some move leads to a non-winning position
winning = set()
for n in range(6):
    for p in itertools.permutations(range(n)):
        if not winning.issuperset(moves(p)):
            winning.add(p)

for p in sorted(winning, key=lambda q: (len(q), q)):
    print(p)

Related

Naive shuffling algorithm probability analysis [duplicate]

I implemented the shuffling algorithm as:
import random

# assuming n is given
a = list(range(1, n + 1))  # a contains the elements 1 to n
for i in range(n):
    j = random.randint(0, n - 1)
    a[i], a[j] = a[j], a[i]
As this algorithm is biased, I just wanted to know: for any n (n ≤ 17), is it possible to find which permutation has the highest probability of occurring and which permutation has the least probability out of all possible n! permutations? If yes, then what is that permutation?
For example n=3:
a = [1,2,3]
There are 3^3 = 27 possible shuffles.
Number of occurrences of each permutation:
1 2 3 = 4
3 1 2 = 4
3 2 1 = 4
1 3 2 = 5
2 1 3 = 5
2 3 1 = 5
P.S. I am not so good with maths.
This is not a proof by any means, but you can quickly come up with the distribution of placement probabilities by running the biased algorithm a million times. It will look like this picture from Wikipedia:
An unbiased distribution would have 14.3% (that is, 1/7) in every field.
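For instance, a quick simulation sketch in Python (my own; it estimates how often element 0 of a 7-element array ends up in each position):

import random
from collections import Counter

def biased_shuffle(n):
    # the naive swap-with-random-index shuffle from the question
    a = list(range(n))
    for i in range(n):
        j = random.randint(0, n - 1)
        a[i], a[j] = a[j], a[i]
    return a

n, trials = 7, 1_000_000
counts = Counter(biased_shuffle(n).index(0) for _ in range(trials))
for pos in range(n):
    print(pos, round(100.0 * counts[pos] / trials, 1), '%')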
To get the most likely distribution, I think it's safe to just pick the highest percentage for each index. This means it's most likely that the entire array is moved down by one and the first element will become the last.
Edit: I ran some simulations and this result is most likely wrong. I'll leave this answer up until I can come up with something better.

How to solve Twisty Movement from Codeforces?

I've read the editorial, but it's very short and claims something I don't understand why it's true. Why is it equivalent to finding the longest subsequence of the form 1*2*1*2*? Can someone explain the solution step by step and justify the claims at every step? http://codeforces.com/contest/934/problem/C
Here is the 'solution' from the editorial, but as I said it's very short and I don't understand it. I hope someone can guide me through the solution step by step, justifying the claims along the way. Thanks.
Since 1 ≤ a[i] ≤ 2, it's equivalent to finding a longest subsequence like 1*2*1*2*. By an easy dynamic programming we can find it in O(n) or O(n^2) time. You can see the O(n^2) solution in the model solution below. Here we introduce an O(n) approach: since the subsequence can be split into 4 parts (11...22...11...22...), we can set dp[i][j] (i = 1...n, j = 0..3) to be the longest subsequence of a[1...i] with the first j parts.
I also think that the cited explanation is not super clear. Here is another take.
You can collapse an original array
1 1 2 2 2 1 1 2 2 1
into a weighted array
2 3 2 2 1
^ ^ ^ ^ ^
1 2 1 2 1
where numbers at the top represent lengths of contiguous strips of repeated values in the original array.
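The collapse step itself is just run-length encoding; a minimal Python sketch (the helper name is mine):

def collapse(a):
    # run-length encode a into (values, weights)
    vals, weights = [], []
    for x in a:
        if vals and vals[-1] == x:
            weights[-1] += 1
        else:
            vals.append(x)
            weights.append(1)
    return vals, weights

print(collapse([1, 1, 2, 2, 2, 1, 1, 2, 2, 1]))
# -> ([1, 2, 1, 2, 1], [2, 3, 2, 2, 1])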
We can convince ourselves that
The optimal flip does not "break up" any contiguous sequences.
The optimal flip starts and ends with different values (i.e. starts with 1 and ends with 2, or starts with 2 and ends with 1).
Hence, the weighted array contains enough information to solve the problem. We want to flip a contiguous slice of the weighted array s.t. the sum of weights associated with some contiguous monotonic sequence is maximized.
Specifically, we want to perform the flip in such a way that some contiguous monotonic sequence 112, 122, 211 or 221 has maximum weight.
One way to do this with dynamic programming is by creating 4 auxiliary arrays.
A[i] : maximal weight of any 1 to the right of i.
B[i] : maximal weight of any 1 to the left of i.
C[i] : maximal weight of any 2 to the right of i.
D[i] : maximal weight of any 2 to the left of i.
Let's assume that if any of A,B,C,D is accessed out of bounds, the returned value is 0.
We initialize x = 0 and do one pass through the array Arr = [1, 2, 1, 2, 1] with weights W = [2, 3, 2, 2, 1]. At each index i, we have 2 cases:
Arr[i:i+2] == 1 2. In this case we set
x = max(x, W[i] + W[i+1] + C[i+1], W[i] + W[i+1] + B[i-1]).
Arr[i:i+2] == 2 1. In this case we set
x = max(x, W[i] + W[i+1] + A[i+1], W[i] + W[i+1] + D[i-1]).
The resulting x is our answer. This is an O(N) solution.
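For reference, the editorial's claim (find the longest subsequence of the form 1*2*1*2*) is also easy to implement directly; here is a small O(n) Python sketch of that DP (my own, not the model solution; dp[j] tracks the best subsequence using the first j+1 of the four parts):

def longest_1212(a):
    dp = [0, 0, 0, 0]
    for x in a:
        if x == 1:
            # update part 3 (second block of 1s) before part 1,
            # so the same element is not counted twice
            dp[2] = max(dp[0], dp[1], dp[2]) + 1
            dp[0] += 1
        else:
            dp[3] = max(dp[0], dp[1], dp[2], dp[3]) + 1
            dp[1] = max(dp[0], dp[1]) + 1
    return max(dp)

print(longest_1212([1, 1, 2, 2, 2, 1, 1, 2, 2, 1]))  # 9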

A stone game - 2 players play perfectly

Recently I have learned about the nim game and Grundy numbers.
I am stuck on a problem. Please give me some ideas.
Problem:
A and B play a game with a pile of stones. A starts the game and they alternate moves. In each move, a player has to remove at least one and at most sqrt(n) stones, where n is the current number of stones in the pile. So, for example, if a pile contains 10 stones, then a player can take 1, 2 or 3 stones from the pile.
Both A and B play perfectly. The player who cannot make a valid move loses. Now you are given the number of stones; you have to find the player who will win if both play optimally.
Example
n = 3: A wins
n = 5: B wins
n<=10^12
I can solve this problem for a small number of stones by using Grundy numbers: https://www.topcoder.com/community/data-science/data-science-tutorials/algorithm-games/?
The Grundy function is g(x), where x is the number of remaining stones.
Let F(s) be the set of numbers of remaining stones that can be reached from s stones.
If s is a terminal position, g(s) = 0.
If s is not a terminal position, let X = {g(t) | t in F(s)}. Then the Grundy number of s is the smallest integer >= 0 which is not in X.
For example, for x = 10 we have F(x) = {9, 8, 7} by taking 1, 2 or 3 stones (floor(sqrt(10)) = 3).
If g(n) > 0, the first player wins.
g(0)=0
g(1)=1
g(2)=0
g(3)=1
g(4)=2
....
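A direct memoized Python sketch of this computation (my own illustration of the approach just described):

import math
from functools import lru_cache

@lru_cache(maxsize=None)
def grundy(n):
    if n == 0:
        return 0  # terminal position
    reachable = {grundy(n - k) for k in range(1, math.isqrt(n) + 1)}
    g = 0
    while g in reachable:  # mex: smallest value not reachable
        g += 1
    return g

print([grundy(n) for n in range(6)])  # [0, 1, 0, 1, 2, 0]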
But this algorithm is too slow.
Thanks in advance.
I'm adding a second answer because my first answer provides the background theory without the optimization. But since OP clearly is looking for some optimization and for a very fast solution without a lot of recursion, I took my own advice:
Of course, the really fast way to do this is to do some more math and figure out some simple properties of n you can check that will determine whether or not it is a winner or a loser.
I'm going to use the terminology I defined there, so if this isn't making sense, read that answer! Specifically, n is the pile size, k is the number of stones to remove, n is a winner if there is a winning strategy for player A starting with a pile of size n and it is a loser otherwise. Let's start with the following key insight:
Most numbers are winners.
Why is this true? It isn't obvious for small numbers: 0 is a loser, 1 is a winner, 2 is a loser, 3 is a winner, so is 4, but 5 is a loser again. But for larger numbers, it becomes more obvious.
If some integer p is large and is a loser then p+1, p+2, ... p+k_m are all winners for some k_m that is around the size of sqrt(p). This is because once I find a loser, for any pile that isn't too much larger than that, I can remove a few stones to leave my opponent with that loser. The key is just determining what the largest valid value of k is, since k is defined in terms of the starting pile size n rather than the final pile size p.
So the question becomes, given some integer p, for which values of k is it true that k <= sqrt(n) where n = p+k. In other words, given p, what starting pile sizes n allow me to remove k and leave my opponent with p. Well, since n = p+k and the values are all nonnegative, we must have
k <= sqrt(n) = sqrt(p+k) ==> k^2 <= p + k ==> k^2 - k - p <= 0.
This is a quadratic equation in k for any fixed value of p. The endpoints of the solution set can be found using the quadratic formula:
k = (1 +- sqrt(1 + 4p))/2
So, the inequality is satisfied for values of k between (1-sqrt(1+4p))/2 and (1+sqrt(1+4p))/2. Of course, sqrt(1+4p) is at least sqrt(5) > 2, so the left endpoint is negative. So then k_m = floor((1+sqrt(1+4p))/2).
More importantly, I claim that the next loser after p is the number L = p + k_m + 1. Let's try to prove this:
Theorem: If p is a loser, then L = p + k_m + 1 is also a loser and every integer p < n < L is a winner.
Proof: We have already shown that every integer n in the interval [p+1, p+k_m] is a winner, so we only need to prove that L is a loser.
Suppose, to the contrary, that L is a winner. Then there exists some 1 <= k <= sqrt(L) such that L - k is a loser (by definition). Since we have proven that the integers p+1, ..., p+k_m are winners, we must have that L - k <= p since no number smaller than L and larger than p can be a loser. But this means that L <= p + k and so k satisfies the equation k <= sqrt(L) <= sqrt(p + k). We have already shown that solutions to the equation k <= sqrt(p + k) are no larger than (1+sqrt(1+4p))/2, so any integer solution must satisfy k <= k_m. But then L - k = p + k_m + 1 - k >= p + k_m + 1 - k_m = p + 1. This is a contradiction since p < L - k < L and we have already proved that there are no losers larger than p and smaller than L.
QED
The above theorem gives us a nice approach since we now know that winners fall into intervals of integers separated by a single loser and we know how to calculate the interval between two losers. In particular, if p is a loser, then p + k_m + 1 is a loser where
k_m = floor((1+sqrt(1+4p))/2).
Now we can rewrite the function in a purely iterative manner that should be fast and requires constant space. The approach is simply to compute the sequence of losers until we either find n (in which case it is a loser) or determine that n lies in the interval between two losers.
bool is_winner(long long n) {
    long long p = 0;
    // loop through losers until we find one at least as large as n
    while (p < n) {
        long long km = (long long)floor((1 + sqrt(1 + 4.0 * p)) / 2);
        p = p + km + 1;
    }
    /* If we skipped n while computing losers, it is a winner
     * that lies in the interval between two losers. So n is a
     * winner as long as it isn't equal to p. */
    return (p != n);
}
You have to think this game recursively from the end: Clearly, to win, you have to take the last stone.
1 stone: First player wins. It's A's turn to take the only stone.
2 stones: Second player wins. A cannot take two stones but has to take one. So A is forced to take one stone and leave the other one for B to take.
3 stones: First player wins. There is still no choice. A has to take one stone, and smiles because they know that B can't win with two stones.
4 stones: First player wins. Now A has the choice to leave two or three stones. From the above, A knows that B will win if given three stones, but B will lose if given two stones, so A takes two stones.
5 stones: Second player wins. Even though A has the choice to leave three or four stones, B will win if given either amount.
As you can see, you can easily calculate who will win a game with n stones by complete knowledge of the outcomes of the games with 1 to n-1 stones.
An algorithmic solution will thus create a boolean array wins, where wins[i] is true if the player given i stones will win the game. wins[0] is initialized to false. The rest of the array is then filled iteratively from the start by scanning the reachable portion of the array for a false entry. If a false entry is found, the current entry is set to true, because A can leave the board in a losing state for B; otherwise it is set to false.
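In Python, that array-filling approach might look like this (a sketch; fine for moderate n, but far too slow for n up to 10^12):

import math

def first_player_wins(n):
    # wins[i]: True iff the player to move with i stones wins
    wins = [False] * (n + 1)  # wins[0] = False: no valid move, you lose
    for i in range(1, n + 1):
        # winning iff some take of k in 1..floor(sqrt(i)) leaves a loser
        wins[i] = any(not wins[i - k] for k in range(1, math.isqrt(i) + 1))
    return wins[n]

print(first_player_wins(3), first_player_wins(5))  # True False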
I will build upon cmaster's answer because it is already pretty close. The question is how to efficiently calculate the values.
The answer is: We don't need the whole array. Only the false values are interesting. Let's analyze:
If we have a false value in the array, then the next few entries will be true because they can remove stones, such that the other player lands on the false value. The question is: How many true entries will be there?
If we are at the false entry z, then the entry x will be a true entry if x - sqrt(x) <= z. We can solve this for x and get:
x <= 1/2 * (1 + 2 * z + sqrt(1 + 4 * z))
This is the last true entry. E.g. for z = 2, this returns 4. The next entry will be false because the player can only remove stones, such that the opponent will come out at a true entry.
Knowing this, our algorithm is almost complete. Start at a known false value (e.g. 0). Then, iteratively move to the next false value until you reach n.
bool isWinner(long long n)
{
    double loser = 0;
    while (n > loser)
        loser = floor(0.5 * (1 + 2 * loser + sqrt(1 + 4 * loser))) + 1;
    return n != loser;
}
Games like this (Towers of Hanoi is another classic example) are meant to illustrate the mathematical principles of induction and recursion, with recursion being particularly relevant in programming.
We want to determine whether a pile of n stones is a winning pile or a losing pile. Intuitively, a winning pile is one so that no matter what sequence of choices your opponent makes you can always take some number of stones to guarantee you will win. Similarly, a losing pile is one such that no matter what choice you make, you always leave your opponent a winning strategy.
Obviously, n = 0 is a losing pile; you've already lost. And n = 1 is a winning pile since you take one stone and leave your opponent n=0. What about n=2? Well, you are only allowed to take one stone, at which point you have given your opponent a winning pile (n=1), so n=2 is a losing number. We can make this mathematically more precise.
Definition: An integer n is a loser if n=0 or for every integer k between 1 and sqrt(n), n-k is a winner. An integer n is a winner if there exists some integer k between 1 and sqrt(n) such that n-k is a loser.
In this definition, n is the size of the pile, k is the number of stones you choose to take. A pile is a losing pile if every valid number of stones to remove gives your opponent a winning pile and a winning pile is one where some choice gives your opponent a losing pile.
Of course, that definition should make you a little uneasy because we actually have no idea if it makes sense for anything other than n=0,1,2, which we already checked. Perhaps some number fits the definition of both a winner and a loser, or neither definition. This would certainly be confusing. This is where induction comes in.
Theorem: Every nonnegative integer is either a winner or a loser, but not both.
Proof: We'll use the principle of Strong or Complete Induction. We know that n=0 is a loser (by definition) and we've already shown that n=1 is a winner and n=2 is a loser directly. Those are our base cases.
Now let's consider some integer n_0 > 2 and we'll assume (using strong induction) that every nonnegative integer less than n_0 is either a winner or a loser, but not both. Let s = floor(sqrt(n_0)) and consider the set of integers P = {n_0-s, n_0-s+1, ..., n_0 - 1}. (Since {1, 2, ..., s} is the set of possible choices of stones to remove, P is the set of piles I can leave my opponent with.) By strong induction, since each value in P is a nonnegative integer less than n_0, each of them is either a winner or a loser (but not both). If any value in P is a loser, then by definition, n_0 is a winner (because you remove enough stones to leave your opponent that losing pile). If not, then every value in P is a winner, so then n_0 is a loser (because no matter how many stones you take, your opponent is still left with a winning pile). Therefore, n_0 is either a winner or a loser, but not both.
By strong induction, we conclude that every nonnegative integer is either a winner or a loser but not both.
QED
OK, that was pretty straightforward if you are comfortable with induction. But all we've shown is that our very intuitive definition actually makes sense and that every pile you get is either a winner (if you play it right) or a loser (if your opponent plays it right). How do we determine which is which?
Well, induction leads right to recursion. Let's write recursive functions for our two definitions: is n a winner or a loser? Here's some C-like pseudocode without error checking.
bool is_winner(int n) {
    // check all valid numbers of stones to remove (k)
    for (int k = 1; k <= sqrt(n); k++) {
        if (is_loser(n-k)) {
            // I can force the loser n-k on my opponent, so n is a winner
            return true;
        }
    }
    // I didn't find a way to force a loser on my opponent, so this must be a loser.
    return false;
}

bool is_loser(int n) {
    if (n == 0) { // this is our base case
        return true;
    }
    for (int k = 1; k <= sqrt(n); k++) {
        if (!is_winner(n-k)) {
            // we found a way to give them a pile that ISN'T a winner, so this isn't a loser
            return false;
        }
    }
    // Nope: every pile we can give our opponent is a winner, so this pile is a loser
    return true;
}
Of course, the code above is somewhat redundant since we've already shown that every number is either a winner or a loser. Therefore, it makes more sense to implement is_loser as just returning !is_winner or vice-versa. Perhaps we'll just do is_winner as a stand-alone implementation.
bool is_winner(int n) {
    if (n < 0) {
        // raise error
    } else if (n == 0) {
        return false; // 0 is a loser
    } else {
        for (int k = 1; k <= sqrt(n); k++) {
            if (!is_winner(n-k)) {
                // we can give opponent a loser, this is a winner
                return true;
            }
        }
        // all choices give our opponent a winner, this is a loser
        return false;
    }
}
To use this function to answer the question, if the game starts with n stones and player A goes first and both players play optimally, player A wins if is_winner(n) and player B wins if !is_winner(n). To figure out what their plays should be, if you have a winning pile, you should choose some valid k such that n-k is a loser (it doesn't matter which one, but the largest value will make the game end fastest) and if you are given a losing pile, it doesn't matter what you choose -- that is the point of a loser, but again, choosing the largest value of k will make the game end sooner.
None of this really considers performance. Since n could be quite large, there are a number of things you could consider. Like, for example, pre-computing the common small values of n that you are going to consider, or using Memoization, at least within a single recursive call. Furthermore, as I suggested previously, removing the largest value of k ends the game with fewer turns. Similarly, if you reverse the loops and check the largest allowed values of k first, you should be able to reduce the number of recursive calls.
Of course, the really fast way to do this is to do some more math and figure out some simple properties of n you can check that will determine whether or not it is a winner or a loser.
public class Solution {
    public boolean canWinNim(int n) {
        if (n % 4 == 0) {
            return false;
        } else {
            return true;
        }
    }
}

Generate a random integer from 0 to N-1 which is not in the list

You are given N and an int array K[].
The task at hand is to generate a uniformly random number between 0 and N-1 which doesn't exist in K.
N is an integer >= 0.
K.length is < N-1, and 0 <= K[i] <= N-1. Also assume K is sorted and each element of K is unique.
You are given a function uniformRand(int M) which generates a uniform random number in the range 0 to M-1, and you may assume this function's complexity is O(1).
Example:
N = 7
K = {0, 1, 5}
the function should return any random number in { 2, 3, 4, 6 } with equal probability.
I could get an O(N) solution for this: first generate a random number in the range 0 to N - K.length - 1, then map the generated random number to a number not in K. The second step takes the complexity to O(N). Can it be done better, maybe in O(log N)?
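That O(N) mapping step could look like this in Python (a sketch of the idea above; the names are mine):

import random

def rand_excluding_linear(N, K):
    # pick the r-th (0-based) missing value by scanning upwards: O(N)
    r = random.randint(0, N - len(K) - 1)
    ki = 0
    for v in range(N):
        if ki < len(K) and K[ki] == v:
            ki += 1        # v is excluded, skip it
        elif r == 0:
            return v       # v is the r-th missing number
        else:
            r -= 1

print(rand_excluding_linear(7, [0, 1, 5]))  # one of 2, 3, 4, 6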
You can use the fact that all the numbers in K[] are between 0 and N-1 and they are distinct.
For your example case, you generate a random number from 0 to 3. Say you get a random number r. Now you conduct a binary search on the array K[].
Initialize i = K.length/2.
Find K[i] - i. This gives you the number of numbers missing from the array in the range 0 to K[i].
For example, K[2] = 5, so 3 elements (2, 3, 4) are missing in the range 0 to K[2].
Hence you can decide whether you have to conduct the remaining search in the first part of the array K or in the latter part, because you know r.
This search gives you a complexity of O(log K.length).
EDIT: For example,
N = 7
K = {0, 1, 4} // modified the array to clarify the algorithm steps.
the function should return any random number in { 2, 3, 5, 6 } with equal probability.
Random number generated between 0 and N-K.length = random{0-3}. Say we get 3. Hence we require the 4th missing number in array K.
Conduct binary search on array K[].
Initial i = K.length/2 = 1.
Now we see K[1] - 1 = 0. Hence no number is missing up to i = 1. Hence we search in the latter part of the array.
Now i = 2. K[2] - 2 = 4 - 2 = 2. Hence there are 2 missing numbers up to index i = 2. But we need the 4th missing element. So we again have to search in the latter part of the array.
Now we reach an empty array. What should we do now? If we reach an empty array between say K[j] & K[j+1] then it simply means that all elements between K[j] and K[j+1] are missing from the array K.
Hence all elements above K[2] are missing from the array, namely 5 and 6. We need the 4th element out of which we have already discarded 2 elements. Hence we will choose the second element which is 6.
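In Python, this binary search might be sketched as follows (my own code; it searches on K[i] - i, the count of numbers missing below K[i]):

import random

def rand_excluding(N, K):
    # K is sorted and unique, with values in [0, N-1]
    r = random.randint(0, N - len(K) - 1)  # which missing number we want (0-based)
    # find the smallest lo such that K[lo] - lo > r, i.e. more than
    # r numbers are already missing before index lo
    lo, hi = 0, len(K)
    while lo < hi:
        mid = (lo + hi) // 2
        if K[mid] - mid <= r:
            lo = mid + 1
        else:
            hi = mid
    # exactly lo elements of K are below the answer
    return r + lo

print(rand_excluding(7, [0, 1, 4]))  # one of 2, 3, 5, 6, uniformly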
Binary search.
The basic algorithm:
(not quite the same as the other answer - the number is only generated at the end)
Start in the middle of K.
By looking at the current value and its index, we can determine the number of pickable numbers (numbers not in K) to the left.
Similarly, by including N, we can determine the number of pickable numbers to the right.
Now randomly go either left or right, weighted based on the count of pickable numbers on each side.
Repeat in the chosen subarray until the subarray is empty.
Then generate a random number in the range consisting of the numbers before and after the subarray in the array.
The running time would be O(log |K|), and, since |K| < N-1, O(log N).
The exact mathematics for number counts and weights can be derived from the example below.
Extension with K containing a bigger range:
Now let's say (for enrichment purposes) K can also contain values N or larger.
Then, instead of starting with the entire K, we start with a subarray up to position min(N, |K|), and start in the middle of that.
It's easy to see that the N-th position in K (if one exists) will be >= N, so this chosen range includes any possible number we can generate.
From here, we need to do a binary search for N, which gives us a point where all values to the left are < N, even if N itself does not occur in K (the basic algorithm above doesn't deal with K containing values greater than N).
Then we just run the algorithm as above with the subarray ending at the last value < N.
The running time would be O(log N), or, more specifically, O(log min(N, |K|)).
Example:
N = 10
K = {0, 1, 4, 5, 8}
So we start in the middle - 4.
Given that we're at index 2, we know there are 2 elements to the left, and the value is 4, so there are 4 - 2 = 2 pickable values to the left.
Similarly, there are 10 - (4+1) - 2 = 3 pickable values to the right.
So now we go left with probability 2/(2+3) and right with probability 3/(2+3).
Let's say we went right, and our next middle value is 5.
We are at the first position in this subarray, and the previous value is 4, so we have 5 - (4+1) = 0 pickable values to the left.
And there are 10 - (5+1) - 1 = 3 pickable values to the right.
We can't go left (0 probability). If we go right, our next middle value would be 8.
There would be 2 pickable values to the left, and 1 to the right.
If we go left, we'd have an empty subarray.
So then we'd generate a number between 5 and 8, which would be 6 or 7 with equal probability.
This can be solved by basically solving this:
Find the rth smallest number not in the given array, K, subject to
conditions in the question.
For that consider the implicit array D, defined by
D[i] = K[i] - i for 0 <= i < L, where L is length of K
We also set D[-1] = 0 and D[L] = N
We also define K[-1] = 0.
Note, we don't actually need to construct D. Also note that D is sorted (and all elements non-negative), as the numbers in K[] are unique and increasing.
Now we make the following claim:
CLAIM: To find the rth smallest number not in K[], we need to find the right-most occurrence of r' in D (say it occurs at position j), where r' is the largest number in D which is < r. Such an r' exists, because D[-1] = 0. Once we find such an r' (and j), the number we are looking for is r - r' + K[j].
Proof: Basically, the definition of r' and j tells us that there are exactly r' numbers missing from 0 to K[j], and at least r numbers missing from 0 to K[j+1]. Thus all the numbers from K[j]+1 to K[j+1]-1 are missing (and there are at least r - r' of them), and the number we seek is among them, given by K[j] + r - r'.
Algorithm:
In order to find (r',j) all we need to do is a (modified) binary search for r in D, where we keep moving to the left even if we find r in the array.
This is an O(log K) algorithm.
If you are running this many times, it probably pays to speed up your generation operation: O(log N) time just isn't acceptable.
Make an empty array G. Starting at zero, count upwards while progressing through the values of K. If a value isn't in K, add it to G. If it is in K, don't add it and advance your K pointer. (This relies on K being sorted.)
Now you have an array G which has only acceptable numbers.
Use your random number generator to choose a value from G.
This requires O(N) preparatory work and each generation happens in O(1) time. After N look-ups the amortized time of all operations is O(1).
A Python mock-up:
import random

class PRNG:
    def __init__(self, K, N):
        # precompute G, the list of acceptable numbers (those not in K)
        self.G = []
        kptr = 0
        for i in range(N):
            if kptr < len(K) and K[kptr] == i:
                kptr += 1
            else:
                self.G.append(i)

    def getRand(self):
        rn = random.randint(0, len(self.G) - 1)
        return self.G[rn]

prng = PRNG([0, 1, 5], 7)
for i in range(20):
    print(prng.getRand())

Levenshtein distance from a particular group of numbers

My input is three numbers: a number s, and the beginning b and end e of a range, with 0 <= s, b, e <= 10^1000. The task is to find the minimal Levenshtein distance between s and all numbers in the range [b, e]. It is not necessary to find the number minimizing the distance; the minimal distance is sufficient.
Obviously I have to read the numbers as strings, because standard C++ types will not handle such large numbers. Calculating the Levenshtein distance for every number in the possibly huge range is not feasible.
Any ideas?
[EDIT 10/8/2013: Some cases considered in the DP algorithm actually don't need to be considered after all, though considering them does not lead to incorrectness :)]
In the following I describe an algorithm that takes O(N^2) time, where N is the largest number of digits in any of b, e, or s. Since all these numbers are limited to 1000 digits, this means at most a few million basic operations, which will take milliseconds on any modern CPU.
Suppose s has n digits. In the following, "between" means "inclusive"; I will say "strictly between" if I mean "excluding its endpoints". Indices are 1-based. x[i] means the ith digit of x, so e.g. x[1] is its first digit.
Splitting up the problem
The first thing to do is to break up the problem into a series of subproblems in which b and e have the same number of digits. Suppose e has k >= 0 more digits than b: break up the problem into k+1 subproblems. E.g. if b = 5 and e = 14032, create the following subproblems:
b = 5, e = 9
b = 10, e = 99
b = 100, e = 999
b = 1000, e = 9999
b = 10000, e = 14032
We can solve each of these subproblems, and take the minimum solution.
The easy cases: the middle
The easy cases are the ones in the middle. Whenever e has k >= 1 more digits than b, there will be k-1 subproblems (e.g. 3 above) in which b is a power of 10 and e is the next power of 10, minus 1. Suppose b is 10^m. Notice that choosing any digit between 1 and 9, followed by any m digits between 0 and 9, produces a number x that is in the range b <= x <= e. Furthermore there are no numbers in this range that cannot be produced this way. The minimum Levenshtein distance between s (or in fact any given length-n digit string that doesn't start with a 0) and any number x in the range 10^m <= x <= 10^(m+1)-1 is necessarily abs(m+1-n), since if m+1 >= n it's possible to simply choose the first n digits of x to be the same as those in s, and delete the remainder, and if m+1 < n then choose the first m+1 to be the same as those in s and insert the remainder.
In fact we can deal with all these subproblems in a single constant-time operation: if the smallest "easy" subproblem has b = 10^m and the largest "easy" subproblem has b = 10^u, then the minimum Levenshtein distance between s and any number in any of these ranges is (m+1)-n if n <= m, n-(u+1) if n > u+1, and 0 otherwise.
The hard cases: the end(s)
The hard cases are when b and e are not restricted to have the form b = 10^m and e = 10^(m+1)-1 respectively. Any master problem can generate at most two subproblems like this: either two "ends" (resulting from a master problem in which b and e have different numbers of digits, such as the example at the top) or a single subproblem (i.e. the master problem itself, which didn't need to be subdivided at all because b and e already have the same number of digits). Note that due to the previous splitting of the problem, we can assume that the subproblem's b and e have the same number of digits, which we will call m.
Super-Levenshtein!
What we will do is design a variation of the Levenshtein DP matrix that calculates the minimum Levenshtein distance between a given digit string (s) and any number x in the range b <= x <= e. Despite this added "power", the algorithm will still run in O(n^2) time :)
First, observe that if b and e have the same number of digits and b != e, then it must be the case that they consist of some number q >= 0 of identical digits at the left, followed by a digit that is larger in e than in b. Now consider the following procedure for generating a random digit string x:
1. Set x to the first q digits of b.
2. Append a randomly-chosen digit d between b[q+1] and e[q+1] to x.
3. If d == b[q+1], we "hug" the lower bound:
   For i from q+1 to m:
     If b[i] == 9 then append b[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that e[i] will be larger than b[i], and there is no digit larger than 9!]
     Otherwise, flip a coin:
       Heads: Append b[i].
       Tails: Append a randomly-chosen digit d > b[i], then go to step 6.
   Stop.
4. Else if d == e[q+1], we "hug" the upper bound:
   For i from q+1 to m:
     If e[i] == 0 then append e[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that b[i] will be smaller than e[i], and there is no digit smaller than 0!]
     Otherwise, flip a coin:
       Heads: Append e[i].
       Tails: Append a randomly-chosen digit d < e[i], then go to step 6.
   Stop.
5. Otherwise (if d is strictly between b[q+1] and e[q+1]), drop through to step 6.
6. Keep appending randomly-chosen digits to x until it has m digits.
The basic idea is that after including all the digits that you must include, you can either "hug" the lower bound's digits for as long as you want, or "hug" the upper bound's digits for as long as you want, and as soon as you decide to stop "hugging", you can thereafter choose any digits you want. For suitable random choices, this procedure will generate all and only the numbers x such that b <= x <= e.
In the "usual" Levenshtein distance computation between two strings s and x, of lengths n and m respectively, we have a rectangular grid from (0, 0) to (n, m), and at each grid point (i, j) we record the Levenshtein distance between the prefix s[1..i] and the prefix x[1..j]. The score at (i, j) is calculated from the scores at (i-1, j), (i, j-1) and (i-1, j-1) using bottom-up dynamic programming. To adapt this to treat x as one of a set of possible strings (specifically, a digit string corresponding to a number between b and e) instead of a particular given string, what we need to do is record not one but two scores for each grid point: one for the case where we assume that the digit at position j was chosen to hug the lower bound, and one where we assume it was chosen to hug the upper bound. The 3rd possibility (step 5 above) doesn't actually require space in the DP matrix because we can work out the minimal Levenshtein distance for the entire rest of the input string immediately, very similar to the way we work it out for the "easy" subproblems in the first section.
Super-Levenshtein DP recursion
Call the overall minimal score at grid point (i, j) v(i, j). Let diff(a, b) = 1 if characters a and b are different, and 0 otherwise. Let inrange(a, b..c) be 0 if the character a is in the range b..c, and 1 otherwise (i.e. it is the substitution cost when x's digit at this position may be chosen freely from b..c: free if a can be matched, 1 if not). The calculations are:
# The best Lev distance overall between s[1..i] and x[1..j]
v(i, j) = min(hb(i, j), he(i, j))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the lower bound
hb(i, j) = min(hb(i-1, j)+1, hb(i, j-1)+1, hb(i-1, j-1)+diff(s[i], b[j]))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the upper bound
he(i, j) = min(he(i-1, j)+1, he(i, j-1)+1, he(i-1, j-1)+diff(s[i], e[j]))
At the point in time when v(i, j) is being calculated, we will also calculate the Levenshtein distance resulting from choosing to "stop hugging", i.e. by choosing a digit that is strictly in between b[j] and e[j] (if j == q) or (if j != q) is either above b[j] or below e[j], and thereafter freely choosing digits to make the suffix of x match the suffix of s as closely as possible:
# The best Lev distance possible between the ENTIRE STRINGS s and x, given that
# we choose to stop hugging at the jth digit of x, and have optimally aligned
# the first i digits of s to these j digits
sh(i, j) = if j >= q then shc(i, j) + abs(n-i-m+j)
           else infinity
shc(i, j) = if j == q then
                min(hb(i, j-1)+1, hb(i-1, j-1)+inrange(s[i], (b[j]+1)..(e[j]-1)))
            else
                min(hb(i, j-1)+1, hb(i-1, j-1)+inrange(s[i], (b[j]+1)..9),
                    he(i, j-1)+1, he(i-1, j-1)+inrange(s[i], 0..(e[j]-1)))
The formula for shc(i, j) doesn't need to consider "downward" moves, since such moves don't involve any digit choice for x.
The overall minimal Levenshtein distance is the minimum of v(n, m) and sh(i, j), for all 0 <= i <= n and 0 <= j <= m.
Complexity
Take N to be the largest number of digits in any of s, b or e. The original problem can be split in linear time into at most 1 set of easy problems that collectively takes O(1) time to solve and 2 hard subproblems that each take O(N^2) time to solve using the super-Levenshtein algorithm, so overall the problem can be solved in O(N^2) time, i.e. time proportional to the square of the number of digits.
A first idea to speed up the computation (works if |e-b| is not too large):
Question: how much can the Levenshtein distance change when we compare s with n and then with n+1?
Answer: not too much!
Let's see the dynamic-programming tables for s = 12007 and two consecutive n
n = 12296
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 3
and
n = 12297
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 2
As you can see, only the last column changes, since n and n+1 have the same digits except for the last one.
If you have the dynamic-programming table for the edit distance of s = 12007 and n = 12296, you already have the table for n = 12297; you just need to update the last column!
Obviously if n = 12299 then n+1 = 12300, and you need to update the last 3 columns of the previous table... but this happens just once every 100 iterations.
In general, you have to:
update the last column on every iteration (so, length(s) cells),
update the second-to-last column too, once every 10 iterations,
update the third-to-last column too, once every 100 iterations,
and so on.
So let L = length(s) and D = e - b. First you compute the edit distance between s and b, which costs about L^2 operations. Then you can find the minimum Levenshtein distance over [b, e] by looping over every integer in the interval. There are D of them, and each one requires about L * (1 + 1/10 + 1/100 + ...) cell updates on average, so the execution time is about:
L^2 + D * L * (1 + 1/10 + 1/100 + ...)
Now since 1 + 1/10 + 1/100 + ... is a geometric series that sums to 10/9,
we have an algorithm which is O(L^2 + D * L).
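A Python sketch of this incremental idea (my own; the helper names are mine, and it is only practical when e - b is small):

def lev_table(s, t):
    # full (len(s)+1) x (len(t)+1) edit-distance table
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i-1][j] + 1, d[i][j-1] + 1,
                          d[i-1][j-1] + (s[i-1] != t[j-1]))
    return d

def update_columns(d, s, t, start):
    # recompute columns start..len(t) after t changed from digit start-1 on
    for j in range(start, len(t) + 1):
        for i in range(1, len(s) + 1):
            d[i][j] = min(d[i-1][j] + 1, d[i][j-1] + 1,
                          d[i-1][j-1] + (s[i-1] != t[j-1]))

def min_lev_in_range(s, b, e):
    t = str(b)
    d = lev_table(s, t)
    best = d[len(s)][len(t)]
    for num in range(b + 1, e + 1):
        u = str(num)
        if len(u) != len(t):
            d = lev_table(s, u)  # digit count grew: rebuild (rare)
        else:
            first_diff = next(i for i in range(len(u)) if u[i] != t[i])
            update_columns(d, s, u, first_diff + 1)
        t = u
        best = min(best, d[len(s)][len(t)])
    return best

print(min_lev_in_range("12007", 12296, 12297))  # 2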
