Uncrossed Sequence Calculation - algorithm

From http://discuss.joelonsoftware.com/default.asp?interview.11.794054.1
The sequence A is defined as follows:
Start with the natural numbers
1,2,3,...
Initialize count = 1;
while (there are uncrossed numbers)
{
    pick the first uncrossed number, say n;
    set A[count] = n;
    cross out the number count + n;
    cross out the number n;
    increment count;
}
Give a fast algorithm to determine A[n], given n.
Try getting an algorithm which is polynomial in log n.

Sorry for posting this question.
Apparently this is a famous sequence called Wythoff's sequence, and there is a neat formula for A[n]: A[n] = [n*phi], where [x] is the integral part of x and phi is the golden ratio.
To calculate [n*phi], we can approximate phi by a ratio of consecutive Fibonacci numbers, giving an O(log n log log n) algorithm (O(log n) time to do arithmetic on O(log n)-bit numbers).
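For what it's worth, [n*phi] can also be evaluated exactly in pure integer arithmetic, since n*phi = (n + sqrt(5*n^2))/2 and 5*n^2 is never a perfect square for n > 0. A minimal sketch (the function name is mine):

```python
from math import isqrt

def wythoff(n):
    # n*phi = (n + sqrt(5*n*n)) / 2, and 5*n*n is never a perfect
    # square for n > 0, so the integer square root gives the exact floor
    return (n + isqrt(5 * n * n)) // 2
```

For n = 1..8 this produces 1, 3, 4, 6, 8, 9, 11, 12, the lower Wythoff sequence that the formula above describes.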

Here is how it starts
1 2 3 4 5 6 7 8 ... A[1] = 1, cross 2
1 X 3 4 5 6 7 8 ... A[2] = 1, cross 3
1 X X 4 5 6 7 8 ... A[3] = 1, cross 4
...
The number 1 is never crossed out, because the least number that can be crossed is 1+1 == 2.
So there is a constant-time algorithm: A[n] = 1 for all n.

Related

XOR of numbers = X

I found this problem in a hiring contest(which is over now). Here it is:
You are given two natural numbers N and X. You are required to create an array of N natural numbers such that the bitwise XOR of these numbers is equal to X, and the sum of all the numbers in the array is as small as possible.
If there exist multiple such arrays, print the smallest one.
Array A < Array B if there is an index i such that A[i] < B[i] and A[j] = B[j] for all indices j < i.
Sample Input: N=3, X=2
Sample output : 1 1 2
Explanation: We have to print 3 natural numbers whose XOR is 2, with the minimum possible sum; thus the answer is [1 1 2].
My approach:
If N is odd, I put N-1 ones in the array (so that their xor is zero) and then put X.
If N is even, I put N-1 ones again, and then X-1 (if X is odd) or X+1 (if X is even).
But this algorithm failed for most of the test cases. For example, when N=4 and X=6 my output is
1 1 1 7 but it should be 1 1 2 4
Anyone knows how to make the array sum minimum?
In order to have the minimum sum, you need to make sure that when your target is X, you are not cancelling bits of X and recreating them again, because this increases the sum. For this, you have to create the bits of X one by one, ideally from the end of the array. So, for your example of N=4 and X=6 we have (I use ^ to denote xor):
X = 6 = 110 (binary) = 2 + 4. Note that 2 ^ 4 = 6 as well, because these numbers don't share any common bits. So, the output is 1 1 2 4.
So, we start by creating the most significant bits of X at the end of the output array. Then, we also have to handle the corner cases for different values of N. I'll go through a number of different examples to make the idea clear:
A) X=14, N=5:
X=1110=8+4+2. So, the array is 1 1 2 4 8.
B) X=14, N=6:
X=8+4+2. The array should be 1 1 1 1 2 12.
C) X=15, N=6:
X=8+4+2+1. The array should be 1 1 1 2 4 8.
D) X=15, N=5:
The array should be 1 1 1 2 12.
E) X=14, N=2:
The array should be 2 12. Because 12 = 4^8
So, we go as follows. We count the powers of 2 in X (its set bits); let this number be k.
Case 1 - If k >= N (example E): pick the smallest powers from left to right, and merge all the remaining ones into the last position of the array.
Case 2 - If k < N (examples A, B, C, D): compute h = N - k. If h is odd, set h = N - k + 1. Put h 1's at the beginning of the array; the number of places left is then less than k, so we can follow the idea of Case 1 for the remaining positions. Note that in Case 2, instead of adding an odd number of 1's we add an even number of 1's (so they cancel out in the XOR) and then do some merging at the end. This guarantees that the array is the smallest it can be.
The key point is that we have to minimize the sum of the array.
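The two cases above can be sketched as follows. The function name is mine, and corner cases the answer doesn't discuss (X = 0, or a single set bit with odd padding) would need extra handling:

```python
def min_xor_array(n, x):
    """Lexicographically smallest n naturals with XOR x and minimum sum.
    bits holds the powers of two present in x, smallest first."""
    bits = [1 << i for i in range(x.bit_length()) if x >> i & 1]
    k = len(bits)
    if k >= n:
        # too many bits: keep the n-1 smallest separate, merge the rest
        # (disjoint bits, so their XOR equals their sum)
        return bits[:n - 1] + [sum(bits[n - 1:])]
    pad = n - k
    if pad % 2 == 0:
        return [1] * pad + bits      # an even number of 1s cancels out
    # odd padding: use one extra 1 and merge the two largest bits
    # to keep the length at n (assumes k >= 2)
    return [1] * (pad + 1) + bits[:k - 2] + [bits[-2] + bits[-1]]
```

This reproduces the sample (N=3, X=2 gives 1 1 2), the asker's failing case (N=4, X=6 gives 1 1 2 4), and examples A through E.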
(Note that this answer swaps the names: here N is the XOR target and X is the array length.) First calculate the set bits of N. If the count of set bits is greater than or equal to X, divide N into X integers based on its set bits, like so:
N = 15, X = 2
15 has 4 set bits; a solution is 1 14.
If X = 3, a solution is 1 2 12.
This minimizes the array sum too.
In the other case, when the set-bit count is less than X, calculate difference = X - setbits(N).
If the difference is even, add 1's as needed and apply the above algorithm; all the added 1's will cancel out.
If the difference is odd, add 1's as well, but now you have to take care of the one extra 1 in the answer array.
Check for the corner cases too.

Optimizing the total sum of weighted intervals

I have M intervals on the real line, each with a positive weight. I need to select N among them such that they don't overlap and give the maximum total weight. How can I do that efficiently?
If there is no subset of N non-overlapping intervals, there is no solution.
Without the non-overlap constraint, the question is trivial: pick the N largest weights. Because of the constraint, this doesn't work anymore. In my case, N and M are small (< 20), but I hope there is a more efficient solution than exhaustively trying all subsets.
You can solve it with dynamic programming. Let C(k, i) be the maximum sum of (up to) k weighted intervals, none of which has its left end less than i.
You can restrict i to the set of (real) start points of all the intervals, plus a sentinel value past the largest start point; k ranges from 0 to N.
Start by initializing C(k, sentinel) to 0 for every k, and all other entries to -infinity.
Sort the intervals by start point, and iterate through them backwards.
For each interval (start, end) with weight w, and for each k:
C(k, start) = max(C(k, start), C(k, next(start)), w + C(k-1, next'(end)))
Here next(x) returns the smallest start point greater than x, and next'(x) returns the smallest start point greater than or equal to x (either may be the sentinel). Both can be implemented by binary search (or a linear scan if M is small).
Overall, this is going to take O(M*N*log M) time and O(M*N) space.
(This assumes the intervals aren't closed at both ends, so (0, 100) and (100, 200) don't overlap -- a small adjustment needs to be made if these are to be considered overlapping).
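The recurrence above might be sketched like this (names are mine; intervals are given as (start, end, weight) triples, touching endpoints are treated as non-overlapping per the note above, and the function returns the best total for at most n picks):

```python
from bisect import bisect_left

def max_weight_k_intervals(intervals, n):
    """DP over intervals sorted by start: best[i][k] is the maximum
    weight using at most k intervals chosen from positions i onward."""
    intervals = sorted(intervals)                  # sort by start point
    starts = [s for s, _, _ in intervals]
    m = len(intervals)
    best = [[0] * (n + 1) for _ in range(m + 1)]   # row m is the sentinel
    for i in range(m - 1, -1, -1):
        s, e, w = intervals[i]
        j = bisect_left(starts, e)    # first interval starting at/after e
        for k in range(1, n + 1):
            # either skip interval i, or take it and jump past its end
            best[i][k] = max(best[i + 1][k], w + best[j][k - 1])
    return best[0][n]
```

For example, with intervals (0,2,3), (2,4,5), (4,6,4), (1,5,10), allowing at most 2 picks gives 10 (the heavy interval alone), while allowing 3 gives 12 (the chain 3 + 5 + 4).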
In progress:
Inspired by @PaulHankin's solution, here is a reformulation.
Sort all intervals by increasing right endpoint and iteratively find the largest possible sum up to the K-th right bound. Assume you have solved all optimization problems for the first K intervals and for all relevant N (from 0 to K).
When you take the next interval into consideration, you compute new candidate solutions as follows: for every N (up to the current number of intervals), lengthen all previous solutions that you can (including the empty one) without causing overlap, and keep the lengthened solutions that are better.
Example:
The optimal sums up to the K-th right bound, by increasing N are
1: 2
2: 2 or 8 => 8 | -
3: 8 or 3 => 8 | 2 + 3 | -
4: 8 or 2 => 8 | 2 + 3 or 8 + 2 => 8 + 2 | 2 + 3 + 2 | -
5: 8 or 9 => 9 | 8 + 2 or 2 + 9 => 2 + 9 | 2 + 3 + 2 | - | -

In how many ways can you construct an array of size N such that the product of any pair of consecutive elements is not greater than M?

Every element is an integer and should have a value of at least 1.
Constraints: 2 ≤ N ≤ 1000 and 1 ≤ M ≤ 1000000000.
We need to find the answer modulo 1000000007
Maybe we can calculate dp[len][type][typeValue], where type has only two states:
type = 0: this means the last number in a sequence of length len is equal to or smaller than sqrt(M), and we save this number in typeValue.
type = 1: this means the last number in the sequence is bigger than sqrt(M), and we save in typeValue the number k = M / lastNumber (rounded down), which is not greater than sqrt(M).
So, this dp has O(N sqrt(M)) states, but how can we calculate each 'cell' of this dp?
Firstly, consider some 'cell' dp[len][0][number]. This value can be calculated as follows:
dp[len][0][number] = sum[1 <= i <= sqrt(M)] (dp[len - 1][0][i]) + sum[number <= i <= sqrt(M)] (dp[len - 1][1][i])
A little explanation: because type = 0 means number <= sqrt(M), any previous number up to sqrt(M) is allowed, while a previous number greater than sqrt(M) is allowed only if its group value k = M / previousNumber is at least number.
For dp[len][1][k] we can use the next equation:
dp[len][1][k] = cntInGroup(k) * sum[1 <= i <= k] (dp[len - 1][0][i]), where cntInGroup(k) is the count of numbers x such that M / x = k.
We can simply calculate cntInGroup(k) for all 1 <= k <= sqrt(M) using binary search or formulas.
But another problem is that our algorithm needs O(sqrt(M)) operations per cell, so the resulting complexity is O(N * M). We can improve that.
Note that we only need sums of values over segments that were processed on the previous step. So, we can precalculate prefix sums in advance, and after that each 'cell' of the dp can be calculated in O(1) time.
So, with this optimization we can solve this problem with complexity O(N sqrt(M)).
Here is an example for N = 4, M = 10. First group the values x = 1..10 by floor(10/x), i.e. by how many equal parts (with a remainder less than the part) x divides 10 into:
1 number (x = 1) divides 10 into 10 parts
1 number (x = 2) divides 10 into 5 parts
1 number (x = 3) divides 10 into 3 parts
2 numbers (x = 4, 5) divide 10 into 2 parts
5 numbers (x = 6..10) divide 10 into 1 part
Make an array and update it for each value of n:
N 1 1 1 2 5
----------------------
2 10 5 3 2 1 // 10 div 1 ; 10 div 2 ; 10 div 3 ; 10 div 5,4 ; 10 div 6,7,8,9,10
3 27 22 18 15 10 // 10+5+3+2*2+5*1 ; 10+5+3+2*2 ; 10+5+3 ; 10+5 ; 10
4 147 97 67 49 27 // 27+22+18+2*15+5*10 ; 27+22+18+2*15 ; 27+22+18 ; 27+22 ; 27
The solution for N = 4, M = 10 is therefore:
147 + 97 + 67 + 2*49 + 5*27 = 544
My thought process:
For each number in the first array position, respectively, there could be the
following in the second:
1 -> 1,2..10
2 -> 1,2..5
3 -> 1,2,3
4 -> 1,2
5 -> 1,2
6 -> 1
7 -> 1
8 -> 1
9 -> 1
10 -> 1
Array position 3:
For each of 10 1's in col 2, there could be 1 of 1,2..10
For each of 5 2's in col 2, there could be 1 of 1,2..5
For each of 3 3's in col 2, there could be 1 of 1,2,3
For each of 2 4's in col 2, there could be 1 of 1,2
For each of 2 5's in col 2, there could be 1 of 1,2
For each of 1 6,7..10 in col 2, there could be one 1
27 1's; 22 2's; 18 3's; 15 4's; 15 5's; 10 x 6's,7's,8's,9's,10's
Array position 4:
1's = 27+22+18+15+15+10*5
2's = 27+22+18+15+15
3's = 27+22+18
4's = 27+22
5's = 27+22
6,7..10's = 27 each
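The table above can be reproduced by a direct O(N*M) dp with prefix sums, which is a brute-force check of the numbers (names are mine; the O(N sqrt(M)) grouping from the first answer is what makes this scale to large M):

```python
def count_arrays(n, m, mod=10**9 + 7):
    """ways[v] = number of valid arrays of the current length ending in v;
    the next value w is allowed after v iff v * w <= m, i.e. v <= m // w,
    so each transition is a single prefix-sum lookup."""
    ways = [0] + [1] * m            # length-1 arrays end in any of 1..m
    for _ in range(n - 1):
        prefix = [0] * (m + 1)
        for v in range(1, m + 1):
            prefix[v] = (prefix[v - 1] + ways[v]) % mod
        ways = [0] + [prefix[m // w] for w in range(1, m + 1)]
    return sum(ways) % mod
```

For N = 4, M = 10 this confirms the worked total of 544 (and the intermediate column totals 27 and 147 for lengths 2 and 3).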
Create a graph and assign the values from 0 to M to its vertices. An edge exists between two vertices if their product is not greater than M (so vertex 0, the artificial start, is connected to every other vertex). The number of different arrays is then the number of walks with N steps starting at the vertex with value 0. This number can be computed with a simple depth-first search.
The question is now whether this is efficient enough and whether it can be made more efficient. One way is to restructure the solution using matrix multiplication. The matrix to multiply with represents the edges above: it has a 1 where there is an edge and a 0 otherwise. The initial matrix on the left represents the starting vertex: it has a 1 at position (0, 0) and zeros everywhere else.
Based on this, you can multiply the right matrix with itself to represent two steps through the graph. This means you can combine steps by repeated squaring, so you only need to multiply log(N) times, not N times. However, make sure you use known efficient matrix multiplication algorithms to implement this; the naive one is only practical for small M.
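A sketch of that matrix-power idea, using naive multiplication, so it is only practical for small M (names are mine):

```python
def count_arrays_matpow(n, m, mod=10**9 + 7):
    """Vertex 0 is the artificial start (0 * anything <= m), vertices
    1..m carry values; arrays of size n are walks of n steps from 0."""
    size = m + 1
    adj = [[1 if u * v <= m and v > 0 else 0 for v in range(size)]
           for u in range(size)]

    def matmul(a, b):
        bt = list(zip(*b))            # transpose b to take row-by-row dots
        return [[sum(x * y for x, y in zip(row, col)) % mod for col in bt]
                for row in a]

    result, power, e = None, adj, n   # binary exponentiation of adj
    while e:
        if e & 1:
            result = power if result is None else matmul(result, power)
        power, e = matmul(power, power), e >> 1
    return sum(result[0]) % mod       # walks of n steps starting at vertex 0
```

It agrees with the worked example: 27 arrays of length 2 and 544 of length 4 for M = 10.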

Algorithm puzzle interview

I found this interview question, and I couldn't come up with an algorithm better than O(N^2 * P):
Given a vector of P natural numbers (1,2,3,...,P) and another vector of length N whose elements are from the first vector, find the longest subsequence in the second vector, such that all elements are uniformly distributed (have the same frequency).
Example : (1,2,3) and (1,2,1,3,2,1,3,1,2,3,1). The longest subsequence is in the interval [2,10], because it contains all the elements from the first sequence with the same frequency (1 appears three times, 2 three times, and 3 three times).
The time complexity should be O(N * P).
"Subsequence" usually means noncontiguous. I'm going to assume that you meant "sublist".
Here's an O(N P) algorithm assuming we can hash (assumption not needed; we can radix sort instead). Scan the array keeping a running total for each number. For your example,
elem | 1 2 3
------------
  -  | 0 0 0
  1  | 1 0 0
  2  | 1 1 0
  1  | 2 1 0
  3  | 2 1 1
  2  | 2 2 1
  1  | 3 2 1
  3  | 3 2 2
  1  | 4 2 2
  2  | 4 3 2
  3  | 4 3 3
  1  | 5 3 3
Now, normalize each row by subtracting the minimum element. The result is
0: 000
1: 100
2: 110
3: 210
4: 100
5: 110
6: 210
7: 100
8: 200
9: 210
10: 100
11: 200.
Prepare two hashes, mapping each row to the first index at which it appears and the last index at which it appears. Iterate through the keys and take the one with maximum last - first.
000: first is at 0, last is at 0
100: first is at 1, last is at 10
110: first is at 2, last is at 5
210: first is at 3, last is at 9
200: first is at 8, last is at 11
The best key is 100, since its sublist has length 9. The sublist is the (1+1)th element to the 10th.
This works because a sublist is balanced if and only if its first and last unnormalized histograms are the same up to adding a constant, which occurs if and only if the first and last normalized histograms are identical.
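The whole procedure might look like this (the function name is mine; it returns the half-open index range of the longest balanced sublist, and the min over P counts per step keeps it O(N*P)):

```python
def longest_balanced_sublist(values, p):
    """Running histogram over values drawn from 1..p, normalized by
    subtracting its minimum; two positions with equal normalized
    histograms bound a balanced sublist."""
    counts = [0] * p
    first = {(0,) * p: 0}     # normalized histogram -> first index seen
    best = (0, 0)
    for idx, v in enumerate(values, start=1):
        counts[v - 1] += 1
        lo = min(counts)
        key = tuple(c - lo for c in counts)
        if key in first:
            if idx - first[key] > best[1] - best[0]:
                best = (first[key], idx)
        else:
            first[key] = idx
    return best
```

On the example it returns (1, 10): the sublist from index 1 (inclusive) to 10 (exclusive), i.e. the interval [2, 10] in the 1-based numbering of the question.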
If memory usage is not important, it's easy...
Build a matrix of dimensions N*P, where row i records, for each value p, how many times p occurs among the first i elements of the second vector.
After completing the matrix, search for the pair of rows, as far apart as possible, in which every value's count grew by the same amount; the largest such distance is the answer.
With randomization, you can get it down to linear time. The idea is to replace each of the P values with a random integer, such that those integers sum to zero. Now look for two prefix sums that are equal. This allows some small chance of false positives, which we could remedy by checking our output.
In Python 2.7:
# input:
vec1 = [1, 2, 3]
P = len(vec1)
vec2 = [1, 2, 1, 3, 2, 1, 3, 1, 2, 3, 1]
N = len(vec2)

# Choose big enough integer B. For each k in vec1, choose
# a random mod-B remainder r[k], so their mod-B sum is 0.
# Any P-1 of these remainders are independent.
import random
B = N*N*N
r = dict((k, random.randint(0, B-1)) for k in vec1)
s = sum(r.values()) % B
r[vec1[0]] = (r[vec1[0]] + B - s) % B
assert sum(r.values()) % B == 0

# For 0 <= i <= N, let vec3[i] be mod-B sum of r[vec2[j]], for j < i.
vec3 = [0] * (N+1)
for i in range(1, N+1):
    vec3[i] = (vec3[i-1] + r[vec2[i-1]]) % B

# Find pair (i,j) so vec3[i] == vec3[j], and j-i is as large as possible.
# This is either a solution (sublist vec2[i:j] is uniform) or a false
# positive. The expected number of false positives is < N*N/(2*B) < 1/N.
(i, j) = (0, 0)
first = {}
for k in range(N+1):
    v = vec3[k]
    if v in first:
        if k - first[v] > j - i:
            (i, j) = (first[v], k)
    else:
        first[v] = k

# output:
print "Found sublist from", i, "(inclusive) to", j, "(exclusive):"
print vec2[i:j]
print "This is either uniform, or rarely, it is a false positive."
Here is an observation: a uniformly distributed sublist must be a multiple of P in length. This implies that you only have to check the sublists of the second vector whose lengths are P, 2P, 3P, ... - roughly N^2/(2P) such sublists.
You can get this down to O(N) time, with no dependence on P, by enhancing uty's solution.
For each row, instead of storing the normalized counts of each element, store a hash of the normalized counts, keeping the actual normalized counts only for the current index. During each iteration, you first update the normalized counts, which has an amortized cost of O(1) if each decrement of a count is paid for when it is incremented. Then you recompute the hash. The key here is that the hash must be easy to update after an increment or decrement of one element of the tuple being hashed.
At least one way of doing this hashing efficiently, with good theoretical independence guarantees, is shown in the answer to this question. Note that the O(lg P) cost of computing the exponential that determines the amount to add to the hash can be eliminated by precomputing the exponentials modulo the prime, with a total of O(P) time for the precomputation, giving a total running time of O(N + P) = O(N).

how to find the maximum L sum in a matrix?

here is another dynamic programming problem: find the maximum sum of an L-shaped group of 4 cells (like the L of a chess knight's move) in a given m x n matrix.
For example :
1 2 3
4 5 6
7 8 9
L : (1,2,3,6), (1,4,5,6), (1,2,5,8), (4,5,6,9) ...
and the biggest sum is sum(L) = sum(7,8,9,6) = 30
what is the O(complexity) of the optimal solution ?
it looks like this problem (submatrix with maximum sum)
Say all items are positive
Both positive and negative
Any ideas are welcome!
If your L is of constant size (4 elements, as you say), just compute its sum at each of the < n*m positions and take the maximum. Repeat for the 8 different orientations the L can have. That's O(nm) overall.
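That brute force might be sketched as follows (the function name is mine; the 8 orientations are generated by reflecting and transposing one base L, normalized to start at the origin):

```python
def max_l_sum(grid):
    """Slide all 8 orientations of the 4-cell L over the grid and keep
    the best sum; O(n*m) overall for the constant-size shape."""
    base = [(0, 0), (1, 0), (2, 0), (2, 1)]   # one L orientation
    shapes = set()
    for ri in (1, -1):
        for rj in (1, -1):
            for swap in (False, True):
                cells = [(j * rj, i * ri) if swap else (i * ri, j * rj)
                         for i, j in base]
                mi = min(i for i, _ in cells)
                mj = min(j for _, j in cells)
                shapes.add(frozenset((i - mi, j - mj) for i, j in cells))
    n, m = len(grid), len(grid[0])
    best = None
    for shape in shapes:
        hi = max(i for i, _ in shape)
        hj = max(j for _, j in shape)
        for r0 in range(n - hi):
            for c0 in range(m - hj):
                s = sum(grid[r0 + i][c0 + j] for i, j in shape)
                if best is None or s > best:
                    best = s
    return best
```

On the 3x3 example above it finds 30, the sum of (7, 8, 9, 6).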

Resources