Finding pairs with smallest XOR values from a list - algorithm

I am working on a problem in which I have to take the xor of
every pair of integers in an array and then find the K smallest
values produced. The size of the array can be N=100000,
so K can be quite large, but it is limited to 250000.
For example,
if N=5 and K=4,
our array is {1 3 2 4 2}
The numbers resulting from xoring every pair (1^3, 1^2, 1^4, 1^2, 3^2, 3^4, 3^2, etc.) are, in no particular order:
3 3 2 5 0 1 6 1 6 7
Since K=4, we have to print 4 smallest integers.
so the answer would be 0 1 1 2.
Since the time limit of 2 sec is very tight, the brute-force approach
of xoring all pairs would time out. My own approach was wrong, so I need
help. Maybe we can exploit the limit K<=250000: I want to know whether it is
possible to get the K smallest numbers without xoring all the pairs.

(x ^ y) == (x | y) - (x & y) >= |y - x|
Sorting your numbers in order would be a start, because the difference between the pairs will give you a lower bound for the xor, and therefore a cutoff point for when to stop looking for numbers to xor x with.
There is also a shortcut for finding pairs of numbers whose xor is less than a given power of 2. For a bit count N (not the array size), and writing 2**N for 2 to the power N, you are only interested in x <= y <= x | (2**N - 1). If this doesn't give you enough pairs, increase N and try again.
EDIT: You can of course exclude the pairs you have already found, whose xor is less than the previous power of 2, by using x | (2**(N - 1) - 1) < y <= x | (2**N - 1).
Example based on (sorted) [1, 2, 2, 3, 4]
Start by looking for pairs of numbers whose xor is less than 1: for each number x, search for subsequent numbers y = x. This gives {2, 2}.
If you need more than one pair, look for pairs of numbers whose xor is less than 2 but not less than 1: for each number x, search for numbers x < y <= x | 1. This gives {2, 3} (twice).
Note that the xor values within a batch aren't sorted, but every value in a batch is strictly greater than every value in the previous batch.
If you need more than that, look for pairs of numbers whose xor is less than 4 but not less than 2: for each number x, search for numbers x | 1 < y <= x | 3. This gives {1, 2} (twice); {1, 3}.
If you need more than that, look for pairs of numbers whose xor is less than 8 but not less than 4: for each number x, search for numbers x | 3 < y <= x | 7. This gives {1, 4}; {2, 4} (twice); {3, 4}.
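The batches above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the inputs are non-negative integers, the function name `k_smallest_xors` is made up for this example, and the last batch may collect many more pairs than needed before the final sort.

```python
from bisect import bisect_right

def k_smallest_xors(nums, k):
    """Return the k smallest pairwise xor values (hypothetical helper)."""
    a = sorted(nums)              # assumes non-negative integers
    if len(a) < 2 or k <= 0:
        return []
    found = []
    prev_mask = None              # mask of the previous batch, 2**(N-1) - 1
    mask = 0                      # current batch covers xor <= mask, 2**N - 1
    while len(found) < k:
        for i, x in enumerate(a):
            # Skip partners that were already reported in earlier batches.
            lo = i + 1 if prev_mask is None else bisect_right(a, x | prev_mask)
            hi = bisect_right(a, x | mask)
            for j in range(lo, hi):
                found.append(x ^ a[j])
        if mask >= a[-1]:         # every pair has been covered by now
            break
        prev_mask, mask = mask, 2 * mask + 1
    return sorted(found)[:k]

print(k_smallest_xors([1, 3, 2, 4, 2], 4))  # [0, 1, 1, 2]
```

Each batch costs two binary searches per element plus the pairs it reports, so the work stays close to the number of pairs actually collected.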

Notice that if all the bits to the left of bit n (counting from the right) of numbers x and y are equal, then x xor y ≤ $2^n - 1$.
x = 0000000000100110
y = 0000000000110010
(everything to the left of bit 5 is equal)
so x xor y ≤ $2^5 - 1$ = 31
This can be exploited by storing every number in a bitwise trie, that is, a trie where every edge is either a 0 or a 1. Then x xor y ≤ $2^{d(x,y)} - 1$, where d(x,y) is the number of steps we need to move up to find the least-common ancestor of x and y.
            root
      (left-most bit)
             0
            /
           0
          /
         ...
           1
          / \
         0   1
        /   /
       0   0
      ... ...
      /   /
     0   0
     x   y
x and y share an ancestor-node that is 5 levels up, so d(x,y) is 5
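For two distinct numbers, d(x,y) is the position of the highest bit in which they differ, so it can be read off without building the trie. A small check of the example above, using Python's built-in `int.bit_length`; the helper name `d` simply mirrors the notation in the text:

```python
def d(x, y):
    # Levels up to the lowest common ancestor in the bitwise trie, i.e. the
    # 1-based position of the highest differing bit (0 if x == y).
    return (x ^ y).bit_length()

x = 0b0000000000100110
y = 0b0000000000110010
print(d(x, y))                   # 5
print(x ^ y, 2 ** d(x, y) - 1)   # 20 31
```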
Once you have the trie, it's easy to find all pairs such that d(x,y) = 1: just navigate to all nodes 1 level above the leaves, and compare each of that node's children to each other. Those values will give you a max x xor y of $2^1 - 1 = 1$.
If you still don't have k values, then move up to all nodes 2 levels above the leaves, and compare each of that node's grandchildren to each other†. Those values will give you a max x xor y of $2^2 - 1 = 3$.
† (Actually, you only need to compare each of the leaves in the left-subtree with each of the leaves in the right-subtree, since each of the leaves in a given subtree have already been compared against each other)
Continue this until, after checking all nodes for a given level, you have at least k values of x xor y. Then sort that list of values, and take the k smallest.
When k is small ($\ll n^2$), this algorithm is O(n). For large k, it is $O(2^b n)$, where b is the number of bits per integer (assuming there are not many duplicates).
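A rough sketch of the level-by-level search, with one substitution: instead of an explicit trie it groups numbers by the prefix `x >> level`, which identifies the same node. It assumes non-negative integers below `2**bits`; the function name and the `bits` parameter are invented for this example, and duplicates are handled as an extra level 0.

```python
from collections import defaultdict

def k_smallest_xors_by_level(nums, k, bits=32):
    """Collect xor values level by level (hypothetical helper)."""
    found = []
    # Level 0: equal numbers share a leaf, so each such pair has xor 0.
    counts = defaultdict(int)
    for x in nums:
        found.extend([0] * counts[x])
        counts[x] += 1
    for level in range(1, bits + 1):
        if len(found) >= k:
            break
        # Numbers with the same x >> level sit below the same trie node.
        subtrees = defaultdict(lambda: ([], []))
        for x in nums:
            side = (x >> (level - 1)) & 1   # 0 = left subtree, 1 = right
            subtrees[x >> level][side].append(x)
        # Only left-vs-right pairs are new at this level.
        for left, right in subtrees.values():
            found.extend(x ^ y for x in left for y in right)
    return sorted(found)[:k]

print(k_smallest_xors_by_level([1, 3, 2, 4, 2], 4))  # [0, 1, 1, 2]
```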

I would approach this by first sorting the input array of integers. Then the pair with the smallest xor value will be adjacent in the sorted array (but not every adjacent pair has a small xor value). You can start with adjacent pairs, then work outwards, checking pairs at positions (i, i+2), (i, i+3), and so on, until you have reached your desired list of K smallest results.
For your sample array {1 3 2 4 2}, the sorted array is {1 2 2 3 4} and the xor values of adjacent pairs are:
1 xor 2 = 3
2 xor 2 = 0
2 xor 3 = 1
3 xor 4 = 7
For the next step,
1 xor 2 = 3
2 xor 3 = 1
2 xor 4 = 6
and again,
1 xor 3 = 2
2 xor 4 = 6
finally,
1 xor 4 = 5
This idea isn't complete, but you should be able to use it to help construct a full solution.
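One part of this idea that can be relied on is that the single smallest xor value always comes from a pair that is adjacent in sorted order. A minimal sketch of that first step, assuming at least two non-negative integers (the function name is made up for illustration):

```python
def smallest_xor(nums):
    a = sorted(nums)
    # Only the n - 1 adjacent pairs need to be examined for the minimum.
    return min(x ^ y for x, y in zip(a, a[1:]))

print(smallest_xor([1, 3, 2, 4, 2]))  # 0  (from the pair 2, 2)
```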

Related

Find a subset of length K from a set of N numbers (1...N) where k<=N whose XOR value is X

Write an algorithm to find a subset of length K {A1, A2 ... , AK} from a set of N numbers (1...N) where K<=N and A1^A2..AK-1^Ak is X (where a^b represents bitwise XOR of a and b). Print the subset in any order. Print -1 if no subset is possible.
Constraints: 1 <= K <= N <= 10^6
e.g
if N=5, K=4, X=5
Output:
1 2 3 5
if N=5, K=5, X=5
Output: -1
I have tried the following solution:
Find all possible subset of length K
Check if xor sum of any subset = X
The complexity of the above solution is O(C(n, k)), which is far too slow for large N.
Is there a linear-time solution to this problem? Please help.
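For reference, the brute-force approach described in the question might look like the sketch below. It is exponential and only usable for tiny inputs; the function name is hypothetical, and it returns the first matching subset in lexicographic order, or -1 if there is none.

```python
from functools import reduce
from itertools import combinations
from operator import xor

def find_subset_bruteforce(n, k, target):
    # Tries every k-element subset of 1..n; small inputs only.
    for subset in combinations(range(1, n + 1), k):
        if reduce(xor, subset) == target:
            return list(subset)
    return -1

print(find_subset_bruteforce(5, 4, 5))  # [1, 2, 3, 5]
print(find_subset_bruteforce(5, 5, 5))  # -1
```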

XOR of numbers = X

I found this problem in a hiring contest(which is over now). Here it is:
You are given two natural numbers N and X. You are required to create an array of N natural numbers such that the bitwise XOR of these numbers is equal to X. The sum of all the numbers in the array must be as small as possible.
If there exist multiple arrays, print the lexicographically smallest one:
Array A < Array B if
A[i] < B[i] for some index i, and A[j]=B[j] for all indices j less than i
Sample Input: N=3, X=2
Sample output : 1 1 2
Explanation: We have to print 3 natural numbers having the minimum sum. Thus the N space-separated numbers are [1 1 2]
My approach:
If N is odd, I put N-1 ones in the array (so that their xor is zero) and then put X
If N is even, I put N-1 ones again and then put X-1 (if X is odd) or X+1 (if X is even)
But this algorithm failed for most of the test cases. For example, when N=4 and X=6 my output is
1 1 1 7 but it should be 1 1 2 4
Does anyone know how to make the array sum minimal?
In order to have the minimum sum, you need to make sure that you are not cancelling bits of the target X and then recreating them, because that increases the sum. For this, you have to create the bits of X one by one (ideally) from the end of the array. So, for your example of N=4 and X=6 we have (I use ^ to denote xor):
X = 6 = 110 (binary) = 2 + 4. Note that 2^4 = 6 as well, because these numbers don't share any bits. So, the output is 1 1 2 4.
So, we start by creating the most significant bits of X from the end of the output array. Then we also have to handle the corner cases for different values of N. Here are a number of examples to make the idea clear:
```
A) X=14, N=5:
X=1110=8+4+2. So, the array is 1 1 2 4 8.
B) X=14, N=6:
X=8+4+2. The array should be 1 1 1 1 2 12.
C) X=15, N=6:
X=8+4+2+1. The array should be 1 1 1 2 4 8.
D) X=15, N=5:
The array should be 1 1 1 2 12.
E) X=14, N=2:
The array should be 2 12. Because 12 = 4^8
```
So, we go as follows. We count the powers of 2 in X (its set bits). Let this number be k, and let n be the array length N.
Case 1 - If k >= n (example E): we start by placing the smallest powers from left to right and merge the remaining ones into the last position of the array.
Case 2 - If k < n (examples A, B, C, D): we compute h = n - k. If h is odd we use h = n-k+1 instead. Now, we start by putting h 1's at the beginning of the array. Then the number of places left is at most k, so we can follow the idea of Case 1 for the remaining positions. Note that in Case 2, instead of adding an odd number of 1's, we put an even number of 1's and then do some merging at the end. This guarantees that the array is the smallest it can be.
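The arrays in examples A to E can be cross-checked by exhaustive search on small inputs. The sketch below is not the construction described above, just a brute-force reference. It assumes that only sorted arrays need to be considered and that no element exceeds a caller-chosen bound `limit`; both the function name and `limit` are hypothetical.

```python
from functools import reduce
from itertools import combinations_with_replacement
from operator import xor

def min_sum_array_bruteforce(n, x, limit):
    # Enumerates sorted arrays of n values in 1..limit (exponential search).
    candidates = (
        c for c in combinations_with_replacement(range(1, limit + 1), n)
        if reduce(xor, c) == x
    )
    # Smallest sum first, ties broken lexicographically.
    best = min(candidates, key=lambda c: (sum(c), c), default=None)
    return list(best) if best is not None else None

print(min_sum_array_bruteforce(4, 6, 15))   # [1, 1, 2, 4]
print(min_sum_array_bruteforce(6, 14, 15))  # [1, 1, 1, 1, 2, 12]
```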
The key point is that we have to minimize the sum of the array.
(In this answer N denotes the target xor value and X the number of elements, the reverse of the question's notation.)
First count the set bits in N. If the number of set bits is greater than or equal to X, split N into X integers along its set bits, for example:
N = 15, X = 2
15 has 4 set bits, so the solution is 1 14
if X = 3 the solution is 1 2 12
This minimizes the array sum too.
In the other case the number of set bits is less than X:
calculate difference = X - setbits(N)
If the difference is even, add that many ones and apply the algorithm above; all the added ones cancel out.
If the difference is odd, add ones as well, but now you have to take care of the one extra 1 in the answer array.
Check the corner cases too.

Can we find the Bitwise XOR of all sub-arrays of an integer array in O(n) time?

How to find the Bitwise XOR of the value of all sub arrays of array A
A = [1,2]
Output : 0
Explanation :
Sub Arrays :`[1], [2], [1,2]` (XOR of all subarrays = 0)
To answer the question, one has to decide, for each element, whether it appears in an odd or an even number of subarrays. If odd, it will appear in the xor sum, and if even it won't.
The element at position i will be included in (i+1) * (n-i) subarrays. That's because any subarray that includes i starts at index 0, 1, ..., i and ends at index i, i+1, ..., n-1. Now (i+1) * (n-i) = i(n-1) - i*i + n = (i+1)n (mod 2), since x^2 = x (mod 2) for any x (here ^ denotes a power).
So if n is even, no element appears in an odd number of subarrays. If n is odd, exactly the elements at even indices appear in an odd number of subarrays.
So:
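    def xor_all_subarrays(A):
        # With an even length every element appears an even number of times.
        if len(A) % 2 == 0:
            return 0
        r = 0
        # Only elements at even indices survive (range replaces Python 2's xrange).
        for i in range(0, len(A), 2):
            r ^= A[i]
        return r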
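The parity argument can be checked against a direct enumeration of all subarrays. The sketch below restates the fast version so that it is self-contained; both function names are made up for this comparison.

```python
from functools import reduce
from operator import xor

def xor_all_subarrays_fast(a):
    # Parity argument: only even indices of an odd-length array survive.
    if len(a) % 2 == 0:
        return 0
    return reduce(xor, a[0::2], 0)

def xor_all_subarrays_bruteforce(a):
    # Xor together every element of every contiguous subarray.
    total = 0
    for start in range(len(a)):
        for end in range(start, len(a)):
            for value in a[start:end + 1]:
                total ^= value
    return total

print(xor_all_subarrays_fast([1, 2]), xor_all_subarrays_bruteforce([1, 2]))        # 0 0
print(xor_all_subarrays_fast([1, 2, 3]), xor_all_subarrays_bruteforce([1, 2, 3]))  # 2 2
```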
If by subarrays you mean all sublists (the power set), you can use the facts that:
for a list of size n, there are $2^n$ sublists and each element occurs in $2^n/2 = 2^{n-1}$ of them; and
the bitwise xor operation is commutative and associative: x ^ y ^ z is equal to z ^ x ^ y.
Now if the list has more than one element, every element occurs $2^{n-1}$ times, which is a power of two greater than or equal to two. If you xor an element with itself, the result is 0: x ^ x = 0 for every x. So since every element is xored an even number of times, the result will be 0. In case there is one element, the two sublists are [] and [x], so in that case the outcome is x. So a fast algorithm is:
    def xor_subarrays_powerset(data):
        if len(data) == 1:
            return data[0]
        else:
            return 0
In case these are contiguous sublists based on their index, the story is a bit different:
here element j (zero indexed) will be in
$$\sum_{i=1}^{n} \min(j+1,\ i,\ n-i+1,\ n-j)$$
sublists, where i runs over the window lengths.
Indeed if you have a list [1,2,3,4]: there are the following "windows":
    1,2,3,4
    x
      x
        x
          x
    1 1 1 1
    x x
      x x
        x x
    1 2 2 1
    x x x
      x x x
    1 2 2 1
    x x x x
    1 1 1 1
    -------
    4 6 6 4
and for a list with length 5
    1,2,3,4,5
    x
      x
        x
          x
            x
    1 1 1 1 1
    x x
      x x
        x x
          x x
    1 2 2 2 1
    x x x
      x x x
        x x x
    1 2 3 2 1
    x x x x
      x x x x
    1 2 2 2 1
    x x x x x
    1 1 1 1 1
    ---------
    5 8 9 8 5
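The totals in the two tables can be reproduced by counting, for every index, how many windows cover it. A small sketch (the function name is invented for this check):

```python
def window_counts(n):
    # counts[j] = number of contiguous windows of a length-n list covering index j.
    counts = [0] * n
    for start in range(n):
        for end in range(start, n):
            for j in range(start, end + 1):
                counts[j] += 1
    return counts

print(window_counts(4))  # [4, 6, 6, 4]
print(window_counts(5))  # [5, 8, 9, 8, 5]
```

Each count equals (j+1) × (n-j), the number of possible start positions times the number of possible end positions.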
So what do we note:
for a list, the first and last elements are always counted n times. This is logical since every moving window passes through these elements only once.
the second and the last-but-one elements are always counted 2×(n-2)+2 times, since all window sizes except the smallest and the largest pass over them twice;
the third and the last-but-two elements are always counted 3×(n-4)+2×2+2 times;
the fourth and the last-but-three elements are always counted 4×(n-6)+3×2+2×2+2 times; and in general:
the i-th and (n-1-i)-th elements (zero-indexed, i ≤ n/2) are counted (i+1)×(n-2×i)+... times. The ... is not important since it consists of multiples of two. Since an element xored with itself is 0, multiples of two do not count.
So now we only need to determine whether each element contributes an even or an odd number of times to the total. If the list has an even length, n is even, and thus so is every (n-2×i), which means that no element contributes and the result is 0. If the list has an odd length, the first element contributes an odd number of times (because (i+1)×(n-2×i) is odd for i = 0), the next element an even number of times, the one after that an odd number of times again, and so on.
So if the list has an odd length, it means that we only need to xor over the elements positioned at 0, 2, 4,... We can do this with:
    from itertools import islice

    def xor_subarrays_contiguousness(data):
        if len(data) & 1:
            r = 0
            for e in islice(data, 0, None, 2):
                r ^= e
            return r
        else:
            return 0

In how many ways can you construct an array of size N such that the product of any pair of consecutive elements is not greater than M?

Every element is an integer and should have a value of at least 1.
Constraints: 2 ≤ N ≤ 1000 and 1 ≤ M ≤ 1000000000.
We need to find the answer modulo 1000000007
Maybe we can calculate dp[len][type][typeValue], where type has only two states:
type = 0: this means that the last number in the sequence of length len is less than or equal to sqrt(M). We store this number in typeValue.
type = 1: this means that the last number in the sequence is bigger than sqrt(M). We store in typeValue the number k = M / lastNumber (rounded down), which is not greater than sqrt(M).
So this dp has O(N sqrt(M)) states, but how can we calculate each 'cell' of this dp?
Firstly, consider some 'cell' dp[len][0][number]. This value can be calculated as follows:
dp[len][0][number] = sum[1 <= i <= sqrt(M)] (dp[len - 1][0][i]) + sum[number <= i <= sqrt(M)] (dp[len - 1][1][i])
A little explanation: because type = 0 means number <= sqrt(M), next to it we can put any number not greater than sqrt(M), and only those greater numbers x whose k = M / x is at least number.
For dp[len][1][k] we can use the following equation:
dp[len][1][k] = sum[1 <= i <= k] (dp[len - 1][0][i] * cntInGroup(k)), where cntInGroup(k) is the count of numbers x > sqrt(M) such that M / x = k (rounded down)
We can simply calculate cntInGroup(k) for all 1 <= k <= sqrt(M) using binary search or formulas.
But another problem is that our algorithm needs O(sqrt(M)) operations per cell, so the resulting complexity is O(N M). But we can improve that.
Note that we need to calculate sums of values over segments that were processed in the previous step. So we can precalculate prefix sums in advance, after which each 'cell' of the dp can be calculated in O(1) time.
So, with this optimization we can solve this problem in O(N sqrt(M)).
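As one possible closed formula for cntInGroup(k): the numbers x with M / x = k (rounded down) are exactly those in the interval (M / (k+1), M / k], so only the part above sqrt(M) has to be counted. A sketch, with the Python name `cnt_in_group` and the use of `math.isqrt` for the integer square root being choices made for this example:

```python
from math import isqrt

def cnt_in_group(m, k):
    # Count x > isqrt(m) with m // x == k, i.e. x in (m // (k + 1), m // k].
    lower = max(m // (k + 1), isqrt(m))
    return max(0, m // k - lower)

print([cnt_in_group(10, k) for k in (1, 2, 3)])  # [5, 2, 0]
```

For M = 10 this gives the five numbers 6..10 for k = 1, the two numbers 4 and 5 for k = 2, and nothing for k = 3, because 3 itself is not greater than sqrt(10).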
Here is an example for N = 4, M = 10:
1 number divides 10 into 10 equal parts with a remainder less than the part
1 number divides 10 into 5 equal parts with a remainder less than the part
1 number divides 10 into 3 equal parts with a remainder less than the part
2 numbers divide 10 into 2 equal parts with a remainder less than the part
5 numbers divide 10 into 1 part with a remainder less than the part
Make an array and update it for each value of n:
    N      1    1    1    2    5
    ------------------------------
    2     10    5    3    2    1   // 10 div 1 ; 10 div 2 ; 10 div 3 ; 10 div 5,4 ; 10 div 6,7,8,9,10
    3     27   22   18   15   10   // 10+5+3+2*2+5*1 ; 10+5+3+2*2 ; 10+5+3 ; 10+5 ; 10
    4    147   97   67   49   27   // 27+22+18+2*15+5*10 ; 27+22+18+2*15 ; 27+22+18 ; 27+22 ; 27
The solution for N = 4, M = 10 is therefore:
147 + 97 + 67 + 2*49 + 5*27 = 544
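The grouped table above can be turned into code along the following lines. This is a sketch rather than a tuned solution: the function name `count_arrays` is hypothetical, values are grouped into blocks sharing the same M div v, and each row is obtained from weighted prefix sums of the previous row, giving roughly O(N sqrt(M)) work.

```python
MOD = 1_000_000_007

def count_arrays(n, m):
    # Split 1..m into maximal blocks of values v sharing the same m // v.
    sizes = []
    v = 1
    while v <= m:
        last = m // (m // v)          # largest value with the same quotient
        sizes.append(last - v + 1)
        v = last + 1
    blocks = len(sizes)
    # ways[i] = number of arrays of the current length ending in one
    # particular value of block i (all values in a block behave alike).
    ways = [1] * blocks
    for _ in range(n - 1):
        prefix = []
        running = 0
        for size, w in zip(sizes, ways):
            running = (running + size * w) % MOD
            prefix.append(running)
        # A value v in block i may follow any value up to m // v, which is
        # exactly the last value of block (blocks - 1 - i).
        ways = prefix[::-1]
    return sum(size * w for size, w in zip(sizes, ways)) % MOD

print(count_arrays(4, 10))  # 544
```

For M = 10 the blocks are {1}, {2}, {3}, {4, 5}, {6..10}, and the successive `ways` lists are the rows of the table.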
My thought process:
For each number in the first array position, respectively, there could be the
following in the second:
1 -> 1,2..10
2 -> 1,2..5
3 -> 1,2,3
4 -> 1,2
5 -> 1,2
6 -> 1
7 -> 1
8 -> 1
9 -> 1
10 -> 1
Array position 3:
For each of 10 1's in col 2, there could be 1 of 1,2..10
For each of 5 2's in col 2, there could be 1 of 1,2..5
For each of 3 3's in col 2, there could be 1 of 1,2,3
For each of 2 4's in col 2, there could be 1 of 1,2
For each of 2 5's in col 2, there could be 1 of 1,2
For each of 1 6,7..10 in col 2, there could be one 1
27 1's; 22 2's; 18 3's; 15 4's; 15 5's; 10 x 6's,7's,8's,9's,10's
Array position 4:
1's = 27+22+18+15+15+10*5
2's = 27+22+18+15+15
3's = 27+22+18
4's = 27+22
5's = 27+22
6,7..10's = 27 each
Create a graph and assign the values from 0 to M to the vertices. An edge exists between two vertices if their product is not greater than M. The number of different arrays is then the number of paths with N steps, starting at the vertex with value 0. This number can be computed using a simple depth-first search.
The question is now whether this is efficient enough and whether it can be made more efficient. One way is to restructure the solution using matrix multiplication. The matrix to multiply with represents the edges above, it has a 1 when there is an edge, a 0 otherwise. The initial matrix on the left represents the starting vertex, it has a 1 at position (0, 0), zeros everywhere else.
Based on this, you can multiply the right matrix with itself to represent two steps through the graph. This means that you can combine two steps to make them more efficient, so you only need to multiply log(N) times, not N times. However, make sure you use known efficient matrix multiplication algorithms to implement this, the naive one will only perform for small M.

Counting subarrays whose sum is in the range [L, R]

I am solving a competitive programming problem, which was described like this:
Given n < 10^5 integers a1, a2, a3, ..., an and L, R, how many
subarrays are there such that the sum of their elements is in the range [L, R]?
Example:
Input:
n = 4, L = 2, R = 4
1 2 3 4
Output: 4
(the four subarrays are [4], [3], [1, 2] and [2])
One solution I have is brute force, but O(n^2) is too slow. What data structures / algorithms should I use to solve this problem efficiently?
Compute prefix sums(p[0] = 0, p[1] = a1, p[2] = a1 + a2, ..., p[n] = sum of all numbers).
For a fixed prefix sum p[i], you need to find the number of prefix sums p[j] such that j is less than i and p[i] - R <= p[j] <= p[i] - L. One can do this in O(log n) per query with a treap or another balanced binary search tree.
Pseudo code (split(v) is assumed to return the keys below v and the keys at or above v):
    treap.add(0)
    sum = 0
    ans = 0
    for i from 1 to n:
        sum += a[i]
        left, right = treap.split(sum - R)
        middle, right = right.split(sum - L + 1)
        ans += middle.size()
        treap = merge(left, middle, right)
        treap.add(sum)
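Python has no built-in treap, so the sketch below swaps in a different structure with the same O(n log n) bound: a Fenwick (binary indexed) tree over the coordinate-compressed prefix sums. It counts, for each prefix sum, the earlier prefix sums in the required interval. The function name is hypothetical and L <= R is assumed.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate

def count_subarrays_in_range(a, lo, hi):
    prefix = [0] + list(accumulate(a))
    values = sorted(set(prefix))          # coordinate compression
    tree = [0] * (len(values) + 1)        # Fenwick tree, 1-based

    def add(pos):                         # pos: 0-based index into values
        pos += 1
        while pos <= len(values):
            tree[pos] += 1
            pos += pos & -pos

    def count_first(k):                   # inserted sums among the k smallest values
        total = 0
        while k > 0:
            total += tree[k]
            k -= k & -k
        return total

    answer = 0
    for p in prefix:
        # Earlier prefix sums q with p - hi <= q <= p - lo.
        left = bisect_left(values, p - hi)
        right = bisect_right(values, p - lo)
        answer += count_first(right) - count_first(left)
        add(bisect_left(values, p))
    return answer

print(count_subarrays_in_range([1, 2, 3, 4], 2, 4))  # 4
```

Because each prefix sum is queried before it is inserted, only strictly earlier prefix sums are counted, so empty subarrays never contribute.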
We can do it in linear time if the array contains positive numbers only.
First build an array of prefix sums from left to right.
1. Use three pointers, X, Y and Z, and initialize them to 0
2. At every step increase X by 1
3. While the sum of the numbers between Y and X is greater than R, keep increasing Y
4. While the sum of the numbers between Z and X is greater than or equal to L, keep increasing Z
5. If valid Y and Z are found, add Z - Y to the result (Z has stopped one past the last valid start, so the valid starts are Y, ..., Z - 1).
6. If X is less than the length of the array, go to step 2.
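A sketch of the three-pointer idea, under the assumptions that every element is strictly positive and L <= R. It keeps two running window sums instead of a prefix-sum array, and the function and variable names are chosen for this example. Here `y` is the first start whose window sum is at most R, and `z` is the first start whose window sum is below L, so each step contributes `z - y` subarrays.

```python
def count_subarrays_positive(a, lo, hi):
    # Assumes all elements are > 0; both pointers only ever move right.
    answer = 0
    sum_y = 0          # sum of a[y..x]
    sum_z = 0          # sum of a[z..x]
    y = z = 0
    for x, value in enumerate(a):
        sum_y += value
        sum_z += value
        while y <= x and sum_y > hi:     # window from y is still too large
            sum_y -= a[y]
            y += 1
        while z <= x and sum_z >= lo:    # window from z is still large enough
            sum_z -= a[z]
            z += 1
        answer += max(0, z - y)          # valid starts are y, ..., z - 1
    return answer

print(count_subarrays_positive([1, 2, 3, 4], 2, 4))  # 4
```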

Resources