How to efficiently apply XOR to two integer arrays? - algorithm

I have two arrays as follows:
A = [1,2,35,4,32,1,2,56,43,2,21]
B = [1,2,35,4,32,1,2,56,43,45,1]
As we can see, A and B share the same initial subsequence up to element 43. My end goal is to calculate the XOR of the trailing uncommon elements of both sequences. Here, that means the XOR of {2,21,45,1}.
Currently, my approach is to store the running XOR of each array in two separate arrays (say, RESA[] and RESB[]). Then, whenever I am asked to find the XOR of A[0-10] and B[0-9], I just perform a single XOR operation as follows:
RESA[10] ^ RESB[9]
This works because, when XORing, common elements cancel out.
My problem is: what if every query also passes a threshold T? For example, in this case, if the threshold passed is 32, then I have to filter the elements that are less than 32 in both A and B and then XOR all such elements. This definitely increases the complexity, and I cannot apply my earlier logic of keeping running XORs of elements.
Please let me know if you have any ideas on how to leverage XOR properties to come up with a constant-time approach, as before when there were no thresholds.

You have already worked out that you can find the XOR of the uncommon elements by computing the XOR of every element in the two arrays.
XOR is a commutative and associative operator so we can reorder the arrays in any way we like and still have the same total XOR.
In particular, we can reverse sort each array, and then compute the running XOR of each sorted array.
With this preprocessing we can now compute the XOR of all elements above a threshold by using binary search on each sorted array to find how many elements are above T, followed by a lookup into the running XOR array.
This gives O(log n) complexity for each query.
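A minimal Python sketch of this preprocessing, under a few assumptions: the names build, xor_above and xor_below are made up for illustration, and the arrays are sorted in ascending order with prefix XORs (equivalent to the reverse sort described above, but it fits Python's bisect module directly). xor_above matches the answer's "above T" wording, while xor_below covers the question's "less than T" filter.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate
from operator import xor

def build(values):
    """Sort ascending and store prefix XORs: prefix[i] is the XOR of the i smallest values."""
    ordered = sorted(values)
    prefix = [0] + list(accumulate(ordered, xor))
    return ordered, prefix

def xor_above(index, threshold):
    """XOR of all values strictly greater than threshold, in O(log n)."""
    ordered, prefix = index
    k = bisect_right(ordered, threshold)  # number of values <= threshold
    return prefix[-1] ^ prefix[k]         # total XOR with the low part cancelled out

def xor_below(index, threshold):
    """XOR of all values strictly less than threshold, in O(log n)."""
    ordered, prefix = index
    return prefix[bisect_left(ordered, threshold)]

A = [1, 2, 35, 4, 32, 1, 2, 56, 43, 2, 21]
B = [1, 2, 35, 4, 32, 1, 2, 56, 43, 45, 1]
index_a, index_b = build(A), build(B)

# Common elements cancel, so only uncommon elements on the right side of T remain.
print(xor_above(index_a, 32) ^ xor_above(index_b, 32))
print(xor_below(index_a, 32) ^ xor_below(index_b, 32))
```

The preprocessing costs O(n log n) per array once; each query is then two binary searches and a few XORs.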
Extension
The above answer assumes that the query is just the threshold 32: i.e. the start is always 0, and the end is always the length of each sequence. (I assume this because the question says the final goal is to compute the XOR of all uncommon elements.)
If the query also consisted of the start and end of the region to be XORed I would suggest a different approach that requires more storage (because it requires all queries to be buffered and sorted):
Sort all the queries by threshold
Maintain a segment tree of the XOR for each sequence, initialized to 0.
Add the values into the sequences in decreasing order, and perform the queries as soon as all values above their threshold have been inserted.
For example, the segment tree for a sequence C=[1,2,35,4,32,1,2,56] would contain:
1
2
35
4
32
1
2
56
1^2
35^4
32^1
2^56
1^2^35^4
32^1^2^56
1^2^35^4^32^1^2^56
Once we have these values we can compute the XOR of any range in O(log n) steps. For example, suppose we wanted to compute the XOR of C[1:3] = [2,35,4] (both ends inclusive). We can do this by XORing 2 with 35^4.
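The offline procedure above might be sketched in Python as follows. This is only an illustration: the class XorSegmentTree, the function answer_offline, and the query format (threshold, left, right) with inclusive indices are assumptions, and the tree uses a compact iterative array layout rather than the exact node list shown above.

```python
class XorSegmentTree:
    """Iterative segment tree storing XORs; all leaves start at 0."""

    def __init__(self, size):
        self.size = size
        self.tree = [0] * (2 * size)

    def update(self, pos, value):
        """Set position pos to value and refresh the XORs on the path to the root."""
        i = pos + self.size
        self.tree[i] = value
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] ^ self.tree[2 * i + 1]
            i //= 2

    def query(self, left, right):
        """XOR of positions left..right (inclusive) in O(log n)."""
        result = 0
        lo, hi = left + self.size, right + self.size + 1
        while lo < hi:
            if lo & 1:
                result ^= self.tree[lo]
                lo += 1
            if hi & 1:
                hi -= 1
                result ^= self.tree[hi]
            lo //= 2
            hi //= 2
        return result

def answer_offline(values, queries):
    """For each (threshold, left, right), XOR the values in the range that exceed threshold."""
    tree = XorSegmentTree(len(values))
    by_value = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    order = sorted(range(len(queries)), key=lambda q: queries[q][0], reverse=True)
    answers = [0] * len(queries)
    nxt = 0
    for q in order:
        threshold, left, right = queries[q]
        # Insert every value above this query's threshold before answering it.
        while nxt < len(by_value) and values[by_value[nxt]] > threshold:
            tree.update(by_value[nxt], values[by_value[nxt]])
            nxt += 1
        answers[q] = tree.query(left, right)
    return answers

C = [1, 2, 35, 4, 32, 1, 2, 56]
print(answer_offline(C, [(0, 1, 3), (32, 0, 7), (4, 1, 3)]))
```

With threshold 0 every value is inserted, so the first query reproduces the C[1:3] example: 2 ^ 35 ^ 4.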

Related

Number of subsets whose XOR contains less than two set bits

I have an array A (size <= 10^5) of numbers (<= 10^8), and I need to answer some queries (50000): for a given L, R, how many subsets of the elements in the range [L, R] have an XOR with 0 or 1 bits set (that is, zero or a power of 2)? Point modifications to the array happen between the queries, so I can't really do offline processing or use techniques like square root decomposition.
I have an approach where I use DP to calculate the answer for a given range, something along the lines of this:
https://www.geeksforgeeks.org/count-number-of-subsets-having-a-particular-xor-value/
But this is clearly too slow. This feels like a classical segment tree problem, but I can't work out what data to store at each node so that I can combine the left child and right child to compute the answer for the given range.
Yeah, that DP won't be fast enough.
What will be fast enough is applying some linear algebra over GF(2), the Galois field with two elements. Each number can be interpreted as a bit-vector; adding/subtracting vectors is XOR; scalar multiplication isn't really relevant.
The data you need for each segment is (1) how many numbers there are in the segment and (2) a basis for the subspace generated by the numbers in the segment, which will consist of at most 27 numbers because all numbers are less than 2^27. The basis for a one-element segment is just that number if it's nonzero, else the empty set. To find the span of the union of two bases, use Gaussian elimination and discard the zero vectors.
Given the length of an interval and a basis for it, you can count the number of good subsets using the rank-nullity theorem. Basically, for each target number (0 and each power of two), use your Gaussian elimination routine to test whether the target number belongs to the subspace. If so, there are 2^(length of interval minus size of basis) subsets with that XOR. If not, the answer for that target is zero.
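Here is a rough Python sketch of the per-segment data and the counting step, leaving out the segment tree itself. The names add_vector, merge, in_span and count_good_subsets are hypothetical, the basis is stored as a 27-slot list indexed by highest set bit, and the empty subset is assumed to count (its XOR is 0).

```python
BITS = 27  # all inputs are assumed to be below 2^27

def add_vector(basis, x):
    """Reduce x against the basis; keep the remainder if it is nonzero."""
    for bit in range(BITS - 1, -1, -1):
        if not (x >> bit) & 1:
            continue
        if basis[bit] == 0:
            basis[bit] = x      # x now has `bit` as its highest set bit
            return True
        x ^= basis[bit]         # eliminate this bit and keep reducing
    return False                # x was already in the span

def merge(left, right):
    """Basis for the span of the union of two bases (what a parent node stores)."""
    merged = list(left)
    for v in right:
        if v:
            add_vector(merged, v)
    return merged

def in_span(basis, target):
    """True if target is the XOR of some subset of basis vectors."""
    for bit in range(BITS - 1, -1, -1):
        if (target >> bit) & 1:
            if basis[bit] == 0:
                return False
            target ^= basis[bit]
    return True

def count_good_subsets(numbers):
    """Number of subsets whose XOR is 0 or a power of two."""
    basis = [0] * BITS
    for x in numbers:
        add_vector(basis, x)
    rank = sum(1 for v in basis if v)
    per_target = 2 ** (len(numbers) - rank)   # rank-nullity: size of each solution set
    targets = [0] + [1 << b for b in range(BITS)]
    return sum(per_target for t in targets if in_span(basis, t))

print(count_good_subsets([1, 2, 3]))
```

In a segment tree, each node would hold a count plus such a basis, and merge would combine the two children in at most 27 insertions of 27 steps each.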

Or of all pairs formed by taking xor of all the numbers in a list.

e.g.: 10,15,17
ans = (10^15)|(15^17)|(10^17) = 31. I have made an O(n*k) algorithm but need something better than that (n is the number of entries and k is the number of bits in each number).
It may be easiest to think in negatives here.
XOR is basically "not equal to"--i.e., it produces a result of 1 if and only if the two input bits are not equal to each other.
Since you're ORing all those results together, it means you get a 1 bit in the result anywhere there are at least two inputs that have different values at that bit position.
Inverting that, it means that we get a zero in the result only where every input has the same value at that bit position.
To compute that we can accumulate two intermediate values. For one, we AND together all the inputs. This will give us the positions at which every input had a one. For the other, we invert every input, and AND together all those results. This will tell us every position at which all the inputs had the value 0.
OR those together, and we have a value with a 1 where every input was equal, and a zero otherwise.
Invert that, and we get the desired result: 0 where all inputs were equal, and 1 where any was different.
This lets us compute the result with linear complexity (assuming each input value fits into a single word).
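As a small Python sketch of the steps above (the function name or_of_pairwise_xors is made up, and non-negative integers in a non-empty list are assumed):

```python
from functools import reduce

def or_of_pairwise_xors(values):
    """OR of the XORs of all pairs, computed in one pass per accumulator."""
    # Bit positions where every input has a 1.
    all_ones = reduce(lambda a, b: a & b, values)
    # Bit positions where every input has a 0 (AND of the inverted inputs).
    all_zeros = reduce(lambda a, b: a & b, (~v for v in values))
    # A bit is set in the answer exactly where the inputs were not all equal.
    return ~(all_ones | all_zeros)

print(or_of_pairwise_xors([10, 15, 17]))
```

Python integers are arbitrary precision, so the inverted values are negative numbers with infinitely many leading ones; the final inversion clears them again and the result is non-negative. In a fixed-width language the same expression works on unsigned words.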

Given n-1*n array, find missing number

Here each row contains the bit representation of a number. These numbers come from 1..N. Exactly one number is missing. Find the bit representation of the missing number.
The interviewer asked me this question.
I said: "We can find the sum of the given numbers and subtract it from the sum of first n numbers(which we know as (N*(N+1))/2)"
He said that involves changing from base 10 to base 2.
Can you give me a hint on how I can solve it without changing bases?
You can XOR together all numbers in the range 0..N, then XOR in the numbers from the array. The result will be the missing number.
Explanation: XORing a number with itself always results in zero. The algorithm above XORs each number exactly twice, except for the missing one. The missing number will be XOR-ed with zero exactly once, so the result is going to equal the missing number.
Note: the interviewer is wrong on needing to convert bases in order to do addition: adding binary numbers is easy and fun - in fact, computers do it all the time :-)
You can just XOR these numbers together, and XOR with 1..n. The fact that the numbers are stored in binary is a good hint, BTW.
In fact, any commutative operator with an inverse should work: since the operator is commutative, the order does not matter, so it can be applied both to the numbers you have and to 1..n, the only difference being that the first result does not include the number that is missing from the array. You can then use the inverse on the two results to find that number. So + and -, * and /, XOR and XOR, and any other operators that meet the requirement should all work here.
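The XOR approach could look like this in Python. The input format is an assumption (each row is a string of '0' and '1' characters of equal width), and missing_row is a made-up name; int(row, 2) only reads the bits into a machine integer, so no decimal arithmetic is involved.

```python
def missing_row(rows, n):
    """Return the bit string of the one number in 1..n that is absent from rows."""
    width = len(rows[0]) if rows else n.bit_length()
    acc = 0
    for row in rows:
        acc ^= int(row, 2)        # XOR in every number that is present
    for k in range(1, n + 1):
        acc ^= k                  # XOR in every number that should be present
    # Everything present cancelled out twice; only the missing number remains.
    return format(acc, f"0{width}b")

print(missing_row(["001", "010", "100", "101"], 5))
```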

Different Combinations algorithm (Candy Splitting)

Yesterday I participated in the Google Code Jam contest. There was a candy splitting problem:
http://code.google.com/codejam/contest/dashboard?c=975485#s=p2
I designed an algorithm that basically tries all different combinations for Patrick's pile and Sean's pile, checks if they have the same Patrick value, and finally chooses the combination that maximizes Sean's share. The algorithm worked well for the small input file. For the large one, I got memory problems and the output never showed up. I believe there must be another approach to this problem that wouldn't require considering all combinations. Can anyone help?
For the small input, the number of candies is small (up to 15). A search of all possible cases will consist of 2^15 = 32768 possibilities, which can be checked within a millisecond or so. But with up to 1000 candies (large input), the number of possible combinations goes up to 2^1000 = 10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376. Now this number is a little too big, and even if you run the program for a few years, you are not going to get the result.
There are some observations which help in making an efficient program for this:
As #Protostome pointed out, Patrick's sum is actually an XOR operation.
Again as #Protostome pointed out, if it is solvable, the XOR of all the candies will be 0. The reason is this: if the two partitions have the same XOR sum a, then the XOR of both partitions together is a xor a = 0.
If it is possible to partition, then the XOR sum of all candies is 0. But if we remove a single candy from the entire set, the XOR of the remaining candies becomes non-zero: it equals the removed candy. In particular:
c1 xor c2 xor ... xor ci-1 xor ci xor ci+1 xor ... xor cn = 0
=> c1 xor c2 xor ... xor ci-1 xor ci+1 xor ... xor cn = ci
That is, we can partition the set into two by taking a single candy out of the entire set. To maximize the arithmetic sum of the larger pile, we have to take out the candy with the lowest value. So the arithmetic sum of the candies in the larger pile is the sum of all candies minus the value of the lowest!
Therefore, we have:
If the xor of all candies is zero, it is solvable
If it is solvable, sum is sum of entire list - lowest value.
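Those two rules fit in a few lines of Python. This is a sketch with a made-up function name, best_share, which returns None for the unsolvable case rather than whatever output format the contest required.

```python
from functools import reduce
from operator import xor

def best_share(candies):
    """Largest pile value if an equal-XOR split exists, else None."""
    if reduce(xor, candies, 0) != 0:
        return None                       # XOR of all candies is non-zero: not solvable
    return sum(candies) - min(candies)    # give away only the cheapest candy

print(best_share([3, 5, 6]))
print(best_share([1, 2, 3, 4, 5]))
```

This runs in linear time, so 1000 candies are no problem.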

The best possible searching technique for a large range of input data

There are 1 billion numbers from 1 to 1 billion, but one number is missing. They are all randomly available. How would you find the missing one in the best possible way?
Does "randomly available" mean they are randomly distributed throughout the array (i.e. 5984, 1, 10937658, 20 ...)?
These are only theoretical considerations, but if the numbers are sorted 1,2,3...1B you can split the range into two parts, 1 ... 0.5B and 0.5B ... 1B, then check how many elements are in the first group: if there are fewer than 0.5B elements, the missing value is between 1 and 0.5B; if there are 0.5B elements, the missing value is between 0.5B and 1B. Repeat the process until you find the missing value.
I do not know whether this is a very quick way, but it is certainly faster than checking each value :D
Maybe it puts you on the road
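A sketch of that halving idea in Python, assuming the numbers fit in a list; find_missing_by_counting is a hypothetical name. Counting how many values fall in the lower half does not actually need sorted input, so this version partitions the unsorted candidates at each step.

```python
def find_missing_by_counting(numbers, n):
    """Find the one value in 1..n absent from numbers by repeatedly halving the range."""
    lo, hi = 1, n
    candidates = list(numbers)
    while lo < hi:
        mid = (lo + hi) // 2
        low_half = [x for x in candidates if x <= mid]
        if len(low_half) < mid - lo + 1:
            # Fewer values than slots in lo..mid: the gap is in the lower half.
            candidates, hi = low_half, mid
        else:
            candidates = [x for x in candidates if x > mid]
            lo = mid + 1
    return lo

print(find_missing_by_counting([5, 1, 10, 3, 8, 2, 9, 4, 6], 10))
```

Each round scans half as many values as the previous one, so the total work is linear in the input size.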
If limited memory is a concern:
Start by XORing all the input numbers together. Then XOR the result with every number from 1 to 1B. The number that remains is the missing number.
Something like this:
Input-1 XOR Input-2 XOR Input-3 XOR ... XOR Input-last XOR 1 XOR 2 XOR ... XOR 1B.
If you have ample memory, sort all the numbers and search sequentially.
The first approach is O(N) while the second is O(N log N).
Smaller set example:
1 xor 3 xor 1 xor 2 xor 3 => 2
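For the limited-memory case, a Python sketch that consumes the input as a stream, so only one running value is kept; the function name find_missing_streaming and the use of an arbitrary iterable are assumptions.

```python
def find_missing_streaming(stream, n):
    """XOR every value from the stream with every value in 1..n, using O(1) extra memory."""
    acc = 0
    for value in stream:
        acc ^= value
    for k in range(1, n + 1):
        acc ^= k
    return acc

# The smaller set example: inputs 1 and 3 out of 1..3.
print(find_missing_streaming(iter([1, 3]), 3))
```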

Resources