Prove XOR doesn't work for finding a missing number (interview question)? - algorithm

Interview question: you're given a file of roughly one billion unique numbers, each of which is a 32-bit quantity. Find a number not in the file.
When I was approaching this question, I tried a few examples with 3-bit and 4-bit numbers. For the examples I tried, I found that when I XOR'd the set of numbers, I got a correct answer:
import functools
a = [0,1,2] # missing 3
b = [1,2,3] # missing 0
c = [0,1,2,3,4,5,6] # missing 7
d = [0,1,2,3,5,6,7] # missing 4
functools.reduce((lambda x, y: x^y), a) # returns 3
functools.reduce((lambda x, y: x^y), b) # returns 0
functools.reduce((lambda x, y: x^y), c) # returns 7
functools.reduce((lambda x, y: x^y), d) # returns 4
However, when I coded this up and submitted it, it failed the test cases.
My question is: in an interview setting, how can I confirm or rule out with certainty that an approach like this is a viable solution?

In all your examples, the array is missing exactly one number from a complete range 0 to 2^n - 1. That's why XOR worked. Avoid test cases that all share the same special property.
For the problem itself, you can construct a number by taking the minority of each bit.
EDIT
Why XOR worked on your examples:
When you XOR all the numbers from 0 to 2^n - 1 (for n ≥ 2), the result is 0, because each bit position holds exactly 2^(n-1) ones, which is an even count. So if you take out one number and XOR all the rest, the result is the number you took out, because XORing that number back in must give 0.
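A small check of the argument above, as a sketch: `xor_all` is a helper name introduced here for illustration. It shows that XOR recovers the missing value only when the input is a complete range 0 to 2^n - 1 minus one element, and that it can return a value that is actually present otherwise.

```python
from functools import reduce
from operator import xor

def xor_all(values):
    """XOR every value together; the XOR of an empty input is 0."""
    return reduce(xor, values, 0)

# Complete range 0..7: every bit position holds an even number of ones.
full_range = xor_all(range(8))

# Complete range minus one element: the XOR equals the removed element.
minus_four = xor_all([0, 1, 2, 3, 5, 6, 7])

# Not a complete range: the XOR is a number that IS in the list.
counterexample = xor_all([3, 4, 5, 6])
```

Here `counterexample` is 4, which is present in the input, so XOR cannot be used as a general "find a missing number" method.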

Assuming a 64-bit system with more than 4 GB of free memory, I would read the numbers into an array of 32-bit integers. Then I would loop through the numbers up to 32 times.
Similar to an inverse "Mastermind" game, I would construct a missing number bit by bit. In every pass, I count the numbers that match the bits I have chosen so far followed by a 0, and those followed by a 1. Then I append the bit that occurs less frequently. Once the count reaches zero, I have a missing number.
Example:
The numbers in decimal/binary are
1 = 01
2 = 10
3 = 11
There is one number with most-significant-bit 0 and two numbers with 1. Therefore, I take 0 as most significant bit.
In the next round, I have to match 00 and 01. This immediately leads to 00 as missing number.
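The bit-by-bit construction can be sketched as follows. This is a minimal in-memory version, assuming the input values are unique, non-negative, and fewer than 2^bits in number; `find_missing` and its `bits` parameter are names chosen for illustration. It filters a list instead of re-scanning an array, but the counting logic is the same.

```python
def find_missing(numbers, bits=32):
    """Return a value in [0, 2**bits) that is absent from `numbers`.

    Assumes the values are unique and that there are fewer than 2**bits of them.
    """
    candidates = list(numbers)  # numbers that match the prefix chosen so far
    prefix = 0
    for pos in range(bits - 1, -1, -1):  # most significant bit first
        zeros = [n for n in candidates if not (n >> pos) & 1]
        ones = [n for n in candidates if (n >> pos) & 1]
        # Follow the less frequent bit; that half cannot be completely full.
        if len(zeros) <= len(ones):
            candidates = zeros
        else:
            prefix |= 1 << pos
            candidates = ones
    return prefix

example = find_missing([1, 2, 3], bits=2)
other = find_missing([3, 4, 5, 6], bits=3)
```

Because the smaller half is kept at every step, the number of matching candidates drops below 1 by the last bit, so the constructed prefix cannot be in the input.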
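Another approach would be to use a random number generator. With roughly one billion of the 2^32 (about 4.29 billion) possible values present, the chance that a first guess is not in the file is well above 50% (about 77%), although each guess still has to be checked against the file.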

Proof by counterexample: 3^4^5^6=4.

Related

Quick way to compute n-th sequence of bits of size b with k bits set?

I want to develop a way to represent all combinations of b bits with k bits set (equal to 1). Given an index, it should quickly produce the corresponding binary sequence, and the other way around too. For instance, the traditional approach I thought of would be to generate the numbers in order, like:
For b=4 and k=2:
0 - 0011
1 - 0101
2 - 0110
3 - 1001
4 - 1010
5 - 1100
If I am given the sequence '1010', I want to be able to quickly generate the number 4 as a response, and if I give the number 4, I want to be able to quickly generate the sequence '1010'. However I can't figure out a way to do these things without having to generate all the sequences that come before (or after).
It is not necessary to generate the sequences in that order, you could do 0-1001, 1-0110, 2-0011 and so on, but there has to be no repetition between 0 and the (combination of b choose k) - 1 and all sequences have to be represented.
How would you approach this? Is there a better algorithm than the one I'm using?
pkpnd's suggestion is on the right track, essentially process one digit at a time and if it's a 1, count the number of options that exist below it via standard combinatorics.
nCr() can be replaced by a table precomputation requiring O(n^2) storage/time. There may be another property you can exploit to reduce the number of nCr's you need to store by leveraging the absorption property along with the standard recursive formula.
Even with 1000's of bits, that table shouldn't be intractably large. Storing the answer also shouldn't be too bad, as 2^1000 is ~300 digits. If you meant hundreds of thousands, then that would be a different question. :)
import math

def nCr(n,r):
    return math.factorial(n) // math.factorial(r) // math.factorial(n-r)

def get_index(value):
    b = len(value)
    k = sum(c == '1' for c in value)
    count = 0
    for digit in value:
        b -= 1
        if digit == '1':
            if b >= k:
                count += nCr(b, k)
            k -= 1
    return count

print(get_index('0011')) # 0
print(get_index('0101')) # 1
print(get_index('0110')) # 2
print(get_index('1001')) # 3
print(get_index('1010')) # 4
print(get_index('1100')) # 5
Nice question, btw.
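The question also asks for the opposite direction, from index to sequence. A possible sketch of that inverse, using the same ordering as get_index above, is shown here; `get_sequence` is a hypothetical name, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def get_sequence(index, b, k):
    """Return the bit string of length b with k ones at the given rank.

    Assumes 0 <= index < comb(b, k), in the same order as get_index.
    """
    digits = []
    for remaining in range(b - 1, -1, -1):  # positions left after this one
        # Number of sequences that put a 0 here and still fit k ones later.
        below = comb(remaining, k)
        if k > 0 and index >= below:
            digits.append('1')
            index -= below
            k -= 1
        else:
            digits.append('0')
    return ''.join(digits)

all_sequences = [get_sequence(i, 4, 2) for i in range(6)]
```

Each step either skips past the `comb(remaining, k)` sequences that start with a 0 at this position or stays within them, so no earlier sequences need to be generated.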

Check the Number of Powers of 2

I have a number X and I want to find how many times 2 can be multiplied by itself without exceeding it.
For example:
N=7: the answer is 2 (2*2)
N=20: the answer is 4 (2*2*2*2)
Similarly, I want to find the next power of 2.
For example:
N=14: the answer is 16
Is there a bit hack for this that avoids for loops?
Something like the one-line check for whether X is a power of 2, (X & (X-1)) == 0?
GCC has a built-in function called __builtin_clz() that returns the number of leading zeros in an integer (the result is undefined for 0). So for example, assuming a 32-bit int, the expression p = 32 - __builtin_clz(n) will tell you how many bits are needed to store the integer n, and 1 << p will give you the next highest power of 2 (provided p<32, of course).
There are also equivalent functions that work with long and long long integers.
Alternatively, math.h defines a function called frexp() that returns the base-2 exponent of a double-precision number. This is likely to be less efficient because your integer will have to be converted to a double-precision value before it is passed to this function.
A number is a power of two if it has only a single '1' in its binary representation. For example, 2 = 00000010, 4 = 00000100, 8 = 00001000 and so on. So you can check it by counting the number of 1s in its bits. If the count is 1 then the number is a power of 2, and vice versa.
You can take help from here and here to avoid for loops for counting set bits.
If the count is not 1 (meaning the value is not a power of 2), take the position of its first set bit from the MSB; the next power of 2 is the value whose only set bit is at position + 1. For example, the number 3 = 00000011. Its first set bit from the MSB is the 2nd bit. Therefore the next power of 2 is the value with only the 3rd bit set, i.e. 00000100 = 4.
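For comparison, the same two quantities can be sketched in Python, where `int.bit_length()` plays the role of 32 - __builtin_clz(n). The function names are illustrative, and "next power of two" is taken here to mean strictly greater than n, matching the 1 << p expression above.

```python
def floor_log2(n):
    """Largest p such that 2**p <= n, for n >= 1."""
    if n < 1:
        raise ValueError("n must be positive")
    return n.bit_length() - 1

def next_power_of_two(n):
    """Smallest power of two strictly greater than n, for n >= 0."""
    return 1 << n.bit_length()

def is_power_of_two(n):
    """True when n has exactly one set bit."""
    return n > 0 and (n & (n - 1)) == 0
```

For example, floor_log2(20) is 4 and next_power_of_two(14) is 16. If an exact power of two should map to itself, `1 << (n - 1).bit_length()` is a common variant for n >= 1.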

How to make space complexity as O(1)

I am trying to answer the following question: You have an array of integers such that each integer is present an odd number of times, except 3 of them. Find the three numbers.
So far I have come up with this brute-force method:
public static void main(String[] args) {
    // TODO Auto-generated method stub
    int number[] = { 1, 6, 4, 1, 4, 5, 8, 8, 4, 6, 8, 8, 9, 7, 9, 5, 9 };
    FindEvenOccurance findEven = new FindEvenOccurance();
    findEven.getEvenDuplicates(number);
}

// Brute force
private void getEvenDuplicates(int[] number) {
    Map<Integer, Integer> map = new HashMap<Integer, Integer>();
    for (int i : number) {
        if (map.containsKey(i)) {
            // a XOR a XOR a ... (odd number of times) = a
            // a XOR a ... (even number of times) = 0
            int value = map.get(i) ^ i;
            map.put(i, value);
        } else {
            map.put(i, i);
        }
    }
    for (Entry<Integer, Integer> entry : map.entrySet()) {
        if (entry.getValue() == 0) {
            System.out.println(entry.getKey());
        }
    }
}
It works fine but is not efficient.
The output:
1
5
6
8
But the question specifies that we need to do this in O(1) space and O(N) time. For my solution, the time complexity is O(N) but the space is also O(N). Can someone suggest a better way of doing this with O(1) space?
Thanks.
I spent some time on this problem and it seems that I have found a solution. In any case, I hope the community will help me check the ideas listed below.
First of all, I claim that we can solve this problem when the number of non-paired integers is 1 or 2. With 1 non-paired integer we just need to find the XOR of all array elements, and that is the answer. With 2 non-paired integers the solution is more complicated, but it has already been discussed elsewhere. For example, you can find it here.
Now let's try to solve problem when the number of non-paired integers is equal to 3.
At the beginning we also calculate XOR of all elements. Let's denote it as X.
Consider the i-th bit of X. I assume that it's equal to 0. If it's equal to 1 the procedure is practically the same; we just swap 0 and 1.
So, if the i-th bit of X is 0, there are two possible situations. Either all non-paired integers have 0 in the i-th bit, or one non-paired integer has 0 in the i-th bit and the other two have 1. This follows from simple properties of XOR. So we have one or three non-paired integers with 0 in the i-th bit.
Now let's divide all elements into two groups. The first group holds the integers with 0 in the i-th bit position, the second holds the integers with 1 in the i-th bit position. Our first group therefore contains one or three non-paired integers.
How can we tell how many non-paired integers are in the first group? We just need to calculate the XOR of all elements in the second group. If it's equal to zero, then all non-paired integers are in the first group and we need to try another i. Otherwise only one non-paired integer is in the first group and the two others are in the second, and we can solve the problem separately for these two groups using the methods from the beginning of this answer.
The key observation is that there is an i such that one non-paired integer has an i-th bit that differs from the i-th bits of the two other non-paired integers. In this case the non-paired integers are split across both groups. If there were no such i, the non-paired integers would agree in every bit position and so be equal to each other, which is impossible according to the problem statement.
This solution can be implemented without any additional memory. Total complexity is linear with some constant depending on the number of bits in array elements.
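The procedure described in this answer can be sketched as below. It covers the variant the answer discusses: exactly three distinct values occur an odd number of times and every other value occurs an even number of times. The function name and the `bits` parameter are illustrative, and non-negative integers are assumed.

```python
def three_odd_values(nums, bits=32):
    """Find the three values with odd counts, using O(1) extra space.

    Assumes exactly three distinct non-negative values below 2**bits
    occur an odd number of times; all others occur an even number of times.
    """
    x = 0
    for v in nums:
        x ^= v  # XOR of the three targets
    for i in range(bits):
        xbit = (x >> i) & 1
        # The group whose i-th bit differs from X's holds 0 or 2 targets.
        t = 0
        for v in nums:
            if (v >> i) & 1 != xbit:
                t ^= v
        if t:  # two distinct targets are in this group, t is their XOR
            single = x ^ t      # the lone target in the other group
            low = t & -t        # a bit where the two targets differ
            a = 0
            for v in nums:
                if (v >> i) & 1 != xbit and v & low:
                    a ^= v
            return sorted([single, a, a ^ t])
    raise ValueError("input does not match the assumptions")

found = three_odd_values([1, 1, 2, 3, 5, 7, 7])
```

Each bit position costs one pass over the array, so the running time is linear with a constant factor that depends on the bit width, as the answer states.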
Unfortunately it is not possible to achieve such a solution with O(1) space and O(n) complexity if we use a strict sense of space, i.e. O(1) space is bound by the max space used in the input array.
In a weak sense of space, where one arbitrary large Integer number does still fit into O(1), you can just encode your counter into the bits of this one integer. Start with all bits set to 1. Toggle the n-th bit, when you encounter number n in the input array. All bits remaining 1 at the end represent the 3 numbers which were encountered an even number of times.
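A sketch of this "one arbitrarily large integer" idea in Python, whose integers are unbounded. Instead of starting from all ones, this version keeps a second integer that records which values were seen at all, so that values never encountered are not reported. That second mask is an addition made here, and the function name is illustrative. Non-negative inputs are assumed.

```python
def even_count_values(nums):
    """Return the values that occur a positive, even number of times."""
    parity = 0  # bit n is 1 when n has been seen an odd number of times
    seen = 0    # bit n is 1 when n has been seen at least once
    for n in nums:
        bit = 1 << n
        parity ^= bit
        seen |= bit
    even = seen & ~parity
    return [i for i in range(even.bit_length()) if (even >> i) & 1]

result = even_count_values([1, 6, 4, 1, 4, 5, 8, 8, 4, 6, 8, 8, 9, 7, 9, 5, 9])
```

The integers grow with the largest value in the input, which is why this only counts as O(1) space in the weak sense described above.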
There's two ways to look at your problem.
The first way, as a mathematical problem over an infinite set of integers, it seems unsolvable.
The second way, as a computing problem over a finite set of integers, you've already solved it (congratulations!). Why? Because the storage space is bounded by MAX_INT, independently of N.
NB: an obvious space optimization would be to store each value only once, erasing the entry on even counts; that saves half the space.
About the other answers by @Lashane and @SGM1: they also solve the "computing" problem, but are arguably less efficient than yours in most real-world scenarios. Why? Because they pre-allocate a 512 MB array instead of allocating proportionally to the number of distinct values in the array. As the array is likely to use far fewer than MAX_INT distinct values, you're likely to use much less than 512 MB, even if you store 32 bits per value instead of 1. And that's with 32-bit integers; with more bits the pre-allocated array would grow exponentially, whereas your solution depends only on the actual values in the array, so it is unaffected by the number of bits of the system (i.e. the max int value).
See also this and this for better (less space) algorithms.
Consider, for example, that the allowed numbers are 4 bits wide, so the range runs from 0 to 2^4-1, a constant 16 values. For every possible value we run over the whole array and XOR the occurrences of that value; if the result of the XOR is zero, we add the current value to the overall result. This solution is O(16N), which is O(N), and uses only one extra variable to evaluate the XOR of the current number, which is O(1) in terms of space complexity.
We can extend this method to our original problem, but it will have a very big constant factor in the running time, which grows exponentially with the number of bits allowed in the original input.
We can enhance this approach by first running over all elements to find the most significant bit over all the input data. Suppose it is the 10th bit; then the running time becomes O(2^10 N), which is also O(N).
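A minimal sketch of this constant-range scan is given below. It tracks the parity of the occurrence count rather than XORing the value itself, because XORing occurrences of the value 0 always gives 0, and it uses a `seen` flag so that values absent from the input are not reported. Both are adjustments made here, and the function name is illustrative.

```python
def even_occurrences(nums, bits):
    """Scan every candidate in [0, 2**bits) with O(1) extra state per pass."""
    result = []
    for candidate in range(1 << bits):
        parity = 0     # 1 when the candidate has been seen an odd number of times
        seen = False   # whether the candidate occurs at all
        for v in nums:
            if v == candidate:
                parity ^= 1
                seen = True
        if seen and parity == 0:
            result.append(candidate)
    return result

data = [1, 6, 4, 1, 4, 5, 8, 8, 4, 6, 8, 8, 9, 7, 9, 5, 9]
# Use the highest set bit in the input to bound the candidate range.
width = max(data).bit_length()
found = even_occurrences(data, width)
```

The result list is the output and is not counted as working space. The total work is 2^bits passes over the array, which is why this is only practical for small bit widths.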
another enhancement can be found in the below image, but still with the worst case complexity as discussed before.
finally I believe that, there exist another better solution for this problem but I decided to share my thought.
Edit:
the algorithm in the image may not be clear, here is some explanation to the algorithm.
It starts with the idea of dividing the elements according to their bits, in other words using the bits as a filter. At each stage we XOR the divided elements until the XOR result is zero; then it is worth checking that group one by one, as it will for sure contain at least one of the desired outputs. Alternatively, if two consecutive filters yield groups of the same size, we stop this filter. It will be clearer with the example below.
input: 1,6,4,1,4,5,8,8,4,6,8,8,9,7,9,5,9
we start by dividing the elements according to the Least significant bit.
1st bit zero : 6,4,4,8,8,4,6,8,8
6 xor 4 xor 4 xor 8 xor 8 xor 4 xor 6 xor 8 xor 8 = 4
so we will continue dividing this group according to the 2nd bit.
1st bit zero and 2nd bit zero : 4,4,4,8,8,8,8
4 xor 4 xor 4 xor 8 xor 8 xor 8 xor 8 = 4.
so we will continue dividing this group according to the 3rd bit.
1st bit zero and 2nd bit zero and 3rd bit zero : 8,8,8,8
8 xor 8 xor 8 xor 8 = 0
so we will go through every element under this filter as the result of xor is zero and we will add 8 to our result so far.
1st bit zero and 2nd bit zero and 3rd bit one : 4,4,4
4 xor 4 xor 4 = 4
1st bit zero and 2nd bit zero and 3rd bit one and 4th bit zero : 4,4,4
4 xor 4 xor 4 = 4.
so we will stop here, as this filter has the same size as the previous filter
now we will go back to the filter of 1st and 2nd bit
1st bit zero and 2nd bit one : 6,6
6 xor 6 = 0.
so we will go through every element under this filter as the result of xor is zero and we will add 6 to our result so far.
now we will go back to the filter of 1st bit
1st bit one : 9,5,9,7,9,1,1
now we will continue under this filter as the same procedure before.
for complete example see the above image.
Your outline of the problem and the example do not match. You say you're looking for 3 integers in your question, but the example shows 4.
I'm not sure this is possible without additional constraints. It seems to me that worst case size complexity will always be at least O(N-6) => O(N) without a sorted list and with the full set of integers.
If we started with sorted array, then yes, easy, but this constraint is not specified. Sorting the array ourselves will be too time or space complex.
My stab at an answer, using Lashane's proposal in a slightly different way:
byte[] negBits = new byte[268435456]; // 2^28 bytes = 2^31 (number of negative ints) / 8 (bits per byte)
byte[] posBits = new byte[268435456]; // ditto for the non-negative ints
int number[] = { 1, 6, 4, 1, 4, 5, 8, 8, 4, 6, 8, 8, 9, 7, 9, 5, 9 };
for (int num : number) {
    if (num < 0) {
        num = -(num + 1); // Integer.MIN_VALUE would be excluded without this + 1
        negBits[num >> 3] ^= (1 << (num & 7));
    } else {
        // grab the right byte to mess with (num >> 3), then
        // toggle the bit that represents the integer value (num & 7)
        posBits[num >> 3] ^= (1 << (num & 7));
    }
}
// Now the hard part: find which bits are still set after all the toggling
for (int i = 0; i < Integer.MAX_VALUE; i++) {
    if ((negBits[i >> 3] & (1 << (i & 7))) != 0) {
        System.out.print(" " + (-i - 1));
    }
    if ((posBits[i >> 3] & (1 << (i & 7))) != 0) {
        System.out.print(" " + i);
    }
}
As per the discussion in the comments, the following points are worth noting about this answer:
It assumes 32-bit Java ints.
Java arrays have an inherent length limit of Integer.MAX_VALUE.

Maximum xor of a range of numbers

I am grappling with the problem Codeforces 276D. Initially I used a brute-force approach, which obviously failed for large inputs (it started failing with inputs like 10000000000 20000000000). In the tutorial, Fcdkbear (the tutor for the contest) talks about a DP solution where a state is d[p][fl1][fr1][fl2][fr2].
Further in tutorial
We need to know which bits we can place into the binary representation of number a in the p-th position. We can place 0 if the following condition is true: the p-th bit of L is equal to 0, or the p-th bit of L is equal to 1 and variable fl1 shows that the current value of a is strictly greater than L. Similarly, we can place 1 if the following condition is true: the p-th bit of R is equal to 1, or the p-th bit of R is equal to 0 and variable fr1 shows that the current value of a is strictly less than R. Similarly, we can obtain which bits we can place into the binary representation of number b in the p-th position.
This is going over my head: when the i-th bit of L is 0, how come we can place a zero in a's i-th bit? If L and R are both in the same bucket (between 2^i boundaries, like 16 and 24), we will eventually place a 0 at the 4th bit, whereas we could place a 1 if a = 20, because the i-th bit of R is 0 and a < R. I am also wondering what the use of checking whether a > L is.
In essence I do not get the logic of
What states are
How do we recur
I know that might be an overkill but could someone explain it in descriptive manner as editorial is too short to explain anything.
I have already looked in here but suggested solution is different from one given in editorial. Also I know this can be solved with binary search but I am concerned with DP solution only
If I got the problem right: compare the bits of l and r from left (MSB) to right (LSB). As long as these bits are equal there is no freedom of choice; the same bits must appear in a and b. The first differing bit must be 1 in r and 0 in l, and it must appear the same way in a (0) and b (1). From there you can maximise the XOR result: simply use zeros for the remaining bits of b and ones for the remaining bits of a. That gives a+1 == b, and the XOR result is 2^n - 1, where n is the number of bit positions from the first differing bit down to the LSB.
I'm not following the logic as written above but the basic idea is to look bit by bit.
If L and R have different values in the same bit position then we have already found candidates that would maximize the xor'd value of that position (0 xor 1 = 1 xor 0 = 1). The other case to consider is whether the span of R-L is greater than the position value of that bit. If so then there must be two different values of A and B falling between L and R where that bit position has opposite values (as well as being able to generate any combinations of values in the lower bits.)
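The bit-by-bit argument in these answers reduces to a short closed form, sketched here in Python: find the highest bit where l and r differ and set that bit and every lower bit. This is the greedy approach, not the editorial's DP. `max_xor` and `brute_force` are names chosen for illustration, and the brute-force helper is only a cross-check for small ranges.

```python
def max_xor(l, r):
    """Maximum of a ^ b over l <= a <= b <= r."""
    # (l ^ r).bit_length() is the position just above the highest differing bit.
    return (1 << (l ^ r).bit_length()) - 1

def brute_force(l, r):
    """Reference answer by trying every pair; only for small ranges."""
    return max(a ^ b for a in range(l, r + 1) for b in range(a, r + 1))
```

For example, max_xor(8, 16) is 31: the highest differing bit is bit 4, so bits 4 down to 0 can all be set, with a = 15 and b = 16.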

Efficient method to get one number, which can't be generated from any XORing combination

If there is any number in the range [0 .. 2^64] which cannot be generated by any XOR composition of one or more numbers from a given set, is there an efficient method which prints at least one of the unreachable numbers, or terminates with the information that there are no unreachable numbers?
Does this problem have a name? Is it similar to another problem or do you have any idea, how to solve it?
Each number can be treated as a vector in the vector space (Z/2)^64 over Z/2. You basically want to know if the vectors given span the whole space, and if not, to produce one not spanned (except that the span always includes the zero vector – you'll have to special case this if you really want one or more). This can be accomplished via Gaussian elimination.
Over this particular vector space, Gaussian elimination is pretty simple. Start with an empty set for the basis. Do the following until there are no more numbers. (1) Throw away all of the numbers that are zero. (2) Scan the lowest bits set of the remaining numbers (lowest bit for x is x & ~(x - 1)) and choose one with the lowest order bit set. (3) Put it in the basis. (4) Update all of the other numbers with that same bit set by XORing it with the new basis element. No remaining number has this bit or any lower order bit set, so we terminate after 64 iterations.
At the end, if there are 64 elements, then the subspace is everything. Otherwise, we went fewer than 64 iterations and skipped a bit: the number with only this bit on is not spanned.
To special-case zero: zero is an option if and only if we never throw away a number (i.e., the input vectors are independent).
Example over 4-bit numbers
Start with 0110, 0011, 1001, 1010. Choose 0011 because it has the ones bit set. Basis is now {0011}. Other vectors are {0110, 1010, 1010}; note that the first 1010 = 1001 XOR 0011.
Choose 0110 because it has the twos bit set. Basis is now {0011, 0110}. Other vectors are {1100, 1100}.
Choose 1100. Basis is now {0011, 0110, 1100}. Other vectors are {0000}.
Throw away 0000. We're done. We skipped the high order bit, so 1000 is not in the span.
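The elimination steps above can be sketched as follows. This is a minimal version under stated assumptions: inputs are non-negative integers below 2^nbits, the function name is illustrative, and the zero special case discussed above is not handled (only nonzero unreachable values are reported).

```python
def find_unreachable(numbers, nbits=64):
    """Return a nonzero value not in the XOR span, or None if all are reachable."""
    remaining = list(numbers)
    pivot_bits = set()  # lowest set bit of each basis element
    while True:
        remaining = [x for x in remaining if x]  # (1) throw away zeros
        if not remaining:
            break
        # (2) choose a number whose lowest set bit is the lowest overall
        pivot = min(remaining, key=lambda x: x & -x)
        low = pivot & -pivot
        pivot_bits.add(low)                      # (3) put it in the basis
        remaining.remove(pivot)
        # (4) clear that bit from every other number that has it
        remaining = [x ^ pivot if x & low else x for x in remaining]
    for i in range(nbits):
        if (1 << i) not in pivot_bits:
            return 1 << i  # a skipped bit: this single-bit value is not spanned
    return None

example = find_unreachable([0b0110, 0b0011, 0b1001, 0b1010], nbits=4)
```

Here `x & -x` isolates the lowest set bit, equivalent to the x & ~(x - 1) expression in the answer. On the 4-bit example it returns 0b1000, matching the walkthrough.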
As rap music points out you can think of the problem as finding a base in a vector space. However, it is not necessary to actually solve it completely, just to find if it is possible to do or not, and if not: give an example value (that is a binary vector) that can not be described in terms of the supplied set.
This can be done in O(n^2) in terms of the size of the input set. This should be compared to Gauss elimination which is O(n^3), http://en.wikipedia.org/wiki/Gaussian_elimination.
64 bits are no problem at all. With the example python code below 1000 bits with a set with 1000 random values from 0 to 2^1000-1 takes about a second.
Instead of performing Gaussian elimination, it's enough to find out whether we can rewrite the matrix of all bits in triangular form, such as (for the 4-bit version):
original      triangular
1110  14      1110  14
1011  11       111   7
 111   7        11   3
  11   3         1   1
   1   1         0   0
The solution works like this: first, all original values with the same most significant bit are placed together in a list of lists. For our example:
[[14,11],[7],[3],[1],[]]
The last empty entry represents that there were no zeros in the original list. Now, take a value from the first entry and replace that entry with a list containing only that number:
[[14],[7],[3],[1],[]]
and then store the xor of the kept number with all the removed entries at the right place in the vector. For our case we have 14^11 = 5 so:
[[14],[7,5],[3],[1],[]]
The trick is that we do not need to scan and update all other values, just the values with the same most significant bit.
Now process the item 7,5 in the same way. Keep 7, add 7^5 = 2 to the list:
[[14],[7],[3,2],[1],[]]
Now 3,2 leaves [3] and adds 1 :
[[14],[7],[3],[1,1],[]]
And 1,1 leaves [1] and adds 0 to the last entry allowing values with no set bit:
[[14],[7],[3],[1],[0]]
If in the end the vector contains at least one number at each vector entry (as in our example) the base is complete and any number fits.
Here's the complete code:
# return leading bit index or -1 for 0.
# example 1 -> 0
# example 9 -> 3
def leadbit(v):
    # there are other ways, yes...
    return len(bin(v))-3 if v else -1

def examinebits(baselist,nbitbuckets):
    # index 1 is least significant bit.
    # index 0 represent the value 0
    bitbuckets=[[] for x in range(nbitbuckets+1)]
    for j in baselist:
        bitbuckets[leadbit(j)+1].append(j)
    for i in reversed(range(len(bitbuckets))):
        if bitbuckets[i]:
            # leave just the first value of all in bucket i
            bitbuckets[i],newb=[bitbuckets[i][0]],bitbuckets[i][1:]
            # distribute the subleading values into their buckets
            for ni in newb:
                q=bitbuckets[i][0]^ni
                lb=leadbit(q)+1
                if lb:
                    bitbuckets[lb].append(q)
                else:
                    bitbuckets[0]=[0]
        else:
            v=2**(i-1) if i else 0
            # single-argument print works in both Python 2 and Python 3
            print("bit missing: %d. Impossible value: %s == %d"%(i-1,bin(v),v))
            return (bitbuckets,[i])
    return (bitbuckets,[])
Example use: (8 bit)
import random
nbits=8
basesize=8
topval=int(2**nbits)
# random set of values to try:
basel=[random.randint(0,topval-1) for dummy in range(basesize)]
bl,ii=examinebits(basel,nbits)
bl is now the triangular list of values, up to the point where it was not possible (in that case). The missing bit (if any) is found in ii[0].
For the following tried set of values: [242, 242, 199, 197, 177, 177, 133, 36] the triangular version is:
base value: 10110001 177
base value: 1110110 118
base value: 100100 36
base value: 10000 16
first missing bit: 3 val: 8
( the below values where not completely processed )
base value: 10 2
base value: 1 1
base value: 0 0
The above list were printed like this:
for i in range(len(bl)):
    bb=bl[len(bl)-i-1]
    if ii and len(bl)-ii[0] == i:
        # single-argument print works in both Python 2 and Python 3
        print("example missing bit: %d val: %d" % (ii[0]-1, 2**(ii[0]-1)))
        print("( the below values where not completely processed )")
    if len(bb):
        b=bb[0]
        print(("base value: %"+str(nbits)+"s %d") % (bin(b)[2:], b))

Resources