Interview question: you're given a file of roughly one billion unique numbers, each of which is a 32-bit quantity. Find a number not in the file.
When I was approaching this question, I tried a few examples with 3-bit and 4-bit numbers. For the examples I tried, I found that when I XOR'd the set of numbers, I got a correct answer:
import functools
a = [0,1,2] # missing 3
b = [1,2,3] # missing 0
c = [0,1,2,3,4,5,6] # missing 7
d = [0,1,2,3,5,6,7] # missing 4
functools.reduce((lambda x, y: x^y), a) # returns 3
functools.reduce((lambda x, y: x^y), b) # returns 0
functools.reduce((lambda x, y: x^y), c) # returns 7
functools.reduce((lambda x, y: x^y), d) # returns 4
However, when I coded this up and submitted it, it failed the test cases.
My question is: in an interview setting, how can I confirm or rule out with certainty that an approach like this is a viable solution?
In all your examples, the array is missing exactly one number. That's why XOR worked. Try not to test with the same property.
For the problem itself, you can construct a number by taking the minority of each bit.
EDIT
Why XOR worked on your examples:
When you take the XOR of all the numbers from 0 to 2^n - 1, the result is 0 for n ≥ 2 (each bit position contains exactly 2^(n-1) ones, which is an even count). So if you take out one number and XOR all the rest, the result is the number you took out, because XORing that number with the result of all the rest must give 0.
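The argument above can be checked on a small range. The following is a minimal sketch; the helper name `xor_all` and the choice of 4-bit values are illustrative assumptions, not part of the original answer. It shows that XOR recovers the removed value only when exactly one value is missing.

```python
from functools import reduce
from operator import xor

def xor_all(nums):
    """XOR every value in nums together (0 for an empty list)."""
    return reduce(xor, nums, 0)

n = 4
full = list(range(2 ** n))   # every 4-bit value, 0 .. 15
full_xor = xor_all(full)     # each bit column holds an even number of ones

# Remove exactly one value: the XOR of the rest equals the removed value.
one_missing = [v for v in full if v != 11]
recovered = xor_all(one_missing)

# Remove two values: the XOR is only 5 ^ 11, which is a value still in the list.
two_missing = [v for v in full if v not in (5, 11)]
misleading = xor_all(two_missing)

print(full_xor, recovered, misleading)
```

With two values removed, the XOR result is itself present in the list, which is why the approach fails on the real problem.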
Assuming a 64-bit system with more than 4 GB of free memory, I would read the numbers into an array of 32-bit integers. Then I would loop through the numbers up to 32 times.
Similar to an inverse "Mastermind" game, I would construct a missing number bit by bit. In every pass, I count all the numbers that match the bits I have chosen so far followed by a 0, and all those that match followed by a 1. Then I append the bit that occurs less frequently. Once that count reaches zero, I have a missing number.
Example:
The numbers in decimal/binary are
1 = 01
2 = 10
3 = 11
There is one number with most-significant-bit 0 and two numbers with 1. Therefore, I take 0 as most significant bit.
In the next round, I have to match 00 and 01. This immediately leads to 00 as missing number.
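The bit-by-bit construction can be sketched as follows. This is an illustration under stated assumptions, not the answerer's code: the function name `find_missing` and its `bits` parameter are hypothetical, the input values are assumed to be unique, and ties are resolved in favour of 0.

```python
def find_missing(nums, bits):
    """Return a bits-wide value absent from nums, or None if all values are present.

    Assumes the values in nums are unique and fit in `bits` bits.
    """
    prefix = 0
    for pos in range(bits - 1, -1, -1):          # most significant bit first
        zeros = ones = 0
        for v in nums:
            # Only count numbers that agree with the bits chosen so far.
            if (v >> (pos + 1)) == (prefix >> (pos + 1)):
                if (v >> pos) & 1:
                    ones += 1
                else:
                    zeros += 1
        if zeros <= ones:
            count = zeros                        # keep a 0 at this position
        else:
            count = ones
            prefix |= 1 << pos                   # keep a 1 at this position
        if count == 0:
            # Nothing in nums starts with this prefix; the lower bits stay 0.
            return prefix
    return None

print(find_missing([1, 2, 3], 2))
```

Each pass reads the whole input once, so at most `bits` passes are needed, matching the "up to 32 times" estimate for 32-bit numbers.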
Another approach would be to use a random number generator. With roughly one billion of the 2^32 (about 4.29 billion) possible values present, a random guess is absent from the file about 77% of the time.
Proof by counterexample: 3^4^5^6=4.
On Page 140 of Programming Pearls, 2nd Edition, Jon proposed an implementation of sets with bit vectors.
We'll turn now to two final structures that exploit the fact that our sets represent integers. Bit vectors are an old friend from Column 1. Here are their private data and functions:
enum { BITSPERWORD = 32, SHIFT = 5, MASK = 0x1F };
int n, hi, *x;
void set(int i) { x[i>>SHIFT] |= (1<<(i & MASK)); }
void clr(int i) { x[i>>SHIFT] &= ~(1<<(i & MASK)); }
int test(int i) { return x[i>>SHIFT] & (1<<(i & MASK)); }
As I gathered, the central idea of a bit vector to represent an integer set, as described in Column 1, is that the i-th bit is turned on if and only if the integer i is in the set.
But I am really at a loss about how the above three functions work, and the book doesn't give an explanation.
I can only work out that i & MASK extracts the lower 5 bits of i, while i>>SHIFT shifts i 5 bits to the right.
Could anybody elaborate on these algorithms? Bit operations always seem like a mystery to me. :(
Bit Fields and You
I'll use a simple example to explain the basics. Say you have an unsigned integer with four bits:
[0][0][0][0] = 0
You can represent any number here from 0 to 15 by converting it to base 2. Say we have the right end be the smallest:
[0][1][0][1] = 5
So the first bit adds 1 to the total, the second adds 2, the third adds 4, and the fourth adds 8. For example, here's 8:
[1][0][0][0] = 8
So What?
Say you want to represent a binary state in an application-- if some option is enabled, if you should draw some element, and so on. You probably don't want to use an entire integer for each one of these- it'd be using a 32 bit integer to store one bit of information. Or, to continue our example in four bits:
[0][0][0][1] = 1 = ON
[0][0][0][0] = 0 = OFF //what a huge waste of space!
(Of course, the problem is more pronounced in real life since 32-bit integers look like this:
[0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0] = 0
The answer to this is to use a bit field. We have a collection of properties (usually related ones) which we will flip on and off using bit operations. So, say, you might have 4 different lights on a piece of hardware that you want to be on or off.
3 2 1 0
[0][0][0][0] = 0
(Why do we start with light 0? I'll explain this in a second.)
Note that this is an integer, and is stored as an integer, but is used to represent multiple states for multiple objects. Crazy! Say we turn lights 2 and 1 on:
3 2 1 0
[0][1][1][0] = 6
The important thing you should note here: There's probably no obvious reason why lights 2 and 1 being on should equal six, and it may not be obvious how we would do anything with this scheme of information storage. It doesn't look more obvious if you add more bits:
3 2 1 0
[1][1][1][0] = 0xE //what?
Why do we care about this? Do we have exactly one state for each number between 0 and 15? How are we going to manage this without some insane series of switch statements? Ugh...
The Light at the End
So if you've worked with binary arithmetic a bit before, you might realize that the relationship between the numbers on the left and the numbers on the right is, of course, base 2. That is:
1*(2^3) + 1*(2^2) + 1*(2^1) + 0*(2^0) = 0xE
So each light is present in the exponent of each term of the equation. If the light is on, there is a 1 next to its term- if the light is off, there is a zero. Take the time to convince yourself that there is exactly one integer between 0 and 15 that corresponds to each state in this numbering scheme.
Bit operators
Now that we have this done, let's take a second to see what bitshifting does to integers in this setup.
[0][0][0][1] = 1
When you shift bits to the left or the right in an integer, it literally moves the bits left and right. (Note: I 100% disavow this explanation for negative numbers! There be dragons!)
1<<2 = 4
[0][1][0][0] = 4
4>>1 = 2
[0][0][1][0] = 2
You will encounter similar behavior when shifting numbers represented with more than one bit. Also, it shouldn't be hard to convince yourself that x>>0 or x<<0 is just x. Doesn't shift anywhere.
This probably explains the naming scheme of the Shift operators to anyone who wasn't familiar with them.
Bitwise operations
This representation of numbers in binary can also be used to shed some light on the operations of bitwise operators on integers. Each bit in the first number is xor-ed, and-ed, or or-ed with its fellow number. Take a second to venture to wikipedia and familiarize yourself with the function of these Boolean operators - I'll explain how they function on numbers but I don't want to rehash the general idea in great detail.
...
Welcome back! Let's start by examining the effect of the OR (|) operator on two integers, stored in four bit.
OR OPERATOR ON:
[1][0][0][1] = 0x9
[1][1][0][0] = 0xC
________________
[1][1][0][1] = 0xD
Tough! This is a close analogue to the truth table for the boolean OR operator. Notice that each column ignores the adjacent columns and simply fills in the result column with the result of the first bit and the second bit OR'd together. Note also that the value of anything or'd with 1 is 1 in that particular column. Anything or'd with zero remains the same.
The table for AND (&) is interesting, though somewhat inverted:
AND OPERATOR ON:
[1][0][0][1] = 0x9
[1][1][0][0] = 0xC
________________
[1][0][0][0] = 0x8
In this case we do the same thing- we perform the AND operation with each bit in a column and put the result in that bit. No column cares about any other column.
Important lesson about this, which I invite you to verify by using the diagram above: anything AND-ed with zero is zero. Also, equally important- nothing happens to numbers that are AND-ed with one. They stay the same.
The final table, XOR, has behavior which I hope you all find predictable by now.
XOR OPERATOR ON:
[1][0][0][1] = 0x9
[1][1][0][0] = 0xC
________________
[0][1][0][1] = 0x5
Each bit is being XOR'd with its column, yadda yadda, and so on. But look closely at the first row and the second row. Which bits changed? (Half of them.) Which bits stayed the same? (No points for answering this one.)
The bit in the first row is being changed in the result if (and only if) the bit in the second row is 1!
The one lightbulb example!
So now we have an interesting set of tools we can use to flip individual bits. Let's go back to the lightbulb example and focus only on the first lightbulb.
0
[?] //We don't know if it's one or zero while coding
We know that we have an operation that can always make this bit equal to one- the OR 1 operator.
0|1 = 1
1|1 = 1
So, ignoring the rest of the bulbs, we could do this
4_bit_lightbulb_integer |= 1;
and know for sure that we did nothing but set the first lightbulb to ON.
3 2 1 0
[0][0][0][?] = 0 or 1? //4_bit_lightbulb_integer
[0][0][0][1] = 1
________________
[0][0][0][1] = 0x1
Similarly, we can AND the number with zero. Well- not quite zero- we don't want to affect the state of the other bits, so we will fill them in with ones.
I'll use the unary (one-argument) operator for bit negation. The ~ (NOT) bitwise operator flips all of the bits in its argument. ~(0X1):
[0][0][0][1] = 0x1
________________
[1][1][1][0] = 0xE
We will use this in conjunction with the AND bit below.
Let's do 4_bit_lightbulb_integer & 0xE
3 2 1 0
[0][1][0][?] = 4 or 5? //4_bit_lightbulb_integer
[1][1][1][0] = 0xE
________________
[0][1][0][0] = 0x4
We're seeing a lot of integers on the right-hand-side which don't have any immediate relevance. You should get used to this if you deal with bit fields a lot. Look at the left-hand side. The bit on the right is always zero and the other bits are unchanged. We can turn off light 0 and ignore everything else!
Finally, you can use the XOR bit to flip the first bit selectively!
3 2 1 0
[0][1][0][?] = 4 or 5? //4_bit_lightbulb_integer
[0][0][0][1] = 0x1
________________
[0][1][0][*] = 4 or 5?
We don't actually know what the value of * is now, just that it flipped from whatever ? was.
Combining Bit Shifting and Bitwise operations
The interesting fact about these two operations is that, taken together, they let you manipulate selected bits.
[0][0][0][1] = 1 = 1<<0
[0][0][1][0] = 2 = 1<<1
[0][1][0][0] = 4 = 1<<2
[1][0][0][0] = 8 = 1<<3
Hmm. Interesting. I'll mention the negation operator here (~) as it's used in a similar way to produce the needed bit values for ANDing stuff in bit fields.
[1][1][1][0] = 0xE = ~(1<<0)
[1][1][0][1] = 0xD = ~(1<<1)
[1][0][1][1] = 0xB = ~(1<<2)
[0][1][1][1] = 0x7 = ~(1<<3)
Are you seeing an interesting relationship between the shift value and the corresponding lightbulb position of the shifted bit?
The canonical bitshift operators
As alluded to above, we have an interesting, generic method for turning on and off specific lights with the bit-shifters above.
To turn on a bulb, we generate the 1 in the right position using bit shifting, and then OR it with the current lightbulb positions. Say we want to turn on light 3, and ignore everything else. We need to get a bit shifting operation that ORs
3 2 1 0
[?][?][?][?] //all we know about these values at compile time is where they are!
and 0x8
[1][0][0][0] = 0x8
Which is easy, thanks to bitshifting! We'll pick the number of the light and switch the value over:
1<<3 = 0x8
and then:
4_bit_lightbulb_integer |= 0x8;
3 2 1 0
[1][?][?][?] //the ? marks have not changed!
And we can guarantee that the bit for the 3rd lightbulb is set to 1 and that nothing else has changed.
Clearing a bit works similarly- we'll use the negated bits table above to, say, clear light 2.
~(1<<2) = 0xB = [1][0][1][1]
4_bit_lightbulb_integer & 0xB:
3 2 1 0
[?][?][?][?]
[1][0][1][1]
____________
[?][0][?][?]
The XOR method of flipping bits is the same idea as the OR one.
So the canonical methods of bit switching are this:
Turn on the light i:
4_bit_lightbulb_integer|=(1<<i)
Turn off light i:
4_bit_lightbulb_integer&=~(1<<i)
Flip light i:
4_bit_lightbulb_integer^=(1<<i)
Wait, how do I read these?
In order to check a bit we can simply zero out all of the bits except for the one we care about. We'll then check to see if the resulting value is greater than zero- since this is the only value that could possibly be nonzero, it will make the entire integer nonzero if and only if it is nonzero. For example, to check bit 2:
1<<2:
[0][1][0][0]
4_bit_lightbulb_integer:
[?][?][?][?]
1<<2 & 4_bit_lightbulb_integer:
[0][?][0][0]
Remember from the previous examples that the value of ? didn't change. Remember also that anything AND 0 is 0. So we can say for sure that if this value is greater than zero, the bit at position 2 is set and the lightbulb is on. Similarly, if the lightbulb is off, the value of the entire thing will be zero.
(You can alternately shift the entire value of 4_bit_lightbulb_integer over by i bits and AND it with 1. I don't remember off the top of my head if one is faster than the other but I doubt it.)
So the canonical checking function:
Check if bit i is on:
if (4_bit_lightbulb_integer & 1<<i) {
//do whatever
}
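The canonical operations above can be tried out directly. Here is a minimal Python sketch; the function names (`turn_on`, `turn_off`, `flip`, `is_on`) and the variable `lights` are illustrative assumptions standing in for the lightbulb integer used in this answer.

```python
def turn_on(lights, i):
    """Set bit i with OR; all other bits are left alone."""
    return lights | (1 << i)

def turn_off(lights, i):
    """Clear bit i by ANDing with a mask that has a 0 only at position i."""
    return lights & ~(1 << i)

def flip(lights, i):
    """Toggle bit i with XOR."""
    return lights ^ (1 << i)

def is_on(lights, i):
    """Isolate bit i with AND; the result is nonzero only if that bit is set."""
    return (lights & (1 << i)) != 0

lights = 0b0000
lights = turn_on(lights, 1)
lights = turn_on(lights, 2)        # lights 2 and 1 are on
after_off = turn_off(lights, 2)    # only light 1 remains on
after_flip = flip(lights, 0)       # light 0 toggled on
print(bin(lights), bin(after_off), bin(after_flip))
```

Python integers are not limited to four bits, so the sketch relies on the caller keeping `i` in the range 0 to 3.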
The specifics
Now that we have a complete set of tools for bitwise operations, we can look at the specific example here. This is basically the same idea- except a much more concise and powerful way of executing it. Let's look at this function:
void set(int i) { x[i>>SHIFT] |= (1<<(i & MASK)); }
From the canonical implementation I'm going to make a guess that this is trying to set some bits to 1! Let's take an integer and look at what's going on here if I feed the value 0x32 (50 in decimal) into i:
x[0x32>>5] |= (1<<(0x32 & 0x1f))
Well, that's a mess.. let's dissect this operation on the right. For convenience, pretend there are 24 more irrelevant zeros, since these are both 32 bit integers.
...[0][0][0][1][1][1][1][1] = 0x1F
...[0][0][1][1][0][0][1][0] = 0x32
________________________
...[0][0][0][1][0][0][1][0] = 0x12
It looks like everything is being cut off at the boundary on top where 1s turn into zeros. This technique is called Bit Masking. Interestingly, the boundary here restricts the resulting values to be between 0 and 31... Which is exactly the number of bit positions we have for a 32 bit integer!
x[0x32>>5] |= (1<<(0x12))
Let's look at the other half.
...[0][0][1][1][0][0][1][0] = 0x32
Shift five bits to the right:
...[0][0][0][0][0][0][0][1] = 0x01
Note that this transformation exactly destroyed all information from the first part of the function: we have 32-5 = 27 remaining bits which could be nonzero. This indicates which of 2^27 integers in the array of integers is selected. So the simplified equation is now:
x[1] |= (1<<0x12)
This just looks like the canonical bit-setting operation! We've just chosen which integer in the array to apply it to.
So the idea is to use the first 27 bits to pick an integer to shift and the last five bits indicate which bit of the 32 in that integer to shift.
The key to understanding what's going on is to recognize that BITSPERWORD = 2^SHIFT. Thus, x[i>>SHIFT] finds which 32-bit element of the array x has the bit corresponding to i. (By shifting i 5 bits to the right, you're simply dividing by 32.) Once you have located the correct element of x, the lower 5 bits of i can then be used to find which particular bit of x[i>>SHIFT] corresponds to i. That's what i & MASK does; by shifting 1 by that number of bits, you move the bit corresponding to 1 to the exact position within x[i>>SHIFT] that corresponds to the ith bit in x.
Here's a bit more of an explanation:
Imagine that we want capacity for N bits in our bit vector. Since each int holds 32 bits, we will need (N + 31) / 32 int values for our storage (that is, N/32 rounded up). Within each int value, we will adopt the convention that bits are ordered from least significant to most significant. We will also adopt the convention that the first 32 bits of our vector are in x[0], the next 32 bits are in x[1], and so forth. Here's the memory layout we are using (showing the bit index in our bit vector corresponding to each bit of memory):
+----+----+-------+----+----+----+
x[0]: | 31 | 30 | . . . | 02 | 01 | 00 |
+----+----+-------+----+----+----+
x[1]: | 63 | 62 | . . . | 34 | 33 | 32 |
+----+----+-------+----+----+----+
etc.
Our first step is to allocate the necessary storage capacity:
x = new int[(N + BITSPERWORD - 1) >> SHIFT]
(We could make provision for dynamically expanding this storage, but that would just add complexity to the explanation.)
Now suppose we want to access bit i (either to set it, clear it, or just to know its current value). We need to first figure out which element of x to use. Since there are 32 bits per int value, this is easy:
subscript for x = i / 32
Making use of the enum constants, the x element we want is:
x[i >> SHIFT]
(Think of this as a 32-bit-wide window into our N-bit vector.) Now we have to find the specific bit corresponding to i. Looking at the memory layout, it's not hard to figure out that the first (rightmost) bit in the window corresponds to bit index 32 * (i >> SHIFT). (The window starts after i >> SHIFT slots in x, and each slot has 32 bits.) Since that's the first bit in the window (position 0), the bit we're interested in is at position
i - (32 * (i >> SHIFT))
in the window. With a little experimenting, you can convince yourself that, for non-negative i, this expression is always equal to i % 32 (actually, that's one definition of the mod operator) which, in turn, is always equal to i & MASK. Since this last expression is the fastest way to calculate what we want, that's what we'll use.
From here, the rest is pretty simple. We start with a single bit in the least-significant position of the window (that is, the constant 1), and move it to the left by i & MASK bits to get it to the position in the window corresponding to bit i in the bit vector. This is where the expression
1 << (i & MASK)
comes from. With the bit now moved to where we want it, we can use this as a mask to set, clear, or query the value of the bit at that position in x[i>>SHIFT] and we know that we're actually setting, clearing, or querying the value of bit i in our bit vector.
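To see the word index and bit mask working together, here is a small Python sketch of the same three operations. It is an illustration, not the book's code: the class name `BitVector` is an assumption, a Python list of integers stands in for the C int array, and `test` returns a boolean rather than the raw masked value.

```python
BITSPERWORD = 32
SHIFT = 5
MASK = 0x1F

class BitVector:
    def __init__(self, n):
        # One 32-bit word per 32 bits of capacity, rounded up.
        self.x = [0] * ((n + BITSPERWORD - 1) >> SHIFT)

    def set(self, i):
        # i >> SHIFT picks the word, 1 << (i & MASK) picks the bit inside it.
        self.x[i >> SHIFT] |= 1 << (i & MASK)

    def clr(self, i):
        self.x[i >> SHIFT] &= ~(1 << (i & MASK))

    def test(self, i):
        return (self.x[i >> SHIFT] & (1 << (i & MASK))) != 0

bv = BitVector(100)
bv.set(50)                    # 50 >> 5 == 1 and 50 & 31 == 18
print(len(bv.x), bv.x[1] == 1 << 18, bv.test(50), bv.test(49))
```

Setting bit 50 touches only `x[1]`, at bit position 18 within that word, which matches the 0x32 walk-through earlier in this thread.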
If you store your bits in an array of n words, you can imagine them laid out as a matrix with n rows and 32 columns (BITSPERWORD):
     31                                        0
0    xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx
1    xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx
2    xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx
     ....
n    xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx
To get the k-th bit you divide k by 32. The (integer) result gives you the row (word) the bit is in; the remainder gives you which bit it is within the word.
Dividing by 2^p can be done simply by shifting p positions to the right. The remainder can be obtained by taking the p rightmost bits (i.e. the bitwise AND with (2^p - 1)).
In C terms:
#define div32(k) ((k) >> 5)
#define mod32(k) ((k) & 31)
#define word_the_bit_is_in(k) div32(k)
#define bit_within_word(k) mod32(k)
Hope it helps.
I have a very big number, on the order of a thousand decimal digits, and I have to convert this to its binary representation. The numbers are stored as strings.
Since few languages have a basic data type to handle numbers this big, I see no easy way to turn this into an integral value for which I could convert it.
Could someone please help me out here? What would be a viable approach for doing this?
If this is a genuine problem, there are plenty of BigNum libraries out there to assist, such as the MPIR library.
If it's something where you can't use a third-party library, it's still relatively easy. You don't actually need a complex BigNum library for this, you only need one operation: divide by two.
Here's how you do it. Start with an empty stack of binary digits. Then loop until the number is "0" (yes, that's still a string). If the last digit of the number is odd, push 1 on to the stack, otherwise push 0. Then divide the number by two and restart the loop.
Once the loop is finished (number is "0"), pop the digits off the stack one at a time and print them. There you go.
Oh, yeah, the divide-by-two, that is a rather important piece of the puzzle :-)
Let's start with "12345". Here's the process you follow, in pseudo-code.
Set next_additive to 0.
For every digit in number (starting at the left):
    Set additive to next_additive.
    If the digit is odd, set next_additive to 5, else set it to 0.
    Divide the digit by two (truncating) then add additive.
Remove leading zero if necessary (if it starts with 0 but is not just 0).
This can be done by processing the actual string one character at a time.
Starting with 1 (from 12345), additive is 0, number is odd, so next_additive is 5. Divide 1 by 2 and add additive of 0, you get 0: 02345.
Next digit 2, additive is 5, number is even, so next_additive is 0. Divide 2 by 2 and add additive of 5, you get 6: 06345.
Next digit 3, additive is 0, number is odd, so next_additive is 5. Divide 3 by 2 and add additive of 0, you get 1: 06145.
Next digit 4, additive is 5, number is even, so next_additive is 0. Divide 4 by 2 and add additive of 5, you get 7: 06175.
Next digit 5, additive is 0, number is odd, so next_additive is 5. Divide 5 by 2 and add additive of 0, you get 2: 06172.
Strip off leading zeros: 6172. Ignore the next additive since you're truncating the result.
And there you have it: 12345 / 2 = 6172.
By way of example, here's a Python implementation of this algorithm. First the support routine for checking if a string-number is odd (keep in mind this isn't meant to be Pythonic code, it's just to show how it could be done; there are almost certainly better ways to do this in Python but they won't necessarily map well to another language):
def oddsToOne(s):
    if s.endswith('1'): return 1
    if s.endswith('3'): return 1
    if s.endswith('5'): return 1
    if s.endswith('7'): return 1
    if s.endswith('9'): return 1
    return 0
Then another support routine for dividing a string-number by two:
def divByTwo(s):
    new_s = ''
    add = 0
    for ch in s:
        new_dgt = (ord(ch) - ord('0')) // 2 + add
        new_s = '%s%d' % (new_s, new_dgt)
        add = oddsToOne(ch) * 5
    if new_s != '0' and new_s.startswith('0'):
        new_s = new_s[1:]
    return new_s
And, finally, some actual code to make a binary string from the decimal string:
num = '12345'
if num == '0':
    stack = '0'
else:
    stack = ''
    while num != '0':
        stack = '%d%s' % (oddsToOne(num), stack)
        num = divByTwo(num)
print(stack)
Note that if you wanted to actually use this to populate real bits (rather than make a string of bits), it's a simple matter to change what happens in the if and else clauses.
As stated, it's probably not the most efficient or beautiful Python code you could come up with but it's simply meant to show the process, not be some well-engineered production-ready piece of code. The output is (with some added stuff below to show what's going on):
12345
11000000111001
||      |||  |
||      |||  +- 1
||      ||+---- 8
||      |+----- 16
||      +------ 32
|+------------- 4096
+-------------- 8192
                =====
                12345
Because this works on the string representation of the numbers, there is no arbitrary numeric limit such as the size of a 64-bit integer. Some example values are (slightly reformatted into 32-digit chunks for readability):
123456781234567812345678
=> 11010001001001001101100000011011
   01110110001110110010110110101111
   0111101001110
99999999999999999999999999999999
99999999999999999999999999999999
99999999999999999999999999999999
9999
=> 10010010010011010110100100101100
   10100110000110111110011101011000
   01011001001111000010011000100110
   01110000010111111001110001010110
   01110010000001000111000100001000
   11010011111001010101010110010010
   00011000010001010100000101110100
   01111000011111111111111111111111
   11111111111111111111111111111111
   11111111111111111111111111111111
   1111111111111
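In a language with built-in arbitrary-precision integers, the string algorithm can be cross-checked against the native conversion. The sketch below is a compact restatement of the same halving idea, with hypothetical names `div_by_two` and `to_binary`; it compares the result with Python's own `bin(int(...))`.

```python
def div_by_two(s):
    """Halve a non-negative decimal string, truncating the result."""
    out = []
    carry = 0
    for ch in s:
        digit = int(ch)
        out.append(str(digit // 2 + carry))
        carry = 5 if digit % 2 else 0      # an odd digit adds 5 to the next one
    return ''.join(out).lstrip('0') or '0'

def to_binary(s):
    """Convert a non-negative decimal string to a binary string."""
    if s == '0':
        return '0'
    bits = []
    while s != '0':
        bits.append('1' if int(s[-1]) % 2 else '0')   # last digit odd -> low bit 1
        s = div_by_two(s)
    return ''.join(reversed(bits))

for sample in ['12345', '123456781234567812345678', '9' * 100]:
    assert to_binary(sample) == bin(int(sample))[2:]
print(to_binary('12345'))
```

The built-in conversion is the practical choice where it exists; the string version is only needed when no big-integer type or library is available.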
If there is any number in the range [0 .. 2^64] which cannot be generated by any XOR composition of one or more numbers from a given set, is there an efficient method which prints at least one of the unreachable numbers, or terminates with the information that there are no unreachable numbers?
Does this problem have a name? Is it similar to another problem or do you have any idea, how to solve it?
Each number can be treated as a vector in the vector space (Z/2)^64 over Z/2. You basically want to know if the vectors given span the whole space, and if not, to produce one not spanned (except that the span always includes the zero vector – you'll have to special case this if you really want one or more). This can be accomplished via Gaussian elimination.
Over this particular vector space, Gaussian elimination is pretty simple. Start with an empty set for the basis. Do the following until there are no more numbers. (1) Throw away all of the numbers that are zero. (2) Scan the lowest bits set of the remaining numbers (lowest bit for x is x & ~(x - 1)) and choose one with the lowest order bit set. (3) Put it in the basis. (4) Update all of the other numbers with that same bit set by XORing it with the new basis element. No remaining number has this bit or any lower order bit set, so we terminate after 64 iterations.
At the end, if there are 64 elements, then the subspace is everything. Otherwise, we went fewer than 64 iterations and skipped a bit: the number with only this bit on is not spanned.
To special-case zero: zero is unreachable if and only if we never throw away a number (i.e., the input vectors are linearly independent).
Example over 4-bit numbers
Start with 0110, 0011, 1001, 1010. Choose 0011 because it has the ones bit set. Basis is now {0011}. Other vectors are {0110, 1010, 1010}; note that the first 1010 = 1001 XOR 0011.
Choose 0110 because it has the twos bit set. Basis is now {0011, 0110}. Other vectors are {1100, 1100}.
Choose 1100. Basis is now {0011, 0110, 1100}. Other vectors are {0000}.
Throw away 0000. We're done. We skipped the high order bit, so 1000 is not in the span.
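The elimination steps can be sketched in a few lines of Python. This is an illustration of the procedure described above rather than a reference implementation: the names `lowest_bit` and `xor_span_gap` are hypothetical, and the zero special case is left out, so only nonzero unreachable values are reported.

```python
def lowest_bit(x):
    """Isolate the lowest set bit of a positive integer."""
    return x & ~(x - 1)

def xor_span_gap(numbers, bits=64):
    """Return (basis, gap): gap is a single-bit value outside the XOR span, or None."""
    remaining = [v for v in numbers if v]      # (1) throw away zeros
    basis = []
    pivots = 0                                 # bit positions claimed by basis elements
    while remaining:
        pivot = min(remaining, key=lowest_bit) # (2) lowest-order set bit wins
        low = lowest_bit(pivot)
        basis.append(pivot)                    # (3) put it in the basis
        pivots |= low
        # (4) XOR the pivot into every number with that bit set, then drop zeros.
        #     The pivot itself becomes zero here and is dropped as well.
        remaining = [w for w in ((v ^ pivot) if v & low else v for v in remaining) if w]
    for pos in range(bits):
        if not pivots & (1 << pos):
            return basis, 1 << pos             # a skipped bit is not spanned
    return basis, None

basis, gap = xor_span_gap([0b0110, 0b0011, 0b1001, 0b1010], bits=4)
print([format(b, '04b') for b in basis], format(gap, '04b'))
```

On the 4-bit example this reproduces the basis {0011, 0110, 1100} and reports 1000 as unreachable.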
As rap music points out you can think of the problem as finding a base in a vector space. However, it is not necessary to actually solve it completely, just to find if it is possible to do or not, and if not: give an example value (that is a binary vector) that can not be described in terms of the supplied set.
This can be done in O(n^2) in terms of the size of the input set. This should be compared to Gauss elimination which is O(n^3), http://en.wikipedia.org/wiki/Gaussian_elimination.
64 bits are no problem at all. With the example python code below 1000 bits with a set with 1000 random values from 0 to 2^1000-1 takes about a second.
Instead of performing Gauss elimination it's enough to find out if we can rewrite the matrix of all bits on triangular form, such as: (for the 4 bit version:)
original        triangular
1110  14        1110  14
1011  11         111   7
 111   7          11   3
  11   3           1   1
   1   1           0   0
The solution works like this: first, all original values with the same most significant bit are placed together in a list of lists. For our example:
[[14,11],[7],[3],[1],[]]
The last empty entry represents that there were no zeros in the original list. Now, take a value from the first entry and replace that entry with a list containing only that number:
[[14],[7],[3],[1],[]]
and then store the xor of the kept number with all the removed entries at the right place in the vector. For our case we have 14^11 = 5 so:
[[14],[7,5],[3],[1],[]]
The trick is that we do not need to scan and update all other values, just the values with the same most significant bit.
Now process the item 7,5 in the same way. Keep 7, add 7^5 = 2 to the list:
[[14],[7],[3,2],[1],[]]
Now 3,2 leaves [3] and adds 1 :
[[14],[7],[3],[1,1],[]]
And 1,1 leaves [1] and adds 0 to the last entry allowing values with no set bit:
[[14],[7],[3],[1],[0]]
If in the end the vector contains at least one number at each vector entry (as in our example) the base is complete and any number fits.
Here's the complete code:
# return leading bit index or -1 for 0.
# example 1 -> 0
# example 9 -> 3
def leadbit(v):
    # there are other ways, yes...
    return len(bin(v))-3 if v else -1

def examinebits(baselist,nbitbuckets):
    # index 1 is least significant bit.
    # index 0 represent the value 0
    bitbuckets=[[] for x in range(nbitbuckets+1)]
    for j in baselist:
        bitbuckets[leadbit(j)+1].append(j)
    for i in reversed(range(len(bitbuckets))):
        if bitbuckets[i]:
            # leave just the first value of all in bucket i
            bitbuckets[i],newb=[bitbuckets[i][0]],bitbuckets[i][1:]
            # distribute the subleading values into their buckets
            for ni in newb:
                q=bitbuckets[i][0]^ni
                lb=leadbit(q)+1
                if lb:
                    bitbuckets[lb].append(q)
                else:
                    bitbuckets[0]=[0]
        else:
            v=2**(i-1) if i else 0
            print("bit missing: %d. Impossible value: %s == %d"%(i-1,bin(v),v))
            return (bitbuckets,[i])
    return (bitbuckets,[])
Example use: (8 bit)
import random
nbits=8
basesize=8
topval=int(2**nbits)
# random set of values to try:
basel=[random.randint(0,topval-1) for dummy in range(basesize)]
bl,ii=examinebits(basel,nbits)
bl is now the triangular list of values, up to the point where it was not possible (in that case). The missing bit (if any) is found in ii[0].
For the following tried set of values: [242, 242, 199, 197, 177, 177, 133, 36] the triangular version is:
base value: 10110001 177
base value:  1110110 118
base value:   100100 36
base value:    10000 16
first missing bit: 3 val: 8
( the below values where not completely processed )
base value:       10 2
base value:        1 1
base value:        0 0
The above list were printed like this:
for i in range(len(bl)):
    bb=bl[len(bl)-i-1]
    if ii and len(bl)-ii[0] == i:
        print("example missing bit:", ii[0]-1, "val:", 2**(ii[0]-1))
        print("( the below values where not completely processed )")
    if len(bb):
        b=bb[0]
        print(("base value: %"+str(nbits)+"s") % bin(b)[2:], b)
What I want:
assert_equal 6, ones_complement(9) # 1001 => 0110
assert_equal 0, ones_complement(15) # 1111 => 0000
assert_equal 2, ones_complement(1) # 01 => 10
The size of the input isn't fixed at 4 bits or 8 bits; rather, it's a binary stream.
What I see:
v = "1001".to_i(2) => 9
There's a bit flipping operator ~
(~v).to_s(2) => "-1010"
sprintf("%b", ~v) => "..10110"
~v => -10
I think it's got something to do with one bit being used to store the sign or something... Can someone explain this output? How do I get a one's complement without resorting to string manipulations like cutting the last n chars from the sprintf output to get "0110" or replacing 0 with 1 and vice versa?
Ruby just stores a (signed) number. The internal representation of this number is not relevant: it might be a Fixnum, Bignum or something else. Therefore, the number of bits in a number is also undefined: it is just a number after all. This is in contrast to, for example, C, where an int will probably be 32 bits (fixed).
So what does the ~ operator do then? Well, just something like:
class Numeric
  def ~
    return -self - 1
  end
end
...since that's what '~' represents when looking at 2's complement numbers.
So what is missing from your input statement is the number of bits you want to switch: a 32-bits ~ is different from a generic ~ like it is in Ruby.
Now if you just want to bit-flip n-bits you can do something like:
class Numeric
  def ones_complement(bits)
    self ^ ((1 << bits) - 1)
  end
end
...but you do have to specify the number of bits to flip. And this won't affect the sign flag, since that one is outside your reach with XOR :)
It sounds like you only want to flip four bits (the length of your input) - so you probably want to XOR with 1111.
See this question for why.
One problem with your method is that your expected answer is only true if you only flip the four significant bits: 1001 -> 0110.
But the number is stored with leading zeros, and the ~ operator flips all the leading bits too: 00001001 -> 11110110. Then the leading 1 is interpreted as the negative sign.
You really need to specify what the function is supposed to do with numbers like 0b101 and 0b11011 before you can decide how to implement it. If you only ever want to flip 4 bits you can do v^0b1111, as suggested in another answer. But if you want to flip all significant bits, it gets more complicated.
edit
Here's one way to flip all the significant bits:
def maskbits n
  b=1
  prev=n;
  mask=prev|(prev>>1)
  while (mask!=prev)
    prev=mask;
    mask|=(mask>>(b*=2))
  end
  mask
end
def ones_complement n
  n^maskbits(n)
end
This gives
p ones_complement(9).to_s(2) #>>"110"
p ones_complement(15).to_s(2) #>>"0"
p ones_complement(1).to_s(2) #>>"0"
This does not give your desired output for ones_complement(1), because it treats 1 as "1" not "01". I don't know how the function could infer how many leading zeros you want without taking the width as an argument.
If you're working with strings you could do:
s = "0110"
s.gsub(/\d/) {|bit| bit == "1" ? "0" : "1"}
If you're working with numbers, you'll have to define the number of significant bits because:
0110 = 6; 1001 = 9;
110 = 6; 001 = 1;
Even ignoring the sign, you'll probably have to handle this.
What you are doing (using the ~) operator, is indeed a one's complement. You are getting those values that you are not expecting because of the way the number is interpreted by Ruby.
What you actually need to do will depend on what you are using this for. That is to say, why do you need a 1's complement?
Remember that you are getting the one's complement right now with ~ if you pass in a Fixnum: the number of bits which represent the number is a fixed quantity in the interpreter and thus there are leading 0's in front of the binary representation of the number 9 (binary 1001). You can find this number of bits by examining the size of any Fixnum. (the answer is returned in bytes)
1.size #=> 4
2147483647.size #=> 4
~ is also defined over Bignum. In this case it behaves as if all of the bits which are specified in the Bignum were inverted, and then as if there were an infinite string of 1's in front of that Bignum. You can, conceivably, shove your bitstream into a Bignum and invert the whole thing. You will, however, need to know the size of the bitstream prior to inversion to get a useful result out after it is inverted.
To answer the question as you pose it right off the bat, you can find the largest power of 2 not exceeding your input, double it, subtract 1, then XOR the result with your input; for a positive input this always gives a ones complement of just the significant bits in your input number.
def sig_ones_complement(num)
  significant_bits = num.to_s(2).length
  next_smallest_pow_2 = 2**(significant_bits-1)
  xor_mask = (2*next_smallest_pow_2)-1
  return num ^ xor_mask
end
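The same masking idea carries over to other languages with unbounded integers. As a hedged illustration in Python rather than Ruby (where `~9` is likewise `-10`), the sketch below takes an optional width; the function name and the `bits` parameter are assumptions made for this example.

```python
def ones_complement(n, bits=None):
    """Flip the low `bits` bits of a non-negative integer n.

    When bits is omitted, only the significant bits of n are flipped.
    """
    if bits is None:
        bits = max(n.bit_length(), 1)   # treat 0 as a single bit
    return n ^ ((1 << bits) - 1)        # XOR with a mask of `bits` ones

print(ones_complement(9), ones_complement(15), ones_complement(1, bits=2))
```

Passing the width explicitly is what makes the `01 => 10` case work, since the leading zero cannot be inferred from the number alone.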