How to compute one's complement using Ruby's bitwise operators?

What I want:
assert_equal 6, ones_complement(9) # 1001 => 0110
assert_equal 0, ones_complement(15) # 1111 => 0000
assert_equal 2, ones_complement(1) # 01 => 10
The size of the input isn't fixed at, say, 4 bits or 8 bits; rather, it's a binary stream.
What I see:
v = "1001".to_i(2) => 9
There's a bit flipping operator ~
(~v).to_s(2) => "-1010"
sprintf("%b", ~v) => "..10110"
~v => -10
I think it's got something to do with one bit being used to store the sign or something... can someone explain this output? How do I get a one's complement without resorting to string manipulation, like cutting the last n characters from the sprintf output to get "0110", or replacing 0 with 1 and vice versa?

Ruby just stores a (signed) number. The internal representation of this number is not relevant: it might be a Fixnum, Bignum or something else. Therefore, the number of bits in a number is also undefined: it is just a number, after all. This is contrary to, for example, C, where an int will probably be 32 bits (fixed).
So what does the ~ operator do, then? Well, just something like:
class Numeric
  def ~
    return -self - 1
  end
end
...since that's what '~' represents when looking at 2's complement numbers.
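You can check that identity directly:
~9      # => -10
-9 - 1  # => -10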
So what is missing from your problem statement is the number of bits you want to flip: a 32-bit ~ is different from the generic ~ that Ruby gives you.
Now if you just want to flip the lowest n bits, you can do something like:
class Numeric
  def ones_complement(bits)
    self ^ ((1 << bits) - 1)
  end
end
...but you do have to specify the number of bits to flip. And this won't touch the sign bit, since that one stays out of XOR's reach :)
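With an explicit width, the examples from the question work out:
9.ones_complement(4)   # => 6  (1001 => 0110)
15.ones_complement(4)  # => 0  (1111 => 0000)
1.ones_complement(2)   # => 2  (01 => 10)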

It sounds like you only want to flip four bits (the length of your input), so you probably want to XOR with 0b1111.

One problem with your method is that your expected answer is only true if you only flip the four significant bits: 1001 -> 0110.
But the number is stored with leading zeros, and the ~ operator flips all the leading bits too: 00001001 -> 11110110. Then the leading 1 is interpreted as the negative sign.
You really need to specify what the function is supposed to do with numbers like 0b101 and 0b11011 before you can decide how to implement it. If you only ever want to flip 4 bits you can do v^0b1111, as suggested in another answer. But if you want to flip all significant bits, it gets more complicated.
edit
Here's one way to flip all the significant bits:
def maskbits n
  # OR the number with shifted copies of itself until every bit
  # below the highest set bit is 1
  b = 1
  prev = n
  mask = prev | (prev >> 1)
  while mask != prev
    prev = mask
    mask |= (mask >> (b *= 2))
  end
  mask
end

def ones_complement n
  n ^ maskbits(n)
end
This gives
p ones_complement(9).to_s(2) #>>"110"
p ones_complement(15).to_s(2) #>>"0"
p ones_complement(1).to_s(2) #>>"0"
This does not give your desired output for ones_complement(1), because it treats 1 as "1", not "01". I don't know how the function could infer how many leading zeros you want without taking the width as an argument.
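If you're willing to pass the width in explicitly, the ambiguity disappears; a minimal sketch (the method name is mine):
def ones_complement_of_width(n, width)
  n ^ ((1 << width) - 1)
end
ones_complement_of_width(1, 2)  # => 2  (01 => 10)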

If you're working with strings you could do:
s = "0110"
s.gsub(/[01]/) { |bit| bit == "1" ? "0" : "1" }  # => "1001"
(Note the regexp literal: the original gsub("\d") passes a plain string, which matches the literal character "d", not a digit.)
If you're working with numbers, you'll have to define the number of significant bits, because:
0110 = 6; 1001 = 9
 110 = 6;  001 = 1
Even ignoring the sign, you'll have to deal with this ambiguity somehow.

What you are doing (using the ~ operator) is indeed a one's complement. You are getting values you don't expect because of the way Ruby interprets the number.
What you actually need to do will depend on what you are using this for. That is to say, why do you need a 1's complement?

Remember that you are already getting the one's complement with ~ if you pass in a Fixnum: the number of bits representing the number is a fixed quantity in the interpreter, so there are leading 0s in front of the binary representation of the number 9 (binary 1001). You can find this number of bits by examining the size of any Fixnum (the answer is returned in bytes):
1.size #=> 4
2147483647.size #=> 4
~ is also defined on Bignum. In that case it behaves as if all the bits specified in the Bignum were inverted and preceded by an infinite string of 1s. You can conceivably shove your bitstream into a Bignum and invert the whole thing; you will, however, need to know the size of the bitstream before inverting it to get a useful result out afterwards.
To answer the question as you pose it right off the bat: find the largest power of 2 less than or equal to your input, double it, subtract 1, then XOR the result with your input; this always yields the one's complement of just the significant bits of the input number.
def sig_ones_complement(num)
  significant_bits = num.to_s(2).length
  next_smallest_pow_2 = 2**(significant_bits - 1)
  xor_mask = (2 * next_smallest_pow_2) - 1
  return num ^ xor_mask
end
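For example:
sig_ones_complement(9)   # => 6  (1001 => 0110)
sig_ones_complement(15)  # => 0
sig_ones_complement(1)   # => 0  (again 0, since the width "01" can't be inferred from the value 1)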

Related

Prove XOR doesn't work for finding a missing number (interview question)?

Interview question: you're given a file of roughly one billion unique numbers, each of which is a 32-bit quantity. Find a number not in the file.
When I was approaching this question, I tried a few examples with 3-bit and 4-bit numbers. For the examples I tried, I found that when I XOR'd the set of numbers, I got a correct answer:
import functools

a = [0,1,2]           # missing 3
b = [1,2,3]           # missing 0
c = [0,1,2,3,4,5,6]   # missing 7
d = [0,1,2,3,5,6,7]   # missing 4

functools.reduce((lambda x, y: x ^ y), a)  # returns 3
functools.reduce((lambda x, y: x ^ y), b)  # returns 0
functools.reduce((lambda x, y: x ^ y), c)  # returns 7
functools.reduce((lambda x, y: x ^ y), d)  # returns 4
However, when I coded this up and submitted it, it failed the test cases.
My question is: in an interview setting, how can I confirm or rule out with certainty that an approach like this is not a viable solution?
In all your examples, the array contains every number in the range except exactly one. That's why XOR worked. Try testing with inputs that don't have that property.
For the problem itself, you can construct a missing number by taking the minority value of each bit position.
EDIT
Why XOR worked on your examples:
When you XOR all the numbers from 0 to 2^n - 1, the result is 0 (each bit position holds exactly 2^(n-1) ones, an even number, so every bit cancels). So if you take out one number and XOR all the rest, the result is the number you took out, because XORing that number with the XOR of all the rest must give 0.
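A quick check of that cancellation (in Ruby):
(0...8).reduce(:^)                        # => 0  (all of 0..7)
(0...8).reject { |x| x == 5 }.reduce(:^)  # => 5  (the number taken out)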
Assuming a 64-bit system with more than 4 GB of free memory, I would read the numbers into an array of 32-bit integers. Then I would loop through the numbers up to 32 times.
Similar to an inverse "Mastermind" game, I would construct a missing number bit by bit. In every loop, I count the numbers that match the bits chosen so far and have 0 as the next bit, and those that have 1. Then I append whichever bit value occurs less frequently. Once the count reaches zero, I have a missing number (a Ruby sketch follows the example below).
Example:
The numbers in decimal/binary are
1 = 01
2 = 10
3 = 11
There is one number with most-significant-bit 0 and two numbers with 1. Therefore, I take 0 as most significant bit.
In the next round, I have to match 00 and 01. This immediately leads to 00 as missing number.
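A minimal Ruby sketch of that minority-bit construction (find_missing and its in-memory candidates array are illustrative; a real solution over a billion numbers would count matches in repeated passes over the file instead):

def find_missing(numbers, bits)
  prefix = 0
  candidates = numbers
  (bits - 1).downto(0) do |i|
    ones, zeros = candidates.partition { |x| x[i] == 1 }  # Integer#[] reads bit i
    if ones.length < zeros.length
      prefix |= 1 << i      # minority bit is 1: fix it in the prefix
      candidates = ones
    else
      candidates = zeros    # minority bit is 0: the prefix bit stays 0
    end
    break if candidates.empty?  # no number matches the prefix any more
  end
  prefix
end

find_missing([1, 2, 3], 2)  # => 0, as in the example above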
Another approach would be to use a random number generator. Since the file holds roughly 10^9 of the 2^32 (about 4.3 * 10^9) possible values, a random guess is a missing number with probability of roughly 77%, so a handful of guesses, each verified with one pass over the file, will almost certainly find one.
Proof by counterexample: 3^4^5^6 = 4. If the input were [3,4,5,6], XOR would "find" 4, which is not missing at all.

Detect characters in a string

I'm playing with ruby on codewars. The task is to create a method that accepts a string and returns a string of length 26 of 1s and 0s. The 26 characters of the string correspond to each letter of the alphabet (upper or lower case) and is 1 if the letter is in the string, 0 if not. If an a or an A is in the string, the first character of the returned string is 1 otherwise 0, if b or B is, the second is 1, and so on. For instance:
change('a **& bZ') # => '11000000000000000000000001'
Solutions:
def change input
  ('a'..'z').to_a.join.gsub(/[#{input.scan(/[a-zA-Z]/).uniq.join}]/i, '1').gsub(/\D/, '0')
end
vs.
def change input
  ('a'..'z').map { |letter| input.downcase.include?(letter) ? '1' : '0' }.join
end
How can I tell which solution is more efficient? (There may well be even better ones.)
Let n be the number of letters in the input and m be the number of letters in the alphabet.
input.scan(/[a-zA-Z]/).uniq.join
is O(n) + O(n) + O(n). Fortunately, you are doing this only once (when the pattern passed to gsub is evaluated). Therefore, your complexity adds up to 2*O(m) + 3*O(n) + O(m) = O(max(n, m)).
On the other hand,
input.downcase.include?(letter)
is O(n), but it is executed for each letter in the alphabet, leaving you with O(m*n) + O(m) = O(m*n).
Therefore, the first solution is asymptotically better, as O(max(n, m)) < O(m*n).
That is unless you consider the number of letters in the alphabet a small constant, in which case they are both O(n) and it's just a matter of benchmarking.
You can see that both are linear:
Running 100_000 iterations on a random 1000-letter string gave the following results (using CRuby 2.2.2):
       user     system      total        real
  36.160000   0.000000  36.160000  ( 36.182512)
   3.910000   0.000000   3.910000  (  3.915191)
So in practice, the second solution is far superior.
It is also way more readable.
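For reference, here is a sketch of how such a benchmark can be set up (change_gsub and change_include are stand-in names for the two solutions above):

require 'benchmark'

input = Array.new(1000) { [*'a'..'z', *'A'..'Z', ' '].sample }.join
Benchmark.bm(10) do |bm|
  bm.report('gsub')    { 100_000.times { change_gsub(input) } }
  bm.report('include') { 100_000.times { change_include(input) } }
end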
Not really an answer to your question (which one is more efficient), but here's an idea that uses binary arithmetic and the ASCII table:
def change input
  res = 0
  input.each_byte { |c|
    res |= c.between?(97,122) ? 1<<(122-c) : c.between?(65,90) ? 1<<(90-c) : 0
  }
  "%026b" % res
end
s = "Portez ce vieux whisky au juge blond qui fume"
puts change s
This code uses the ASCII ranges 97-122 for lower-case letters and 65-90 for upper-case letters. each_byte yields the ASCII code c of each byte. If a letter is lower case (for example x, code 120), 122-c gives 122-120 = 2, which is the position of the corresponding bit. 1<<2 shifts the bits of the number 1 two places to the left and you obtain 100 (binary); then the bitwise operator | (OR) with res gives 0 | 100 = 100, i.e. 00000000000000000000000100 once the leading zeros are added.
Advantages: the string is parsed only once, there's no need to create an intermediate array, and only one string manipulation is needed (the formatted string at the end). The algorithm only uses operations that a processor can execute very quickly.
Notes:
This code can deal with UTF-8 strings without modification, since the bytes of multibyte characters all have values of 0x80 or above and therefore never fall into the letter ranges.
For better performance, you can replace the between?(...,...) calls with simple number comparisons:
res |= c>96 ? c<123 ? 1<<(122-c) : 0 : c<91 ? c>64 ? 1<<(90-c) : 0 : 0
With this change, this code is at least 2X faster than your second way.

Complex Numbers Seemingly Arising from Non-Complex Logarithms

I have a simple program, written in TI-BASIC, that converts a number from base 10 to base 2:
0->B
1->E
Input "DEC:",D
Repeat D=0
int(round(log(D)/log(2),1))->E
round(E)->E
B+10^E->B
D-2^E->D
End
Disp B
This will sometimes return the error ERR:DATA TYPE. I checked, and this is because the variable D will sometimes become a complex number. I am not sure how this happens.
This happens with seemingly random numbers, like 5891570. It happens with this number, but not with something close to it like 5891590, which is strange. It also happens with 1e30, but not with 1e25. Another example is 1111111111111111, but not 1111111111111120.
I haven't tested this thoroughly, and don't see any pattern in these numbers. Any help would be appreciated.
The error happens because you round the logarithm to one decimal place before taking the integer part. If log(D)/log(2) is something like 8.99, you will round E up rather than down, so 2^9 is subtracted from D instead of 2^8; in the next iteration D is negative, and its logarithm is complex. Let's walk through your code when D is 511, whose base-2 logarithm is about 8.9971:
Repeat D=0 ;Executes first iteration without checking whether D=0
log(D)/log(2 ;8.9971
round(Ans,1 ;9.0
int(Ans ;9.0
round(Ans)->E ;E = 9.0
B+10^E->B ;B = 1 000 000 000
D-2^E->D ;D = 511-512 = -1
End ;loops again, since D≠0
---next iteration:----
log(D ;log(-1) = 1.364i; throws ERR:NONREAL ANS in Real mode
Rounding the logarithm any more severely than nine decimal places (nine digits is the default for round( without a "digits" argument) is completely unnecessary, as on my TI-84+ rounding errors do not accumulate: round(int(log(2^X-1)/log(2)) returns X-1 and round(int(log(2^X)/log(2)) returns X for all integer X≤28, which is high enough that precision would be lost anyway in other parts of the calculation.
To fix your code, simply round only once, and only to nine places. I've also removed the unnecessary double-initialization of E, removed your close-parens (it's still legal code!), and changed the Repeat (which always executes one loop before checking the condition D=0) to a While loop to prevent ERR:DOMAIN when the input is 0.
0->B
Input "DEC:",D
While D
int(round(log(D)/log(2->E
B+10^E->B
D-2^E->D
End
B ;on the last line, so it prints implicitly
Don't expect either your code or my fix to work correctly for D > 2^13 or so, because your calculator can only store 14 digits in its internal representation of any number. You'll lose digits when you store the result into B!
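As an aside, the same algorithm is easy to express in Ruby (a sketch; Integer#bit_length finds the position of the highest set bit exactly, sidestepping the floating-point rounding that caused the original bug):

def dec_to_bin(d)
  b = 0
  while d != 0
    e = d.bit_length - 1  # exponent of the highest set bit
    b += 10**e            # record it as a decimal digit of B
    d -= 2**e             # clear that bit
  end
  b
end

dec_to_bin(9)  # => 1001

Ruby's integers are arbitrary precision, so the 14-digit limitation above is specific to the calculator.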
Now for a trickier, optimized way of computing the binary representation (still only works for D < 2^13):
Input D
int(2fPart(D/2^cumSum(binomcdf(13,0
.1sum(Ans10^(cumSum(1 or Ans

Scope of variables and the digits function

My question is twofold:
1) As far as I understand, constructs like for loops introduce scope blocks; however, I'm having some trouble with a variable that is defined outside of said construct. The following code depicts an attempt to extract the digits of a number and place them in an array.
n = 654068
l = length(n)
a = Int64[]
for i in 1:(l-1)
    temp = n/10^(l-i)
    if temp < 1 # ith digit is 0
        a = push!(a,0)
    else # ith digit is != 0
        push!(a,floor(temp))
        # update n
        n = n - a[i]*10^(l-i)
    end
end
# last digit
push!(a,n)
The code executes fine, but when I look at the a array I get this result
julia> a
0-element Array{Int64,1}
I thought that anything that goes on inside the for loop is invisible to the outside, unless I'm operating on variables defined outside the for loop. Moreover, I thought that by using the ! syntax I would operate directly on a, but this does not seem to be the case. I would be grateful if anyone could explain to me how this works :)
2) My second question is about the syntax used when documenting functions. There is apparently a function called digits that extracts the digits of a number and puts them in an array; using the help function I get:
julia> help(digits)
Base.digits(n[, base][, pad])
Returns an array of the digits of "n" in the given base,
optionally padded with zeros to a specified size. More significant
digits are at higher indexes, such that "n ==
sum([digits[k]*base^(k-1) for k=1:length(digits)])".
Can anyone explain how to interpret the information given about functions in Julia? How am I to read digits(n[, base][, pad])? How does one correctly call the digits function? Surely it can't be literally digits(40125[, 10])?
I'm unable to reproduce your result; running your code gives me:
julia> a
1-element Array{Int64,1}:
654068
There are a few mistakes and inefficiencies in the code:
length(n) doesn't give the number of digits in n, but always returns 1 (currently, numbers are iterable and act as a sequence containing a single element: the number itself). So l is 1, the range 1:(l-1) is empty, and the for loop never runs.
/ between integers does floating-point division. For extracting digits, you're better off with div(x,y), which does integer division.
There's no reason to write a = push!(a,x), since push! modifies a in place; it is equivalent to writing push!(a,x); a = a.
There's no reason to treat digits that are zero specially; they are handled just fine by the general case.
Your description of scoping in Julia seems to be correct, I think that it is the above which is giving you trouble.
You could use something like
n = 654068
a = Int64[]
while n != 0
    push!(a, n % 10)
    n = div(n, 10)
end
reverse!(a)
This loop extracts the digits in opposite order to avoid having to figure out the number of digits in advance, and uses the modulus operator % to extract the least significant digit. It then uses reverse! to get them in the order you wanted, which should be pretty efficient.
About the documentation for digits: [, base] just means that base is an optional parameter. The signature should probably be written digits(n[, base[, pad]]), since it's not possible to specify pad without also specifying base. Also note that digits returns the least significant digit first; that is exactly what we would get by removing the reverse! from the code above.
Is this cheating?:
n = 654068
nstr = string(n)
a = map(x -> x |> string |> int, collect(nstr))
outputs:
6-element Array{Int64,1}:
6
5
4
0
6
8

Bitwise comparison of binary representation of two numbers

I was looking for a way to flip all of the bits in an arbitrary-sized number (i.e. one with an arbitrary number of bits), and thought of just negating it. When I printed out
p ~0b1010 == 0b0101
It said false. I'm probably using the wrong comparison operator though. What's the proper way to check that two binary numbers are equal in representation?
Ruby's ~ is not a fixed-width bit flip: it negates the number in a conceptually infinite two's-complement representation (returning -x - 1), so ~0b1010 is -0b1011, not 0b0101.
To flip just the bits you care about, you need to XOR with a mask that has a 1 in every significant bit position.
Also, you can't flip the bits of an arbitrary number without first defining how many bits you are flipping. This example shows you why:
> 0b000001 ^ 0b1
=> 0
> 0b000001 ^ 0b11
=> 2
> 0b000001 ^ 0b111
=> 6
> 0b000001 ^ 0b1111
=> 14
What you can do is define the width to be the minimum number of bits needed to represent your number. This is most likely not what you want; however, the following code can do it for you:
def negate_arbitrary_number(x)
  # size is the number of significant bits in x
  size = 0
  while (x >> size) != 0
    size += 1
  end
  # a mask with `size` bits, all set to 1
  mask = ("1" * size).to_i(2)
  # xor your number
  x ^ mask
end
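For example:
negate_arbitrary_number(0b1010)  # => 5, i.e. 0b0101
so the comparison from the question, negate_arbitrary_number(0b1010) == 0b0101, comes out true.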
or this code:
def negate_arbitrary_number(x)
  x.to_s(2).unpack("U*").map { |b| b == 49 ? 48 : 49 }.pack("U*")
end
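Note that this second version returns a String of flipped bits ("0101"), not an Integer; tack .to_i(2) onto the result if you need a number. An equivalent, arguably simpler, string-based sketch:
x.to_s(2).tr("01", "10").to_i(2)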
you might want to do a simple benchmark to test it.
