How can I access a ruby integer's sign bit? - ruby

I would like to access a ruby integer's sign bit directly.
Ruby allows access to an int's bits through the [] operator:
1[0] -> 1
2[1] -> 1
I tried -1[-1] (to access the last one), but that doesn't work.

You can understand this by looking at the documentation for Fixnum#[] :
Returns the +n+th bit in the binary representation of fix, where fix[0] is the least significant bit
https://ruby-doc.org/core-2.2.0/Fixnum.html#method-i-5B-5D
A Fixnum is not an array, thus its #[] function can (and does) behave differently to what you'd expect from an array.
Solution 1
To access the most significant bit you can use #bit_length - 1
> a = 5
=> 5
> a[a.bit_length - 1]
=> 1
However, this excludes the sign bit for negative numbers, so to get the sign bit, do not remove 1 from the bit_length
> a = -5
=> 5
> a[a.bit_length]
=> 1
> a = 42
=> 42
> a[a.bit_length]
=> 0
Solution 2
You could also just use a < 0 comparison:
> a = -5
=> -5
> sign_bit = a < 0 ? 1 : 0
=> 1
Solution 3
First of all, let's read the documentation for #size:
Returns the number of bytes in the machine representation of fix.
From the implementation of the function we can gather that if the stored number is inside the bounds of a 32-bit signed integer if will be stored in one. In this case #size returns 4. If it is larger it will be stored in a 64-bit integer and #size will return 8.
However not all of these bits are in use. If you store the number 5 (0b101) your 32 bits will be occupied as such
10100000 00000000 00000000 00000000
Negative numbers are most of the time stored as the 2s complement (not sure if applicable for ruby) which would mean that if you store the number -5 your 32 bits will be occupied as such
11011111 11111111 11111111 11111111
Thus you can also access the sign bit with the following code:
x[x.size * 8 - 1]
Bonus
If you're curious about why 5[-1] does not throw an exception and still returns a number, you can look at the source code for Fixnum#[] (on the documentation page previously mentioned).
if (i < 0) return INT2FIX(0);
It is made to just return 0 if your index is a negative number.

Related

hpack encoding integer significance

After reading this, https://httpwg.org/specs/rfc7541.html#integer.representation
I am confused about quite a few things, although I seem to have the overall gist of the idea.
For one, What are the 'prefixes' exactly/what is their purpose?
For two:
C.1.1. Example 1: Encoding 10 Using a 5-Bit Prefix
The value 10 is to be encoded with a 5-bit prefix.
10 is less than 31 (2^5 - 1) and is represented using the 5-bit prefix.
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| X | X | X | 0 | 1 | 0 | 1 | 0 | 10 stored on 5 bits
+---+---+---+---+---+---+---+---+
What are the leading Xs? What is the starting 0 for?
>>> bin(10)
'0b1010'
>>>
Typing this in the python IDE, you see almost the same output... Why does it differ?
This is when the number fits within the number of prefix bits though, making it seemingly simple.
C.1.2. Example 2: Encoding 1337 Using a 5-Bit Prefix
The value I=1337 is to be encoded with a 5-bit prefix.
1337 is greater than 31 (25 - 1).
The 5-bit prefix is filled with its max value (31).
I = 1337 - (25 - 1) = 1306.
I (1306) is greater than or equal to 128, so the while loop body executes:
I % 128 == 26
26 + 128 == 154
154 is encoded in 8 bits as: 10011010
I is set to 10 (1306 / 128 == 10)
I is no longer greater than or equal to 128, so the while loop terminates.
I, now 10, is encoded in 8 bits as: 00001010.
The process ends.
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| X | X | X | 1 | 1 | 1 | 1 | 1 | Prefix = 31, I = 1306
| 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1306>=128, encode(154), I=1306/128
| 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 10<128, encode(10), done
+---+---+---+---+---+---+---+---+
The octet-like diagram shows three different numbers being produced... Since the numbers are produced throughout the loop, how do you replicate this octet-like diagram within an integer? What is the actual final result? The diagram or "I" being 10, or 00001010.
def f(a, b):
if a < 2**b - 1:
print(a)
else:
c = 2**b - 1
remain = a - c
print(c)
if remain >= 128:
while 1:
e = remain % 128
g = e + 128
remain = remain / 128
if remain >= 128:
continue
else:
print(remain)
c+=int(remain)
print(c)
break
As im trying to figure this out, I wrote a quick python implementation of it, It seems that i am left with a few useless variables, one being g which in the documentation is the 26 + 128 == 154.
Lastly, where does 128 come from? I can't find any relation between the numbers besides the fact 2 raised to the 7th power is 128, but why is that significant? Is this because the first bit is reserved as a continuation flag? and an octet contains 8 bits so 8 - 1 = 7?
For one, What are the 'prefixes' exactly/what is their purpose?
Integers are used in a few places in HPACK messages and often they have leading bits that cannot be used to for the actual integer. Therefore, there will often be a few leading digits that will be unavailable to use for the integer itself. They are represented by the X. For the purposes of this calculation it doesn't make what those Xs are: could be 000, or 111, or 010 or...etc. Also, there will not always be 3 Xs - that is just an example. There could only be one leading X, or two, or four...etc.
For example, to look up a previous HPACK decoded header, we use 6.1. Indexed Header Field Representation which starts with a leading 1, followed by the table index value. Therefore that 1 is the X in the previous example. We have 7-bits (instead of only 5-bits in the original example in your question). If the table index value is 127 or less we can represent it using those 7-bits. If it's >= 127 then we need to do some extra work (we'll come back to this).
If it's a new value we want to add to the table (to reuse in future requests), but we already have that header name in the table (so it's just a new value for that name we want as a new entry) then we use 6.2.1. Literal Header Field with Incremental Indexing. This has 2 bits at the beginning (01 - which are the Xs), and we only have 6-bits this time to represent the index of the name we want to reuse. So in this case there are two Xs.
So don't worry about there being 3 Xs - that's just an example. In the above examples there was one X (as first bit had to be 1), and two Xs (as first two bits had to be 01) respectively. The Integer Representation section is telling you how to handle any prefixed integer, whether prefixed by 1, 2, 3... etc unusable "X" bits.
What are the leading Xs? What is the starting 0 for?
The leading Xs are discussed above. The starting 0 is just because, in this example we have 5-bits to represent the integers and only need 4-bits. So we pad it with 0. If the value to encode was 20 it would be 10100. If the value was 40, we couldn't fit it in 5-bits so need to do something else.
Typing this in the python IDE, you see almost the same output... Why does it differ?
Python uses 0b to show it's a binary number. It doesn't bother showing any leading zeros. So 0b1010 is the same as 0b01010 and also the same as 0b00001010.
This is when the number fits within the number of prefix bits though, making it seemingly simple.
Exactly. If you need more than the number of bits you have, you don't have space for it. You can't just use more bits as HPACK will not know whether you are intending to use more bits (so should look at next byte) or if it's just a straight number (so only look at this one byte). It needs a signal to know that. That signal is using all 1s.
So to encode 40 in 5 bits, we need to use 11111 to say "it's not big enough", overflow to next byte. 11111 in binary is 31, so we know it's bigger than that, so we'll not waste that, and instead use it, and subtract it from the 40 to give 9 left to encode in the next byte. A new additional byte gives us 8 new bits to play with (well actually only 7 as we'll soon discover, as the first bit is used to signal a further overflow). This is enough so we can use 00001001 to encode our 9. So our complex number is represented in two bytes: XXX11111 and 00001001.
If we want to encode a value bigger than can fix in the first prefixed bit, AND the left over is bigger than 127 that would fit into the available 7 bits of the second byte, then we can't use this overflow mechanism using two bytes. Instead we use another "overflow, overflow" mechanism using three bytes:
For this "overflow, overflow" mechanism, we set the first byte bits to 1s as usual for an overflow (XXX11111) and then set the first bit of the second byte to 1. This leaves 7 bits available to encode the value, plus the next 8 bits in the third byte we're going to have to use (actually only 7 bits of the third byte, because again it uses the first bit to indicate another overflow).
There's various ways they could go have gone about this using the second and third bytes. What they decided to do was encode this as two numbers: the 128 mod, and the 128 multiplier.
1337 = 31 + (128 * 10) + 26
So that means the frist byte is set to 31 as per pervious example, the second byte is set to 26 (which is 11010) plus the leading 1 to show we're using the overflow overflow method (so 100011010), and the third byte is set to 10 (or 00001010).
So 1337 is encoded in three bytes: XXX11111 100011010 00001010 (including setting X to whatever those values were).
Using 128 mod and multiplier is quite efficient and means this large number (and in fact any number up to 16,383) can be represented in three bytes which is, not uncoincidentally, also the max integer that can be represented in 7 + 7 = 14 bits). But it does take a bit of getting your head around!
If it's bigger than 16,383 then we need to do another round of overflow in a similar manner.
All this seems horrendously complex but is actually relatively simply, and efficiently, coded up. Computers can do this pretty easily and quickly.
It seems that i am left with a few useless variables, one being g
You are not print this value in the if statement. Only the left over value in the else. You need to print both.
which in the documentation is the 26 + 128 == 154.
Lastly, where does 128 come from? I can't find any relation between the numbers besides the fact 2 raised to the 7th power is 128, but why is that significant? Is this because the first bit is reserved as a continuation flag? and an octet contains 8 bits so 8 - 1 = 7?
Exactly, it's because the first bit (value 128) needs to be set as per explanation above, to show we are continuing/overflowing into needing a third byte.

Why n bitwise and -n always return the right most bit (last bit)

Here is the python code snippet:
1 & -1 # 1
2 & -2 # 2
3 & -3 # 1
...
It seems any n & -n always return right most (last) bit, I don't really know why. Can someone help me to understand this?
It's due to the way that negative numbers are represented in binary, which is called two's complement representation.
To create the two's complement of some number n (in other words, to create the representation of -n):
Invert all the bits
Add 1
So in other words, when you write 1 & -1 it really means 1 & ((~1)+1). The initial ~1 gives the value 1111110 and adding one gives 11111111. (Let's stick with 8 bits for these examples.) ANDing that values with 1 gives just 1.
In the next case, 2 & -2 means 2 & ((~2)+1). Inverting 2 gives 11111101 and adding one gives 11111110. Then AND with 2 (10 in binary) gives 2.
Finally 3 & -3 means 3 & ((~3)+1). Invert 3 gives 11111100, add 1 gives 11111101, and AND with 3 (11 binary) gives 1.
~x = -1 -x
so
-x = ~x + 1
When you take the compliment of x (~x), all the 0-bits turn to 1 and all the 1 bits turn to zero. e.g. 101100 -> 010011.
When you add 1, the consecutive 1s on the right change to 0 and the first 0 bit gets set to 1: 010011 -> 010100
If you & that with the original, the 0 bits at the top that changed to 1 come out 0. The 1 bits at the bottom you flipped to 0 by adding come out 0. Only the rightmost 1 bit, which turned into the rightmost 0 bit in the complement and got reset to 1 by the addition, is 1 on both sides: 101100 & 010100 -> 000100
The integers are stored in the memory in the binary form. The non-negative integers are stored as it is in as they are in their binary but the negative numbers are stored in the two's complement form. For example take any arbitrary number 158.
158 = 0000000010011110
while its negative ie.
-158 = 1111111101100010
Take for any number and bitwise AND with its negative then you will get rightmost set bit. This is because in the process of conversion of two's complement we start from right and put the bits as it is till we encounter our first set bit. That is the right most set bit is being written as it is. Then We flip the digits in the left starting from here.
However Above procedure is just a shortcut method for calculating 2's complement. For actual process, you have to first take 1's complement of the number(flipping all bits set to unset and unset to set) and then add 1 to the whole result. Now here is insight of why does this shortcut works everytime
Why does this two's complement shortcut work?
This will give you more insight https://www.geeksforgeeks.org/efficient-method-2s-complement-binary-string/
And take some numbers and work on examples and see yourself.
The catch is in calculating the 2's complement not in performing bitwise AND operation, AND always needs same things to be same to give true.
What 2's complement is doing is, it is making the rightmost digit of both numbers taking part in bitwise AND operation, Look below how 2's complement operation is being calculated and you will find that we represent the number(decimal) in binary and then to calculate the 2's complement we start from right and copy all binary digits the same until we see the first 1, and when we see the first 1 then we make all left of it as opposite the the previous.
The catch is:
Say you want 2's complement of 4 i.e (-4) , so what it says is represent the decimal in binary and copy all bits(0) from right till you see the first 1 and after that reverse all 0s and 1s.
Example: we want 2's complement of 6 -> 0 1 1 0 = 0 0 1 0 , From right of 0110 we start and till we see the first 1 we copy exactly what is in first then we reverse all 0s and 1s.
Another operation with 2's complement
0100
1100 ( Bold are just same as above)
Now it is obvious that when you do bitwise AND then the right most
digit will only make through the AND operation as you need both to
be equal to get through AND.

Why is the sign flag set even on addition of 5FH and 33H?

I am studying about 8085 microprocessor and I came across an instruction - ADC.
In the example, they gave the accumulator [A] = 57H and a register [D] = 33H and initially carry was set, so [CY] = 01H
Instruction: ADC D
They added 57H, 33H and 01H
0 1 0 1 1 1 1 1
0 0 1 1 0 0 1 1
0 0 0 0 0 0 0 1
Answer: 1 0 0 1 0 0 1 1.
They said that the sign flag is now set as the MSB contains the higher bit. I do not understand why is the answer considered to be negative, even though an addition operation is conducted.
It's negative by definition. Under two's complement, any number where the top bit is set is a negative number. 57 + 33 + 1 = 8B, which has the top bit set. Therefore it is a negative number.
In your case it's unfortunate that the register isn't large enough to hold the true result. If it were a 16-bit register you'd have computed 0057 + 0033 + 0001 = 008B, which doesn't have the top bit set. If you intended to keep signed numbers about, you've lost information — you can no longer tell whether this is really meant to be decimal 139 or -117. But if you're working purely in unsigned numbers then you can just ignore the sign flag. You know it doesn't apply.
You can also use the overflow flag to check whether the result has the wrong sign. Overflow is set for addition if two positives seemingly produce a negative or if two negatives seemingly produce a positive.
In your case both sign and overflow should be set.

Consolidate 10 bit Value into a Unique Byte

As part of an algorithm I'm writing, I need to find a way to convert a 10-bit word into a unique 8-bit word. The 10-bit word is made up of 5 pairs, where each pair can only ever equal 0, 1 or 2 (never 3). For example:
|00|10|00|01|10|
This value needs to somehow be consolidated into a single, unique byte.
As each pair can never equal 3, there are a wide range of values that this 10-bit word will never represent, which makes me think that it is possible to create an algorithm to perform this conversion. The simplest way to do this would be to use a lookup table, but it seems like a waste of resources to store ~680 values which will only be used once in my program. I've already tried to incorporate one of the pairs into the others somehow, but every attempt I've made has resulted in a non-unique value, and I'm now very quickly running out of ideas!
Any help?
The number you have is essentially base 3. You just need to convert this to base 2.
There are 5 pairs, so 3^5 = 243 numbers. And 8 bits is 2^8 = 256 numbers, so it's possible.
The simplest way to convert between bases is to go to base 10 first.
So, for your example:
00|10|00|01|10
Base 3: 02012
Base 10: 2*3^3 + 1*3^1 + 2*3^0
= 54 + 3 + 2
= 59
Base 2:
59 % 2 = 1
/2 29 % 2 = 1
/2 14 % 2 = 0
/2 7 % 2 = 1
/2 3 % 2 = 1
/2 1 % 2 = 1
So 111011 is your number in binary
This explains the above process in a bit more detail.
Note that once you have 59 above stored in a 1-byte integer, you'll probably already have what you want, thus explicitly converting to base 2 might not be necessary.
What you basically have is a base 3 number and you want to convert this to a single number 0 - 255, luckily 5 digits in ternary (base 3) gives 243 combinations.
What you'll need to do is:
Digit Action
( 1st x 3^4)
+ (2nd x 3^3)
+ (3rd x 3^2)
+ (4th x 3)
+ (5th)
This will give you a number 0 to 242.
You are considering to store some information in a byte. A byte can contain at most 2 ^ 8 = 256 status.
Your status is totally 3 ^ 5 = 243 < 256. That make the transfer possible.
Consider your pairs are ABCDE (each character can be 0, 1 or 2)
You can just calculate A*3^4 + B*3^3 + C*3^2 + D*3 + E as your result. I guarantee the result will be in range 0 -- 255.

extract bit in Ruby Integers

I need to get the n-th bit of an Integer, either signed or unsigned, in Ruby.
x = 123 # that is 1111011
n = 2 # bit 2 is ...0
The following piece of code doesn't work in the general case:
x.to_s(2)[-(n+1)] #0 good! but...
because of negative numbers not represented as 2-complement:
-123.to_s(2) # "-1111011"
So how to proceed?
x = 123 # that is 1111011
n = 2 # bit 2 is ...0
x[n] # => 0
-123[2] # => 1
def f x, bit
(x & 1 << bit) > 0 ? 1 : 0
end
You could try the Bindata lib.
There is a function to represent an integer's binary representation as a string, and after that, you can do what you like with it.

Resources