Ruby binary string to hexadecimal without losing 0000 - ruby

I have a binary string that I need to convert to a hexadecimal string. I have this code that does it pretty well:
binary = '0000010000000000000000000000000000000000000000000000000000000000'
binary.to_i(2).to_s(16)
This normally works, but in this situation the first four zeros, which represent the leading hexadecimal digit, are left out. So instead of
0400000000000000 it is showing 400000000000000.
Now, I know I can loop through the binary string manually and convert 4 bits at a time, but is there a simpler way of getting to my desired result of '0400000000000000'?
Would rjust(16,'0') be my ideal solution?

You should use string formatting for results like this:
"%016x" % binary.to_i(2)
# => "0400000000000000"

You can use this:
binary = "0000010000000000000000000000000000000000000000000000000000000000"
[binary].pack("B*").unpack("H*").first
# => "0400000000000000"
binary.to_i(2) converts the value to a number, and a number does not know about leading zeros. pack("B*") instead converts each group of 8 bits into a byte, giving you a binary-encoded String. Eight 0 bits become "\x00", a zero byte, so unlike the number, the string preserves the leading zeros. unpack("H*") then converts each group of 4 bits into its hex representation. See Array#pack and String#unpack for more information.
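All three approaches mentioned in this thread (the format string, the rjust idea from the question, and pack/unpack) produce the padded result; a quick sketch:

```ruby
binary = "0000010000000000000000000000000000000000000000000000000000000000"

# Format string: pad the hex output to 16 digits with leading zeros.
hex1 = "%016x" % binary.to_i(2)

# rjust: convert first, then left-pad to one hex digit per 4 bits.
hex2 = binary.to_i(2).to_s(16).rjust(binary.length / 4, "0")

# pack/unpack: go through a binary-encoded string, which keeps leading zeros.
hex3 = [binary].pack("B*").unpack("H*").first

hex1  # => "0400000000000000"
```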

Related

Pack/Unpack and base64 in Ruby

I have a string a = "hello". I can convert it to base 2 or base 16 using unpack:
a.unpack('B*')
# => ["0110100001100101011011000110110001101111"]
a.unpack('H*')
# => ["68656c6c6f"]
To convert to base 64, I tried pack:
[a].pack('m0')
# => "aGVsbG8="
but the result is not what I expected. I thought that if I have a string and want to represent it in some divided, encoded form, I should use unpack, but it turns out that is not so. Please help me understand this.
Per OP's clarified question, "Why do we use #pack to get base64 and #unpack to get other representations of raw data?"
The surface level reason is because Array#pack is a method that returns a String, while String#unpack is a method that returns an Array.
There are stronger conceptual reasons underlying this. The key principle is that base64 is not an array of raw bytes. Rather, it's a 7-bit-ASCII-safe string that can represent arbitrary bytes if properly (de)coded.
Each base64 character maps to a sequence of six bits. At the byte level, that's a 4:3 ratio of characters to raw bytes. Since integer powers of 2 don't divide by 3, we end up with padding more often than not, and you can't slice base64 in arbitrary places to get ranges of bytes out of it (you'd have to figure out which bytes you want in groups of three and go get the associated base64 characters in groups of four).
Arbitrary sequences of data are, fundamentally, arrays of bytes. Base64-encoded sequences are, fundamentally, strings: data sequences constrained to the range of bytes safely transmissible and displayable as text.
Base64 is the encapsulation (or "packing") of a data array into a string.
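The pack/unpack symmetry described above shows up directly in a round trip:

```ruby
a = "hello"

# Packing an Array (holding one raw string) produces the base64 String...
encoded = [a].pack("m0")              # => "aGVsbG8="

# ...and unpacking that String yields an Array containing the raw data again.
decoded = encoded.unpack("m0").first  # => "hello"
```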
The encoded text is correct; to validate it, use the online tool below:
https://www.base64encode.org/
text:
hello
Encoded Base64:
aGVsbG8=
Useful resource:
https://idiosyncratic-ruby.com/4-what-the-pack.html

Rubyist way to decode this encoded string assuming invariant ASCII encoding

My program is a decoder for a binary protocol. One of the fields in that binary protocol is an encoded String. Each character in the String is printable, and represents an integral value. According to the spec of the protocol I'm decoding, the integral value it represents is taken from the following table, where all possible characters are listed:
Character   Value
=========   =====
    0           0
    1           1
    2           2
    3           3
  [...]
    :          10
    ;          11
    <          12
    =          13
  [...]
    B          18
So for example, the character = represents an integral 13.
My code was originally using ord to get the ASCII code for the character, and then subtracting 48 from that, like this:
def Decode(val)
  val[0].ord - 48
end
...which works perfectly, assuming that val consists only of characters listed in that table (this is verified elsewhere).
However, in another question, I was told that:
You are asking for a Ruby way to use ord, where using it is against
the Ruby way.
It seems to me that ord is exactly what I need here, so I don't understand why using ord here is not a Rubyist way to do what I'm trying to do.
So my questions are:
First and foremost, what is the Rubyist way to write my function above?
Secondary, why is using ord here a non-Rubyist practice?
A note on encoding: This protocol which I'm decoding specifies precisely that these strings are ASCII encoded. No other encoding is possible here. Protocols like this are extremely common in my industry (stock & commodity markets).
I guess the Rubyist way, and a faster one, to decode the string into an array of integers is the unpack method:
"=01:".unpack("C*").map { |v| v - 48 }
# => [13, 0, 1, 10]
The unpack method, with the "C*" directive, converts each character to an 8-bit unsigned integer.
Probably ord is entirely safe and appropriate in your case, as the source data should always be encoded the same way. Especially if, when reading the data, you set the encoding to 'US-ASCII' (although the format used looks safe for 'ASCII-8BIT', 'UTF-8' and 'ISO-8859', which may be the point of it: it seems resilient to many conversions, and does not use all possible byte values). However, ord is intended to be used with character semantics, and technically you want byte semantics. With basic ASCII and variants there is no practical difference; all byte values below 128 map to the same character code.
I would suggest using String#unpack as a general method for converting binary input to Ruby data types, but there is not an unpack code for "use this byte with an offset", so that becomes a two-part process.
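Putting that together, a minimal sketch of the unpack-based decoder (renaming the OP's Decode to the conventional lowercase decode):

```ruby
# Map each printable character ('0'..'B') to its integral value by
# subtracting the byte value of '0' (48) from each 8-bit unsigned byte.
def decode(str)
  str.unpack("C*").map { |byte| byte - 48 }
end

decode("=01:")  # => [13, 0, 1, 10]
```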

How to encode a number as a string such that the lexicographic order of the generated string is in the same order as the numeric order

For example, if we have the two strings 2 and 10, 10 will come first if we order lexicographically.
A very trivial solution would be to repeat a character n times.
E.g. 2 can be encoded as aa
and 10 as aaaaaaaaaa.
This way the lexicographic order is the same as the numeric one.
But, is there a more elegant way to do this?
When converting the numbers to strings make sure that all the strings have the same length, by appending 0s in the front if necessary. So 2 and 10 would be encoded as "02" and "10".
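In Ruby this fixed-width padding can be done with rjust, choosing a width large enough for the largest number (here assumed to be two digits):

```ruby
width = 2  # assumed maximum number of digits
nums = [2, 10, 9, 11]

encoded = nums.map { |n| n.to_s.rjust(width, "0") }
encoded.sort  # => ["02", "09", "10", "11"], matching the numeric order
```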
While kjampani's solution is probably the best and easiest in normal applications, another way which is more space-efficient is to prepend every string with its own length. Of course, you need to encode the length in a way which is also consistently sorted.
If you know all the strings are fairly short, you can just encode their length as a fixed-length base-X sequence, where X is the number of character codes you're willing to use (popular values are 64, 96, 255 and 256.) Note that you have to use the character codes in lexicographical order, so normal base64 won't work.
One variable-length order-preserving encoding is the one used by UTF-8. (Not UTF-8 directly, which has a couple of corner cases which will get in the way, but the same encoding technique. The order-preserving property of UTF-8 is occasionally really useful.) The full range of such compressed codes can encode values up to 42 bits long, with an average of five payload bits per byte. That's sufficient for pretty long strings; four terabyte long strings are pretty rare in the wild; but if you need longer, it's possible, too, by extending the size prefix over more than one byte.
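A minimal sketch of the length-prefix idea for decimal strings, assuming numbers of at most nine digits so the length itself fits in one character:

```ruby
# Prepend each decimal string with its own length (one digit, so this
# only works for numbers up to 9 digits). Shorter numbers sort first,
# and equal-length numbers compare digit by digit as usual.
def encode(n)
  s = n.to_s
  s.length.to_s + s
end

[10, 2, 100].map { |n| encode(n) }.sort  # => ["12", "210", "3100"]
```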
Break the string into successive substrings of letters and digits, then sort by comparing each numeric substring as an integer:
"aaa2" ---> aaa + 2
"aaa1000" ---> aaa + 1000
aaa == aaa
Since they're equal, we continue:
1000 > 2
Hence, aaa1000 > aaa2.
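A sketch of that comparison in Ruby, assuming both strings split into the same alternating letter/digit structure:

```ruby
# Split each string into runs of digits and non-digits, then compare
# pairwise, treating digit runs as integers.
def natural_compare(a, b)
  pa = a.scan(/\d+|\D+/)
  pb = b.scan(/\d+|\D+/)
  pa.zip(pb).each do |x, y|
    return 1 if y.nil?           # a has extra trailing parts
    next if x == y
    if x =~ /\A\d+\z/ && y =~ /\A\d+\z/
      return x.to_i <=> y.to_i   # numeric comparison for digit runs
    else
      return x <=> y             # plain string comparison otherwise
    end
  end
  pa.length <=> pb.length
end

natural_compare("aaa2", "aaa1000")  # => -1, i.e. "aaa2" sorts first
```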

Best way to add Hex values in Ruby

I have a hex value (0x0020004E0000 ... which is the base address of a hardware device). I need to add 0x04 to the base for each register. I have been doing this by first converting the base address to a base-10 number, then adding 4 to that value. I then convert the sum back to hex, all via the String class's .to_s and .to_i.
Is there a better way to do this so I'm not converting back-and-forth between base 10 and base 16 all the time? (FYI, in my previous AppleScript script, I punted hex math to the OS and let bc take care of the addition for me).
0x0020004E0000 + 0x04
or simply
0x0020004E0000 + 4
You have four ways of representing integer values in Ruby:
64 # integer
0x40 # hexadecimal
0100 # octal
0b1000000 # binary
# These are all 64.
A number is a number is a number, no matter how it is represented internally or displayed to the user. Just add them as you would any other number. If you want to view them as Hex later then fine; format them for output.
You are confusing representation with value. Ruby can parse a number represented in Hex just as well as it can parse a decimal, binary, or octal value.
0x04 (Hex) == 4 (decimal) == 100 (binary)
All the same thing.
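Applied to the register question, a short sketch (the register variable names are hypothetical):

```ruby
base = 0x0020004E0000  # hardware base address from the question

# Each register is 0x04 past the previous one; plain integer math suffices.
reg0 = base
reg2 = base + 0x04 * 2

# Only format as hex when displaying the address.
format("0x%012X", reg2)  # => "0x0020004E0008"
```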

How does string.unpack work in Ruby?

Can someone explain how the result for the following unpack is computed?
"aaa".unpack('h2H2') #=> ["16", "61"]
In binary, 'a' = 0110 0001. I'm not sure how the 'h2' can become 16 (0001 0000) or 'H2' can become 61 (0011 1101).
Not 16: it is showing 1 and then 6. h gives the hex value of each nibble, low nibble first, so for 'a' (0x61) you get 0001 (1), then 0110 (6). Take the high nibble first instead and you get 61, which is hex for 97, the character code of 'a'.
Check out the Programming Ruby reference on unpack. Here's a snippet:
Decodes str (which may contain binary data) according to the format string, returning an array of each value extracted. The format string consists of a sequence of single-character directives, summarized in Table 22.8 on page 379. Each directive may be followed by a number, indicating the number of times to repeat with this directive. An asterisk ("*") will use up all remaining elements. The directives sSiIlL may each be followed by an underscore ("_") to use the underlying platform's native size for the specified type; otherwise, it uses a platform-independent consistent size. Spaces are ignored in the format string. See also Array#pack on page 286.
And the relevant characters from your example:
H Extract hex nibbles from each character (most significant first).
h Extract hex nibbles from each character (least significant first).
The hex code of char a is 61.
Template h2 is a hex string (low nibble first), and H2 is the same with the high nibble first.
Also see the perl documentation.
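The two directives side by side on the same input:

```ruby
# 'a' is byte 0x61: high nibble 6, low nibble 1.
"aaa".unpack("h2")  # => ["16"]  low nibble first
"aaa".unpack("H2")  # => ["61"]  high nibble first

# With "*", all nibbles in the string are extracted:
"aaa".unpack("h*")  # => ["161616"]
"aaa".unpack("H*")  # => ["616161"]
```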
