convert ruby String to Bignum - ruby

I have a binary string, say
x = "c1\x98\xCCf3\x1C\x00.\x01\xC7\x00\xC0"
(actually much longer). I need to have it represented as Bignum, for the purposes of further conversion to base-something sequences (something > 36).
x.unpack('H*')[0].to_i
yields an Integer from first bytes of the value, and not a Bignum.

There's no need to use unpack and go through an intermediate hex string representation.
To convert a binary string directly to a number (which will automatically be a Bignum as needed), you can do:
"\xc1\x98\xCC\xf3\x1C\x00".bytes.inject {|a, b| (a << 8) + b }
=> 212862017674240

The default base for String#to_i is, of course, 10 but you're trying to convert hex so you want .to_i(16). If you don't specify the base, to_i will stop when it sees the first non-decimal value and that's where your truncation comes from.
You want to say this:
x.unpack('H*')[0].to_i(16)
For example:
>> "633198cc66331c0001c700c0633198cc66331c0001c700c063312e98cc66331c0001c700c0".to_i
=> 633198
>> "633198cc66331c0001c700c0633198cc66331c0001c700c063312e98cc66331c0001c700c0".to_i(16)
=> 49331350698902676183344474146684368690988113012187221237314170009285390086987127695278272

Related

Ruby: Why does unpack('Q') give a different result than manual conversion?

I'm trying to write a function that will .unpack('Q') (unpack to uint64_t) without access to the unpack method.
When I manually convert from string to binary to uint64, I get a different result than .unpack('Q'):
Integer('abcdefgh'.unpack('B*').first, 2) # => 7017280452245743464
'abcdefgh'.unpack('Q').first # => 7523094288207667809
I don't understand what's happening here.
I also don't understand why the output of .unpack('Q') is fixed regardless of the size of the input. If I add a thousand characters after 'abcdefgh' and then unpack('Q') it, I still just get [7523094288207667809]?
Byte order matters:
Integer('abcdefgh'.
each_char.
flat_map { |c| c.unpack('B*') }.
reverse.
join, 2)
#⇒ 7523094288207667809
'abcdefgh'.unpack('Q*').first
#⇒ 7523094288207667809
Your code produces the wrong result because after converting to binary, bytes should be reversed.
For the last part of your question, the reason the output of .unpack('Q') doesn't change with a longer input string is because the format is specifying a single 64-bit value so any characters after the first 8 are ignored. If you specified a format of Q2 and a 16 character string you'd decode 2 values:
> 'abcdefghihjklmno'.unpack('Q2')
=> [7523094288207667809, 8029475498074204265]
and again you'd find adding additional characters wouldn't change the result:
> 'abcdefghihjklmnofoofoo'.unpack('Q2')
=> [7523094288207667809, 8029475498074204265]
A format of Q* would return as many values as multiples of 64-bits were in the input:
> 'abcdefghihjklmnopqrstuvw'.unpack('Q*')
=> [7523094288207667809, 8029475498074204265, 8608196880778817904]
> 'abcdefghihjklmnopqrstuvwxyz'.unpack('Q*')
=> [7523094288207667809, 8029475498074204265, 8608196880778817904]

How to convert bytes in number into a string of characters? (character representation of a number)

How do I easily convert a number, e.g. 0x616263, equivalently 6382179 in base 10, into a string by dividing the number up into sequential bytes? So the example above should convert into 'abc'.
I've experimented with Array.pack but cant figure out how to get it to convert more than one byte in the number, e.g. [0x616263].pack("C*") returns 'c'.
I've also tried 0x616263.to_s(256), but that throws an ArgumentError: invalid radix. I guess it needs some sort of encoding information?
(Note: Other datatypes in pack like N work with the example I've given above, but only because it fits within 4 bytes, so e.g. [0x616263646566].pack("N") gives cdef, not abcdef)
This question is vaguely similar to this one, but not really. Also, I sort of figured out how to get the hex representation string from a character string using "abcde".unpack("c*").map{|c| c.to_s(16)}.join(""), which gives '6162636465'. I basically want to go backwards.
I don't think this is an X-Y problem, but in case it is - I'm trying to convert a number I've decoded with RSA into a character string.
Thanks for any help. I'm not too experienced with Ruby. I'd also be interested in a Python solution (for fun), but I don't know if its right to add tags for two separate programming languages to this question.
To convert a single number 0x00616263 into 3 characters, what you really need to do first is separate them into three numbers: 0x00000061, 0x00000062, and 0x00000063.
For the last number, the hex digits you want are already in the correct place. But for the other two, you have to do a bitshift using >> 16 and >> 8 respectively.
Afterwards, use a bitwise and to get rid of the other digits:
num1 = (0x616263 >> 16) & 0xFF
num2 = (0x616263 >> 8) & 0xFF
num3 = 0x616263 & 0xFF
For the characters, you could then do:
char1 = ((0x616263 >> 16) & 0xFF).chr
char2 = ((0x616263 >> 8) & 0xFF).chr
char3 = (0x616263 & 0xFF).chr
Of course, bitwise operations aren't very Ruby-esque. There are probably more Ruby-like answers that someone else might provide.
64 bit integers
If your number is smaller than 2**64 (8 bytes), you can :
convert the "big-endian unsigned long long" to 8 bytes
remove the leading zero bytes
Ruby
[0x616263].pack('Q>').sub(/\x00+/,'')
# "abc"
[0x616263646566].pack('Q>').sub(/\x00+/,'')
# "abcdef"
Python 2 & 3
In Python, pack returns bytes, not a string. You can use decode() to convert bytes to a String :
import struct
import re
print(re.sub('\x00', '', struct.pack(">Q", 0x616263646566).decode()))
# abcdef
print(re.sub('\x00', '', struct.pack(">Q", 0x616263).decode()))
# abc
Large numbers
With gsub
If your number doesn't fit in 8 bytes, you could use a modified version of your code. This is shorter and outputs the string correctly if the first byte is smaller than 10 (e.g. for "\t") :
def decode(int)
if int < 2**64
[int].pack('Q>').sub(/\x00+/, '')
else
nhex = int.to_s(16)
nhex = '0' + nhex if nhex.size.odd?
nhex.gsub(/../) { |hh| hh.to_i(16).chr }
end
end
puts decode(0x616263) == 'abc'
# true
puts decode(0x616263646566) == 'abcdef'
# true
puts decode(0x0961) == "\ta"
# true
puts decode(0x546869732073656e74656e63652069732077617920746f6f206c6f6e6720666f7220616e20496e743634)
# This sentence is way too long for an Int64
By the way, here's the reverse method :
def encode(str)
str.reverse.each_byte.with_index.map { |b, i| b * 256**i }.inject(:+)
end
You should still check if your RSA code really outputs arbitrary large numbers or just an array of integers.
With shifts
Here's another way to get the result. It's similar to #Nathan's answer, but it works for any integer size :
def decode(int)
a = []
while int>0
a << (int & 0xFF)
int >>= 8
end
a.reverse.pack('C*')
end
According to fruity, it's twice as fast as the gsub solution.
I'm currently rolling with this:
n = 0x616263
nhex = n.to_s(16)
nhexarr = nhex.scan(/.{1,2}/)
nhexarr = nhexarr.map {|e| e.to_i(16)}
out = nhexarr.pack("C*")
But was hoping for a concise/built-in way to do this, so I'll leave this answer unaccepted for now.

Keep leading zeroes when converting string to integer

For no particular reason, I am trying to add a #reverse method to the Integer class:
class Integer
def reverse
self.to_s.reverse.to_i
end
end
puts 1337.reverse # => 7331
puts 1000.reverse # => 1
This works fine except for numbers ending in a 0, as shown when 1000.reverse returns 1 rather than 0001. Is there any way to keep leading zeroes when converting a string into an integer?
Short answer: no, you cant.
2.1.5 :001 > 0001
=> 1
0001 doesn't make sense at all as Integer. In the Integer world, 0001 is exactly as 1.
Moreover, the number of leading integer is generally irrelevant, unless you need to pad some integer for displaying, but in this case you are probably converting it into another kind of object (e.g a String).
If you want to keep the integer as Fixnum you will not be able to add leading zeros.
The real question is: why do you want/need leading zeros? You didn't provide such information in the question. There are probably better ways to achieve your result (such as wrapping the value into a decorator object if the goal is to properly format a result for display).
Does rjust work for you?
1000.to_s.reverse.to_i.to_s.rjust(1000.to_s.size,'0') #=> "0001"
self.to_s.to_i does convert the integer to a string and this string "0001" to an integer value. Since leading zeros are not required for regular numbers they are dropped. In other words: Keeping leading zeros does not make sense for calculations, so they are dropped. Just ask yourself how the integer 1 would look like if leading zeros would be preserved, since it represents a 32 bit number. If you need the leading zeros, there is no way around a string.
BUT 10 + "0001".to_i returns 11, so you probably need to override the + method of the String class.

How do I convert hex to binary (and vice versa) in Ruby, WHILE maintaining leading zeroes?

I have a data structure that I'd like to convert back and forth from hex to binary in Ruby. The simplest approach for a binary to hex is '0010'.to_i(2).to_s(16) - unfortunately this does not preserve leading zeroes (due to the to_i call), as one may need with data structures like cryptographic keys (which also vary with the number of leading zeroes).
Is there an easy built in way to do this?
I think you should have a firm idea of how many bits are in your cryptographic key. That should be stored in some constant or variable in your program, not inside individual strings representing the key:
KEY_BITS = 16
The most natural way to represent a key is as an integer, so if you receive a key in a hex format you can convert it like this (leading zeros in the string do not matter):
key = 'a0a0'.to_i(16)
If you receive a key in a (ASCII) binary format, you can convert it like this (leading zeros in the string do not matter):
key = '101011'.to_i(2)
If you need to output a key in hex with the right number of leading zeros:
key.to_s(16).rjust((KEY_BITS+3)/4, '0')
If you need to output a key in binary with the right number of leading zeros:
key.to_s(2).rjust(KEY_BITS, '0')
If you really do want to figure out how many bits might be in a key based on a (ASCII) binary or hex string, you can do:
key_bits = binary_str.length
key_bits = hex_str.length * 4
The truth is, leading zeros are not part of the integer value. I mean, it's a little detail related to representation of this value, not the value itself. So if you want to preserve properties of representation, it may be best not to get to underlying values at all.
Luckily, hex<->binary conversion has one neat property: each hexadecimal digit exactly corresponds to 4 binary digits. So assuming you only get binary numbers that have number of digits divisible by 4 you can just construct two dictionaries for constructing back and forth:
# Hexadecimal part is easy
hex = [*'0'..'9', *'A'..'F']
# Binary... not much longer, but a bit trickier
bin = (0..15).map { |i| '%04b' % i }
Note the use of String#% operator, that formats the given value interpreting the string as printf-style format string.
Okay, so these are lists of "digits", 16 each. Now for the dictionaries:
hex2bin = hex.zip(bin).to_h
bin2hex = bin.zip(hex).to_h
Converting hex to bin with these is straightforward:
"DEADBEEF".each_char.map { |d| hex2bin[d] }.join
Converting back is not that trivial. I assume we have a "good number" that can be split into groups of 4 binary digits each. I haven't found a cleaner way than using String#scan with a "match every 4 characters" regex:
"10111110".scan(/.{4}/).map { |d| bin2hex[d] }.join
The procedure is mostly similar.
Bonus task: implement the same conversion disregarding my assumption of having only "good binary numbers", i. e. "110101".
"I-should-have-read-the-docs" remark: there is Hash#invert that returns a hash with all key-value pairs inverted.
This is the most straightforward solution I found that preserves leading zeros. To convert from hexadecimal to binary:
['DEADBEEF'].pack('H*').unpack('B*').first # => "11011110101011011011111011101111"
And from binary to hexadecimal:
['11011110101011011011111011101111'].pack('B*').unpack1('H*') # => "deadbeef"
Here you can find more information:
Array#pack: https://ruby-doc.org/core-2.7.1/Array.html#method-i-pack
String#unpack1 (similar to unpack): https://ruby-doc.org/core-2.7.1/String.html#method-i-unpack1

Converting a hexadecimal number to binary in ruby

I am trying to convert a hex value to a binary value (each bit in the hex string should have an equivalent four bit binary value). I was advised to use this:
num = "0ff" # (say for eg.)
bin = "%0#{num.size*4}b" % num.hex.to_i
This gives me the correct output 000011111111. I am confused with how this works, especially %0#{num.size*4}b. Could someone help me with this?
You can also do:
num = "0ff"
num.hex.to_s(2).rjust(num.size*4, '0')
You may have already figured out, but, num.size*4 is the number of digits that you want to pad the output up to with 0 because one hexadecimal digit is represented by four (log_2 16 = 4) binary digits.
You'll find the answer in the documentation of Kernel#sprintf (as pointed out by the docs for String#%):
http://www.ruby-doc.org/core/classes/Kernel.html#M001433
This is the most straightforward solution I found to convert from hexadecimal to binary:
['DEADBEEF'].pack('H*').unpack('B*').first # => "11011110101011011011111011101111"
And from binary to hexadecimal:
['11011110101011011011111011101111'].pack('B*').unpack1('H*') # => "deadbeef"
Here you can find more information:
Array#pack: https://ruby-doc.org/core-2.7.1/Array.html#method-i-pack
String#unpack1 (similar to unpack): https://ruby-doc.org/core-2.7.1/String.html#method-i-unpack1
This doesn't answer your original question, but I would assume that a lot of people coming here are, instead of looking to turn hexadecimal to actual "0s and 1s" binary output, to decode hexadecimal to a byte string representation (in the spirit of such utilities as hex2bin). As such, here is a good method for doing exactly that:
def hex_to_bin(hex)
# Prepend a '0' for padding if you don't have an even number of chars
hex = '0' << hex unless (hex.length % 2) == 0
hex.scan(/[A-Fa-f0-9]{2}/).inject('') { |encoded, byte| encoded << [byte].pack('H2') }
end
Getting back to hex again is much easier:
def bin_to_hex(bin)
bin.unpack('H*').first
end
Converting the string of hex digits back to binary is just as easy. Take the hex digits two at a time (since each byte can range from 00 to FF), convert the digits to a character, and join them back together.
def hex_to_bin(s) s.scan(/../).map { |x| x.hex.chr }.join end

Resources