String contains NUL bytes - ruby

I'm trying to decode this file that is in IBM437 into readable UTF I'm at the point where I think I've almost got it but I'm getting an ArgumentError where the string contains nul bytes, I'm aware of how to gsub out nul bytes using:
.gsub("\u0000", '') however I can't figure out where to gsub the bytes out.
Here's the source:
def gather_info
file = './lib/SETI_message.txt'
File.read(file).each_line do |gather|
packed = [gather].pack('b*')
ec = Encoding::Converter.new(packed, 'utf-8')
encoding_forced = packed.encode(ec)
File.open('packed.txt', 'a+'){ |s| s.puts(encoding_forced.gsub("\u0000", '')) }
end
end
gather_info
And here's the file
Can anyone tell me what I'm doing wrong here?

The following works for me :
file = File.read('SETI.txt')
packed = file.scan(/......../).map{|s| s.to_i(2)}.pack('U*')
File.write('packed.txt', packed)
Let's break file.scan(/......../).map{|s| s.to_i(2)}.pack('U*') down :
file.scan(/......../)
Here we break the huge string of 0s and 1s (the file) into an array of strings containing 8 characters each. It looks like that : ['00001111', '11110000', ...].
arr.map{|s| s.to_i(2)}
From step 1 we got an array of strings representing the different characters in binary notation. We can convert one of those strings (called s) by applying s.to_i(2) because the parameter '2' says to the method to_i to use base 2. So '00000011'.to_i(2) returns 3.
We apply this to all the characters by using map.
So we now have an array that looks like [98, 82, 49, 39, ...].
arr.pack('U*')
From step 2 we have an array of integers representing each a character. We can now use the pack method to transform our array of integers into a string. The parameter we use for pack is U to tell him that the integers are in fact UTF-8 characters.

Related

Ruby: Why does unpack('Q') give a different result than manual conversion?

I'm trying to write a function that will .unpack('Q') (unpack to uint64_t) without access to the unpack method.
When I manually convert from string to binary to uint64, I get a different result than .unpack('Q'):
Integer('abcdefgh'.unpack('B*').first, 2) # => 7017280452245743464
'abcdefgh'.unpack('Q').first # => 7523094288207667809
I don't understand what's happening here.
I also don't understand why the output of .unpack('Q') is fixed regardless of the size of the input. If I add a thousand characters after 'abcdefgh' and then unpack('Q') it, I still just get [7523094288207667809]?
Byte order matters:
Integer('abcdefgh'.
each_char.
flat_map { |c| c.unpack('B*') }.
reverse.
join, 2)
#⇒ 7523094288207667809
'abcdefgh'.unpack('Q*').first
#⇒ 7523094288207667809
Your code produces the wrong result because after converting to binary, bytes should be reversed.
For the last part of your question, the reason the output of .unpack('Q') doesn't change with a longer input string is because the format is specifying a single 64-bit value so any characters after the first 8 are ignored. If you specified a format of Q2 and a 16 character string you'd decode 2 values:
> 'abcdefghihjklmno'.unpack('Q2')
=> [7523094288207667809, 8029475498074204265]
and again you'd find adding additional characters wouldn't change the result:
> 'abcdefghihjklmnofoofoo'.unpack('Q2')
=> [7523094288207667809, 8029475498074204265]
A format of Q* would return as many values as multiples of 64-bits were in the input:
> 'abcdefghihjklmnopqrstuvw'.unpack('Q*')
=> [7523094288207667809, 8029475498074204265, 8608196880778817904]
> 'abcdefghihjklmnopqrstuvwxyz'.unpack('Q*')
=> [7523094288207667809, 8029475498074204265, 8608196880778817904]

How can I convert a UUID to a string using a custom character set in Ruby?

I want to create a valid IFC GUID (IfcGloballyUniqueId) according to the specification here:
http://www.buildingsmart-tech.org/ifc/IFC2x3/TC1/html/ifcutilityresource/lexical/ifcgloballyuniqueid.htm
It's basically a UUID or GUID (128 bit) mapped to a set of 22 characters to limit storage space in a text file.
I currently have this workaround, but it's merely an approximation:
guid = '';22.times{|i|guid<<'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$'[rand(64)]}
It seems best to use ruby SecureRandom to generate a 128 bit UUID, like in this example (https://ruby-doc.org/stdlib-2.3.0/libdoc/securerandom/rdoc/SecureRandom.html):
SecureRandom.uuid #=> "2d931510-d99f-494a-8c67-87feb05e1594"
This UUID needs to be mapped to a string with a length of 22 characters according to this format:
1 2 3 4 5 6
0123456789012345678901234567890123456789012345678901234567890123
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$";
I don't understand this exactly.
Should the 32-character long hex-number be converted to a 128-character long binary number, then devided in 22 sets of 6 bits(except for one that gets the remaining 2 bits?) for which each can be converted to a decimal number from 0 to 64? Which then in turn can be replaced by the corresponding character from the conversion table?
I hope someone can verify if I'm on the right track here.
And if I am, is there a computational faster way in Ruby to convert the 128 bit number to the 22 sets of 0-64 than using all these separate conversions?
Edit: For anyone having the same problem, this is my solution for now:
require 'securerandom'
# possible characters in GUID
guid64 = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$'
guid = ""
# SecureRandom.uuid: creates a 128 bit UUID hex string
# tr('-', ''): removes the dashes from the hex string
# pack('H*'): converts the hex string to a binary number (high nibble first) (?) is this correct?
# This reverses the number so we end up with the leftover bit on the end, which helps with chopping the sting into pieces.
# It needs to be reversed again to end up with a string in the original order.
# unpack('b*'): converts the binary number to a bit string (128 0's and 1's) and places it into an array
# [0]: gets the first (and only) value from the array
# to_s.scan(/.{1,6}/m): chops the string into pieces 6 characters(bits) with the leftover on the end.
[SecureRandom.uuid.tr('-', '')].pack('H*').unpack('b*')[0].to_s.scan(/.{1,6}/m).each do |num|
# take the number (0 - 63) and find the matching character in guid64, add the found character to the guid string
guid << guid64[num.to_i(2)]
end
guid.reverse
Base64 encoding is pretty close to what you want here, but the mappings are different. No big deal, you can fix that:
require 'securerandom'
require 'base64'
# Define the two mappings here, side-by-side
BASE64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
IFCB64 = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$'
def ifcb64(hex)
# Convert from hex to binary, then from binary to Base64
# Trim off the == padding, then convert mappings with `tr`
Base64.encode64([ hex.tr('-', '') ].pack('H*')).gsub(/\=*\n/, '').tr(BASE64, IFCB64)
end
ifcb64(SecureRandom.uuid)
# => "fa9P7E3qJEc1tPxgUuPZHm"

How encode sequence of bytes into ruby string with characters

How encode sequence of bytes from ruby string into ruby string human-readable characters?
This is input string:
"\x127\x00\x06\x00\x00\x00\x01\x00\xA2\x8F"
So how parse this string into array with bytes,
and encode every element from array to ASCII character?
P.S. However, I can't find a way to roundtrip from bytes back to an array. I tried to use Array.pack with the U* option, but that doesn't work for multibyte characters.
You can try something like:
"string\xaa".each_byte.map {|b| "%c(%x)" % [ b, b ] }.join( ' ' )
# => "s(73) t(74) r(72) i(69) n(6e) g(67) ª(aa)"

Converting a hexadecimal number to binary in ruby

I am trying to convert a hex value to a binary value (each bit in the hex string should have an equivalent four bit binary value). I was advised to use this:
num = "0ff" # (say for eg.)
bin = "%0#{num.size*4}b" % num.hex.to_i
This gives me the correct output 000011111111. I am confused with how this works, especially %0#{num.size*4}b. Could someone help me with this?
You can also do:
num = "0ff"
num.hex.to_s(2).rjust(num.size*4, '0')
You may have already figured out, but, num.size*4 is the number of digits that you want to pad the output up to with 0 because one hexadecimal digit is represented by four (log_2 16 = 4) binary digits.
You'll find the answer in the documentation of Kernel#sprintf (as pointed out by the docs for String#%):
http://www.ruby-doc.org/core/classes/Kernel.html#M001433
This is the most straightforward solution I found to convert from hexadecimal to binary:
['DEADBEEF'].pack('H*').unpack('B*').first # => "11011110101011011011111011101111"
And from binary to hexadecimal:
['11011110101011011011111011101111'].pack('B*').unpack1('H*') # => "deadbeef"
Here you can find more information:
Array#pack: https://ruby-doc.org/core-2.7.1/Array.html#method-i-pack
String#unpack1 (similar to unpack): https://ruby-doc.org/core-2.7.1/String.html#method-i-unpack1
This doesn't answer your original question, but I would assume that a lot of people coming here are, instead of looking to turn hexadecimal to actual "0s and 1s" binary output, to decode hexadecimal to a byte string representation (in the spirit of such utilities as hex2bin). As such, here is a good method for doing exactly that:
def hex_to_bin(hex)
# Prepend a '0' for padding if you don't have an even number of chars
hex = '0' << hex unless (hex.length % 2) == 0
hex.scan(/[A-Fa-f0-9]{2}/).inject('') { |encoded, byte| encoded << [byte].pack('H2') }
end
Getting back to hex again is much easier:
def bin_to_hex(bin)
bin.unpack('H*').first
end
Converting the string of hex digits back to binary is just as easy. Take the hex digits two at a time (since each byte can range from 00 to FF), convert the digits to a character, and join them back together.
def hex_to_bin(s) s.scan(/../).map { |x| x.hex.chr }.join end

Ruby: Fuzzing through all unicode characters ‎(UTF8/Encoding/String Manipulation)

I can't iterate over the entire range of unicode characters.
I searched everywhere...
I am building a fuzzer and want to embed into a url, all unicode characters (one at a time).
For example:
http://www.example.com?a=\uff1c
I know that there are some built tools but I need more flexibility.
If i could do someting like the following: "\u" + "ff1c" it would be great.
This is the closest I got:
char = "\u0000"
...
#within iteration
char.succ!
...
but after the character "\u0039", which is the number 9, I will get "10" instead of ":"
You could use pack to convert numbers to UTF8 characters but I'm not sure if this solves your problem.
You can either create an array with numeric values of all the characters and use pack to get an UTF8 string or you can just loop from 0 to whatever you need and use pack within the loop.
I've written a small example to explain myself. The code below prints out the hex value of each character followed by the character itself.
0.upto(100) do |i|
puts "%04x" % i + ": " + [i].pack("U*")
end
Here's some simpler code, albeit slightly obfuscated, that takes advantage of the fact that Ruby will convert an integer on the right hand side of the << operator to a codepoint. This only works with Ruby 1.8 up for integer values <= 255. It will work for values greater than 255 in 1.9.
0.upto(100) do |i|
puts "" << i
end

Resources