How to encode a sequence of bytes into a Ruby string with characters - ruby

How can I encode a sequence of bytes from a Ruby string into a Ruby string of human-readable characters?
This is the input string:
"\x127\x00\x06\x00\x00\x00\x01\x00\xA2\x8F"
So how do I parse this string into an array of bytes,
and encode every element of the array as an ASCII character?
P.S. I also can't find a way to round-trip between the string and an array of bytes. I tried to use Array#pack with the U* option, but that doesn't work for multibyte characters.

You can try something like:
"string\xaa".each_byte.map {|b| "%c(%x)" % [ b, b ] }.join( ' ' )
# => "s(73) t(74) r(72) i(69) n(6e) g(67) ª(aa)"
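For the input string from the question, a minimal sketch of the same idea could look like the following. The variable names (input, bytes, readable, restored) are illustrative and not from the original post, and it assumes that only printable ASCII should be shown as characters while every other byte is shown as a \xNN escape:

```ruby
# Bytes from the question. "\x127" is the byte 0x12 followed by the character "7".
input = "\x127\x00\x06\x00\x00\x00\x01\x00\xA2\x8F".b

# String -> array of integers (one per byte).
bytes = input.bytes

# Printable ASCII bytes become characters; everything else becomes a \xNN escape.
# (Assumption: this is what "human-readable" should mean here.)
readable = bytes.map { |b| (32..126).cover?(b) ? b.chr : format('\x%02X', b) }.join

# Array of integers -> binary string. 'C*' packs each integer as one raw byte,
# unlike 'U*', which encodes each integer as a UTF-8 code point.
restored = bytes.pack('C*')

puts bytes.inspect
puts readable
puts restored == input
```

Packing with 'C*' rather than 'U*' is what makes the round trip work for bytes above 0x7F, since 'U*' would turn each of those into a two-byte UTF-8 sequence.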

Related

How do I print a hex number representing an IEEE 754 float as a float in ruby

I am using ruby to parse a datastream, some parts of which are IEEE-754 floats, but am not sure how to print these as floats. For example:
f = 0xbe80fd31 # -0.2519317
puts "%f" % f
3196124465.000000
How do I get -0.2519317?
Any time you're converting a binary byte stream to something else, you usually end up using String#unpack (and Array#pack if you're going the other way).
If you have these bytes:
bytes = [0xbe, 0x80, 0xfd, 0x31]
then you could say:
bytes.map(&:chr).join.unpack('g')
# [-0.25193169713020325]
and then unwrap the array. This:
bytes.map(&:chr).join
packs the bytes into the string:
"\xbe\x80\xfd\x31"
which is suitable for #unpack. You could also (thanks Stefan) say:
# Variations on getting the bytes into a string for `#unpack`
bytes.pack('C4').unpack('g').first
[0xbe80fd31].pack('L>').unpack('g').first
# Variations using `#unpack1`
bytes.map(&:chr).join.unpack1('g')
bytes.pack('C4').unpack1('g')
[0xbe80fd31].pack('L>').unpack1('g')
If you already have the string then you can go straight to #unpack or #unpack1.
You'll want to use 'e' instead of 'g' if your bytes are in little-endian rather than big-endian order, and 'E' or 'G' if you actually have an eight-byte double rather than a four-byte float.
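As a rough illustration of how the directives relate, the sketch below decodes the same four bytes in both byte orders and round-trips an eight-byte double. The variable names are made up for this example:

```ruby
# The same four bytes in big-endian (network) and little-endian order.
big_endian    = [0xbe, 0x80, 0xfd, 0x31].pack('C4')
little_endian = big_endian.reverse

single_be = big_endian.unpack1('g')     # 'g': single precision, big-endian
single_le = little_endian.unpack1('e')  # 'e': single precision, little-endian

# Eight-byte doubles use 'G' (big-endian) or 'E' (little-endian).
double_bytes = [-0.2519317].pack('G')
double_be    = double_bytes.unpack1('G')

puts single_be
puts single_le
puts double_be
```

Both single-precision results are the same number, because reversing the bytes and switching from 'g' to 'e' cancel each other out.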

Decode in Ruby on rails

Is there any way to decode the below string,
"location.replace(i+\"&utm_content=\"+s)}(document,window,navigator,screen,\"\\x68\\x74\\x74\\x70\\x3a\\x2f\\x2f\\x6d\\x6f\\x62\\x76\\x69\\x64\\x69\\x2e\\x6d\\x6f\\x62\\x73\\x74\\x61\\x72\\x72\\x2e\\x63\\x6f\\x6d\\x2f\\x3f\\x75\\x74\\x6d\\x5f\\x74\\x65\\x72\\x6d\\x3d\\x36\\x35\\x34\\x33\\x34\\x39\\x39\\x37\\x36\\x39\\x31\\x38\\x32\\x39\\x34\\x36\\x33\\x30\\x32\\x26\\x63\\x6c\\x69\\x63\\x6b\\x76\\x65\\x72\\x69\\x66\\x79\\x3d\\x31\",fi
I have tried as,
URI.unescape string
But it's not working.
There may be another way to do this, but here's one way:
>> hex = "\\x68\\x74\\x74\\x70\\x3a\\x2f\\x2f\\x6d\\x6f\\x62\\x76\\x69\\x64\\x69\\x2e\\x6d\\x6f\\x62\\x73\\x74\\x61\\x72\\x72\\x2e\\x63\\x6f\\x6d\\x2f\\x3f\\x75\\x74\\x6d\\x5f\\x74\\x65\\x72\\x6d\\x3d\\x36\\x35\\x34\\x33\\x34\\x39\\x39\\x37\\x36\\x39\\x31\\x38\\x32\\x39\\x34\\x36\\x33\\x30\\x32\\x26\\x63\\x6c\\x69\\x63\\x6b\\x76\\x65\\x72\\x69\\x66\\x79\\x3d\\x31"
=> "\\x68\\x74\\x74\\x70\\x3a\\x2f\\x2f\\x6d\\x6f\\x62\\x76\\x69\\x64\\x69\\x2e\\x6d\\x6f\\x62\\x73\\x74\\x61\\x72\\x72\\x2e\\x63\\x6f\\x6d\\x2f\\x3f\\x75\\x74\\x6d\\x5f\\x74\\x65\\x72\\x6d\\x3d\\x36\\x35\\x34\\x33\\x34\\x39\\x39\\x37\\x36\\x39\\x31\\x38\\x32\\x39\\x34\\x36\\x33\\x30\\x32\\x26\\x63\\x6c\\x69\\x63\\x6b\\x76\\x65\\x72\\x69\\x66\\x79\\x3d\\x31"
>> Array(hex.gsub("\\x","")).pack('H*')
=> "http://mobvidi.mobstarr.com/?utm_term=6543499769182946302&clickverify=1"
I created a string variable for the hex string and then stripped out the backslashes and 'x' characters. The result is then wrapped in an array so we can call the pack method, using the capital H directive (hex string, high nibble first), which is described in the Array#pack documentation.
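If the escapes are embedded in other text, as in the original JavaScript snippet, stripping every "\x" and packing the whole string will not work. One possible alternative is to replace each escape in place. The helper name decode_hex_escapes is hypothetical, and the sketch assumes the escapes only encode ASCII bytes:

```ruby
# Hypothetical helper: replaces each literal \xNN escape with the byte it names.
# Assumes every escape is a 7-bit ASCII value, as in the URL from the question.
def decode_hex_escapes(text)
  text.gsub(/\\x(\h{2})/) { Regexp.last_match(1).hex.chr }
end

# Single quotes keep the backslashes literal, like the doubled "\\x" above.
escaped = '\x68\x74\x74\x70\x3a\x2f\x2f\x6d\x6f\x62'

puts decode_hex_escapes(escaped)
puts decode_hex_escapes('go(\x68\x69, "plain")')
```

Text outside the escapes is left untouched, so the surrounding code stays readable after decoding.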

Ruby Cyphering Leads to non Alphanumeric Characters [duplicate]

This question already has answers here:
Rotating letters in a string so that each letter is shifted to another letter by n places
(4 answers)
Closed 5 years ago.
I'm trying to make a basic cipher.
def caesar_crypto_encode(text, shift)
(text.nil? or text.strip.empty? ) ? "" : text.gsub(/[a-zA-Z]/){ |cstr|
((cstr.ord)+shift).chr }
end
but when the shift is too high I get these kinds of characters:
Test.assert_equals(caesar_crypto_encode("Hello world!", 127), "eBIIL TLOIA!")
Expected: "eBIIL TLOIA!", instead got: "\xC7\xE4\xEB\xEB\xEE \xF6\xEE\xF1\xEB\xE3!"
What is this format?
You get the escaped output because Ruby is running with UTF-8 encoding, and your conversion has produced bytes that do not form a valid character sequence under UTF-8.
ASCII characters A-Z are represented by decimal numbers (ordinals) 65-90, and a-z by 97-122. When you add 127 you push all the characters out of the 7-bit ASCII range into 8-bit space, where a single byte is no longer a valid UTF-8 character.
That's why Ruby's inspect shows the string in escaped form, with each such byte written as its hexadecimal value ("\xC7...").
If you want to get some semblance of characters out of this, you could re-encode the gibberish into ISO8859-1, which supports 8-bit characters.
Here's what you get if you do that:
s = "\xC7\xE4\xEB\xEB\xEE \xF6\xEE\xF1\xEB\xE3!"
>> s.encoding
=> #<Encoding:UTF-8>
# Re-encode as ISO8859-1.
# Your terminal (and Ruby) is using UTF-8, so Ruby will refuse to print these yet.
>> s.force_encoding('iso8859-1')
=> "\xC7\xE4\xEB\xEB\xEE \xF6\xEE\xF1\xEB\xE3!"
# In order to be able to print ISO8859-1 on an UTF-8 terminal, you have to
# convert them back to UTF-8 by re-encoding. This way your terminal (and Ruby)
# can display the ISO8859-1 8-bit characters using UTF-8 encoding:
>> s.encode('UTF-8')
=> "Çäëëî öîñëã!"
# Another way is just to repack the bytes into UTF-8:
>> s.bytes.pack('U*')
=> "Çäëëî öîñëã!"
Of course, the proper way to do this is not to let the numbers overflow into 8-bit space under any circumstance. Your encryption algorithm has a bug, and you need to ensure that the output stays in the 7-bit ASCII range.
A better solution
Like @tadman suggested, you could use tr instead:
AZ_SEQUENCE = [*'A'..'Z', *'a'..'z']
"Hello world!".tr(AZ_SEQUENCE.join, AZ_SEQUENCE.rotate(127).join)
=> "eBIIL TLOIA!"
I'm still curious about that format though...
Those escapes are the bytes you get after taking the ordinal (ord) of each letter, adding 127 to it and converting the result back to a character, i.e. ((cstr.ord)+shift).chr.
Why? Check Integer#chr, from the docs:
Returns a string containing the character represented by the int's
value according to encoding.
So, for example, take your first letter "H":
char_ord = "H".ord
#=> 72
new_char_ord = char_ord + 127
#=> 199
new_char_ord.chr
#=> "\xC7"
So, 199 corresponds to "\xC7". Keep changing all characters in "Hello world" and you will get "\xC7\xE4\xEB\xEB\xEE \xF6\xEE\xF1\xEB\xE3".
To avoid this, you need to wrap around so that you only ever produce ord values that represent a letter (see the answers to the linked duplicate question).
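One way to sketch that wrap-around is modular arithmetic over the same 52-letter sequence the tr answer uses. The names ALPHABET and caesar_shift are illustrative, and the alphabet ordering (A-Z followed by a-z) is an assumption chosen to match the expected test output:

```ruby
# Assumed alphabet: upper case first, then lower case, 52 letters in total.
ALPHABET = [*'A'..'Z', *'a'..'z'].freeze

def caesar_shift(text, shift)
  return '' if text.nil? || text.strip.empty?

  # Only letters are replaced; the modulo keeps the index inside the alphabet,
  # so no byte ever leaves the 7-bit ASCII range.
  text.gsub(/[A-Za-z]/) do |letter|
    ALPHABET[(ALPHABET.index(letter) + shift) % ALPHABET.size]
  end
end

puts caesar_shift("Hello world!", 127)
puts caesar_shift(caesar_shift("Hello world!", 127), -127)
```

Because Ruby's % returns a non-negative result for a positive modulus, a negative shift decodes what the positive shift encoded.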

String contains NUL bytes

I'm trying to decode this file, which is in IBM437, into readable UTF-8. I think I've almost got it, but I'm getting an ArgumentError saying the string contains null bytes. I know how to gsub out NUL bytes using:
.gsub("\u0000", ''); however, I can't figure out where to gsub the bytes out.
Here's the source:
def gather_info
file = './lib/SETI_message.txt'
File.read(file).each_line do |gather|
packed = [gather].pack('b*')
ec = Encoding::Converter.new(packed, 'utf-8')
encoding_forced = packed.encode(ec)
File.open('packed.txt', 'a+'){ |s| s.puts(encoding_forced.gsub("\u0000", '')) }
end
end
gather_info
And here's the file
Can anyone tell me what I'm doing wrong here?
The following works for me:
file = File.read('SETI.txt')
packed = file.scan(/......../).map{|s| s.to_i(2)}.pack('U*')
File.write('packed.txt', packed)
Let's break file.scan(/......../).map{|s| s.to_i(2)}.pack('U*') down:
1. file.scan(/......../)
Here we break the huge string of 0s and 1s (the file) into an array of strings containing 8 characters each. It looks like this: ['00001111', '11110000', ...].
2. arr.map{|s| s.to_i(2)}
From step 1 we have an array of strings, each representing one character in binary notation. We can convert one of those strings (called s) with s.to_i(2), because the argument 2 tells to_i to use base 2. So '00000011'.to_i(2) returns 3.
We apply this to every string by using map.
So we now have an array that looks like [98, 82, 49, 39, ...].
3. arr.pack('U*')
From step 2 we have an array of integers, each representing a character. We can now use the pack method to turn the array of integers into a string. The U directive tells pack to treat each integer as a Unicode code point and encode it as UTF-8.
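Since the question mentions IBM437, a variation worth trying is to pack the integers as raw bytes and let Ruby transcode them, instead of treating them as code points with 'U*'. This is only a sketch: the helper name bits_to_text is hypothetical, and it assumes the input is a continuous run of 0s and 1s whose length is a multiple of 8:

```ruby
# Hypothetical helper: turns a string of 0s and 1s into UTF-8 text,
# interpreting each group of eight bits as one byte in source_encoding.
def bits_to_text(bit_string, source_encoding = 'IBM437')
  bytes = bit_string.scan(/[01]{8}/).map { |octet| octet.to_i(2) }
  # 'C*' packs raw bytes; force_encoding labels them, encode transcodes them.
  bytes.pack('C*').force_encoding(source_encoding).encode('UTF-8')
end

puts bits_to_text('0100100001101001')
puts bits_to_text('10000000')
```

For bytes below 128 both approaches give the same text; they only differ for bytes from 128 to 255, where IBM437 and Unicode assign different characters.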

How to deal with Unicode strings in Ruby?

I've seen the following construction in a Ruby tutorial:
irb(main):001:0> "abc".each_byte { |c| printf "<%c>", c }
<a><b><c>=> "abc"
However, if I put string Здравствуйте! instead of abc, I get
irb(main):003:0> "Здравствуйте!".each_byte { |c| printf "<%c>", c }
<Ð><><Ð><´><Ñ><><Ð><°><Ð><²><Ñ><><Ñ><><Ð><²><Ñ><><Ð><¹><Ñ><><Ð><µ><!>=> "Здравствуйте!"
How to deal with Unicode strings?
irb(main):005:0> RUBY_VERSION
=> "1.9.3"
▶ "Здравствуйте!".each_char { |c| printf "<%c>", c }
# ⇒ <З><д><р><а><в><с><т><в><у><й><т><е><!>=> "Здравствуйте!"
A byte is a byte, while a char is a char, which may consist of several bytes.
A byte is 8 bits, but a Unicode character can take up multiple bytes when stored on your computer. For example, say the integer code (code point) for some Unicode character is 8,000. That number cannot be stored in one byte (the largest number that fits in one byte is 1111 1111, which is 255), so it is stored as several bytes. When Ruby reads those bytes together, it knows they represent one Unicode character. If you tell Ruby that each of those bytes represents one character, i.e. by calling each_byte(), then Ruby never sees the 8,000. Instead, it reads one piece of 8,000 and treats it as a character, then reads the next piece and treats that as another character.
each_byte() tells Ruby to ignore the clusters of bytes and just read one byte at a time; printf with %c then prints whatever character is represented by the integer stored in that byte.
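The difference between bytes and characters can be seen directly on the string from the question. The following is a small sketch with illustrative variable names, assuming the source file and the string are UTF-8:

```ruby
greeting = "Здравствуйте!"

puts greeting.length     # number of characters
puts greeting.bytesize   # number of bytes in the UTF-8 encoding

first_char = greeting[0]
p first_char.bytes       # the bytes that together encode the first character
p first_char.ord         # its Unicode code point

# Putting the bytes back together and labelling them as UTF-8
# gives the original character again.
rebuilt = first_char.bytes.pack('C*').force_encoding('UTF-8')
puts rebuilt
```

Each Cyrillic letter here occupies two bytes, which is why each_byte yields two values per letter while each_char yields one.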
