Ruby's pack and unpack explained - ruby

Even after reading the standard documentation, I still can't understand exactly how Ruby's Array#pack and String#unpack work. Here is the example that's causing me the most trouble:
irb(main):001:0> chars = ["61","62","63"]
=> ["61", "62", "63"]
irb(main):002:0> chars.pack("H*")
=> "a"
irb(main):003:0> chars.pack("HHH")
=> "```"
I expected both these operations to return the same output: "abc". Each of them "fails" in a different manner (not really a fail since I probably expect the wrong thing). So two questions:
What is the logic behind those outputs?
How can I achieve the effect I want, i.e. transforming a sequence of hexadecimal numbers into the corresponding string? Even better: given an integer n, how can I transform it into a string whose bytes, when read as a number (say, in a hex editor), equal n?

We were working on a similar problem this morning. If the array size is unknown, you can use:
ary = ["61", "62", "63"]
ary.pack('H2' * ary.size)
=> "abc"
You can reverse it using:
str = "abc"
str.unpack('H2' * str.bytesize)
=> ["61", "62", "63"]
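A shorter round trip is also possible without building a long template. The following is only a sketch: the helper names hex_pairs and from_hex_pairs are made up for illustration, and String#unpack1 assumes Ruby 2.4 or later.

```ruby
# Split a string into one two-digit hex string per byte.
# unpack1 returns the first unpacked value directly (Ruby 2.4+).
def hex_pairs(str)
  str.unpack1("H*").scan(/../)
end

# Inverse: join the pairs and pack them as one long hex string.
def from_hex_pairs(pairs)
  [pairs.join].pack("H*")
end

p hex_pairs("abc")                    # => ["61", "62", "63"]
p from_hex_pairs(["61", "62", "63"])  # => "abc"
```

Because "H*" works on bytes, this also covers strings whose characters occupy more than one byte.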

The 'H' String directive for Array#pack says that array contents should be interpreted as nibbles of hex strings.
In the first example you've provided:
irb(main):002:0> chars.pack("H*")
=> "a"
you're asking pack to treat the first element of the array as a sequence of nibbles (half bytes) of a hex string: 0x61 in this case, which corresponds to the ASCII character 'a'. The template contains a single directive, so the remaining elements are ignored.
In the second example:
irb(main):003:0> chars.pack("HHH")
=> "```"
you're asking pack to take a single nibble (the high one) from each of the 3 elements: 0x60 corresponds to the ASCII character '`'. The low part or second nibble (0x1) "gets lost" because the template has no '2' or '*' count after 'H'.
What you need is:
chars.pack('H*' * chars.size)
in order to pack all the nibbles of all the elements of the array as if they were hex strings.
The 'H2' * chars.size variant only works if every array element is a hex string representing exactly 1 byte.
It means that something like chars = ["6161", "6262", "6363"] is going to be packed incompletely:
2.1.5 :047 > chars = ["6161", "6262", "6363"]
=> ["6161", "6262", "6363"]
2.1.5 :048 > chars.pack('H2' * chars.size)
=> "abc"
while:
2.1.5 :049 > chars.pack('H*' * chars.size)
=> "aabbcc"
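The rules described in this answer can be checked side by side. This is a small sketch rather than an exhaustive reference; the variable names are only for illustration.

```ruby
# Each directive consumes ONE array element; the count after H is in nibbles.
one_nibble  = ["61", "62"].pack("H")     # high nibble only: 0x60
two_nibbles = ["61", "62"].pack("H2")    # one full byte from the first element
star_single = ["61", "62"].pack("H*")    # all nibbles of the first element only
star_each   = ["61", "62"].pack("H*H*")  # one H* per element
long_string = ["6162"].pack("H*")        # a single element holding two bytes

p one_nibble   # => "`"
p two_nibbles  # => "a"
p star_single  # => "a"
p star_each    # => "ab"
p long_string  # => "ab"
```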

The Array#pack method is pretty arcane. Addressing question (2), I was able to get your example to work by doing this:
> ["61", "62", "63"].pack("H2H2H2")
=> "abc"
See the Ruby documentation for a similar example. Here is a more general way to do it:
["61", "62", "63"].map {|s| [s].pack("H2") }.join
This is probably not the most efficient way to tackle your problem; I suspect there is a better way, but it would help to know what kind of input you are starting out with.
The #pack method is common to other languages, such as Perl. If Ruby's documentation does not help, you might consult analogous documentation elsewhere.

I expected both these operations to return the same output: "abc".
The easiest way to understand why your approach didn't work, is to simply start with what you are expecting:
"abc".unpack("H*")
# => ["616263"]
["616263"].pack("H*")
# => "abc"
So, it seems that Ruby expects your hex bytes in one long string instead of separate elements of an array. So the simplest answer to your original question would be this:
chars = ["61", "62", "63"]
[chars.join].pack("H*")
# => "abc"
This approach also seems to perform comparably well for large input:
require 'benchmark'
chars = ["61", "62", "63"] * 100000
Benchmark.bmbm do |bm|
  bm.report("join pack") { [chars.join].pack("H*") }
  bm.report("big pack")  { chars.pack("H2" * chars.size) }
  bm.report("map pack")  { chars.map { |s| [s].pack("H2") }.join }
end
# user system total real
# join pack 0.030000 0.000000 0.030000 ( 0.025558)
# big pack 0.030000 0.000000 0.030000 ( 0.027773)
# map pack 0.230000 0.010000 0.240000 ( 0.241117)
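The second part of the question (turning an integer n into the bytes that a hex editor would show as n) can be approached the same way. The following is a sketch under assumptions: the helper names int_to_bytes and bytes_to_int are hypothetical, n is taken to be non-negative, and the byte order is big-endian.

```ruby
# Convert a non-negative integer to its big-endian byte string.
def int_to_bytes(n)
  raise ArgumentError, "n must be non-negative" if n.negative?
  hex = n.to_s(16)
  hex = "0#{hex}" if hex.length.odd? # pack("H*") needs an even number of digits
  [hex].pack("H*")
end

# Inverse: read the bytes of a string as one big-endian integer.
def bytes_to_int(str)
  str.unpack1("H*").to_i(16)
end

p int_to_bytes(0x616263)  # => "abc"
p bytes_to_int("abc")     # => 6382179
```

The leading-zero padding matters: with an odd number of digits, pack("H*") pads the last nibble rather than the first, which would change the number.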

Related

How do I increment/decrement a character in Ruby for all possible values?

I have a string that is one character long and can be any possible character value:
irb(main):001:0> "\x0"
=> "\u0000"
I thought this might work:
irb(main):002:0> "\x0" += 1
SyntaxError: (irb):2: syntax error, unexpected tOP_ASGN, expecting $end
"\x0" += 1
^
from /opt/rh/ruby193/root/usr/bin/irb:12:in `<main>'
But, as you can see, it didn't. How can I increment/decrement my character?
Edit:
Ruby doesn't seem to be set up to do this. Maybe I'm approaching this the wrong way. I want to manipulate raw data in terms of 8-bit chunks. How can I best accomplish that sort of operation?
Depending on what the possible values are, you can use String#next:
"\x0".next
# => "\u0001"
Or, to update an existing value:
c = "\x0"
c.next!
This may well not be what you want:
"z".next
# => "aa"
The simplest way I can think of to increment a character's underlying codepoint is this:
c = 'z'
c = c.ord.next.chr
# => "{"
Decrementing is slightly more complicated:
c = (c.ord - 1).chr
# => "z"
In both cases there's the assumption that you won't step outside of 0..255; you may need to add checks for that.
You cannot do:
"\x0" += 1
Because, in Ruby, that is short for:
"\x0" = "\x0" + 1
and it is a syntax error to assign a value to a string literal.
However, given an integer n, you can convert it to a character by using pack. For example,
[97].pack 'U' # => "a"
Similarly, you can convert a character into an integer by using ord. For example:
[300].pack('U').ord # => 300
With these methods, you can easily write your own increment function, as follows:
def step(c, delta=1)
  [c.ord + delta].pack 'U'
end
def increment(c)
  step c, 1
end
def decrement(c)
  step c, -1
end
If you just want to manipulate bytes, you can use String#bytes, which will give you an array of integers to play with. You can use Array#pack to convert those bytes back to a String. (Refer to documentation for encoding options.)
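As a rough illustration of the String#bytes and Array#pack approach, the sketch below shifts every byte of a string. The name shift_bytes is made up, and wrapping modulo 256 is an assumption about what should happen at the edges of 0..255.

```ruby
# Add delta to every byte, wrapping around so results stay within 0..255.
def shift_bytes(str, delta = 1)
  str.bytes.map { |b| (b + delta) % 256 }.pack("C*")
end

p shift_bytes("abc")       # => "bcd"
p shift_bytes("bcd", -1)   # => "abc"
p shift_bytes("\xFF".b)    # => "\x00"
```

The result of pack("C*") is a binary (ASCII-8BIT) string, which is usually what you want for raw 8-bit data.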
You could use the String#next method.
I think the most elegant method (for alphanumeric chars) would be:
"a".tr('0-9a-z','1-9a-z0')
which would loop the a through to z and through the numbers and back to a.
I reread the question and see that my answer has nothing to do with it. I have no answer for manipulating 8-bit values directly.

Case-insensitive Array#include?

I want to know the best way to make the String#include? method ignore case. Currently I'm doing the following. Any suggestions? Thanks!
a = "abcDE"
b = "CD"
result = a.downcase.include? b.downcase
Edit:
What about Array#include?, where all elements of the array are strings?
Summary
If you are only going to test a single word against an array, or if the contents of your array changes frequently, the fastest answer is Aaron's:
array.any?{ |s| s.casecmp(mystr)==0 }
If you are going to test many words against a static array, it's far better to use a variation of farnoy's answer: create a copy of your array that has all-lowercase versions of your words, and use include?. (This assumes that you can spare the memory to create a mutated copy of your array.)
# Do this once, or each time the array changes
downcased = array.map(&:downcase)
# Test lowercase words against that array
downcased.include?( mystr.downcase )
Even better, create a Set from your array.
# Do this once, or each time the array changes
require 'set'
downcased = Set.new array.map(&:downcase)
# Test lowercase words against that array
downcased.include?( mystr.downcase )
My original answer below is a very poor performer and generally not appropriate.
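The two strategies from this summary can be put together in one runnable sketch. The sample data and the helper name include_ci? are invented for illustration, and String#casecmp? assumes Ruby 2.4 or later (on older versions, casecmp(str) == 0 does the same job).

```ruby
require "set"

words = ["Apple", "BANANA", "cherry"]

# One-off lookup: casecmp? returns true on a case-insensitive match.
def include_ci?(array, str)
  array.any? { |s| s.casecmp?(str) }
end

# Many lookups against a static array: downcase once into a Set.
lookup = Set.new(words.map(&:downcase))

p include_ci?(words, "banana")        # => true
p include_ci?(words, "grape")         # => false
p lookup.include?("Cherry".downcase)  # => true
```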
Benchmarks
Following are benchmarks for looking for 1,000 words with random casing in an array of slightly over 100,000 words, where 500 of the words will be found and 500 will not.
The 'regex' test is my answer here, using any?.
The 'casecmp' test is Aaron's answer, using any? from my comment.
The 'downarray' test is farnoy's answer, re-creating a new downcased array for each of the 1,000 tests.
The 'downonce' test is farnoy's answer, but pre-creating the lookup array once only.
The 'set_once' test is creating a Set from the array of downcased strings, once before testing.
               user     system      total        real
regex     18.710000   0.020000  18.730000 ( 18.725266)
casecmp    5.160000   0.000000   5.160000 (  5.155496)
downarray 16.760000   0.030000  16.790000 ( 16.809063)
downonce   0.650000   0.000000   0.650000 (  0.643165)
set_once   0.040000   0.000000   0.040000 (  0.038955)
If you can create a single downcased copy of your array once to perform many lookups against, farnoy's answer is the best (assuming you must use an array). If you can create a Set, though, do that.
If you like, examine the benchmarking code.
Original Answer
I (originally said that I) would personally create a case-insensitive regex (for a string literal) and use that:
re = /\A#{Regexp.escape(str)}\z/i # Match exactly this string, no substrings
all = array.grep(re) # Find all matching strings…
any = array.any?{ |s| s =~ re } # …or see if any matching string is present
Using any? can be slightly faster than grep as it can exit the loop as soon as it finds a single match.
For an array, use:
array.map(&:downcase).include?(string)
Regexps are very slow and should be avoided.
You can use casecmp to do your comparison, ignoring case.
"abcdef".casecmp("abcde") #=> 1
"aBcDeF".casecmp("abcdef") #=> 0
"abcdef".casecmp("abcdefg") #=> -1
"abcdef".casecmp("ABCDEF") #=> 0
class String
  def caseinclude?(x)
    downcase.include?(x.downcase)
  end
end
my_array.map!{|c| c.downcase.strip}
where map! changes my_array, map instead returns a new array.
To farnoy: your example doesn't work in my case. I'm actually looking to match a substring of any element.
Here's my test case.
x = "<TD>", "<tr>", "<BODY>"
y = "td"
x.collect { |r| r.downcase }.include? y
=> false
x[0].include? y
=> false
x[0].downcase.include? y
=> true
Your case works with an exact case-insensitive match.
a = "TD", "tr", "BODY"
b = "td"
a.collect { |r| r.downcase }.include? b
=> true
I'm still experimenting with the other suggestions here.
---EDIT INSERT AFTER HERE---
I found the answer. Thanks to Drew Olsen
var1 = "<TD>", "<tr>","<BODY>"
=> ["<TD>", "<tr>", "<BODY>"]
var2 = "td"
=> "td"
var1.find_all{|item| item.downcase.include?(var2)}
=> ["<TD>"]
var1[0] = "<html>"
=> "<html>"
var1.find_all{|item| item.downcase.include?(var2)}
=> []

What is the meaning of i.to_s in Ruby?

I want to understand a piece of code I found in Google:
i.to_s
In the above code i is an integer. As per my understanding i is being converted into a string. Is that true?
Better to say that this is an expression returning the string representation of the integer i. The integer itself doesn't change. #pedantic.
In irb
>> 54.to_s
=> "54"
>> 4598734598734597345937423647234.to_s
=> "4598734598734597345937423647234"
>> i = 7
=> 7
>> i.to_s
=> "7"
>> i
=> 7
As noted in the other answers, calling .to_s on an integer will return the string representation of that integer.
9.class #=> Fixnum (Integer in Ruby 2.4 and later)
9.to_s #=> "9"
9.to_s.class #=> String
But you can also pass an argument to .to_s to change it from the default base 10 to anything from base 2 to base 36. Here is the documentation: Fixnum#to_s. So, for example, if you wanted to convert the number 1024 to its equivalent in binary (aka base 2, which uses only "1" and "0" to represent any number), you could do:
1024.to_s(2) #=> "10000000000"
Converting to Base 36 can be useful when you want to generate random combinations of letters and numbers, since it counts using every number from 0 to 9 and then every letter from a to z. Base 36 explanation on Wikipedia. For example, the following code will give you a random string of letters and numbers of length 1 to 3 characters long (change the 3 to whatever maximum string length you want, which increases the possible combinations):
rand(36**3).to_s(36)
To better understand how numbers are written in the different base systems, put this code into irb, replacing the 36 in the parentheses with the base you want to learn about. The resulting printout will count from 0 to 35 in whichever base you chose:
36.times {|i| puts i.to_s(36)}
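The conversion also works in the other direction with String#to_i, which takes the same base argument. A short sketch follows; the variable names are only illustrative, and the rjust padding is one possible way to get a fixed-length random token.

```ruby
hex      = 255.to_s(16)          # integer to base-16 string
back     = "ff".to_i(16)         # base-16 string to integer
binary   = 1024.to_s(2).to_i(2)  # round trip through base 2
last_b36 = 35.to_s(36)           # highest single digit in base 36

p hex       # => "ff"
p back      # => 255
p binary    # => 1024
p last_b36  # => "z"

# Left-pad with "0" so the random token is always exactly 3 characters long.
token = rand(36**3).to_s(36).rjust(3, "0")
p token.length  # => 3
```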
That is correct. to_s converts any object to a string, in this case (probably) an integer, since the variable is called i.

Ruby, remove last N characters from a string?

What is the preferred way of removing the last n characters from a string?
irb> 'now is the time'[0...-4]
=> "now is the "
If the characters you want to remove are always the same characters, then consider chomp:
'abc123'.chomp('123') # => "abc"
The advantages of chomp are: no counting, and the code more clearly communicates what it is doing.
With no arguments, chomp removes the DOS or Unix line ending, if either is present:
"abc\n".chomp # => "abc"
"abc\r\n".chomp # => "abc"
From the comments, there was a question of the speed of using #chomp versus using a range. Here is a benchmark comparing the two:
require 'benchmark'
S = 'asdfghjkl'
SL = S.length
T = 10_000
A = 1_000.times.map { |n| "#{n}#{S}" }
GC.disable
Benchmark.bmbm do |x|
x.report('chomp') { T.times { A.each { |s| s.chomp(S) } } }
x.report('range') { T.times { A.each { |s| s[0...-SL] } } }
end
Benchmark Results (using CRuby 2.1.3p242):
Rehearsal -----------------------------------------
chomp 1.540000 0.040000 1.580000 ( 1.587908)
range 1.810000 0.200000 2.010000 ( 2.011846)
-------------------------------- total: 3.590000sec
user system total real
chomp 1.550000 0.070000 1.620000 ( 1.610362)
range 1.970000 0.170000 2.140000 ( 2.146682)
So chomp is faster than using a range, by ~22%.
Ruby 2.5+
As of Ruby 2.5 you can use delete_suffix or delete_suffix! to achieve this in a fast and readable manner.
The docs on the methods are here.
If you know what the suffix is, this is idiomatic (and I'd argue, even more readable than other answers here):
'abc123'.delete_suffix('123') # => "abc"
'abc123'.delete_suffix!('123') # => "abc"
It's even significantly faster (almost 40% with the bang method) than the top answer. Here's the result of the same benchmark:
user system total real
chomp 0.949823 0.001025 0.950848 ( 0.951941)
range 1.874237 0.001472 1.875709 ( 1.876820)
delete_suffix 0.721699 0.000945 0.722644 ( 0.723410)
delete_suffix! 0.650042 0.000714 0.650756 ( 0.651332)
I hope this is useful - note the method doesn't currently accept a regex so if you don't know the suffix it's not viable for the time being. However, as the accepted answer (update: at the time of writing) dictates the same, I thought this might be useful to some people.
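When the suffix is only known as a pattern, one possible workaround is String#sub with a regex anchored at the end of the string. This is a sketch, not part of the benchmark above; the helper name strip_trailing_digits and the digit pattern are just an example.

```ruby
# Remove a run of trailing digits, if there is one.
# \z anchors the match at the very end of the string.
def strip_trailing_digits(str)
  str.sub(/\d+\z/, "")
end

p strip_trailing_digits("abc123")  # => "abc"
p strip_trailing_digits("abc")     # => "abc"
```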
str = str[0..-1-n]
Unlike [0...-n], this handles the case of n = 0, where [0...-0] would return an empty string.
I would suggest chop. I think it has been mentioned in one of the comments but without links or explanations so here's why I think it's better:
It simply removes the last character from a string and you don't have to specify any values for that to happen.
If you need to remove more than one character then chomp is your best bet. This is what the ruby docs have to say about chop:
Returns a new String with the last character removed. If the string
ends with \r\n, both characters are removed. Applying chop to an empty
string returns an empty string. String#chomp is often a safer
alternative, as it leaves the string unchanged if it doesn’t end in a
record separator.
Although this is used mostly to remove separators such as \r\n I've used it to remove the last character from a simple string, for example the s to make the word singular.
name = "my text"
x.times do name.chop! end
Here in the console:
>name = "Nabucodonosor"
=> "Nabucodonosor"
> 7.times do name.chop! end
=> 7
> name
=> "Nabuco"
Dropping the last n characters is the same as keeping the first length - n characters.
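In plain Ruby, that idea might be sketched as follows. The helper name drop_last is hypothetical, and clamping at zero is an assumption about what should happen when n exceeds the string length.

```ruby
# Keep the first (length - n) characters; never go below zero.
def drop_last(str, n)
  raise ArgumentError, "n must be non-negative" if n.negative?
  str[0, [str.length - n, 0].max]
end

p drop_last("foobarbaz", 3)  # => "foobar"
p drop_last("foobarbaz", 0)  # => "foobarbaz"
p drop_last("abc", 10)       # => ""
```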
Active Support includes String#first and String#last methods which provide a convenient way to keep or drop the first/last n characters:
require 'active_support/core_ext/string/access'
"foobarbaz".first(3) # => "foo"
"foobarbaz".first(-3) # => "foobar"
"foobarbaz".last(3) # => "baz"
"foobarbaz".last(-3) # => "barbaz"
If you are using Rails, try:
"my_string".last(2) # => "ng"
[EDITED]
To get the string WITHOUT the last 2 chars:
n = "my_string".size
"my_string"[0..n-3] # => "my_stri"
Note: the last string char is at n-1. So, to remove the last 2, we use n-3.
Check out the slice() method:
http://ruby-doc.org/core-2.5.0/String.html#method-i-slice
You can always use something like
"string".sub!(/.{X}$/,'')
Where X is the number of characters to remove.
Or with assigning/using the result:
myvar = "string"[0..-X]
where X is one more than the number of characters to remove.
If you're OK with adding a method to String and want the characters you chop off, try this:
class String
  def chop_multiple(amount)
    amount.times.inject([self, '']){ |(s, r)| [s.chop, r.prepend(s[-1])] }
  end
end
hello, world = "hello world".chop_multiple 5
hello #=> 'hello '
world #=> 'world'
Using regex:
str = 'string'
n = 2 #to remove last n characters
str[/\A.{#{str.size-n}}/] #=> "stri"
x = "my_test"
last_char = x.split('').last

Escape problem with hex

I need to print escaped characters to a binary file using Ruby. The main problem is that slashes need the whole byte to escape correctly, and I don't know/can't create the byte in such a way.
I am creating the hex value with, basically:
'\x' + char
Where char is some 'hex' value, such as 65. In hex, \x65 is the ASCII character 'e'.
Unfortunately, when I puts this sequence to the file, I end up with this:
\\x65
How do I create a hex string with the properly escaped value? I have tried a lot of things, involving single or double quotes, pack, unpack, multiple slashes, etc. I have tried so many different combinations that I feel as though I understand the problem less now than I did when I started.
How?
You may need to set binary mode on your file, and/or use putc.
File.open("foo.tmp", "w") do |f|
  f.set_encoding(Encoding::BINARY) # set_encoding is Ruby 1.9
  f.binmode # only useful on Windows
  f.putc "65".hex # 0x65 is the byte for "e"; note that "e".hex would be 14
end
Hopefully this can give you some ideas even if you have Ruby <1.9.
Okay, if you want to create a string whose first byte
has the integer value 0x65, use Array#pack
irb> [0x65].pack('U')
#=> "e"
irb> "e"[0] # Ruby 1.8; on 1.9 and later use "e".ord
#=> 101
101 in decimal equals 65 in hexadecimal, so this works.
If you want to create a literal string whose first byte is '\',
second is 'x', third is '6', and fourth is '5', then just use interpolation:
irb> "\\x#{65}"
#=> "\\x65"
irb> "\\x65".split('')
#=> ["\\", "x", "6", "5"]
If you have the hex value and you want to create a string containing the character corresponding to that hex value, you can do:
irb(main):002:0> '65'.hex.chr
=> "e"
Another option is to use Array#pack; this can be used if you need to convert a list of numbers to a single string:
irb(main):003:0> ['65'.hex].pack("C")
=> "e"
irb(main):004:0> ['66', '6f', '6f'].map {|x| x.hex}.pack("C*")
=> "foo"
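To tie this back to the original goal of writing the bytes to a binary file, here is a minimal sketch. It assumes the hex values arrive as two-digit strings, and it uses a temporary file only so the example cleans up after itself; the variable names are illustrative.

```ruby
require "tempfile"

hex_digits = ["66", "6f", "6f"]
bytes = [hex_digits.join].pack("H*") # raw bytes 0x66 0x6f 0x6f

written = nil
Tempfile.create("hex_demo") do |f|
  f.binmode      # avoid newline translation on Windows
  f.write(bytes)
  f.flush
  written = File.binread(f.path) # read the raw bytes back
end

p written  # => "foo"
```

Writing the packed string directly avoids the escaping problem, because no backslash sequence is ever built as text.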

Resources