Escape problem with hex - ruby

I need to print escaped characters to a binary file using Ruby. The main problem is that slashes need the whole byte to escape correctly, and I don't know/can't create the byte in such a way.
I am creating the hex value with, basically:
'\x' + char
Where char is some 'hex' value, such as 65. In hex, \x65 is the ASCII character 'e'.
Unfortunately, when I puts this sequence to the file, I end up with this:
\\x65
How do I create a hex string with the properly escaped value? I have tried a lot of things involving single or double quotes, pack, unpack, multiple slashes, etc. I have tried so many different combinations that I feel as though I understand the problem less now than I did when I started.
How?

You may need to set binary mode on your file, and/or use putc.
File.open("foo.tmp", "w") do |f|
  f.set_encoding(Encoding::BINARY) # set_encoding requires Ruby 1.9+
  f.binmode # only makes a difference on Windows
  f.putc "65".hex # "65".hex is 0x65, so this writes the single byte "e"
end
Hopefully this can give you some ideas even if you have Ruby <1.9.

Okay, if you want to create a string whose first byte
has the integer value 0x65, use Array#pack
irb> [0x65].pack('U')
#=> "e"
irb> "e"[0]
#=> 101
101 (base 10) = 65 (base 16), so this works.
If you want to create a literal string whose first byte is '\',
second is 'x', third is '6', and fourth is '5', then just use interpolation:
irb> "\\x#{65}"
#=> "\\x65"
irb> "\\x65".split('')
#=> ["\\", "x", "6", "5"]

If you have the hex value and you want to create a string containing the character corresponding to that hex value, you can do:
irb(main):002:0> '65'.hex.chr
=> "e"
Another option is to use Array#pack; this can be used if you need to convert a list of numbers to a single string:
irb(main):003:0> ['65'.hex].pack("C")
=> "e"
irb(main):004:0> ['66', '6f', '6f'].map {|x| x.hex}.pack("C*")
=> "foo"
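Putting the answers above together, here is a minimal sketch of turning hex strings into raw bytes and writing them to a file in binary mode. The helper name bytes_from_hex and the file name foo.bin are placeholders invented for this illustration, and File.binwrite assumes Ruby 1.9.3 or later.

```ruby
require "tmpdir"

# Hypothetical helper: turns hex digit pairs such as "65" into raw bytes.
def bytes_from_hex(pairs)
  pairs.map(&:hex).pack("C*")
end

payload = bytes_from_hex(%w[66 6f 6f]) # three bytes: 0x66 0x6f 0x6f

written = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "foo.bin")     # placeholder file name
  File.binwrite(path, payload)         # binary mode, no newline translation
  written = File.binread(path)         # read the bytes back to check them
end

p written.bytes
```

Reading the file back with File.binread is only there to confirm that the bytes on disk are the ones that were packed; no backslash or "x" character is ever written.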

Related

Ruby .to_i does not return the complete integer as expected

My ruby command is,
"980,323,344.00".to_i
Why does it return 980 instead of 980323344?
You can achieve it by doing this :
"980,323,344.00".delete(',').to_i
The reason your call to to_i does not return what you expect is explained in the documentation. To quote, the method:
Returns the result of interpreting leading characters in str as an integer base base (between 2 and 36). Extraneous characters past the end of a valid number are ignored.
The extraneous characters in your case start at the first comma, right after 980, which is why you see 980 being returned.
In Ruby, calling to_i on a string parses as many leading characters as form a valid integer and ignores the rest.
number_string = '980,323,344.00'
number_string.delete(',').to_i
#=> 980323344
"123abc".to_i
#=> 123
To make a longer number more readable, you can use underscores where the conventional commas would go in written numbers.
"980_323_344.00".to_i
#=> 980323344
The documentation for to_i might be a bit misleading:
Returns the result of interpreting leading characters in str as an integer base base (between 2 and 36)
"interpreting" doesn't mean that it tries to parse various number formats (like Date.parse does for date formats). It means that it looks for what's a valid integer literal in Ruby (in the given base). For example:
1234 #=> 1234
'1234'.to_i #=> 1234
1_234 #=> 1234
'1_234'.to_i #=> 1234
0d1234 #=> 1234
'0d1234'.to_i #=> 1234
0x04D2 #=> 1234
'0x04D2'.to_i(16) #=> 1234
Your input as a whole, however, is not a valid integer literal (Ruby doesn't like the ,):
980,323,344.00
# SyntaxError (syntax error, unexpected ',', expecting end-of-input)
# 980,323,344.00
# ^
But it starts with a valid integer literal. And that's where the second sentence comes into play:
Extraneous characters past the end of a valid number are ignored.
So the result is 980 – the leading characters which form a valid integer converted to an integer.
If your strings always have that format, you can just delete the offending commas and run the result through to_i which will ignore the trailing .00:
'980,323,344.00'.delete(',') #=> "980323344.00"
'980,323,344.00'.delete(',').to_i #=> 980323344
Otherwise you could use a regular expression to check its format before converting it:
input = '980,323,344.00'
number = case input
when /\A\d{1,3}(,\d{3})*\.00\z/
input.delete(',').to_i
when /other format/
# other conversion
end
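As a rough sketch of the "check the format, then convert" idea, the snippet below wraps it in a single method. The names GROUPED_NUMBER and parse_grouped_integer are made up for this example, the pattern is slightly looser than the one above (it accepts any decimal part, not just .00), and Regexp#match? assumes Ruby 2.4 or later.

```ruby
# Hypothetical pattern: optional sign, comma-grouped digits, optional decimals.
GROUPED_NUMBER = /\A-?\d{1,3}(,\d{3})*(\.\d+)?\z/

# Hypothetical helper: returns an Integer for strings such as "1,234.00",
# or nil when the string does not follow the comma-grouped format.
def parse_grouped_integer(input)
  return nil unless GROUPED_NUMBER.match?(input)

  input.delete(",").to_i # to_i drops the fractional part
end

p parse_grouped_integer("980,323,344.00") # prints 980323344
p parse_grouped_integer("98,0323")        # prints nil (bad grouping)
p "980,323,344.00".to_i                   # prints 980 (stops at the first comma)
```

Returning nil for unexpected input makes a malformed string visible to the caller instead of silently turning it into a truncated number.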
And if you are dealing with monetary values, you should consider using the money gem and its monetize addition for parsing formatted values:
amount = Monetize.parse('980,323,344.00')
#=> #<Money fractional:98032334400 currency:USD>
amount.format
#=> "$980.323.344,00"
Note that format requires i18n so the above example might require some setup.

Regular Expression for Ruby Integer

Ruby integers are written using an optional leading sign, an optional base indicator (0 for octal, 0x for hex, or 0b for binary), followed by a string of digits in the appropriate base. Underscore characters are ignored in the digit string. The letters mentioned in the above description may be either upper or lower case, and the underscore characters can only occur strictly within the digit string.
I need to create a regular expression that checks for Ruby integers in a Java string, following the specification above.
I assume the substrings that may represent integers are separated by spaces or begin or end the string. If so, I suggest you split the string on whitespace and then use the method Kernel#Integer to determine whether each element of the resulting array represents an integer.
def str_to_int(str)
str.split.each_with_object([]) do |s,a|
val = Integer(s) rescue nil
a << [s, val] unless val.nil?
end
end
str_to_int "22 -22 077 0xAB 0xA_B 0b101 -0b101 cat _3 4_"
#=> [["22", 22], ["-22", -22], ["077", 63], ["0xAB", 171],
# ["0xA_B", 171], ["0b101", 5], ["-0b101", -5]]
Integer raises an ArgumentError exception if the string cannot be converted to an integer. I've dealt with that with an in-line rescue that returns nil, but you may wish to write it so that only that exception is rescued. It may be prudent to remove punctuation from the string before executing the above method.
This regex captures positive or negative numbers in denary, binary, octal and hexadecimal form, including any underscores:
# hexadecimal binary octal denary
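One way to rescue only the relevant exception is sketched below. Kernel#Integer raises ArgumentError for a malformed string, so that is the only class rescued here; the method names integer_or_nil and integers_in are invented for this illustration.

```ruby
# Hypothetical helper: like Integer(), but returns nil for unparseable tokens.
def integer_or_nil(token)
  Integer(token)
rescue ArgumentError
  nil
end

# Collects [token, value] pairs for every whitespace-separated token
# that Ruby itself would accept as an integer.
def integers_in(str)
  str.split.each_with_object([]) do |token, found|
    value = integer_or_nil(token)
    found << [token, value] unless value.nil?
  end
end

p integers_in("22 0xAB cat _3 4_ -0b101")
```

Compared with the in-line rescue, any other error (for example a NoMethodError caused by a typo) still surfaces instead of being swallowed.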
-?0x[0-9a-fA-F][0-9a-fA-F_]*[0-9a-fA-F]|-?0x[0-9a-fA-F]|-?0b[01][01_]*[01]|-?0b[01]|-?0[0-7][0-7_]?[0-7]?|-?0[0-7]|-?[1-9][0-9_]*[0-9]|-?[0-9]
You should test the regex thoroughly to make sure it works as required but it does seem to work on a few relevant samples I tried (see this on Rubular where I've used () captures so you can see the matches more easily but it is essentially the same regex).
Here is an example of the regex in action using String#scan:
str = "-0x88339_43 wor0ds 8_8_ 0b1001 01words0x334 _9 0b1 0x4 0_ 0x_ 0b_1 0b00_1"
reg = /-?0x[0-9a-fA-F][0-9a-fA-F_]*[0-9a-fA-F]|-?0x[0-9a-fA-F]|-?0b[01][01_]*[01]|-?0b[01]|-?0[0-7][0-7_]?[0-7]?|-?0[0-7]|-?[1-9][0-9_]*[0-9]|-?[0-9]/
#regex matches
str.scan reg
#=>["-0x88339_43", "0", "8_8", "0b1001", "01", "0x334", "9", "0b1", "0x4", "0", "0", "0", "1", "0b00_1"]
Like @CarySwoveland, I'm assuming your string has spaces. Without spaces you will still get a result; it may not be what you want, but at least it's a start.

How to validate that a string is a proper hexadecimal value in Ruby?

I am writing a 6502 assembler in Ruby. I am looking for a way to validate hexadecimal operands in string form. I understand that the String object provides a "hex" method to return a number, but here's a problem I run into:
"0A".hex #=> 10 - a valid hexadecimal value
"0Z".hex #=> 0 - invalid, produces a zero
"asfd".hex #=> 10 - Why 10? I guess it reads 'a' first and stops at 's'?
You will get some odd results by typing in a bunch of gibberish. What I need is a way to first verify that the value is a legit hex string.
I was playing around with regular expressions, and realized I can do this:
true if "0A" =~ /[A-Fa-f0-9]/
#=> true
true if "0Z" =~ /[A-Fa-f0-9]/
#=> true <-- PROBLEM
I'm not sure how to address this issue. I need to be able to verify that letters are only A-F and that if it is just numbers that is ok too.
I'm hoping to avoid spaghetti code riddled with "if" statements. I am hoping that someone could provide a "one-liner" or some form of elegant code.
Thanks!
!str[/\H/] returns true when the string contains no invalid hex characters (\H matches any character that is not a hex digit).
String#hex does not interpret the whole string as hex, it extracts from the beginning of the string up to as far as it can be interpreted as hex. With "0Z", the "0" is valid hex, so it interpreted that part. With "asfd", the "a" is valid hex, so it interpreted that part.
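One method (note that it rejects strings with leading zeros such as "0A", because the round trip produces "a"):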
str.to_i(16).to_s(16) == str.downcase
Another:
str =~ /\A[a-f0-9]+\Z/i # or simply /\A\h+\Z/ (see hirolau's answer)
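About your regex: you have to use anchors (\A for the beginning of the string and \Z for the end of the string) to say that you want the full string to match. Note that \Z also matches just before a trailing newline; use \z if that should be rejected too. Also, the + repeats the match for one or more characters.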
Note that you could use ^ (begin of line) and $ (end of line), but this would allow strings like "something\n0A" to pass.
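The anchoring difference can be checked with a short sketch. The method name hex_string? is hypothetical, it uses \z rather than \Z so a trailing newline is rejected as well, and Regexp-style String#match? assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper: true only when the whole string is hex digits.
def hex_string?(str)
  str.is_a?(String) && str.match?(/\A\h+\z/)
end

p hex_string?("0A")              # true
p hex_string?("0Z")              # false
p hex_string?("something\n0A")   # false
p hex_string?("0A\n")            # false, \z rejects the trailing newline
p hex_string?(nil)               # false

# Line anchors accept a multi-line string as long as one line is hex.
p !!("something\n0A" =~ /^\h+$/) # true
```

For an assembler operand, the string-level anchors are usually the safer choice because they treat the operand as a single token.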
This is an old question, but I just had the issue myself. I opted for this in my code:
str =~ /^\h+$/
It has the added benefit of returning nil if str is nil.
Since Ruby has literal hex built-in, you can eval the string and rescue the SyntaxError
eval "0xA" #=> 10
eval "0xZ" #=> SyntaxError
You can use this on a method like
def is_hex?(str)
begin
eval("0x#{str}")
true
rescue SyntaxError
false
end
end
is_hex?('0A') #=> true
is_hex?('0Z') #=> false
Of course, since you are using eval, make sure you only send safe values to the method.
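If eval is a concern, a similar effect can be had without it. The sketch below swaps in Kernel#Integer with an explicit base of 16, which is a different technique from the eval approach above; the method name parse_hex is invented for this example. Be aware that Integer is more permissive than a strict digit check: it also accepts a "0x" prefix, a sign, and underscores between digits.

```ruby
# Hypothetical helper: returns the numeric value of a hex string,
# or nil when Integer() cannot parse it in base 16.
def parse_hex(str)
  Integer(str, 16)
rescue ArgumentError, TypeError
  nil
end

p parse_hex("0A")   # 10
p parse_hex("0Z")   # nil
p parse_hex("asfd") # nil, unlike "asfd".hex which returns 10
```

Because the method returns the parsed value, the assembler gets validation and conversion in one call.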

What does %w(array) mean?

I'm looking at the documentation for FileUtils.
I'm confused by the following line:
FileUtils.cp %w(cgi.rb complex.rb date.rb), '/usr/lib/ruby/1.6'
What does the %w mean? Can you point me to the documentation?
%w(foo bar) is a shortcut for ["foo", "bar"]. Meaning it's a notation to write an array of strings separated by spaces instead of commas and without quotes around them. You can find a list of ways of writing literals in zenspider's quickref.
I think of %w() as a "word array" - the elements are delimited by spaces and it returns an array of strings.
Here are all % literals:
%w() array of strings
%r() regular expression.
%q() string
%x() a shell command (returning the output string)
%i() array of symbols (Ruby >= 2.0.0)
%s() symbol
%() (without letter) shortcut for %Q()
The delimiters ( and ) can be replaced with a lot of variations, like [ and ], |, !, etc.
When using a capital letter, as in %W(), you can use string interpolation (#{variable}), mirroring the difference between the " and ' string delimiters. The same rule applies to the other % literals as well.
abc = 'a b c'
%w[1 2#{abc} d] #=> ["1", "2\#{abc}", "d"]
%W[1 2#{abc} d] #=> ["1", "2a b c", "d"]
There is also %s that allows you to create any symbols, for example:
%s|some words| #Same as :'some words'
%s[other words] #Same as :'other words'
%s_last example_ #Same as :'last example'
Since Ruby 2.0.0 you also have:
%i( a b c ) # => [ :a, :b, :c ]
%i[ a b c ] # => [ :a, :b, :c ]
%i_ a b c _ # => [ :a, :b, :c ]
# etc...
%W and %w allow you to create an Array of strings without using quotes and commas.
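The literals discussed above can be compared side by side in a short sketch. The variable names and the "lib" directory are placeholders chosen for this illustration, and %i assumes Ruby 2.0 or later.

```ruby
dir = "lib" # placeholder directory name

plain   = %w[cgi.rb complex.rb date.rb]    # array of strings
literal = %w[#{dir}/cgi.rb]                # lowercase: no interpolation
interp  = %W[#{dir}/cgi.rb #{dir}/date.rb] # uppercase: interpolation
symbols = %i[draft open closed]            # array of symbols
spaced  = %w[two\ words single]            # backslash keeps the space

p plain
p literal
p interp
p symbols
p spaced
```

The escaped space in the last line is how a single element containing whitespace can be written without falling back to the ordinary array syntax.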
Though it's an old post, the question keeps coming up and the answers don't always seem clear to me, so here are my thoughts:
%w and %W are examples of General Delimited Input types, that relate to Arrays. There are other types that include %q, %Q, %r, %x and %i.
The difference between the uppercase and lowercase versions mirrors the difference between double and single quotes. With single quotes and (lowercase) %w, there is no code interpolation (#{someCode}) and only a limited set of escapes works (such as \\ and an escaped space). With double quotes and (uppercase) %W, we do have access to these features.
The delimiter used can be any character, not just the open parenthesis. Play with the examples above to see that in effect.
For a full write up with examples of %w and the full list, escape characters and delimiters, have a look at "Ruby - %w vs %W – secrets revealed!"
Instead of %w() we should use %w[]
According to Ruby style guide:
Prefer %w to the literal array syntax when you need an array of words (non-empty strings without spaces and special characters in them). Apply this rule only to arrays with two or more elements.
# bad
STATES = ['draft', 'open', 'closed']
# good
STATES = %w[draft open closed]
Use the braces that are the most appropriate for the various kinds of percent literals.
[] for array literals(%w, %i, %W, %I) as it is aligned with the standard array literals.
# bad
%w(one two three)
%i(one two three)
# good
%w[one two three]
%i[one two three]
For more read here.
Excerpted from the documentation for Percent Strings at http://ruby-doc.org/core/doc/syntax/literals_rdoc.html#label-Percent+Strings:
Besides %(...) which creates a String, the % may create other types of object. As with strings, an uppercase letter allows interpolation and escaped characters while a lowercase letter disables them.
These are the types of percent strings in ruby:
...
%w: Array of Strings
I was given a bunch of columns from a CSV spreadsheet of full names of users and I needed to keep the formatting, with spaces. The easiest way I found to get them in while using ruby was to do:
names = %(Porter Smith
Jimmy Jones
Ronald Jackson).split("\n")
This highlights that %() creates a string like "Porter Smith\nJimmy Jones\nRonald Jackson", and to get the array you split the string on "\n": ["Porter Smith", "Jimmy Jones", "Ronald Jackson"].
So, to answer the OP's original question too: they could have written %(cgi\ spaeinfilename.rb;complex.rb;date.rb).split(';') if a file name happened to contain a space that should be preserved in the final array.

Ruby - Convert Integer to String

In Ruby, trying to print out the individual elements of a String is giving me trouble. Instead of seeing each character, I'm seeing their ASCII values instead:
>> a = "0123"
=> "0123"
>> a[0]
=> 48
I've looked online but can't find any way to get the original "0" back out of it. I'm a little new to Ruby, so I know it has to be something simple, but I just can't seem to find it.
Or you can convert the integer to its character value:
a[0].chr
You want a[0,1] instead of a[0].
I believe this is changing in Ruby 1.9 such that "asdf"[2] yields "d" rather than the character code
To summarize:
This behavior will be going away in version 1.9, in which the character itself is returned, but in previous versions, trying to reference a single character of a string by its character position will return its character value (so "ABC"[2] returns 67)
There are a number of methods that return a range of characters from a string (see the Ruby docs on the String slice method). All of the following return "C":
"ABC"[2,1]
"ABC"[2..2]
"ABC".slice(2,1)
I find the range selector to be the easiest to read. Can anyone speak to whether it is less efficient?
@Chris,
That's just how [] and [,] are defined for the String class.
Check out the String API.
The [,] operator is a substring operator and returns a string, whereas the [] operator returns the character, which Ruby (before 1.9) treats as a number when printing it out.
I think each_char or chars describes better what you want.
irb(main):001:0> a = "0123"
=> "0123"
irb(main):002:0> Array(a.each_char)
=> ["0", "1", "2", "3"]
irb(main):003:0> puts Array(a.each_char)
0
1
2
3
=> nil
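For comparison, here is a small sketch of how the same operations behave on Ruby 1.9 and later, where indexing a string returns a one-character string. It assumes a modern Ruby and only uses core String and Integer methods.

```ruby
a = "0123"

p a[0]     # "0"  (a one-character String on Ruby 1.9+)
p a[0].ord # 48   (the code point, which is what 1.8 returned from a[0])
p 48.chr   # "0"  (back from the code to the character)
p a.chars  # ["0", "1", "2", "3"]
p a.bytes  # [48, 49, 50, 51]
```

On current Ruby versions, ord and chr are the explicit way to move between a character and its numeric code.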
