Unescape string containing octal bit pattern - ruby

I'm having trouble unescaping a string containing an octal bit pattern ("\50\51") in Ruby.
I've tried String#undump, JSON#load and YAML#load. None of them seems to unescape octal bit patterns. Kernel#eval does, but I'd like to avoid using it.
str = '"\\50\\51"'
# expected result (but insecure)
eval(str)
# => "()"
# doesn't handle octal bit patterns
str.undump
# => "\\50\\51"

You can convert the escapes yourself with gsub, without eval:
s.gsub(/\\(\d+)/) { |v| $1.to_i(8).chr }
This replaces each backslash-plus-digits sequence with the character whose code is those digits read as octal (the base-8 argument to to_i).
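A slightly stricter variant, as a sketch only: Ruby's octal escapes take one to three octal digits, so the pattern can be limited to [0-7]{1,3} rather than \d+. The helper name unescape_octal is hypothetical, and the sketch assumes the surrounding double quotes should simply be stripped and that no other escape sequences need handling.

```ruby
# Hypothetical helper: replace \NNN octal escapes (1-3 octal digits) with the
# corresponding character. Any other escape sequence is left untouched.
def unescape_octal(str)
  str.gsub(/\\([0-7]{1,3})/) { Regexp.last_match(1).to_i(8).chr }
end

str = '"\\50\\51"'          # the characters: "\50\51"
inner = str[1..-2]          # drop the surrounding double quotes
puts unescape_octal(inner)  # prints ()
```

Integer#chr raises a RangeError for values above 255, so escapes above \377 would need separate handling.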

Related

Regular Expression for Ruby Integer

Ruby integers are written using an optional leading sign, an optional base indicator (0 for octal, 0x for hex, or 0b for binary), followed by a string of digits in the appropriate base. Underscore characters are ignored in the digit string. The letters mentioned in the above description may be either upper or lower case, and the underscore characters can only occur strictly within the digit string.
I need to create regular expression to check for Ruby integers in java string with the specification mentioned above.
I assume the substrings that may represent integers are separated by spaces or begin or end the string. If so, I suggest you split the string on whitespace and then use the method Kernel#Integer to determine if each element of the resulting array represents an integer.
def str_to_int(str)
  str.split.each_with_object([]) do |s, a|
    val = Integer(s) rescue nil
    a << [s, val] unless val.nil?
  end
end
str_to_int "22 -22 077 0xAB 0xA_B 0b101 -0b101 cat _3 4_"
#=> [["22", 22], ["-22", -22], ["077", 63], ["0xAB", 171],
# ["0xA_B", 171], ["0b101", 5], ["-0b101", -5]]
Integer raises an ArgumentError if the string cannot be converted to an integer (and a TypeError for arguments such as nil). I've dealt with that with an in-line rescue that returns nil, but you may wish to write it so that only that exception is rescued. It may be prudent to remove punctuation from the string before executing the above method.
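As a sketch of the narrower rescue mentioned above: Integer() raises ArgumentError for a string that is not a valid integer literal, so only that exception is caught here. The method name str_to_int_strict is hypothetical.

```ruby
# Variant of str_to_int that rescues only ArgumentError (what Integer()
# raises for malformed strings) instead of using a blanket inline rescue.
def str_to_int_strict(str)
  str.split.each_with_object([]) do |s, a|
    begin
      a << [s, Integer(s)]
    rescue ArgumentError
      # s is not a valid integer literal; skip it
    end
  end
end

p str_to_int_strict("22 0xA_B cat 4_")
# prints [["22", 22], ["0xA_B", 171]]
```

Any other exception, such as a TypeError from a non-string argument, still propagates to the caller.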
This regex captures positive or negative numbers in denary, binary, octal and hexadecimal form including any underscores:
# hexadecimal binary octal denary
-?0x[0-9a-fA-F][0-9a-fA-F_]*[0-9a-fA-F]|-?0x[0-9a-fA-F]|-?0b[01][01_]*[01]|-?0b[01]|-?0[0-7][0-7_]?[0-7]?|-?0[0-7]|-?[1-9][0-9_]*[0-9]|-?[0-9]
You should test the regex thoroughly to make sure it works as required but it does seem to work on a few relevant samples I tried (see this on Rubular where I've used () captures so you can see the matches more easily but it is essentially the same regex).
Here is an example of the regex in action using String#scan:
str = "-0x88339_43 wor0ds 8_8_ 0b1001 01words0x334 _9 0b1 0x4 0_ 0x_ 0b_1 0b00_1"
reg = /-?0x[0-9a-fA-F][0-9a-fA-F_]*[0-9a-fA-F]|-?0x[0-9a-fA-F]|-?0b[01][01_]*[01]|-?0b[01]|-?0[0-7][0-7_]?[0-7]?|-?0[0-7]|-?[1-9][0-9_]*[0-9]|-?[0-9]/
#regex matches
str.scan reg
#=>["-0x88339_43", "0", "8_8", "0b1001", "01", "0x334", "9", "0b1", "0x4", "0", "0", "0", "1", "0b00_1"]
Like @CarySwoveland, I'm assuming your string has spaces. Without spaces you will still get a result, but it may not be what you want. At least it's a start.

How to validate that a string is a proper hexadecimal value in Ruby?

I am writing a 6502 assembler in Ruby. I am looking for a way to validate hexadecimal operands in string form. I understand that the String object provides a "hex" method to return a number, but here's a problem I run into:
"0A".hex #=> 10 - a valid hexadecimal value
"0Z".hex #=> 0 - invalid, produces a zero
"asfd".hex #=> 10 - Why 10? I guess it reads 'a' first and stops at 's'?
You will get some odd results by typing in a bunch of gibberish. What I need is a way to first verify that the value is a legit hex string.
I was playing around with regular expressions, and realized I can do this:
true if "0A" =~ /[A-Fa-f0-9]/
#=> true
true if "0Z" =~ /[A-Fa-f0-9]/
#=> true <-- PROBLEM
I'm not sure how to address this issue. I need to be able to verify that letters are only A-F and that if it is just numbers that is ok too.
I'm hoping to avoid spaghetti code, riddled with "if" statements. I am hoping that someone could provide a "one-liner" or some form of elegant code.
Thanks!
!str[/\H/] will look for invalid hex values.
String#hex does not interpret the whole string as hex, it extracts from the beginning of the string up to as far as it can be interpreted as hex. With "0Z", the "0" is valid hex, so it interpreted that part. With "asfd", the "a" is valid hex, so it interpreted that part.
One method (note that it rejects strings with leading zeros such as "0A", because the round trip yields "a"):
str.to_i(16).to_s(16) == str.downcase
Another:
str =~ /\A[a-f0-9]+\Z/i # or simply /\A\h+\Z/ (see hirolau's answer)
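About your regex, you have to use anchors (\A for beginning of string and \Z for end of string) to say that you want the full string to match. Also, the + repeats the match for one or more characters. Note that \Z still matches just before a trailing newline; use \z if the string must end exactly there.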
Note that you could use ^ (begin of line) and $ (end of line), but this would allow strings like "something\n0A" to pass.
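The anchoring point can be sketched as a small predicate. The name hex_string? is hypothetical, and the sketch uses \z rather than \Z so that a trailing newline is rejected as well.

```ruby
# Hypothetical helper: true only if the whole string consists of hex digits.
# \A and \z anchor to the absolute start and end of the string, so embedded
# or trailing newlines cannot slip through as they can with ^ and $.
def hex_string?(str)
  str.is_a?(String) && str.match?(/\A\h+\z/)
end

p hex_string?("0A")               # true
p hex_string?("0Z")               # false
p hex_string?("something\n0A")    # false
```

String#match? requires Ruby 2.4 or later; on older versions `!!(str =~ /\A\h+\z/)` is an equivalent check.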
This is an old question, but I just had the issue myself. I opted for this in my code:
str =~ /^\h+$/
It has the added benefit of returning nil if str is nil.
Since Ruby has literal hex built in, you can eval the string and rescue the SyntaxError:
eval "0xA" # => 10
eval "0xZ" # => SyntaxError
You can use this in a method like:
def is_hex?(str)
  begin
    eval("0x#{str}")
    true
  rescue SyntaxError
    false
  end
end
is_hex?('0A') # => true
is_hex?('0Z') # => false
Of course, since you are using eval, make sure you are sending only safe values to the method.

Ruby convert non-printable characters into numbers

I have a string with non-printable characters.
What I am currently doing is replacing them with a tilde using:
string.gsub!(/[^[:print:]]/, "~")
However, I would actually like to convert them to their integer value.
I tried this, but it always outputs 0
string.gsub!(/[^[:print:]]/, "#{$1.to_i}")
Thoughts?
String#gsub and String#gsub! accept an optional block. The return value of the block is used for the substitution.
"\x01Hello\x02".gsub(/[^[:print:]]/) { |x| x.ord }
# => "1Hello2"
Object#inspect is also an option if you just need to output a string with non-printable characters to a log or for debugging purposes.
puts "\x01Hello\x02".inspect
# => "\u0001Hello\u0002"
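As for why the attempt in the question always produces 0: the replacement string "#{$1.to_i}" is built once, before gsub! runs, when $1 is typically nil, and nil.to_i is 0. The pattern also has no capture group. A block is evaluated once per match instead. A minimal sketch, with a hypothetical helper name and an arbitrary <n> output format:

```ruby
# Hypothetical helper: replace each non-printable character with its
# integer code wrapped in angle brackets. The block runs once per match,
# with the matched character passed in as ch.
def show_codes(str)
  str.gsub(/[^[:print:]]/) { |ch| "<#{ch.ord}>" }
end

puts show_codes("\x01Hello\x02")   # prints <1>Hello<2>
```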

Stumped by a simple regex

I am trying to see if the string s contains any of the symbols in a regex. The regex below works fine on rubular.
s = "asd#d"
s =~ /[~!@#$%^&*()]+/
But in Ruby 1.9.2, it gives this error message:
syntax error, unexpected ']', expecting tCOLON2 or '[' or '.'
s = "asd#d"; s =~ /[~!@#$%^&*()]/
What is wrong?
This is actually a special case of string interpolation with global and instance variables that most seem not to know about. Since string interpolation also occurs within regex in Ruby, I'll illustrate below with strings (since they provide for an easier example):
@foo = "instancefoo"
$foo = "globalfoo"
"#@foo" # => "instancefoo"
"#$foo" # => "globalfoo"
Thus you need to escape the # to prevent it from being interpolated:
/[~!@\#$%^&*()]+/
The only way that I know of to create a non-interpolated regex in Ruby is from a string (note single quotes):
Regexp.new('[~!@#$%^&*()]+')
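The two workarounds can be compared side by side in a short sketch. The constant names and sample strings are made up for illustration, and the global $foo exists only to show the shorthand interpolation.

```ruby
$foo = "globalfoo"   # global used only for this illustration
puts "#$foo"         # shorthand for "#{$foo}", prints globalfoo

# Escaping the '#' keeps '#$' from being read as interpolation.
SYMBOLS_ESCAPED = /[~!@\#$%^&*()]+/
# A single-quoted string is never interpolated.
SYMBOLS_FROM_STRING = Regexp.new('[~!@#$%^&*()]+')

p "asd@d" =~ SYMBOLS_ESCAPED       # 3
p "asd#d" =~ SYMBOLS_FROM_STRING   # 3
p "plain" =~ SYMBOLS_ESCAPED       # nil
```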
I was able to replicate this behavior in 1.9.3p0. Apparently there is a problem with the '#$' combination. If you escape either it works. If you reverse them it works:
s =~ /[~!@$#%^&*()]+/
Edit: in Ruby 1.9 #$ invokes variable interpolation, even when followed by a % which is not a valid variable name.
I disagree, you need to escape the $; it's the end-of-string character.
s =~ /[~!@#\$%^&*()]/ # => 3
That is correct.

Regexp using literal vs. escaped character

Is there any practical difference between a regexp using an escape character versus one using the literal character? I.e. are there any situations where matching with them will return different results?
Example in Ruby:
literal = Regexp.new("\t")
=> / /
escaped = Regexp.new("\\t")
=> /\t/
# They're different...
literal == escaped
=> false
# ...but they seem to match the same:
"Hello\tWorld".match(literal)
=> #<MatchData "\t">
"Hello\tWorld".match(escaped)
=> #<MatchData "\t">
No, not in the case of \t (or \n).
But it won't work in most other cases (e.g., escape sequences that either don't have a 1:1 equivalent in string escapes like \s or where the meaning differs like \b), so it's generally a good idea to use the escaped versions (or construct the regex using /.../ in the first place).
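The \b case can be illustrated with a short sketch (the constant names are made up): in a double-quoted Ruby string "\b" is a backspace character (0x08), whereas the regex escape \b is a word boundary.

```ruby
# "\bcat" in double quotes contains a literal backspace character (0x08).
FROM_LITERAL = Regexp.new("\bcat")
# "\\bcat" passes the two characters \b to the regex engine: word boundary.
FROM_ESCAPED = Regexp.new("\\bcat")

p "a cat" =~ FROM_ESCAPED      # 2
p "a cat" =~ FROM_LITERAL      # nil
p "a \bcat" =~ FROM_LITERAL    # 2
```

The literal version only matches text that actually contains a backspace before "cat", which is rarely what is intended.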
