Ruby convert non-printable characters into numbers - ruby

I have a string with non-printable characters.
What I am currently doing is replacing them with a tilde using:
string.gsub!(/[^[:print:]]/, "~")
However, I would actually like to convert them to their integer value.
I tried this, but it always outputs 0
string.gsub!(/[^[:print:]]/, "#{$1.to_i}")
Thoughts?

String#gsub and String#gsub! accept an optional block. The return value of the block is used as the replacement.
"\x01Hello\x02".gsub(/[^[:print:]]/) { |x| x.ord }
# => "1Hello2"
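As a side note on why the attempt in the question always produces 0: the string "#{$1.to_i}" is interpolated once, before gsub! runs, when $1 is still nil (the pattern also has no capture group), and nil.to_i is 0. A minimal sketch of the block form follows. The helper name mark_nonprintable and the angle-bracket delimiters are assumptions made for illustration, chosen so that adjacent numbers stay unambiguous.

```ruby
# Hypothetical helper (name and delimiters are illustrative, not from the thread):
# replace every non-printable character with its code point in angle brackets.
def mark_nonprintable(text)
  # The block runs once per match, so ch.ord is evaluated for each character.
  text.gsub(/[^[:print:]]/) { |ch| "<#{ch.ord}>" }
end

mark_nonprintable("\x01Hello\x02\n")
# => "<1>Hello<2><10>"
```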

Object#inspect is also an option if you just need to write a string with non-printable characters to a log or print it for debugging.
puts "\x01Hello\x02".inspect
# => "\u0001Hello\u0002"

Related

Unescape string containing octal bit pattern

I'm having trouble unescaping a string containing an octal bit pattern ("\50\51") in Ruby.
I've tried String#undump, JSON.load and YAML.load. None of them seems to unescape octal bit patterns. Kernel#eval does, but I'd like to avoid using it.
str = '"\\50\\51"'
# expected result (but insecure)
eval(str)
# => "()"
# doesn't handle octal bit patterns
str.undump
# => "\\50\\51"
You can convert the escapes yourself, without eval:
str.gsub(/\\(\d+)/) { |v| $1.to_i(8).chr }
The block parses each captured run of digits as octal (the base-8 argument to to_i), and chr turns the resulting number into the corresponding character.
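A slightly stricter variant might look like the sketch below. The helper name unescape_octal is hypothetical, and the pattern is an assumption: it only accepts octal digits and caps the value at \377, because Integer#chr raises RangeError above 255. It has only been reasoned through for ASCII values.

```ruby
# Hypothetical helper: replace a backslash followed by an octal byte value
# (\0 through \377) with the character it encodes. No eval involved.
def unescape_octal(text)
  text.gsub(/\\([0-3]?[0-7]{1,2})/) { Regexp.last_match(1).to_i(8).chr }
end

str = '"\\50\\51"'   # the eight characters "\50\51" including the quotes
unescape_octal(str)
# => "\"()\""
```

The surrounding double quotes stay in the result because they are ordinary characters in the input; strip them separately if they are not wanted.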

Use ARGV[] argument vector to pass a regular expression in Ruby

I am trying to use gsub or sub on a regex passed through terminal to ARGV[].
Query in terminal: $ ruby script.rb input.json "\[\{\"src\"\:\"
Input file first 2 lines:
[{
"src":"http://something.com",
"label":"FOO.jpg","name":"FOO",
"srcName":"FOO.jpg"
}]
[{
"src":"http://something123.com",
"label":"FOO123.jpg",
"name":"FOO123",
"srcName":"FOO123.jpg"
}]
script.rb:
dir = File.dirname(ARGV[0])
output = File.new(dir + "/output_" + Time.now.strftime("%H_%M_%S") + ".json", "w")
open(ARGV[0]).each do |x|
x = x.sub(ARGV[1], '')
output.puts(x) if !x.nil?
end
output.close
This is very basic stuff really, but I am not quite sure on how to do this. I tried:
Regexp.escape with this pattern: [{"src":".
Escaping the characters and not escaping.
Wrapping the pattern between quotes and not wrapping.
Meditate on this:
I wrote a little script containing:
puts ARGV[0].class
puts ARGV[1].class
and saved it to disk, then ran it using:
ruby ~/Desktop/tests/test.rb foo /abc/
which returned:
String
String
The documentation says:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backslash followed by 'd', instead of a digit.
That means that the argument, though it looks like a regular expression, isn't one. It's a string, because ARGV can only return strings: the command line can only contain strings.
When we pass a string into sub, Ruby recognizes it's not a regular expression, so it treats it as a literal string. Here's the difference in action:
'foo'.sub('/o/', '') # => "foo"
'foo'.sub(/o/, '') # => "fo"
The first can't find the literal text "/o/" in "foo", so nothing changes. The second does match /o/ and returns the result after removing the first "o" (sub replaces only the first match).
Another way of looking at it is:
'foo'.match('/o/') # => nil
'foo'.match(/o/) # => #<MatchData "o">
where match finds nothing for the string but can find a hit for /o/.
And all that leads to what's happening in your code. Because sub is being passed a string, it's trying to do a literal match for the regex, and won't be able to find it. You need to change the code to:
sub(Regexp.new(ARGV[1]), '')
but that's not all that has to change. Regexp.new(...) will convert what's passed in into a regular expression, but if you're passing in '/o/' the resulting regular expression will be:
Regexp.new('/o/') # => /\/o\//
which is probably not what you want:
'foo'.match(/\/o\//) # => nil
Instead you want:
Regexp.new('o') # => /o/
'foo'.match(/o/) # => #<MatchData "o">
So, besides changing your code, you'll need to make sure that what you pass in is a valid expression, minus any leading and trailing /.
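The points above could be combined into something like the following sketch. Both helper names are hypothetical. The first assumes the caller may or may not wrap the pattern in slashes; the second assumes the argument should be matched literally, which is what Regexp.escape is for.

```ruby
# Hypothetical helper: turn a command-line string into a Regexp.
# A single leading and trailing "/" is stripped, so "o" and "/o/" both work.
def pattern_from_arg(arg)
  source = arg.sub(%r{\A/}, '').sub(%r{/\z}, '')
  Regexp.new(source)
end

# Hypothetical helper: match the argument literally, metacharacters included.
def literal_pattern_from_arg(arg)
  Regexp.new(Regexp.escape(arg))
end

'foo'.sub(pattern_from_arg('/o/'), '')
# => "fo"
'[{"src":"x'.sub(literal_pattern_from_arg('[{"src":"'), '')
# => "x"
```

In script.rb this would mean building the pattern once from ARGV[1] before the loop and passing the resulting Regexp to sub.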
Based on this answer in the thread Convert a string to regular expression ruby, you should use
x = x.sub(/#{ARGV[1]}/,'')
I tested it with this file (test.rb):
puts "You should not see any number [0123456789].".gsub(/#{ARGV[0]}/,'')
I called the file like so:
ruby test.rb "\d+"
# => You should not see any number [].

How to validate that a string is a proper hexadecimal value in Ruby?

I am writing a 6502 assembler in Ruby. I am looking for a way to validate hexadecimal operands in string form. I understand that the String object provides a "hex" method to return a number, but here's a problem I run into:
"0A".hex #=> 10 - a valid hexadecimal value
"0Z".hex #=> 0 - invalid, produces a zero
"asfd".hex #=> 10 - Why 10? I guess it reads 'a' first and stops at 's'?
You will get some odd results by typing in a bunch of gibberish. What I need is a way to first verify that the value is a legit hex string.
I was playing around with regular expressions, and realized I can do this:
true if "0A" =~ /[A-Fa-f0-9]/
#=> true
true if "0Z" =~ /[A-Fa-f0-9]/
#=> true <-- PROBLEM
I'm not sure how to address this issue. I need to be able to verify that letters are only A-F and that if it is just numbers that is ok too.
I'm hoping to avoid spaghetti code riddled with "if" statements. I am hoping that someone could provide a "one-liner" or some form of elegant code.
Thanks!
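!str[/\H/] returns true when the string contains no invalid hex characters (\H matches any character that is not a hex digit). Note that it also returns true for an empty string.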
String#hex does not interpret the whole string as hex, it extracts from the beginning of the string up to as far as it can be interpreted as hex. With "0Z", the "0" is valid hex, so it interpreted that part. With "asfd", the "a" is valid hex, so it interpreted that part.
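One method (note that it rejects strings with leading zeros such as "0A", because the round trip through to_i and to_s drops them):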
str.to_i(16).to_s(16) == str.downcase
Another:
str =~ /\A[a-f0-9]+\Z/i # or simply /\A\h+\Z/ (see hirolau's answer)
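About your regex: you have to use anchors (\A for the beginning of the string and \Z for the end) to say that you want the full string to match. Also, the + repeats the match for one or more characters. Note that \Z still matches just before a trailing newline; use \z if that should be rejected too.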
Note that you could use ^ (begin of line) and $ (end of line), but this would allow strings like "something\n0A" to pass.
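The difference between the string anchors and the line anchors can be seen in a small sketch. The method name hex_string? is hypothetical, and using \z (strict end of string) rather than \Z is a choice made here, not something the answer prescribes.

```ruby
# Hypothetical helper: true only if the whole string consists of hex digits.
def hex_string?(str)
  # \A and \z anchor to the start and end of the entire string.
  str.is_a?(String) && str.match?(/\A\h+\z/)
end

hex_string?("0A")             # => true
hex_string?("0Z")             # => false
hex_string?("asfd")           # => false
hex_string?("something\n0A")  # => false

# With line anchors the second line alone is enough to match:
"something\n0A" =~ /^\h+$/    # => 10
```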
This is an old question, but I just had the issue myself. I opted for this in my code:
str =~ /^\h+$/
It has the added benefit of returning nil if str is nil.
Since Ruby has hex literals built in, you can eval the string and rescue the SyntaxError:
eval "0xA" # => 10
eval "0xZ" # => SyntaxError
You can wrap this in a method like
def is_hex?(str)
  eval("0x#{str}")
  true
rescue SyntaxError
  false
end
is_hex?('0A') # => true
is_hex?('0Z') # => false
Of course, since you are using eval, make sure you only pass safe values to the method.
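If eval is a concern, one possible alternative is Kernel#Integer with an explicit base, which raises ArgumentError on input it cannot parse. This is a sketch under assumptions, not a drop-in validator: the name parse_hex is hypothetical, and Integer is more lenient than a strict digit check, since it also accepts an optional 0x prefix, a sign, underscores between digits and surrounding whitespace.

```ruby
# Hypothetical helper: return the numeric value of a hex string, or nil if
# Integer() cannot parse it. No code is evaluated, unlike the eval approach.
def parse_hex(str)
  Integer(str, 16)
rescue ArgumentError
  nil
end

parse_hex("0A")  # => 10
parse_hex("0Z")  # => nil
```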

String#delete ignore special characters

String#delete interprets a-z as character range. However, I would like it to delete fa-zo.
"fojwfa-zowj".delete("fa-zo") #=> "-"
Desired result:
"fojwwj"
You could also use this little trick:
string = "fojwfa-zowj"
string[/fa-zo/] = ''
string
# => "fojwwj"
Notice however, that this modifies the string in place like #gsub!, which should be faster and should use less memory, but which could introduce side-effects if not considered well.
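For context, String#delete works on a set of characters, not on a substring, so escaping the hyphen does not give the desired result either. The sketch below contrasts it with a non-mutating substring removal; the variable name text is only illustrative.

```ruby
text = "fojwfa-zowj"

# "a-z" is a range, so every lowercase letter is deleted.
text.delete("fa-zo")    # => "-"

# Escaping the hyphen makes it literal, but the argument is still a
# character set: each of f, a, -, z and o is removed wherever it occurs.
text.delete("fa\\-zo")  # => "jwwj"

# A String pattern passed to sub is matched literally as a substring,
# and a new string is returned, leaving text untouched.
text.sub("fa-zo", "")   # => "fojwwj"
text                    # => "fojwfa-zowj"
```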
"fojwfa-zowj".gsub("fa-zo","") # => "fojwwj"
"fojwfa-zowj".tap{ |s| s.slice! "fa-zo" } # just for the Heaven of it

How can I convert a string of codepoints to the string it represents?

I have a string (in Ruby) like this:
626c6168
(that is 'blah' without the quotes)
How do I convert it to 'blah'? Note that these are variable lengths, and also they aren't always letters and numbers. (They're being stored in a database, not being printed.)
Array#pack
['626c6168'].pack('H*')
# => "blah"
Using hex to convert each character:
"626c6168".scan(/../).map{ |c| c.hex.chr }.join
This gives blah.
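Since the values are stored in a database, the reverse direction may also be useful. The sketch below shows a round trip; the helper names hex_to_bytes and bytes_to_hex are hypothetical. Note that pack returns a binary (ASCII-8BIT) string, so the force_encoding call is an assumption that applies only if the bytes are known to be UTF-8 text.

```ruby
# Hypothetical helpers for converting between a hex string and raw bytes.
def hex_to_bytes(hex)
  [hex].pack('H*')       # two hex digits per byte, high nibble first
end

def bytes_to_hex(bytes)
  bytes.unpack1('H*')    # inverse of the pack call above
end

hex_to_bytes('626c6168')                      # => "blah"
bytes_to_hex('blah')                          # => "626c6168"

# Multi-byte UTF-8 text needs its encoding restored after packing.
hex_to_bytes('c3a9').force_encoding('UTF-8')  # => "é"
```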

Resources