Best way to escape and unescape strings in Ruby?

Does Ruby have any built-in method for escaping and unescaping strings? In the past, I've used regular expressions; however, it occurs to me that Ruby probably does such conversions internally all the time. Perhaps this functionality is exposed somewhere.
So far I've come up with these functions. They work, but they seem a bit hacky:
def escape(s)
s.inspect[1..-2]
end
def unescape(s)
eval %Q{"#{s}"}
end
Is there a better way?

Ruby 2.5 added String#undump as a complement to String#dump:
$ irb
irb(main):001:0> dumped_newline = "\n".dump
=> "\"\\n\""
irb(main):002:0> undumped_newline = dumped_newline.undump
=> "\n"
With it:
def escape(s)
s.dump[1..-2]
end
def unescape(s)
"\"#{s}\"".undump
end
$ irb
irb(main):001:0> escape("\n \" \\")
=> "\\n \\\" \\\\"
irb(main):002:0> unescape("\\n \\\" \\\\")
=> "\n \" \\"
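As a quick sanity check of the helpers above, the sketch below round-trips a few sample strings through dump and undump. It assumes Ruby 2.5 or later; the sample strings and the round_trip_ok name are made up for illustration.

```ruby
# Requires Ruby 2.5+ for String#undump.
def escape(s)
  s.dump[1..-2]          # drop the surrounding double quotes added by dump
end

def unescape(s)
  "\"#{s}\"".undump      # re-add the quotes that undump expects
end

# Illustrative samples (not from the original answer).
samples = ["plain", "tab\there", "quote \" and backslash \\", "caf\u00e9", "\e[0m"]

# Every sample should survive escape followed by unescape unchanged.
round_trip_ok = samples.all? { |s| unescape(escape(s)) == s }
puts round_trip_ok
```

Note that dump also escapes non-ASCII characters, so the escaped form is plain ASCII even for UTF-8 input.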

There are a bunch of escaping methods, some of them:
# Regexp escapings
>> Regexp.escape('\*?{}.')
=> \\\*\?\{\}\.
>> URI.escape("test=100%")
=> "test=100%25"
>> CGI.escape("test=100%")
=> "test%3D100%25"
So it really depends on the problem you need to solve. (Note that URI.escape was deprecated and then removed in Ruby 3.0.) But I would avoid using inspect for escaping.
Update: there is also String#dump, which behaves much like inspect and looks like what you need:
>> "\n\t".dump
=> "\"\\n\\t\""
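Each of the methods listed above has an inverse or a modern counterpart in the standard library. A small sketch, assuming a current Ruby where URI.escape is no longer available, so URI.encode_www_form_component stands in for it (it is not an exact equivalent, since it also encodes "="). The variable names are illustrative.

```ruby
require 'cgi'
require 'uri'

raw = "test=100%"

cgi_escaped = CGI.escape(raw)                     # percent-encodes "=" and "%"
cgi_back    = CGI.unescape(cgi_escaped)           # back to the raw string
uri_escaped = URI.encode_www_form_component(raw)  # stand-in for the removed URI.escape
uri_back    = URI.decode_www_form_component(uri_escaped)

# Regexp.escape makes a string safe to embed in a pattern as a literal.
literal = "1+1=2?"
pattern = Regexp.new("\\A#{Regexp.escape(literal)}\\z")

puts cgi_escaped, cgi_back, uri_escaped, uri_back
puts pattern.match?(literal)
```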

Caleb's function was the closest thing to the reverse of String#inspect I was able to find; however, it contained two bugs:
\\ was not handled correctly.
\x.. retained the backslash.
I fixed the above bugs and this is the updated version:
UNESCAPES = {
'a' => "\x07", 'b' => "\x08", 't' => "\x09",
'n' => "\x0a", 'v' => "\x0b", 'f' => "\x0c",
'r' => "\x0d", 'e' => "\x1b", "\\\\" => "\x5c",
"\"" => "\x22", "'" => "\x27"
}
def unescape(str)
  # Unescape all the things
  str.gsub(/\\(?:([#{UNESCAPES.keys.join}])|u([\da-fA-F]{4}))|\\0?x([\da-fA-F]{2})/) {
    if $1
      if $1 == '\\' then '\\' else UNESCAPES[$1] end
    elsif $2 # unescape \u0000 unicode
      ["#$2".hex].pack('U*')
    elsif $3 # unescape \0xff or \xff
      [$3].pack('H2')
    end
  }
end
# To test it (the loop stops at end of input, when gets returns nil)
while (line = STDIN.gets)
  puts unescape(line)
end
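For comparison, here is a condensed single-pass variant of the same idea with a few non-interactive checks. It is only a sketch, not a drop-in replacement for the function above: the names UNESCAPE_TABLE, UNESCAPE_PATTERN and unescape_sketch are hypothetical, and unknown escapes are deliberately left untouched.

```ruby
# Hypothetical names; a condensed variant, not the answer's original code.
UNESCAPE_TABLE = {
  'a' => "\a", 'b' => "\b", 't' => "\t", 'n' => "\n", 'v' => "\v",
  'f' => "\f", 'r' => "\r", 'e' => "\e", '\\' => '\\', '"' => '"', "'" => "'"
}.freeze

# One alternation, tried left to right at each backslash:
# \uXXXX, then \xXX, then any single character.
UNESCAPE_PATTERN = /\\(?:u(\h{4})|x(\h{2})|(.))/m

def unescape_sketch(str)
  str.gsub(UNESCAPE_PATTERN) do
    if $1
      [$1.hex].pack('U')                      # \uXXXX -> UTF-8 character
    elsif $2
      $2.hex.chr                              # \xXX -> single byte (ASCII range is safest here)
    else
      UNESCAPE_TABLE.fetch($3) { "\\#{$3}" }  # unknown escapes are left as they were
    end
  end
end

puts unescape_sketch('a\tb')
puts unescape_sketch('\u00e9')
```

Because the whole input is scanned in one gsub pass, an escaped backslash is consumed before the character after it can be misread as the start of another escape.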

Update: I no longer agree with my own answer, but I'd prefer not to delete it, since I suspect that others may go down this wrong path. There has already been a lot of discussion of this answer and its alternatives, so I think it still contributes to the conversation. Please don't use this answer in real code.
If you don't want to use eval, but are willing to use the YAML module, you can use it instead:
require 'yaml'
def unescape(s)
YAML.load(%Q(---\n"#{s}"\n))
end
The advantage to YAML over eval is that it is presumably safer. cane disallows all usage of eval. I've seen recommendations to use $SAFE along with eval, but that is not available via JRuby currently.
For what it is worth, Python does have native support for unescaping backslashes.

Ruby's inspect can help:
"a\nb".inspect
=> "\"a\\nb\""
Normally if we print a string with an embedded line-feed, we'd get:
puts "a\nb"
a
b
If we print the inspected version:
puts "a\nb".inspect
"a\nb"
Assign the inspected version to a variable and you'll have the escaped version of the string.
To undo the escaping, eval the string:
puts eval("a\nb".inspect)
a
b
I don't really like doing it this way. It's more of a curiosity than something I'd do in practice.

YAML's ::unescape doesn't seem to escape quote characters, e.g. ' and ". I'm guessing this is by design, but it makes me sad.
You definitely do not want to use eval on arbitrary or client-supplied data.
This is what I use. Handles everything I've seen and doesn't introduce any dependencies.
UNESCAPES = {
'a' => "\x07", 'b' => "\x08", 't' => "\x09",
'n' => "\x0a", 'v' => "\x0b", 'f' => "\x0c",
'r' => "\x0d", 'e' => "\x1b", "\\\\" => "\x5c",
"\"" => "\x22", "'" => "\x27"
}
def unescape(str)
  # Unescape all the things
  str.gsub(/\\(?:([#{UNESCAPES.keys.join}])|u([\da-fA-F]{4}))|\\0?x([\da-fA-F]{2})/) {
    if $1
      if $1 == '\\' then '\\' else UNESCAPES[$1] end
    elsif $2 # unescape \u0000 unicode
      ["#$2".hex].pack('U*')
    elsif $3 # unescape \0xff or \xff
      [$3].pack('H2')
    end
  }
end

I suspect that Shellwords.escape will do what you're looking for
https://ruby-doc.org/stdlib-1.9.3/libdoc/shellwords/rdoc/Shellwords.html#method-c-shellescape
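Keep in mind that Shellwords targets Bourne-shell quoting rather than Ruby string-literal escapes, so it only fits when the string is headed for a command line. A minimal sketch with an illustrative file name:

```ruby
require 'shellwords'

filename = "it's a file.txt"   # illustrative input

escaped = Shellwords.escape(filename)   # backslash-escapes characters the shell treats specially
words   = Shellwords.split(escaped)     # parses the text the way a POSIX shell would

puts escaped
puts words.inspect
```

Splitting the escaped text yields the original string as a single word, which is the round trip you would rely on when building a command line.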

Related

How to convert a backslash hexadecimal string to a binary string in Ruby? [duplicate]


Regex put in via formtastic gets altered (maybe by the controller) before it's put into Mongoid

I have a form where I put in hashes with regular expression values. My problem is that they get mangled when travelling from my view, through my controller, and into MongoDB with Mongoid. How do I preserve the regexes?
Input examples:
{:regex1 => "^Something \(#\d*\)$"}
{:regex2 => "\A[\w+\-.]+#[a-z\d\-.]+\.[a-z]+\z"}
My formtastic view form looks like this:
= semantic_form_for resource, :html => {:class => "form-vertical"} do |r|
= r.inputs do
= r.input :value, :as => :text
= r.actions do
= r.action :submit
My controller create action takes in the params and handles it like this:
class EmailTypesController < InheritedResources::Base
def create
puts params[:email_type][:value] # => {:regex1 => "^Something \(#\d*\)$"} and
# {:regex2 => "\A[\w+\-.]+#[a-z\d\-.]+\.[a-z]+\z"}
puts params[:email_type][:value].inspect # => "{:regex1 => \"^Something \\(#\\d*\\)$\"}" and
# "{:regex2 => \"\\A[\\w+\\-.]+#[a-z\\d\\-.]+\\.[a-z]+\\z\"}"
params[:email_type][:value] = convert_to_hash(params[:email_type][:value])
puts params[:email_type][:value] # => {"regex1"=>"^Something (#d*)$"} and
# {"regex2"=>"A[w+-.]+#[a-zd-.]+.[a-z]+z"}
create! do |success, failure|
success.html {
redirect_to resource
}
failure.html {
render :action => :new
}
end
end
def convert_to_hash(string)
if string.match(/(.*?)=>(.*)\n*/)
string = eval(string)
else
string = string_to_hash(string)
end
end
def string_to_hash(string)
values = string.split("\r\n")
output = {}
values.each do |v|
val = v.split("=")
output[val[0].to_sym] = val[1]
end
output
end
end
Firing up the console and inspecting the values put in through Mongoid:
Loading development environment (Rails 3.2.12)
1.9.3p385 :001 > EmailType.all.each do |email_type|
1.9.3p385 :002 > puts email_type.value
1.9.3p385 :003?> end
{"regex1"=>"^Something (#d*)$"}
{"regex2"=>"A[w+-.]+#[a-zd-.]+.[a-z]+z"}
=> true
1.9.3p385 :004 >
The problem lies in ruby's evaluation of strings, which ignores useless escapes:
puts "^Something \(#\d*\)$".inspect
=>"^Something (#d*)$"
That is to say the eval simply ignores the backslash. Note that typically in ruby regexes aren't created using strings but through their own regex literal, so that
/^Something \(#\d*\)$/.inspect
=>"/^Something \\(#\\d*\\)$/"
Notice the double backslash instead of single. This means that eval has to receive two backslashes instead of one in the string, as it has to be eval'd into a single backslash character.
A quick and easy way to do this is to run a gsub on the string before the convert_to_hash call:
# A little confusing due to escapes, but single backslashes are replaced with double.
# The second parameter is confusing, but it appears that String#sub requires a few
# extra escapes due to backslashes also being used to backreference.
# i.e. \n is replaced with the nth regex group, so to replace something with the string
# "\n" yet another escape for backslash is required, so "\\n" is replaced with "\n".
# Therefore the input of 8 backslashes is eval'd to a string of 4 backslashes, which
# sub interprets as 2 backslashes.
params[:email_type][:value].gsub!('\\', '\\\\\\\\')
This shouldn't be a problem unless you are using backslashes in the hash keys at some point, in which case more advanced matching would be needed to extract only the regexes and perform the substitution on them.
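An alternative that sidesteps both the backslash doubling and the eval on request parameters is to parse the submitted text directly. The sketch below assumes the form sends one name=pattern pair per line; parse_patterns and the sample input are hypothetical.

```ruby
# Hypothetical helper: turns "name=pattern" lines into a hash of Regexp objects
# without eval, so backslashes in the submitted text are never reinterpreted.
def parse_patterns(text)
  text.split(/\r?\n/).each_with_object({}) do |line, patterns|
    key, source = line.split('=', 2)      # limit 2 keeps any later "=" inside the pattern
    next if source.nil? || key.strip.empty?
    patterns[key.strip.to_sym] = Regexp.new(source.strip)
  end
end

# Single quotes keep the backslashes exactly as a form submission would deliver them.
# Illustrative input modelled on the question's first example.
submitted = ['regex1=^Something \(#\d*\)$', 'digits=\A\d+\z'].join("\r\n")

patterns = parse_patterns(submitted)
puts patterns[:regex1].match?("Something (#42)")
puts patterns[:digits].match?("12345")
```

Regexp.new raises RegexpError for an invalid pattern, so real code would probably want to rescue that and report it back to the user.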

Is there any string syntax in Ruby that allows interpolation but no escape character?

Is there any single-line string literal syntax in Ruby that allows string interpolation but does not interpret a backslash as an escape character?
I.e.,
Where ruby_var = "foo"
I want to be able to type the equivalent of C:\some\windows\path\#{ruby_var}\path resulting in the string C:\some\windows\path\foo\path without having to escape the backslashes or resort to a multi-line heredoc.
puts "C:\some\windows\path\#{ruby_var}\path"
puts "C:\some\windows\path\path_#{ruby_var}\path"
=> C: omewindowspath#{ruby_var}path
=> C: omewindowspathpath_foopath
puts 'C:\some\windows\path\#{ruby_var}\path'
puts 'C:\some\windows\path\path_#{ruby_var}\path'
=> C:\some\windows\path\#{ruby_var}\path
=> C:\some\windows\path\path_#{ruby_var}\path
puts %{C:\some\windows\path\#{ruby_var}\path}
puts %{C:\some\windows\path\path_#{ruby_var}\path}
=> C: omewindowspath#{ruby_var}path
=> C: omewindowspathpath_foopath
puts %q{C:\some\windows\path\#{ruby_var}\path}
puts %q{C:\some\windows\path\path_#{ruby_var}\path}
=> C:\some\windows\path\#{ruby_var}\path
=> C:\some\windows\path\path_#{ruby_var}\path
ruby_var = "hello"
puts 'C:\some\windows\path\%s\path' % ruby_var
#=>C:\some\windows\path\hello\path
'C:\some\windows\path\%s\path' % ruby_var
#=> 'C:\some\windows\path\foo\path'
I don't think it is possible.
You should consider using forward slashes instead to make it look prettier; I believe the standard ruby libraries in Windows won't care what kind of slashes you use.
There is also:
File.join('C:', 'path', ruby_var)
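Putting the two workable suggestions above side by side: a format template in single quotes, and File.join followed by a separator swap. This is only a sketch; the via_format and via_join names are illustrative.

```ruby
ruby_var = "foo"

# Kernel#format leaves backslashes alone in a single-quoted template.
via_format = format('C:\some\windows\path\%s\path', ruby_var)

# File.join always joins with "/", so convert afterwards if backslashes are required.
via_join = File.join('C:', 'some', 'windows', 'path', ruby_var, 'path').tr('/', '\\')

puts via_format
puts via_join
```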

Ruby: How to get the first character of a string

How can I get the first character in a string using Ruby?
Ultimately what I'm doing is taking someone's last name and just creating an initial out of it.
So if the string was "Smith" I just want "S".
You can use Ruby's open classes to make your code much more readable. For instance, this:
class String
def initial
self[0,1]
end
end
will allow you to use the initial method on any string. So if you have the following variables:
last_name = "Smith"
first_name = "John"
Then you can get the initials very cleanly and readably:
puts first_name.initial # prints J
puts last_name.initial # prints S
The other method mentioned here doesn't work on Ruby 1.8 (not that you should be using 1.8 anymore anyway!--but when this answer was posted it was still quite common):
puts 'Smith'[0] # prints 83
Of course, if you're not doing it on a regular basis, then defining the method might be overkill, and you could just do it directly:
puts last_name[0,1]
If you use a recent version of Ruby (1.9.0 or later), the following should work:
'Smith'[0] # => 'S'
If you use either 1.9.0+ or 1.8.7, the following should work:
'Smith'.chars.first # => 'S'
If you use a version older than 1.8.7, this should work:
'Smith'.split(//).first # => 'S'
Note that 'Smith'[0,1] does not work on 1.8, it will not give you the first character, it will only give you the first byte.
"Smith"[0..0]
works in both ruby 1.8 and ruby 1.9.
For completeness' sake: since Ruby 1.9, String#chr returns the first character of a string. It's still available in 2.0 and 2.1.
"Smith".chr #=> "S"
http://ruby-doc.org/core-1.9.3/String.html#method-i-chr
In MRI 1.8.7 or greater:
'foobarbaz'.each_char.first
Try this:
>> a = "Smith"
>> a[0]
=> "S"
OR
>> "Smith".chr
#=> "S"
In Rails
name = 'Smith'
name.first
>> s = 'Smith'
=> "Smith"
>> s[0]
=> "S"
Another option that hasn't been mentioned yet:
> "Smith".slice(0)
#=> "S"
Because of an annoying design choice in Ruby before 1.9 — some_string[0] returns the character code of the first character — the most portable way to write this is some_string[0,1], which tells it to get a substring at index 0 that's 1 character long.
Try this:
def word(string, num)
  # Return the first num characters of string
  string[0..(num-1)]
end
If you're using Rails, you can also use truncate:
> 'Smith'.truncate(1, omission: '')
#=> "S"
or for additional formatting:
> 'Smith'.truncate(4)
#=> "S..."
> 'Smith'.truncate(2, omission: '.')
#=> "S."
While this is definitely overkill for the original question, for a pure ruby solution, here is how truncate is implemented in rails
# File activesupport/lib/active_support/core_ext/string/filters.rb, line 66
def truncate(truncate_at, options = {})
return dup unless length > truncate_at
omission = options[:omission] || "..."
length_with_room_for_omission = truncate_at - omission.length
stop = if options[:separator]
rindex(options[:separator], length_with_room_for_omission) || length_with_room_for_omission
else
length_with_room_for_omission
end
"#{self[0, stop]}#{omission}"
end
Another way would be to use chars on the string:
def abbrev_name
first_name.chars.first.capitalize + '.' + ' ' + last_name
end
Any of these will work:
name = 'Smith'
puts name[0..0] # => S
puts name[0] # => S
puts name[0,1] # => S
puts name[0].chr # => S
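For the original use case (building an initial from a last name), the edge cases matter more than the indexing syntax. A minimal sketch, assuming Ruby 1.9 or later and a hypothetical helper name initial_of:

```ruby
# Hypothetical helper: returns an upper-cased initial, or "" when there is no name.
def initial_of(name)
  # to_s handles nil, strip drops stray whitespace, [0] is nil for an empty string.
  name.to_s.strip[0].to_s.upcase
end

puts initial_of("Smith")
puts initial_of("  smith ")
puts initial_of("\u00c9lodie")   # a name starting with a multibyte character
puts initial_of(nil).inspect
```

On 1.9 and later, indexing is character-based, so a multibyte first letter comes back whole rather than as its first byte.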

How to escape slashes in single-quoted strings?

In Ruby 1.8.6 (2007-09-24 patchlevel 111):
str = '\&123'
puts "abc".gsub("b", str) => ab123c
puts "abc".gsub("b", "#{str}") => ab123c
puts "abc".gsub("b", str.to_s) => ab123c
puts "abc".gsub("b", '\&123') => ab123c
puts "abc".gsub("b", "\&123") => a&123c <--- This I want to achieve using temporary variable
If I change str = '\&123' to str = "\&123" it works fine, but I get str from match function, so I cannot specify it manually within parentheses. Is there any way to change the 'string' to "string" behavior?
Maybe there is a simpler way, but the code below works:
> str = '\&123'
> puts "abc".gsub("b", str.gsub(/\\&/o, '\\\\\&\2\1'))
> => a\&123c
Simple:
str = '\&123' # the result of your match function
str = str.gsub(/\\/) { '\\\\' } # block form, so each backslash really is doubled
You may also want to take a look here.
@Valentin
-> I meant that str from match was not taken verbatim. Thus another (simpler) solution appeared, that I was not aware of....
"abc".gsub("b") { str } -> a\&123c
Just remove the backslash:
puts "abc".gsub("b", '&123')
There is no need to protect the ampersand with a backslash inside
single-quoted string literals (unlike double-quoted ones).
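The three behaviours discussed in this thread can be compared directly. A small sketch, with illustrative variable names, showing how the same replacement text is treated as a backreference, inserted verbatim by a block, or protected by doubling its backslashes:

```ruby
str = '\&123'   # e.g. text that came back from a match

with_backref = "abc".gsub("b", str)                         # \& expands to the matched text
verbatim     = "abc".gsub("b") { str }                      # a block's result is inserted as-is
escaped      = "abc".gsub("b", str.gsub('\\') { '\\\\' })   # doubled backslashes read as literal ones

puts with_backref
puts verbatim
puts escaped
```

The block form is usually the least error-prone choice when the replacement comes from data rather than from a literal in the source.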
