Replace single quote with backslash single quote - ruby

I have a very large string that needs to escape all the single quotes in it, so I can feed it to JavaScript without upsetting it.
I have no control over the external string, so I can't change the source data.
Example:
Cote d'Ivoir -> Cote d\'Ivoir
(the actual string is very long and contains many single quotes)
I'm trying to this by using gsub on the string, but can't get this to work:
a = "Cote d'Ivoir"
a.gsub("'", "\\\'")
but this gives me:
=> "Cote dIvoirIvoir"
I also tried:
a.gsub("'", 92.chr + 39.chr)
but got the same result; I know it's something to do with regular expressions, but I never get those.

The %q delimiters come in handy here:
# %q(a string) is equivalent to a single-quoted string
puts "Cote d'Ivoir".gsub("'", %q(\\\')) #=> Cote d\'Ivoir

The problem is that \' in a gsub replacement means "part of the string after the match".
You're probably best to use either the block syntax:
a = "Cote d'Ivoir"
a.gsub(/'/) {|s| "\\'"}
# => "Cote d\\'Ivoir"
or the Hash syntax:
a.gsub(/'/, {"'" => "\\'"})
There's also the hacky workaround:
a.gsub(/'/, '\#').gsub(/#/, "'")

# prepare a text file containing [ abcd\'efg ]
require "pathname"
backslashed_text = Pathname("/path/to/the/text/file.txt").readlines.first.strip
# puts backslashed_text => abcd\'efg
unslashed_text = "abcd'efg"
unslashed_text.gsub("'", Regexp.escape(%q|\'|)) == backslashed_text # true
# puts unslashed_text.gsub("'", Regexp.escape(%q|\'|)) => abcd\'efg

Related

Use ARGV[] argument vector to pass a regular expression in Ruby

I am trying to use gsub or sub on a regex passed through terminal to ARGV[].
Query in terminal: $ruby script.rb input.json "\[\{\"src\"\:\"
Input file first 2 lines:
[{
"src":"http://something.com",
"label":"FOO.jpg","name":"FOO",
"srcName":"FOO.jpg"
}]
[{
"src":"http://something123.com",
"label":"FOO123.jpg",
"name":"FOO123",
"srcName":"FOO123.jpg"
}]
script.rb:
dir = File.dirname(ARGV[0])
output = File.new(dir + "/output_" + Time.now.strftime("%H_%M_%S") + ".json", "w")
open(ARGV[0]).each do |x|
x = x.sub(ARGV[1]),'')
output.puts(x) if !x.nil?
end
output.close
This is very basic stuff really, but I am not quite sure on how to do this. I tried:
Regexp.escape with this pattern: [{"src":".
Escaping the characters and not escaping.
Wrapping the pattern between quotes and not wrapping.
Meditate on this:
I wrote a little script containing:
puts ARGV[0].class
puts ARGV[1].class
and saved it to disk, then ran it using:
ruby ~/Desktop/tests/test.rb foo /abc/
which returned:
String
String
The documentation says:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backlash followed by ‘d’, instead of a digit.
That means that the regular expression, though it appears to be a regex, it isn't, it's a string because ARGV only can return strings because the command-line can only contain strings.
When we pass a string into sub, Ruby recognizes it's not a regular expression, so it treats it as a literal string. Here's the difference in action:
'foo'.sub('/o/', '') # => "foo"
'foo'.sub(/o/, '') # => "fo"
The first can't find "/o/" in "foo" so nothing changes. It can find /o/ though and returns the result after replacing the two "o".
Another way of looking at it is:
'foo'.match('/o/') # => nil
'foo'.match(/o/) # => #<MatchData "o">
where match finds nothing for the string but can find a hit for /o/.
And all that leads to what's happening in your code. Because sub is being passed a string, it's trying to do a literal match for the regex, and won't be able to find it. You need to change the code to:
sub(Regexp.new(ARGV[1]), '')
but that's not all that has to change. Regexp.new(...) will convert what's passed in into a regular expression, but if you're passing in '/o/' the resulting regular expression will be:
Regexp.new('/o/') # => /\/o\//
which is probably not what you want:
'foo'.match(/\/o\//) # => nil
Instead you want:
Regexp.new('o') # => /o/
'foo'.match(/o/) # => #<MatchData "o">
So, besides changing your code, you'll need to make sure that what you pass in is a valid expression, minus any leading and trailing /.
Based on this answer in the thread Convert a string to regular expression ruby, you should use
x = x.sub(/#{ARGV[1]}/,'')
I tested it with this file (test.rb):
puts "You should not see any number [0123456789].".gsub(/#{ARGV[0]}/,'')
I called the file like so:
ruby test.rb "\d+"
# => You should not see any number [].

Split string with special characters in Ruby

I need to split a string using special characters like " < > = and white space.
Example of the string:
<fileset dir="/tmp/test/my_test" includes="all"/>
By now I tried using different combinations but no results
Example:
line.split(/<=>"\s+/).each do |line_parsed|
puts line_parsed
end
Regex is not the right tool for parsing XML. You can use any XML parser you like, here I am using Nokogiri:
require 'nokogiri'
doc = Nokogiri::XML(line)
fileset = doc.css('fileset').first
fileset.attr 'dir'
#=> "/tmp/test/my_test"
fileset.attr 'includes'
#=> "all"
If you have another loop à la each_line around the code you showed us, chances are you can drop this altogether and parse the whole document in one run.
Try enclosing the special characters in a "character class" ([...]) and moving the repetition character (+) outside:
parts = line.split(/[<=>"\s]+/)
# => ["", "fileset", "dir", "/tmp/test/my_test", "includes", "all", "/"]
parts[1] # => "fileset"
parts[2] # => "dir"
parts[3] # => "/tmp/test/my_test"

Replacing regex capture with the same capture and an extra string

I am trying to escape certain characters in a string. In particular, I want to turn
abc/def.ghi into abc\/def\.ghi
I tried to use the following syntax:
1.9.3p125 :076 > "abc/def.ghi".gsub(/([\/.])/, '\\\1')
=> "abc\\1def\\1ghi"
Hmm. This behaves as if capture replacements didn't work. Yet, when I tried this:
1.9.3p125 :075 > "abc/def.ghi".gsub(/([\/.])/, '\1')
=> "abc/def.ghi"
... I got the replacement to work, but, of course, my prefixes weren't part of it.
What is the correct syntax to do something like this?
This should be easier
gsub(/(?=[.\/])/, "\\")
If you are trying to prepare a string to be used as a regex pattern, use the right tool:
Regexp.escape('abc/def.ghi')
=> "abc/def\\.ghi"
You can then use the resulting string to create a regex:
/#{ Regexp.escape('abc/def.ghi') }/
=> /abc\/def\.ghi/
or:
Regexp.new(Regexp.escape('abc/def.ghi'))
=> /abc\/def\.ghi/
From the docs:
Escapes any characters that would have special meaning in a regular expression. Returns a new escaped string, or self if no characters are escaped. For any string, Regexp.new(Regexp.escape(str))=~str will be true.
Regexp.escape('\*?{}.') #=> \\\*\?\{\}\.
You can pass a block to gsub:
>> "abc/def.ghi".gsub(/([\/.])/) {|m| "\\#{m}"}
=> "abc\\/def\\.ghi"
Not nearly as elegant as #sawa's answer, but it was the only way I could find to get it to work if you need the replacing string to contain the captured group/backreference (rather than inserting the replacement before the look-ahead).

How to strip leading and trailing quote from string, in Ruby

I want to strip leading and trailing quotes, in Ruby, from a string. The quote character will occur 0 or 1 time. For example, all of the following should be converted to foo,bar:
"foo,bar"
"foo,bar
foo,bar"
foo,bar
You could also use the chomp function, but it unfortunately only works in the end of the string, assuming there was a reverse chomp, you could:
'"foo,bar"'.rchomp('"').chomp('"')
Implementing rchomp is straightforward:
class String
def rchomp(sep = $/)
self.start_with?(sep) ? self[sep.size..-1] : self
end
end
Note that you could also do it inline, with the slightly less efficient version:
'"foo,bar"'.chomp('"').reverse.chomp('"').reverse
EDIT: Since Ruby 2.5, rchomp(x) is available under the name delete_prefix, and chomp(x) is available as delete_suffix, meaning that you can use
'"foo,bar"'.delete_prefix('"').delete_suffix('"')
I can use gsub to search for the leading or trailing quote and replace it with an empty string:
s = "\"foo,bar\""
s.gsub!(/^\"|\"?$/, '')
As suggested by comments below, a better solution is:
s.gsub!(/\A"|"\Z/, '')
As usual everyone grabs regex from the toolbox first. :-)
As an alternate I'll recommend looking into .tr('"', '') (AKA "translate") which, in this use, is really stripping the quotes.
Another approach would be
remove_quotations('"foo,bar"')
def remove_quotations(str)
if str.start_with?('"')
str = str.slice(1..-1)
end
if str.end_with?('"')
str = str.slice(0..-2)
end
end
It is without RegExps and start_with?/end_with? are nicely readable.
It frustrates me that strip only works on whitespace. I need to strip all kinds of characters! Here's a String extension that will fix that:
class String
def trim sep=/\s/
sep_source = sep.is_a?(Regexp) ? sep.source : Regexp.escape(sep)
pattern = Regexp.new("\\A(#{sep_source})*(.*?)(#{sep_source})*\\z")
self[pattern, 2]
end
end
Output
'"foo,bar"'.trim '"' # => "foo,bar"
'"foo,bar'.trim '"' # => "foo,bar"
'foo,bar"'.trim '"' # => "foo,bar"
'foo,bar'.trim '"' # => "foo,bar"
' foo,bar'.trim # => "foo,bar"
'afoo,bare'.trim /[aeiou]/ # => "foo,bar"
Assuming that quotes can only appear at the beginning or end, you could just remove all quotes, without any custom method:
'"foo,bar"'.delete('"')
I wanted the same but for slashes in url path, which can be /test/test/test/ (so that it has the stripping characters in the middle) and eventually came up with something like this to avoid regexps:
'/test/test/test/'.split('/').reject(|i| i.empty?).join('/')
Which in this case translates obviously to:
'"foo,bar"'.split('"').select{|i| i != ""}.join('"')
or
'"foo,bar"'.split('"').reject{|i| i.empty?}.join('"')
Regexs can be pretty heavy and lead to some funky errors. If you are not dealing with massive strings and the data is pretty uniform you can use a simpler approach.
If you know the strings have starting and leading quotes you can splice the entire string:
string = "'This has quotes!'"
trimmed = string[1..-2]
puts trimmed # "This has quotes!"
This can also be turned into a simple function:
# In this case, 34 is \" and 39 is ', you can add other codes etc.
def trim_chars(string, char_codes=[34, 39])
if char_codes.include?(string[0]) && char_codes.include?(string[-1])
string[1..-2]
else
string
end
end
You can strip non-optional quotes with scan:
'"foo"bar"'.scan(/"(.*)"/)[0][0]
# => "foo\"bar"

How to escape slashes in single-quoted strings?

In Ruby 1.8.6 (2007-09-24 patchlevel 111):
str = '\&123'
puts "abc".gsub("b", str) => ab123c
puts "abc".gsub("b", "#{str}") => ab123c
puts "abc".gsub("b", str.to_s) => ab123c
puts "abc".gsub("b", '\&123') => ab123c
puts "abc".gsub("b", "\&123") => a&123c <--- This I want to achieve using temporary variable
If I change str = '\&123' to str = "\&123" it works fine, but I get str from match function, so I cannot specify it manually within parentheses. Is there any way to change the 'string' to "string" behavior?
maybe there is a simpler way, however the code below works
> str = '\&123'
> puts "abc".gsub("b", str.gsub(/\\&/o, '\\\\\&\2\1'))
> => a\&123c
Simple:
str = '\&123' <-- the result of your match function
str = str.gsub(/\\/, '\\\\')
You may also want to take a look here.
#Valentin
-> I meant that str from match was not taken verbatim. Thus another (simpler) solution appeared, that I was not aware of....
"abc".gsub("b") { str } -> a\&123c
Just remove the backslash:
puts "abc".gsub("b", '&123')
There is no need to protect the ampersand with a backslash inside
single-quoted string literals (unlike double-quoted ones).

Resources