Why I can't substitute with '\\+' inside a string with gsub? - ruby

Try the following code:
s = '#value#'
puts s.gsub('#value#', Regexp.escape('*')) # => '\*'
puts s.gsub('#value#', Regexp.escape('+')) # => ''
Wtf? It looks like the char '\+' (returned by Regexp.escape) is completely ignored by gsub. How to fix this?

It's because of the interpolation of special variables. \+ will be replaced with "the text matched by the highest-numbered capturing group that actually participated in the match" (See the Special Variables section on http://www.regular-expressions.info/ruby.html)
The block syntax is in fact a fix for this, well done.

xsdg of #ruby worked this out
Looks like that gsub's replacement is parsed, so the + is lost somewhere in the process. A workaround is using gsub's block syntax. This way:
s = '#value#'
puts s.gsub('#value#') { |v| Regexp.escape('+') } # => '+'
Works as expected :)
Thanks, xsdg!

Related

Ruby regular expression to get users names after # symbol

How can I get the username without the # symbol?
That's everything between # and any non-word character.
message = <<-MESSAGE
From #victor with love,
To #andrea,
and CC goes to #ghost
MESSAGE
Using a Ruby regular expression, I tried
username_pattern = /#\w+/
I will like to get the following output
message.scan(username_pattern)
#=> ["victor", "andrea", "ghost"]
Use look behind
(?<=#)\w+
this will leave # symbol regex
I would go with:
message.scan(/(?<=#)\w+/)
#=> ["victor","andrea","ghost"]
You might want to read about look-behind regexp.
You could match the # and then capture one or more times a word character in a capturing group
#(\w+)
username_pattern = /#(\w+)/
Regex demo
Try this
irb(main):010:0> message.scan(/#(\w+)/m).flatten
=> ["victor", "andrea", "ghost"]

Multiple Ruby chomp! statements

I'm writing a simple method to detect and strip tags from text strings. Given this input string:
{{foobar}}
The function has to return
foobar
I thought I could just chain multiple chomp! methods, like so:
"{{foobar}}".chomp!("{{").chomp!("}}")
but this won't work, because the first chomp! returns a NilClass. I can do it with regular chomp statements, but I'm really looking for a one-line solution.
The String class documentation says that chomp! returns a Str if modifications have been made - therefore, the second chomp! should work. It doesn't, however. I'm at a loss at what's happening here.
For the purposes of this question, you can assume that the input string is always a tag which begins and ends with double curly braces.
You can definitely chain multiple chomp statements (the non-bang version), still having a one-line solution as you wanted:
"{{foobar}}".chomp("{{").chomp("}}")
However, it will not work as expected because both chomp! and chomp removes the separator only from the end of the string, not from the beginning.
You can use sub
"{{foobar}}".sub(/{{(.+)}}/, '\1')
# => "foobar"
"alfa {{foobar}} beta".sub(/{{(.+)}}/, '\1')
# => "alfa foobar beta"
# more restrictive
"{{foobar}}".sub(/^{{(.+)}}$/, '\1')
# => "foobar"
Testing this out, it's clear that chomp! will return nil if the separator it's provided as an argument is not present at the end of the string.
So "{{text}}".chomp!("}}") returns a string, but "{{text}}".chomp!("{{") reurns nil.
See here for an answer of how to chomp at the beginning of a string. But recognize that chomp only looks at the end of the string. So you can call str.reverse.chomp!("{{").reverse to remove the opening brackets.
You could also use a regex:
string = "{{text}}"
puts [/^\{\{(.+)\}\}$/, 1]
# => "text"
Try tr:
'{{foobar}}'.tr('{{', '').tr('}}', '')
You can also use gsub or sub but if the replacement is not needed as pattern, then tr should be faster.
If there are always curly braces, then you can just slice the string:
'{{foobar}}'[2...-2]
If you plan to make a method which returns the string without curly braces then DO NOT use bang versions. Modifying the input parameter of a method will be suprising!
def strip(string)
string.tr!('{{', '').tr!('}}', '')
end
a = '{{foobar}}'
b = strip(a)
puts b #=> foobar
puts a #=> foobar

Use ARGV[] argument vector to pass a regular expression in Ruby

I am trying to use gsub or sub on a regex passed through terminal to ARGV[].
Query in terminal: $ruby script.rb input.json "\[\{\"src\"\:\"
Input file first 2 lines:
[{
"src":"http://something.com",
"label":"FOO.jpg","name":"FOO",
"srcName":"FOO.jpg"
}]
[{
"src":"http://something123.com",
"label":"FOO123.jpg",
"name":"FOO123",
"srcName":"FOO123.jpg"
}]
script.rb:
dir = File.dirname(ARGV[0])
output = File.new(dir + "/output_" + Time.now.strftime("%H_%M_%S") + ".json", "w")
open(ARGV[0]).each do |x|
x = x.sub(ARGV[1]),'')
output.puts(x) if !x.nil?
end
output.close
This is very basic stuff really, but I am not quite sure on how to do this. I tried:
Regexp.escape with this pattern: [{"src":".
Escaping the characters and not escaping.
Wrapping the pattern between quotes and not wrapping.
Meditate on this:
I wrote a little script containing:
puts ARGV[0].class
puts ARGV[1].class
and saved it to disk, then ran it using:
ruby ~/Desktop/tests/test.rb foo /abc/
which returned:
String
String
The documentation says:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backlash followed by ‘d’, instead of a digit.
That means that the regular expression, though it appears to be a regex, it isn't, it's a string because ARGV only can return strings because the command-line can only contain strings.
When we pass a string into sub, Ruby recognizes it's not a regular expression, so it treats it as a literal string. Here's the difference in action:
'foo'.sub('/o/', '') # => "foo"
'foo'.sub(/o/, '') # => "fo"
The first can't find "/o/" in "foo" so nothing changes. It can find /o/ though and returns the result after replacing the two "o".
Another way of looking at it is:
'foo'.match('/o/') # => nil
'foo'.match(/o/) # => #<MatchData "o">
where match finds nothing for the string but can find a hit for /o/.
And all that leads to what's happening in your code. Because sub is being passed a string, it's trying to do a literal match for the regex, and won't be able to find it. You need to change the code to:
sub(Regexp.new(ARGV[1]), '')
but that's not all that has to change. Regexp.new(...) will convert what's passed in into a regular expression, but if you're passing in '/o/' the resulting regular expression will be:
Regexp.new('/o/') # => /\/o\//
which is probably not what you want:
'foo'.match(/\/o\//) # => nil
Instead you want:
Regexp.new('o') # => /o/
'foo'.match(/o/) # => #<MatchData "o">
So, besides changing your code, you'll need to make sure that what you pass in is a valid expression, minus any leading and trailing /.
Based on this answer in the thread Convert a string to regular expression ruby, you should use
x = x.sub(/#{ARGV[1]}/,'')
I tested it with this file (test.rb):
puts "You should not see any number [0123456789].".gsub(/#{ARGV[0]}/,'')
I called the file like so:
ruby test.rb "\d+"
# => You should not see any number [].

How do I remove backslashes from a JSON string?

I have a JSON string that looks as below
'{\"test\":{\"test1\":{\"test1\":[{\"test2\":\"1\",\"test3\": \"foo\",\"test4\":\"bar\",\"test5\":\"test7\"}]}}}'
I need to change it to the one below using Ruby or Rails:
'{"test":{"test1":{"test1":[{"test2":"1","test3": "foo","test4":"bar","test5":"bar2"}]}}}'
I need to know how to remove those slashes.
To avoid generating JSON with backslashes in console, use puts:
> hash = {test: 'test'}
=> {:test=>"test"}
> hash.to_json
=> "{\"test\":\"test\"}"
> puts hash.to_json
{"test":"test"}
You could also use JSON.pretty_generate, with puts of course.
> puts JSON.pretty_generate(hash)
{
"test": "test"
}
Use Ruby's String#delete! method. For example:
str = '{\"test\":{\"test1\":{\"test1\":[{\"test2\":\"1\",\"test3\": \"foo\",\"test4\":\"bar\",\"test5\":\"test7\"}]}}}'
str.delete! '\\'
puts str
#=> {"test":{"test1":{"test1":[{"test2":"1","test3": "foo","test4":"bar","test5":"test7"}]}}}
Replace all backslashes with empty string using gsub:
json.gsub('\\', '')
Note that the default output in a REPL uses inspect, which will double-quote the string and still include backslashes to escape the double-quotes. Use puts to see the string’s exact contents:
{"test":{"test1":{"test1":[{"test2":"1","test3": "foo","test4":"bar","test5":"test7"}]}}}
Further, note that this will remove all backslashes, and makes no regard for their context. You may wish to only remove backslashes preceding a double-quote, in which case you can do:
json.gsub('\"', '')
I needed to print a pretty JSON array to a file. I created an array of JSON objects then needed to print to a file for the DBA to manipulate.
This is how i did it.
puts(((dirtyData.gsub(/\\"/, "\"")).gsub(/\["/, "\[")).gsub(/"\]/, "\]"))
It's a triple nested gsub that removes the \" first then removes the [" lastly removes the "]
I hope this helps
I tried the earlier approaches and they did not seem to fix my issue. My json still had backslashes. However, I found a fix to solve this issue.
myuglycode.gsub!(/\"/, '\'')
Use JSON.parse()
JSON.parse("{\"test\":{\"test1\":{\"test1\":[{\"test2\":\"1\",\"test3\": \"foo\",\"test4\":\"bar\",\"test5\":\"test7\"}]}}}")
> {"test"=>{"test1"=>{"test1"=>[{"test2"=>"1", "test3"=>"foo", "test4"=>"bar", "test5"=>"test7"}]}}}
Note that the "json-string" must me double-quoted for this to work, otherwise \" would be read as is instead of being "translated" to " which return a JSON::ParserError Exception.

How to strip leading and trailing quote from string, in Ruby

I want to strip leading and trailing quotes, in Ruby, from a string. The quote character will occur 0 or 1 time. For example, all of the following should be converted to foo,bar:
"foo,bar"
"foo,bar
foo,bar"
foo,bar
You could also use the chomp function, but it unfortunately only works in the end of the string, assuming there was a reverse chomp, you could:
'"foo,bar"'.rchomp('"').chomp('"')
Implementing rchomp is straightforward:
class String
def rchomp(sep = $/)
self.start_with?(sep) ? self[sep.size..-1] : self
end
end
Note that you could also do it inline, with the slightly less efficient version:
'"foo,bar"'.chomp('"').reverse.chomp('"').reverse
EDIT: Since Ruby 2.5, rchomp(x) is available under the name delete_prefix, and chomp(x) is available as delete_suffix, meaning that you can use
'"foo,bar"'.delete_prefix('"').delete_suffix('"')
I can use gsub to search for the leading or trailing quote and replace it with an empty string:
s = "\"foo,bar\""
s.gsub!(/^\"|\"?$/, '')
As suggested by comments below, a better solution is:
s.gsub!(/\A"|"\Z/, '')
As usual everyone grabs regex from the toolbox first. :-)
As an alternate I'll recommend looking into .tr('"', '') (AKA "translate") which, in this use, is really stripping the quotes.
Another approach would be
remove_quotations('"foo,bar"')
def remove_quotations(str)
if str.start_with?('"')
str = str.slice(1..-1)
end
if str.end_with?('"')
str = str.slice(0..-2)
end
end
It is without RegExps and start_with?/end_with? are nicely readable.
It frustrates me that strip only works on whitespace. I need to strip all kinds of characters! Here's a String extension that will fix that:
class String
def trim sep=/\s/
sep_source = sep.is_a?(Regexp) ? sep.source : Regexp.escape(sep)
pattern = Regexp.new("\\A(#{sep_source})*(.*?)(#{sep_source})*\\z")
self[pattern, 2]
end
end
Output
'"foo,bar"'.trim '"' # => "foo,bar"
'"foo,bar'.trim '"' # => "foo,bar"
'foo,bar"'.trim '"' # => "foo,bar"
'foo,bar'.trim '"' # => "foo,bar"
' foo,bar'.trim # => "foo,bar"
'afoo,bare'.trim /[aeiou]/ # => "foo,bar"
Assuming that quotes can only appear at the beginning or end, you could just remove all quotes, without any custom method:
'"foo,bar"'.delete('"')
I wanted the same but for slashes in url path, which can be /test/test/test/ (so that it has the stripping characters in the middle) and eventually came up with something like this to avoid regexps:
'/test/test/test/'.split('/').reject(|i| i.empty?).join('/')
Which in this case translates obviously to:
'"foo,bar"'.split('"').select{|i| i != ""}.join('"')
or
'"foo,bar"'.split('"').reject{|i| i.empty?}.join('"')
Regexs can be pretty heavy and lead to some funky errors. If you are not dealing with massive strings and the data is pretty uniform you can use a simpler approach.
If you know the strings have starting and leading quotes you can splice the entire string:
string = "'This has quotes!'"
trimmed = string[1..-2]
puts trimmed # "This has quotes!"
This can also be turned into a simple function:
# In this case, 34 is \" and 39 is ', you can add other codes etc.
def trim_chars(string, char_codes=[34, 39])
if char_codes.include?(string[0]) && char_codes.include?(string[-1])
string[1..-2]
else
string
end
end
You can strip non-optional quotes with scan:
'"foo"bar"'.scan(/"(.*)"/)[0][0]
# => "foo\"bar"

Resources