Replacing regex capture with the same capture and an extra string - ruby

I am trying to escape certain characters in a string. In particular, I want to turn
abc/def.ghi into abc\/def\.ghi
I tried to use the following syntax:
1.9.3p125 :076 > "abc/def.ghi".gsub(/([\/.])/, '\\\1')
=> "abc\\1def\\1ghi"
Hmm. This behaves as if capture replacements didn't work. Yet, when I tried this:
1.9.3p125 :075 > "abc/def.ghi".gsub(/([\/.])/, '\1')
=> "abc/def.ghi"
... I got the replacement to work, but, of course, my prefixes weren't part of it.
What is the correct syntax to do something like this?

This should be easier
gsub(/(?=[.\/])/, "\\")

If you are trying to prepare a string to be used as a regex pattern, use the right tool:
Regexp.escape('abc/def.ghi')
=> "abc/def\\.ghi"
You can then use the resulting string to create a regex:
/#{ Regexp.escape('abc/def.ghi') }/
=> /abc\/def\.ghi/
or:
Regexp.new(Regexp.escape('abc/def.ghi'))
=> /abc\/def\.ghi/
From the docs:
Escapes any characters that would have special meaning in a regular expression. Returns a new escaped string, or self if no characters are escaped. For any string, Regexp.new(Regexp.escape(str))=~str will be true.
Regexp.escape('\*?{}.') #=> \\\*\?\{\}\.

You can pass a block to gsub:
>> "abc/def.ghi".gsub(/([\/.])/) {|m| "\\#{m}"}
=> "abc\\/def\\.ghi"
Not nearly as elegant as #sawa's answer, but it was the only way I could find to get it to work if you need the replacing string to contain the captured group/backreference (rather than inserting the replacement before the look-ahead).

Related

Use ARGV[] argument vector to pass a regular expression in Ruby

I am trying to use gsub or sub on a regex passed through terminal to ARGV[].
Query in terminal: $ruby script.rb input.json "\[\{\"src\"\:\"
Input file first 2 lines:
[{
"src":"http://something.com",
"label":"FOO.jpg","name":"FOO",
"srcName":"FOO.jpg"
}]
[{
"src":"http://something123.com",
"label":"FOO123.jpg",
"name":"FOO123",
"srcName":"FOO123.jpg"
}]
script.rb:
dir = File.dirname(ARGV[0])
output = File.new(dir + "/output_" + Time.now.strftime("%H_%M_%S") + ".json", "w")
open(ARGV[0]).each do |x|
x = x.sub(ARGV[1]),'')
output.puts(x) if !x.nil?
end
output.close
This is very basic stuff really, but I am not quite sure on how to do this. I tried:
Regexp.escape with this pattern: [{"src":".
Escaping the characters and not escaping.
Wrapping the pattern between quotes and not wrapping.
Meditate on this:
I wrote a little script containing:
puts ARGV[0].class
puts ARGV[1].class
and saved it to disk, then ran it using:
ruby ~/Desktop/tests/test.rb foo /abc/
which returned:
String
String
The documentation says:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backlash followed by ‘d’, instead of a digit.
That means that the regular expression, though it appears to be a regex, it isn't, it's a string because ARGV only can return strings because the command-line can only contain strings.
When we pass a string into sub, Ruby recognizes it's not a regular expression, so it treats it as a literal string. Here's the difference in action:
'foo'.sub('/o/', '') # => "foo"
'foo'.sub(/o/, '') # => "fo"
The first can't find "/o/" in "foo" so nothing changes. It can find /o/ though and returns the result after replacing the two "o".
Another way of looking at it is:
'foo'.match('/o/') # => nil
'foo'.match(/o/) # => #<MatchData "o">
where match finds nothing for the string but can find a hit for /o/.
And all that leads to what's happening in your code. Because sub is being passed a string, it's trying to do a literal match for the regex, and won't be able to find it. You need to change the code to:
sub(Regexp.new(ARGV[1]), '')
but that's not all that has to change. Regexp.new(...) will convert what's passed in into a regular expression, but if you're passing in '/o/' the resulting regular expression will be:
Regexp.new('/o/') # => /\/o\//
which is probably not what you want:
'foo'.match(/\/o\//) # => nil
Instead you want:
Regexp.new('o') # => /o/
'foo'.match(/o/) # => #<MatchData "o">
So, besides changing your code, you'll need to make sure that what you pass in is a valid expression, minus any leading and trailing /.
Based on this answer in the thread Convert a string to regular expression ruby, you should use
x = x.sub(/#{ARGV[1]}/,'')
I tested it with this file (test.rb):
puts "You should not see any number [0123456789].".gsub(/#{ARGV[0]}/,'')
I called the file like so:
ruby test.rb "\d+"
# => You should not see any number [].

How in ruby delete all non-digits symbols (except commas and dashes)

I meet some hard task for me. I has a string which need to parse into array and some other elements. I have a troubles with REGEXP so wanna ask help.
I need delete from string all non-digits, except commas (,) and dashes (-)
For example:
"!1,2e,3,6..-10" => "1,2,3,6-10"
"ffff5-10...." => "5-10"
"1.2,15" => "12,15"
and so.
[^0-9,-]+
This should do it for you.Replace by empty string.See demo.
https://regex101.com/r/vV1wW6/44
We must have at least one non-regex solution:
def keep_some(str, keepers)
str.delete(str.delete(keepers))
end
keep_some("!1,2e,3,6..-10", "0123456789,-")
#=> "1,2,3,6-10"
keep_some("ffff5-10....", "0123456789,-")
#=> "5-10"
keep_some("1.2,15", "0123456789,-")
#=> "12,15"
"!1,2e,3,6..-10".gsub(/[^\d,-]+/, '') # => "1,2,3,6-10"
Use String#gsub with a pattern that matches everything except what you want to keep, and replace it with the empty string. In a reguar expression, the negated character class [^whatever] matches everything except the characters in the "whatever", so this works:
a_string.gsub /[^0-9,-]/, ''
Note that the hyphen has to come last, as otherwise it will be interpreted as a range indicator.
To demonstrate, I put all your "before" strings into an Array and used Enumerable#map to run the above gsub call on all of them, producing an Array of the "after" strings:
["!1,2e,3,6..-10", "ffff5-10....", "1.2,15"].map { |s| s.gsub /[^0-9,-]/, '' }
# => ["1,2,3,6-10", "5-10", "12,15"]

How do I remove backslashes from a JSON string?

I have a JSON string that looks as below
'{\"test\":{\"test1\":{\"test1\":[{\"test2\":\"1\",\"test3\": \"foo\",\"test4\":\"bar\",\"test5\":\"test7\"}]}}}'
I need to change it to the one below using Ruby or Rails:
'{"test":{"test1":{"test1":[{"test2":"1","test3": "foo","test4":"bar","test5":"bar2"}]}}}'
I need to know how to remove those slashes.
To avoid generating JSON with backslashes in console, use puts:
> hash = {test: 'test'}
=> {:test=>"test"}
> hash.to_json
=> "{\"test\":\"test\"}"
> puts hash.to_json
{"test":"test"}
You could also use JSON.pretty_generate, with puts of course.
> puts JSON.pretty_generate(hash)
{
"test": "test"
}
Use Ruby's String#delete! method. For example:
str = '{\"test\":{\"test1\":{\"test1\":[{\"test2\":\"1\",\"test3\": \"foo\",\"test4\":\"bar\",\"test5\":\"test7\"}]}}}'
str.delete! '\\'
puts str
#=> {"test":{"test1":{"test1":[{"test2":"1","test3": "foo","test4":"bar","test5":"test7"}]}}}
Replace all backslashes with empty string using gsub:
json.gsub('\\', '')
Note that the default output in a REPL uses inspect, which will double-quote the string and still include backslashes to escape the double-quotes. Use puts to see the string’s exact contents:
{"test":{"test1":{"test1":[{"test2":"1","test3": "foo","test4":"bar","test5":"test7"}]}}}
Further, note that this will remove all backslashes, and makes no regard for their context. You may wish to only remove backslashes preceding a double-quote, in which case you can do:
json.gsub('\"', '')
I needed to print a pretty JSON array to a file. I created an array of JSON objects then needed to print to a file for the DBA to manipulate.
This is how i did it.
puts(((dirtyData.gsub(/\\"/, "\"")).gsub(/\["/, "\[")).gsub(/"\]/, "\]"))
It's a triple nested gsub that removes the \" first then removes the [" lastly removes the "]
I hope this helps
I tried the earlier approaches and they did not seem to fix my issue. My json still had backslashes. However, I found a fix to solve this issue.
myuglycode.gsub!(/\"/, '\'')
Use JSON.parse()
JSON.parse("{\"test\":{\"test1\":{\"test1\":[{\"test2\":\"1\",\"test3\": \"foo\",\"test4\":\"bar\",\"test5\":\"test7\"}]}}}")
> {"test"=>{"test1"=>{"test1"=>[{"test2"=>"1", "test3"=>"foo", "test4"=>"bar", "test5"=>"test7"}]}}}
Note that the "json-string" must me double-quoted for this to work, otherwise \" would be read as is instead of being "translated" to " which return a JSON::ParserError Exception.

How to change case of letters in string using RegEx in Ruby

Say I have a string : "hEY "
I want to convert it to "Hey "
string.gsub!(/([a-z])([A-Z]+ )/, '\1'.upcase)
That is the idea I have, but it seems like the upcase method does nothing when I use it within the gsub method. Why is that?
EDIT: I came up with this method:
string.gsub!(/([a-z])([A-Z]+ )/) { |str| str.downcase!.capitalize! }
Is there a way to do this within the regex though? I don't really understand the '\1' '\2' thing. Is that backreferencing? How does that work
#sawa Has the simple answer, and you've edited your question with another mechanism. However, to answer two of your questions:
Is there a way to do this within the regex though?
No, Ruby's regex does not support a case-changing feature as some other regex flavors do. You can "prove" this to yourself by reviewing the official Ruby regex docs for 1.9 and 2.0 and searching for the word "case":
https://github.com/ruby/ruby/blob/ruby_1_9_3/doc/re.rdoc
https://github.com/ruby/ruby/blob/ruby_2_0_0/doc/re.rdoc
I don't really understand the '\1' '\2' thing. Is that backreferencing? How does that work?
Your use of \1 is a kind of backreference. A backreference can be when you use \1 and such in the search pattern. For example, the regular expression /f(.)\1/ will find the letter f, followed by any character, followed by that same character (e.g. "foo" or "f!!").
In this case, within a replacement string passed to a method like String#gsub, the backreference does refer to the previous capture. From the docs:
"If replacement is a String it will be substituted for the matched text. It may contain back-references to the pattern’s capture groups of the form \d, where d is a group number, or \k<n>, where n is a group name. If it is a double-quoted string, both back-references must be preceded by an additional backslash."
In practice, this means:
"hello world".gsub( /([aeiou])/, '_\1_' ) #=> "h_e_ll_o_ w_o_rld"
"hello world".gsub( /([aeiou])/, "_\1_" ) #=> "h_\u0001_ll_\u0001_ w_\u0001_rld"
"hello world".gsub( /([aeiou])/, "_\\1_" ) #=> "h_e_ll_o_ w_o_rld"
Now, you have to understand when code runs. In your original code…
string.gsub!(/([a-z])([A-Z]+ )/, '\1'.upcase)
…what you are doing is calling upcase on the string '\1' (which has no effect) and then calling the gsub! method, passing in a regex and a string as parameters.
Finally, another way to achieve this same goal is with the block form like so:
# Take your pick of which you prefer:
string.gsub!(/([a-z])([A-Z]+ )/){ $1.upcase << $2.downcase }
string.gsub!(/([a-z])([A-Z]+ )/){ [$1.upcase,$2.downcase].join }
string.gsub!(/([a-z])([A-Z]+ )/){ "#{$1.upcase}#{$2.downcase}" }
In the block form of gsub the captured patterns are set to the global variables $1, $2, etc. and you can use those to construct the replacement string.
I don't know why you are trying to do it in a complicated way, but the usual way is:
"hEY".capitalize # => "Hey"
If you insist in using a regex and upcase, then you would also need downcase:
"hEY".downcase.sub(/\w/){$&.upcase} # => "Hey"
If you really want to just swap the case of every letter in the string, you can avoid the complexity of regex entirely because There's A Method For That™.
"hEY".swapcase # => "Hey"
"HellO thERe".swapcase # => "hELLo THerE"
There's also swapcase! to do it destructively.

Ruby remove everything except some characters?

How can I remove from a string all characters except white spaces, numbers, and some others?
Something like this:
oneLine.gsub(/[^ULDR0-9\<\>\s]/i,'')
I need only: 0-9 l d u r < > <space>
Also, is there a good document about the use of regex in Ruby, like a list of special characters with examples?
The regex you have is already working correctly. However, you do need to assign the result back to the string you're operating on. Otherwise, you're not changing the string (.gsub() does not modify the string in-place).
You can improve the regex a bit by adding a '+' quantifier (so consecutive characters can be replaced in one go). Also, you don't need to escape angle brackets:
oneLine = oneLine.gsub(/[^ULDR0-9<>\s]+/i, '')
A good resource with special consideration of Ruby regexes is the Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan. A good online tutorial by the same author is here.
Good old String#delete does this without a regular expression. The ^ means 'NOT'.
str = "12eldabc8urp pp"
p str.delete('^0-9ldur<> ') #=> "12ld8ur "
Just for completeness: you don't need a regular expression for this particular task, this can be done using simple string manipulation:
irb(main):005:0> "asdasd123".tr('^ULDRuldr0-9<>\t\r\n ', '')
=> "dd123"
There's also the tr! method if you want to replace the old value:
irb(main):009:0> oneLine = 'UasdL asd 123'
irb(main):010:0> oneLine.tr!('^ULDRuldr0-9<>\t\r\n ', '')
irb(main):011:0> oneLine
=> "UdL d 123"
This should be a bit faster as well (but performance shouldn't be a big concern in Ruby :)

Resources