How to get a particular part of a string matching a regexp in Ruby?

I have the string Unnecessary:12357927251data and I need to select everything after the colon and the digits. I'm doing it with a regexp:
string.scan(/:\d+.+$/)
This gives me :12357927251data, but can I select only the part matched by .+ (data)?

Anything in parentheses in a regexp will be captured as a group, which you can access in $1, $2, etc. or by using [] on a match object:
string.match(/:\d+(.+)$/)[1]
If you use scan with capturing groups, you will get an array of arrays of the groups:
"Unnecessary:123data\nUnnecessary:5791next".scan(/:\d+(.+)$/)
=> [["data"], ["next"]]
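For reference, here is a minimal sketch of the access styles described above, side by side. The variable names (from_match, from_global, from_index) are illustrative only and do not come from the question:

```ruby
string = "Unnecessary:12357927251data"

# MatchData#[] with index 1 returns the first capture group.
m = string.match(/:\d+(.+)$/)
from_match = m[1]

# After a successful match, the global $1 holds the same group.
string =~ /:\d+(.+)$/
from_global = $1

# String#[] accepts a regexp plus a group index, so no MatchData is needed.
from_index = string[/:\d+(.+)$/, 1]

puts from_match, from_global, from_index
# prints "data" three times
```

String#[] with a group index is probably the shortest form when only one group is needed and a nil result on no match is acceptable.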

Use parentheses in your regular expression and the result will be broken out into an array. For example:
x='Unnecessary:12357927251data'
x.scan(/(:\d+)(.+)$/)
=> [[":12357927251", "data"]]
x.scan(/:\d+(.+$)/).flatten
=> ["data"]

Assuming that you are trying to get the string 'data' from your string, then you can use:
string.match(/.*:\d*(.*)/)[1]
String#match returns a MatchData object. You can then index into that MatchData object to find the part of the string that you want.
(The first element of MatchData is the entire matched text; the second element is the part of the string captured by the parentheses.)
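A small sketch of indexing into MatchData, assuming the simpler pattern /:\d+(.+)$/ so that the whole match and the original string differ visibly. The variable names are hypothetical:

```ruby
string = "Unnecessary:12357927251data"
m = string.match(/:\d+(.+)$/)

whole = m[0]        # the entire matched text, not the whole input string
group = m[1]        # the text captured by the first pair of parentheses
all   = m.to_a      # whole match followed by every group
caps  = m.captures  # groups only

p whole  # => ":12357927251data"
p group  # => "data"

# String#match returns nil when nothing matches, so guard before indexing.
no_match = "no colon here".match(/:\d+(.+)$/)
safe = no_match ? no_match[1] : nil
p safe   # => nil
```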

Try this: /(?<=\:)\d+.+$/
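It turns the colon into a positive look-behind so that it does not appear in the output. The digits are still part of the match, so this returns 12357927251data rather than just data. Note that the colon is not a regexp metacharacter in Ruby, so the backslash before it is harmless but not required.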
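A quick sketch to check what the look-behind version returns, next to a \K variant. The \K form is an alternative added here for comparison and is not part of the answer above:

```ruby
string = "Unnecessary:12357927251data"

# Look-behind on the colon: the colon is excluded, the digits are kept.
with_lookbehind = string[/(?<=:)\d+.+$/]

# \K drops everything matched so far from the reported match,
# so both the colon and the digits are excluded.
with_keep = string[/:\d+\K.+$/]

p with_lookbehind  # => "12357927251data"
p with_keep        # => "data"
```

\K is handy here because Ruby's look-behind does not accept a variable-length pattern such as :\d+.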

Using IRB
irb(main):004:0> "Unnecessary:12357927251data".scan(/:\d+(.+)$/)
=> [["data"]]

Related

Why does this regex work in JavaScript but not in Ruby?

text = 'http://www.site.info www.escola.ninja.br google.com.ag'
expression: (http:\/\/)?((www\.)?\w+\.\w{2,}(\.\w{2,})?)
In Javascript, this expression works, returning:
["http://www.site.info", "www.escola.ninja.br", "google.com.ag"]
Why isn't it working in Ruby?
For example:
Using the match method:
p text.match(/(http:\/\/)?(www\.)?\w+\.\w{2,}(\.\w{2})?/)
#<MatchData "http://www.site.info" 1:"http://" 2:"www." 3:nil>
Using the scan method:
p text.scan(/(http:\/\/)?(www\.)?\w+\.\w{2,}(\.\w{2})?/)
[["http://", "www.", nil], [nil, "www.", ".br"], [nil, nil, ".ag"]]
How can I return the following array instead?
["http://www.site.info", "www.escola.ninja.br", "google.com.ag"]
This happens because, according to the documentation for Ruby's String#scan:
If the pattern contains groups, each individual result is itself an array containing one entry per group.
So you can simply modify the expression so that the groups are non-capturing by converting (...) to (?:...), resulting in the following expression
text.scan(/(?:http:\/\/)?(?:(?:www\.)?\w+\.\w{2,}(?:\.\w{2,})?)/)
# => ["http://www.site.info", "www.escola.ninja.br", "google.com.ag"]
The reason is that str.match(/regex/g) in JS does not keep captured substrings; see the MDN String#match() reference:
If the regular expression includes the g flag, the method returns an Array containing all matched substrings rather than match objects. Captured groups are not returned.
In Ruby, you have to modify the pattern: remove redundant capturing groups and turn the remaining ones into non-capturing groups (that is, replace each unescaped ( with (?:). Otherwise, String#scan outputs only the captured substrings:
If the pattern contains no groups, each individual result consists of the matched string, $&. If the pattern contains groups, each individual result is itself an array containing one entry per group.
Use
text = 'http://www.site.info www.escola.ninja.br google.com.ag'
puts text.scan(/(?:http:\/\/)?(?:www\.)?\w+\.\w{2,}(?:\.\w{2,})?/)
Output of the demo:
http://www.site.info
www.escola.ninja.br
google.com.ag
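To make the difference concrete, here is a sketch that runs the same text through a capturing and a non-capturing version of the pattern. The block form at the end is an extra option, not from the answers above, for cases where the groups have to stay. The variable names are made up for illustration:

```ruby
text = 'http://www.site.info www.escola.ninja.br google.com.ag'

capturing     = /(http:\/\/)?(www\.)?\w+\.\w{2,}(\.\w{2,})?/
non_capturing = /(?:http:\/\/)?(?:www\.)?\w+\.\w{2,}(?:\.\w{2,})?/

# With groups, scan returns one array of group values per match.
groups_only = text.scan(capturing)

# Without groups, scan returns the matched strings themselves.
whole_matches = text.scan(non_capturing)

# If the groups must stay capturing, the block form still exposes
# the full match through $& on each iteration.
via_block = []
text.scan(capturing) { via_block << $& }

p groups_only.first  # => ["http://", "www.", nil]
p whole_matches      # => ["http://www.site.info", "www.escola.ninja.br", "google.com.ag"]
```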

Method gsub does not work as expected

I want to change "#" to "\40" in a string, but I am not able to do so.
a = "srikanth#in.com"
a.gsub("#", "\40")
# => "srikanth in.com"
It replaces "#" with a space instead of \40. Any idea how to do this?
Another solution:
puts a.gsub("#") {"\\40"}
# => srikanth\40in.com
Passing "\\40" as the replacement string doesn't work because \4 is interpreted as a back-reference to capture group 4. From the docs:
If replacement is a String it will be substituted for the matched
text. It may contain back-references to the pattern’s capture groups
of the form \\d, where d is a group number ...
You can use gsub's hash syntax instead:
If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string.
Example:
a.gsub('#', '#' => '\\40')
#=> "srikanth\\40in.com"
Backslashes have a special meaning in the second parameter of gsub: they refer to captured regex groups. I tried escaping, but couldn't get it to work. It works this way, though:
s = "srikanth#in.com"
s['#'] = '\\40'
s # => "srikanth\\40in.com"
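The variants above can be compared in one sketch. The last form (a doubled, escaped backslash in the replacement string) is an addition here rather than something from the answers. It relies on gsub treating an escaped backslash in the replacement as a single literal backslash:

```ruby
a = "srikanth#in.com"

# In a double-quoted literal, "\40" is an octal escape for a space.
octal = a.gsub("#", "\40")

# Block return values are inserted literally, with no back-reference handling.
block = a.gsub("#") { "\\40" }

# Hash values are also inserted literally.
hash = a.gsub("#", "#" => "\\40")

# In a replacement string, an escaped backslash yields one literal backslash,
# so the Ruby literal needs four backslashes.
doubled = a.gsub("#", "\\\\40")

puts octal   # srikanth in.com
puts block   # srikanth\40in.com
puts hash    # srikanth\40in.com
puts doubled # srikanth\40in.com
```

The block form is usually the easiest to read, since it avoids counting backslashes.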

String gsub - Replace characters between two elements, but leave surrounding elements

Suppose I have the following string:
mystring = "start/abc123/end"
How can you splice out the abc123 and put something else in its place, while leaving the "start/" and "/end" elements intact?
I had the following to match for the pattern, but it replaces the entire string. I was hoping to just have it replace the abc123 with 123abc.
mystring.gsub(/start\/(.*)\/end/,"123abc") #=> "123abc"
Edit: The characters between the start & end elements can be any combination of alphanumeric characters, I changed my example to reflect this.
You can do it using the character class [^\/] (anything that is not a slash) and lookarounds:
mystring.gsub(/(?<=start\/)[^\/]+(?=\/end)/,"7")
For your example, you could perhaps use:
mystring.gsub(/\/(.*?)\//,"/7/")
This matches the two slashes surrounding the text you're replacing and puts them back in the substitution.
Alternatively, you could capture the pieces of the string you want to keep and interpolate them around your replacement. This turns out to be much more readable than lookaheads/lookbehinds:
irb(main):010:0> mystring.gsub(/(start)\/.*\/(end)/, "\\1/7/\\2")
=> "start/7/end"
\\1 and \\2 here refer to the numbered captures inside of your regular expression.
The problem is that you're replacing the entire matched string, "start/abc123/end", with the replacement. You need to include the matched characters you want to keep:
mystring.gsub(/start\/(.*)\/end/, "start/7/end")
Alternatively, just match the digits:
mystring.gsub(/\d+/, "7")
You can do this by grouping the start and end elements in the regular expression and then referring to these groups in the substitution string. Named groups are referenced as \k<name> in the replacement:
mystring.gsub(/(?<start>start\/).*(?<end>\/end)/, "\\k<start>7\\k<end>")
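Here is a sketch putting the approaches above side by side on the question's string. The replacement text "xyz" and the group names head and tail are placeholders chosen for this example:

```ruby
mystring = "start/abc123/end"

# Lookarounds: only the middle part is matched, so only it is replaced.
lookaround = mystring.gsub(/(?<=start\/)[^\/]+(?=\/end)/, "xyz")

# Numbered groups: capture what should stay and put it back with \1 and \2.
numbered = mystring.gsub(/(start\/).*(\/end)/, '\1xyz\2')

# Named groups: referenced as \k<name> in the replacement string.
named = mystring.gsub(/(?<head>start\/).*(?<tail>\/end)/, '\k<head>xyz\k<tail>')

# Block form: the groups are available as $1 and $2 inside the block.
block = mystring.gsub(/(start\/).*(\/end)/) { "#{$1}xyz#{$2}" }

puts lookaround, numbered, named, block
# prints "start/xyz/end" four times
```

The block form is worth considering when the replacement itself may contain backslashes, because block results are not scanned for back-references.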

Ruby scan regex will not match optional

Take this string.
a = "real-ab(+)real-bc(+)real-cd-xy"
a.scan(/[a-z_0-9]+\-[a-z_0-9]+[\-\[a-z_0-9]+\]?/)
=> ["real-ab", "real-bc", "real-cd-xy"]
But how come this next string gets nothing?
a = "real-a(+)real-b(+)real-c"
a.scan(/[a-z_0-9]+\-[a-z_0-9]+[\-\[a-z_0-9]+\]?/)
=> []
How can I make both strings produce a three-element array?
You've confused parentheses (used for grouping) and square brackets (used for character classes). You want
a.scan(/[a-z_0-9]+-[a-z_0-9]+(?:-[a-z_0-9]+)?/)
(?:...) creates a non-capturing group which is what you need here.
Furthermore, unless you want to disallow uppercase letters explicitly, you can write \w as a shorthand for "a letter, digit or underscore":
a.scan(/\w+-\w+(?:-\w+)?/)
a.scan(/[a-z_0-9]+\-[a-z_0-9]+/)
Why not simply?
a.scan(/[a-z_0-9\-]+/)
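As a check, this sketch runs both strings from the question through the pattern with an optional non-capturing group. The variable names are illustrative:

```ruby
# The third segment is wrapped in (?:...)? so it may be absent.
pattern = /[a-z_0-9]+-[a-z_0-9]+(?:-[a-z_0-9]+)?/

long  = "real-ab(+)real-bc(+)real-cd-xy".scan(pattern)
short = "real-a(+)real-b(+)real-c".scan(pattern)

p long   # => ["real-ab", "real-bc", "real-cd-xy"]
p short  # => ["real-a", "real-b", "real-c"]
```

The original pattern failed on the short string because [\-\[a-z_0-9]+ is a character class that requires at least one more character after the second segment, and "real-a" has none to spare.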

Ruby doesn't recognize the g flag for regex

Is it implied by default in str.scan? Is it off by default in str[regex]?
Yes, how often the regex is applied depends on the method used, not on the regex's flags.
scan will return an array containing (or iterate over) all matches of the regex. match and String#[] will return the first match. =~ will return the index of the first match. gsub will replace all occurrences of the regex and sub will replace the first occurrence.
smotchkkiss:~$ irb
>> 'Foobar does not like food because he is a fool'.gsub(/foo/i, 'zim')
=> "zimbar does not like zimd because he is a ziml"
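The same sentence can be run through each of the methods listed above to see the first-match versus all-matches behavior. This is a small sketch and the variable names are my own:

```ruby
s = 'Foobar does not like food because he is a fool'

first_match = s[/foo/i]             # first match only
first_index = s =~ /foo/i           # index of the first match
all_matches = s.scan(/foo/i)        # every match
replace_one = s.sub(/foo/i, 'zim')  # replaces the first match
replace_all = s.gsub(/foo/i, 'zim') # replaces every match

p first_match  # => "Foo"
p first_index  # => 0
p all_matches  # => ["Foo", "foo", "foo"]
p replace_one  # => "zimbar does not like food because he is a fool"
p replace_all  # => "zimbar does not like zimd because he is a ziml"
```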
