Ruby: matching a particular word

I am trying to match a particular word in a string, but it is matching the whole string:
doc = "<span>Hi welcome to world</span>"
puts doc.match(/<span>(.*?)<\/span>/)
This code prints the whole string
Output:
<span>Hi welcome to world</span>
But I want only
Hi welcome to world
Another problem is that the output of this program is just an integer:
doc = "<span>Hi welcome to world</span>"
puts doc =~ (/<span>(.*?)<\/span>/)
Output:
0

You should print the first capture group:
puts doc.match(/<span>(.*?)<\/span>/)[1]
# => Hi welcome to world
To answer your other question, from the documentation of String#=~:
Match—If obj is a Regexp, use it as a pattern to match against str, and returns the position the match starts, or nil if there is no match.
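To see the three results side by side, here is a minimal sketch. The helper name span_text is made up for illustration; it relies on String#[] accepting a regex plus a group index, which returns nil instead of raising when nothing matches.

```ruby
# Hypothetical helper: returns the text inside the first <span>, or nil.
def span_text(html)
  # String#[] with a regex and a group index returns only that capture group.
  html[/<span>(.*?)<\/span>/, 1]
end

doc = "<span>Hi welcome to world</span>"
match = doc.match(/<span>(.*?)<\/span>/)

puts match[0]          # the whole match (what `puts match` prints)
puts match[1]          # the first capture group only
puts doc =~ /<span>/   # the index where the match starts
puts span_text(doc)    # same text as match[1]
p span_text("no span") # nil, because nothing matched
```

Calling [1] directly on the result of match raises NoMethodError when there is no match, so the helper form is a little safer for arbitrary input.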

After matching with a RegEx you can use $1, $2, ... to output matched groups. So you could simply do:
doc.match(/<span>(.*?)<\/span>/)
puts $1
You could take a look at "What are Ruby's numbered global variables" for a detailed explanation of other variables such as $'.
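If the globals feel too magical, the same information is available from the MatchData object. A rough sketch, assuming a made-up helper called describe_match:

```ruby
# Hypothetical helper: collects the pieces that $&, $1, $` and $' would hold.
def describe_match(text, pattern)
  return nil unless text =~ pattern

  m = Regexp.last_match # the same object as $~
  {
    whole: m[0],          # $&
    first_group: m[1],    # $1
    before: m.pre_match,  # $`
    after: m.post_match   # $'
  }
end

p describe_match("say <span>hi</span> now", /<span>(.*?)<\/span>/)
```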

Related

How to extract part of the string which comes after given substring?

For example I have url string like:
https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Credential=abcd12bhhh34-1%2Fs3%2Faws4_request&X-Amz-Date=2016&X-Amz-Expires=3&X-Amz-SignedHeaders=host&X-Amz-Signature=abcd34hhhhbfbbf888ksdskj
From this string I need to extract the number 1234, which comes after subfolder/. I tried gsub but had no luck. Any help would be appreciated.
Suppose your url is saved in a variable called url.
Then the following should return 1234
url.match(/subfolder\/(\d*)/)[1]
Explanation:
url.match(/ # call the match function which takes a regex
subfolder\/ # search for the first appearance of the string 'subfolder/'
# note: we must escape the `/` so we don't end the regex early
(\d*) # match any number of digits in a capture group,
/)[1] # close the regex and return the first capture group
lwassink has the right idea, but it can be done more simply. If subfolder is always the same:
url = "https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Credential=abcd12bhhh34-1%2Fs3%2Faws4_request&X-Amz-Date=2016&X-Amz-Expires=3&X-Amz-SignedHeaders=host&X-Amz-Signature=abcd34hhhhbfbbf888ksdskj"
url[/subfolder\/\K\d+/]
# => "1234"
The \K discards the matched text up to that point, so only "1234" is returned.
If you want to get the number after any subfolder, and the domain name is always the same, you might do this instead:
url[%r{amazonaws\.com/[^/]+/\K\d+}]
# => "1234"
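If the folder name varies, the same idea can be wrapped in a small helper. This is only a sketch: segment_after is a hypothetical name, and it assumes the value you want is the path segment right after the marker.

```ruby
# Hypothetical helper: returns the path segment that follows /marker/, or nil.
def segment_after(url, marker)
  # Regexp.escape keeps regex metacharacters in the marker from being interpreted.
  # The capture stops at the next "/" or at the start of the query string.
  url[%r{/#{Regexp.escape(marker)}/([^/?]+)}, 1]
end

url = "https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Date=2016"
puts segment_after(url, "subfolder")
p segment_after(url, "missing") # nil, because the marker is not in the URL
```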
s.split('/')[4]
Add a .to_i at the end if you like.
Or, to key it on a substring like you asked for...
a = s.split '/'
a[a.find_index('subfolder') + 1]
Or, to do it as a one-liner I suppose you could:
s.split('/').tap { |a| @i = 1 + a.find_index('subfolder') }[@i]
Or, since I am a damaged individual, I would actually write that:
s.split('/').tap { |a| @i = 1 + (a.find_index 'subfolder') }[@i]
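One caveat with the split versions: find_index returns nil when 'subfolder' is absent, so the + 1 raises NoMethodError. A possible nil-safe variant, assuming you are happy to parse the URL with the standard uri library first (path_segment_after is a made-up name):

```ruby
require 'uri'

# Hypothetical helper: returns the path segment after `name`, or nil.
def path_segment_after(url, name)
  # Only the path is split, so the query string cannot interfere.
  segments = URI.parse(url).path.split('/')
  i = segments.index(name)
  i && segments[i + 1]
end

url = "https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Date=2016"
puts path_segment_after(url, "subfolder")
p path_segment_after(url, "nope") # nil
```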
url = 'http://abc/xyz'
index = url.index('/abc/')
# 5 is the length of '/abc/'; the second argument is a length, not an end index
url[index + 5, length_of_string_you_want_to_extract]
Hope that helps!

Ruby Regex Replace Last Occurrence of Grouping

In Ruby regular expressions I would like to use gsub to replace the last occurrence of a grouping if it occurs, and otherwise perform the replacement at a default location. I am trying to replace the last occurrence of a number in the 40s (40 to 49). I have the following regular expression, which correctly captures the grouping I want in '\3':
/(([1-3,5-9][0-9]|([4][0-9]))[a-z])*Foo/
Some sample strings I am using this regex on are:
12a23b34c45d56eFoo
12a45b34c46d89eFoo
45aFoo
Foo
12a23bFoo
12a23b445cFoo
Using https://regex101.com/, I see the last number in the 40s is captured in '\3'. I would then like to somehow perform string.gsub(regex, '\3' => 'NEW') to replace this last occurrence, or to insert NEW before Foo if no such number is present. My desired results would be:
12a23b34cNEWd56eFoo
12a45b34cNEWd89eFoo
NEWaFoo
NEWFoo
12a23bNEWFoo
12a23b4NEWcFoo
If I understood correctly, you are interested in gsub with a block:
str.gsub(PATTERN) { |mtch|
  puts mtch               # the whole match
  puts $~[3]              # the third group
  mtch.gsub($~[3], 'NEW') # the result
}
'abc'.gsub(/(b)(c)/) { |m| m.gsub($~[2], 'd') }
#⇒ "abd"
You should probably handle the case when there are no occurrences in the 40s at all, like:
gsub($~[1], "NEW#{$~[1]}") if $~[3].nil?
To handle all the possible cases, one might declare the group for Foo:
# NOTE THE GROUP ⇓⇓⇓⇓⇓
▶ re = /(([1-3,5-9][0-9]|([4][0-9]))[a-z])*(Foo)/
#⇒ /(([1-3,5-9][0-9]|([4][0-9]))[a-z])*(Foo)/
▶ inp.gsub(re) do |mtch|
▷ $~[3].nil? ? mtch.gsub($~[4], "NEW#{$~[4]}") : mtch.gsub(/#{$~[3]}/, 'NEW')
▷ end
#⇒ "12a23b34cNEWd56eFoo\n12a45b34cNEWd89eFoo\nNEWaFoo\nNEWFoo\n12a23bNEWFoo"
Hope it helps.
I suggest the following:
'12a23b34c45d56eFoo'.gsub(/(([1-3,5-9][0-9]|([4][0-9]))[a-z])*Foo/) {
  if Regexp.last_match[3].nil?
    puts "Append before Foo"
  else
    puts "Replace group 3"
  end
}
You would still need to append or replace accordingly in each branch; maybe someone can edit this into more concise code...
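For what it's worth, here is one possible way to fill in both branches. Note that it swaps the technique: instead of relying on group 3 of the original regex, it uses a greedy prefix so that the last 4x number followed by a lowercase letter is the one that gets replaced. The method name replace_last_forties is made up, and the sketch assumes single-line input that ends in Foo.

```ruby
# Hypothetical helper: replaces the last number in the 40s, or inserts the
# replacement before the trailing Foo when there is no such number.
def replace_last_forties(str, replacement = 'NEW')
  # The greedy (.*) consumes as much as possible, so the 4\d that follows it
  # is the last one in the string that is followed by a lowercase letter.
  pattern = /\A(.*)4\d(?=[a-z])/
  if str =~ pattern
    str.sub(pattern) { "#{$1}#{replacement}" }
  else
    str.sub(/Foo\z/) { "#{replacement}Foo" }
  end
end

%w[12a23b34c45d56eFoo 12a45b34c46d89eFoo 45aFoo Foo 12a23bFoo 12a23b445cFoo].each do |s|
  puts replace_last_forties(s)
end
```

This is looser than the original pattern, since it does not check that every token before Foo is a two-digit number plus a letter, so treat it as a starting point.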

How to find out whether a string contains a substring in Ruby

I have a string variable and need to check whether a substring is present in it, like:
str = "sdfgg"
I need to check whether str contains df.
Please help me write Ruby code to check this.
Use String#include?.
str.include?("df")
You can also use a regex for that:
if str =~ /df/
  # Successful match
else
  # Match attempt failed
end
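A small sketch comparing the two approaches. The helper names are made up; the regex version assumes Ruby 2.4 or later for String#match?, and it escapes the substring so characters like . or * are treated literally.

```ruby
# Hypothetical helper: plain substring test, case-sensitive.
def contains?(str, needle)
  str.include?(needle)
end

# Hypothetical helper: regex-based test that ignores case.
def contains_ignoring_case?(str, needle)
  # Regexp.escape makes the needle match literally rather than as a pattern.
  str.match?(/#{Regexp.escape(needle)}/i)
end

str = "sdfgg"
p contains?(str, "df")
p contains?(str, "DF")
p contains_ignoring_case?(str, "DF")
```

For a fixed substring, include? is the simpler choice; the regex form only pays off when you need flags or a real pattern.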

Two strings evaluated by regex, but one of the scan results is being put into an extra array?

I can't figure out what I'm doing differently in the examples below. I have two strings which, from my perspective, are similar: plain strings. For each string I have a regex, but the first regex, /\*Hi (.*) \*,/, gives me a result where the regex match is wrapped in two arrays: [["result"]]. I need my result to be in just one array: ["result"]. What am I doing differently in the two examples below?
✗ irb
2.0.0p247 :001 > name_line_1 = "*Hi Peter Parker *,"
=> "*Hi Peter Parker *,"
2.0.0p247 :002 > name_line_1.scan(/\*Hi (.*) \*,/)
=> [["Peter Parker"]]
2.0.0p247 :003 > name_line_2 = "Peter Parker<br />Memory Lane 60<br />0000 Gotham<br />USA<br />TEL:: 00000000000<br />peter@parker.com<br />\r"
=> "Peter Parker<br />Memory Lane 60<br />0000 Gotham<br />USA<br />TEL:: 00000000000<br />peter@parker.com<br />\r"
2.0.0p247 :004 > name_line_2.scan(/^[^<]*/)
=> ["Peter Parker"]
scan returns an array of matches. As the other answers point out, if your regex has capturing groups (parentheses), that means each match will return an array, with one string for each capturing group within the match.
If it didn't do this, scan wouldn't be very useful, as it is very common to use capturing groups in a regex to pick out different parts of the match.
I suspect that scan is not really the best method for your situation. scan is useful when you want to get all the matches from a string. But in the string you show, there is only one match anyways. If you want to get a specific capturing group from the first match in a string, the easiest way is:
string[/regex/, 1] # extract the first capturing group, or nil if there is no match
Another way is to do something like this:
if string =~ /regex/
  # $1 will contain the first capturing group from the first match
end
Or:
if match = string.match(/regex/)
  # match[1] will contain the first capturing group
end
If you really want to get all matches in the string, and need to use a capturing group (or feel it's more readable than using lookahead and lookbehind, which it is):
string.scan(/regex/) do |match|
  # do something with match[0]
end
Or:
string.scan(/regex/).map(&:first)
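Applied to the string from the question, the options above might look like this. It is a sketch with made-up helper names; the second helper uses a lazy .*? so that several greetings in one string are picked up separately.

```ruby
# Hypothetical helper: first greeting only, as a plain string (or nil).
def greeting_name(line)
  line[/\*Hi (.*) \*,/, 1]
end

# Hypothetical helper: every greeting in the text, as a flat array.
def greeting_names(text)
  # scan yields one inner array per match; map(&:first) unwraps group 1.
  text.scan(/\*Hi (.*?) \*,/).map(&:first)
end

p greeting_name("*Hi Peter Parker *,")
p greeting_names("*Hi Peter Parker *, *Hi Mary Jane *,")
```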
It's because you are capturing the name in name_line_1 using parentheses. This causes the scan method to return an array of arrays. If you absolutely must return a one-dimensional array, you can use lookbehind and lookahead like so:
/(?<=\*Hi ).*(?= \*,)/
Or, if you find that too confusing, you could always just call .flatten on the resulting array ;-)
The difference is that, in the first regex, you have a capturing group (). When a regex matches, the whole match is captured as $&, and in addition you can capture as many parts of it as you want by using (). They will be captured as $1, $2, ...
And scan behaves differently depending on whether you have $1, $2, ... When you don't, it returns an array of all the $& values. When you do, it returns an array of [$1, $2, ...] arrays.
In order to avoid $1 in the first regex, you have to avoid using a capturing group:
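One way to do that, offered as an assumption about what was intended rather than as the original example, is to keep the surrounding text out of the match with \K and a lookahead, so that no parentheses capture anything. The helper name names_without_groups is made up.

```ruby
# Hypothetical helper: same pattern idea as the question, but with no
# capturing group, so scan returns a flat array of whole matches ($&).
def names_without_groups(text)
  # \K drops "*Hi " from the reported match; (?= \*,) checks the tail
  # without consuming it.
  text.scan(/\*Hi \K.*?(?= \*,)/)
end

name_line_1 = "*Hi Peter Parker *,"
p name_line_1.scan(/\*Hi (.*) \*,/)   # nested, because of the group
p names_without_groups(name_line_1)   # flat
```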

Ruby global match regexp?

In other languages, in RegExp you can use /.../g for a global match.
However, in Ruby:
"hello hello".match /(hello)/
Only captures one hello.
How do I capture all hellos?
You can use the scan method. The scan method will either give you an array of all the matches or, if you pass it a block, pass each match to the block.
"hello1 hello2".scan(/(hello\d+)/) # => [["hello1"], ["hello2"]]
"hello1 hello2".scan(/(hello\d+)/).each do|m|
puts m
end
I've written about this method, you can read about it here near the end of the article.
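If you also need to know where each match sits, the block form of scan sets the last-match data on every iteration. A minimal sketch, assuming a made-up helper called match_offsets:

```ruby
# Hypothetical helper: start index of every match of pattern in text.
def match_offsets(text, pattern)
  offsets = []
  # Inside the block, Regexp.last_match ($~) refers to the current match.
  text.scan(pattern) { offsets << Regexp.last_match.begin(0) }
  offsets
end

p "hello hello".scan(/hello/)           # no group, so a flat array of matches
p match_offsets("hello hello", /hello/) # where each match starts
```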
Here's a tip for anyone looking for a way to replace all regex matches with something else.
Rather than the //g flag and one substitution method like many other languages, Ruby uses two different methods instead.
# .sub — Replace the first
"ABABA".sub(/B/, '') # AABA
# .gsub — Replace all
"ABABA".gsub(/B/, '') # AAA
Use String#scan. It will return an array of each match, or you can pass a block and it will be called with each match.
All the details at http://ruby-doc.org/core/classes/String.html#M000812
