extract number in string with regex in ruby

I have this string
url = "#AppDashboardPlace:p=io.transporterapp.deep.test1&appid=4975603106871514996"
I would like to get 4975603106871514996
I have tried that
url.to_s[/\appid\=(.*?)\&/, 1]
=> nil

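Your regex doesn't match because there's no & after the appid value. Note also that \a in a Ruby regex is the escape for the bell character rather than a literal a, so that backslash has to go as well. Try this: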
url.to_s[/appid=(\d+)/,1]
If you left the matching part as .*? with nothing after it, it would match the minimum amount of the string possible, which is the empty string. If you know that the appid is the very end of the string, then you could use .* without the ?, but it's best to be precise and specify that what you're looking for is a series of one or more (+) decimal digits (\d).
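To make the difference between those quantifiers concrete, here is a small sketch using the URL from the question. The constant APP_URL and the helper extract_appid are names made up for this illustration.

```ruby
APP_URL = "#AppDashboardPlace:p=io.transporterapp.deep.test1&appid=4975603106871514996"

# Hypothetical helper: returns the digits after "appid=", or nil if there are none.
def extract_appid(str)
  str[/appid=(\d+)/, 1]
end

p APP_URL[/appid=(.*?)/, 1]    # lazy with nothing after it => ""
p APP_URL[/appid=(.*)/, 1]     # greedy => "4975603106871514996"
p extract_appid(APP_URL)       # digits only => "4975603106871514996"
p extract_appid("no id here")  # => nil
```

The \d+ version also keeps working if another parameter is ever appended after appid, which the greedy .* version would swallow.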

You could use String#match with the pattern \d+, which means one or more digits.
url = "#AppDashboardPlace:p=io.transporterapp.deep.test1&appid=4975603106871514996"
match = url.match(/appid\=(\d+)/)
# => #<MatchData "appid=4975603106871514996" 1:"4975603106871514996">
match[0]
# => "appid=4975603106871514996"
match[1]
# => "4975603106871514996"

Related

Why does /[<>]/ not return both angle brackets with String#match?

I expect this example to match the two characters < and >:
a = "<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>"
a.match /[<>]/
# => #<MatchData "<">
It matches only the first character. Why?

As you have seen, #match returns only the first match, as a MatchData object. #scan returns all matches.
>> a="<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>"
=> "<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>"
>> a.scan /[<>]/
=> ["<", ">"]

Problem
You are misunderstanding your expression. /[<>]/ means:
Match a single character from the character class, which may be either < or >.
Ruby is correctly giving you exactly what you've asked for in your pattern.
Solution
If you're expecting the entire string between the two characters, you need a different pattern. For example:
"<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>".match /<.*?>/
#=> #<MatchData "<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>">
Alternatively, if you just want to match all the instances of < or > in your string, then you should use String#scan with a character class or alternation. In this particular case, the results will be identical either way. For example:
"<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>".scan /<|>/
#=> ["<", ">"]
"<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>".scan /[<>]/
#=> ["<", ">"]
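The lazy quantifier in /<.*?>/ only matters once a string contains more than one bracketed segment. Here is a short sketch with a made-up input (HEADER is not from the question) that shows the difference:

```ruby
# Hypothetical input containing two bracketed IDs.
HEADER = "<first@example> <second@example>"

p HEADER[/<.*?>/]                 # lazy: stops at the first ">"  => "<first@example>"
p HEADER[/<.*>/]                  # greedy: runs to the last ">"  => "<first@example> <second@example>"
p HEADER.scan(/<.*?>/)            # => ["<first@example>", "<second@example>"]
p HEADER.scan(/<(.*?)>/).flatten  # contents only => ["first@example", "second@example"]
```

When the pattern contains a capture group, scan returns one array per match, hence the flatten in the last line.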

How do you capture part of a regex to a variable in Ruby?

I know about "string"[/regex/], which returns the part of the string that matches. But what if I want to return only the captured part(s) of a string?
I have the string "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3". I want to store in the variable title the text The_Case_of_the_Gold_Ring.
I can capture this part with the regex /\d_(?!.*\d_)(.*).mp3$/i. But writing the Ruby "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"[/\d_(?!.*\d_)(.*).mp3$/i] returns 0_The_Case_of_the_Gold_Ring.mp3 which isn't what I want.
I can get what I want by writing
"1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" =~ /\d_(?!.*\d_)(.*).mp3$/i
title = $~.captures[0]
But this seems sloppy. Surely there's a proper way to do this?
(I'm aware that someone can probably write a simpler regex to target the text I want that lets the "string"[/regex/] method work, but this is just an example to illustrate the problem, the specific regex isn't the issue.)

You can pass the index of the capture group as a second argument to string[/regexp/, index]:
string = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
string[/\d_(?!.*\d_)(.*).mp3$/i, 1]
# => "The_Case_of_the_Gold_Ring"
string[/\d_(?!.*\d_)(.*).mp3$/i, 0]
# => "0_The_Case_of_the_Gold_Ring.mp3"

Have a look at the match method:
string = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
regexp = /\d_(?!.*\d_)(.*).mp3$/i
matches = regexp.match(string)
matches[1]
#=> "The_Case_of_the_Gold_Ring"
Here matches[0] returns the whole match, and matches[1] onward return the capture groups:
matches.to_a
#=> ["0_The_Case_of_the_Gold_Ring.mp3", "The_Case_of_the_Gold_Ring"]
Read more examples: http://ruby-doc.org/core-2.1.4/MatchData.html#method-i-5B-5D

You can use named captures
"1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" =~ /\d_(?!.*\d_)(?<title>.*).mp3$/i
and $~[:title] will give you what you want
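If you would rather not rely on the $~ global, the same named group can be read from the MatchData that String#match returns. This is a minimal sketch. The helper name episode_title is hypothetical, and the dot before mp3 is escaped here so that it only matches a literal dot.

```ruby
FILENAME = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"

# Hypothetical helper: returns the title, or nil when the pattern does not match.
def episode_title(name)
  m = name.match(/\d_(?!.*\d_)(?<title>.*)\.mp3$/i)
  m && m[:title]
end

p episode_title(FILENAME)     # => "The_Case_of_the_Gold_Ring"
p episode_title("notes.txt")  # => nil
```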

Meditate on this:
Here's the source string to be parsed:
str = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
Patterns can be defined as strings:
DATE_REGEX = '\d{4}-[A-Z]{3}-\d{2}'
SERIAL_REGEX = '\d{2}'
TITLE_REGEX = '.+'
Then interpolated into a regexp:
regex = /^(#{ DATE_REGEX })_(#{ SERIAL_REGEX })_(#{ TITLE_REGEX })/
# => /^(\d{4}-[A-Z]{3}-\d{2})_(\d{2})_(.+)/
The advantage to that is it's easier to maintain because the pattern is really several smaller ones.
str.match(regex) # => #<MatchData "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" 1:"1952-FEB-21" 2:"70" 3:"The_Case_of_the_Gold_Ring.mp3">
regex.match(str) # => #<MatchData "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" 1:"1952-FEB-21" 2:"70" 3:"The_Case_of_the_Gold_Ring.mp3">
are equivalent because both Regexp and String implement match.
We can retrieve what was captured as an array:
regex.match(str).captures # => ["1952-FEB-21", "70", "The_Case_of_the_Gold_Ring.mp3"]
regex.match(str).captures.last # => "The_Case_of_the_Gold_Ring.mp3"
We can also name the captures and access them like we would a hash:
regex = /^(?<date>#{ DATE_REGEX })_(?<serial>#{ SERIAL_REGEX })_(?<title>#{ TITLE_REGEX })/
matches = regex.match(str)
matches[:date] # => "1952-FEB-21"
matches[:serial] # => "70"
matches[:title] # => "The_Case_of_the_Gold_Ring.mp3"
Of course, it's not necessary to mess with that rigamarole at all. We can split the string on underscores ('_'):
str = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
str.split('_') # => ["1952-FEB-21", "70", "The", "Case", "of", "the", "Gold", "Ring.mp3"]
split can take a limit parameter saying how many times it should split the string. Passing in 3 gives us:
str.split('_', 3) # => ["1952-FEB-21", "70", "The_Case_of_the_Gold_Ring.mp3"]
Grabbing the last element returns:
str.split('_', 3).last # => "The_Case_of_the_Gold_Ring.mp3"
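The results above still carry the .mp3 extension, which the question wanted removed. One way to drop it, sketched here with a hypothetical helper name, is to hand the last piece to File.basename along with the suffix to strip:

```ruby
# Hypothetical helper: split off the date and serial, then strip the extension.
def title_without_extension(name)
  File.basename(name.split('_', 3).last, '.mp3')
end

p title_without_extension("1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3")
# => "The_Case_of_the_Gold_Ring"
```

Passing '.*' instead of '.mp3' would strip whatever extension is present.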

I believe it would be easiest to use a capture group here, but I'd like to present some possibilities that do not, for illustrative purposes. All employ the same positive lookahead ((?=\.mp3$)). All but one use a positive lookbehind, and one uses \K to "forget" everything matched up to the start of the desired match. Some permit the matched string to contain digits (.+); others do not ([^\d]).
str = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
1 # match follows last digit followed by underscore, cannot contain digits
str[/(?<=\d_)[^\d]+(?=\.mp3$)/]
#=> "The_Case_of_the_Gold_Ring"
2 # same as 1, as `\K` disregards match to that point
str[/\d_\K[^\d]+(?=\.mp3$)/]
#=> "The_Case_of_the_Gold_Ring"
3 # match follows underscore, two digits, underscore, may contain digits
str[/(?<=_\d\d_).+(?=\.mp3$)/]
#=> "The_Case_of_the_Gold_Ring"
4 # match follows string having specific pattern, may contain digits
str[/(?<=\d{4}-[A-Z]{3}-\d{2}_\d{2}_).+(?=\.mp3$)/]
#=> "The_Case_of_the_Gold_Ring"
5 # match follows digit, any 12 characters, another digit and underscore,
# may contain digits
str[/(?<=\d.{12}\d_).+(?=\.mp3$)/]
#=> "The_Case_of_the_Gold_Ring"

regex for a pattern at end of string

I have a string which looks like:
hello/world/1.9.2-some-text
hello/world/2.0.2-some-text
hello/world/2.11.0
Using a regex, I want to get the part of the string after the last '/' up to the end of the line, i.e. for the examples above the output should be 1.9.2-some-text, 2.0.2-some-text, 2.11.0
I tried ^(.+)\/(.+)$, which returns an array whose first object is "hello/world" and whose second object is "1.9.2-some-text"
Is there a way to just get "1.9.2-some-text" as the output?

Try using a negated character class ([^…]) like this:
[^\/]+$
This will match one or more of any character other than / followed by the end of the string.
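In Ruby, that pattern can be applied with String#[], which returns the matched text directly. A minimal sketch follows. PATHS and last_segment are names invented for the example.

```ruby
PATHS = [
  "hello/world/1.9.2-some-text",
  "hello/world/2.0.2-some-text",
  "hello/world/2.11.0"
]

# Hypothetical helper: text after the last "/", or nil if the string ends in "/".
def last_segment(path)
  path[/[^\/]+$/]
end

p PATHS.map { |path| last_segment(path) }
# => ["1.9.2-some-text", "2.0.2-some-text", "2.11.0"]
p last_segment("hello/world/")  # => nil
```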

You can use a negated match here.
'hello/world/1.9.2-some-text'.match(Regexp.new('[^/]+$'))
# => #<MatchData "1.9.2-some-text">
Meaning any character except: / (1 or more times) followed by the end of the string.
Although, the simplest way would be to split the string.
'hello/world/1.9.2-some-text'.split('/').last
# => "1.9.2-some-text"
OR
'hello/world/1.9.2-some-text'.split('/')[-1]
# => "1.9.2-some-text"

If you do not need to use a regex, the ordinary way of doing such a thing is:
File.basename("hello/world/1.9.2-some-text")
#=> "1.9.2-some-text"

This is one way:
s = 'hello/world/1.9.2-some-text
hello/world/2.0.2-some-text
hello/world/2.11.0'
s.lines.map { |l| l[/.*\/(.*)/,1] }
#=> ["1.9.2-some-text", "2.0.2-some-text", "2.11.0"]
You said, "in above examples output should be 1.9.2-some-text, 2.0.2-some-text, 2.11.0". That's neither a string nor an array, so I assumed you wanted an array. If you want a string, tack .join(', ') onto the end.
Regex quantifiers are greedy by default, so .*\/ will match all characters up to and including the last / in each line. The second argument, 1, tells String#[] to return the contents of the capture group (.*) (capture group 1).
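Since $ in Ruby matches at the end of every line, not only at the end of the string, the multi-line string can also be handled with a single scan. This is a sketch of that alternative, not the approach used above. LINES and last_segments are hypothetical names.

```ruby
LINES = "hello/world/1.9.2-some-text\nhello/world/2.0.2-some-text\nhello/world/2.11.0"

# Hypothetical helper: one result per line, each being the text after the last "/".
def last_segments(text)
  # \n is excluded from the class so a match can never span two lines.
  text.scan(/[^\/\n]+$/)
end

p last_segments(LINES)
# => ["1.9.2-some-text", "2.0.2-some-text", "2.11.0"]
```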

Ruby Regex to eliminate non word characters

Hello, I would like to eliminate non-word characters with a regex in Ruby.
Let's say that I have:
pal1 = "a#b?a"
pal1 = /[a-z0-9]/.match(pal1)
When I put this in http://www.rubular.com/, it says that the Match result is:
aba
But when I run the code in Ruby, that is not what I get: it gives only "a".
How can I change my regex to get aba in pal1?
Thanks in advance for your time.

You can use gsub to remove these characters.
pal1 = 'a#b?a'
pal1.gsub(/[^a-z0-9]/i, '')
# => "aba"
You can also use scan to match these characters and join them together.
pal1 = 'a#b?a'
pal1.scan(/[a-z0-9]/i).join
# => "aba"

You can do either of:
pal1.gsub!( /[^a-z\d]/i, '' ) # Kill all characters that don't match
pal1 = pal1.scan(/[a-z\d]/i).join # Find all the matching characters as array
# and then join them all into one string.
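The reason the original code gives only "a" is that match stops at the first match. If a regex isn't strictly required, String#delete with a negated character set is another option. Here is a small sketch, with keep_alphanumerics as a made-up helper name:

```ruby
# Hypothetical helper: delete every character that is NOT a letter or digit.
# A leading ^ in the argument to String#delete negates the character set.
def keep_alphanumerics(str)
  str.delete('^a-zA-Z0-9')
end

p 'a#b?a'.match(/[a-z0-9]/)[0]  # match returns only the first hit => "a"
p keep_alphanumerics('a#b?a')   # => "aba"
```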

Ruby Regexp#match to match start of string with given position (Python re-like)

I'm looking for a way to anchor a match at its first character, relative to the offset I pass to the match method.
test_string = 'abc def qwe'
def_pos = 4
qwe_pos = 8
/qwe/.match(test_string, def_pos) # => #<MatchData "qwe">
# ^^^ this is bad, as it just skipped over the 'def'
/^qwe/.match(test_string, def_pos) # => nil
# ^^^ looks ok...
/^qwe/.match(test_string, qwe_pos) # => nil
# ^^^ it's bad, as it never matches 'qwe' now
what I'm looking for is:
/...qwe/.match(test_string, def_pos) # => nil
/...qwe/.match(test_string, qwe_pos) # => #<MatchData "qwe">
Any ideas?

How about using a string slice?
/^qwe/.match(test_string[def_pos..-1])
The pos parameter tells the regex engine where to start the match, but it doesn't change the behaviour of the start-of-line (and other) anchors. ^ still only matches at the start of a line (and qwe_pos is still in the middle of test_string).
Also, in Ruby, \A is the "start-of-string" anchor, \z is the "end-of-string" anchor. ^ and $ match starts/ends of lines, too, and there is no option to change that behavior (which is special to Ruby, just like the charmingly confusing use of (?m) which does what (?s) does in other regex flavors)...
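Putting the two points together, here is a rough sketch of a helper that slices the string and anchors with \A. The name match_at is invented for this example, and the offsets in the returned MatchData are relative to the slice, not to the original string.

```ruby
TEST_STRING = 'abc def qwe'

# Hypothetical helper: succeeds only if regex matches starting exactly at pos.
def match_at(regex, str, pos)
  # Interpolating a Regexp into a literal embeds it as a group with its own options.
  /\A#{regex}/.match(str[pos..-1])
end

p match_at(/qwe/, TEST_STRING, 4)  # => nil
p match_at(/qwe/, TEST_STRING, 8)  # => #<MatchData "qwe">

# ^ matches at the start of any line, \A only at the start of the string.
p "abc\nqwe" =~ /^qwe/   # => 4
p "abc\nqwe" =~ /\Aqwe/  # => nil
```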
