How to get text between #{...} with RegEx? - ruby

I have the following text:
My name is %{name}
How can I get name from inside %{ ... }?
I'm trying with:
/%{(.*)}/
but it returns the whole %{name}, and I need just name.
When I try this expression on regex101.com, it shows two results: Full match (%{name}) and Group 1 (name). My Ruby code gives me the full match, but I need Group 1.
What is the problem?

You can use lookaround:
(?<=%{)[^%]*(?=})
see demo.
(?<=%{) will ensure that the next part is preceded with %{
[^%]* will match any run of characters other than %, so the match cannot run into a following %{...} field
(?=}) will ensure that it's followed by a }
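As a rough sketch of how the lookaround pattern behaves when the string holds more than one field, it can be combined with String#scan. The helper name placeholder_names and the sample string are made up for illustration, and the braces are escaped here only for readability.

```ruby
# Hypothetical helper: returns every name that appears inside %{...}.
def placeholder_names(text)
  # (?<=%\{) requires "%{" immediately before the match,
  # [^%]*    consumes characters up to (not including) the next "%",
  # (?=\})   requires "}" immediately after the match.
  text.scan(/(?<=%\{)[^%]*(?=\})/)
end

placeholder_names('My name is %{name} and I am %{age}')
# => ["name", "age"]
```

Because the pattern has no capture groups, scan returns the matched text itself rather than nested arrays.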

I don't know how you're applying that regex to the string, but the .match method returns a MatchData object, from which you can extract the matched groups:
s = 'My name is %{name}'
regex = /%{(.*)}/
m = s.match(regex) # => #<MatchData "%{name}" 1:"name">
m[0] # => "%{name}"
m[1] # => "name"
It looks nicer with named groups
s = 'My name is %{name}'
regex = /%{(?<var>.*)}/
m = s.match(regex) # => #<MatchData "%{name}" var:"name">
m[:var] # => "name"

In Ruby, you can easily access any capture group you need with
s[/regex/, n]
where n is the ID of the capturing group. So, in your case, use
s[/%{([^}]*)}/, 1]
or
s[/%{(.*?)}/m, 1]
See the online demo
You need to make the Group 1 subpattern lazy, or make it match any character except }, so that it consumes as few characters as possible and does not run past the first } into the next field.
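The difference only shows up once the string contains a second field. A small sketch, assuming a made-up sample string with two placeholders:

```ruby
s = 'Hello %{first} %{last}'

s[/%\{(.*)\}/, 1]      # greedy: runs on to the last "}" => "first} %{last"
s[/%\{(.*?)\}/, 1]     # lazy: stops at the first "}"    => "first"
s[/%\{([^}]*)\}/, 1]   # negated class: cannot cross "}" => "first"

# With a capture group, scan returns one array per match, hence flatten.
s.scan(/%\{([^}]*)\}/).flatten   # => ["first", "last"]
```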

Related

Combination of strings, other variables and regexp in variable declaration

I'm trying to move files to other directories with FileUtils.mv. I'm trying to define a variable called name_convention, which is a mix of strings, other variables and I also want to include a regexp, where I'm failing. My code so far:
#these are my other variables already declared from an array
season = array[11..13]
episode = array[15..17]
#and this is my 'name_convention' variable
name_convention = "friends" + season + episode + "bluray.mkv"
Up to here, everything is working fine. Except that between friends and season, there can be either a . or a _. For example:
friends_s01e01_bluray.mkv
friends.s01e01.bluray.mkv
I tried to use a regexp, like /(\.|-)/, but I got the error: no implicit conversion of regex into string ruby
How can I provide the two options to my name_convention variable, so that it can be applied to both filenames?
You're trying to interpolate a regex into a string, but you need to do the opposite - interpolate the strings into the regex:
season = "s01"
episode = "e01"
regex = /friends[._]#{Regexp.escape(season)}#{Regexp.escape(episode)}[._]bluray\.mkv/
regex.match "friends_s01e01_bluray.mkv"
# => MatchData
regex.match "friends.s01e01_bluray.mkv"
# => MatchData
regex.match "friends-s01e01_bluray.mkv"
# => nil
For this particular example (s01 and e01) you don't need the Regexp.escape but it's a good idea to include it just in case.
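To see when the escaping starts to matter, here is a small sketch with a value that contains a regex metacharacter. The variable version and the sample strings are hypothetical and are not part of the question.

```ruby
# Hypothetical value containing a regex metacharacter (the dot).
version = '1.5'

unescaped = /\Av#{version}\z/
escaped   = /\Av#{Regexp.escape(version)}\z/

unescaped.match?('v1x5')  # => true, the "." matched the "x"
escaped.match?('v1x5')    # => false
escaped.match?('v1.5')    # => true
```

Regexp.escape returns a string in which metacharacters are backslash-escaped, so the interpolated text is matched literally.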
If you're looking for a quick and dirty sNNeNN parser, try this:
def parse_episode(str)
  m = str.match(/\A(.*?)[\-\_\.]?(s\d+)(e\d+)[\-\_\.]?(.*)\z/i)
  # If matched, strip out the first entry which is the complete match
  m&.to_a&.drop(1)
end
Where this produces results like:
parse_episode('snowpiercer-s01e01-stream')
# => ["snowpiercer", "s01", "e01", "stream"]
parse_episode('s01')
# => nil
parse_episode('wilford')
# => nil
parse_episode('simpsons_S04E12_monorail')
# => ["simpsons", "S04", "E12", "monorail"]
parse_episode('simpsons.S04E12')
# => ["simpsons", "S04", "E12", ""]

How to extract part of the string which comes after given substring?

For example I have url string like:
https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Credential=abcd12bhhh34-1%2Fs3%2Faws4_request&X-Amz-Date=2016&X-Amz-Expires=3&X-Amz-SignedHeaders=host&X-Amz-Signature=abcd34hhhhbfbbf888ksdskj
From this string I need to extract number 1234 which comes after subfolder/. I tried with gsub but no luck. Any help would be appreciated.
Suppose your url is saved in a variable called url.
Then the following should return 1234
url.match(/subfolder\/(\d*)/)[1]
Explanation:
url.match(/ # call the match function which takes a regex
subfolder\/ # search for the first appearance of the string 'subfolder/'
# note: we must escape the `/` so we don't end the regex early
(\d*) # match any number of digits in a capture group,
/)[1] # close the regex and return the first capture group
lwassink has the right idea, but it can be done more simply. If subfolder is always the same:
url = "https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Credential=abcd12bhhh34-1%2Fs3%2Faws4_request&X-Amz-Date=2016&X-Amz-Expires=3&X-Amz-SignedHeaders=host&X-Amz-Signature=abcd34hhhhbfbbf888ksdskj"
url[/subfolder\/\K\d+/]
# => "1234"
The \K discards the matched text up to that point, so only "1234" is returned.
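For comparison, \K, a capture group, and a lookbehind should all give the same result here. This is a sketch using a shortened, made-up URL rather than the one from the question:

```ruby
# Hypothetical shortened URL, for illustration only.
url = 'https://example.com/subfolder/1234/5.html?X-Amz-Date=2016'

url[%r{subfolder/\K\d+}]        # \K drops "subfolder/" from the result
url[%r{subfolder/(\d+)}, 1]     # capture group 1
url[%r{(?<=subfolder/)\d+}]     # positive lookbehind
# all three => "1234"

url[%r{missing/\K\d+}]          # => nil when nothing matches
```

Using %r{...} instead of /.../ avoids having to escape the forward slashes.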
If you want to get the number after any subfolder, and the domain name is always the same, you might do this instead:
url[%r{amazonaws\.com/[^/]+/\K\d+}]
# => "1234"
s.split('/')[4]
Add a .to_i at the end if you like.
Or, to key it on a substring like you asked for...
a = s.split '/'
a[a.find_index('subfolder') + 1]
Or, to do it as a one-liner I suppose you could:
s.split('/').tap { |a| @i = 1 + a.find_index('subfolder')}[@i]
Or, since I am a damaged individual, I would actually write that:
s.split('/').tap { |a| @i = 1 + (a.find_index 'subfolder')}[@i]
url = 'http://abc/xyz'
index = url.index('/abc/')
# str[start, length] takes a start index and a length; 5 is the length of '/abc/'
url[index + 5, length_of_string_you_want_to_extract]
Hope that helps!

Regular expression for DSL

I'm trying to write a regular expression that captures two groups: the first is a group of n words (where n >= 0 and varies) and the second is a group of pairs in the format field:value. In both groups, the items are separated by blank spaces. An optional space separates the two groups (unless one of them is blank/nil).
Please, take into consideration the following examples:
'the big apple'.match(pattern).captures # => ['the big apple', nil]
'the big apple is red status:drafted1 category:3'.match(pattern).captures # => ['the big apple is red', 'status:drafted1 category:3']
'status:1'.match(pattern).captures # => [nil, 'status:1']
I have tried a lot of combinations and patterns but I can't get it working. My closest pattern is /([[\w]*\s?]*)([\w+:[\w]+\s?]*)/, but it doesn't work properly in the second and third cases shown above.
Thanks!
The one regex solution:
(.*?)(?:(?: ?((?: ?\w+:\w+)+))|$)
(.*?) matches anything, but lazily, and is used to capture the words
then comes either a group or the end of line $
the group skips an optional space with " ?" and then matches every field:value pair with \w+:\w+
See an example here https://regex101.com/r/nZ9wU6/1 (I added flags to show the behavior, but it works best for a single result)
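Run against the three examples from the question, the pattern returns an empty string rather than nil when the word group is blank. The sketch below wraps it in two hypothetical helpers (dsl_pattern and split_query are names made up here) and, as one possible way to line up with the expected output, maps blank captures to nil.

```ruby
# The pattern from the answer above, wrapped in a method for reuse.
def dsl_pattern
  /(.*?)(?:(?: ?((?: ?\w+:\w+)+))|$)/
end

# Hypothetical helper: blank captures become nil, as in the question's examples.
def split_query(text)
  text.match(dsl_pattern).captures.map { |c| c.nil? || c.empty? ? nil : c }
end

'status:1'.match(dsl_pattern).captures
# => ["", "status:1"]  (empty string, not nil)

split_query('the big apple')
# => ["the big apple", nil]
split_query('the big apple is red status:drafted1 category:3')
# => ["the big apple is red", "status:drafted1 category:3"]
split_query('status:1')
# => [nil, "status:1"]
```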
Not a regexp, but give it a try
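string = 'the big apple:something'
first_result = []
second_result = []
# Words go to the first group, field:value pairs to the second
string.split(' ').each do |value|
  value.include?(':') ? second_result << value : first_result << value
end
first_result.join(' ') # => "the big"
second_result.join(' ') # => "apple:something"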

How do you capture part of a regex to a variable in Ruby?

I know about "string"[/regex/], which returns the part of the string that matches. But what if I want to return only the captured part(s) of a string?
I have the string "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3". I want to store in the variable title the text The_Case_of_the_Gold_Ring.
I can capture this part with the regex /\d_(?!.*\d_)(.*).mp3$/i. But writing the Ruby "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"[/\d_(?!.*\d_)(.*).mp3$/i] returns 0_The_Case_of_the_Gold_Ring.mp3 which isn't what I want.
I can get what I want by writing
"1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" =~ /\d_(?!.*\d_)(.*).mp3$/i
title = $~.captures[0]
But this seems sloppy. Surely there's a proper way to do this?
(I'm aware that someone can probably write a simpler regex to target the text I want that lets the "string"[/regex/] method work, but this is just an example to illustrate the problem, the specific regex isn't the issue.)
You can pass the capture group's index as a second argument, string[/regexp/, index]:
string = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
string[/\d_(?!.*\d_)(.*).mp3$/i, 1]
# => "The_Case_of_the_Gold_Ring"
string[/\d_(?!.*\d_)(.*).mp3$/i, 0]
# => "0_The_Case_of_the_Gold_Ring.mp3"
Have a look at the match method:
string = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
regexp = /\d_(?!.*\d_)(.*).mp3$/i
matches = regexp.match(string)
matches[1]
#=> "The_Case_of_the_Gold_Ring"
Here matches[0] returns the whole match, and matches[1] (and the following indices) return the captured groups:
matches.to_a
#=> ["0_The_Case_of_the_Gold_Ring.mp3", "The_Case_of_the_Gold_Ring"]
Read more examples: http://ruby-doc.org/core-2.1.4/MatchData.html#method-i-5B-5D
You can use named captures
"1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" =~ /\d_(?!.*\d_)(?<title>.*).mp3$/i
and $~[:title] will give you what you want
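Named captures can also land directly in a local variable: when a regex literal (without interpolation) is on the left-hand side of =~, Ruby assigns each named group to a local variable of the same name. A minimal sketch, where title_of is a hypothetical method name and the dot before mp3 is escaped, unlike in the question's pattern:

```ruby
def title_of(filename)
  # With the regex literal on the left of =~, the named group "title"
  # becomes a local variable called title.
  if /\d_(?!.*\d_)(?<title>.*)\.mp3$/i =~ filename
    title
  end
end

title_of("1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3")
# => "The_Case_of_the_Gold_Ring"
title_of("notes.txt")
# => nil
```

This only works in that operand order; with the string on the left, the locals are not assigned and $~[:title] is the way to read the group.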
Meditate on this:
Here's the source string to be parsed:
str = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
Patterns can be defined as strings:
DATE_REGEX = '\d{4}-[A-Z]{3}-\d{2}'
SERIAL_REGEX = '\d{2}'
TITLE_REGEX = '.+'
Then interpolated into a regexp:
regex = /^(#{ DATE_REGEX })_(#{ SERIAL_REGEX })_(#{ TITLE_REGEX })/
# => /^(\d{4}-[A-Z]{3}-\d{2})_(\d{2})_(.+)/
The advantage to that is it's easier to maintain because the pattern is really several smaller ones.
str.match(regex) # => #<MatchData "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" 1:"1952-FEB-21" 2:"70" 3:"The_Case_of_the_Gold_Ring.mp3">
regex.match(str) # => #<MatchData "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" 1:"1952-FEB-21" 2:"70" 3:"The_Case_of_the_Gold_Ring.mp3">
are equivalent because both Regexp and String implement match.
We can retrieve what was captured as an array:
regex.match(str).captures # => ["1952-FEB-21", "70", "The_Case_of_the_Gold_Ring.mp3"]
regex.match(str).captures.last # => "The_Case_of_the_Gold_Ring.mp3"
We can also name the captures and access them like we would a hash:
regex = /^(?<date>#{ DATE_REGEX })_(?<serial>#{ SERIAL_REGEX })_(?<title>#{ TITLE_REGEX })/
matches = regex.match(str)
matches[:date] # => "1952-FEB-21"
matches[:serial] # => "70"
matches[:title] # => "The_Case_of_the_Gold_Ring.mp3"
Of course, it's not necessary to mess with that rigamarole at all. We can split the string on underscores ('_'):
str = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
str.split('_') # => ["1952-FEB-21", "70", "The", "Case", "of", "the", "Gold", "Ring.mp3"]
split can take a limit parameter saying how many times it should split the string. Passing in 3 gives us:
str.split('_', 3) # => ["1952-FEB-21", "70", "The_Case_of_the_Gold_Ring.mp3"]
Grabbing the last element returns:
str.split('_', 3).last # => "The_Case_of_the_Gold_Ring.mp3"
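The split result still carries the .mp3 extension, while the question asks for the title without it. One possible way to drop it, sketched with a hypothetical helper name title_from, is to strip the suffix with File.basename before splitting:

```ruby
def title_from(filename)
  # File.basename with a second argument removes that suffix;
  # passing ".*" instead would remove any extension.
  File.basename(filename, '.mp3').split('_', 3).last
end

title_from("1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3")
# => "The_Case_of_the_Gold_Ring"
```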
I believe it would be easiest to use a capture group here, but I'd like to present some possibilities that do not, for illustrative purposes. All employ the same positive lookahead ((?=\.mp3$)). All but one use a positive lookbehind, and one uses \K to "forget" everything matched up to that point, just before the beginning of the desired match. Some permit the matched string to contain digits (.+); others do not ([^\d]).
str = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
1 # match follows last digit followed by underscore, cannot contain digits
str[/(?<=\d_)[^\d]+(?=\.mp3$)/]
#=> "The_Case_of_the_Gold_Ring"
2 # same as 1, as `\K` disregards match to that point
str[/\d_\K[^\d]+(?=\.mp3$)/]
#=> "The_Case_of_the_Gold_Ring"
3 # match follows underscore, two digits, underscore, may contain digits
str[/(?<=_\d\d_).+(?=\.mp3$)/]
#=> "The_Case_of_the_Gold_Ring"
4 # match follows string having specific pattern, may contain digits
str[/(?<=\d{4}-[A-Z]{3}-\d{2}_\d{2}_).+(?=\.mp3$)/]
#=> "The_Case_of_the_Gold_Ring"
5 # match follows digit, any 12 characters, another digit and underscore,
# may contain digits
str[/(?<=\d.{12}\d_).+(?=\.mp3$)/]
#=> "The_Case_of_the_Gold_Ring"

Get id from string with Ruby

I have strings like this:
"/detail/205193-foo-var-bar-foo.html"
"/detail/183863-parse-foo.html"
"/detail/1003-bar-foo-bar.html"
How to get ids (205193, 183863, 1003) from it with Ruby?
Just say s[/\d+/]
[
"/detail/205193-foo-var-bar-foo.html",
"/detail/183863-parse-foo.html",
"/detail/1003-bar-foo-bar.html"
].each { |s| puts s[/\d+/] }
You could also do something like this:
"/detail/205193-foo-var-bar-foo.html".gsub(/\/detail\//,'').to_i
# => 205193
regex = /\/detail\/(\d+)-/
s = "/detail/205193-foo-var-bar-foo.html"
id = regex.match s # => #<MatchData "/detail/205193-" 1:"205193">
id[1] # => "205193"
$1 # => "205193"
The MatchData object stores the entire matched portion of the string in the first element (index 0), and the captured subgroups from index 1 onward, one per capture group.
Also, Ruby provides the shortcut $1 for the first capture group of the most recent match ($2 for the second, and so on).
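Both the MatchData object and $1 are nil when the match fails, so indexing blindly raises NoMethodError on paths that do not fit the pattern. A defensive sketch, assuming a hypothetical helper named detail_id:

```ruby
def detail_id(path)
  m = %r{/detail/(\d+)-}.match(path)
  # m is nil when the path does not match. Passing base 10 to Integer
  # keeps a leading zero from being treated as an octal prefix.
  m && Integer(m[1], 10)
end

detail_id("/detail/205193-foo-var-bar-foo.html")  # => 205193
detail_id("/detail/1003-bar-foo-bar.html")        # => 1003
detail_id("/about.html")                          # => nil
```

Returning nil for non-matching paths also avoids the silent 0 that String#to_i gives when the string does not start with digits.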
One easy way to do it would be to strip out the /detail/ part of your string, and then just call to_i on what's left over:
"/detail/1003-bar-foo-bar.html".gsub('/detail/','').to_i # => 1003
s = "/detail/205193-foo-var-bar-foo.html"
num = (s =~ /detail\/(\d+)-/) ? Integer($1) : nil
