Ruby extract string via regular expression - ruby

I have these strings:
'da_report/GY4LFDN6/2017_11/view_mission_join_player_count2017_11/index.html'
'da_report/GY4LFDN6/2017_11/activily_time2017_11/index.html'
From these two strings, I want to extract these two file names:
'2017_11/view_mission_join_player_count2017_11'
'2017_11/activily_time2017_11'
I wrote some regular expressions, but they seem wrong.
str = 'da_report/GY4LFDN6/2017_11/view_mission_join_player_count2017_11/index.html'
str[/([^\/index.html]+)/, 1] # => "a_r"

A regular expression is overkill here, and it is prone to errors.
input = [
"da_report/GY4LFDN6/" \
"2017_11/view_mission_join_player_count2017_11" \
"/index.html",
"da_report/GY4LFDN6/" \
"2017_11/activily_time2017_11" \
"/index.html"
]
input.map { |str| str.split('/')[2..3].join('/') }
#⇒ [
# [0] "2017_11/view_mission_join_player_count2017_11",
# [1] "2017_11/activily_time2017_11"
# ]
or, more elegant:
input.map { |str| str.split('/').grep(/2017_/).join('/') }
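If the strings are always file paths, another option is to let File.dirname drop the file name first. This is only a sketch: it assumes the part to keep is always the last two directories before the file name, which the question does not state explicitly, and last_two_dirs is a made-up helper name.

```ruby
paths = [
  'da_report/GY4LFDN6/2017_11/view_mission_join_player_count2017_11/index.html',
  'da_report/GY4LFDN6/2017_11/activily_time2017_11/index.html'
]

# Assumption: the wanted part is the last two directory components.
def last_two_dirs(path)
  # File.dirname strips the trailing "/index.html"
  File.dirname(path).split('/').last(2).join('/')
end

result = paths.map { |path| last_two_dirs(path) }
# result => ["2017_11/view_mission_join_player_count2017_11",
#            "2017_11/activily_time2017_11"]
```

Unlike the grep(/2017_/) variant, this does not depend on the year appearing in the directory names.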

Use /(?<=GY4LFDN6\/)(.*)(?=\/index\.html)/ (the dot is escaped so that it only matches a literal "."):
str = 'da_report/GY4LFDN6/2017_11/view_mission_join_player_count2017_11/index.html'
str[/(?<=GY4LFDN6\/)(.*)(?=\/index\.html)/]
=> "2017_11/view_mission_join_player_count2017_11"
live demo: http://rubular.com/r/Ued6UOXWDf

This answer assumes that you want to capture beginning with the third component of the path, up to and including the last component of the path before the filename. If so, then we can use the following regex pattern:
(?:[^/]*/){2}(.*)/.*
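The second parenthesized group, (.*), is the capture group, i.e. what you want to extract from the entire path. The first group, (?:[^/]*/), is non-capturing and only skips the first two path components.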
str = 'da_report/GY4LFDN6/2017_11/view_mission_join_player_count2017_11/index.html'
puts str[/(?:[^\/]*\/){2}(.*)\/.*/, 1]

If you are looking for the value at the end of the string, in the form string/string followed by /filename.extension, you could use a positive lookahead for the file name.
\w+\/\w+(?=\/\w+\.\w+$)
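Applied in Ruby with String#[], the lookahead pattern above could be used roughly like this. It is a sketch that assumes every path component consists only of word characters (letters, digits, underscore), since \w does not match hyphens or dots.

```ruby
str = 'da_report/GY4LFDN6/2017_11/view_mission_join_player_count2017_11/index.html'

# Two word-character components, followed (but not consumed) by "/name.ext" at the end.
extracted = str[/\w+\/\w+(?=\/\w+\.\w+$)/]
# extracted => "2017_11/view_mission_join_player_count2017_11"
```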

Based on your examples, you may be able to use a very simple regex.
def extract(str)
  str[/\d{4}_\d{2}.+\d{4}_\d{2}/]
end
extract 'da_report/GY4LFDN6/2017_11/view_mission_join_player_count2017_11/index.html'
#=> "2017_11/view_mission_join_player_count2017_11"
extract 'da_report/GY4LFDN6/2017_11/activily_time2017_11/index.html'
#=> "2017_11/activily_time2017_11"

Related

Ruby - Given string with ENV variables how do I get their values

Say I have
str = "DISABLE_THINGY=true -p my_profile"
To get the value of DISABLE_THINGY (true) I can use
str.partition("DISABLE_THINGY=").last.split(' ').first
I do not want to do that.
There must be a library that parses all this for me.
Anybody know some better ways?
The selected answer is way too convoluted to solve such a simple problem. Regular expressions are great, but the more complex they are, the more likely they'll be wrong:
str = "DISABLE_THINGY=true -p my_profile"
str[/\w+=(\w+)/, 1] # => "true"
/\w+=(\w+)/ simply looks for "words" joined by =.
See String's [] method for more information.
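As a small illustration of String#[] with a regexp, the second argument can also be the name of a named capture group. The group names name and value below are chosen for this sketch only.

```ruby
str = "DISABLE_THINGY=true -p my_profile"
pattern = /(?<name>\w+)=(?<value>\w+)/

# str[regexp, capture] returns the text of the given capture group.
name  = str[pattern, "name"]   # => "DISABLE_THINGY"
value = str[pattern, "value"]  # => "true"
```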
If you had a number of assignments and wanted to capture them all, or, wanted to capture the name and value of this one:
str = "DISABLE_THINGY=true -p my_profile"
str.scan(/\w+=\w+/).map { |s| s.split('=') } # => [["DISABLE_THINGY", "true"]]
That returns an array-of-arrays, which can be useful, or, you could convert that to a Hash:
str.scan(/\w+=\w+/).map { |s| s.split('=') }.to_h # => {"DISABLE_THINGY"=>"true"}
and similarly:
str = "DISABLE_THINGY=true FOO=bar -p my_profile"
str.scan(/\w+=\w+/).map { |s| s.split('=') } # => [["DISABLE_THINGY", "true"], ["FOO", "bar"]]
str.scan(/\w+=\w+/).map { |s| s.split('=') }.to_h # => {"DISABLE_THINGY"=>"true", "FOO"=>"bar"}
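If the string follows shell-like quoting rules, the standard library's Shellwords can do the tokenizing before the assignments are picked out. This is a sketch under that assumption, not a full ENV parser; the variable names are illustrative.

```ruby
require 'shellwords'

str = "DISABLE_THINGY=true FOO=bar -p my_profile"

# Split the way a POSIX shell would split words.
tokens = Shellwords.split(str)

# Treat tokens that look like NAME=value as assignments, the rest as arguments.
assignments, rest = tokens.partition { |token| token.match?(/\A\w+=/) }
env = assignments.map { |token| token.split('=', 2) }.to_h

# env  => {"DISABLE_THINGY"=>"true", "FOO"=>"bar"}
# rest => ["-p", "my_profile"]
```

Splitting with a limit of 2 keeps any further "=" characters inside the value.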
Take a look at "Parse command line arguments in a Ruby script".
If not all of your arguments use a hyphen, you might have to make slight tweaks to the regex used, but this should get you where you need to go. Just replace ARGV.join(' ') in the accepted answer with your str var.
Adjusted the regex in the link provided above to make your use-case work where you combine ENV variables with command line parameters:
args = Hash[ str.scan(/-{0,2}([^=\s]+)(?:[=\s](\S+))?/) ]
# => {"DISABLE_THINGY"=>"true", "p"=>"my_profile"}

Ruby regex method

I need to get the expected output in ruby by using any method like scan or match.
Input string:
"http://test.com?t&r12=1&r122=1&r1=1&r124=1"
"http://test.com?t&r12=1&r124=1"
Expected:
r12=1,r122=1, r1=1, r124=1
r12=1,r124=1
How can I get the expected output using regex?
Use regex /r\d+=\d+/:
"http://test.com?t&r12=1&r122=1&r1=1&r124=1".scan(/r\d+=\d+/)
# => ["r12=1", "r122=1", "r1=1", "r124=1"]
"http://test.com?t&r12=1&r124=1".scan(/r\d+=\d+/)
# => ["r12=1", "r124=1"]
You can use join to get a string output. Here:
"http://test.com?t&r12=1&r122=1&r1=1&r124=1".scan(/r\d+=\d+/).join(',')
# => "r12=1,r122=1,r1=1,r124=1"
Update
If the URL contains other parameters whose names merely end in r followed by digits (such as ar1 or tr2), the regex can be made stricter by requiring a preceding ? or &:
a = []
"http://test.com?r1=2&r12=1&r122=1&r1=1&r124=1&ar1=2&tr2=3&xy4=5".scan(/(&|\?)(r+\d+=\d+)/) {|x,y| a << y}
a.join(',')
# => "r1=2,r12=1,r122=1,r1=1,r124=1"
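The same restriction can be sketched with a lookbehind, which avoids the capture groups and the accumulator array. Note that the leading r1=2 is matched as well, because it directly follows the "?".

```ruby
url = "http://test.com?r1=2&r12=1&r122=1&r1=1&r124=1&ar1=2&tr2=3&xy4=5"

# Only accept r<digits>=<digits> when it directly follows "?" or "&".
r_params = url.scan(/(?<=[?&])r\d+=\d+/)
joined = r_params.join(',')
# joined => "r1=2,r12=1,r122=1,r1=1,r124=1"
```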
Since the input strings are URLs with query strings, I would guard against false positives:
input = "http://test.com?t&r12=1&r122=1&r1=1&r124=1"
query_params = input.split('?').last.split('&')
#⇒ ["t", "r12=1", "r122=1", "r1=1", "r124=1"]
r_params = query_params.select { |e| e =~ /\Ar\d+=\d+/ }
#⇒ ["r12=1", "r122=1", "r1=1", "r124=1"]
r_params.join(',')
#⇒ "r12=1,r122=1,r1=1,r124=1"
It's safer than just scanning the whole input with a regexp.
If you really need to do it with regex correctly, you'll need to use a regex like this:
puts "http://test.com?t&r12=1&r122=1&r1=1&r124=1".scan(/(?:http.*?\?t|(?<!^)\G)\&*(\br\d*=\d*)(?=.*$)/i).join(',')
puts "http://test.com?t&r12=1&r124=1".scan(/(?:http.*?\?t|(?<!^)\G)\&*(\br\d*=\d*)(?=.*$)/i).join(',')
Sample program output:
r12=1,r122=1,r1=1,r124=1
r12=1,r124=1

Splitting string based on word

I have a string composed of words separated by '#', for instance 'this#is#an#example', and I need to extract the last word or the last two words, depending on the second-to-last word.
If the second-to-last word is 'myword', I need the last two words; otherwise just the last one.
'this#is#an#example' => 'example'
'this#is#an#example#using#myword#also' => 'myword#also'
Is there a better way than splitting and checking the second-to-last word? Perhaps using a regular expression?
Thanks.
You can use the end-of-line anchor $ and make the myword# prefix optional:
str = 'this#is#an#example'
str[/(?:#)((myword#)?[^#]+)$/, 1]
#=> "example"
str = 'this#is#an#example#using#myword#also'
str[/(?:#)((myword#)?[^#]+)$/, 1]
#=> "myword#also"
However, I don't think using a regular expression is "better" in this case. I would use something like Santosh's (deleted) answer: split the line by # and use an if clause.
def foo(str)
  *, a, b = str.split('#')
  if a == 'myword'
    "#{a}##{b}"
  else
    b
  end
end
str = 'this#is#an#example#using#myword#also'
array = str.split('#')
array[-2] == 'myword' ? array[-2..-1].join('#') : array[-1]
With regex:
'this#is#an#example'[/(myword\#)*\w+$/]
# => "example"
'this#is#an#example#using#myword#also'[/(myword\#)*\w+$/]
# => "myword#also"
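One caveat with the * quantifier: it also accepts a repeated myword# prefix, whereas ? allows at most one. The input below is a made-up edge case to show the difference; whether it matters depends on the real data. Also note that \w+ only matches word characters, unlike the [^#]+ used in the earlier answer.

```ruby
str = 'a#myword#myword#also'  # hypothetical input with a repeated prefix

with_star     = str[/(myword\#)*\w+$/]  # zero or more prefixes
with_optional = str[/(myword\#)?\w+$/]  # at most one prefix

# with_star     => "myword#myword#also"
# with_optional => "myword#also"
```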

How do I scan text for multiple strings?

I'm scanning a product name to check whether a specific string exists in it. Right now it works for a single string, but how can I scan for multiple strings? E.g. I'd like to scan for both apple and microsoft.
product.name.downcase.scan(/apple/)
If the string is detected I get ["apple"];
if not, it returns an empty array [ ].
You can use regex alternation:
product.name.downcase.scan(/apple|microsoft/)
If all you need to know is whether the string contains any of the specified strings, you would be better off using a single match with =~ instead of scan.
str = 'microsoft, apple and microsoft once again'
res = str.scan(/apple|microsoft/) # => ["microsoft", "apple", "microsoft"]
# do something with res
# or
if str =~ /apple|microsoft/
  # do something
end
You could also skip regular expressions altogether:
['apple', 'pear', 'orange'].any?{|s| product.name.downcase.match(s)}
or
['apple', 'pear', 'orange'].any?{|s| product.name.downcase[s]}
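A sketch of two further variants: String#include? for a plain substring test, and Regexp.union to build the alternation from a list. The name variable is a hypothetical stand-in for product.name, and match? needs Ruby 2.4 or later.

```ruby
keywords = ['apple', 'microsoft']
name = 'Apple MacBook Pro'  # hypothetical stand-in for product.name

# Plain substring test, no regular expression involved.
plain_match = keywords.any? { |word| name.downcase.include?(word) }

# Build /apple|microsoft/ from the list; union escapes metacharacters.
pattern = Regexp.union(keywords)
regex_match = name.downcase.match?(pattern)
found = name.downcase.scan(pattern)

# plain_match => true, regex_match => true, found => ["apple"]
```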

Return prefix of string using regular expression where stripped string sometimes contains '/'

I'm trying to return a prefix of a string. My related question is here, but I've run into a new problem:
How to return the string prefix from regexp
Basically, I have strings like
23430-BL
23430BZ
23430BK/BL
The extensions I'm trying to remove are
strip_ext = BK/BL|BZ|BL
The regular expression I'm using to get the string without the extension is
prefix = sample_data[/(.*[^-])-?(?:#{strip_ext})/,1]
This is returning
23430
23430
23430-BK
In theory, I understand that the regexp finds the BL match, and for some reason selects that as the match over the BK/BL. But is there a way to get the regexp to find BK/BL rather than BL?
Unfortunately, there isn't always a dash before the part that I am looking to strip.
I added the original strip_ext list as an example, and thought it would make it easy to understand. An actual strip_ext list looks like this and changes based on the sample data provided, so unfortunately it isn't as easy as Mu's answer below.
AM/DB|AM/BN|RD/BK|PR/WT|YP/BN|YP/CH|YP/DB|PK/BN|PK/CH|PK/DB|SF/BN|SF/CH|SF/DB|AM/CH|BN/CH|BN/DB|CH/BN|CH/DB|DB/BN|DB/CH|BN/BN|CH/CH|MR/BN|MR/CH|MR/DB|DB/DB|AM/AB|DIC/BN|DIC/CH|DIC/DB|BN|DB|WT|BN/WT|BK|WT/BN|BK/BN|BK/DB|BL/BN|BL/DB|BK/CH|BL/CH|AM|CH|FR|SB|AM/BK|AM/WT|PT/CH|BG/CH|BG/DB|MF/CH|MF/DB|YR/CH|YR/DB|WT/DB|pt/bn
Make the first quantifier ungreedy.
(.*?[^-])-?(?:BK/BL|BZ|BL)
The ? after .* makes the quantifier lazy, so .*? matches as little as possible and the longest suffix that fits gets a chance to match first.
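In Ruby, the lazy version might look like the sketch below. Building the alternation with Regexp.union is an addition of this example (not part of the answer above); it takes care of escaping, which is convenient when the suffix list changes with the data.

```ruby
strip_ext = %w[BK/BL BZ BL]         # suffix list from the question
suffix = Regexp.union(strip_ext)    # alternation with metacharacters escaped

samples = %w[23430-BL 23430BZ 23430BK/BL]

# Lazy .*? stops at the earliest position where a suffix can follow.
prefixes = samples.map { |s| s[/(.*?[^-])-?(?:#{suffix})/, 1] }
# prefixes => ["23430", "23430", "23430"]
```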
You could mix a negative look-behind into your BL matcher:
/(.*[^-])-?(?:BK\/BL|BZ|(?<!BK\/)BL)/
Adding (?<!BK\/) indicates that you want to match BL except when it is preceded by BK/.
A quick test:
>> %w{23430-BL 23430GR 23430BK/BL}.map { |s| s[/(.*[^-])-?(?:BK\/BL|BZ|(?<!BK\/)BL)/,1] }
=> ["23430", nil, "23430"]
Your sample output doesn't match your input though, is "GR" a typo in your inputs or is "BZ" a typo in your regex?
Given that your patterns are not fixed, you could bypass regular expressions completely and fall back on simple string wrangling. Here's a better example of what I mentioned in my comment:
require 'set'
# The suffix list that you get from somewhere.
suffixes = [ 'BK/BL', 'BZ', 'BL' ]
# We want to do a couple things at once here. For each suffix, we
# want both the suffix and the suffix with a leading '-' attached,
# the `map` and `flatten` stuff does that. Then we group them by
# length to get a hash like:
#
# { 2 => ['BZ','BL'], 3 => ['-BZ', '-BL'], 5 => ['BK/BL'], ... }
#
by_length = suffixes.map { |suffix| [suffix, '-' + suffix ] }.flatten.group_by(&:length)
# Now we reorganize our suffixes into sets with the set of longest
# suffixes first and the set of shortest suffixes last. The result
# will be:
#
# [#<Set: {"-BK/BL"}>, #<Set: {"BK/BL"}>, #<Set: {"-BZ", "-BL"}>, #<Set: {"BZ", "BL"}>]
#
sets = by_length.keys.sort { |a,b| b <=> a }.map { |k| Set.new(by_length[k]) }
# Then we can just spin through sets, pull off the suffix of the
# appropriate length from the string, and see if it is in our set.
# If it is then chop the suffix off the string, do whatever is to be
# done with chopped string, and break out for the next string.
#
%w{ 23430-BL 23430BZ 23430BK/BL }.each do |string|
  sets.each do |suffixes|
    len = suffixes.first.length
    sfx = string[string.length - len, len]
    if suffixes.include?(sfx)
      puts string[0 .. -(len + 1)]
      break
    end
  end
end
That's just an "off the top of my head" illustration of the algorithm.
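A more compact sketch of the same longest-suffix-first idea uses String#end_with?. This is a variation for illustration, not the code above: strip_suffix is a made-up helper, and it returns the string unchanged when no suffix matches.

```ruby
# Hypothetical helper: strip the longest matching suffix, with or without a leading "-".
def strip_suffix(string, suffixes)
  # Try longer candidates first so "BK/BL" wins over "BL".
  candidates = suffixes.flat_map { |s| ["-#{s}", s] }.sort_by { |s| -s.length }
  match = candidates.find { |candidate| string.end_with?(candidate) }
  match ? string[0...-match.length] : string
end

suffixes = ['BK/BL', 'BZ', 'BL']
stripped = %w[23430-BL 23430BZ 23430BK/BL 23430GR].map { |s| strip_suffix(s, suffixes) }
# stripped => ["23430", "23430", "23430", "23430GR"]
```

For a long suffix list the candidates could be computed once outside the method instead of on every call.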

Resources