I have:
event = {"first_type_a" => 100, "first_type_b" => false, "second_type_a" => "abc", "second_type_b" => false}
I am trying to change the keys of event based on the original keys. If a key's name contains a given substring, the key should be renamed: the new key/value pair should be added and the old pair removed. I expect to get:
event = {"important.a_1" => 100, "important.b_1" => false, "second_type_a" => "abc", "second_type_b" => false}
What would be the most efficient way to update event?
I expected this to work:
event.each_pair { |k, v|
if !!(/first_type_*/ =~ k) do
key = "#{["important", k.split("_", 3)[2]].join(".")}";
event.merge!(key: v);
event.delete(k)
end
}
but it raises an error:
simpleLoop.rb:5: syntax error, unexpected keyword_do, expecting keyword_then or ';' or '\n'
... if !!(/first_type_*/ =~ k) do
... ^~
simpleLoop.rb:9: syntax error, unexpected keyword_end, expecting '}'
end
^~~
simpleLoop.rb:21: embedded document meets end of file
I thought to approach it differently:
if result = event.keys.find { |k| k.include? "first_type_" } [
key = "#{["important", k.split("_", 3)[2]].join(".")}"
event.merge!(key: v)
event.delete(k)
]
but still no luck. I am counting all brackets, and as the error indicates, it is something there, but I can't find it. Does the order matter?
I will just show how this can be done quite economically in Ruby. Until recently, when wanting to modify keys of a hash one usually would do one of two things:
create a new empty hash and then add key/values pairs to the hash; or
convert the hash to an array a of two-element arrays (key-value pairs), modify the first element of each element of a (the key) and then convert a to the desired hash.
Recently (with MRI v2.5) the Ruby monks bestowed on us the handy methods Hash#transform_keys and Hash#transform_keys!. We can use the first of these profitably here. First we need a regular expression to match keys.
r = /
\A # match beginning of string
first_type_ # match string
(\p{Lower}+) # match 1+ lowercase letters in capture group 1
\z # match the end of the string
/x # free-spacing regex definition mode
Conventionally, this is written
r = /\Afirst_type_(\p{Lower}+)\z/
The use of free-spacing mode makes the regex self-documenting. We now apply the transform_keys method, together with the method String#sub and the regex just defined.
event = {"first_type_a"=>100, "first_type_b"=>false,
"second_type_a"=>"abc", "second_type_b"=>false}
event.transform_keys { |k| k.sub(r, "important.#{'\1'}_1") }
#=> {"important.a_1"=>100, "important.b_1"=>false,
# "second_type_a"=>"abc", "second_type_b"=>false}
In the regex, the \p{} construct \p{Lower} could be replaced with \p{Ll}, the POSIX bracket expression [[:lower:]] (both match Unicode lowercase letters) or [a-z], but the last has the disadvantage that it will not match letters with diacritical marks. That includes letters of words borrowed from other languages that are used in English text (such as rosé, the wine). Search Regexp for documentation of POSIX and \p{} expressions.
If "first_type_" could be followed by lowercase or uppercase letters use \p{Alpha}; if it could be followed by alphanumeric characters, use \p{Alnum}, and so on.
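For Rubies older than 2.5, where Hash#transform_keys is not available, the two older approaches listed at the start of this answer could be sketched roughly as follows. This is only an illustration that reuses the regex r from above; the variable names built and converted are mine.

```ruby
r = /\Afirst_type_(\p{Lower}+)\z/
event = { "first_type_a" => 100, "first_type_b" => false,
          "second_type_a" => "abc", "second_type_b" => false }

# Approach 1: create an empty hash and add key/value pairs one at a time.
built = event.each_with_object({}) do |(k, v), h|
  h[k.sub(r, 'important.\1_1')] = v
end

# Approach 2: convert to an array of [key, value] pairs, modify each key,
# then convert the array back to a hash (Array#to_h needs Ruby 2.1+).
converted = event.map { |k, v| [k.sub(r, 'important.\1_1'), v] }.to_h
```

Both leave the original event untouched and keep the non-matching keys as they are.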
An if opens its own block that is closed by end, like these other structures: if ... else ... end, do ... end, begin ... rescue ... end.
Therefore, in your first example, remove the do after the if; the block is already open. I also made it clearer by changing the block after each_pair to use do ... end rather than braces, to help avoid confusing a hash with a block.
event = { 'first_type_a' => 100, 'first_type_b' => false, 'second_type_a' => 'abc', 'second_type_b' => false }
new_event = {}
event.each_pair do |k, v|
if !!(/first_type_*/ =~ k)
important_key = "important.#{k.split('_', 3)[2]}_1" # "first_type_a" becomes "important.a_1", as in the expected output
new_event[important_key] = v
else
new_event[k] = v
end
end
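Two further details in the question's code may be worth a look (this is my reading of it, not something the error message reports). First, event.merge!(key: v) adds the literal symbol :key rather than the string held in the variable key. Second, MRI refuses to add new keys to a hash while iterating over it, which is one reason to build new_event instead. A small sketch with made-up values:

```ruby
key = "important.a_1"
v = 100

with_symbol = {}.merge(key: v)    # the key is the symbol :key
with_string = {}.merge(key => v)  # the key is the string stored in `key`

event = { "first_type_a" => 100 }
error_message = nil
begin
  # Adding a key while iterating raises a RuntimeError in MRI.
  event.each_pair { |k, val| event["copy_of_#{k}"] = val }
rescue RuntimeError => e
  error_message = e.message
end
```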
You could define a method to be used inside the transform_keys call:
def transform(str)
return ['important.', str.split('_').last, '_1'].join() if str[0..4] == 'first' # I checked just for "first"
str
end
event.transform_keys! { |k| transform(k) } # Ruby >= 2.5
event.map { |k, v| [transform(k), v] }.to_h # Ruby < 2.5
Using Hash#each_pair but with Enumerator#with_object:
event.each_pair.with_object({}) { |(k, v), h| h[transform(k)] = v }
Or use as one liner:
event.transform_keys { |str| str[0..4] == 'first' ? ['important.', str.split('_').last, '_1'].join() : str }
Is there a better solution for such a trivial task?
Given an array of strings as follows:
roles = [
"id=Accountant,id=TOTO,id=client",
"id=Admin,id=YOYO,id=custom",
"id=CDI,id=SC"
]
To extract a role value based on its id value I'm using the following regular expression to match it:
r =~ /id=Admin/
The dumb simple solution would be just to iterate on the roles array, assign the matched value and return it as follows:
role = nil
roles.each do |r|
role = 'admin' if r =~ /id=Admin/
role = 'national' if r =~ /id=National/
role = 'local' if r =~ /id=Local/
end
role
Is there a better solution?
You could define a regular expression to match several roles at once. Here's a simple one:
/id=(Admin|National|Local)/
The parentheses act as a capturing group for the role name. You might want to add anchors, e.g. to only match the first id=value pair in each line, or to ensure that you match the whole value instead of just its beginning if values can be ambiguous.
The pattern can then be passed to grep which returns the matching lines:
roles.grep(/id=(Admin|National|Local)/)
#=> ["id=Admin,id=YOYO,id=custom"]
Passing a block to grep allows us to transform the match: ($1 refers to the first capture group)
roles.grep(/id=(Admin|National|Local)/) { $1.downcase }
#=> ["admin"]
To get the first role:
roles.grep(/id=(Admin|National|Local)/) { $1.downcase }.first
#=> "admin"
If your array is large you can use a lazy enumerator which will stop traversing after the first match:
roles.lazy.grep(/id=(Admin|National|Local)/) { $1.downcase }.first
#=> "admin"
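The anchoring suggestion above might look something like the following. This is only a sketch: the "id=AdminHack" entry is a hypothetical ambiguous value added for illustration, and the names loose and anchored are mine.

```ruby
roles = [
  "id=Accountant,id=TOTO,id=client",
  "id=Admin,id=YOYO,id=custom",
  "id=AdminHack,id=SC" # hypothetical ambiguous entry
]

loose = /id=(Admin|National|Local)/
# \A restricts the match to the first id=value pair of each string;
# (?=,|\z) requires the value to end at a comma or at the end of the string.
anchored = /\Aid=(Admin|National|Local)(?=,|\z)/

loose_hits    = roles.grep(loose)    { $1.downcase }
anchored_hits = roles.grep(anchored) { $1.downcase }
```

Here the loose pattern also accepts the "AdminHack" line, while the anchored one only accepts the exact value.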
Well the obvious way I think would be to simply parse the whole roles array:
roles = [
"id=Accountant,id=TOTO,id=client",
"id=Admin,id=YOYO,id=custom",
"id=CDI,id=SC"
]
user_roles = roles.join(',').split(',').map { |r| r.split('=', 2).last.downcase }
Where user_roles becomes:
["accountant", "toto", "client", "admin", "yoyo", "custom", "cdi", "sc"]
Now you can simply do something like:
user_roles.include?('admin')
Or to find any of the "admin", "national", "local" occurrences:
# array1 AND array2, finds the elements that occur in both:
> %w(admin national local) & user_roles
=> ["admin"]
Or perhaps to just find out if the user has any of those roles:
# When there are no matching elements, it will return an empty array
> (%w(admin national local) & user_roles).empty?
=> false
> (["megaboss", "superadmin"] & user_roles).empty?
=> true
And here it is in a more complete example with constants and methods and all!
SUPERVISOR_ROLES = %w(admin national local)
def is_supervisor?(roles)
!(SUPERVISOR_ROLES & roles).empty?
end
def parse_roles(raw_array)
raw_array.flat_map { |line| line.split(',').map { |pair| pair.split('=', 2).last.downcase } } # distinct block variable names avoid shadowing
end
roles = [
"id=Accountant,id=TOTO,id=client",
"id=Admin,id=YOYO,id=custom",
"id=CDI,id=SC"
]
raise "you no boss :(" unless is_supervisor?(parse_roles(roles))
This may of course be inefficient if the data set is large, but it is a bit cleaner and maybe even safer than matching with such a regex. For example, someone could create a role called AdminHack, which would still be matched by the /id=Admin/ regex. Writing such a general role parser may also become useful along the way if you want to check for other roles for other purposes.
(And yes, obviously this solution creates a hefty amount of intermediary arrays and other insta-discarded objects and has plenty of room for optimization)
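One possible way to trim some of that intermediate work, sketched under the assumption that only a yes/no answer is needed: stop at the first pair that names a supervisor role. The names SUPERVISOR_ROLE_NAMES and supervisor_in? are made up for this example.

```ruby
SUPERVISOR_ROLE_NAMES = %w[admin national local].freeze

# Returns true as soon as one id=value pair names a supervisor role.
# Values are compared whole, so "AdminHack" does not count as "admin".
def supervisor_in?(raw_array)
  raw_array.any? do |line|
    line.split(',').any? do |pair|
      SUPERVISOR_ROLE_NAMES.include?(pair.split('=', 2).last.to_s.downcase)
    end
  end
end
```

It still splits each line, but it never builds the full list of roles and it returns at the first hit.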
If we wish to find a particular spy from a group of cells we could merely round up all the spies from all the cells and examine them sequentially until the culprit is found.
Here the equivalent is to join the strings from the given array to form a single string and then search that string for the given substring:
str = roles.join(' ').downcase
#=> "id=accountant,id=toto,id=client id=admin,id=yoyo,id=custom id=cdi,id=sc"
join's argument could be a space, newline, comma or any of several other strings (I've used a space).
We then simply look for a match, using the method String#[] and the regular expression:
r = /
(?<=id=) # match 'id=' in a positive lookbehind
(?:admin|national|local) # match 'admin', 'national' or 'local'
(?!\w) # do not match a word character (negative lookahead)
/x # free-spacing regex definition mode
In normal (not free-spacing mode) this is:
/(?<=id=)(?:admin|national|local)(?!\w)/
'id=', being in a positive lookbehind, is not included in the match. The negative lookahead, (?!\w), ensures that the match is not immediately followed by a word character. That prevents a match, for example, on the word 'administration'.
We now simply extract the match, if there is one:
str[r] #=> "admin"
Had there not been a match, nil would have been returned.
We could have instead downcased at the end:
str = roles.join(' ')
str[/(?<=id=)(?:admin|national|local)(?!\w)/i]&.downcase # /i makes the match case-insensitive; &. returns nil when nothing matches
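To see the effect of the negative lookahead described above, here is a small sketch with two made-up strings (not from the question's data):

```ruby
r = /(?<=id=)(?:admin|national|local)(?!\w)/

# "admin" is followed by a word character here, so it is skipped
# and the later "local" is the first acceptable match.
skips_longer_word = "id=administration,id=local"[r]

# Nothing acceptable follows any 'id=' here, so the result is nil.
no_match = "id=administration,id=sc"[r]
```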
I like Stefen's answer, but didn't like that it could run a long time grabbing id values before exiting the grep if the list of roles was really big. I also didn't like the pattern because it wasn't anchored to the beginning of the search string, forcing the engine to do more work.
I'd rather see the code stop at the first hit so this was a first attempt:
roles = [
"id=Accountant,id=TOTO,id=client",
"id=Admin,id=YOYO,id=custom",
"id=CDI,id=SC"
]
found_role = nil
roles.each do |i|
r = i[/^id=(Admin|National|Local)/]
if r
found_role = r.downcase
break
end
end
found_role # => "id=admin"
Thinking about that kept nagging at me as being too verbose, so this popped out:
roles = [
"id=Accountant,id=TOTO,id=client",
"id=Admin,id=YOYO,id=custom",
"id=CDI,id=SC"
]
roles.find { |i| i[/^id=(Admin|National|Local)/] }.downcase[/^(id=\w+),/, 1]
# => "id=admin"
Breaking it down, here are the high spots:
i[/^id=(Admin|National|Local)/] returns the matching string "id=Admin..." and exits the loop.
downcase[/^(id=\w+),/, 1] grabs the first pair and returns it.
Then, being as anal as I am, I figured downcase would be doing too much work too so this happened:
roles.find { |i| i[/^id=(Admin|National|Local)/] }[/^(id=\w+),/, 1].downcase
It's pretty cryptic Ruby, and we're not really supposed to write code this way, but I used to write C and Perl so it seems reasonable to me.
And the interesting part:
require 'fruity'
roles = [
"id=Accountant,id=TOTO,id=client",
"id=Admin,id=YOYO,id=custom",
"id=CDI,id=SC"
]
compare do
numero_uno {
found_role = nil
roles.each do |i|
r = i[/^id=(Admin|National|Local)/]
if r
found_role = r.downcase
break
end
end
found_role
}
numero_dos { roles.find { |i| i[/^id=(Admin|National|Local)/] }.downcase[/^(id=\w+),/, 1] }
numero_tres { roles.find { |i| i[/^id=(Admin|National|Local)/] }[/^(id=\w+),/, 1].downcase }
end
# >> Running each test 2048 times. Test will take about 1 second.
# >> numero_uno is similar to numero_tres
# >> numero_tres is faster than numero_dos by 10.000000000000009% ± 10.0%
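One thing the one-liners above leave open is what happens when no string matches: find then returns nil and the chained call raises NoMethodError. Here is a hedged sketch of a variant that returns nil instead. Note that it differs from the code above in two ways that are my own choices: it returns only the role name rather than "id=admin", and it accepts a value that ends the string. The method name first_supervisor_role is made up.

```ruby
def first_supervisor_role(roles)
  roles.each do |line|
    # Anchored at the start; the value must end at a comma or end of string.
    m = line.match(/\Aid=(Admin|National|Local)(?=,|\z)/)
    return m[1].downcase if m
  end
  nil # no string started with one of the wanted roles
end
```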
I want to delete dots only when they are between digits.
I have the text "dolar. 2.000.000"
I have tried using \\.\d*?, but the . in "dolar." is also deleted.
I want it displayed as "dolar. 2000000".
Use positive lookarounds.
"dolar. 2.000.000".gsub(/(?<=\d)\.(?=\d)/, '')
#⇒ "dolar. 2000000"
(?<=\d) here means “preceded by a digit that is not included in the match” and (?=\d) means “followed by a digit that is not included in the match.”
Here are some more ways to do that.
str = "dolar. 2.000.000"
All of the following return "dolar. 2000000".
str.gsub(/\d\.\d/) { |s| s.delete('.') }
str.gsub(/(\d)\.(\d)/, '\1\2')
h = Hash.new { |_,k| k.delete('.') } #=> {}
str.gsub(/\d\.\d/, h)
In the third approach, h[k] returns k.delete('.') if the hash has no key k. Since the hash has no keys, that will be returned for all matches.
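One difference between these approaches and the lookaround version may matter for other inputs: patterns that consume the digits on both sides cannot reuse a digit for the next dot. A quick sketch with a made-up string in which a single digit sits between two dots:

```ruby
str = "v. 1.2.3"

# Lookarounds do not consume the digits, so every dot between digits goes.
lookaround = str.gsub(/(?<=\d)\.(?=\d)/, '')

# These consume "1.2" in one match; the following ".3" no longer has
# an unconsumed digit in front of it and is left alone.
consuming_groups = str.gsub(/(\d)\.(\d)/, '\1\2')
consuming_block  = str.gsub(/\d\.\d/) { |s| s.delete('.') }
```

For the question's "dolar. 2.000.000" all of them agree, because there is always more than one digit between dots.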
I am trying to call the first duplicate character in my string in Ruby.
I have defined an input string using gets.
How do I call the first duplicate character in the string?
This is my code so far.
string = "#{gets}"
print string
How do I call a character from this string?
Edit 1:
This is the code I have now. My output is "no duplicates" 26 times. I think my if statement is written incorrectly.
string = "abcade"
puts string
for i in ('a'..'z')
if string =~ /(.)\1/
puts string.chars.group_by{|c| c}.find{|el| el[1].size >1}[0]
else
puts "no duplicates"
end
end
My second puts statement works on its own, but with the for loop and the if statement, it prints "no duplicates" 26 times whatever the string is.
The following returns the index of the first pair of adjacent duplicate characters:
the_string =~ /(.)\1/
Example:
'1234556' =~ /(.)\1/
=> 4
To get the duplicate character itself, use $1:
$1
=> "5"
Example usage in an if statement:
if my_string =~ /(.)\1/
# found duplicate; potentially do something with $1
else
# there is no match
end
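If you would rather not rely on the global $1, the same check can be sketched with String#match. The method name first_adjacent_duplicate is made up. It also shows why a string such as "abcade" from the question reports no duplicates with this pattern: its two a's are not next to each other.

```ruby
# Returns the first character that is immediately followed by itself,
# or nil when no two adjacent characters are equal.
def first_adjacent_duplicate(str)
  m = str.match(/(.)\1/)
  m && m[1]
end
```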
s.chars.map { |c| [c, s.count(c)] }.drop_while{|i| i[1] <= 1}.first[0]
With the refined form from Cary Swoveland :
s.each_char.find { |c| s.count(c) > 1 }
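Both forms above call s.count once per character, which rescans the string each time. For long strings, a sketch that counts everything in one pass might be preferable; it assumes Ruby 2.7 or later for Enumerable#tally, and the method name is made up.

```ruby
# Returns the first character of s that occurs more than once, or nil.
def first_char_appearing_twice(s)
  counts = s.each_char.tally           # e.g. "aba" gives counts a: 2, b: 1
  s.each_char.find { |c| counts[c] > 1 }
end
```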
The method below might be useful to find the most frequently repeated word in a string (the first such word if there is a tie):
def firstRepeatedWord(string)
h_data = Hash.new(0)
string.split(" ").each{|x| h_data[x] +=1}
h_data.key(h_data.values.max)
end
I believe the question can be interpreted in either of two ways (neither involving the first pair of adjacent characters that are the same) and offer solutions to each.
Find the first character in the string that is preceded by the same character
I don't believe we can use a regex for this (but would love to be proved wrong). I would use the method suggested in a comment by @DaveNewton:
require 'set'
def first_repeat_char(str)
str.each_char.with_object(Set.new) { |c,s| return c unless s.add?(c) }
nil
end
first_repeat_char("abcdebf") #=> b
first_repeat_char("abcdcbe") #=> c
first_repeat_char("abcdefg") #=> nil
Find the first character in the string that appears more than once
r = /
(.) # match any character in capture group #1
.* # match any character zero or more times
? # do the preceding lazily
\K # forget everything matched so far
\1 # match the contents of capture group 1
/x
"abcdebf"[r] #=> b
"abccdeb"[r] #=> b
"abcdefg"[r] #=> nil
This regex is fine, but produces the warning, "regular expression has redundant nested repeat operator '*'". You can disregard the warning or suppress it by doing something clunky, like:
r = /([^#{0.chr}]).*?\K\1/
where ([^#{0.chr}]) means "match any character other than 0.chr in capture group 1".
Note that a positive lookbehind cannot be used here, as they cannot contain variable-length matches (i.e., .*).
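The two interpretations can give different answers for the same string. A short sketch on a made-up input, "abccb", restating the method from above in an equivalent loop form and using the conventional (non free-spacing) form of the regex:

```ruby
require 'set'

# Interpretation 1: the first character that has been seen earlier.
def first_repeat_char(str)
  seen = Set.new
  str.each_char { |c| return c unless seen.add?(c) }
  nil
end

# Interpretation 2: the first character that appears again later on.
r = /(.).*?\K\1/

seen_earlier  = first_repeat_char("abccb") # the second "c" is reached first
appears_again = "abccb"[r]                 # "b" is the earliest repeated char
```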
You could probably make your string an array and use detect. This should return the first char where the count is > 1.
string.split("").detect {|x| string.count(x) > 1}
I'll use positive lookahead with String#[] method :
"abcccddde"[/(.)(?=\1)/] #=> c
As a variant:
str = "abcdeff"
p str.chars.group_by{|c| c}.find{|el| el[1].size > 1}[0]
prints "f"
I was wondering how you construct the regular expression to check if the string has a variation of a pattern with the same length. Say the string is "door boor robo omanyte" how do I return the words that have the variation of [door]?
You can easily get all the possible words using Array#permutation. Then you can scan for them in provided string. Here:
possible_words = %w[d o o r].permutation.map &:join
# => ["door", "doro", "door", "doro", "droo", "droo", "odor", "odro", "oodr", "oord", "ordo", "orod", "odor", "odro", "oodr", "oord", "ordo", "orod", "rdoo", "rdoo", "rodo", "rood", "rodo", "rood"]
string = "door boor robo omanyte"
# String#scan treats a String pattern literally, so build a Regexp from the words
string.scan(Regexp.union(possible_words))
# => ["door"]
string = "door close rood example ordo"
string.scan(Regexp.union(possible_words))
# => ["door", "rood", "ordo"]
UPDATE
You can improve scan further by looking for word boundary. Here:
string = "doorrood example ordo"
string.scan(/\b(?:#{possible_words.join('|')})\b/)
# => ["ordo"]
NOTE
As Cary correctly pointed out in comments below, this process is quite inefficient if you intend to find permutation for a fairly large string. However it should work fine for OP's example.
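To put a rough number on that inefficiency: a word with n letters has n! permutations before duplicates are removed. A small sketch (the factorial helper is mine):

```ruby
def factorial(n)
  (1..n).reduce(1, :*)
end

all_door      = %w[d o o r].permutation.map(&:join)
distinct_door = all_door.uniq # the two o's make half of the results duplicates

four_letters = factorial(4)  # permutations generated for "door"
ten_letters  = factorial(10) # permutations a ten-letter word would generate
```

So "door" produces 24 strings of which 12 are distinct, while a ten-letter word would already produce several million alternatives for the regex.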
If the comment I left on your question correctly interprets the question, you could do this:
str = "door sit its odor to"
str.split
.group_by { |w| w.chars.sort.join }
.values
.select { |a| a.size > 1 }
#=> [["door", "odor"], ["sit", "its"]]
This assumes all the letters are the same case.
If case is not important, just make a small change:
str = "dooR sIt itS Odor to"
str.split
.group_by { |w| w.downcase.chars.sort.join }
.values
.select { |a| a.size > 1 }
#=> [["dooR", "Odor"], ["sIt", "itS"]]
In my opinion the fastest way to find this will be
word_a.chars.sort == word_b.chars.sort
since we are using the same characters inside the words
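Applied to the question's string, that comparison could be wrapped up like this. It is a sketch that assumes words are separated by whitespace and that case matters; the method name anagrams_of is made up.

```ruby
# Returns the words of text that use exactly the same letters as word.
def anagrams_of(word, text)
  key = word.chars.sort
  text.split.select { |w| w.chars.sort == key }
end
```

For example, anagrams_of("door", "door boor robo omanyte") keeps only "door", without generating any permutations.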
IMO, some kind of iteration is definitely necessary to build a regular expression to match this one. Not using a regular expression is better too.
def variations_of_substr(str, sub)
# Creates regexp to match words with same length, and
# with same characters of str.
patt = "\\b" + ( [ "[#{sub}]{1}" ] * sub.size ).join + "\\b"
# Above alone won't be enough, characters in both words should
# match exactly.
str.scan( Regexp.new(patt) ).select do |m|
m.chars.sort == sub.chars.sort
end
end
variations_of_substr("door boor robo omanyte", "door")
# => ["door"]
I have a string:
s="123--abc,123--abc,123--abc"
I tried using Ruby 1.9's new feature "named groups" to fetch all named group info:
/(?<number>\d*)--(?<chars>\s*)/
Is there an API like Python's findall which returns a matchdata collection? In this case I need to return two matches, because 123 and abc repeat twice. Each match data object contains the details of each named capture, so I can use m['number'] to get the match value.
Named captures are suitable only for one matching result.
Ruby's analogue of findall is String#scan. You can either use scan result as an array, or pass a block to it:
irb> s = "123--abc,123--abc,123--abc"
=> "123--abc,123--abc,123--abc"
irb> s.scan(/(\d*)--([a-z]*)/)
=> [["123", "abc"], ["123", "abc"], ["123", "abc"]]
irb> s.scan(/(\d*)--([a-z]*)/) do |number, chars|
irb* p [number,chars]
irb> end
["123", "abc"]
["123", "abc"]
["123", "abc"]
=> "123--abc,123--abc,123--abc"
Chiming in super-late, but here's a simple way of replicating String#scan but getting the matchdata instead:
matches = []
foo.scan(regex){ matches << $~ }
matches now contains the MatchData objects that correspond to scanning the string.
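With the question's string, this could look as follows. It is a sketch: I've assumed a pattern with [a-z]+ for the letters, since \s in the question's regex matches whitespace rather than letters.

```ruby
s = "123--abc,123--abc,123--abc"
regex = /(?<number>\d+)--(?<chars>[a-z]+)/

matches = []
s.scan(regex) { matches << $~ } # $~ is the MatchData of the current match

numbers = matches.map { |m| m[:number] } # named captures are available
offsets = matches.map { |m| m.begin(0) } # positions refer to the original string
```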
You can extract the used variables from the regexp using names method. So what I did is, I used regular scan method to get the matches, then zipped names and every match to create a Hash.
class String
def scan2(regexp)
names = regexp.names
scan(regexp).collect do |match|
Hash[names.zip(match)]
end
end
end
Usage:
>> "aaa http://www.google.com.tr aaa https://www.yahoo.com.tr ddd".scan2 /(?<url>(?<protocol>https?):\/\/[\S]+)/
=> [{"url"=>"http://www.google.com.tr", "protocol"=>"http"}, {"url"=>"https://www.yahoo.com.tr", "protocol"=>"https"}]
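On Ruby 2.4 or later, a similar result can be sketched without zipping names by hand, using MatchData#named_captures. This is a standalone method rather than a String patch, and its name scan_named is made up.

```ruby
# Returns one hash of name => captured text per match.
def scan_named(str, regexp)
  results = []
  str.scan(regexp) { results << Regexp.last_match.named_captures }
  results
end
```

Calling scan_named("123--abc,123--abc", /(?<number>\d+)--(?<chars>[a-z]+)/) should give two hashes with the keys "number" and "chars".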
@Nakilon is correct showing scan with a regex, however you don't even need to venture into regex land if you don't want to:
s = "123--abc,123--abc,123--abc"
s.split(',')
#=> ["123--abc", "123--abc", "123--abc"]
s.split(',').inject([]) { |a,s| a << s.split('--'); a }
#=> [["123", "abc"], ["123", "abc"], ["123", "abc"]]
This returns an array of arrays, which is convenient if you have multiple occurrences and need to see/process them all.
s.split(',').inject({}) { |h,s| n,v = s.split('--'); h[n] = v; h }
#=> {"123"=>"abc"}
This returns a hash, which, because the elements have the same key, has only the unique key value. This is good when you have a bunch of duplicate keys but want the unique ones. Its downside occurs if you need the unique values associated with the keys, but that appears to be a different question.
If using ruby >=1.9 and the named captures, you could:
class String
def scan2(regexp2_str, placeholders = {})
return regexp2_str.to_re(placeholders).match(self)
end
def to_re(placeholders = {})
re2 = self.dup
separator = placeholders.delete(:SEPARATOR) || '' #Returns and removes separator if :SEPARATOR is set.
#Search for the pattern placeholders and replace them with the regex
placeholders.each do |placeholder, regex|
re2.sub!(separator + placeholder.to_s + separator, "(?<#{placeholder}>#{regex})")
end
return Regexp.new(re2, Regexp::MULTILINE) #Returns regex using named captures.
end
end
Usage (ruby >=1.9):
> "1234:Kalle".scan2("num4:name", num4:'\d{4}', name:'\w+')
=> #<MatchData "1234:Kalle" num4:"1234" name:"Kalle">
or
> re="num4:name".to_re(num4:'\d{4}', name:'\w+')
=> /(?<num4>\d{4}):(?<name>\w+)/m
> m=re.match("1234:Kalle")
=> #<MatchData "1234:Kalle" num4:"1234" name:"Kalle">
> m[:num4]
=> "1234"
> m[:name]
=> "Kalle"
Using the separator option:
> "1234:Kalle".scan2("#num4#:#name#", SEPARATOR:'#', num4:'\d{4}', name:'\w+')
=> #<MatchData "1234:Kalle" num4:"1234" name:"Kalle">
I needed something similar recently. This should work like String#scan, but return an array of MatchData objects instead.
class String
# This method will return an array of MatchData's rather than the
# array of strings returned by the vanilla `scan`.
def match_all(regex)
match_str = self
match_datas = []
while match_str.length > 0 do
md = match_str.match(regex)
break unless md
match_datas << md
match_str = md.post_match
end
return match_datas
end
end
Running your sample data in the REPL results in the following:
> "123--abc,123--abc,123--abc".match_all(/(?<number>\d*)--(?<chars>[a-z]*)/)
=> [#<MatchData "123--abc" number:"123" chars:"abc">,
#<MatchData "123--abc" number:"123" chars:"abc">,
#<MatchData "123--abc" number:"123" chars:"abc">]
You may also find my test code useful:
describe String do
describe :match_all do
it "it works like scan, but uses MatchData objects instead of arrays and strings" do
mds = "ABC-123, DEF-456, GHI-098".match_all(/(?<word>[A-Z]+)-(?<number>[0-9]+)/)
mds[0][:word].should == "ABC"
mds[0][:number].should == "123"
mds[1][:word].should == "DEF"
mds[1][:number].should == "456"
mds[2][:word].should == "GHI"
mds[2][:number].should == "098"
end
end
end
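Two caveats with the post_match approach, as far as I can tell: the offsets in each MatchData are relative to the remaining substring rather than the original string, and a regex that can match the empty string would keep the loop from advancing. A hedged sketch of an alternative that re-matches from a moving start position with Regexp#match(str, pos); the helper name match_all_from is made up:

```ruby
# Collects MatchData objects by matching repeatedly from a moving start
# position instead of slicing the string with post_match.
def match_all_from(str, regex)
  results = []
  pos = 0
  while pos <= str.length && (md = regex.match(str, pos))
    results << md
    # After an empty match, step forward one character to avoid looping forever.
    pos = md.end(0) > md.begin(0) ? md.end(0) : md.end(0) + 1
  end
  results
end

mds = match_all_from("123--abc,123--abc,123--abc", /(?<number>\d*)--(?<chars>[a-z]*)/)
starts = mds.map { |m| m.begin(0) } # offsets within the original string
```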
I really liked @Umut-Utkan's solution, but it didn't quite do what I wanted so I rewrote it a bit (note, the below might not be beautiful code, but it seems to work)
class String
def scan2(regexp)
names = regexp.names
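captures = Hash.new { |hash, key| hash[key] = [] } # each capture group name maps to an array of its matches
scan(regexp).each do |match|
nzip = names.zip(match)
nzip.each do |m|
captgrp = m[0].to_sym
captures[captgrp] << m[1] # Hash has no #add method, so append to the group's array
end
end
return captures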
end
end
Now, if you do
p '12f3g4g5h5h6j7j7j'.scan2(/(?<alpha>[a-zA-Z])(?<digit>[0-9])/)
You get
{:alpha=>["f", "g", "g", "h", "h", "j", "j"], :digit=>["3", "4", "5", "5", "6", "7", "7"]}
(ie. all the alpha characters found in one array, and all the digits found in another array). Depending on your purpose for scanning, this might be useful. Anyway, I love seeing examples of how easy it is to rewrite or extend core Ruby functionality with just a few lines!
A year ago I wanted regular expressions that were easier to read and that named the captures, so I made the following addition to String (should maybe not be there, but it was convenient at the time):
scan2.rb:
class String
#Works as scan but stores the result in a hash indexed by variable/constant names (regexp PLACEHOLDERS) within parentheses.
#Example: Given the (constant) strings BTF, RCVR and SNDR and the regexp /#BTF# (#RCVR#) (#SNDR#)/
#the matches will be returned in a hash like: match[:RCVR] = <the match> and match[:SNDR] = <the match>
#Note: The #STRING_VARIABLE_OR_CONST# syntax has to be used. All occurrences of #STRING# will work as #{STRING}
#but is needed for the method to see the names to be used as indices.
def scan2(regexp2_str, mark='#')
regexp = regexp2_str.to_re(mark) #Evaluates the strings. Note: Must be reachable from here!
hash_indices_array = regexp2_str.scan(/\(#{mark}(.*?)#{mark}\)/).flatten #Look for string variable names within (#VAR#) or # replaced by <mark>
match_array = self.scan(regexp)
#Save matches in hash indexed by string variable names:
match_hash = Hash.new
match_array.flatten.each_with_index do |m, i|
match_hash[hash_indices_array[i].to_sym] = m
end
return match_hash
end
def to_re(mark='#')
re = /#{mark}(.*?)#{mark}/
return Regexp.new(self.gsub(re){eval $1}, Regexp::MULTILINE) #Evaluates the strings, creates RE. Note: Variables must be reachable from here!
end
end
Example usage (irb1.9):
> load 'scan2.rb'
> AREA = '\d+'
> PHONE = '\d+'
> NAME = '\w+'
> "1234-567890 Glenn".scan2('(#AREA#)-(#PHONE#) (#NAME#)')
=> {:AREA=>"1234", :PHONE=>"567890", :NAME=>"Glenn"}
Notes:
Of course it would have been more elegant to put the patterns (e.g. AREA, PHONE...) in a hash and add this hash with patterns to the arguments of scan2.
Piggybacking off of Mark Hubbart's answer, I added the following monkey-patch:
class ::Regexp
def match_all(str)
matches = []
str.scan(self) { matches << $~ }
matches
end
end
which can be used as /(?<letter>\w)/.match_all('word'), and returns:
[#<MatchData "w" letter:"w">, #<MatchData "o" letter:"o">, #<MatchData "r" letter:"r">, #<MatchData "d" letter:"d">]
This relies on, as others have said, the use of $~ in the scan block for the match data.
I like the match_all given by John, but I think it has an error.
The line:
match_datas << md
works if there are no captures () in the regex.
This code gives the whole line up to and including the pattern matched/captured by the regex. (The [0] part of MatchData) If the regex has capture (), then this result is probably not what the user (me) wants in the eventual output.
I think in the case where there are captures () in regex, the correct code should be:
match_datas << md[1]
The eventual output of match_datas will be an array of the captured matches, starting from match_datas[0]. This is not quite what may be expected if a normal MatchData is wanted, which includes a match_datas[0] value that is the whole matched substring, followed by match_datas[1], match_datas[2], ... which are the captures (if any) in the regex pattern.
Things are complex - which may be why match_all was not included in native MatchData.
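As a point of reference for the discussion above, and as far as I can tell, a single MatchData object already exposes both the whole match and every capture, so pushing md itself keeps all of them available. A small sketch with made-up sample data:

```ruby
md = "123--abc".match(/(?<number>\d+)--(?<chars>[a-z]+)/)

whole    = md[0]       # the entire matched substring
first    = md[1]       # the first capture, by position
by_name  = md[:chars]  # a capture, by name
captures = md.captures # all captures as an array
```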