Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I would like to ask you for help. I have keywords in this form "AB10" and I need to split it into "AB" and "10". What is the best way?
Thank you for your help!
One could use String#scan:
def divide_str(s)
  s.scan(/\d+|\D+/)
end
divide_str 'AB10' #=> ["AB", "10"]
divide_str 'AB10CD20' #=> ["AB", "10", "CD", "20"]
divide_str '10AB20CD' #=> ["10", "AB", "20", "CD"]
The regular expression /\d+|\D+/ reads, "match one or more (+) digits (\d) or one or more non-digits (\D)".
Here is another way, one that does not employ a regular expression.
def divide_str(s)
  digits = '0'..'9'
  s.each_char.slice_when do |x,y|
    digits.cover?(x) ^ digits.cover?(y)
  end.map(&:join)
end
divide_str 'AB10' #=> ["AB", "10"]
divide_str 'AB10CD20' #=> ["AB", "10", "CD", "20"]
divide_str '10AB20CD' #=> ["10", "AB", "20", "CD"]
See Enumerable#slice_when, Range#cover?, TrueClass#^ and FalseClass#^.
Use split like so:
my_str.split(/(\d+)/)
To split any string on the boundary between digits and letters, use either of these 2 methods:
Use split with regex in capturing parentheses to include the delimiter, here a stretch of digits, into the resulting array. Remove empty strings (if any) using a combination of reject and empty?:
strings = ['AB10', 'AB10CD20', '10AB20CD']
strings.each do |str|
  arr = str.split(/(\d+)/).reject(&:empty?)
  puts "'#{str}' => #{arr}"
end
Output:
'AB10' => ["AB", "10"]
'AB10CD20' => ["AB", "10", "CD", "20"]
'10AB20CD' => ["10", "AB", "20", "CD"]
Use split with non-capturing parentheses: (?:PATTERN), positive lookahead (?=PATTERN) and positive lookbehind (?<=PATTERN) regexes to match the letter-digit and digit-letter boundaries:
strings.each do |str|
  arr = str.split(/ (?: (?<=[A-Za-z]) (?=\d) ) | (?: (?<=\d) (?=[A-Za-z]) ) /x)
  puts "'#{str}' => #{arr}"
end
The two methods give the same output for the cases shown.
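The two patterns are not interchangeable on every input, though. As a minimal sketch (the method names split_on_digits and split_on_boundaries and the sample 'AB-10' are illustrative, not taken from the answer), a separator character between the letters and the digits shows where they diverge:

```ruby
# Method 1: split on runs of digits, keeping them via the capture group.
def split_on_digits(str)
  str.split(/(\d+)/).reject(&:empty?)
end

# Method 2: split only where an ASCII letter directly touches a digit.
def split_on_boundaries(str)
  str.split(/(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])/)
end

p split_on_digits('AB10')       # => ["AB", "10"]
p split_on_boundaries('AB10')   # => ["AB", "10"]
p split_on_digits('AB-10')      # => ["AB-", "10"]
p split_on_boundaries('AB-10')  # => ["AB-10"]
```

Which behaviour is preferable depends on whether non-alphanumeric characters can appear in the keywords.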
I have a string "wwwggfffw" and want to break it up into an array as follows:
["www", "gg", "fff", "w"]
Is there a way to do this with regex?
"wwwggfffw".scan(/((.)\2*)/).map(&:first)
scan is a little quirky: it returns either the whole match or the subgroups, depending on whether the pattern contains any groups. We need a group in order to require repetition of the same character (a backreference, as in (.)\1*), but we want scan to return the whole run and not just the repeated letter. So we wrap the entire pattern in an outer group so that the run is captured (the character group then becomes group #2, hence \2), and in the end we extract just that first group, which we do with .map(&:first).
EDIT to explain the regexp ((.)\2*) itself:
(        start group #1, consisting of
  (      start group #2, consisting of
    .    any one character
  )      and nothing else
  \2     followed by the content of the group #2
  *      repeated any number of times (including zero)
)        and nothing else.
So in wwwggfffw, (.) captures w into group #2; then \2* captures any additional number of w. This makes group #1 capture www.
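If the nested group feels awkward, one alternative is to give scan a block and read the whole match from Regexp.last_match. This is only a sketch, not the answer's method, and the helper name runs_of is made up for illustration:

```ruby
def runs_of(str)
  runs = []
  # With a block, scan yields once per match and updates Regexp.last_match,
  # so the whole run is available without an outer capture group.
  str.scan(/(.)\1*/) { runs << Regexp.last_match(0) }
  runs
end

p runs_of("wwwggfffw")  # => ["www", "gg", "fff", "w"]
```

Because only one group remains, the backreference is \1 here rather than \2.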
You can use back references, something like
'wwwggfffw'.scan(/((.)\2*)/).map{ |s| s[0] }
will work
Here's one that's not using regex but works well:
def chunk(str)
  chars = str.chars
  chars.inject([chars.shift]) do |arr, char|
    if arr[-1].include?(char)
      arr[-1] << char
    else
      arr << char
    end
    arr
  end
end
In my benchmarks it's faster than the regex answers here (with the example string you gave, at least).
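One caveat with the inject version: for an empty string, chars.shift returns nil, so the result is [nil]. A possible variant that returns [] instead is sketched below, assuming Ruby 2.3+ (for the &. operator); the name chunk_runs is hypothetical:

```ruby
def chunk_runs(str)
  str.each_char.with_object([]) do |char, runs|
    if runs.last&.end_with?(char)
      runs.last << char   # same character: extend the current run
    else
      runs << char.dup    # first or different character: start a new run
    end
  end
end

p chunk_runs("wwwggfffw")  # => ["www", "gg", "fff", "w"]
p chunk_runs("")           # => []
```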
Another non-regex solution, this one using Enumerable#slice_when, which made its debut in Ruby v.2.2:
str.each_char.slice_when { |a,b| a!=b }.map(&:join)
#=> ["www", "gg", "fff", "w"]
Another option is:
str.scan(Regexp.new(str.squeeze.each_char.map { |c| "(#{c}+)" }.join)).first
#=> ["www", "gg", "fff", "w"]
Here the steps are as follows
s = str.squeeze
#=> "wgfw"
a = s.each_char
#=> #<Enumerator: "wgfw":each_char>
This enumerator generates the following elements:
a.to_a
#=> ["w", "g", "f", "w"]
Continuing
b = a.map { |c| "(#{c}+)" }
#=> ["(w+)", "(g+)", "(f+)", "(w+)"]
c = b.join
#=> "(w+)(g+)(f+)(w+)"
r = Regexp.new(c)
#=> /(w+)(g+)(f+)(w+)/
d = str.scan(r)
#=> [["www", "gg", "fff", "w"]]
d.first
#=> ["www", "gg", "fff", "w"]
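The steps above interpolate each character into the pattern as-is, so a string containing regex metacharacters such as "." or "+" would produce a different pattern than intended. A hedged variant that escapes each character with Regexp.escape and guards against the empty string (the method name runs_via_squeeze is made up for this sketch):

```ruby
def runs_via_squeeze(str)
  return [] if str.empty?
  # One capture group per run, with each character escaped before it is
  # interpolated, e.g. "a..b++" becomes (a+)(\.+)(b+)(\++)
  pattern = str.squeeze.each_char.map { |c| "(#{Regexp.escape(c)}+)" }.join
  str.scan(Regexp.new(pattern)).first
end

p runs_via_squeeze("wwwggfffw")  # => ["www", "gg", "fff", "w"]
p runs_via_squeeze("a..b++")     # => ["a", "..", "b", "++"]
```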
Here's one more way of doing it without a regex:
'wwwggfffw'.chars.chunk(&:itself).map{ |s| s[1].join }
# => ["www", "gg", "fff", "w"]
/((\w)\2)/ finds repeating letters. I was hoping to avoid the two dimensional array that is produced by ignoring the letter matching second capture group like this: /((?:\w)\2)/. It seems that's not possible. Any ideas why?
You don't need any capture groups:
str = [*'a+'..'z+', *'A+'..'Z+', *'0+'..'9+', '_+'].join('|')
#=> "a+|b+| ... |z+|A+|B+| ... |Z+|0+|1+| ... |9+|_+"
"aaabbcddd".scan(/#{str}/)
#=> ["aaa", "bb", "c", "ddd"]
but if you insist on having one:
"aaabbcddd".scan(/(#{str})/).flatten(1)
#=> ["aaa", "bb", "c", "ddd"]
Is this cheating? You did ask if it was possible.
If you mean you're using String#scan, you can post-process the result to keep only the first item of each match using Enumerable#map:
'helloo'.scan(/((\w)\2)/)
# => [["ll", "l"], ["oo", "o"]]
'helloo'.scan(/((\w)\2)/).map { |m| m[0] }
# => ["ll", "oo"]
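To make the difference concrete: scan returns plain strings only when the pattern contains no groups, and a backreference needs a capturing group to refer to. The sketch below puts the behaviours side by side and adds String#gsub called without a replacement, which returns an enumerator over the whole matches (the variable names are illustrative):

```ruby
word = 'helloo'

no_groups   = word.scan(/ll|oo/)        # no groups: whole matches
with_groups = word.scan(/((\w)\2)/)     # groups: one array per match
whole_only  = word.gsub(/(\w)\1/).to_a  # enumerator yields whole matches

p no_groups    # => ["ll", "oo"]
p with_groups  # => [["ll", "l"], ["oo", "o"]]
p whole_only   # => ["ll", "oo"]
```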
This question already has answers here:
Ruby: filter array by regex?
(6 answers)
Closed 8 years ago.
I can intersect two arrays by doing:
keyphrase_matches = words & city.keywords
How can I achieve the same thing using regular expression? I want to test one array against a regular expression and get a new array with the matches.
You can use the Enumerable#grep method:
%w{a b c 1 2 3}.grep /\d/ # => ["1", "2", "3"]
Use array.grep(regex) to return all elements that match the given regex.
See Enumerable#grep.
As I understand, if arr1 and arr2 are two arrays of strings (though you did not say they contain strings), you want to know if a regular expression could be used to produce arr1 & arr2.
First some test data:
arr1 = "Now is the time for all good Rubyists".split
#=> ["Now", "is", "the", "time", "for", "all", "good", "Rubyists"]
arr2 = "to find time to have the good life".split
#=> ["to", "find", "time", "to", "have", "the", "good", "life"]
The result we want:
arr1 & arr2
#=> ["the", "time", "good"]
I can think of two ways you might use Enumerable#grep, as suggested by @meagar and @August:
#1
arr1.select { |e| arr2.grep(/#{e}/).any? }
#=> ["the", "time", "good"]
#2
regex = Regexp.new("#{arr2.join('|')}")
#=> /to|find|time|to|have|the|good|life/
arr1.grep(regex)
#=> ["the", "time", "good"]
Of course, Array#& generally would be preferred, especially in Code Golf.
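Note that both grep variants match substrings, so they agree with Array#& only as long as no word contains another. A sketch that anchors the pattern and escapes the words via Regexp.union; the sample arrays words and keywords are made up for illustration:

```ruby
words    = %w[tomato the time]
keywords = %w[to the time]

loose = Regexp.new(keywords.join('|'))    # matches anywhere inside a word
exact = /\A#{Regexp.union(keywords)}\z/   # must match the whole word

p words.grep(loose)  # => ["tomato", "the", "time"]
p words.grep(exact)  # => ["the", "time"]
p words & keywords   # => ["the", "time"]
```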
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I'm trying to use a regular expression to find all substrings in a word. It finds some but not all. One such example is 'an' in the word 'banana'.
def substrings str
  pattern = '.'
  subs = []
  while pattern.length < str.length do
    subs << str.scan(/#{pattern}/)
    pattern << '.'
  end
  subs.flatten
end
puts substrings("banana").sort_by{ |s| "banana".index(/#{s}/)}
Regular expression matches will never overlap. If you ask for /../, you will get ["ba", "na", "na"]. You will not get ["ba", "an" ...] because "an" overlaps "ba". The next match search will start from the last match's end, always.
If you want to find overlapping sequences, you need to use lookahead/lookbehind to shorten your match size so the matches themselves don't overlap: /(?=(..))/. Note that you have to introduce a capture group, since the match itself is an empty string in this case.
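As a rough sketch of the lookahead approach (the helper names overlapping and all_substrings are invented here, and "." does not match a newline unless the m flag is added):

```ruby
# All overlapping substrings of a fixed length, via a zero-width lookahead.
def overlapping(str, len)
  str.scan(/(?=(.{#{len}}))/).flatten
end

# Every distinct substring, built by trying each possible length.
def all_substrings(str)
  (1..str.length).flat_map { |len| overlapping(str, len) }.uniq
end

p overlapping("banana", 2)       # => ["ba", "an", "na", "an", "na"]
p all_substrings("banana").size  # => 15
```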
def substrings str
  (0...str.length).flat_map{|i| (i...str.length).map{|j| str[i..j]}}.uniq
end
substrings("banana")
Result
[
"b",
"ba",
"ban",
"bana",
"banan",
"banana",
"a",
"an",
"ana",
"anan",
"anana",
"n",
"na",
"nan",
"nana"
]
or
def substrings str
  # Pair every start index i with every end index j > i; str[i...j] excludes j.
  (0..str.length).to_a.combination(2).map{|i, j| str[i...j]}.uniq
end
Result
[
"b",
"ba",
"ban",
"bana",
"banan",
"banana",
"a",
"an",
"ana",
"anan",
"anana",
"n",
"na",
"nan",
"nana"
]
Here's another way that does not use a regex. I see now how it can be done with a regex, but I don't know why you'd want to, unless it's just an exercise.
def substrings(str)
  arr = str.chars
  (1...str.size).each_with_object([]) { |i,a|
    a << arr.each_cons(i).to_a.map(&:join) }.flatten
end
substrings("banana")
#=> ["b", "a", "n", "a", "n", "a", "ba", "an", "na", "an", "na", "ban",
#    "ana", "nan", "ana", "bana", "anan", "nana", "banan", "anana"]
If you want to include the word "banana", change 1...str.size to 1..str.size.
I have the string "111221" and want to match all sets of consecutive equal integers: ["111", "22", "1"].
I know that there is a special regex thingy to do that but I can't remember and I'm terrible at Googling.
Using regex in Ruby 1.8.7+:
p s.scan(/((\d)\2*)/).map(&:first)
#=> ["111", "22", "1"]
This works because (\d) captures any digit, and then \2* captures zero-or-more of whatever that group (the second opening parenthesis) matched. The outer (…) is needed to capture the entire match as a result in scan. Finally, scan alone returns:
[["111", "1"], ["22", "2"], ["1", "1"]]
…so we need to run through and keep just the first item in each array. In Ruby 1.8.6+ (which doesn't have Symbol#to_proc for convenience):
p s.scan(/((\d)\2*)/).map{ |x| x.first }
#=> ["111", "22", "1"]
With no Regex, here's a fun one (matching any char) that works in Ruby 1.9.2:
p s.chars.chunk{|c|c}.map{ |n,a| a.join }
#=> ["111", "22", "1"]
Here's another version that should work even in Ruby 1.8.6:
p s.scan(/./).inject([]){|a,c| (a.last && a.last[0]==c[0] ? a.last : a)<<c; a }
# => ["111", "22", "1"]
"111221".gsub(/(.)(\1)*/).to_a
#=> ["111", "22", "1"]
This uses the form of String#gsub that does not have a block and therefore returns an enumerator. It appears gsub was bestowed with that option in v2.0.
I found that this works: it first matches a single character in one group, and then matches any repetitions of that same character in a second group. The result is an array of two-element arrays, where the first element is the initial character and the second holds any additional repeats of it. Joining each pair gives an array of runs of repeated characters:
input = "WWBWWWWBBBWWWWWWWB3333!!!!"
repeated_chars = input.scan(/(.)(\1*)/)
# => [["W", "W"], ["B", ""], ["W", "WWW"], ["B", "BB"], ["W", "WWWWWW"], ["B", ""], ["3", "333"], ["!", "!!!"]]
repeated_chars.map(&:join)
# => ["WW", "B", "WWWW", "BBB", "WWWWWWW", "B", "3333", "!!!!"]
As an alternative I found that I could create a new Regexp object to match one or more occurrences of each unique characters in the input string as follows:
input = "WWBWWWWBBBWWWWWWWB3333!!!!"
regexp = Regexp.new("#{input.chars.uniq.join("+|")}+")
#=> regexp created for this example will look like: /W+|B+|3+|!+/
and then use that Regex object as an argument for scan to split out all the repeated characters, as follows:
input.scan(regexp)
# => ["WW", "B", "WWWW", "BBB", "WWWWWWW", "B", "3333", "!!!!"]
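The interpolated pattern works for this input, but a character such as "." or "+" in the string would change its meaning. One possible way around that, sketched with a hypothetical helper run_pattern, is to escape each unique character and combine the pieces with Regexp.union:

```ruby
def run_pattern(input)
  # One "char+" alternative per unique character, escaped so that
  # metacharacters are matched literally.
  Regexp.union(input.chars.uniq.map { |c| /#{Regexp.escape(c)}+/ })
end

input = "a..b++"
p input.scan(run_pattern(input))  # => ["a", "..", "b", "++"]
```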
You can try this:
string str = "111221";
string pattern = @"(\d)(\1)+";
Hope this helps.