Regex to catch groups of same digits in Ruby - ruby

Is it possible to catch all groups of the same digit in a string with a regex in Ruby? I'm not familiar with regex.
I mean: regex on "1112234444" will produce ["111", "22", "3", "4444"]
I know, I can use (\d)(\1*), but it only gives me 2 groups in each match. ["1", "11"], ["2", "2"], ["3", -], ["4", "444"]
How can I get 1 group in each match? Thanks.

Here, give this a shot:
((\d)\2*)

You can use this regex
((\d)\2*)
group 1 catches your required value
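As a minimal sketch of how this pattern might be used with String#scan (the helper name digit_runs is only illustrative and does not come from the answers above): scan returns one array per match holding both capture groups, so keeping the first element of each leaves just the full run.

```ruby
# Hypothetical helper: returns the runs of identical digits found in +str+.
def digit_runs(str)
  # Group 1 is the whole run; group 2 is the single digit that \2* repeats.
  str.scan(/((\d)\2*)/).map(&:first)
end

p digit_runs("1112234444")  # ["111", "22", "3", "4444"]
```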

My first quick answer was rightfully criticized for having no explanation for the code. So here's another one, better in all respects ;-)
We exploit the fact that the elements whose runs we want are digits and they are easy to enumerate by hand. So we construct a readable regex which means "a run of zeros, or a run of ones, ... or a run of nines". And we use the right method for the job, String#scan:
irb> "1112234444".scan(/0+|1+|2+|3+|4+|5+|6+|7+|8+|9+/)
=> ["111", "22", "3", "4444"]
For the record, here's my original answer:
irb> s = "1112234444"
=> "1112234444"
irb> rx = /(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)/
=> /(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)/
irb> s.split(rx).reject(&:empty?)
=> ["111", "22", "3", "4444"]

Related

Ruby Split string at character difference using regex

I'm currently working on a problem that involves splitting a string by each group of characters.
For example,
"111223334456777" #=> ['111','22','333','44','5','6','777']
The way I am currently doing it is by iterating over the indices and comparing each character with the one before it, splitting the string that way.
res = []
str = "111223334456777"
group = str[0]
(1...str.length).each do |i|
  if str[i] != str[i-1]
    res << group
    group = str[i]
  else
    group << str[i]
  end
end
res << group
res #=> ['111','22','333','44','5','6','777']
I want to see if I can use regex to do this, which will make this process a lot easier. I understand I could just put this block of code in a method, but I'm curious if regex can be used here.
So what I want to do is
str.split(/some regex/)
to produce the same result. I thought about positive lookahead, but I can't figure out how to have regex recognize that the character is different.
Does anyone have an idea if this is possible?
The chunk_while method is what you're looking for here:
str.chars.chunk_while { |b,a| b == a }.map(&:join)
That will break anything where the current character a doesn't match the previous character b. If you want to restrict to just numbers you can do some pre-processing.
There's a lot of very handy methods in Enumerable that are worth exploring, and each new version of Ruby seems to add more of them.
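The pre-processing mentioned above is not spelled out in the answer. One possible sketch, assuming "restrict to just numbers" means ignoring every non-digit character, is to extract the digit stretches first and chunk each of them separately. The helper name digit_groups is hypothetical, and chunk_while needs Ruby 2.3 or later.

```ruby
# Hypothetical helper: groups adjacent identical digits and ignores all other characters.
def digit_groups(str)
  # scan(/\d+/) yields each unbroken stretch of digits; each stretch is chunked on its own,
  # so runs separated by a non-digit are never merged.
  str.scan(/\d+/).flat_map do |digits|
    digits.chars.chunk_while { |prev, cur| prev == cur }.map(&:join)
  end
end

p digit_groups("112a2233")  # ["11", "2", "22", "33"]
```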
str = "111333224456777"
str.scan /0+|1+|2+|3+|4+|5+|6+|7+|8+|9+/
#=> ["111", "333", "22", "44", "5", "6", "777"]
or
str.gsub(/(\d)\1*/).to_a
#=> ["111", "333", "22", "44", "5", "6", "777"]
The latter uses the (underused) form of String#gsub that takes one argument and no block, returning an enumerator. It merely generates matches and has nothing to do with character replacement.
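To illustrate that this form of gsub only enumerates matches, here is a small sketch (the helper name run_lengths is made up for the example). It maps over the enumerator directly instead of converting it to an array, and the original string is left unchanged.

```ruby
# Hypothetical helper: lengths of each run of identical digits in +str+.
def run_lengths(str)
  matches = str.gsub(/(\d)\1*/)  # one argument, no block: returns an Enumerator
  matches.map(&:length)          # each yielded element is the full matched run
end

str = "111333224456777"
p run_lengths(str)  # [3, 3, 2, 2, 1, 1, 3]
p str               # "111333224456777" (nothing was replaced)
```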
For fun, here are several other ways to do that.
str.scan(/((\d)\2*)/).map(&:first)
str.split(/(?<=(.))(?!\1)/).each_slice(2).map(&:first)
str.each_char.slice_when(&:!=).map(&:join)
str.each_char.chunk(&:itself).map { |_,a| a.join }
str.each_char.chunk_while(&:==).map(&:join)
str.gsub(/(?<=(.))(?!\1)/, ' ').split
str.gsub(/(.)\1*/).reduce([], &:<<)
str[1..-1].each_char.with_object([str[0]]) {|c,a| a.last[-1]==c ? (a.last<<c) : a << c}
Another option uses the group_by method, which returns a hash with each distinct character as a key and an array of its occurrences as the value.
"111223334456777".split('').group_by { |i| i }.values.map(&:join) #=> ["111", "22", "333", "44", "5", "6", "777"]
Although it doesn't implement a regex, someone else may find it useful.
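One caveat worth checking before relying on group_by: it collects every occurrence of a character, not just adjacent ones, so it only agrees with the other answers when no character appears in two separate runs. The sketch below compares the two behaviours; both helper names are hypothetical.

```ruby
# Hypothetical helper: groups ALL occurrences of each character, adjacent or not.
def group_all(str)
  str.chars.group_by(&:itself).values.map(&:join)
end

# Hypothetical helper: groups only adjacent identical characters.
def group_adjacent(str)
  str.chars.chunk_while { |a, b| a == b }.map(&:join)
end

p group_all("11211")       # ["1111", "2"]
p group_adjacent("11211")  # ["11", "2", "11"]
```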

Find few adjacent identical numbers

Could you help me?
I need a regex that splits strings like
"11231114"
to
['11', '2', '3', '111', '4']
You could use String#scan as follows:
"11231114".scan(/((\d)\2*)/).map(&:first)
#=> ["11", "2", "3", "111", "4"]
You could pass a block to String#scan pushing the match group to an array.
matches = []
"11231114".scan(/((\d)\2*)/) do |n,r| matches << n end
In Javascript you can do:
var m = "11231114".match(/(\d)\1*/g)
//=> ["11", "2", "3", "111", "4"]
You can use a similar approach in whatever language or tool you're using.
The approach is to capture a digit using (\d) and then match any further repetitions of that same digit with the back-reference \1*.
You could do something like this,
> str = "11231114"
=> "11231114"
> str1 = str.gsub(/(?<=(\d))(?!\1)/, "*")
=> "11*2*3*111*4*"
> str1.split('*')
=> ["11", "2", "3", "111", "4"]
There is slice_when in Ruby 2.2:
"11231114".chars.slice_when { |x, y| x != y }.map(&:join)

String#split strange behavior [duplicate]

I observed a strange behavior of the split method on a String.
"1..2".split('..') # => ['1', '2']
"1..2".split('..', 2) # => ['1', '2']
"..2".split('..') # => ['', '2']
"..2".split('..', 2) # => ['', '2']
Everything is as expected, but now:
"1..".split('..') # => ['1']
"1..".split('..', 2) # => ['1', '']
I would expect the first to return the same as the second.
Does anyone have a good explanation, why "1..".split('..') returns an array with just one element? Or is it an inconsistency in Ruby? What do you think about that?
According to the Ruby String documentation for split:
If the limit parameter is omitted, trailing null fields are suppressed.
Regarding the limit parameter, the Ruby documentation isn't totally complete. Here is a little more detail:
If limit is positive, split returns at most that number of fields. The last element of the returned array is the "rest of the string", or a single null string ("") if there are fewer fields than limit and there's a trailing delimiter in the original string.
Examples:
"2_3_4_".split('_',3)
=> ["2", "3", "4_"]
"2_3_4_".split('_',4)
=> ["2", "3", "4", ""]
If limit is zero [not mentioned in the documentation], split appears to return all of the parsed fields, and no trailing null string ("") element if there is a trailing delimiter in the original string. I.e., it behaves as if limit were not present. (It may be implemented as a default value.)
Example:
"2_3_4_".split('_',0)
=> ["2", "3", "4"]
If limit is negative, split returns all of the parsed fields and a trailing null string element if there is a trailing delimiter in the original string.
Example:
"2_3_4".split('_',-2)
=> ["2", "3", "4"]
"2_3_4".split('_',-5)
=> ["2", "3", "4"]
"2_3_4_".split('_',-2)
=> ["2", "3", "4", ""]
"2_3_4_".split('_',-5)
=> ["2", "3", "4", ""]
It would seem that something a little more useful or interesting could have been done with the negative limit.
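Applied to the string from the question, the three cases described above can be compared side by side. This is only a sketch, and the helper name split_variants is made up for the illustration.

```ruby
# Hypothetical helper: shows how the limit argument affects trailing empty fields.
def split_variants(str, sep)
  {
    omitted:  str.split(sep),      # trailing empty fields are dropped
    zero:     str.split(sep, 0),   # behaves like omitting the limit
    negative: str.split(sep, -1)   # every field is kept, including trailing empty ones
  }
end

variants = split_variants("1..", "..")
p variants[:omitted]   # ["1"]
p variants[:zero]      # ["1"]
p variants[:negative]  # ["1", ""]
```

So a negative limit is one way to get the result the question expected from the plain call.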

Rubyish way to invert a regular expression

Suppose a regex comes from calling code, outside of the current context, and then is passed on to another call implemented outside of the current project:
["1", "2"].grep(/1/) #=> ["1"]
Is there a simple, Rubyish way to achieve the following behavior when the call is being made?
["1", "2"].grep(/1/.negate) #=> ["2"]
This behavior is similar to switching the =~ operator with the !~ operator. It is possible to use #select or #reject, of course, or to open up or subclass Regexp. But I'm curious whether there is a way already available in Ruby to negate the matches returned by a regular expression in the manner above. Also, I don't care whether false or nil or true or the position of a match are involved in accomplishing this effect.
There is a theoretical question which is relevant but which goes beyond the simple considerations here.
EDIT: I get that iterators are the general way to go in Ruby for filtering a list, but people are overlooking the constraints of the question. Also, I think there is something nicely functional about the way the regex is being inverted. I don't see it as being overwrought or too-clever by half; it's plain-old object-oriented programming and the kind of thing that Ruby excels at doing.
["1", "2"].reject { |e| /1/ === e }
You can do something like this:
class NegatedRegex < Regexp
  def ===(other)
    !super
  end
end
class Regexp
  def negate
    NegatedRegex.new self
  end
end
There are probably other methods to reimplement, but for grep this is enough:
["1", "2"].grep(/1/.negate) #=> ["2"]
You can do them both in one go:
re = /1/
matches, non_matches = ["1", "2", "1", "3"].partition { |el| re =~ el }
p matches #=> ["1", "1"]
p non_matches #=> ["2", "3"]
This could be one way of doing it:
["1", "2", 3].select {|i| i !~ /1/ }
=> ["2", 3]
Grep has a dark brother who does everything the same as grep, but vice versa.
["1", "2"].grep(/1/) #=> ["1"]
["1", "2"].grep_v(/1/) #=> ["2"]
arr=["1","2"]
arr-arr.grep("1") # ["2"]
:)
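If the goal is to hand something pattern-like to grep without reopening Regexp, one more possibility is a lambda, because grep compares with === and Proc#=== calls the proc. This is only a sketch: the helper name negate is hypothetical, Regexp#match? needs Ruby 2.4 or later, and the trick works only where the receiving code relies on === (grep, case/when), not where it calls Regexp methods directly.

```ruby
# Hypothetical helper: wraps a regex in a lambda that is true for strings that do NOT match.
def negate(regex)
  ->(str) { !regex.match?(str) }
end

p ["1", "2"].grep(/1/)          # ["1"]
p ["1", "2"].grep(negate(/1/))  # ["2"]
```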

Match sequences of consecutive characters in a string

I have the string "111221" and want to match all sets of consecutive equal integers: ["111", "22", "1"].
I know that there is a special regex thingy to do that but I can't remember and I'm terrible at Googling.
Using regex in Ruby 1.8.7+:
p s.scan(/((\d)\2*)/).map(&:first)
#=> ["111", "22", "1"]
This works because (\d) captures any digit, and then \2* matches zero or more repetitions of whatever that group (the second opening parenthesis) matched. The outer (…) is needed to capture the entire match as a result in scan. Finally, scan alone returns:
[["111", "1"], ["22", "2"], ["1", "1"]]
…so we need to run through and keep just the first item in each array. In Ruby 1.8.6+ (which doesn't have Symbol#to_proc for convenience):
p s.scan(/((\d)\2*)/).map{ |x| x.first }
#=> ["111", "22", "1"]
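If mapping over the nested arrays feels indirect, another possibility (a sketch, not part of the answer above) is the block form of scan. Inside the block, $& holds the whole current match, so the outer capture group is no longer needed. The helper name runs_via_block is made up for the example.

```ruby
# Hypothetical helper: collects each full match from inside a scan block.
def runs_via_block(str)
  runs = []
  # The block receives the capture group, but $& is the entire matched run.
  str.scan(/(\d)\1*/) { runs << $& }
  runs
end

p runs_via_block("111221")  # ["111", "22", "1"]
```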
With no Regex, here's a fun one (matching any char) that works in Ruby 1.9.2:
p s.chars.chunk{|c|c}.map{ |n,a| a.join }
#=> ["111", "22", "1"]
Here's another version that should work even in Ruby 1.8.6:
p s.scan(/./).inject([]){|a,c| (a.last && a.last[0]==c[0] ? a.last : a)<<c; a }
# => ["111", "22", "1"]
"111221".gsub(/(.)(\1)*/).to_a
#=> ["111", "22", "1"]
This uses the form of String#gsub that does not have a block and therefore returns an enumerator. It appears gsub was bestowed with that option in v2.0.
I found that this works, it first matches each character in one group, and then it matches any of the same character after it. This results in an array of two element arrays, with the first element of each array being the initial match, and then the second element being any additional repeated characters that match the first character. These arrays are joined back together to get an array of repeated characters:
input = "WWBWWWWBBBWWWWWWWB3333!!!!"
repeated_chars = input.scan(/(.)(\1*)/)
# => [["W", "W"], ["B", ""], ["W", "WWW"], ["B", "BB"], ["W", "WWWWWW"], ["B", ""], ["3", "333"], ["!", "!!!"]]
repeated_chars.map(&:join)
# => ["WW", "B", "WWWW", "BBB", "WWWWWWW", "B", "3333", "!!!!"]
As an alternative I found that I could create a new Regexp object to match one or more occurrences of each unique characters in the input string as follows:
input = "WWBWWWWBBBWWWWWWWB3333!!!!"
regexp = Regexp.new("#{input.chars.uniq.join("+|")}+")
#=> regexp created for this example will look like: /W+|B+|3+|!+/
and then use that Regex object as an argument for scan to split out all the repeated characters, as follows:
input.scan(regexp)
# => ["WW", "B", "WWWW", "BBB", "WWWWWWW", "B", "3333", "!!!!"]
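Building the pattern by plain string interpolation assumes that none of the input characters has a special meaning in a regex. A hedged variant that escapes each character first might look like this (the helper name run_regexp is hypothetical):

```ruby
# Hypothetical helper: builds a regex matching a run of any distinct character of +input+.
def run_regexp(input)
  # Regexp.escape protects characters such as "." and "+" that are regex metacharacters.
  alternatives = input.chars.uniq.map { |c| "#{Regexp.escape(c)}+" }
  Regexp.new(alternatives.join("|"))
end

input = "a..b++"
p input.scan(run_regexp(input))  # ["a", "..", "b", "++"]
```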
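You can try this (C#); note that \1* rather than (\1)+ is needed so that a single digit such as the trailing "1" also matches:
string str = "111221";
string pattern = @"(\d)\1*";
Hope this helps.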
