Ruby Split string at character difference using regex - ruby

I'm currently working on a problem that involves splitting a string into groups of identical adjacent characters.
For example,
"111223334456777" #=> ['111','22','333','44','5','6','777']
The way I am currently doing it is using an enumerator, comparing each character with the previous one, and splitting the string that way.
res = []
str = "111223334456777"
group = str[0]
(1...str.length).each do |i|
  if str[i] != str[i-1]
    res << group
    group = str[i]
  else
    group << str[i]
  end
end
res << group
res #=> ['111','22','333','44','5','6','777']
I want to see if I can use regex to do this, which will make this process a lot easier. I understand I could just put this block of code in a method, but I'm curious if regex can be used here.
So what I want to do is
str.split(/some regex/)
to produce the same result. I thought about positive lookahead, but I can't figure out how to have regex recognize that the character is different.
Does anyone have an idea if this is possible?

The chunk_while method is what you're looking for here:
str.chars.chunk_while { |b,a| b == a }.map(&:join)
That will break anything where the current character a doesn't match the previous character b. If you want to restrict to just numbers you can do some pre-processing.
There's a lot of very handy methods in Enumerable that are worth exploring, and each new version of Ruby seems to add more of them.
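The pre-processing mentioned above could look something like the following. This is only a sketch: the helper name digit_runs is made up for illustration, and it assumes that "restricting to numbers" means discarding every non-digit character before grouping.

```ruby
# Hypothetical helper (not from the answer): group adjacent equal digits,
# ignoring every other character in the input.
def digit_runs(str)
  str.scan(/\d/)                                  # pre-processing: keep only digit characters
     .chunk_while { |prev, cur| prev == cur }     # stay in the same chunk while digits repeat
     .map(&:join)
end

digit_runs("111223334456777") #=> ["111", "22", "333", "44", "5", "6", "777"]
digit_runs("11a1-22")         #=> ["111", "22"]
```

Note that dropping the non-digits first means runs separated only by other characters are merged, as the second call shows. Whether that is desirable depends on the input.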

str = "111333224456777"
str.scan(/0+|1+|2+|3+|4+|5+|6+|7+|8+|9+/)
#=> ["111", "333", "22", "44", "5", "6", "777"]
or
str.gsub(/(\d)\1*/).to_a
#=> ["111", "333", "22", "44", "5", "6", "777"]
The latter uses the (underused) form of String#gsub that takes one argument and no block, returning an enumerator. It merely generates matches and has nothing to do with character replacement.
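To make the enumerator behaviour concrete, here is a small sketch. The helper name run_enumerator is hypothetical and exists only so the result can be inspected.

```ruby
# Hypothetical helper: returns what the one-argument, block-less form of gsub produces.
def run_enumerator(str)
  str.gsub(/(\d)\1*/)   # no replacement and no block, so nothing is substituted
end

str  = "111333224456777"
runs = run_enumerator(str)
runs.class #=> Enumerator
runs.to_a  #=> ["111", "333", "22", "44", "5", "6", "777"]
str        #=> "111333224456777" (the original string is unchanged)
```

Because the result is an ordinary Enumerator, any Enumerable method can be chained onto it instead of to_a.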
For fun, here are several other ways to do that.
str.scan(/((\d)\2*)/).map(&:first)
str.split(/(?<=(.))(?!\1)/).each_slice(2).map(&:first)
str.each_char.slice_when(&:!=).map(&:join)
str.each_char.chunk(&:itself).map { |_,a| a.join }
str.each_char.chunk_while(&:==).map(&:join)
str.gsub(/(?<=(.))(?!\1)/, ' ').split
str.gsub(/(.)\1*/).reduce([], &:<<)
str[1..-1].each_char.with_object([str[0]]) {|c,a| a.last[-1]==c ? (a.last<<c) : a << c}

Another option utilises the group_by method, which returns a hash with each distinct character as a key and an array of its occurrences as the value.
"111223334456777".split('').group_by { |i| i }.values.map(&:join) #=> ["111", "22", "333", "44", "5", "6", "777"]
Although it doesn't use a regex, someone else may find it useful. Note that group_by collects every occurrence of a character, not just adjacent ones, so it matches the desired output only when equal characters are contiguous.
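A quick way to see how group_by differs from the run-based approaches is to feed both an input where a character reappears later. The two method names below are made up for this comparison.

```ruby
# Hypothetical name: the group_by approach, which gathers all occurrences of a character.
def group_all(str)
  str.chars.group_by(&:itself).values.map(&:join)
end

# Hypothetical name: the chunk_while approach, which only groups adjacent repeats.
def group_adjacent(str)
  str.chars.chunk_while { |a, b| a == b }.map(&:join)
end

group_all("1121")      #=> ["111", "2"]
group_adjacent("1121") #=> ["11", "2", "1"]
```

For the question's sample string the two agree, because every digit there appears in a single run.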

Related

How to make a repeated string to the left be deleted without using While?

For example, I have this string of only numbers:
0009102
If I convert it to integer Ruby automatically gives me this value:
9102
That's correct. But my program gives me different types of numbers:
2229102 desired output => 9102
9999102 desired output => 102
As you can see, I want the leading 2s and 9s treated like leading zeros, which are deleted automatically. It is easy to delete them with a while loop, but I must avoid that.
In other words, how do you make 'n' on the left be considered a zero for Ruby?
"2229102".sub(/\A(\d)\1*/, "") #=> "9102"
The regular expression reads, "match the first digit in the string (\A is the beginning-of-string anchor) in capture group 1 ((\d)), followed by zero or more characters (*) that equal the contents of capture group 1 (\1)". String#sub then replaces that match with an empty string.
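As a rough sketch, the same expression can be wrapped in a method and tried against the inputs from the question. The method name strip_leading_run is an assumption made for illustration.

```ruby
# Hypothetical helper: remove the run of identical digits at the start of the string.
def strip_leading_run(str)
  str.sub(/\A(\d)\1*/, "")   # sub (not gsub) is enough: \A can only match once
end

strip_leading_run("0009102") #=> "9102"
strip_leading_run("2229102") #=> "9102"
strip_leading_run("9999102") #=> "102"
strip_leading_run("9102")    #=> "102"
```

The last call shows a consequence worth keeping in mind: a single leading digit counts as a run of length one, so it is removed as well.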
Try with Enumerable#chunk_while:
s = '222910222'
s.each_char.chunk_while(&:==).drop(1).join
#=> "910222"
Where s.each_char.chunk_while(&:==).to_a #=> [["2", "2", "2"], ["9"], ["1"], ["0"], ["2", "2", "2"]]
Similar to the solution of iGian you could also use drop_while.
s = '222910222'
s.each_char.each_cons(2).drop_while { |a, b| a == b }.map(&:last).join
#=> "910222"
# or
s.each_char.drop_while.with_index(-1) { |c, i| i < 0 || c == s[i] }.join
#=> "910222"
You can also try this way:
s = '9999102938'
s.chars.then{ |chars| chars[chars.index(chars.uniq[1])..-1] }.join
#=> "102938"

How to extract numbers from an array of strings? (I'm using regex)

I have an array of strings
test= ["ChangeServer<br/>Test: 3-7<br/>PinCode:DFSFSDFB04008<br/>ShipCode:DFADFSDFSDM-000D3<br/>SomeCode:sdfsdf", "werwerwe", "adfsdfsd",
"sdfsdfsdfsd<br/>Test: 9<br/>PinCode:ADFSDF4NS0<br/>ShipCode:FADFSDFD-9ZM170<br/>"]
I want to grab the numbers after Test:, which in the above array of strings are 3, 4, 5, 6, 7 (the range 3-7) and 9.
Desired output:
["3","4","5","6","7","9"]
What I tried so far
test.join.scan(/(?<=Test: )[0-9]+/)
#=> ["3", "9"]
How to deal with range?
Second test case:
test= ["ChangeServer<br/>Test: 3-7<br/>PinCode:DFSFSDFB04008<br/>ShipCode:DFADFSDFSDM-000D3<br/>SomeCode:sdfsdf", "werwerwe", "adfsdfsd",
"sdfsdfsdfsd<br/>Test: 9<br/>PinCode:ADFSDF4NS0<br/>ShipCode:FADFSDFD-9ZM170<br/>", "sdfsdfsdfsd<br/>Test: 15-18<br/>PinCode:ADFSDF4NS0<br/>ShipCode:FADFSDFD-9ZM170<br/>"]
Desired output:
["3","4","5","6","7","9","15","16","17","18"]
There are a lot of ways you could solve this. I'd probably do it this way:
test.flat_map do |s|
  _, m, n = *s.match(/Test:\s*(\d+)(?:-(\d+))?/)
  m ? (m..n||m).to_a : []
end
See it in action on repl.it: https://repl.it/JFwT/13
Or, more succinctly:
test.flat_map {|s| s.match(/Test:\s*(\d+)(?:-(\d+))?/) { $1..($2||$1) }.to_a }
https://repl.it/JFwT/11
You could create a new Range for each range found (i.e. N-N) using the splat operator (i.e. *) and combine the results, like this 1:
test.join.scan(/(?<=Test: )[0-9-]+/)
.flat_map { |r| Range.new(*r.split('-').values_at(0, -1)).to_a }
#=> ["3", "4", "5", "6", "7", "9"]
This will work for both examples.
1 Notice the added - next to 0-9 in the regex.
Is there a way to include both Test: 1 (with a space between Test: and 1) and Test:1 (without a space between Test: and 1)?
Yes, update your regex (change where space is placed) and add an additional map to get rid of those spaces:
test.join
.scan(/(?<=Test:)[ 0-9-]+/)
.map(&:strip)
.flat_map { |r| Range.new(*r.split('-').values_at(0, -1)).to_a }
And here's a shortened option using two captures in the regex, as suggested by Jordan.
test.join
.scan(/Test:\s*(\d+)(?:-(\d+))?/)
.flat_map { |m,n| (m..n||m).to_a }
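To check the two-capture version against inputs with and without a space after Test:, it can be wrapped in a method. This is a sketch only: the method name test_numbers and the shortened sample strings are invented for illustration and are not the data from the question.

```ruby
# Hypothetical helper wrapping the two-capture approach.
def test_numbers(strings)
  strings.join
         .scan(/Test:\s*(\d+)(?:-(\d+))?/)   # each match yields [first, last_or_nil]
         .flat_map { |first, last| (first..(last || first)).to_a }
end

# Made-up sample data in the same shape as the question's array.
sample = ["x<br/>Test: 3-7<br/>Pin:A1", "noise", "y<br/>Test:9<br/>", "z<br/>Test: 15-18<br/>"]
test_numbers(sample) #=> ["3", "4", "5", "6", "7", "9", "15", "16", "17", "18"]
```

The second capture is nil when there is no "-N" part, which is why the block falls back to the first capture to build a one-element range.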
Just out of curiosity:
test.
join.
scan(/(?<=Test: )[\d-]+/).
map { |e| e.gsub(/\A\d+\Z/) { |m| "#{m}..#{m}" }.gsub('-', '..') }.
map(&method(:eval)).
flat_map(&:to_a)

Ruby multiple arrays comparison

I am trying to replace a component in a legacy system with a Ruby script. One piece of this system accepts a string that contains ASCII '0's and '1's, apparently to represent a bitfield of locations. It then converts these locations to a string of comma-separated two-letter codes (mostly US states).
I have a Ruby method that does this, but it doesn't seem like I am doing it the best way Ruby could. Ruby has a ton of built-in ways to iterate over and manipulate arrays, and I feel I am not using them to their fullest:
# input "0100010010" should return "AZ,PR,WY"
def locations(bits)
  # Shortened from hundreds for this post. :u? is for locations I haven't figured out yet.
  fields = [ :u?, :az, :de, :mi, :ne, :wy, :u?, :u?, :pr, :u? ]
  matches = []
  counter = 0
  fields.each { |f|
    case bits[counter]
    when '1' then matches << f
    when '0' then nil
    else raise "Unknown value in location bit field"
    end
    counter += 1
  }
  raise "Unknown field bit set" if matches.include?(:u?)
  matches.sort.join(",").upcase
end
What would be a better way to do this?
It seems counter to the "Ruby way" to have counter variables floating around. I tried looking at ways to use Array#map, and I could find nothing obvious. I also tried Googling for Ruby Idioms pertaining to Arrays.
matches = fields.select.with_index { |_,i| bits[i] == '1' }
# => [:az, :wy, :pr]
To verify that bits holds only 0s and 1s, you can still do
raise "Unknown value in location bit field" unless bits.match(/\A[01]*\z/)
Use Array#zip and Array#reduce
bits.split('').zip(fields).reduce([]) do |a, (k, v)|
  k == '1' ? a << v.to_s.upcase : a
end.sort.join(',')
# => "AZ,PR,WY"
Explanation:
1) split bits into an array of chars:
bits.split('') # => ["0", "1", "0", "0", "0", "1", "0", "0", "1", "0"]
2) zip both arrays to generate an array of pairs (by position)
bits.split('').zip(fields) # => [["0", :u?], ["1", :az], ["0", :de], ["0", :mi],
# ["0", :ne], ["1", :wy], ["0", :u?], ["0", :u?], ["1", :pr], ["0", :u?]]
3) reduce the array taking the desired elements according to the conditions
.reduce([]) do |a, (k, v)|
  k == '1' ? a << v.to_s.upcase : a
end # => ["AZ", "WY", "PR"]
4) sort the resulting array and join their elements to get the expected string
.sort.join(',') # => "AZ,PR,WY"
You could combine each_with_index, map and compact:
fields.each_with_index.map do |v,i|
  v if bits[i] == '1'
end.compact
each_with_index returns an enumerator that yields each value together with its integer index.
map uses the return value of a passed block to yield an output value for each input value. The block returns the value if its corresponding bit is set, and implicitly returns nil if it is not.
compact returns a copy of the output array with all of the nil values removed.
For more detail see the docs for Enumerable and Array.
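Putting the pieces from these answers together, the original method might be reshaped roughly as follows. This is a sketch under assumptions, not a definitive rewrite: it keeps the question's shortened field list and error messages, and it picks select.with_index as the filtering step, which is just one of the options shown above.

```ruby
def locations(bits)
  # Shortened field list from the question; :u? marks locations not yet identified.
  fields = [:u?, :az, :de, :mi, :ne, :wy, :u?, :u?, :pr, :u?]

  # Validate the whole string up front instead of inside the loop.
  raise "Unknown value in location bit field" unless bits.match?(/\A[01]*\z/)

  # Keep each field whose bit at the same position is '1'; no counter variable needed.
  matches = fields.select.with_index { |_, i| bits[i] == '1' }
  raise "Unknown field bit set" if matches.include?(:u?)

  matches.sort.join(",").upcase
end

locations("0100010010") #=> "AZ,PR,WY"
```

One behavioural difference from the original: validation here covers the entire input string, whereas the original only inspected as many characters as there are fields.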

Rubyish way to invert a regular expression

Suppose a regex comes from calling code, outside of the current context, and then is passed on to another call implemented outside of the current project:
["1", "2"].grep(/1/) #=> ["1"]
Is there a simple, Rubyish way to achieve the following behavior when the call is being made?
["1", "2"].grep(/1/.negate) #=> ["2"]
This behavior is similar to switching the =~ operator with the !~ operator. It is possible to use #select or #reject, of course, or to open up or subclass Regexp. But I'm curious whether there is a way already available in Ruby to negate the matches returned by a regular expression in the manner above. Also, I don't care whether false or nil or true or the position of a match are involved in accomplishing this effect.
There is a theoretical question which is relevant but which goes beyond the simple considerations here.
EDIT: I get that iterators are the general way to go in Ruby for filtering a list, but people are overlooking the constraints of the question. Also, I think there is something nicely functional about the way the regex is being inverted. I don't see it as being overwrought or too-clever by half; it's plain-old object-oriented programming and the kind of thing that Ruby excels at doing.
["1", "2"].reject { |e| /1/ === e }
You can do something like this:
class NegatedRegex < Regexp
  def ===(other)
    !super
  end
end
class Regexp
  def negate
    NegatedRegex.new self
  end
end
There are probably other methods to reimplement, but for grep this is enough:
["1", "2"].grep(/1/.negate) #=> ["2"]
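If reopening Regexp is undesirable, a similar effect may be possible without subclassing, because grep only needs an object that responds to ===, and Proc#=== calls the proc with the value. The sketch below is an alternative to the answer's approach, not part of it, and the helper name negate is an assumption.

```ruby
# Hypothetical helper: wrap any pattern in a lambda that answers the
# opposite of pattern === value.
def negate(pattern)
  ->(value) { !(pattern === value) }
end

["1", "2"].grep(negate(/1/))       #=> ["2"]
["1", "2", "12"].grep(negate(/1/)) #=> ["2"]
```

The trade-off is that the result is a lambda rather than a Regexp, so it only helps where the receiving code relies on === (as grep and case/when do), not where it calls =~ or match directly.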
You can do them both in one go:
re = /1/
matches, non_matches = ["1", "2", "1", "3"].partition { |el| re =~ el }
p matches #=> ["1", "1"]
p non_matches #=> ["2", "3"]
This could be one way of doing it:
["1", "2", 3].select {|i| i !~ /1/ }
=> ["2", 3]
Grep has a dark brother who does everything the same as grep, but vice versa.
["1", "2"].grep(/1/) #=> ["1"]
["1", "2"].grep_v(/1/) #=> ["2"]
arr=["1","2"]
arr-arr.grep("1") # ["2"]
:)

Regex to catch groups of same digits in Ruby

Is it possible to catch all groups of same digits in a string with a regex in Ruby? I'm not familiar with regex.
I mean: regex on "1112234444" will produce ["111", "22", "3", "4444"]
I know I can use (\d)(\1*), but it gives me 2 groups in each match: ["1", "11"], ["2", "2"], ["3", ""], ["4", "444"]
How can I get 1 group in each match? Thanks.
Here, give this a shot:
((\d)\2*)
You can use this regex
((\d)\2*)
group 1 catches your required value
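In Ruby, one way to pull out just group 1 is to combine this regex with String#scan, which returns an array of captures for each match. A minimal sketch follows; the helper name digit_groups is made up for illustration.

```ruby
# Hypothetical helper: return each run of identical digits as one string.
def digit_groups(str)
  str.scan(/((\d)\2*)/)   # each match is [whole_run, single_digit]
     .map(&:first)        # keep only group 1, the whole run
end

digit_groups("1112234444") #=> ["111", "22", "3", "4444"]
```

The inner group must be referenced as \2 because the outer parentheses open first and therefore become group 1.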
My first quick answer was rightfully criticized for having no explanation for the code. So here's another one, better in all respects ;-)
We exploit the fact that the elements whose runs we want are digits and they are easy to enumerate by hand. So we construct a readable regex which means "a run of zeros, or a run of ones, ... or a run of nines". And we use the right method for the job, String#scan:
irb> "1112234444".scan(/0+|1+|2+|3+|4+|5+|6+|7+|8+|9+/)
=> ["111", "22", "3", "4444"]
For the record, here's my original answer:
irb> s = "1112234444"
=> "1112234444"
irb> rx = /(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)/
=> /(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)/
irb> s.split(rx).reject(&:empty?)
=> ["111", "22", "3", "4444"]
