Rubyish way to invert a regular expression - ruby

Suppose a regex comes from calling code, outside of the current context, and then is passed on to another call implemented outside of the current project:
["1", "2"].grep(/1/) #=> ["1"]
Is there a simple, Rubyish way to achieve the following behavior when the call is being made?
["1", "2"].grep(/1/.negate) #=> ["2"]
This behavior is similar to switching the =~ operator with the !~ operator. It is possible to use #select or #reject, of course, or to open up or subclass Regexp. But I'm curious whether there is a way already available in Ruby to negate the matches returned by a regular expression in the manner above. Also, I don't care whether false or nil or true or the position of a match are involved in accomplishing this effect.
There is a theoretical question which is relevant but which goes beyond the simple considerations here.
EDIT: I get that iterators are the general way to go in Ruby for filtering a list, but people are overlooking the constraints of the question. Also, I think there is something nicely functional about the way the regex is being inverted. I don't see it as being overwrought or too-clever by half; it's plain-old object-oriented programming and the kind of thing that Ruby excels at doing.

["1", "2"].reject { |e| /1/ === e }

You can do something like this:
class NegatedRegex < Regexp
  def ===(other)
    !super
  end
end
class Regexp
  def negate
    NegatedRegex.new self
  end
end
There are probably other methods to reimplement, but for grep this is enough:
["1", "2"].grep(/1/.negate) #=> ["2"]
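If reopening Regexp feels too invasive, a lambda might serve the same purpose, since grep only relies on === and Proc#=== calls the proc with the candidate. The following is a minimal sketch; the helper name negate is hypothetical and not part of Ruby.

```ruby
# Hypothetical helper: wraps any pattern that responds to === in a lambda
# that returns the opposite answer. Proc#=== invokes the lambda, so the
# result can be handed straight to grep.
def negate(pattern)
  ->(candidate) { !(pattern === candidate) }
end

non_matches = ["1", "2"].grep(negate(/1/))
p non_matches #=> ["2"]
```

This leaves Regexp untouched, at the cost of passing a Proc rather than a Regexp to the downstream call.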

You can do them both in one go:
re = /1/
matches, non_matches = ["1", "2", "1", "3"].partition { |el| re =~ el }
p matches #=> ["1", "1"]
p non_matches #=> ["2", "3"]

This could be one way of doing it:
["1", "2", 3].select {|i| i !~ /1/ }
=> ["2", 3]

Grep has a dark brother, grep_v (available since Ruby 2.3), who does everything the same as grep, but vice versa.
["1", "2"].grep(/1/) #=> ["1"]
["1", "2"].grep_v(/1/) #=> ["2"]
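Like grep, grep_v compares with ===, so it should accept any pattern object, not only a regex. A small sketch with made-up sample data:

```ruby
# Sample data is invented for illustration.
mixed = ["1", "2", 3, :four]

# Keep the strings, then drop the ones matching the regex.
strings_without_one = mixed.grep(String).grep_v(/1/)
p strings_without_one #=> ["2"]

# A class works as a pattern too, because Class#=== tests membership.
non_strings = mixed.grep_v(String)
p non_strings #=> [3, :four]
```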

arr=["1","2"]
arr-arr.grep("1") # ["2"]
:)

Related

Ruby: Is it true that #map generally doesn't make sense with bang methods?

This question was inspired by this one:
Ruby: Why does this way of using map throw an error?
Someone pointed out the following:
map doesn't make much sense when used with ! methods.
You should either:
use map with gsub
or use each with gsub!
Can someone explain why that is?
Base object
Here's an array with strings as elements:
words = ['hello', 'world']
New array
If you want a new array with modified strings, you can use map with gsub:
new_words = words.map{|word| word.gsub('o','#') }
p new_words
#=> ["hell#", "w#rld"]
p words
#=> ["hello", "world"]
p new_words == words
#=> false
The original strings and the original array aren't modified.
Strings modified in place
If you want to modify the strings in place, you can use each with gsub!:
words.each{|word| word.gsub!('o','#') }
p words
#=> ["hell#", "w#rld"]
map and gsub!
new_words = words.map{|word| word.gsub!('o','#') }
p words
#=> ["hell#", "w#rld"]
p new_words
#=> ["hell#", "w#rld"]
p words == new_words
#=> true
p new_words.object_id
#=> 12704900
p words.object_id
#=> 12704920
Here, a new array is created, but the elements are the exact same ones!
It doesn't bring anything more than the previous examples. It creates a new Array for nothing. It also might confuse people reading your code by sending opposite signals:
gsub! will indicate that you want to modify existing objects
map will indicate that you don't want to modify existing objects.
Map is for building a new array without mutating the original. Each is for performing some action on each element of an array. Doing both at once is surprising.
>> arr = ["foo bar", "baz", "quux"]
=> ["foo bar", "baz", "quux"]
>> arr.map{|x| x.gsub!(' ', '-')}
=> ["foo-bar", nil, nil]
>> arr
=> ["foo-bar", "baz", "quux"]
Since !-methods generally have side effects (and only incidentally might return a value), each should be preferred to map when invoking a !-method.
An exception might be when you have a list of actions to perform. The method to perform the action might sensibly be named with a !, but you wish to collect the results in order to report which ones succeeded or failed.
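As a rough illustration of that exception, here is a sketch in which map collects the outcome of a bang-named action. The Task struct and its run! method are hypothetical, invented only for this example.

```ruby
# Hypothetical task object: run! performs a side effect (here it only
# simulates success or failure) and reports the outcome as a symbol.
Task = Struct.new(:name, :should_fail) do
  def run!
    raise "#{name} failed" if should_fail
    :ok
  rescue RuntimeError
    :failed
  end
end

tasks = [Task.new("backup", false), Task.new("deploy", true)]

# map is reasonable here because the return values are the point.
results = tasks.map { |task| [task.name, task.run!] }.to_h
p results #=> {"backup"=>:ok, "deploy"=>:failed}
```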

Ruby Split string at character difference using regex

I'm currently working on a problem that involves splitting a string into groups of identical consecutive characters.
For example,
"111223334456777" #=> ['111','22','333','44','5','6','777']
The way I am currently doing it is to loop over the string, compare each character with the previous one, and split the string that way.
res = []
str = "111223334456777"
group = str[0]
(1...str.length).each do |i|
  if str[i] != str[i-1]
    res << group
    group = str[i]
  else
    group << str[i]
  end
end
res << group
res #=> ['111','22','333','44','5','6','777']
I want to see if I can use regex to do this, which will make this process a lot easier. I understand I could just put this block of code in a method, but I'm curious if regex can be used here.
So what I want to do is
str.split(/some regex/)
to produce the same result. I thought about positive lookahead, but I can't figure out how to have regex recognize that the character is different.
Does anyone have an idea if this is possible?
The chunk_while method is what you're looking for here:
str.chars.chunk_while { |b,a| b == a }.map(&:join)
That will break anything where the current character a doesn't match the previous character b. If you want to restrict to just numbers you can do some pre-processing.
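One possible form of that pre-processing, assuming "restrict to just numbers" means discarding every non-digit character before chunking. Note that deleting characters merges runs that were separated only by non-digits.

```ruby
# Sample input is invented for illustration.
str = "11a122-33"

# String#delete with a negated character set removes everything except 0-9.
digits = str.delete("^0-9")
p digits #=> "1112233"

runs = digits.chars.chunk_while { |prev, cur| prev == cur }.map(&:join)
p runs #=> ["111", "22", "33"]
```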
There are a lot of very handy methods in Enumerable that are worth exploring, and each new version of Ruby seems to add more of them.
str = "111333224456777"
str.scan(/0+|1+|2+|3+|4+|5+|6+|7+|8+|9+/)
#=> ["111", "333", "22", "44", "5", "6", "777"]
or
str.gsub(/(\d)\1*/).to_a
#=> ["111", "333", "22", "44", "5", "6", "777"]
The latter uses the (underused) form of String#gsub that takes one argument and no block, returning an enumerator. It merely generates matches and has nothing to do with character replacement.
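To make the enumerator form a little more concrete, the sketch below shows that the one-argument gsub leaves the string alone and simply yields each match, so it can be chained with other Enumerable methods.

```ruby
str = "111333224456777"

# With a pattern and no block or replacement, gsub returns an Enumerator
# over the matched substrings.
matches = str.gsub(/(\d)\1*/)
p matches.class #=> Enumerator

# Each yielded element is the full match, so run lengths are easy to derive.
lengths = matches.map { |run| [run[0], run.size] }
p lengths #=> [["1", 3], ["3", 3], ["2", 2], ["4", 2], ["5", 1], ["6", 1], ["7", 3]]

p str #=> "111333224456777" (unchanged)
```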
For fun, here are several other ways to do that.
str.scan(/((\d)\2*)/).map(&:first)
str.split(/(?<=(.))(?!\1)/).each_slice(2).map(&:first)
str.each_char.slice_when(&:!=).map(&:join)
str.each_char.chunk(&:itself).map { |_,a| a.join }
str.each_char.chunk_while(&:==).map(&:join)
str.gsub(/(?<=(.))(?!\1)/, ' ').split
str.gsub(/(.)\1*/).reduce([], &:<<)
str[1..-1].each_char.with_object([str[0]]) {|c,a| a.last[-1]==c ? (a.last<<c) : a << c}
Another option uses the group_by method, which returns a hash with each individual digit as a key and an array of the grouped digits as the value.
"111223334456777".split('').group_by { |i| i }.values.map(&:join) #=> ["111", "22", "333", "44", "5", "6", "777"]
Although it doesn't use a regex, someone else may find it useful.
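One caveat worth checking before relying on group_by: it groups by value across the whole string, so it only agrees with the run-based approaches when no character reappears in a later run. A short comparison on a sample string:

```ruby
str = "wwwggfffw"

# group_by collects every "w" into one bucket, including the trailing one.
grouped = str.chars.group_by(&:itself).values.map(&:join)
p grouped #=> ["wwww", "gg", "fff"]

# chunk_while only joins adjacent equal characters.
chunked = str.chars.chunk_while { |a, b| a == b }.map(&:join)
p chunked #=> ["www", "gg", "fff", "w"]
```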

Ruby string char chunking

I have a string "wwwggfffw" and want to break it up into an array as follows:
["www", "gg", "fff", "w"]
Is there a way to do this with regex?
"wwwggfffw".scan(/((.)\2*)/).map(&:first)
scan is a little funny, as it will return either the match or the subgroups depending on whether there are subgroups; we need to use subgroups to ensure repetition of the same character ((.)\1), but we'd prefer it if it returned the whole match and not just the repeated letter. So we need to make the whole match into a subgroup so it will be captured, and in the end we need to extract just the match (without the other subgroup), which we do with .map(&:first).
EDIT to explain the regexp ((.)\2*) itself:
(        start group #1, consisting of
  (      start group #2, consisting of
    .    any one character
  )      and nothing else
  \2     followed by the content of the group #2
  *      repeated any number of times (including zero)
)        and nothing else.
So in wwwggfffw, (.) captures w into group #2; then \2* captures any additional number of w. This makes group #1 capture www.
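The difference in what scan returns with and without groups can be seen side by side in this small sketch:

```ruby
str = "wwwggfffw"

# No groups: scan returns the full matches.
no_groups = str.scan(/w+/)
p no_groups #=> ["www", "w"]

# One group: scan returns only the captured character for each match.
one_group = str.scan(/(.)\1*/)
p one_group #=> [["w"], ["g"], ["f"], ["w"]]

# Outer group added: the whole run is captured first, the single character second.
two_groups = str.scan(/((.)\2*)/)
p two_groups #=> [["www", "w"], ["gg", "g"], ["fff", "f"], ["w", "w"]]

runs = two_groups.map(&:first)
p runs #=> ["www", "gg", "fff", "w"]
```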
You can use back references, something like
'wwwggfffw'.scan(/((.)\2*)/).map{ |s| s[0] }
will work
Here's one that's not using regex but works well:
def chunk(str)
  chars = str.chars
  chars.inject([chars.shift]) do |arr, char|
    if arr[-1].include?(char)
      arr[-1] << char
    else
      arr << char
    end
    arr
  end
end
In my benchmarks it's faster than the regex answers here (with the example string you gave, at least).
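The benchmark itself isn't shown, but a comparison along these lines could be run with the standard Benchmark module. The method names chunk_inject and chunk_regex and the iteration count are assumptions made for this sketch, and timings will vary by machine and Ruby version.

```ruby
require "benchmark"

# Hypothetical name for the inject-based approach.
def chunk_inject(str)
  chars = str.chars
  chars.inject([chars.shift]) do |arr, char|
    if arr[-1].include?(char)
      arr[-1] << char
    else
      arr << char
    end
    arr
  end
end

# Hypothetical name for the regex-based approach.
def chunk_regex(str)
  str.scan(/((.)\2*)/).map(&:first)
end

sample = "wwwggfffw"
iterations = 20_000 # arbitrary; raise it for more stable numbers

Benchmark.bm(8) do |bm|
  bm.report("inject:") { iterations.times { chunk_inject(sample) } }
  bm.report("regex:")  { iterations.times { chunk_regex(sample) } }
end
```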
Another non-regex solution, this one using Enumerable#slice_when, which made its debut in Ruby v.2.2:
str.each_char.slice_when { |a,b| a!=b }.map(&:join)
#=> ["www", "gg", "fff", "w"]
Another option is:
str.scan(Regexp.new(str.squeeze.each_char.map { |c| "(#{c}+)" }.join)).first
#=> ["www", "gg", "fff", "w"]
Here the steps are as follows
s = str.squeeze
#=> "wgfw"
a = s.each_char
#=> #<Enumerator: "wgfw":each_char>
This enumerator generates the following elements:
a.to_a
#=> ["w", "g", "f", "w"]
Continuing
b = a.map { |c| "(#{c}+)" }
#=> ["(w+)", "(g+)", "(f+)", "(w+)"]
c = b.join
#=> "(w+)(g+)(f+)(w+)"
r = Regexp.new(c)
#=> /(w+)(g+)(f+)(w+)/
d = str.scan(r)
#=> [["www", "gg", "fff", "w"]]
d.first
#=> ["www", "gg", "fff", "w"]
Here's one more way of doing it without a regex:
'wwwggfffw'.chars.chunk(&:itself).map{ |s| s[1].join }
# => ["www", "gg", "fff", "w"]

What's the Ruby equivalent of map() for strings?

If you wanted to split a space-separated list of words, you would use
def words(text)
  return text.split.map{|word| word.downcase}
end
similarly to Python's list comprehension:
words("get out of here")
which returns ["get", "out", "of", "here"]. How can I apply a block to every character in a string?
Use String#chars:
irb> "asdf".chars.map { |ch| ch.upcase }
=> ["A", "S", "D", "F"]
Are you looking for something like this?
class String
  def map
    size.times.with_object('') {|i,s| s << yield(self[i])}
  end
end
"ABC".map {|c| c.downcase} #=> "abc"
"ABC".map(&:downcase) #=> "abc"
"abcdef".map {|c| (c.ord+1).chr} #=> "bcdefg"
"abcdef".map {|c| c*3} #=> "aaabbbcccdddeeefff"
I think the short answer to your question is "no, there's nothing like map for strings that operates a character at a time." Previous answerer had the cleanest solution in my book; simply create one by adding a function definition to the class.
BTW, there's also String#each_char which is an iterator across each character of a string. In this case String#chars gets you the same result because it returns an Array which also responds to each (or map), but I guess there may be cases where the distinction would be important.
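A brief sketch of that distinction: chars builds an Array up front, while each_char without a block returns an Enumerator, which may matter for long strings or when chaining. Joining the mapped characters gives a string back without patching String.

```ruby
text = "asdf"

p text.chars.class     #=> Array
p text.each_char.class #=> Enumerator

# Map over the characters, then join to get a String rather than an Array.
upcased = text.each_char.map(&:upcase).join
p upcased #=> "ASDF"
```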

Regex to catch groups of same digits in Ruby

Is it possible to catch all groups of identical digits in a string with a regex in Ruby? I'm not familiar with regex.
I mean: regex on "1112234444" will produce ["111", "22", "3", "4444"]
I know I can use (\d)(\1*), but it gives me 2 groups in each match: ["1", "11"], ["2", "2"], ["3", ""], ["4", "444"]
How can I get 1 group in each match? Thanks.
Here, give this a shot:
((\d)\2*)
You can use this regex
((\d)\2*)
group 1 catches your required value
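In Ruby, one way to apply that regex and keep only group 1 is scan followed by map, roughly like this:

```ruby
str = "1112234444"

# scan returns [group 1, group 2] for every match; group 1 is the whole run.
pairs = str.scan(/((\d)\2*)/)
p pairs #=> [["111", "1"], ["22", "2"], ["3", "3"], ["4444", "4"]]

runs = pairs.map(&:first)
p runs #=> ["111", "22", "3", "4444"]
```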
My first quick answer was rightfully criticized for having no explanation for the code. So here's another one, better in all respects ;-)
We exploit the fact that the elements whose runs we want are digits and they are easy to enumerate by hand. So we construct a readable regex which means "a run of zeros, or a run of ones, ... or a run of nines". And we use the right method for the job, String#scan:
irb> "1112234444".scan(/0+|1+|2+|3+|4+|5+|6+|7+|8+|9+/)
=> ["111", "22", "3", "4444"]
For the record, here's my original answer:
irb> s = "1112234444"
=> "1112234444"
irb> rx = /(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)/
=> /(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)/
irb> s.split(rx).reject(&:empty?)
=> ["111", "22", "3", "4444"]
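The reject(&:empty?) step is needed because split treats each run as a delimiter: the capture group makes split include the delimiters in its result, and the empty strings are the nothing that sits between them. A sketch of the intermediate value:

```ruby
s = "1112234444"
rx = /(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)/

# Captured delimiters are kept, interleaved with the (empty) fields between them.
raw = s.split(rx)
p raw #=> ["", "111", "", "22", "", "3", "", "4444"]

runs = raw.reject(&:empty?)
p runs #=> ["111", "22", "3", "4444"]
```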
