Ruby regex matching overlapping terms - ruby

I'm using:
r = /(hell|hello)/
"hello".scan(r) #=> [["hell"]]
but I would like to get [ "hell", "hello" ].
http://rubular.com/r/IxdPKYSUAu

You can use a fancier capture:
'hello'.match(/((hell)o)/).captures
=> ["hello", "hell"]

No, regexes don't work like that. But you can do something like this:
terms = %w{hell hello}.map{|t| /#{t}/}
str = "hello"
matches = terms.map{|t| str.scan t}
puts matches.flatten.inspect # => ["hell", "hello"]
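If the goal is every occurrence of every term, including occurrences that overlap each other, one option is to scan once per term with the term wrapped in a capturing lookahead. This is only a minimal sketch: overlapping_matches is a hypothetical helper name, and it assumes the terms are literal strings (hence Regexp.escape).

```ruby
# Hypothetical helper: collect every occurrence of every literal term,
# including occurrences that overlap in the input string.
def overlapping_matches(str, terms)
  terms.flat_map do |term|
    # The lookahead consumes no characters, so scan retries at each
    # position and the capture group holds the matched term.
    str.scan(/(?=(#{Regexp.escape(term)}))/).flatten
  end
end

p overlapping_matches("hello", %w[hell hello]) # => ["hell", "hello"]
p overlapping_matches("aaa", %w[aa])           # => ["aa", "aa"]
```

Results are grouped by term, in the order the terms were given, not by position in the string.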

Well, you can always take out common subexpression. I.e., the following works:
r = /hello{0,1}/
"hello".scan(r) #=> ["hello"]
"hell".scan(r) #=> ["hell"]

You could do something like this:
r = /(hell|(?<=hell)o)/
"hello".scan(r) #=> [["hell"], ["o"]]
It won't give you ["hell", "hello"], but rather "hell" and "o" as separate matches.

Related

Ruby string char chunking

I have a string "wwwggfffw" and want to break it up into an array as follows:
["www", "gg", "fff", "w"]
Is there a way to do this with regex?
"wwwggfffw".scan(/((.)\2*)/).map(&:first)
scan is a little funny, as it will return either the match or the subgroups depending on whether there are subgroups; we need to use subgroups to ensure repetition of the same character ((.)\1), but we'd prefer it if it returned the whole match and not just the repeated letter. So we need to make the whole match into a subgroup so it will be captured, and in the end we need to extract just the match (without the other subgroup), which we do with .map(&:first).
EDIT to explain the regexp ((.)\2*) itself:
(       start group #1, consisting of
  (     start group #2, consisting of
    .   any one character
  )     and nothing else
  \2    followed by the content of the group #2
  *     repeated any number of times (including zero)
)       and nothing else.
So in wwwggfffw, (.) captures w into group #2; then \2* captures any additional number of w. This makes group #1 capture www.
You can use back references, something like
'wwwggfffw'.scan(/((.)\2*)/).map{ |s| s[0] }
will work
Here's one that's not using regex but works well:
def chunk(str)
  chars = str.chars
  chars.inject([chars.shift]) do |arr, char|
    if arr[-1].include?(char)
      arr[-1] << char
    else
      arr << char
    end
    arr
  end
end
In my benchmarks it's faster than the regex answers here (with the example string you gave, at least).
Another non-regex solution, this one using Enumerable#slice_when, which made its debut in Ruby v.2.2:
str.each_char.slice_when { |a,b| a!=b }.map(&:join)
#=> ["www", "gg", "fff", "w"]
Another option is:
str.scan(Regexp.new(str.squeeze.each_char.map { |c| "(#{c}+)" }.join)).first
#=> ["www", "gg", "fff", "w"]
Here the steps are as follows
s = str.squeeze
#=> "wgfw"
a = s.each_char
#=> #<Enumerator: "wgfw":each_char>
This enumerator generates the following elements:
a.to_a
#=> ["w", "g", "f", "w"]
Continuing
b = a.map { |c| "(#{c}+)" }
#=> ["(w+)", "(g+)", "(f+)", "(w+)"]
c = b.join
#=> "(w+)(g+)(f+)(w+)"
r = Regexp.new(c)
#=> /(w+)(g+)(f+)(w+)/
d = str.scan(r)
#=> [["www", "gg", "fff", "w"]]
d.first
#=> ["www", "gg", "fff", "w"]
Here's one more way of doing it without a regex:
'wwwggfffw'.chars.chunk(&:itself).map{ |s| s[1].join }
# => ["www", "gg", "fff", "w"]

How to match given characters in string

Given:
fruits = %w[Banana Apple Orange Grape]
chars = 'ep'
how can I print all elements of fruits that have all characters of chars? I tried the following:
fruits.each{|fruit| puts fruit if !(fruit=~/["#{chars}"]/i).nil?}
but I see 'Orange' in the result, which does not have the 'p' character in it.
p fruits.select { |fruit| chars.delete(fruit.downcase).empty? }
# => ["Apple", "Grape"]
String#delete returns a copy of chars with all characters in delete's argument deleted.
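To see why this works, it can help to look at what is left of chars after each deletion. The sketch below uses illustrative variable names (leftovers, matching) and assumes Ruby 2.6 or later for to_h with a block. Note also that String#delete treats characters such as - and ^ in its argument as range and negation syntax, so this approach assumes plain alphabetic words.

```ruby
fruits = %w[Banana Apple Orange Grape]
chars = "ep"

# For each fruit, delete from chars every character the fruit contains.
# Whatever remains is the set of required characters the fruit lacks.
leftovers = fruits.to_h { |fruit| [fruit, chars.delete(fruit.downcase)] }
p leftovers
# => {"Banana"=>"ep", "Apple"=>"", "Orange"=>"p", "Grape"=>""}

# A fruit qualifies when nothing is left over.
matching = leftovers.select { |_fruit, rest| rest.empty? }.keys
p matching # => ["Apple", "Grape"]
```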
Just for fun, here's how you might do this with a regular expression, thanks to the magic of positive lookahead:
fruits = %w[Banana Apple Orange Grape]
p fruits.grep(/(?=.*e)(?=.*p)/i)
# => ["Apple", "Grape"]
This is nice and succinct, but the regex is a bit occult, and it gets worse if you want to generalize it:
def match_chars(arr, chars)
  expr_parts = chars.chars.map { |c| "(?=.*#{Regexp.escape(c)})" }
  arr.grep(Regexp.new(expr_parts.join, Regexp::IGNORECASE))
end
p match_chars(fruits, "ar")
# => ["Orange", "Grape"]
Also, I'm pretty sure this would be outperformed by most or all of the other answers.
fruits = ["Banana", "Apple", "Orange", "Grape"]
chars = 'ep'.chars
fruits.select { |fruit| (fruit.split('') & chars).length == chars.length }
#=> ["Apple", "Grape"]
chars.each_char.with_object(fruits.dup){|e, a| a.select!{|s| s.include?(e)}}
# => ["Apple", "Grape"]
To print:
puts chars.each_char.with_object(fruits.dup){|e, a| a.select!{|s| s.include?(e)}}
I'm an absolute beginner, but here's what worked for me
fruits = %w[Banana Apple Orange Grape]
chars = 'ep'
fruits.each {|fruit| puts fruit if fruit.include?('e') && fruit.include?('p')}
Here is one more way to do this:
fruits.select {|f| chars.downcase.chars.all? {|c| f.downcase.include?(c)} }
Try this: first split chars into an array of characters ( chars.split("") ), then check that every one of them is present in the word.
fruits.select{|fruit| chars.split("").all? {|char| fruit.include?(char)}}
#=> ["Apple", "Grape"]

retrieve numbers from a string with regex

I have a string which returns duration in the below format.
"152M0S" or "1H22M32S"
I need to extract hours, minutes and seconds from it as numbers.
I tried like the below with regex
video_duration.scan(/(\d+)?.(\d+)M(\d+)S/)
But it does not return what I expected. Does anyone have any idea where I am going wrong here?
"1H22M0S".scan(/\d+/)
#=> ["1", "22", "0"]
You can use this expression: /((?<h>\d+)H)?(?<m>\d+)M(?<s>\d+)S/.
"1H22M32S".match(/((?<h>\d+)H)?(?<m>\d+)M(?<s>\d+)S/)
#=> #<MatchData "1H22M32S" h:"1" m:"22" s:"32">
"152M0S".match(/((?<h>\d+)H)?(?<m>\d+)M(?<s>\d+)S/)
#=> #<MatchData "152M0S" h:nil m:"152" s:"0">
Question mark after group makes it optional. To access data: $~[:h].
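Since the question asks for numbers, the named captures can be converted with to_i, which also turns a missing hours group (nil) into 0. The following is a sketch under a few assumptions that are not in the answer above: the names DURATION and parse_duration are hypothetical, the hours group is made non-capturing, and the pattern is anchored with \A and \z so that malformed input is rejected.

```ruby
# Hypothetical pattern: optional hours, then required minutes and seconds.
DURATION = /\A(?:(?<h>\d+)H)?(?<m>\d+)M(?<s>\d+)S\z/

# Hypothetical helper: returns integer parts, or nil if the string
# does not look like a duration.
def parse_duration(str)
  m = DURATION.match(str)
  return nil unless m

  # nil.to_i is 0, so an absent hours part becomes 0 hours.
  { hours: m[:h].to_i, minutes: m[:m].to_i, seconds: m[:s].to_i }
end

p parse_duration("1H22M32S") # => {:hours=>1, :minutes=>22, :seconds=>32}
p parse_duration("152M0S")   # => {:hours=>0, :minutes=>152, :seconds=>0}
p parse_duration("bogus")    # => nil
```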
If you want to extract the numbers, you could do it like this (note that this pattern requires the hours part, so it will not match "152M0S"):
"1H22M32S".match(/(?<hour>(\d+))H(?<min>(\d+))M(?<sec>(\d+))S/i).captures
# => ["1", "22", "32"]
"1H22M32S".match(/(?<hour>(\d+))H(?<min>(\d+))M(?<sec>(\d+))S/i)['min']
# => "22"
"1H22M32S".match(/(?<hour>(\d+))H(?<min>(\d+))M(?<sec>(\d+))S/i)['hour']
# => "1"
Me, I'd hashify:
def hashify(str)
  str.gsub(/\d+[HMS]/).with_object({}) { |s,h| h[s[-1]] = s.to_i }
end
hashify "152M0S" #=> {"M"=>152, "S"=>0}
hashify "1H22M32S" #=> {"H"=>1, "M"=>22, "S"=>32}
hashify "32S22M11H" #=> {"S"=>32, "M"=>22, "H"=>11}
hashify "1S" #=> {"S"=>1}

Split a string in Ruby

I have a hash returned to me in ruby
test_string = "{cat=6,bear=2,mouse=1,tiger=4}"
I need to get a list of these items in this form ordered by the number.
animals = [cat, tiger, bear, mouse]
My thoughts were to_s this in ruby and split on the '=' character. Then try to order them and put in a new list. Is there an easy way to do this in ruby? Sample code would be greatly appreciated.
s = "{cat=6,bear=2,mouse=1,tiger=4}"
a = s.scan(/(\w+)=(\d+)/)
p a.sort_by { |x| x[1].to_i }.reverse.map(&:first)
a = test_string.split('{')[1].split('}').first.split(',')
# => ["cat=6", "bear=2", "mouse=1", "tiger=4"]
a.map{|s| s.split('=')}.sort_by{|p| p[1].to_i}.reverse.map(&:first)
# => ["cat", "tiger", "bear", "mouse"]
Not the most elegant way to do it, but it works:
test_string.gsub(/[{}]/, "").split(",").map {|x| x.split("=")}.sort_by {|x| x[1].to_i}.reverse.map {|x| x[0].strip}
The code below should do it; the steps are explained inline.
test_string.gsub!(/[{}]/, "") # Remove the curly braces
array = test_string.split(",") # Split on comma
array1 = []
array.each { |word|
  array1 << word.split("=") # Create an array of arrays
}
h1 = Hash[*array1.flatten] # Convert Array into Hash (values are still strings)
puts h1.keys.sort { |a, b| h1[b].to_i <=> h1[a].to_i } # Print keys ordered by numeric value, highest first
test_string = "{cat=6,bear=2,mouse=1,tiger=4}"
Hash[*test_string.scan(/\w+/)].sort_by{|k,v| v.to_i }.map(&:first).reverse
#=> ["cat", "tiger", "bear", "mouse"]
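The answers above share the same three steps: extract name/number pairs, convert the numbers to integers, and sort in descending order. A compact sketch of those steps follows, assuming Ruby 2.4 or later for transform_values; the variable names counts and animals are illustrative, and the resulting names are strings rather than bare identifiers.

```ruby
test_string = "{cat=6,bear=2,mouse=1,tiger=4}"

# Step 1 and 2: pull out [name, number] pairs and make the numbers integers.
counts = test_string.scan(/(\w+)=(\d+)/).to_h.transform_values(&:to_i)
p counts # => {"cat"=>6, "bear"=>2, "mouse"=>1, "tiger"=>4}

# Step 3: sort by the negated count to get highest first, keep the names.
animals = counts.sort_by { |_name, count| -count }.map(&:first)
p animals # => ["cat", "tiger", "bear", "mouse"]
```

Converting with to_i before sorting matters once a count reaches two digits, because "10" sorts before "9" when compared as strings.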

How to get a substring of text?

I have text with length ~700. How do I get only ~30 of its first characters?
If you have your text in your_text variable, you can use:
your_text[0..29]
Use String#slice, also aliased as [].
a = "hello there"
a[1] #=> "e"
a[1,3] #=> "ell"
a[1..3] #=> "ell"
a[6..-1] #=> "there"
a[6..] #=> "there" (requires Ruby 2.6+)
a[-3,2] #=> "er"
a[-4..-2] #=> "her"
a[12..-1] #=> nil
a[-2..-4] #=> ""
a[/[aeiou](.)\1/] #=> "ell"
a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil
a["lo"] #=> "lo"
a["bye"] #=> nil
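For the original question (the first ~30 characters of a longer text), the start/length form is a natural fit, and it does not fail when the text is shorter than the limit. A minimal sketch, where first_chars is a hypothetical helper name:

```ruby
# Hypothetical helper: return at most `limit` characters from the start.
def first_chars(text, limit = 30)
  text[0, limit]
end

long_text = "a" * 700
p first_chars(long_text).length # => 30
p first_chars("short")          # => "short"
p first_chars("")               # => ""
```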
Since you tagged it Rails, you can use truncate:
http://api.rubyonrails.org/classes/ActionView/Helpers/TextHelper.html#method-i-truncate
Example:
truncate(@text, :length => 17)
Excerpt is nice to know too, it lets you display an excerpt of a text Like so:
excerpt('This is an example', 'an', :radius => 5)
# => ...s is an exam...
http://api.rubyonrails.org/classes/ActionView/Helpers/TextHelper.html#method-i-excerpt
If you need it in Rails, you can use first (an ActiveSupport extension to String):
'1234567890'.first(5) # => "12345"
There is also last:
'1234567890'.last(2) # => "90"
Alternatively, check from/to:
"hello".from(1).to(-2) # => "ell"
If you want a string, then the other answers are fine, but if what you're looking for is the first few letters as characters you can access them as a list:
your_text.chars.take(30)

Resources