Matching groups of words - ruby

I would like a regexp that matches all groups of words (single words and sub-sentences) in a sentence separated by whitespace.
Example:
"foo bar bar2".scan(regexp)
I want a regexp that returns:
['foo', 'bar', 'bar2', 'foo bar', 'bar bar2', 'foo bar bar2']
So far, I tried:
"foo bar bar2".scan(/\S*[\S]/) (i.e. regexp=/\S*[\S]/)
which returns ['foo', 'bar', 'bar2']
"foo bar bar2".scan(/\S* [\S]+/) (i.e. regexp=/\S* [\S]+/)
which returns ["foo bar", " bar2"]

words = "foo bar bar2".scan(/\S+/)
result = 1.upto(words.length).map do |n|
  words.each_cons(n).to_a
end.flatten(1)
#⇒ [["foo"], ["bar"], ["bar2"],
# ["foo", "bar"], ["bar", "bar2"],
# ["foo", "bar", "bar2"]]
result.map { |e| e.join(' ') }
#⇒ ["foo", "bar", "bar2", "foo bar", "bar bar2", "foo bar bar2"]
Here we used Enumerable#each_cons to get to the result.
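The same idea can be wrapped in a small method. This is only a sketch: the name contiguous_groups is made up for illustration, and flat_map is used in place of map followed by flatten(1).

```ruby
# Hypothetical helper: every run of consecutive words, shortest runs first.
def contiguous_groups(sentence)
  words = sentence.split
  (1..words.length).flat_map do |n|
    # each_cons(n) yields every window of n neighbouring words
    words.each_cons(n).map { |group| group.join(' ') }
  end
end

p contiguous_groups("foo bar bar2")
# => ["foo", "bar", "bar2", "foo bar", "bar bar2", "foo bar bar2"]
```

An empty string yields an empty array, because the range 1..0 has no elements.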

Mudasobwa posted a nice variation of this answer.
I've used combination, a built-in method for arrays. The procedure is almost the same:
string = "foo bar bar2"
groups = string.split
objects = []
for i in 1..groups.size
  objects << groups.combination(i).to_a
end
results = objects.flatten(1).map { |e| e.join('-') }
puts results
Anyway, you can't do it with a single regex (suppose you have 50 words and need to find all the combinations; a regex can't do that). You will need to iterate over the words, as Mudasobwa showed.
I would start like this: the regex, if you want to use one, can be /([^\s]\w+)/m, for example (note that it needs at least two characters per word; /\S+/ is a simpler choice).
This regex matches words, and by words I mean groups of characters surrounded by whitespace.
With this you can scan your text or split your string. There are many ways to do it, and in the end you will have an array with the words you want to combine.
string = "foo bar bar2"
Then you split it, creating an array and applying to it the combination method.
groups = string.split
=> ["foo", "bar", "bar2"]
The combination method takes a number as its argument, and that number is the size of each combination: combination(2) combines the elements in groups of two, combination(1) in groups of one, and combination(0) yields a single empty group (which is why we start at 1).
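To make the sizes concrete, here is a short sketch of what Array#combination returns for a few arguments. The variable names are only illustrative.

```ruby
groups = ["foo", "bar", "bar2"]

zero  = groups.combination(0).to_a  # one empty group
ones  = groups.combination(1).to_a  # each element on its own
pairs = groups.combination(2).to_a  # every pair, in index order
over  = groups.combination(4).to_a  # size larger than the array: nothing

p zero   # => [[]]
p ones   # => [["foo"], ["bar"], ["bar2"]]
p pairs  # => [["foo", "bar"], ["foo", "bar2"], ["bar", "bar2"]]
p over   # => []
```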
You need to loop over all possible group sizes, saving the results in an array:
objects = []
Use the number of elements as the upper bound of the loop:
for i in 1..groups.size
  objects << groups.combination(i).to_a
end
Now you just have to finish by flattening the arrays nested inside arrays and joining each group into one string (which gets rid of the commas and double quotes):
results = objects.flatten(1).map { |e| e.join('-') }
That's it! You can run the code above (an example with more words) here: https://repl.it/JLK9/1
PS: both the question and the mentioned answer are missing a combination (foo-bar2).
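Whether "foo bar2" belongs in the result depends on what counts as a group: combination picks any subset of words in order, while each_cons only keeps neighbouring words, which is what the expected output in the question lists. A minimal sketch comparing the two (the variable names are illustrative):

```ruby
words = "foo bar bar2".split

# Any subset of words, keeping their original order
combos = (1..words.size).flat_map do |n|
  words.combination(n).map { |group| group.join(' ') }
end

# Only runs of neighbouring words
runs = (1..words.size).flat_map do |n|
  words.each_cons(n).map { |group| group.join(' ') }
end

p combos.size    # => 7
p runs.size      # => 6
p combos - runs  # => ["foo bar2"]
```

With many words the gap grows quickly: n words give 2**n - 1 non-empty combinations but only n * (n + 1) / 2 contiguous runs.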

Related

Ruby: Is it true that #map generally doesn't make sense with bang methods?

This question was inspired by this one:
Ruby: Why does this way of using map throw an error?
Someone pointed out the following:
map doesn't make much sense when used with ! methods.
You should either:
use map with gsub
or use each with gsub!
Can someone explain why that is?
Base object
Here's an array with strings as elements:
words = ['hello', 'world']
New array
If you want a new array with modified strings, you can use map with gsub :
new_words = words.map{|word| word.gsub('o','#') }
p new_words
#=> ["hell#", "w#rld"]
p words
#=> ["hello", "world"]
p new_words == words
#=> false
The original strings and the original array aren't modified.
Strings modified in place
If you want to modify the strings in place, you can use :
words.each{|word| word.gsub!('o','#') }
p words
#=> ["hell#", "w#rld"]
map and gsub!
new_words = words.map{|word| word.gsub!('o','#') }
p words
#=> ["hell#", "w#rld"]
p new_words
#=> ["hell#", "w#rld"]
p words == new_words
#=> true
p new_words.object_id
#=> 12704900
p words.object_id
#=> 12704920
Here, a new array is created, but the elements are the exact same ones!
It doesn't bring anything more than the previous examples. It creates a new Array for nothing. It also might confuse people reading your code by sending opposite signals:
gsub! indicates that you want to modify existing objects
map indicates that you don't want to modify existing objects.
Map is for building a new array without mutating the original. Each is for performing some action on each element of an array. Doing both at once is surprising.
>> arr = ["foo bar", "baz", "quux"]
=> ["foo bar", "baz", "quux"]
>> arr.map{|x| x.gsub!(' ', '-')}
=> ["foo-bar", nil, nil]
>> arr
=> ["foo-bar", "baz", "quux"]
Since !-methods generally have side effects (and only incidentally might return a value), each should be preferred to map when invoking a !-method.
An exception might be when you have a list of actions to perform. The method to perform the action might sensibly be named with a !, but you wish to collect the results in order to report which ones succeeded or failed.
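One possible reading of that exception, sketched with gsub! itself: since gsub! returns nil when it replaced nothing, map can collect a per-element report while the strings are changed in place. The sample words and variable names are made up for illustration.

```ruby
# dup so the example also works when string literals are frozen
words = %w[hello world sky].map(&:dup)

# true where gsub! changed the string, false where it returned nil
changed = words.map { |word| !word.gsub!('o', '#').nil? }

p words    # => ["hell#", "w#rld", "sky"]
p changed  # => [true, true, false]
```

Here the return value of map is the point of the call, so combining it with a !-method is deliberate rather than accidental.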

Search array of symbols - Ruby

Given an array of symbols, how would you iterate over the array, and store the array indexes of matching pairs?
foo = [:date, :recorded_at, :scheduled_for, :amount, :activity, :pending ]
For example, if you searched for 'recorded_at' and 'activity', it should return [1,4]
I thought something like this would work:
bar = ['recorded_at','activity']
buzz = bar.each{|i| foo.index(i.to_sym)}
However, this just returns the original strings, ["recorded_at", "activity"], not the actual array indexes.
Use Array#map instead of Array#each:
buzz = bar.map{|i| foo.index(i.to_sym)}
#=> [1, 4]
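The difference comes from the return values: each returns its receiver and discards the block results, while map returns a new array built from them. A small sketch, with illustrative variable names:

```ruby
foo = [:date, :recorded_at, :scheduled_for, :amount, :activity, :pending]
bar = ['recorded_at', 'activity']

from_each = bar.each { |name| foo.index(name.to_sym) }
from_map  = bar.map  { |name| foo.index(name.to_sym) }

p from_each.equal?(bar)  # => true (the very same array object)
p from_map               # => [1, 4]

# A name that is not in foo maps to nil
missing = ['nope'].map { |name| foo.index(name.to_sym) }
p missing                # => [nil]
```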
An efficient way, particularly if you have to do this multiple times, for different values of bar, or if the arrays are large, would be to first convert foo to a hash:
foo = [:date, :recorded_at, :scheduled_for, :amount, :activity, :pending ]
foo_hash = Hash[foo.map(&:to_s).zip([*0...foo.size])]
#=> {"date"=>0, "recorded_at"=>1, "scheduled_for"=>2,
# "amount"=>3, "activity"=>4, "pending"=>5}
Then it is a simple hash lookup for any value of bar:
bar = ['recorded_at','activity']
foo_hash.values_at(*bar)
#=> [1, 4]
For Ruby 2.1+, Hash[arr] can be replaced by arr.to_h.
If n=foo.size and m=bar.size, only m hash lookups would be required after constructing the hash. By contrast, for each element of bar, there would be, on average, n/2 comparisons to find the index of an element of foo that matches the element of bar, for a total of m*n/2 comparisons.
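A shorter way to build such a lookup table, assuming Ruby 2.1 or later, is each_with_index.to_h. This sketch keeps the symbols as keys and converts the search strings instead; the name index_of is only illustrative.

```ruby
foo = [:date, :recorded_at, :scheduled_for, :amount, :activity, :pending]

# Built once: maps each symbol to its position in foo
index_of = foo.each_with_index.to_h

bar = ['recorded_at', 'activity']
found = index_of.values_at(*bar.map(&:to_sym))
p found                        # => [1, 4]

# Unknown keys come back as nil
p index_of.values_at(:nope)    # => [nil]
```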

Ruby Parameters [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
Could someone please explain lines 5 and 6? On line 5, is |word| a parameter? Why is it needed there? Also, on line 6, are {|a, b| b} also parameters? How should one read line 6? What is it doing?
puts "Input something: " # 1
text = gets.chomp # 2
words = text.split # 3
frequencies = Hash.new(0) # 4
words.each { |word| frequencies[word] += 1 } # 5
frequencies = frequencies.sort_by {|a, b| b} # 6
frequencies.reverse! # 7
On line 5, is |word| a parameter?
Yes, it's a block argument.
Why is it needed there?
From Array#each's documentation: "Calls the given block once for each element in self, passing that element as a parameter."
Example:
words = ["foo", "bar", "baz"]
words.each { |word| puts word }
The block is called three times. On the first pass, its block argument word is set to "foo", on the second pass it's set to "bar" and on the third pass it's set to "baz". Each time word is printed using puts.
Output:
foo
bar
baz
In your example, a hash is used to store the word frequencies. Within the each loop, the word's count is incremented.
How should one read line 6? What is it doing?
Enumerable#sort_by sorts a collection by the block's result. For example, to sort an array of strings by the string's length you would use:
["xxx", "xx", "x"].sort_by { |str| str.length }
#=> ["x", "xx", "xxx"]
Since frequencies is a hash, the block is called for each pair. Therefore, two arguments are set - a is the pair's key and b is the pair's value:
frequencies = { "foo" => 3, "bar" => 2, "baz" => 1}
frequencies = frequencies.sort_by { |a, b| b }
#=> [["baz", 1], ["bar", 2], ["foo", 3]]
It sorts the hash by its values. Note that sort_by returns an array. The array is assigned to the frequencies variable.
Instead of a and b you could use more descriptive argument names:
frequencies.sort_by { |word, count| count }
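Putting lines 3 to 7 together on a fixed string instead of gets gives a self-contained sketch. The sample text is made up, and sorting by the negated count is a swapped-in alternative to sort_by followed by reverse!.

```ruby
text = "the cat and the dog and the bird"

frequencies = Hash.new(0)            # missing words start at 0
text.split.each { |word| frequencies[word] += 1 }

# Highest count first; the result is an array of [word, count] pairs
sorted = frequencies.sort_by { |word, count| -count }

p frequencies["the"]  # => 3
p sorted.first(2)     # => [["the", 3], ["and", 2]]
```

The relative order of words with equal counts is not guaranteed, since sort_by is not a stable sort.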

Is there a version of Ruby's Regexp.match that responds to the order of the matches within the string?

I want to use regexes to check if a given string is composed of certain substrings.
For example, given the regular expression
> regex = /\A(?:(foo)|(bar)|(baz))*\z/
I can determine whether a given string matches the pattern:
> regex === "bazbar"
=> true
> regex === "qux"
=> false
But I want to know how to break the string into substrings. I can almost do this with
> regex.match("barbazfoo").captures
=> ["foo", "bar", "baz"]
But here they appear in the order in which I specified them within the regex. I want to return
["bar", "baz", "foo"]
In the order in which they appeared in the string.
You can use String#scan with a modified regular expression:
regex = /foo|bar|baz/
"barbazfoo".scan(regex)
# => ["bar", "baz", "foo"]
UPDATE according to OP's comment.
If some of the strings are substrings of others, you need to order them so that the longer strings come first and the substrings go last.
"barfoo".scan(/ba|bar|foo/) # without ordering
# => ["ba", "foo"]
words = ['ba', 'bar', 'foo']
pattern = words.map { |word| Regexp.escape(word) }.sort_by { |x| -x.size }.join('|')
"barfoo".scan(Regexp.new(pattern))
# => ["bar", "foo"]
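Regexp.union can do the escaping and joining in one step. A sketch of the same longest-first ordering using it:

```ruby
words = ['ba', 'bar', 'foo']

# Longest alternatives first, so "bar" is tried before its prefix "ba"
pattern = Regexp.union(words.sort_by { |word| -word.size })
matches = "barfoo".scan(pattern)
p matches  # => ["bar", "foo"]

# String arguments are escaped, so "." only matches a literal dot
dotted = "a.b axb".scan(Regexp.union(['a.b']))
p dotted   # => ["a.b"]
```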

Ruby what's the lazy way to get input in loop

Say I want to get 10 inputs in a loop and store them in an array. Each input will be a string, a line, or a JSON string.
I'm aware of Ruby's upto and gets.chomp but I'm looking for a simple and lazy technique like:
n=10
arr = []
loop(n) { arr.push getline } #Just an example to share my thought. Will not work
Don't know if this is "simple and lazy" enough:
irb> 3.times.collect { gets.chomp }
foo
bar
baz
# => ["foo", "bar", "baz"]
Use Array.new:
Array.new(3){gets.chomp}
(1..3).map {gets.strip!}
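This works nicely and strips surrounding whitespace from each entry. Note that strip! returns nil when there is nothing to strip (for example, a last line with no trailing newline), so gets.strip is the safer choice.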
Valid in 1.9 and 2.0.
>> (1..3).map {gets.strip!}
Hello
1
2
=> ["Hello", "1", "2"]
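For something reusable, the Array.new approach can be wrapped in a method that takes the input source as a parameter. This is a sketch: read_lines is a made-up name, and StringIO stands in for the terminal so the example runs without typing anything.

```ruby
require 'stringio'

# Hypothetical helper: read n lines from io (standard input by default).
# to_s turns the nil that gets returns at end of input into "".
def read_lines(n, io = $stdin)
  Array.new(n) { io.gets.to_s.chomp }
end

input = StringIO.new("foo\nbar\nbaz\n")
lines = read_lines(3, input)
p lines  # => ["foo", "bar", "baz"]
```

Calling read_lines(10) with no second argument reads ten lines from the keyboard.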
