Array of strings Group by first common letters [closed] - ruby

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
Is there any way to group an array of strings by their common leading letters?
For example:
array = [ 'hello', 'hello you', 'people', 'finally', 'finland' ]
so that when I do
array.group_by{ |string| some_logic_with_string }
The result should be,
{
'hello' => ['hello', 'hello you'],
'people' => ['people'],
'fin' => ['finally', 'finland']
}

NOTE: Some of the test cases are ambiguous, and their expectations conflict with other tests; you will need to fix them.
I suspect a plain group_by will not be enough and that further processing is needed.
I have come up with the code below, which seems to handle all the given test cases consistently.
I have left notes in the code to explain the logic. The easiest way to fully understand it is to inspect the value of h and follow the flow for a simple test case.
def group_by_common_chars(array)
# We will group repeatedly, once for each character of the longest
# possible key, whose length is the maximum length over all strings.
max_len = array.max_by {|i| i.size}.size
# First group by first character.
h = array.group_by{|i| i[0]}
# Now iterate remaining (max_len - 1) times
(1...max_len).each do |c|
# Let's perform a group by next set of starting characters.
t = h.map do |k,v|
h1 = v.group_by {|i| i[0..c]}
end.reduce(&:merge)
# We need to merge the previously generated hash
# with the hash generated in this iteration. Here things get tricky.
# If previously, we had
# {"a" => ["a"], "ab" => ["ab", "abc"]},
# and now, we have
# {"a"=>["a"], "ab"=>["ab"], "abc"=>["abc"]},
# We need to merge the two hashes such that we have
# {"a"=>["a"], "ab"=>["ab", "abc"], "abc"=>["abc"]}.
# Note that `Hash#merge`'s block is called only for common keys, so, "abc"
# will get merged, we can't do much about it now. We will process
# it later in the loop
h = h.merge(t) do |k, o, n|
if (o.size != n.size)
diff = [o,n].max - [o,n].min
if diff.size == 1 && t.value?(diff)
[o,n].max
else
[o,n].min
end
else
o
end
end
end
# Sort by key length, smallest in the beginning.
h = h.sort {|i,j| i.first.size <=> j.first.size }.to_h
# Get rid of those key-value pairs, where value is single element array
# and that single element is already part of another key-value pair, and
# that value array has more than one element. This step will allow us
# to get rid of key-value like "abc"=>["abc"] in the example discussed
# above.
h = h.tap do |h|
keys = h.keys
keys.each do |k|
v = h[k]
if (v.size == 1 &&
h.key?(v.first) &&
h.values.flatten.count(v.first) > 1) then
h.delete(k)
end
end
end
# Get rid of those keys whose value array consists only of elements that
# are already part of some other key. Since the hash is ordered by key
# length, this removes keys that are shorter but hold only elements that
# are also present under a longer key. For example, it lets us get rid of
# "a"=>["aba", "abb", "aaa", "aab"] from a hash like
# {"a"=>["aba", "abb", "aaa", "aab"], "ab"=>["aba", "abb"], "aa"=>["aaa", "aab"]}
h.tap do |h|
keys = h.keys
keys.each do |k|
values = h[k]
other_values = h.values_at(*(h.keys-[k])).flatten
already_present = values.all? do |v|
other_values.include?(v)
end
h.delete(k) if already_present
end
end
end
Sample Run:
p group_by_common_chars ['hello', 'hello you', 'people', 'finally', 'finland']
#=> {"fin"=>["finally", "finland"], "hello"=>["hello", "hello you"], "people"=>["people"]}
p group_by_common_chars ['a', 'ab', 'abc']
#=> {"a"=>["a"], "ab"=>["ab", "abc"]}
p group_by_common_chars ['aba', 'abb', 'aaa', 'aab']
#=> {"ab"=>["aba", "abb"], "aa"=>["aaa", "aab"]}
p group_by_common_chars ["Why", "haven't", "you", "answered", "the", "above", "questions?", "Please", "do", "so."]
#=> {"a"=>["answered", "above"], "do"=>["do"], "Why"=>["Why"], "you"=>["you"], "so."=>["so."], "the"=>["the"], "Please"=>["Please"], "haven't"=>["haven't"], "questions?"=>["questions?"]}

I am not sure whether you can group by all common letters. But if you only want to group by the first letter, here it is:
array = [ 'hello', 'hello you', 'people', 'finally', 'finland' ]
result = {}
array.each { |st| result[st[0]] = result.fetch(st[0], []) + [st] }
pp result
#=> {"h"=>["hello", "hello you"], "p"=>["people"], "f"=>["finally", "finland"]}
Now result contains your desired hash.
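Building on the first-letter grouping above, one possible way to get keys like 'fin' is to replace each group's key with the longest prefix shared by all of its members. This is only a sketch of one interpretation of the question: the helper names common_prefix and group_by_common_prefix are made up for illustration, and it assumes strings that share a first letter always belong in the same group (so ['a', 'ab', 'abc'] would end up under the single key 'a').

```ruby
# Hypothetical helper: longest prefix shared by every string in `strings`.
def common_prefix(strings)
  first, *rest = strings
  first.each_char.with_index
       .take_while { |ch, i| rest.all? { |s| s[i] == ch } }
       .map(&:first)
       .join
end

# Hypothetical helper: group by first letter, then re-key each group
# by the longest prefix its members have in common.
def group_by_common_prefix(array)
  array.group_by { |s| s[0] }.each_with_object({}) do |(_letter, group), result|
    result[common_prefix(group)] = group
  end
end

p group_by_common_prefix(['hello', 'hello you', 'people', 'finally', 'finland'])
#=> {"hello"=>["hello", "hello you"], "people"=>["people"], "fin"=>["finally", "finland"]}
```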

Hmm, you're trying to do something that's pretty custom. I can think of two classical approaches that sort of do what you want: 1) Stemming and 2) Levenshtein Distance.
With stemming you're finding the root word to a longer word. Here's a gem for it.
Levenshtein is a famous algorithm which calculates the difference between two strings. There is a gem for it that runs pretty fast due to a native C extension.

Related

Why is the index of the array repeatedly printed out as 2?

I'm working on this function:
It's supposed to take in an array and match it with a given word to see if that word can be formed with the given array of strings.
I added the two commented lines because I wanted to see how the for-loop works.
def canformword(arr,word)
arrword = word.chars
arrleft = arr
flag = true
for i in 0...arrword.size
ch = arrword[i]
# puts arrword[i]
if !arrleft.include?(ch)
flag = false
break
else
ind = arrleft.index(ch)
# puts ind
arrleft.delete_at(ind)
end
end
if flag
puts 'can form word'
else
puts 'can not form word'
end
end
canformword(['y','b','z','e','a','u','t'], 'beauty')
canformword(['r','o','u','g','h'], 'tough')
When I uncomment those two lines, the following is the output:
Why does the output print out the index 2 repeatedly? I would think that it would print out the index of each letter in my arrleft array rather than repeatedly spitting out 2!
I understand the 1 it prints out, because that's the index of b, but the rest is weird to me.
b
1
e
2
a
2
u
2
t
2
y
0
can form word
t
can not form word
Here is a shorter implementation (note that it ignores how many times each letter is available):
def can_form_word?(chars_array, word)
(word.chars - chars_array).empty?
end
That's all.
Here is another implementation, written the Ruby way, because your code reads like C. I have been writing Ruby code for more than three years now, and I have never used a for loop.
def canformword(chars,word)
word.each_char do |char|
puts char
if !chars.include?(char)
return false # or puts "Can't form word"
end
end
true # or puts "Can form word"
end
This happens because you are deleting the character at position ind (arrleft.delete_at(ind)), so each deletion shifts the remaining characters one cell to the left.
Since the letters 'e', 'a', 'u', 't' appear in the array in the same order as in the word, each of them has moved to index 2 by the time it is looked up, so 2 is printed four times in a row.
Now look at 'y': it is at position 0, so 0 is printed at the end.
So the behaviour comes from deleting the characters at position ind.
To keep the original indices, do not delete a character when it is found; instead replace it with a placeholder value such as '0'.
You get 2 several times because you are deleting elements from the array. Each deletion shifts the later elements down by one, so the next character you look for ends up at index 2 again.
Problem
If you want to delete indices 2 and 3 from an array, you need to delete them in decreasing order, because deleting index 2 would change the index of the element at 3:
array = %w(a b c d e)
array.delete_at(3)
array.delete_at(2)
p array
#=> ["a", "b", "e"]
or
array = %w(a b c d e)
array.delete_at(2)
array.delete_at(2)
p array
#=> ["a", "b", "e"]
Solution
For your code, you just need to replace
arrleft.delete_at(ind)
with
arrleft[ind] = nil
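To see the effect of that replacement, here is a small sketch that records the index found for each letter of 'beauty' when matched elements are set to nil instead of deleted. The variable names are illustrative, and it assumes every letter of the word is present in the array (otherwise index would return nil and the assignment would raise).

```ruby
letters = ['y', 'b', 'z', 'e', 'a', 'u', 't']

indices = 'beauty'.chars.map do |ch|
  ind = letters.index(ch) # assumes the letter is present
  letters[ind] = nil      # mark as used without shifting the other elements
  ind
end

p indices #=> [1, 3, 4, 5, 6, 0]
p letters #=> [nil, nil, "z", nil, nil, nil, nil]
```

Because nothing is removed, every letter reports its position in the original array.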
Alternative
Since you take the number of occurrences of each character into account, here's a modified version of a previous answer:
class Array
def count_by
each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
end
def subset_of?(superset)
superset_counts = superset.count_by
count_by.all? { |k, count| superset_counts[k] >= count }
end
end
def can_form_word?(chars, word)
word.chars.subset_of?(chars)
end
p can_form_word?(['y','b','z','e','a','u','t'], 'beauty')
#=> true
p can_form_word?(['y','b','z','e','u','t'], 'beauty')
#=> false
p can_form_word?(['a', 'c', 'e', 'p', 't', 'b', 'l'], 'acceptable')
#=> false
p ('acceptable'.chars - ['a', 'c', 'e', 'p', 't', 'b', 'l']).empty?
#=> true
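The same count-based check can be sketched without reopening Array, assuming Ruby 2.7 or later, where Enumerable#tally is available. The method name can_form_word_with_tally? is made up for this example.

```ruby
# Assumes Ruby >= 2.7 for Enumerable#tally.
def can_form_word_with_tally?(chars, word)
  available = chars.tally # e.g. {"y"=>1, "b"=>1, ...}
  word.chars.tally.all? { |ch, needed| available.fetch(ch, 0) >= needed }
end

p can_form_word_with_tally?(['y', 'b', 'z', 'e', 'a', 'u', 't'], 'beauty')
#=> true
p can_form_word_with_tally?(['a', 'c', 'e', 'p', 't', 'b', 'l'], 'acceptable')
#=> false
```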

Build an array of descending match counts?

I have a hash where the keys are book titles and the values are an array of words in the book.
I want to write a method where, if I enter a word, I can search through the hash to find which array has the highest frequency of the word. Then I want to return an array of the book titles in order of most matches.
The method should return an array in descending order of highest amount of occurrences of the searched word.
This is what I have so far:
def search(query)
books_names = @book_info.keys
book_info = {}
@result.each do |key,value|
contents = @result[key]
if contents.include?(query)
book_info[:key] += 1
end
end
end
If book_info is your hash and input_str is the string you want to search in book_info, the following will return you a hash in the order of frequency of input_str in the text:
Hash[book_info.sort_by{|k, v| v.count(input_str)}.reverse]
If you want output to be an array of book names, remove Hash and take out the first elements:
book_info.sort_by{|k, v| v.count(input_str)}.reverse.map(&:first)
This is a more compact version (but a little slower), replacing reverse with a negated sort criterion:
book_info.sort_by{|k, v| -v.count(input_str)}.map(&:first)
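As a runnable illustration of the one-liners above, here is a sketch with a tiny made-up book_info hash. The titles, word lists and the method name titles_by_frequency are all hypothetical.

```ruby
# Hypothetical sample data: title => array of words in the book.
book_info = {
  'Book A' => %w[whale sea whale ship],
  'Book B' => %w[sea ship],
  'Book C' => %w[whale whale whale]
}

# Titles ordered from most to fewest occurrences of `query`.
def titles_by_frequency(book_info, query)
  book_info.sort_by { |_title, words| -words.count(query) }.map(&:first)
end

p titles_by_frequency(book_info, 'whale')
#=> ["Book C", "Book A", "Book B"]
```

Books that never mention the word still appear at the end of the list; filtering them out first would need an extra select step.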
You may want to consider creating a Book class. Here's a book class that will index the words into a word_count hash for quick sorting.
class Book
attr_accessor :title, :words
attr_reader :word_count
@books = []
class << self
attr_accessor :books
def top(word)
@books.sort_by{|b| b.word_count[word.downcase]}.reverse
end
end
def initialize
self.class.books << self
@word_count = Hash.new { |h,k| h[k] = 0}
end
def words=(str)
str.gsub(/[^\w\s]/,"").downcase.split.each do |word|
word_count[word] += 1
end
end
def to_s
title
end
end
Use it like so:
a = Book.new
a.title = "War and Peace"
a.words = "WELL, PRINCE, Genoa and Lucca are now no more than private estates of the Bonaparte family."
b = Book.new
b.title = "Moby Dick"
b.words = "Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world."
puts Book.top("ago")
result:
Moby Dick
War and Peace
Here is one way to build a hash whose keys are words and whose values are arrays of hashes with keys :title and :count, the hashes ordered by decreasing value of count.
Code
I am assuming we will start with a hash books, whose keys are titles and whose values are all the text in the book in one string.
def word_count_hash(books)
word_and_count_by_title = books.each_with_object({}) { |(title,words),h|
h[title] = words.scan(/\w+/)
.map(&:downcase)
.each_with_object({}) { |w,g| g[w] = (g[w] || 0)+1 } }
title_and_count_by_word = word_and_count_by_title
.each_with_object({}) { |(title,words),g| words.each { |w,count|
g.update({w =>[{title: title, count: count}]}){|_,oarr,narr|oarr+narr}}}
title_and_count_by_word.each_value { |arr| arr.sort_by! { |h| -h[:count] } }
title_and_count_by_word
end
Example
books = {}
books["Grapes of Wrath"] =
<<_
To the red country and part of the gray country of Oklahoma, the last rains
came gently, and they did not cut the scarred earth. The plows crossed and
recrossed the rivulet marks. The last rains lifted the corn quickly and
scattered weed colonies and grass along the sides of the roads so that the
gray country and the dark red country began to disappear under a green cover.
_
books["Tale of Two Cities"] =
<<_
It was the best of times, it was the worst of times, it was the age of wisdom,
it was the age of foolishness, it was the epoch of belief, it was the epoch of
incredulity, it was the season of Light, it was the season of Darkness, it was
the spring of hope, it was the winter of despair, we had everything before us,
we had nothing before us, we were all going direct to Heaven, we were all
going direct the other way
_
books["Moby Dick"] =
<<_
Call me Ishmael. Some years ago - never mind how long precisely - having little
or no money in my purse, and nothing particular to interest me on shore, I
thought I would sail about a little and see the watery part of the world. It is
a way I have of driving off the spleen and regulating the circulation. Whenever
I find myself growing grim about the mouth; whenever it is a damp, drizzly
November in my soul; whenever I find myself involuntarily pausing before coffin
warehouses
_
Construct the hash:
title_and_count_by_word = word_count_hash(books)
and then look up words:
title_and_count_by_word["the"]
#=> [{:title=>"Grapes of Wrath", :count=>12},
# {:title=>"Tale of Two Cities", :count=>11},
# {:title=>"Moby Dick", :count=>5}]
title_and_count_by_word["to"]
#=> [{:title=>"Grapes of Wrath", :count=>2},
# {:title=>"Tale of Two Cities", :count=>1},
# {:title=>"Moby Dick", :count=>1}]
Note the words being looked up must be entered in (or converted to) lower case.
Explanation
Construct the first hash:
word_and_count_by_title = books.each_with_object({}) { |(title,words),h|
h[title] = words.scan(/\w+/)
.map(&:downcase)
.each_with_object({}) { |w,g| g[w] = (g[w] || 0)+1 } }
#=> {"Grapes of Wrath"=>
# {"to"=>2, "the"=>12, "red"=>2, "country"=>4, "and"=>6, "part"=>1,
# ...
# "disappear"=>1, "under"=>1, "a"=>1, "green"=>1, "cover"=>1},
# "Tale of Two Cities"=>
# {"it"=>10, "was"=>10, "the"=>11, "best"=>1, "of"=>10, "times"=>2,
# ...
# "going"=>2, "direct"=>2, "to"=>1, "heaven"=>1, "other"=>1, "way"=>1},
# "Moby Dick"=>
# {"call"=>1, "me"=>2, "ishmael"=>1, "some"=>1, "years"=>1, "ago"=>1,
# ...
# "pausing"=>1, "before"=>1, "coffin"=>1, "warehouses"=>1}}
To see what's happening here, consider the first element of books that Enumerable#each_with_object passes into the block. The two block variables are assigned the following values:
title
#=> "Grapes of Wrath"
words
#=> "To the red country and part of the gray country of Oklahoma, the
# last rains came gently,\nand they did not cut the scarred earth.
# ...
# the dark red country began to disappear\nunder a green cover.\n"
each_with_object has created a hash represented by the block variable h, which is initially empty.
First construct an array of words and convert each to lower-case.
q = words.scan(/\w+/).map(&:downcase)
#=> ["to", "the", "red", "country", "and", "part", "of", "the", "gray",
# ...
# "began", "to", "disappear", "under", "a", "green", "cover"]
We may now create a hash that contains a count of each word for the title "Grapes of Wrath":
h[title] = q.each_with_object({}) { |w,g| g[w] = (g[w] || 0) + 1 }
#=> {"to"=>2, "the"=>12, "red"=>2, "country"=>4, "and"=>6, "part"=>1,
# ...
# "disappear"=>1, "under"=>1, "a"=>1, "green"=>1, "cover"=>1}
Note the expression
g[w] = (g[w] || 0) + 1
If the hash g already has a key for the word w, this expression is equivalent to
g[w] = g[w] + 1
On the other hand, if g does not have this key (word) (in which case g[w] => nil), then the expression is equivalent to
g[w] = 0 + 1
The same calculations are then performed for each of the other two books.
We can now construct the second hash.
title_and_count_by_word =
word_and_count_by_title.each_with_object({}) { |(title,words),g|
words.each { |w,count| g.update({ w => [{title: title, count: count}]}) \
{ |_, oarr, narr| oarr + narr } } }
#=> {"to" => [{:title=>"Grapes of Wrath", :count=>2},
# {:title=>"Tale of Two Cities", :count=>1},
# {:title=>"Moby Dick", :count=>1}],
# "the" => [{:title=>"Grapes of Wrath", :count=>12},
# {:title=>"Tale of Two Cities", :count=>11},
# {:title=>"Moby Dick", :count=>5}],
# ...
# "warehouses"=> [{:title=>"Moby Dick", :count=>1}]}
(Note that this operation does not order the hashes for each word by :count, even though that may appear to be the case in this output fragment. The hashes are sorted in the next and final step.)
The main operation here that requires explanation is Hash#update (aka Hash#merge!). We are building a hash denoted by the block variable g, which initially is empty. The keys of this hash are words, and the values are arrays of hashes with keys :title and :count. Whenever the hash being merged has a key (word) that is already a key of g, the block
{ |_, oarr, narr| oarr + narr }
is called to determine the value for the key in the merged hash. The block variables here are the key (word) (which we have replaced with an underscore because it will not be used), the old array of hashes, and the new array of hashes to be merged (which contains just one hash). We simply append the new array to the old array of hashes.
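The behaviour of Hash#update with a block can be seen in isolation in the following sketch. The titles and counts are made up; only the merging mechanics matter.

```ruby
# g already holds one entry for "to".
g = { 'to' => [{ title: 'Book A', count: 2 }] }

# The incoming hash shares the key "to" and introduces the new key "the".
incoming = {
  'to'  => [{ title: 'Book B', count: 1 }],
  'the' => [{ title: 'Book B', count: 5 }]
}

# The block runs only for keys present in both hashes ("to" here);
# "the" is copied over as-is.
g.update(incoming) { |_word, old_arr, new_arr| old_arr + new_arr }

p g
#=> {"to"=>[{:title=>"Book A", :count=>2}, {:title=>"Book B", :count=>1}],
#    "the"=>[{:title=>"Book B", :count=>5}]}
```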
Lastly we sort the values of the hash (which are arrays of hashes) on decreasing value of :count.
title_and_count_by_word.each_value { |arr| arr.sort_by! { |h| -h[:count] } }
title_and_count_by_word
#=> {"to"=>
# [{:title=>"Grapes of Wrath", :count=>2},
# {:title=>"Tale of Two Cities", :count=>1},
# {:title=>"Moby Dick", :count=>1}],
# "the"=>
# [{:title=>"Grapes of Wrath", :count=>12},
# {:title=>"Tale of Two Cities", :count=>11},
# {:title=>"Moby Dick", :count=>5}],
# ...
# "warehouses"=>[{:title=>"Moby Dick", :count=>1}]}

Complicated ruby inject method

I can't seem to figure this out. Please help me understand what kind of argument this code expects and what the intended output is supposed to be. Thanks in advance!
def function_name(a)
a.inject({}){ |a,b| a[b] = a[b].to_i + 1; a}.\
reject{ |a,b| b == 1 }.keys
end
Assuming a is an array,
The function first counts the occurrences of each element.
a = ['a', 'b', 'c', 'b']
a.inject({}) { |a,b|
# a: a result hash, this is initially an empty hash (`{}` passed to inject)
# b: each element of the array.
a[b] = a[b].to_i + 1 # Increase count of the item
a # The return value of this block is used as `a` argument of the block
# in the next iteration.
}
# => {"a"=>1, "b"=>2, "c"=>1}
Then it filters out the items that occur only once, keeping those that occur multiple times:
...reject{ |a,b|
# a: key of the hash entry, b: value of the hash entry (count)
b == 1 # entry that match this condition (occurred only once) is filtered out.
}.keys
# => ["b"]
So, function names like get_duplicated_items should be used instead of function_name to better describe the purpose.
It wants a to be an array, but it doesn't seem to matter what the array is made up of so you'll need some other clue to know what should be in the array.
What the code does is fairly straightforward. It uses each item in the array as a key in a hash and counts how many times it sees that key. Finally, it removes all of the items that only showed up once.
It returns the unique items in the array a that show up 2 or more times.
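For comparison, the same behaviour can be written with more descriptive names. This is a sketch of an equivalent, not the original author's code; duplicated_items is a hypothetical name.

```ruby
def duplicated_items(items)
  # Count occurrences; Hash.new(0) makes missing keys start at 0.
  counts = items.each_with_object(Hash.new(0)) { |item, h| h[item] += 1 }
  # Keep only the items seen more than once.
  counts.select { |_item, count| count > 1 }.keys
end

p duplicated_items(['a', 'b', 'c', 'b'])
#=> ["b"]
```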

RoR different bracket notation

I'm getting to grips with Rails, and whilst I feel I am progressing, there is one very basic thing I am struggling with. I am trying to understand the different uses of [], {} and (). Are there any good sources on their usage, and are there any tips you can give a beginner for recognizing when to use one or the other, or, as I seem to see in some cases, when they are not required at all?
I know this is extremely basic, but I have struggled to find literature which concisely explains the interplay between them and Ruby, or specifically RoR.
It has nothing to do with RoR; the various brackets are Ruby language constructs.
[] is the array operator, for arrays and other classes that implement it (like a string taking a range to get substrings, or hashes to look up a key's value):
a = [1, 2, 3]
a.each { |n| puts n }
s = "ohai"
puts s[1..-1]
h = { foo: "bar", baz: "plugh" }
puts h[:foo]
{} is for hashes, and one of two ways of delimiting blocks (the other being do/end). (It is also used with # for string interpolation.)
h = { foo: "bar", baz: "plugh" }
h.each { |k, v| puts "#{k} == #{v}" }
() is for method parameters, or for enforcing evaluation order in an expression.
> puts 5 * 3 + 5 # Normal precedence has * ahead of +
=> 20
> puts 5 * (3 + 5) # Force 3+5 to be evaluated first
=> 40
def foo(s)
puts(s)
end
They're sometimes optional if the statement has no ambiguity:
def foo s
puts s
end
(They're not always optional, and putting a space between the method call and its parenthetical parameter list can cause issues--best not to, IMO.)
(I probably missed something, too, but there's the nutshell.)
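The point about a space before the parenthesis can be illustrated with a small sketch. The method twice is made up for the example; the difference comes from how Ruby parses the two calls.

```ruby
def twice(n)
  n * 2
end

# No space: the parentheses delimit the argument list.
a = twice(1 + 2) + 1   # twice(3) + 1

# With a space: the parenthesized part is treated as the start of the
# argument expression, so the whole of "(1 + 2) + 1" is passed to twice.
b = twice (1 + 2) + 1  # twice(4)

p a #=> 7
p b #=> 8
```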
[] are used to access objects within a hash (via a key) or within an array (via an index).
hash[:key] # returns a value
array[0] # returns the first array element
[] is used to describe an array.
array = ['a', 'b', 'c']
Of course this can be nested.
nested = [['a','b','c'], [1,2,3]]
[] can also be used to build a hash, because the Hash class provides a [] class method that accepts an array of key-value pairs.
hash = Hash[[['a', 1], ['b', 2]]] # { 'a' => 1, 'b' => 2 }
{} is used to declare a hash.
hash = { 'a' => 1, 'b' => 2 }
This too can be nested.
hash = { 'a' => { 'c' => 3 }, 'b' => { 'd' => 4 } }
{} is also used to delimit blocks. The .each method is a common one. The following two blocks of code are equivalent.
array.each do |n|
puts n
end
array.each { |n| puts n }
The () is used for method arguments and for grouping in cases where ambiguity needs clarification. This matters most in methods that take many arguments, some of which may be nil, some of which may be objects, etc. You'll see a lot of code that omits them entirely because no grouping is needed for clarity.
puts(string)
puts string
I recommend firing up the Rails console, declaring some variables, and accessing them.

How to make sure certain elements do not get into arrays in Ruby

I have an array, let's say
array1 = ["abc", "a", "wxyz", "ab",......]
How do I make sure neither for example "a" (any 1 character), "ab" (any 2 characters), "abc" (any 3 characters), nor words like "that", "this", "what" etc nor any of the foul words are saved in array1?
This removes elements with less than 4 characters and words like this, that, what from array1 (if I got it right):
array1.reject! do |el|
el.length < 4 || ['this', 'that', 'what'].include?(el)
end
This changes array1. If you use reject (without !), it will return the result and leave array1 unchanged.
You can reopen the Array class and add a new method that refuses certain words. Example:
class Array
def add(ele)
unless rejects.include?(ele)
self.push ele
end
end
def rejects
['this', 'that', 'what']
end
end
arr = []
arr.add "one"
puts arr
arr.add "this"
puts arr
arr.add "aslam"
puts arr
Output would be:
one
one
one
aslam
And notice the word "this" was not added.
You could create a stop list. Using a hash for this would be more efficient than an array, as lookup time is roughly constant with a hash, whereas with an array the lookup time is proportional to the number of elements. If you are going to check for stop words a lot, I suggest using a hash that contains all the stop words. Using your code, you could do the following:
badwords_a = ["abc", "a", "wxyz", "ab"] # Your array of bad words
badwords_h = {} # Initialize an empty hash
badwords_a.each{|word| badwords_h[word] = nil} # Fill the hash
goodwords = []
words_to_process = ["abc","a","Foo","Bar"] # a list of words you want to process
words_to_process.each do |word| # Process new words
unless badwords_h.key?(word)
goodwords << word # Add the word if it did not match the bad list
end
end
puts goodwords.join(", ")
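The same stop-list idea can be sketched with Set from Ruby's standard library, which also offers hash-based lookup, combined with the length rule from the question. The names STOP_WORDS and acceptable? and the sample words are assumptions made for this example.

```ruby
require 'set'

# Hypothetical stop list; extend it with whatever words must be kept out.
STOP_WORDS = Set.new(%w[this that what]).freeze

# A word is kept only if it has at least 4 characters and is not a stop word.
def acceptable?(word)
  word.length >= 4 && !STOP_WORDS.include?(word)
end

candidates = %w[abc a wxyz ab this hello]
array1 = candidates.select { |word| acceptable?(word) }

p array1
#=> ["wxyz", "hello"]
```

Filtering candidates before they are stored keeps the check in one place, rather than patching Array itself.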
