Ruby search for word in string

Given input = "helloworld"
The output should be output = ["hello", "world"]
Given that I have a method called is_in_dict? which returns true if the given string is a word in the dictionary.
So far I tried:
ar = []
input.split("").each do |f|
ar << f if is_in_dict? f
# here I need to check the given char
end
How to achieve it in Ruby?

Instead of splitting the input into characters, you have to inspect all combinations, i.e. "h", "he", "hel", ... "helloworld", "e", "el" , "ell", ... "elloworld" and so on.
Something like this should work:
ar = []
(0..input.size).to_a.combination(2).each do |a, b|
word = input[a...b]
ar << word if is_in_dict?(word)
end
#=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ar
#=> ["hello", "world"]
Or, using each_with_object, which returns the array:
(0..input.size).to_a.combination(2).each_with_object([]) do |(a, b), array|
word = input[a...b]
array << word if is_in_dict?(word)
end
#=> ["hello", "world"]
Another approach is to build a custom Enumerator:
class String
def each_combination
return to_enum(:each_combination) unless block_given?
(0..size).to_a.combination(2).each do |a, b|
yield self[a...b]
end
end
end
String#each_combination yields every substring (instead of just the index pairs):
input.each_combination.to_a
#=> ["h", "he", "hel", "hell", "hello", "hellow", "hellowo", "hellowor", "helloworl", "helloworld", "e", "el", "ell", "ello", "ellow", "ellowo", "ellowor", "elloworl", "elloworld", "l", "ll", "llo", "llow", "llowo", "llowor", "lloworl", "lloworld", "l", "lo", "low", "lowo", "lowor", "loworl", "loworld", "o", "ow", "owo", "owor", "oworl", "oworld", "w", "wo", "wor", "worl", "world", "o", "or", "orl", "orld", "r", "rl", "rld", "l", "ld", "d"]
It can be used with select to easily filter specific words:
input.each_combination.select { |word| is_in_dict?(word) }
#=> ["hello", "world"]
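To try the enumerator without a real dictionary, here is a minimal sketch in which is_in_dict? is a hypothetical stand-in backed by a small Set (the word list is an assumption for illustration). With a richer dictionary, select returns every dictionary word found anywhere in the string, not a segmentation of it:

```ruby
require "set"

# Hypothetical stand-in for the asker's dictionary lookup.
DICT = Set["hello", "world", "hell", "low"]

def is_in_dict?(word)
  DICT.include?(word)
end

class String
  # Yields every substring self[a...b] with 0 <= a < b <= size.
  def each_combination
    return to_enum(:each_combination) unless block_given?
    (0..size).to_a.combination(2).each do |a, b|
      yield self[a...b]
    end
  end
end

found = "helloworld".each_combination.select { |word| is_in_dict?(word) }
p found
```

Here found also contains "hell" and "low", which is why the recursive answer below checks that the rest of the string can be split too.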

This seems to be a task for recursion. In short, you want to take letters one by one until you get a word that is in the dictionary. This alone, however, will not guarantee that the result is correct, as the remaining letters may not form words ('hell' + 'oworld'?). This is what I would do:
def split_words(string)
return [[]] if string == ''
chars = string.chars
word = ''
(1..string.length).map do
word += chars.shift
next unless is_in_dict?(word)
other_splits = split_words(chars.join)
next if other_splits.empty?
other_splits.map {|split| [word] + split }
end.compact.inject([], :+)
end
split_words('helloworld') #=> [['hello', 'world']] No hell!
It will also give you all possible splits, so pages with URLs like penisland can be avoided.
split_words('penisland') #=> [['pen', 'island'], [<the_other_solution>]]
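The recursive split_words above recomputes the splits of the same suffix many times. Below is a minimal sketch of the same recursion with a Hash cache added (the memoization is an addition, not part of the answer). The Set-backed is_in_dict? and its word list are hypothetical stand-ins for the asker's dictionary:

```ruby
require "set"

# Hypothetical dictionary used only for this example.
DICT = Set["pen", "island", "is", "land", "hello", "world", "hell"]

def is_in_dict?(word)
  DICT.include?(word)
end

# Returns every way to split +string+ into dictionary words.
# +memo+ caches the splits already computed for each suffix.
def split_words(string, memo = {})
  return [[]] if string.empty?
  return memo[string] if memo.key?(string)

  splits = (1..string.length).flat_map do |len|
    word = string[0, len]
    next [] unless is_in_dict?(word)
    # Prefix +word+ to every split of the remaining suffix;
    # an unsplittable suffix yields [], so +word+ is dropped.
    split_words(string[len..-1], memo).map { |rest| [word] + rest }
  end
  memo[string] = splits
end

p split_words("helloworld")
p split_words("penisland")
```

With this word list, "helloworld" still has only the single split [["hello", "world"]], and "penisland" splits both as "pen is land" and "pen island".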

Related

How does this Ruby code work?

I have the following array:
words = ['demo', 'none', 'tied', 'evil', 'dome', 'mode', 'live',
'fowl', 'veil', 'wolf', 'diet', 'vile', 'edit', 'tide',
'flow', 'neon']
Now I am trying to group the anagrams like this:
["demo", "dome", "mode"]
["neon", "none"]
(etc.)
So I have this code:
result = {}
words.each do |word|
key = word.split('').sort.join
if result.has_key?(key)
result[key].push(word)
else
result[key] = [word]
end
end
result.each do |k, v|
puts "------"
p v
end
I understand how the word got split and joined but this part here is not clear to me:
if result.has_key?(key)
result[key].push(word)
else
result[key] = [word]
end
In the code above, result is obviously an empty hash, and then we ask whether it has the sorted/joined key via if result.has_key?(key). How does that work? Why ask an empty hash whether it has the key computed from the current word?
result[key].push(word) is also not clear to me. Is this code putting the key inside result as its key, or the word itself?
result[key] = [word] confuses me too. Is it adding the word to the array stored under the key?
Sorry, I am a bit confused.
The hash result is only empty on the first iteration of the loop. The line
if result.has_key?(key)
checks whether the key created by sorting the letters of the current word already exists. On the first iteration, when the hash is empty, it obviously does not, but the check is still needed on every later iteration.
Now, when a particular key does not exist yet in result, that key is added to result, with a new array containing the current word as its value, in the line
result[key] = [word]
When a key already exists in result, there is already an array containing at least one word, so the current word is added to that array, in the line
result[key].push(word)
Stepping through what's happening:
words = ['demo', 'neon', 'dome', 'mode', 'none']
# first iteration of the loop
word = 'demo'
key = 'demo' # the letters in "demo" happen to already be sorted
Is 'demo' a key in result?
result is currently {}
No, 'demo' is not a key in {}
Add 'demo' as a key, and add an array with 'demo' inside
result is now { 'demo' => ['demo'] }
# second iteration
word = 'neon'
key = 'enno'
Is 'enno' a key in result?
result is currently { 'demo' => ['demo'] }
No, 'enno' is not a key in { 'demo' => ['demo'] }
Add 'enno' as a key, and add an array with 'neon' inside
result is now { 'demo' => ['demo'], 'enno' => ['neon'] }
# third iteration
word = 'dome'
key = 'demo'
Is 'demo' a key in result?
result is currently { 'demo' => ['demo'], 'enno' => ['neon'] }
Yes, 'demo' is a key in { 'demo' => ['demo'], 'enno' => ['neon'] }
Add 'dome' to the array at key = 'demo'
result is now { 'demo' => ['demo', 'dome'], 'enno' => ['neon'] }
# ... and so on
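The same walkthrough can be reproduced by printing result after each iteration. A minimal sketch using the shortened words array from the trace (the puts line is added only for illustration):

```ruby
words = ['demo', 'neon', 'dome', 'mode', 'none']
result = {}

words.each do |word|
  key = word.chars.sort.join   # e.g. "neon" -> "enno"
  if result.has_key?(key)
    result[key].push(word)     # key seen before: append to its array
  else
    result[key] = [word]       # new key: start a one-element array
  end
  puts "#{word} -> #{key}: #{result.inspect}"
end
```

After the loop, result maps "demo" to all three of demo, dome and mode, and "enno" to neon and none.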
There are tools that help you figure this stuff out on your own. Here's an example using Seeing Is Believing with vim:
words = ['demo', 'mode']
result = {}
words.each do |word| # => ["demo", "mode"]
key = word # => "demo", "mode"
.split('') # => ["d", "e", "m", "o"], ["m", "o", "d", "e"]
.sort # => ["d", "e", "m", "o"], ["d", "e", "m", "o"]
.join # => "demo", "demo"
result # => {}, {"demo"=>["demo"]}
if result.has_key?(key)
result[key].push(word) # => ["demo", "mode"]
else
result[key] = [word] # => ["demo"]
end
result # => {"demo"=>["demo"]}, {"demo"=>["demo", "mode"]}
end
result.each do |k, v| # => {"demo"=>["demo", "mode"]}
puts "------"
p v
end
# >> ------
# >> ["demo", "mode"]
Other tools I'd use are IRB and Pry.
Considering that you have answers that provide good explanations of your problem, I would like to present some more Ruby-like approaches that could be used. All of these methods create a hash h whose values are arrays of words that are anagrams of each other, which can be extracted from the hash by executing h.values.
Use Enumerable#group_by and Enumerable#sort
This is arguably the most direct approach.
words.group_by { |w| w.each_char.sort }.values
#=> [["demo", "dome", "mode"], ["none", "neon"], ["tied", "diet", "edit", "tide"],
# ["evil", "live", "veil", "vile"], ["fowl", "wolf", "flow"]]
group_by produces
words.group_by { |w| w.each_char.sort }
#=> {["d", "e", "m", "o"]=>["demo", "dome", "mode"],
# ["e", "n", "n", "o"]=>["none", "neon"],
# ["d", "e", "i", "t"]=>["tied", "diet", "edit", "tide"],
# ["e", "i", "l", "v"]=>["evil", "live", "veil", "vile"],
# ["f", "l", "o", "w"]=>["fowl", "wolf", "flow"]}
after which it is a simple matter of extracting the values of this hash.
Build a hash by appending words to arrays that are the values of the hash
words.each_with_object({}) { |w,h| (h[w.each_char.sort] ||= []) << w }.values
#=> [["demo", "dome", "mode"], ["none", "neon"], ["tied", "diet", "edit", "tide"],
# ["evil", "live", "veil", "vile"], ["fowl", "wolf", "flow"]]
When "demo" is passed to the block, the hash h is empty, so the block variables are assigned the values
w = "demo"
h = {}
and the block calculation is performed:
(h[["d", "e", "m", "o"]] ||= []) << w
as
w.each_char.sort
#=> ["d", "e", "m", "o"]
Ruby expands a ||= b to a || a = b, so this is evaluated as
(h[["d", "e", "m", "o"]] || h[["d", "e", "m", "o"]] = []) << "demo"
At this point h has no keys, so h[["d", "e", "m", "o"]] evaluates to nil. The expression therefore becomes
(h[["d", "e", "m", "o"]] = []) << "demo"
= [] << "demo"
= ["demo"]
where the array being appended to is the one just stored in h, so h is now {["d", "e", "m", "o"]=>["demo"]}.
Later, when "dome" is encountered,
w = "dome"
w.each_char.sort
#=> ["d", "e", "m", "o"]
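and since h already has this key, the block calculation is as follows.
(h[["d", "e", "m", "o"]] || h[["d", "e", "m", "o"]] = []) << "dome"
= ["demo"] << "dome"
= ["demo", "dome"]
Because h[["d", "e", "m", "o"]] is ["demo"], which is truthy, no assignment takes place; "dome" is appended to the array already stored in h.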
We obtain
words.each_with_object({}) { |w,h| (h[w.each_char.sort] ||= []) << w }
#=> {["d", "e", "m", "o"]=>["demo", "dome", "mode"],
# ["e", "n", "n", "o"]=>["none", "neon"],
# ["d", "e", "i", "t"]=>["tied", "diet", "edit", "tide"],
# ["e", "i", "l", "v"]=>["evil", "live", "veil", "vile"],
# ["f", "l", "o", "w"]=>["fowl", "wolf", "flow"]}
after which the values are extracted.
A variant of this is the following.
words.each_with_object(Hash.new { |h,k| h[k] = []}) { |w,h|
h[w.each_char.sort] << w }.values
See the doc for Hash::new for an explanation, in particular the discussion of default values given by a block.
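The block form matters: Hash.new([]) would share one default array among all missing keys and never store a key, while Hash.new { |h,k| h[k] = [] } creates and stores a fresh array per key. A small sketch contrasting the two (the words are just sample data):

```ruby
words = ['demo', 'dome', 'neon']

# Default object: every missing key returns the SAME array, and
# << mutates it without ever storing a key in the hash.
shared = Hash.new([])
words.each { |w| shared[w.each_char.sort] << w }

# Default block: a new array is created and stored for each missing key.
fresh = Hash.new { |h, k| h[k] = [] }
words.each { |w| fresh[w.each_char.sort] << w }

p shared          # the hash itself stays empty
p shared.default  # the shared default array collected every word
p fresh.values    # one array per anagram group
```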
For each word, merge a hash having a single key into an initially-empty hash
words.each_with_object({}) { |w,h|
h.update(w.each_char.sort=>[w]) { |_,o,n| o+n } }.values
The argument w.each_char.sort=>[w] is shorthand for { w.each_char.sort=>[w] }.
This uses the form of Hash#update (aka merge!) that employs a "resolution" block (here { |_,o,n| o+n }) to determine the values of keys that are present in both hashes being merged. See the doc for a description of that block's three block variables (the first block variable, the common key, is not used in this calculation, which is why I used an underscore).
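A short sketch of the resolution block on its own, using sample keys from the example above:

```ruby
h = { ["d", "e", "m", "o"] => ["demo"] }

# Key already present: the block receives (key, old_value, new_value),
# and its return value becomes the merged value.
h.update(["d", "e", "m", "o"] => ["dome"]) { |_key, old_val, new_val| old_val + new_val }

# Key not present: the block is not called; the pair is simply added.
h.update(["e", "n", "n", "o"] => ["none"]) { |_key, old_val, new_val| old_val + new_val }

p h
```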

Ruby string char chunking

I have a string "wwwggfffw" and want to break it up into an array as follows:
["www", "gg", "fff", "w"]
Is there a way to do this with regex?
"wwwggfffw".scan(/((.)\2*)/).map(&:first)
scan is a little funny: it returns either the whole match or the captured subgroups, depending on whether the pattern has subgroups. We need a subgroup to require repetition of the same character ((.) followed by a backreference to it), but we'd prefer scan to return the whole match and not just the repeated letter. So we wrap the whole match in a subgroup too, so that it is captured, and at the end we extract just that outer group (without the other subgroup), which we do with .map(&:first).
EDIT to explain the regexp ((.)\2*) itself:
( start group #1, consisting of
( start group #2, consisting of
. any one character
) and nothing else
\2 followed by the content of the group #2
* repeated any number of times (including zero)
) and nothing else.
So in wwwggfffw, (.) captures w into group #2; then \2* captures any additional number of w. This makes group #1 capture www.
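To see why .map(&:first) is needed, it can help to look at what scan returns on its own: one array per match, holding the captures of group #1 and group #2. A minimal sketch:

```ruby
str = "wwwggfffw"

# With capture groups, scan returns the captures, not the whole match:
# [whole run (group #1), repeated character (group #2)] for each match.
pairs = str.scan(/((.)\2*)/)
p pairs

# Keep only group #1 of each match.
runs = pairs.map(&:first)
p runs
```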
You can use back references; something like
'wwwggfffw'.scan(/((.)\2*)/).map{ |s| s[0] }
will work.
Here's one that's not using regex but works well:
def chunk(str)
chars = str.chars
chars.inject([chars.shift]) do |arr, char|
if arr[-1].include?(char)
arr[-1] << char
else
arr << char
end
arr
end
end
In my benchmarks it's faster than the regex answers here (with the example string you gave, at least).
Another non-regex solution, this one using Enumerable#slice_when, which made its debut in Ruby 2.2:
str.each_char.slice_when { |a,b| a!=b }.map(&:join)
#=> ["www", "gg", "fff", "w"]
Another option is:
str.scan(Regexp.new(str.squeeze.each_char.map { |c| "(#{c}+)" }.join)).first
#=> ["www", "gg", "fff", "w"]
Here the steps are as follows:
s = str.squeeze
#=> "wgfw"
a = s.each_char
#=> #<Enumerator: "wgfw":each_char>
This enumerator generates the following elements:
a.to_a
#=> ["w", "g", "f", "w"]
Continuing
b = a.map { |c| "(#{c}+)" }
#=> ["(w+)", "(g+)", "(f+)", "(w+)"]
c = b.join
#=> "(w+)(g+)(f+)(w+)"
r = Regexp.new(c)
#=> /(w+)(g+)(f+)(w+)/
d = str.scan(r)
#=> [["www", "gg", "fff", "w"]]
d.first
#=> ["www", "gg", "fff", "w"]
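One caveat with building the pattern from the characters themselves: a character such as "." or "+" is a regex metacharacter, so the interpolated pattern can change meaning. A hedged variant that escapes each character with Regexp.escape (the input string is made up to show the difference):

```ruby
str = "..aab"

# Unescaped pattern /(.+)(a+)(b+)/: the first group matches any character.
naive = str.scan(Regexp.new(str.squeeze.each_char.map { |c| "(#{c}+)" }.join)).first

# Escaped pattern /(\.+)(a+)(b+)/: the first group matches only literal dots.
safe = str.scan(Regexp.new(str.squeeze.each_char.map { |c| "(#{Regexp.escape(c)}+)" }.join)).first

p naive
p safe
```

The unescaped version swallows one "a" into the first run, while the escaped one gives the expected runs.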
Here's one more way of doing it without a regex:
'wwwggfffw'.chars.chunk(&:itself).map{ |s| s[1].join }
# => ["www", "gg", "fff", "w"]

My Ruby Anagram not working correctly

I was given an assignment to write an anagram program;
below is what I came up with:
class Anagram
attr_accessor :anagram_value
def initialize(value)
@anagram_value = value
end
def matches(*collection)
matches = []
matches = collection.select do |word|
(word.length == @anagram_value.length) ? is_an_anagram?(word) : false
end
return matches
end
def is_an_anagram?(word)
return get_word_ord_sum(word) == get_word_ord_sum(@anagram_value)
end
def get_word_ord_sum(word)
sum = 0
word.split("").each { |c| sum += c.ord }
return sum
end
end
Surprisingly, the above works for the following case:
it "detects multiple Anagrams" do
subject = Anagram.new("allergy")
matches = subject.matches('gallery', 'ballerina', 'regally', 'clergy', 'largely', 'leading');
expect(matches).to eq ['gallery', 'regally', 'largely']
end
but it fails the following:
it "no matches" do
subject = Anagram.new("abc")
matches = subject.matches("bbb")
expect(matches).to eq []
end
The problem is that 97 + 98 + 99 == 98 + 98 + 98. In other words, the sum of the character codes does not uniquely determine how many times each character occurs in a string.
One way to fix it would be to replace get_word_ord_sum with something else. For example, the "smallest" anagram (the sorted characters) will do. However, note that it's O(n log n):
word.chars.sort.join
EDIT: Expanding on the idea to use Array#group_by, replace get_word_ord_sum with:
word.downcase.chars.group_by(&:itself)
Now you will get a histogram-like hash, and since key order doesn't matter when comparing hashes, you will get your desired result in O(n).
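Putting the sorted-characters suggestion into the asker's class gives something like the following minimal sketch. The private helper name anagram_key is hypothetical; it downcases first, as the EDIT's group_by version does. The first check shows the collision that breaks the ord-sum approach:

```ruby
class Anagram
  attr_reader :anagram_value

  def initialize(value)
    @anagram_value = value
  end

  # Returns the words in +collection+ that are anagrams of @anagram_value.
  def matches(*collection)
    collection.select { |word| is_an_anagram?(word) }
  end

  def is_an_anagram?(word)
    anagram_key(word) == anagram_key(@anagram_value)
  end

  private

  # Sorted characters: two words are anagrams exactly when these match.
  def anagram_key(word)
    word.downcase.chars.sort.join
  end
end

# The ord sums collide: 97 + 98 + 99 == 98 + 98 + 98 == 294.
p "abc".chars.sum(&:ord) == "bbb".chars.sum(&:ord)

p Anagram.new("allergy").matches('gallery', 'ballerina', 'regally', 'clergy', 'largely', 'leading')
p Anagram.new("abc").matches("bbb")
```

The length pre-check from the original is no longer needed, since equal sorted keys imply equal lengths.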
This might help.
words = ['demo', 'none', 'tied', 'evil', 'dome', 'mode', 'live',
'fowl', 'veil', 'wolf', 'diet', 'vile', 'edit', 'tide',
'flow', 'neon']
groups = words.group_by { |word| word.split('').sort }
groups is:
{["d", "e", "m", "o"]=>["demo", "dome", "mode"], ["e", "n", "n", "o"]=>["none", "neon"], ["d", "e", "i", "t"]=>["tied", "diet", "edit", "tide"], ["e", "i", "l", "v"]=>["evil", "live", "veil", "vile"], ["f", "l", "o", "w"]=>["fowl", "wolf", "flow"]}
groups.each { |x, y| p y }
This prints each value:
["demo", "dome", "mode"]
["none", "neon"]
["tied", "diet", "edit", "tide"]
["evil", "live", "veil", "vile"]
["fowl", "wolf", "flow"]

How do I create a histogram by iterating over an array in Ruby

So I was told to rewrite this question and outline my goal. They asked me to iterate over the array and "Use .each to iterate over frequencies and print each word and its frequency to the console... put a single space between the word and its frequency for readability."
puts "Type something profound please"
text = gets.chomp
words = text.split
frequencies = Hash.new 0
frequencies = frequencies.sort_by {|x,y| y}
words.each {|word| frequencies[word] += 1}
frequencies = frequencies.sort_by{|x,y| y}.reverse
puts word +" " + frequencies.to_s
frequencies.each do |word, frequencies|
end
Why can't it convert the string into an integer? What am I doing incorrectly?
Try this code:
puts "Type something profound please"
words = gets.chomp.split # No need for the text variable
frequencies = Hash.new 0
words.each {|word| frequencies[word] += 1}
words.uniq.each {|word| puts "#{word} #{frequencies[word]}"}
# Iterate over the unique words, and print each one with its frequency.
I'd do it as below:
puts "Type something profound please"
text = gets.chomp.split
Here I call the Enumerable#each_with_object method:
hash = text.each_with_object(Hash.new(0)) do |word,freq_hsh|
freq_hsh[word] += 1
end
Below I call the Hash#each method:
hash.each do |word,freq|
puts "#{word} has a frequency count #{freq}"
end
Now run the code:
(arup~>Ruby)$ ruby so.rb
Type something profound please
foo bar foo biz bar baz
foo has a frequency count 2
bar has a frequency count 2
biz has a frequency count 1
baz has a frequency count 1
(arup~>Ruby)$
chunk is a good method for this. It returns an enumerator of 2-element arrays: the first element of each is the return value of the block, and the second is the array of original elements for which the block returned that value:
words = File.open("/usr/share/dict/words", "r:iso-8859-1").readlines
p words.chunk{|w| w[0].downcase}.map{|c, group| [c, group.size]}
=> [["a", 17096], ["b", 11070], ["c", 19901], ["d", 10896], ["e", 8736], ["f", 6860], ["g", 6861], ["h", 9027], ["i", 8799], ["j", 1642], ["k", 2281], ["l", 6284], ["m", 12616], ["n", 6780], ["o", 7849], ["p", 24461], ["q", 1152], ["r", 9671], ["s", 25162], ["t", 12966], ["u", 16387], ["v", 3440], ["w", 3944], ["x", 385], ["y", 671], ["z", 949]]
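Keep in mind that chunk groups only consecutive elements with the same block value, so the letter counts above rely on the words file being sorted. A small sketch contrasting chunk with group_by on an unsorted sample list (made up for illustration):

```ruby
words = %w[apple banana avocado blueberry cherry]

# chunk: a new group starts whenever the block value changes.
by_run = words.chunk { |w| w[0] }.map { |letter, group| [letter, group.size] }

# group_by: all words with the same first letter end up together.
by_letter = words.group_by { |w| w[0] }.map { |letter, group| [letter, group.size] }

p by_run
p by_letter
```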

Determining if a prefix exists in a set

Given a set of strings, say:
"Alice"
"Bob"
"C"
"Ca"
"Car"
"Carol"
"Caroling"
"Carousel"
and given a single string, say:
"Carolers"
I would like a function that returns the smallest prefix not already inside the array.
For the above example, the function should return: "Caro". (A subsequent call would return "Carole")
I am very new to Ruby, and although I could probably hack out something ugly (using my C/C++/Objective-C brain), I would like to learn how to properly (elegantly?) code this up.
There's a little-known magical module in Ruby called Abbrev.
require 'abbrev'
abbreviations = Abbrev::abbrev([
"Alice",
"Bob",
"C",
"Ca",
"Car",
"Carol",
"Caroling",
"Carousel"
])
carolers = Abbrev::abbrev(%w[Carolers])
(carolers.keys - abbreviations.keys).sort.first # => "Caro"
Above I took the first element, but this shows what else would be available.
pp (carolers.keys - abbreviations.keys).sort
# >> ["Caro", "Carole", "Caroler", "Carolers"]
Wrap all the above in a function, compute the resulting missing elements, and then iterate over them yielding them to a block, or use an enumerator to return them one-by-one.
This is what is generated for a single word. For an array it is more complex.
require 'pp'
pp Abbrev::abbrev(['cat'])
# >> {"ca"=>"cat", "c"=>"cat", "cat"=>"cat"}
pp Abbrev::abbrev(['cat', 'car', 'cattle', 'carrier'])
# >> {"cattl"=>"cattle",
# >> "catt"=>"cattle",
# >> "cat"=>"cat",
# >> "carrie"=>"carrier",
# >> "carri"=>"carrier",
# >> "carr"=>"carrier",
# >> "car"=>"car",
# >> "cattle"=>"cattle",
# >> "carrier"=>"carrier"}
Your question still doesn't match what you are expecting as a result. It seems that you need prefixes, not substrings (as "a" would be the shortest substring not already in the array). For finding the prefix, this should suffice:
array = [
"Alice",
"Bob",
"C",
"Ca",
"Car",
"Carol",
"Caroling",
"Carousel",
]
str = 'Carolers'
(0...str.length).map{|i|
str[0..i]
}.find{|s| !array.member?(s)}
I am not a Ruby expert, but I think you may want to approach this problem by converting your set into a trie. Once you have the trie constructed, your problem can be solved simply by walking down from the root of the trie, following the edges for the letters in the word, until you either reach a node that is not marked as a word or walk off the trie. In either case, the prefix you have read so far is not in the set, and it is the shortest prefix of your word with that property. Moreover, this would let you run any number of prefix checks quickly, since once the trie is built, each check takes time at most linear in the length of the string.
Hope this helps!
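The trie approach described above can be sketched in Ruby as follows. This is a minimal illustration, not code from the answer; the class names TrieNode and PrefixTrie and the method shortest_missing_prefix are hypothetical. The usage at the end reproduces the "Caro" then "Carole" sequence from the question by inserting the first result before the second call:

```ruby
# Hypothetical node type: children maps a character to the next node,
# and terminal marks that the path from the root spells a word in the set.
class TrieNode
  attr_reader :children
  attr_accessor :terminal

  def initialize
    @children = {}
    @terminal = false
  end
end

class PrefixTrie
  def initialize(words)
    @root = TrieNode.new
    words.each { |w| insert(w) }
  end

  # Adds +word+ to the trie, creating nodes along its path as needed.
  def insert(word)
    node = @root
    word.each_char { |c| node = (node.children[c] ||= TrieNode.new) }
    node.terminal = true
  end

  # Follows the edges for the letters of +str+ from the root and returns
  # the shortest prefix that is not in the set, or nil if every prefix is.
  def shortest_missing_prefix(str)
    node = @root
    str.each_char.with_index do |c, i|
      node = node.children[c]
      # Walked off the trie, or reached a node not marked as a word.
      return str[0..i] if node.nil? || !node.terminal
    end
    nil
  end
end

trie = PrefixTrie.new(%w[Alice Bob C Ca Car Carol Caroling Carousel])
p trie.shortest_missing_prefix("Carolers") # => "Caro"
trie.insert("Caro")
p trie.shortest_missing_prefix("Carolers") # => "Carole"
```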
I'm not really sure what you're asking for other than an example of some Ruby code to find common prefixes. I'll assume you want to find the smallest string, not already in the set, that is a prefix of the largest number of strings in the given set. Here's an example implementation:
class PrefixFinder
def initialize(words)
@words = Hash[*words.map{|x|[x,x]}.flatten]
end
def next_prefix
max=0; biggest=nil
@words.keys.sort.each do |word|
0.upto(word.size-1) do |len|
substr=word[0..len]; regex=Regexp.new("^" + substr)
next if @words[substr]
count = @words.keys.find_all {|x| x=~regex}.size
max, biggest = [count, substr] if count > max
#puts "OK: s=#{substr}, biggest=#{biggest.inspect}"
end
end
@words[biggest] = biggest if biggest
biggest
end
end
pf = PrefixFinder.new(%w(C Ca Car Carol Caroled Carolers))
pf.next_prefix # => "Caro"
pf.next_prefix # => "Carole"
pf.next_prefix # => "Caroler"
pf.next_prefix # => nil
No comment on the performance (or correctness) of this code, but it does show some Ruby idioms (instance variables, iteration, hashing, etc.).
inn = ["Alice","Bob","C","Ca","Car","Carol","Caroling","Carousel"]
y = Array.new
str="Carolers"
Split the given string into an array:
x=str.split('')
# ["C","a","r","o","l","e","r","s"]
Form all the prefixes:
x.each_index {|i| y << x.take(i+1)}
# y is now [["C"], ["C", "a"], ["C", "a", "r"], ["C", "a", "r", "o"], ["C", "a", "r", "o", "l"], ["C", "a", "r", "o", "l", "e"], ["C", "a", "r", "o", "l", "e", "r"], ["C", "a", "r", "o", "l", "e", "r", "s"]]
Use join to concatenate the characters of each prefix:
y = y.map {|s| s.join }
# ["C", "Ca", "Car", "Caro", "Carol", "Carole", "Caroler", "Carolers"]
Select the first item of y that is not in the input array:
y.select {|item| !inn.include? item}.first
You will get "Caro".
Putting it all together:
def FindFirstMissingItem(srcArray,strtocheck)
y=Array.new
x=strtocheck.split('')
x.each_index {|i| y << x.take(i+1)}
y=y.map {|s| s.join}
y.select {|item| !srcArray.include? item}.first
end
And call it:
inn = ["Alice","Bob","C","Ca","Car","Carol","Caroling","Carousel"]
str="Carolers"
FindFirstMissingItem inn,str
Very simple version (but not very Rubyish):
str = 'Carolers'
ar = %w(Alice Bob C Ca Car Carol Caroling Carousel)
substr = str[0, n=1]
substr = str[0, n+=1] while ar.include? substr
puts substr
