Loop returns only the last item - ruby

I'm new to Ruby, so the answer is probably pretty simple, just not to me.
I am taking an array of strings (A) and matching it against another array of strings (B) to see whether a given string from A exists as a substring within a string from B.
The comparison seems to work; however, I only get back a result for the last string from A that was compared.
What might be causing this?
def checkIfAvailableOnline(film)
  puts "Looking for " + film
  lowerCaseFilm = film.downcase
  #iterate through the linesarray scanning for the film in question
  for line in @linesArray
    #get the line in lowercase
    lowerCaseLine = line.downcase
    #look for the film name as a substring within the line
    results = lowerCaseLine.scan(lowerCaseFilm)
    if results.length > 0
      @availableOnlineArray << results
    end
  end
end
#-----------------------------------------
listFilmsArray.each {|line| checkIfAvailableOnline(line)}

Given a list of film names:
FILM_NAMES = [
'Baked Blue Tomatoes',
'Fried Yellow Tomatoes',
'The thing that ate my homework',
'In a world where',
]
Then to find all film names containing a substring, ignoring case:
def find_films_available_online(partial_film_name)
FILM_NAMES.find_all do |film_name|
film_name.downcase[partial_film_name.downcase]
end
end
p find_films_available_online('tomatoes')
# => ["Baked Blue Tomatoes", "Fried Yellow Tomatoes"]
p find_films_available_online('godzooka')
# => []
To find out if a film name is available online:
def available_online?(partial_film_name)
!find_films_available_online(partial_film_name).empty?
end
p available_online?('potatoes') # => false
p available_online?('A World') # => true
To find out which of a list of partial film names are available online:
def partial_film_names_available_online(partial_film_names)
partial_film_names.find_all do |partial_film_name|
available_online?(partial_film_name)
end
end
p partial_film_names_available_online [
'tomatoes',
'potatoes',
'A World',
]
# => ["tomatoes", "A World"]

A more rubyish way to do this is:
Given an array of films we are looking for:
@films = ["how to train your dragon", "kung fu panda", "avatar"]
Given an array of lines that may contain the films we are looking for:
@lines_array = ["just in kung fu panda", "available soon how to train your dragon"]
Return the film name early if it exists in a line or false if it doesn't after searching all the lines:
def online_available(film)
  @lines_array.each do |l|
    l.downcase.include?(film) ? (return film) : false
  end
  false
end
Check for the films in the lines rejecting the ones that returned false, print them and ultimately return an array of the matches we found:
def films_available
  available = @films.collect{ |x| p "Looking for: #{x}"; online_available(x) }
                    .reject{ |x| x == false }
  available.each{|x| p "Found: #{x}"}
  available
end
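Using camelCase for method names is considered bad style in Ruby (snake_case is the convention), but you know what they say about opinions.
.each is an internal iterator, and I'm pretty sure the "for" loop will run slower than the each method that Array provides, since for is itself built on each. A for loop also leaves its loop variable defined after the loop ends, whereas a block variable stays local to the block.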
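As a rough sketch of the same idea without the early return, the matching can also be written with select and any?, so that the method simply returns the array of matches. The method signature and the sample data here are assumptions made for illustration, not the asker's real arrays.

```ruby
# Hypothetical sample data, mirroring the arrays in the answer above
films = ["how to train your dragon", "kung fu panda", "avatar"]
lines = ["just in Kung Fu Panda", "available soon How To Train Your Dragon"]

# Keep each film that appears as a substring of at least one line,
# ignoring case on both sides
def films_available(films, lines)
  lowered = lines.map(&:downcase)
  films.select do |film|
    lowered.any? { |line| line.include?(film.downcase) }
  end
end

available = films_available(films, lines)
p available  # => ["how to train your dragon", "kung fu panda"]
```

Because the result is built and returned in one place, nothing depends on shared state being appended to from inside a loop.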

Related

How to find duplicated Ruby methods with the same name but different code?

The very large Ruby codebase I am working with has many instances of duplicated methods defined with the same name but with differing code (causing a large race condition problem). The eventual end goal is to reconcile the duplicates and have just one version of the same-named method. First I need to find all versions of a method that deviate from the "control" version of that method. Is there an optimal way to search for and find all instances of duplicated same-named methods that deviate from one defined version?
The duplicated methods are spread out across hundreds of different files and contained in one class. These are essentially helper methods that should have been centralized in one file but instead have been duplicated and often altered, but keeping the same method name. Right now I just need a good way to locate all the instances where these methods have been duplicated and are different from what the method should be.
I think Rubocop only searches for duplicated method names which is only moderately helpful since it could find 237 methods with the same name but I don't know how many of those methods are deviations from my "control" method without manually looking and comparing.
Some examples of a method redefined in files across multiple subdirectories:
def get_field(field_name)
  return nil unless field = @global_vars.business.fields.find_by_identifier(field_name)
  field.value.present? ? field.value : nil
end
def get_field(field_name)
  @global_vars.business.fields.find_by_identifier(field_name).try(:value)
end
def get_field(field_name)
  return nil unless field = @company.fields.find_by_identifier(field_name)
  field.value.present? ? field.value : nil
end
def get_field(field_name)
  @property.fields.find_by_identifier(field_name).try(:value)
end
Thanks for your help!
My first thought was to execute each file of interest with additional code added on the fly to build a directory of methods and their locations. That clearly would not work, however, as exceptions could be expected to be raised almost immediately. Even if exceptions were avoided there would be no guarantee that the added code would be executed. In addition, there could be unintended adverse consequences of blindly running code.
I think the only reasonable approach would be to parse the files of interest. There may even be gems that do just that. It's certainly worth a search.
I have constructed a method that parses the files to build a hash containing the information desired. The main requirement for its use is that the files are formatted properly; specifically, the key words class, module and def must be indented the same number of spaces as their corresponding end keywords. It will therefore miss modules, classes and methods that are defined in-line, such as the following.
module M; end
class C; end
def im(n) 2*n end
def self.cm(n) 2*n end
If vertical alignment is a problem there certainly are gems that format code properly.
I chose a particular hash structure, but once that hash has been constructed it could be modified as desired. For example, I've adopted the hierarchy "instance methods->files->containers" ("containers" being modules, classes and top-level). One could easily modify that hash to change the hierarchy to, say, "container->module methods->files". Alternatively, one could enter the information into a database to maintain flexibility on how it is used.
Code
The following regular expression is used to parse each line of each file of interest.
R = /
  \A                             # match beginning of string
  (?<indent>[ ]*)                # capture zero or more spaces, name 'indent'
  (?:                            # begin non-capture group
    (?<type>class|module)        # capture keyword 'class' or 'module', name 'type'
    [ ]+                         # match one or more spaces
    (?<name>\p{Upper}\p{Alnum}*) # capture an uppercase letter followed by
                                 # >= 0 alphanumeric chars, name 'name'
    |                            # or
    (?<type>def)                 # capture keyword 'def', name 'type'
    [ ]+                         # match one or more spaces
    (?<name>                     # begin capture group named 'name'
      (?:self\.)?                # optionally match 'self.'
      \p{Lower}[\p{Alnum}_]*     # match a lowercase letter followed by
                                 # >= 0 alphanumeric chars or underscores
    )                            # close capture group 'name'
    |                            # or
    (?<type>end)                 # capture keyword 'end', name 'type'
    \b                           # match a word break
  )                              # end non-capture group
  /x                             # free-spacing regex definition mode
The method used for parsing follows.
def find_methods_by_name(files_of_interest)
  files_of_interest.each_with_object({ imethod: {}, cmethod: {} }) do |fname, h|
    stack = []
    File.readlines(fname).each do |line|
      m = line.match R
      next if m.nil?
      indent, type, name = m[:indent].size, m[:type], m[:name]
      case type
      when "module", "class"
        name = stack.any? ? [stack.last[:name], name].join('::') : name
        stack << { indent: indent, type: type, name: name }
      when "def"
        if name =~ /\Aself\./
          stack << { indent: indent, type: :cmethod, name: name[5..-1] }
        else
          stack << { indent: indent, type: :imethod, name: name }
        end
      when "end"
        next if stack.empty? || stack.last[:indent] != indent
        type, name = stack.pop.values_at(:type, :name)
        next if type == "module" or type == "class"
        ((h[type][name] ||= {})[fname] ||= []) << (stack.any? ?
          [stack.last[:type], stack.last[:name]].join(' ') : :main)
      end
    end
    raise StandardError, "stack = #{stack} after processing file '#{fname}'" if stack.any?
  end
end
Example
The files of interest might be, for example, all files in certain directories. In this example we have just two files.
files_of_interest = ['file1.rb', 'file2.rb']
Those files are as follows.
File.write('file1.rb',
<<_)
def mm
end
module M
def m
end
module N
def self.nm
end
def n
end
def a2
end
end
end
class A
def self.a1c
end
def a1
end
def a2
end
end
class B
include M
def b
end
end
_
#=> 327
File.write('file2.rb',
<<_)
def mm
end
module M
def m
end
module N
def n
end
def a2
end
end
end
module P
def p
end
end
class A
include M::N
def self.a1c
end
def a1
end
end
class B
include P
def b
end
end
_
#=> 335
h = find_methods_by_name(files_of_interest)
#=> {
# :imethod=>{
# "mm"=>{
# "file1.rb"=>[:main],
# "file2.rb"=>[:main]
# },
# "m"=>{
# "file1.rb"=>["module M"],
# "file2.rb"=>["module M"]
# },
# "n"=>{
# "file1.rb"=>["module M::N"],
# "file2.rb"=>["module M::N"]
# },
# "a2"=>{
# "file1.rb"=>["module M::N", "class A"],
# "file2.rb"=>["module M::N"]
# },
# "a1"=>{
# "file1.rb"=>["class A"],
# "file2.rb"=>["class A"]
# },
# "b"=>{
# "file1.rb"=>["class B"],
# "file2.rb"=>["class B"]
# },
# "p"=>{
# "file2.rb"=>["module P"]
# }
# },
# :cmethod=>{
# "nm"=>{
# "file1.rb"=>["module M::N"]
# },
# "a1c"=>{
# "file1.rb"=>["class A"],
# "file2.rb"=>["class A"]
# }
# }
# }
To eliminate methods that appear only once, we can perform an additional step.
h.transform_values! { |g| g.reject { |k,v| v.size == 1 && v.values.first.size == 1 } }
This removes the instance method p and the class method nm.
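The earlier remark about changing the hierarchy can be sketched roughly as follows. The hash below is a small hand-written sample in the same shape as the result of find_methods_by_name (it is illustrative data, not real output), and by_container is a name made up for this example.

```ruby
# Hand-written sample in the shape "method -> file -> containers"
h = {
  imethod: {
    "a2" => { "file1.rb" => ["module M::N", "class A"], "file2.rb" => ["module M::N"] },
    "b"  => { "file1.rb" => ["class B"] }
  }
}

# Re-key the instance methods as "container -> method -> files".
# Missing containers and methods are created on first access.
by_container = Hash.new { |hash, c| hash[c] = Hash.new { |inner, m| inner[m] = [] } }
h[:imethod].each do |meth, files|
  files.each do |fname, containers|
    containers.each { |c| by_container[c][meth] << fname }
  end
end

p by_container.keys                   # => ["module M::N", "class A", "class B"]
p by_container["module M::N"]["a2"]   # => ["file1.rb", "file2.rb"]
p by_container["class A"]["a2"]       # => ["file1.rb"]
```

The same loop could be repeated for h[:cmethod] if class methods are wanted in the regrouped view.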

Learning Ruby - Stuck on string comparison within an array

I'm working through Learning to Program with Ruby and I am stuck on building my own sort method.
I'm struggling to figure out why the comparison method inside my recursive_sort is throwing out an error
chapter10.rb:120:in `block in recursive_sort': undefined method `<' for ["zebra"]:Array (NoMethodError)
But this works just fine...
lowest = 'zebra'
if 'cat' < 'zebra'
lowest = 'cat'
end
puts lowest
Could someone point me in the right direction to something that can help me wrap my head around this? Thanks!
puts 'Sorting Program with recursion v1.0'
# Keep two more lists around
# One for already-sorted words
# One for still - unsorted words
# Find the smallest word in the unsorted list
# push it into the end of the sorted_array
def sort some_array
recursive_sort some_array, []
end
def recursive_sort unsorted_array, sorted_array
lowest = unsorted_array[0]
unsorted_array.each do |uns|
if uns < lowest
lowest = uns
end
end
puts lowest
end
# Get a list of unsorted words into an array
orig_array = []
word = 'placeholder'
puts 'Enter a list of words to be sorted. Press enter when done.'
while word != ''
word = gets.chomp
orig_array.push [word]
end
orig_array.pop
puts 'This is the output of the built in sort method.'
orig_array.sort.each do |un|
puts un
end
puts 'This is the output of Rick\'s sort method.'
sort orig_array
orig_array.push [word]
Here, you are actually pushing an array into an array, so that your orig_array becomes
[["word 1"], ["word 2"], ["word 3"], ...]
Remove the [] around word to fix this, or change the .push to += or .concat, which will glue together the two arrays.
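To make the difference concrete, here is a small sketch with two hard-coded words standing in for the user input (the variable names nested, flat and glued are made up for this example):

```ruby
nested = []
flat = []
%w[zebra cat].each do |word|
  nested.push [word]   # pushes a one-element array
  flat.push word       # pushes the string itself
end

p nested  # => [["zebra"], ["cat"]]
p flat    # => ["zebra", "cat"]

# Strings define <, so the comparison from recursive_sort works on flat.
# Arrays do not define <, which is where the NoMethodError came from.
lowest = flat[0]
flat.each { |w| lowest = w if w < lowest }
puts lowest  # => cat

# concat glues the elements of another array onto the receiver
glued = []
glued.concat ["zebra"]
glued.concat ["cat"]
p glued  # => ["zebra", "cat"]
```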

Ruby regex into array of hashes but need to drop a key/val pair

I'm trying to parse a file containing a name followed by a hierarchy path. I want to take the named regex matches, turn them into Hash keys, and store the match as a hash. Each hash will get pushed to an array (so I'll end up with an array of hashes after parsing the entire file). This part of the code is working except now I need to handle bad paths with duplicated hierarchy (top_* is always the top level). It appears that if I'm using named backreferences in Ruby I need to name all of the backreferences. I have gotten the match working in Rubular but now I have the p1 backreference in my resultant hash.
Question: What's the easiest way to not include the p1 key/value pair in the hash? My method is used in other places so we can't assume that p1 always exists. Am I stuck with dropping each key/value pair in the array after calling the s_ary_to_hash method?
NOTE: I'm keeping this question to try and solve the specific issue of ignoring certain hash keys in my method. The regex issue is now in this ticket: Ruby regex - using optional named backreferences
UPDATE: Regex issue is solved, the hier is now always stored in the named 'hier' group. The only item remaining is to figure out how to drop the 'p1' key/value if it exists prior to creating the Hash.
Example file:
name1 top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
new12 top_ab12/hat[1]/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
tops top_bat/car[0]
ab123 top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
Expected output:
[{:name => "name1", :hier => "top_cat/mouse/dog/elephant/horse"},
{:name => "new12", :hier => "top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool"},
{:name => "tops", :hier => "top_bat/car[0]"},
{:name => "ab123", :hier => "top_2/top_1/top_3/top_4/dog"}]
Code snippet:
def s_ary_to_hash(ary, regex)
retary = Array.new
ary.each {|x| (retary << Hash[regex.match(x).names.map{|key| key.to_sym}.zip(regex.match(x).captures)]) if regex.match(x)}
return retary
end
regex = %r{(?<name>\w+) (?<p1>[\w\/\[\]]+)?(?<hier>(\k<p1>.*)|((?<= ).*$))}
h_ary = s_ary_to_hash(File.readlines(filename), regex)
What about this regex?
^(?<name>\S+)\s+(?<p1>top_.+?)(?:\/(?<hier>\k<p1>(?:\[.+?\])?.+))?$
Demo
http://rubular.com/r/awEP9Mz1kB
Sample code
def s_ary_to_hash(ary, regex, mappings)
  retary = Array.new
  for item in ary
    tmp = regex.match(item)
    if tmp then
      hash = Hash.new
      retary.push(hash)
      mappings.each { |mapping|
        mapping.each { |key, groups|
          # use the first named group that actually matched
          for group in groups
            if tmp[group] then
              hash[key] = tmp[group]
              break
            end
          end
        }
      }
    end
  end
  return retary
end
regex = %r{^(?<name>\S+)\s+(?<p1>top_.+?)(?:\/(?<hier>\k<p1>(?:\[.+?\])?.+))?$}
h_ary = s_ary_to_hash(
File.readlines(filename),
regex,
[
{:name => ['name']},
{:hier => ['hier','p1']}
]
)
puts h_ary
Output
{:name=>"name1", :hier=>"top_cat/mouse/dog/elephant/horse\r"}
{:name=>"new12", :hier=>"top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool\r"}
{:name=>"tops", :hier=>"top_bat/car[0]"}
Discussion
Since Ruby 2.0.0 doesn't support branch reset, I have built a solution that adds some more power to the s_ary_to_hash function. It now admits a third parameter indicating how to build the final array of hashes.
This third parameter is an array of hashes. Each hash in this array has one key (K) corresponding to the key in the final array of hashes. K is associated with an array containing the named group to use from the passed regex (second parameter of s_ary_to_hash function).
If a group equals nil, s_ary_to_hash skips it for the next group.
If all groups equal nil, K is not pushed on the final array of hashes.
Feel free to modify s_ary_to_hash if this isn't a desired behavior.
Edit: I've changed the method s_ary_to_hash to conform with what I now understand to be the criterion for excluding directories, namely, directory d is to be excluded if there is a downstream directory with the same name, or the same name followed by a non-negative integer in brackets. I've applied that to all directories, though I may have misunderstood the question; perhaps it should apply only to the first.
data =<<THE_END
name1 top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
new12 top_ab12/hat/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
tops top_bat/car[0]
ab123 top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
THE_END
text = data.split("\n")
def s_ary_to_hash(ary)
ary.map do |s|
name, _, downstream_path = s.partition(' ').map(&:strip)
arr = []
downstream_dirs = downstream_path.split('/')
downstream_dirs.each {|d| puts "'#{d}'"}
while downstream_dirs.any? do
dir = downstream_dirs.shift
arr << dir unless downstream_dirs.any? { |d|
d == dir || d =~ /#{dir}\[\d+\]/ }
end
{ name: name, hier: arr.join('/') }
end
end
s_ary_to_hash(text)
# => [{:name=>"name1", :hier=>"top_cat/mouse/dog/elephant/horse"},
# {:name=>"new12", :hier=>"top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool"},
# {:name=>"tops", :hier=>"top_bat/car[0]"},
# {:name=>"ab123", :hier=>"top_2/top_1/top_3/top_4/dog"}]
The exclusion criterion is implemented in downstream_dirs.any? { |d| d == dir || d =~ /#{dir}\[\d+\]/ }, where dir is the directory that is being tested and downstream_dirs is an array of all the downstream directories. (When dir is the last directory, downstream_dirs is empty.) Localizing it in this way makes it easy to test and change the exclusion criterion. You could shorten this to a single regex and/or make it a method:
def exclude_dir?(dir, downstream_dirs)
  downstream_dirs.any? { |d| d == dir || d =~ /#{dir}\[\d+\]/ }
end
Here is a non regexp solution:
result = string.each_line.map do |line|
  name, path = line.split(' ')
  path = path.split('/')
  last_occur_of_root = path.rindex(path.first)
  path = path[last_occur_of_root..-1]
  {name: name, hier: path.join('/')}
end
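For the narrower question of leaving a key such as p1 out of the resulting hashes, one possible sketch is to filter the name/capture pairs before building each hash. The skip parameter is a hypothetical addition to s_ary_to_hash, and the regex here is a simplified stand-in rather than the pattern from the question. Names listed in skip that the regex doesn't define are simply ignored, so the method still works where p1 doesn't exist.

```ruby
# Build an array of hashes from named captures, leaving out any group
# names listed in skip (a hypothetical extra parameter).
def s_ary_to_hash(ary, regex, skip = [])
  ary.each_with_object([]) do |line, result|
    m = regex.match(line)
    next unless m
    pairs = m.names.zip(m.captures).reject { |name, _| skip.include?(name) }
    result << pairs.map { |name, value| [name.to_sym, value] }.to_h
  end
end

regex = /(?<name>\w+) (?<p1>top_\w+)(?<hier>.*)/   # simplified stand-in pattern
lines = ["tops top_bat/car[0]", "no match here!"]

with_p1    = s_ary_to_hash(lines, regex)
without_p1 = s_ary_to_hash(lines, regex, ["p1"])

p with_p1.first.keys      # => [:name, :p1, :hier]
p without_p1.first.keys   # => [:name, :hier]
puts without_p1.first[:hier]  # prints /car[0]
```

This also matches the regex only once per line, instead of three times as in the original one-liner.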

Reading strings from one file and adding to another file with suffix to make unique

I am processing documents in ruby.
I have a document from which I am extracting specific strings using a regexp and then adding them to another file. When added to the destination file they must be unique, so if a string already exists in the destination file I am adding a simple suffix, e.g. <word>_1. Eventually I want to reference the strings by name, so random number generation or a string built from the date is no good.
At present I store each added word in an array, and every time I add a word I check that the string doesn't already exist in the array. That is fine if there is only one duplicate, but there might be two or more, so I need to check for the initial string and then loop, incrementing the suffix until the result doesn't exist. (I have simplified my code, so there may be bugs.)
def add_word(word)
  if @added_words.include? word
    suffix = 1
    suffixed_word = word
    while @added_words.include? suffixed_word
      suffixed_word = word + "_" + suffix.to_s
      suffix += 1
    end
    word = suffixed_word
  end
  @added_words << word
end
It looks messy, is there a better algorithm or ruby way of doing this?
Make @added_words a Set (don't forget to require 'set'). This makes for faster lookup as sets are implemented with hashes, while still using include? to check for set membership. It's also easy to extract the highest used suffix:
>> s << 'foo'
#=> #<Set: {"foo"}>
>> s << 'foo_1'
#=> #<Set: {"foo", "foo_1"}>
>> word = 'foo'
#=> "foo"
>> s.max_by { |w| w =~ /#{word}_?(\d+)?/ ; $1 || '' }
#=> "foo_1"
>> s << 'foo_12'
#=> #<Set: {"foo", "foo_1", "foo_12"}>
>> s.max_by { |w| w =~ /#{word}_?(\d+)?/ ; $1 || '' }
#=> "foo_12"
Now to get the next value you can insert, you could just do the following (imagine you already had 12 foos, so the next should be a foo_13):
>> s << s.max_by { |w| w =~ /#{word}_?(\d+)?/ ; $1 || '' }.next
#=> #<Set: {"foo", "foo_1", "foo_12", "foo_13"}>
Sorry if the examples are a bit confused, I had anesthesia earlier today. It should be enough to give you an idea of how sets could potentially help you though (most of it would work with array too, but sets have faster lookup).
Change @added_words to a Hash with a default of zero. Then you can do:
@added_words = Hash.new(0)
def add_word(word)
  @added_words[word] += 1
end
# put it to work:
list = %w(test foo bar test bar bar)
names = list.map do |w|
  "#{w}_#{add_word(w)}"
end
p @added_words
#=> {"test"=>2, "foo"=>1, "bar"=>3}
p names
#=>["test_1", "foo_1", "bar_1", "test_2", "bar_2", "bar_3"]
In that case, I'd probably use a set or hash:
#in your class:
require 'set'
require 'forwardable'
extend Forwardable #I'm just including this to keep your previous api
#elsewhere you're setting up your instance_var, it's probably [] at the moment
def initialize
  @added_words = Set.new
end
#then instead of `def add_word(word); @added_words.add(word); end`:
def_delegator :@added_words, :add, :add_word
#or just change whatever loop to use @added_words.add('word') rather than self#add_word('word')
#@added_words.add('word') does nothing if 'word' already exists in the set.
If you've got some attributes that you're grouping via these sections, then a hash might be better:
#elsewhere you're setting up your instance_var, it's probably [] at the moment
def initialize
  @added_words = {}
end
def add_word(word, attrs={})
  @added_words[word] ||= []
  @added_words[word].push(attrs)
end
Doing it the "wrong way", but in slightly nicer code:
def add_word(word)
  if @added_words.include? word
    suffixed_word = 1.upto(1.0/0.0) do |suffix|
      candidate = [word, suffix].join("_")
      break candidate unless @added_words.include?(candidate)
    end
    word = suffixed_word
  end
  @added_words << word
end
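Putting the set and counter ideas from the answers together, a minimal sketch might look like the following. The class name UniqueNames and its instance variables are made up for this example, and it assumes the first occurrence of a word should keep its plain name, as in the question.

```ruby
require 'set'

# Hands out unique names: the first "foo" stays "foo",
# later ones become foo_1, foo_2, ...
class UniqueNames
  def initialize
    @taken = Set.new            # every name handed out so far
    @next_suffix = Hash.new(1)  # next suffix to try for each base word
  end

  def add(word)
    name = word
    # Set#add? returns nil when the name is already present
    until @taken.add?(name)
      name = "#{word}_#{@next_suffix[word]}"
      @next_suffix[word] += 1
    end
    name
  end
end

names = UniqueNames.new
result = %w[foo bar foo foo foo_1].map { |w| names.add(w) }
p result  # => ["foo", "bar", "foo_1", "foo_2", "foo_1_1"]
```

Remembering the next suffix per word avoids rescanning from 1 each time, and the loop still copes with a literal "foo_1" arriving as input.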

Extract individual existing words in domain names

I'm looking for a Ruby gem (preferably) that will cut domain names up into their words.
whatwomenwant.com => 3 words, "what", "women", "want".
If it can ignore things like numbers and gibberish then great.
You'll need a word list such as those produced by Project Gutenberg or available in the source for ispell &c. Then you can use the following code to decompose a domain into words:
WORD_LIST = [
'experts',
'expert',
'exchange',
'sex',
'change',
]
def words_that_phrase_begins_with(phrase)
WORD_LIST.find_all do |word|
phrase.start_with?(word)
end
end
def phrase_to_words(phrase, words = [], word_list = [])
if phrase.empty?
word_list << words
else
words_that_phrase_begins_with(phrase).each do |word|
remainder = phrase[word.size..-1]
phrase_to_words(remainder, words + [word], word_list)
end
end
word_list
end
p phrase_to_words('expertsexchange')
# => [["experts", "exchange"], ["expert", "sex", "change"]]
If given a phrase that has any unrecognized words, it returns an empty array:
p phrase_to_words('expertsfoo')
# => []
If the word list is long, this will be slow. You can make this algorithm faster by preprocessing the word list into a tree. The preprocessing itself will take time, so whether it's worth it will depend upon how many domains you want to test.
Here's some code to turn the word list into a tree:
def add_word_to_tree(tree, word)
first_letter = word[0..0].to_sym
remainder = word[1..-1]
tree[first_letter] ||= {}
if remainder.empty?
tree[first_letter][:word] = true
else
add_word_to_tree(tree[first_letter], remainder)
end
end
def make_word_tree
root = {}
WORD_LIST.each do |word|
add_word_to_tree(root, word)
end
root
end
def word_tree
  @word_tree ||= make_word_tree
end
This produces a tree that looks like this:
{:c=>{:h=>{:a=>{:n=>{:g=>{:e=>{:word=>true}}}}}}, :s=>{:e=>{:x=>{:word=>true}}}, :e=>{:x=>{:c=>{:h=>{:a=>{:n=>{:g=>{:e=>{:word=>true}}}}}}, :p=>{:e=>{:r=>{:t=>{:word=>true, :s=>{:word=>true}}}}}}}}
It looks like Lisp, doesn't it? Each node in the tree is a hash. Each hash key is either a letter, with the value being another node, or it is the symbol :word with the value being true. Nodes with :word are words.
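As a small illustration of how such a tree is read, here is a sketch that builds the same nested-hash shape for a tiny list and then walks it letter by letter. The helper names build_tree and in_tree? are made up for this example, and the build is done iteratively rather than with the recursive add_word_to_tree above.

```ruby
# Build the nested-hash tree: one level per letter, with :word marking
# the nodes where a complete word ends.
def build_tree(words)
  words.each_with_object({}) do |word, root|
    node = word.each_char.inject(root) { |n, ch| n[ch.to_sym] ||= {} }
    node[:word] = true
  end
end

# Follow the letters down the tree; the word is present only if the
# walk never falls off the tree and the final node is marked.
def in_tree?(tree, word)
  node = word.each_char.inject(tree) { |n, ch| n && n[ch.to_sym] }
  !!(node && node[:word])
end

tree = build_tree(%w[expert experts sex])
p in_tree?(tree, "expert")   # => true
p in_tree?(tree, "experts")  # => true
p in_tree?(tree, "exper")    # => false
p in_tree?(tree, "sexy")     # => false
```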
Modifying words_that_phrase_begins_with to use the new tree structure will make it faster:
def words_that_phrase_begins_with(phrase)
node = word_tree
words = []
phrase.each_char.with_index do |c, i|
node = node[c.to_sym]
break if node.nil?
words << phrase[0..i] if node[:word]
end
words
end
I don't know of any gems for this, but if I had to solve this problem, I would download an English word dictionary and read about text searching algorithms.
When there is more than one way to divide the letters (as in sepp2k's expertsexchange), two hints can help:
Sort your dictionary by, for example, word popularity, so that divisions made of the most popular words count as more valuable.
You can go to the main page of the site for the domain you are analyzing and read the content, searching for your words. I don't think that you'll find sex on a page for some experts. But... hm... experts can be so different ,.)
Update
I've been working with this challenge and came up with the following code.
Please refactor if I'm doing something wrong :-)
Benchmark:
Runtime: 11 sec.
f- file: 13.000 lines of domain names
w- file: 2000 words (to check against)
Code:
# readlines keeps the trailing newline on each entry, so strip it before comparing
lines = File.readlines('resource/domainlist.txt').map(&:chomp)
words = File.readlines('resource/commonwords.txt').map(&:chomp)
results = {}
lines.each do |line|
# Start with words from 2 letters on, so ignoring 1 letter words like 'a'
word_size = 2
# Only get the .com domains
if line =~ /^.*,[a-z]+\.com.*$/i then
# Strip the .com off the domain
line.gsub!(/^.*,([a-z]+)\.com.*$/i, '\\1')
# If the domain name is longer than 3 and shorter than 15 characters
if line.size > 3 and line.size < 15 then
# For the length of the string run ...
line.size.times do |n|
# Set the counter
i = 0
# As long as we're within the length of the string
while i <= line.size - word_size do
# Get the word in proper DRY fashion
word = line[i,word_size]
# Check the word against our list
if words.include?(word)
results[line] = [] unless results[line]
# Add all the found words to the hash
results[line] << word
end
i += 1
end
word_size += 1
end
end
end
end
p results
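As one possible refactoring of the inner loops, the substring scan can be pulled into a method and the word list held in a Set, so that each lookup no longer walks the whole array. This is only a sketch: words_in is a made-up name, and the word list and domain are hard-coded stand-ins for the two files.

```ruby
require 'set'

# Stand-in data; the original reads these from files
words  = Set.new(%w[what women want men])
domain = "whatwomenwant"

# Collect every substring of at least min_size letters that is in the
# word set, shortest substrings first, left to right within each length.
def words_in(domain, words, min_size = 2)
  found = []
  (min_size..domain.size).each do |size|
    (0..domain.size - size).each do |start|
      candidate = domain[start, size]
      found << candidate if words.include?(candidate)
    end
  end
  found
end

found = words_in(domain, words)
p found  # => ["men", "what", "want", "women"]
```

Note that this still reports every dictionary word found anywhere in the name (such as "men" here), so it lists candidates rather than choosing a single split.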
