Explaining a Ruby code snippet

I'm in that uncomfortable position again, where somebody has left me with a code snippet in a language I don't know and I have to maintain it. While I haven't learned Ruby yet, some parts of it are quite simple, but I'd like to hear your explanations nonetheless.
Here goes:
words = File.open("lengths.txt") {|f| f.read }.split # read all lines of a file in 'words'?
values = Array.new(0)
words.each { |value| values << value.to_i } # looked this one up, it's supposed to convert to an array of integers, right?
values.sort!
values.uniq!
diffs = Array.new(0) # this looks unused, unless I'm missing something obvious
sum = 0
s = 0 # another unused variable
# this looks like it's computing the sum of differences between successive
# elements, but that sum also remains unused, or does it?
values.each_index { |index| if index.to_i < values.length-1 then sum += values.at(index.to_i + 1) - values.at(index.to_i) end } # could you also explain the syntax here?
puts "delta has the value of\n"
# this will eventually print the minimum of the original values divided by 2
puts values.at(0) / 2
The above script was supposed to figure out the average of the differences between every two successive elements (integers, essentially) in a list. Am I right in saying this is nowhere near what it actually does, or am I missing something fundamental, which is likely considering I have no Ruby knowledge?

Explanation + refactor (unused variables removed, functional approach, each_cons):
# Read integer numbers from file, sort them ASC and remove duplicates
values = File.read("lengths.txt").split.map(&:to_i).sort.uniq
# Take consecutive pairs and add up the differences between them
partial_diffs = values.each_cons(2).map { |a, b| b - a }.inject(0, :+)
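The question actually asked for the average of the differences rather than their sum. A minimal sketch of that, assuming the integers have already been read into an array (the method name `average_successive_difference` is hypothetical), could look like this:

```ruby
# Hypothetical helper: average gap between successive distinct values.
def average_successive_difference(values)
  sorted = values.sort.uniq
  # With fewer than two distinct values there are no gaps to average.
  return nil if sorted.length < 2

  # each_cons(2) yields overlapping pairs: [a0, a1], [a1, a2], ...
  diffs = sorted.each_cons(2).map { |a, b| b - a }
  # fdiv returns a Float, so the average is not truncated.
  diffs.sum.fdiv(diffs.length)
end

avg = average_successive_difference([10, 30, 20, 50, 30])
puts avg
```

Because the differences telescope, their sum always equals `sorted.last - sorted.first`, so the same average could be computed as `(max - min)` divided by one less than the number of distinct values.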

That guy surely didn't grasp Ruby himself. I wonder why he chose to use that language.
Here's an annotated explanation:
# Almost: it reads the whole file and splits it on whitespace, so `words` is an array of strings (one per line if each line holds a single number)
words = File.open("lengths.txt") {|f| f.read }.split
values = Array.new(0)
# Yes, to_i converts a string into an integer
words.each { |value| values << value.to_i }
values.sort!
values.uniq!
# diffs and s seem unused
diffs = Array.new(0)
sum = 0
s = 0
# The immediate line below can be read as `for(int index = 0; index < values.length; index++)`
values.each_index { |index|
# index is integer, to_i is unnecessary
if index.to_i < values.length-1 then
# The `sum` variable is used here
# Following can be rewritten as sum += values[index + 1] - values[index]
sum += values.at(index.to_i + 1) - values.at(index.to_i)
end
}
puts "delta has the value of\n"
# Yes, this will eventually print the minimum of the original values divided by 2 (Integer division, so the result is rounded down)
puts values.at(0) / 2
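To confirm what the original script computes, here is a sketch that replays its logic on an in-memory array instead of reading `lengths.txt` (the method name `replay_original` is hypothetical). It returns the never-printed `sum` alongside the value that actually gets printed:

```ruby
# Hypothetical re-creation of the original script, minus the file I/O.
def replay_original(words)
  values = words.map(&:to_i).sort.uniq
  sum = 0
  values.each_index do |index|
    # Same bounds check as the original: skip the last index.
    sum += values[index + 1] - values[index] if index < values.length - 1
  end
  # The script only prints the second element: the smallest value, halved
  # with Integer division.
  [sum, values[0] / 2]
end

p replay_original(%w[7 3 3 10]) # prints [7, 1]
```

The computed `sum` always works out to `values.last - values.first`, and the script never outputs it.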
To help you get a better grasp of what "real" (idiomatic) Ruby looks like, I've written what you wanted, with some annotations:
values = File.open("lengths.txt") do |f|
# Read it like this:
#
# Take the list of all lines in a file,
# apply a function to each line
# The function is stripping the line and turning it
# into an integer
# (This means the resultant list is a list of integers)
#
# And then sort it and remove duplicates from the resultant list
#
# The eventual resultant list is assigned to `values`
# by being the return value of this "block"
f.each_line.map { |l| l.strip.to_i }.sort.uniq
end
# Assign `diffs` to an empty array (instead of using Array.new())
diffs = []
values.each_index do |i|
# Modifier form of `if` (syntactic sugar)
# It evaluates the 1st part only if the 2nd part is true
diffs << (values[i+1] - values[i]) if i < values.length - 1
end
# You can almost read it like this:
#
# Take the list `diffs`, put all the elements in a sentence, like this
# 10 20 30 40 50
#
# We want to inject the function `plus` in between every element,
# so it becomes
# 10 + 20 + 30 + 40 + 50
#
# The symbol `:+` names the `+` (plus) method, which inject calls between elements
#
# Divide the result of the above summation by the length,
# which gives us the average (Integer division rounds the result down)
delta = diffs.inject(:+) / diffs.length
# `delta` should now contain the "average of differences" between
# the original `values`
# String formatting using the % operator
# No \n needed since `puts` already adds one for us
puts "delta has the value of %d" % delta
That is by no means pushing the true power of Ruby, but you see why Rubyists get so enthusiastic about expressiveness and stuff :P
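Two details in the version above are worth checking before relying on it: `inject(:+)` returns `nil` for an empty array (which happens when the file holds fewer than two distinct numbers), and `/` between two Integers does not produce a fractional result. A small sketch of both behaviours and the usual workarounds:

```ruby
diffs = [10, 20, 30, 40, 50]
empty = []

p diffs.inject(:+)      # prints 150
p empty.inject(:+)      # prints nil: no elements and no initial value
p empty.inject(0, :+)   # prints 0: the explicit initial value is returned

# Integer / Integer floors (rounds toward negative infinity);
# fdiv returns a Float instead.
p 7 / 2                 # prints 3
p 7.fdiv(2)             # prints 3.5
```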

values.each_index { |index| if index.to_i < values.length-1 then sum += values.at(index.to_i + 1) - values.at(index.to_i) end }
The above line sums the differences between consecutive values. The test index.to_i < values.length-1 keeps the code from reading past the end of the array, because of values.at(index.to_i + 1).
You are right, this code does not do much. The sum is computed but never used; the script only prints half of the minimum value from the file.
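A quick sketch of what that bounds check protects against: `Array#at` returns `nil` for an index past the end, and subtracting from `nil` raises an error.

```ruby
values = [1, 4, 9]
p values.at(3)                 # prints nil: index 3 is past the end

begin
  values.at(3) - values.at(2)  # nil - 9
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```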

Related

Bug in my Ruby counter

It is only counting once for each word. I want it to tell me how many times each word appears.
dictionary = ["to","do","to","do","to","do"]
string = "just do it to"
def machine(word,list)
initialize = Hash.new
swerve = word.downcase.split(" ")
list.each do |i|
counter = 0
swerve.each do |j|
if i.include? j
counter += 1
end
end
initialize[i]=counter
end
return initialize
end
machine(string,dictionary)
I assume that, for each word in string, you wish to determine the number of instances of that word in dictionary. If so, the first step is to create a counting hash.
dict_hash = dictionary.each_with_object(Hash.new(0)) { |word,h| h[word] += 1 }
#=> {"to"=>3, "do"=>3}
(I will explain this code later.)
Now split string on whitespace and create a hash whose keys are the words in string and whose values are the numbers of times that the value of word appears in dictionary.
string.split.each_with_object({}) { |word,h| h[word] = dict_hash.fetch(word, 0) }
#=> {"just"=>0, "do"=>3, "it"=>0, "to"=>3}
This of course assumes that each word in string is unique. If not, depending on the desired behavior, one possibility would be to use another counting hash.
string = "to just do it to"
string.split.each_with_object(Hash.new(0)) { |word,h|
h[word] += dict_hash.fetch(word, 0) }
#=> {"to"=>6, "just"=>0, "do"=>3, "it"=>0}
Now let me explain some of the constructs above.
I created two hashes with the form of the class method Hash::new that takes a parameter equal to the desired default value, which here is zero. What that means is that if
h = Hash.new(0)
and h does not have a key equal to the value of word, then h[word] will return h's default value (and the hash h will not be changed). After creating the first hash that way, I wrote h[word] += 1. Ruby expands that to
h[word] = h[word] + 1
before she does any further processing. The first word in dictionary that is passed to the block is "to" (which is assigned to the block variable word). Since the hash h is initially empty (has no keys), h[word] on the right side of the above equality returns the default value of zero, giving us
h["to"] = h["to"] + 1
#=> = 0 + 1 => 1
Later, when word again equals "to" the default value is not used because h now has a key "to".
h["to"] = h["to"] + 1
#=> = 1 + 1 => 2
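The point that reading the default does not add a key can be checked directly. A short sketch:

```ruby
h = Hash.new(0)
p h["to"]        # prints 0: the default value
p h.key?("to")   # prints false: reading the default does not store a key

h["to"] += 1     # expands to h["to"] = h["to"] + 1, which stores 1
h["to"] += 1     # the key now exists, so the default is not used
p h["to"]        # prints 2
```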
I used the well-worn method Enumerable#each_with_object. To a newbie this might seem complex. It isn't. The line
dict_hash = dictionary.each_with_object(Hash.new(0)) { |word,h| h[word] += 1 }
is effectively[1] the same as the following:
h = Hash.new(0)
dictionary.each { |word| h[word] += 1 }
dict_hash = h
In other words, the method allows one to write a single line that creates, constructs and returns the hash, rather than three lines that do the same.
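On Ruby 2.7 and later, the same counts can also be produced with `Enumerable#tally`. A short sketch comparing the two; note that `tally` returns a plain hash with no default value:

```ruby
dictionary = ["to", "do", "to", "do", "to", "do"]

counted = dictionary.each_with_object(Hash.new(0)) { |word, h| h[word] += 1 }
tallied = dictionary.tally   # requires Ruby 2.7+

p counted == tallied         # prints true: same keys and counts
p counted["missing"]         # prints 0, the default from Hash.new(0)
p tallied["missing"]         # prints nil, since tally sets no default
```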
Notice that I used the method Hash#fetch for retrieving values from the hash:
dict_hash.fetch(word, 0)
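fetch's second argument (here 0) is returned if dict_hash does not have a key equal to the value of word. By contrast, for a hash created with {} or Hash.new (no default), hash[word] returns nil in that case. Since dict_hash was built from Hash.new(0), dict_hash[word] would also return 0 here; fetch just makes the fallback explicit.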
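A small sketch of the difference between `[]` and `fetch`, on one hash without a default and one with a default of 0:

```ruby
plain = { "to" => 3 }       # no default value
counted = Hash.new(0)       # default value 0
counted["to"] = 3

p plain.fetch("just", 0)    # prints 0: fetch's fallback argument
p plain["just"]             # prints nil: no default on this hash
p counted["just"]           # prints 0: the hash's own default

begin
  plain.fetch("just")       # no fallback given and no such key
rescue KeyError => e
  puts "KeyError: #{e.message}"
end
```

Note that `fetch` ignores a hash's default value, so `counted.fetch("just")` would raise `KeyError` as well.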
[1] The reason for "effectively" is that when using each_with_object, the variable h's scope is confined to the block, which is generally a good programming practice. Don't worry if you haven't learned about "scope" yet.
You can actually do this using Array#count rather easily:
def machine(word,list)
word.downcase.split(' ').collect do |w|
# for every word in `word`, count how many appearances in `list`
[w, list.count { |l| l.include?(w) }]
end.to_h
end
machine("just do it to", ["to","do","to","do","to","do"]) # => {"just"=>0, "do"=>3, "it"=>0, "to"=>3}
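One thing to keep in mind with this version is that `l.include?(w)` is a substring test on each dictionary entry, not an equality test. A sketch contrasting it with `Array#count(obj)`, which counts exact matches (`machine_exact` is a hypothetical name):

```ruby
def machine(word, list)
  word.downcase.split(' ').collect do |w|
    [w, list.count { |l| l.include?(w) }]   # substring match
  end.to_h
end

def machine_exact(word, list)
  word.downcase.split(' ').collect do |w|
    [w, list.count(w)]                      # exact match
  end.to_h
end

dictionary = ["to", "do", "to", "do", "to", "do"]
p machine("o", dictionary)["o"]        # prints 6: every entry contains "o"
p machine_exact("o", dictionary)["o"]  # prints 0: no entry equals "o"
```

For the question's input both versions agree; they only diverge when a word is a fragment of a dictionary entry.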
I think this is what you're looking for, but it seems like you're approaching this backwards.
Convert your string "string" into an array, remove duplicate values and iterate through each element, counting the number of matches in your array "dictionary". The enumerable method :count is useful here.
A good data structure to output here would be a hash, where we store the unique words in our string "string" as keys and the number of occurrences of these words in array "dictionary" as the values. Hashes allow one to store more information about the data in a collection than an array or string, so this fits here.
dictionary = [ "to","do","to","do","to","do" ]
string = "just do it to"
def group_by_matches( match_str, list_of_words )
## trim leading and trailing whitespace and split string into array of words, remove duplicates.
to_match = match_str.strip.split.uniq
groupings = {}
## for each element in array of words, count the amount of times it appears *exactly* in the list of words array.
## store that in the groupings hash
to_match.each do | word |
groupings[ word ] = list_of_words.count( word )
end
groupings
end
group_by_matches( string, dictionary ) #=> {"just"=>0, "do"=>3, "it"=>0, "to"=>3}
On a side note, you should consider using more descriptive variable and method names to help yourself and others follow what's going on.
This also seems like you have it backwards. Typically, you'd want to use the array to count the number of occurrences in the string. This seems to more closely fit a real-world application where you'd examine a sentence/string of data for matches from a list of predefined words.
Arrays are also useful because they're flexible collections of data, easily iterated through and mutated with enumerable methods. To work with the words in our string, as you can see, it's easiest to immediately convert it to an array of words.
There are many alternatives. If you wanted to shorten the method, you could replace the more verbose each loop with an each_with_object call or a map call, which returns a new object rather than the original object as each does. In the case of using map.to_h, be careful: to_h works on a two-dimensional array [["key1", "val1"], ["key2", "val2"]] but not on a one-dimensional array.
## each_with_object
def group_by_matches( match_str, list_of_words )
to_match = match_str.strip.split.uniq
to_match.
each_with_object( {} ) { | word, groupings | groupings[ word ] = list_of_words.count( word ) }
end
## map
def group_by_matches( match_str, list_of_words )
to_match = match_str.strip.split.uniq
to_match.
map { | word | [ word, list_of_words.count( word ) ] }.to_h
end
Gauge your method preferences depending on performance, readability, and reliability.
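The `to_h` caveat mentioned above is easy to reproduce. A short sketch, including one way to turn a flat list back into pairs:

```ruby
pairs = [["key1", "val1"], ["key2", "val2"]]
p pairs.to_h["key2"]   # prints "val2"

flat = ["key1", "val1", "key2", "val2"]
begin
  flat.to_h            # elements are Strings, not [key, value] pairs
rescue TypeError => e
  puts "TypeError: #{e.message}"
end

# each_slice(2) groups the flat list into pairs that to_h accepts.
p flat.each_slice(2).to_h["key1"]   # prints "val1"
```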
list.each do |i|
counter = 0
swerve.each do |j|
if i.include? j
counter += 1
end
end
initialize[i] = counter
end
needs to be changed to
swerve.each do |i|
counter = 0
list.each do |j|
if i.include? j
counter += 1
end
end
initialize[i] = counter
end
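Your code counts, for each word in the dictionary, how many words of the string it contains. Because the dictionary repeats "to" and "do", the same hash key is simply overwritten on each pass, so every word ends up with a count of 1.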
If you want to tell how many times each word in the dictionary appears, you can switch the list.each and swerve.each loops. Then, it will return a hash # => {"just"=>0, "do"=>3, "it"=>0, "to"=>3}
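Putting that change together, a complete runnable version of the question's method with the two loops swapped might look like this (variables renamed for readability; the substring test `include?` is kept from the original):

```ruby
dictionary = ["to", "do", "to", "do", "to", "do"]
string = "just do it to"

def machine(sentence, list)
  counts = {}
  sentence.downcase.split(" ").each do |word|
    counter = 0
    list.each do |entry|
      # Same test as the question: does the string's word contain this entry?
      counter += 1 if word.include?(entry)
    end
    counts[word] = counter
  end
  counts
end

p machine(string, dictionary)["do"]   # prints 3
```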

Syntax of loops

I'm trying to iterate a URL to scrape. What am I missing in my syntax?
array = [1...100]
array.each do |i|
a = 'http://www.web.com/page/#{i}/'.scrapify(images: [:png, :gif, :jpg])
extract_images(a[:images])
end
array = [1...100] doesn't do what you think it does. That creates an array with a single element, and that single element is a Range instance covering 1 up to, but not including, 100.
So, after sorting out your string interpolation problem (as noted elsewhere), this:
"http://www.web.com/page/#{i}/"
will be the string:
"http://www.web.com/page/1...100/"
and the remote server probably doesn't know what that means and it will either 404 or give you page one; your comments elsewhere suggest that it will give you page one and ignore the ...100 part of the URL.
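This is easy to confirm in irb. A short sketch of what `[1...100]` actually holds and how it interpolates:

```ruby
array = [1...100]
p array.length             # prints 1: a single element
p array.first.class        # prints Range
p "page/#{array.first}/"   # prints "page/1...100/"

three_dot = (1...100)
two_dot = (1..100)
p three_dot.to_a.length    # prints 99: the three-dot range excludes 100
p two_dot.to_a.length      # prints 100
```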
If you want it to loop from 1 to 99 then you'd say:
(1...100).each do |i|
# `i` will range from 1 to 99 in this block
end
If you want to loop from 1 to 100 you'd use .. instead of ...:
(1..100).each do |i|
# `i` will range from 1 to 100 in this block
end
You could also ditch the range completely and use times:
99.times do |i|
# `i` will range from 0 to 98 in this block so
# you'd work with `i+1`
end
100.times do |i|
# `i` will range from 0 to 99 in this block so
# you'd work with `i+1`
end
or upto (thanks to JKillian for the reminder about this one):
1.upto(99) do |i|
# `i` will range from 1 to 99 in this block
end
1.upto(100) do |i|
# `i` will range from 1 to 100 in this block
end
For interpolation you need double quotes (" " instead of ' '):
array = (1..100) # a Range, not [1...100]
array.each do |i|
a = "http://www.web.com/page/#{i}/".scrapify(images: [:png, :gif, :jpg])
extract_images(a[:images])
end
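A sketch combining both fixes, a real Range and a double-quoted string, without the scraping calls (`scrapify` and `extract_images` come from the asker's environment and are left out here):

```ruby
i = 7
puts 'http://www.web.com/page/#{i}/'   # single quotes: prints the #{i} literally
puts "http://www.web.com/page/#{i}/"   # double quotes: prints .../page/7/

# Build one URL per page number from 1 to 100 inclusive.
urls = (1..100).map { |n| "http://www.web.com/page/#{n}/" }
puts urls.first   # http://www.web.com/page/1/
puts urls.last    # http://www.web.com/page/100/
```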

Coderbyte Second Great Low - code works but is rejected

I'm currently working through the Coderbyte series to get better at Ruby programming. Maybe this is just a bug in their site (I don't know), but my code works for me everywhere else besides on Coderbyte.
The purpose of the method is to return the 2nd smallest and the 2nd largest elements in any inputted array.
Code:
def SecondGreatLow(arr)
arr=arr.sort!
output=[]
j=1
i=(arr.length-1)
secSmall=''
secLarge=''
while output.length < 1
unless arr.length <= 2
#Get second smallest here
while (j<arr.length)
unless arr[j]==arr[j-1]
unless secSmall != ''
secSmall=arr[j]
output.push(secSmall)
end
end
j+=1
end
#Get second largest here
while i>0
unless arr[i-1] == arr[i]
unless secLarge != ''
secLarge=arr[i-1]
output.push(secLarge)
end
end
i-=1
end
end
end
# code goes here
return output
end
# keep this function call here
# to see how to enter arguments in Ruby scroll down
SecondGreatLow(STDIN.gets)
Output
Input: [1,2,3,100] => Output: [2,3] (correct)
Input: [1,42,42,180] => Output: [42,42] (correct)
Input: [4,90] => Output: [90,4] (correct)
The problem is that I'm awarded 0 points and it tells me that my output was incorrect for every test. Yet, when I actually put any inputs in, it gives me the output that I expect. Can someone please assist with what the problem might be? Thanks!
Update
Thanks to @pjs's answer below, I realized this could be done in just a few lines:
def SecondGreatLow(arr)
arr=arr.sort!.uniq
return "#{arr[1]} #{arr[-2]}"
end
# keep this function call here
# to see how to enter arguments in Ruby scroll down
SecondGreatLow(STDIN.gets)
It's important to pay close attention to the problem's specification. Coderbyte says the output should be the values separated by a space, i.e., a string, not an array. Note that they even put quotes around their "Correct Sample Outputs".
Spec aside, you're doing way too much work to achieve this. Once the array is sorted, all you need is the second element, a space, and the second-to-last element. Hint: Ruby allows both positive and negative indices for arrays. Combine that with .to_s and string concatenation, and this should only take a couple of lines.
If you are worried about non-unique numbers for the max and min, you can trim the array down using .uniq after sorting.
You need to handle the case where the array contains only two elements. Here is the complete code:
def SecondGreatLow(arr)
arr.uniq!
arr.sort!
if arr.length == 2
sec_lowest = arr[1]
sec_greatest = arr[0]
else
sec_lowest = arr[1]
sec_greatest = arr[-2]
end
return "#{sec_lowest} #{sec_greatest}"
end
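For what it is worth, the two-element branch above gives the same answer as the general one: in a sorted two-element array, `arr[-2]` is `arr[0]`. A sketch of a single-path version that also leaves the caller's array untouched, assuming at least two distinct values (`second_great_low` is a hypothetical snake_case name; Coderbyte expects `SecondGreatLow`):

```ruby
def second_great_low(arr)
  sorted = arr.uniq.sort   # non-destructive: arr itself is not modified
  # For two elements, sorted[-2] == sorted[0], so no special case is needed.
  "#{sorted[1]} #{sorted[-2]}"
end

puts second_great_low([1, 2, 3, 100])    # prints 2 3
puts second_great_low([1, 42, 42, 180])  # prints 42 42
puts second_great_low([4, 90])           # prints 90 4
```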

Unexpectedly high memory usage in Ruby: 500B normal for empty hash?

Our program creates a master hash where each key is a symbol representing an ID (about 10-20 characters). Each value is an empty hash.
The master hash has about 800K records.
Yet we're seeing Ruby memory hit almost 400MB.
This suggests each key/value pair (symbol + empty hash) consumes ~500B.
Is this normal for Ruby?
Code below:
def load_app_ids
cols = get_columns AppFile
id_col = cols[:application_id]
each_record AppFile do |r|
@apps[r[id_col].intern] = {}
end
end
# Takes a line, strips the record separator, and returns
# an array of fields
def split_line(line)
line.gsub(RecordSeperator, "").split(FieldSeperator)
end
# Run a block on each record in a file, up to
# @limit records
def each_record(filename, &block)
i = 0
path = File.join(@dir, filename)
File.open(path, "r").each_line(RecordSeperator) do |line|
# Get the line split into columns unless it is
# a comment
block.call split_line(line) unless line =~ /^#/
# This import can take a loooong time.
print "\r#{i}" if (i+=1) % 1000 == 0
break if @limit and i >= @limit
end
print "\n" if i > 1000
end
# Return map of column name symbols to column number
def get_columns(filename)
path = File.join(@dir, filename)
description = split_line(File.open(path, &:readline))
# Strip the leading comment character
description[0].gsub!(/^#/, "")
# Return map of symbol to column number
Hash[ description.map { |str| [ str.intern, description.index(str) ] } ]
end
I would say this is normal for Ruby. I don't have metrics for space used by each data structure, but in general basic Ruby works poorly on this kind of large structure. It has to allow for the fact that the keys and values can be any kind of object for instance, and although that is very flexible for high-level coding, it's inefficient when you don't need such arbitrary control.
If I do this in irb
h = {}
800000.times { |x| h[("test" + x.to_s).to_sym] = {} }
I get a process with 197 MB used.
Your process has claimed more space as it created large numbers of hashes during processing - one for each row. Ruby will eventually clean up - but that doesn't happen immediately, and the memory is not returned to the OS immediately either.
Edit: I should add that I have been working with large data structures of various kinds in Ruby - the general approach if you need them is to find something coded in native extensions (or ffi) where the code can take advantage of using restricted types in an array for example. The gem narray is a good example of this for numeric arrays, vectors, matrices etc.
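If you want a per-object figure rather than whole-process memory, the standard `objspace` library provides `ObjectSpace.memsize_of`. A rough sketch under stated assumptions (`approx_bytes_per_entry` is a hypothetical helper): it counts only the hashes' own memory, not the Symbol keys or allocator overhead, so it will undercount compared with process RSS, and exact numbers differ between Ruby versions and platforms:

```ruby
require "objspace"

# Rough per-entry cost of a symbol-keyed hash whose values are empty hashes.
# memsize_of excludes the Symbol keys and malloc overhead, so this is a lower bound.
def approx_bytes_per_entry(count)
  master = {}
  count.times { |x| master[("test" + x.to_s).to_sym] = {} }
  values_bytes = master.each_value.sum { |v| ObjectSpace.memsize_of(v) }
  table_bytes = ObjectSpace.memsize_of(master)
  (values_bytes + table_bytes).fdiv(count)
end

puts ObjectSpace.memsize_of({})       # size of one empty Hash on this build
puts approx_bytes_per_entry(100_000)
```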

ruby string array iteration. Array of arrays

I have a Ruby problem.
Here's what I'm trying to do:
def iterate1 #define method in given class
@var3 = @var2.split(" ") #split string to array
@var4 = @var3
@var4.each do |i| #for each array item do i
ra = []
i.each_char {|d| ra << counter1(d)} # for each char in i, apply def counter1
@sum = ra.inject(:+)
@sum2 = @sum.inject(:+) #have to do the inject twice to get values
end
@sum2
end
I know I have overcomplicated this.
Basically the input is a string of letters and values like "14556 this word 398".
I am trying to sum the numbers in each value, separated by whitespace (" ").
When I use the iterate1 method, the block calls the counter1 method just fine, but I can only get the value for the last word or value in the string.
In this case that's 398, which when summed would be 27.
If I include a break I get the first value, which would be 21.
I'm looking to output an array with all of the summed values.
Any help would be greatly appreciated.
I think you're after:
"10 d 20 c".scan(/\b\d+\b/).map(&:to_i).inject(:+) # Returns 30
scan(/\b\d+\b/) will extract all numbers that are made up of digits only in an array, map(&:to_i) will convert them to integers and I guess you already know what inject(:+) will do.
I'm not sure if I understand what you're after correctly, though, so it might help if you provide the answer you expect to this input.
EDIT:
If you want to sum the digits in each number, you can do it with:
"12 d 34 c".scan(/\b\d+\b/).map { |x| x.chars.map(&:to_i).inject(:+) }
x.chars will return an array of the digit characters, map(&:to_i) will convert them to integers and inject(:+) will sum them.
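A short sketch of the same digit-sum idea, also showing `Integer#digits` (Ruby 2.4+) as an alternative route; `digits` returns the digits least-significant first, which does not matter for a sum:

```ruby
input = "12 d 34 c"
numbers = input.scan(/\b\d+\b/)                     # runs of digits only
p "12".chars.class                                  # prints Array
p numbers.map { |x| x.chars.map(&:to_i).sum }       # prints [3, 7]
p numbers.map { |x| x.to_i.digits.sum }             # prints [3, 7]
```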
The simplest answer is to use map instead of each because the former collects the results and returns an array. e.g:
def iterate1 #define method in given class
@var3 = @var2.split(" ") #split string to array
@var4 = @var3
@var4.map do |i| #for each array item do i
ra = []
i.each_char {|d| ra << counter1(d)} # for each char in i, apply def counter1
@sum = ra.inject(:+)
@sum2 = @sum.inject(:+) #have to do the inject twice to get values
end
end
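The difference is easy to see in isolation: `each` returns the array it was called on and discards the block's results, while `map` returns a new array of those results. A small sketch:

```ruby
words = ["12", "34"]

from_each = words.each { |w| w.to_i * 2 }   # block results are discarded
from_map = words.map { |w| w.to_i * 2 }     # block results are collected

p from_each   # prints ["12", "34"]: each returns its receiver
p from_map    # prints [24, 68]
```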
You could write it a lot cleaner though and I think Stefan was a big help. You could solve the issue with a little modification of his code
# when you call iterate, you should pass in the value
# even if you have an instance variable available (e.g. @var2)
def iterate(thing)
thing.scan(/\b\d+\b/).map do |x|
x.chars.map{|d| counter1(d)}.inject(:+)
end
end
The above assumes that the counter1 method returns the value as an integer.
