Cannot understand what the following code does - ruby

Can somebody explain to me what the below code is doing. before and after are hashes.
def differences(before, after)
before.diff(after).keys.sort.inject([]) do |diffs, k|
diff = { :attribute => k, :before => before[k], :after => after[k] }
diffs << diff; diffs
end
end
It is from the papertrail differ gem.

It's dense code, no question. So, as you say before and after are hash(-like?) objects that are handed into the method as parameters. Calling before.diff(after) returns another hash, which then immediately has .keys called on it. That returns all the keys in the hash that diff returned. The keys are returned as an array, which is then immediately sorted.
Then we get to the most complex/dense bit. Using inject on that sorted array of keys, the method builds up an array (called diffs inside the inject block) which will be the return value of the inject method.
That array is made up of records of differences. Each record is a hash - built up by taking one key from the sorted array of keys from the before.diff(after) return value. These hashes store the attribute that's being diffed, what it looked like in the before hash and what it looks like in the after hash.
So, in a nutshell, the method gets a bunch of differences between two hashes and collects them up in an array of hashes. That array of hashes is the final return value of the method.
Note: inject can be, and often is, much, much simpler than this. Usually it's used to simply reduce a collection of values to one result, by applying one operation over and over again and storing the results in an accumlator. You may know inject as reduce from other languages; reduce is an alias for inject in Ruby. Here's a much simpler example of inject:
[1,2,3,4].inject(0) do |sum, number|
sum + number
end
# => 10
0 is the accumulator - the initial value. In the pair |sum, number|, sum will be the accumulator and number will be each number in the array, one after the other. What inject does is add 1 to 0, store the result in sum for the next round, add 2 to sum, store the result in sum again and so on. The single final value of the accumulator sum will be the return value. Here 10. The added complexity in your example is that the accumulator is different in kind from the values inside the block. That's less common, but not bad or unidiomatic. (Edit: Andrew Marshall makes the good point that maybe it is bad. See his comment on the original question. And #tokland points out that the inject here is just a very over-complex alternative for map. It is bad.) See the article I linked to in the comments to your question for more examples of inject.
Edit: As #tokland points out in a few comments, the code seems to need just a straightforward map. It would read much easier then.
def differences(before, after)
before.diff(after).keys.sort.map do |k|
{ :attribute => k, :before => before[k], :after => after[k] }
end
end
I was too focused on explaining what the code was doing. I didn't even think of how to simplify it.

It finds the entries in before and after that differ according to the underlying objects, then builds up a list of those differences in a more convenient format.
before.diff(after) finds the entries that differ.
keys.sort gives you the keys (of the map of differences) in sorted order
inject([]) is like map, but starts with diffs initialized to an empty array.
The block creates a diff line (a hash) for each of these differences, and then appends it to diffs.

Related

Ruby internals and how to guarantee unique hash values

Hash in Ruby just uses its hash value (for strings and numbers). Internally, it uses the Murmur hash function. I wonder how it can can be done given that the probability of having the same hash value for two different keys is not zero.
Can you share with us how you came to the conclusion that Ruby uses only the hash value to determine equality?
The text below is to explain to others your excellent point that the probability of computing the same hash value for two different keys is not zero, so how can the Hash class rely on just the hash value to determine equality?
For the purpose of this discussion I will refer to Ruby hashes as maps, so as not to confuse the 2 uses of the term hash in the Ruby language (1, a computed value on an object, and 2, a map/dictionary of pairs of values and unique keys).
As I understand it, hash values in maps, sets, etc. are used as a quick first pass at determining possible equality. That is, if the hashes of 2 objects are equal, then it is possible that the 2 objects are equal; but it's also possible that the 2 objects are not equal, but coincidentally produce the same hash value.
In other words, the only sure thing you can tell about equality from the hash values of the objects being compared is that if hash1 != hash2 then the objects are definitely not equal.
If the 2 hashes are equal, then the 2 objects must be compared by their content (in Ruby, by calling the == method, I believe).
So comparing hashes is not a substitute for comparing the objects themselves, it's just a quick first pass used to optimize performance.
Remember that a "hash table" or dictionary is perfectly okay with collisions. In fact, it's expected and accommodated in any reasonable implementation.
Ideally you strive to have a hash with as few collisions as possible, and there are entire doctoral level discussions on what makes a good hashing function, but they are inevitable. When a collision does occur then two values share the same index in the container.
Regardless of how a value is hashed, any potential match based on hash must be evaluated. A direct comparison is performed to ensure that the value you're accessing is the one requested, not one that coincidentally maps to the same spot.
Normal hash tables can be thought of as an array of arrays even though this is all completely hidden from you in general purpose use.
You can implement your own hash table in Ruby if you want to explore how this behaves:
class ExampleHash
include Enumerable
def initialize
#size = 9
#slots = Array.new(#size) { [ ] }
end
def [](key)
#slots[key.hash % #size].each do |entry|
if (entry[0] == key)
return entry[1]
end
end
nil
end
def []=(key, value)
entries = #slots[key.hash % #size]
entries.each do |entry|
if (entry[0] == key)
entry[1] = value
return
end
end
entries << [ key, value ]
end
end
This is made easy since every object in Ruby has a built-in hash method that produces a large numerical value that's based on the object's content.

Hashes, arrays, and the ruby '.first' method

I am working on an Rails API that receives POSTed json input and processes things accordingly. The current element involves prizes and prize codes, and I have a question about hashes, arrays, and the ruby .first method.
Disclaimer: I am receiving json from a legacy PHP application and as of right now I don't have a way to edit the output from that application. I'm getting it as is and am dealing with it the best I can.
The prize codes are coming in as a nested element within a larger object, but we are just going to focus on the codes for now. Here's what I'm getting:
"codes" => {
"0"=>{"Code"=>"1582566"},
"1"=>{"Code"=>"2153094"},
"2"=>{"Code"=>"3968052"},
"3"=>{"Code"=>"4702730"},
"4"=>{"Code"=>"5582567"},
"5"=>{"Code"=>"6153097"},
"6"=>{"Code"=>"7968052"},
"7"=>{"Code"=>"8702730"},
"8"=>{"Code"=>"9582567"},
"9"=>{"Code"=>"1053097"}
}
Starting there, everything gets pulled into a block, and then I am dealing with each code individually.
I'm going to start with my final solution and we will work our way backwards. In order to get a single code, just the number, out of each code, this is my solution:
code[1].first.second.to_i returns 1582566
Now let's rewind, and start at the beginning.
code returns "0"=>{"Code"=>"1582566"}
That makes sense. Now I want to skip past that first level "0".
code[1] returns {"Code"=>"1582566"}
All of that makes sense so far, but this is where things get weird. From that point I want to grab the 2nd item, maybe you might call it the value in this hashy key value pair. Unfortunately, the ruby .second doesn't work on a hash, I get an undefined method error, but .first does for some reason. And the output is confusing to me.
code[1].first returns ["Code", "4582566"]
And now all of a sudden that hash is an array. Why does the ruby .first array method turn a key value hash into an array?
I did a little sleuthing and discovered that :first is a method listed in the output from code[1].methods, but I can't seem to find any documentation out there about what it does.
code[1].class returns Hash
But then why do I get a completely different set of available methods returned when I run code[1].methods vs Hash.methods? The :first method is listed in the output from code[1].methods but NOT in the output from Hash.methods and it is not on the list of available methods here either:
http://ruby-doc.org/core-2.1.5/Hash.html
If they are both Hashes, why are their different methods available? Does it have something to do with class methods vs instance methods?
I'm still confused about what exactly the .first method does to a hash.
In the end I realized a better / cleaner way to get what is code[1]['Code'].to_i, but I'm still curious about what is going on under the hood.
In Ruby, a Hash is an ordered collection of key/value pairs. We can say they are "ordered" because:
Hashes enumerate their values in the order that the corresponding keys were inserted.
Because of this ordered enumeration, the Enumberable#first method returns the first key/value pair that was added to the hash as an Array of two items [key, value].
{}.first # => nil
{1 => 2}.first # => [1, 2]
{:a => true, :b => false}.first # => [:a, true]
h = Hash.new
h[:m] = "n"
h[:o] = "p"
h.first # => [:m, "n"]

Ruby - How to access elements in a multi-dimensional array

I thought this was simple, but when it came down to this case I wasn't able to do it. I feel if I can understand this I will have a much better understanding of Ruby.
WHAT: I want to search inside a 2-d array of strings and integers, and return the index/indices of where a certain string is found. These indices of each subarray would also be placed in an array in the order of the corresponding subarrays.
Example when searching for the string "a":
Input array: [[1,"a","a",3],[1,"b"],["a",2]]
Output array: [[1,2],[],[0]]
WHAT I TRIED:
Intuitively I thought it would be something like:
source = [[1,"a","a",3],[1,"b"],["a",2]]
with
source.each.each_index.select { |v| v == "a" }
or
source.each {|x| x.each_index.select { |i| x[i] == "a" }}
QUESTIONS:
1) What should I call to get my output array from my input array?
2) I see so many other enumerators and methods get mashed together this way, how come I can't do it in this case? I don't want to clutter up this question with some of the simpler test cases I tried, but I either got undefined method errors or it would just return my source array.
3) Does it have to do with what blocks are associated with which methods? I modeled my code after the answer to question: Find indices of elements that match a given condition where I am confused why it seems like the block is directly associated with multiple methods. In other words, the |i| is from the #each_index while the boolean is for the #select. Seems random and disorganized to me right now how to structure these blocks (i.e why not vice versa?).
source.map { |row| row.each_index.select { |i| row[i] == "a" } }
# => [[1, 2], [], [0]]
You can't do it because your logic is mistaken :) source.each {...} will return source, doesn't matter what you do inside the block. To return the result, use map. source.each.each_index calls each_index on the enumerator returned by each without block, and that's not a method available on enumerators (you want each_index on arrays).
Indeed. each with block and each without block will do very different things.
Specifically, in my code above:
Start with source array. map with block will process the block for each element (called row), and return an array of the results. For each row, row.each_index without block will return an iterator of all indices of the row array, select with block on the iterator will return an array that will contain only some of those elements.

Hash.map method

One of the exercises in this tutorial is:
Exploit the fact that map always returns an array: write a method hash_keys that accepts a hash and maps over it to return all the keys in a linear Array.
The solution is:
def hash_keys(hash)
hash.map { |pair| pair.first }
end
However, I'm having trouble understanding why the above works. For example, I wrote a solution as follows that also works:
def hash_keys(hash)
# Initialize a new array
result = Array.new
# Cycle through each element of the hash and push each key on to our array
hash.map { |x,y| result.push(x) }
# Return the array
result
end
I can understand why my method works, but I don't understand their proposed solution. For example, they are not even creating an Array object. They are not returning anything. It seems they are just listing the first element in each key/value element array.
I think you misunderstood the point of map. It doesn't just iterate over the given collection (that's what each is for) - it creates an array where each element is the result of calling the block with the corresponding element of the original collection.
Your solution could (and should) just as well be written using each instead of map as you aren't really making use of what map does - you're only making use of the fact that it invokes its block once for each element in the given collection.
When map is applied to a hash, the hash is converted to an array. That is why explicit conversion into an array is not necessary. And map returns an array by replacing each item of the original array with the result of evaluating the block. Each time the block is evaluated, it will be given an array that is a pair of a key and its value. first applies to this pair and returns the key. map returns an array of these keys.
map turns an Enumerable object into an Array. It's what it does. The block describes, in terms of each element in the receiver, what the corresponding element in the resulting array should be.
So, a simpler example is map on an Array:
[1,2,3,4].map {|n| n*2}
# => [2,4,6,8]
That is - from [1,2,3,4], generate a new Array, where each element is twice the equivalent entry in [1,2,3,4].
Half of your answer is right in the question: "Exploit the fact that map always returns an array." You don't need to explicitly create an array because map does that for you.
As far as returning it, you already seem to know that the last line of a ruby method is its return value. In the tutorial's solution, since the hash creates an array at the last (and only line), the array is returned from the method.

Sorting a hash table of objects (by the attributes of the objects) in Ruby

Say I have a class called Person, and it contains things such as last name, first name, address, etc.
I also have a hash table of Person objects that needs to be sorted by last and first name.
I understand that a sort_by will not change the hash permanently, which is fine, I only need to print in that order. Currently, I am trying to sort/print in place using:
#hash.sort_by {|a,b| a <=> b}.each { |person| puts person.last}
I have overloaded the <=> operator to sort by last/first, but nothing appears to actually sort. The puts there simply outputs in the hash's original order. I have spent a good 4 days trying to figure this out (it is a school assignment, and my first Ruby program). Any ideas? I am sure this is easy, but I am having the hardest time bringing my brain out of the C++ way of thinking.
You appear to be confusing sort and sort_by
sort yields two objects from the collection to the block and expects you to return a <=> like value: -1,0 or 1 depending on whether the arguments are equal, ascending or descending, for example
%w(one two three four five).sort {|a,b| a.length <=> b.length}
Sorts the strings by length. This is the form to use if you want to use your <=> operator
sort_by yields one object from the collection at a time and expects you to return what you want to sort by - you shouldn't be doing any comparison here. Ruby then uses <=> on these objecfs to sort your collection. The previous example can e rewritten as
%w(one two three four five).sort_by {|s| s.length}
This is also known as a schwartzian transform
In your case the collection is a hash so things are slightly more complicated: the values that are passed into the block are arrays that contain key/value pairs, so you'll need to extract the person object from that pair. You could also just work on #hash.keys or #hash.values (depending on whether the person objects are keys or values)
If you've overidden the <=> operator to sort Person objects appropriately then you can simply do:
#hash.sort_by{ |key, person| person }
because sort_by will yield both the hash key, and the object (in your case a person) to each iteration of the block. So the code above will sort your hash based on Person objects - for which you've already specified an <=> operator.
When you #.sort_by with a Hash, the parameters passed to the block are 'key','value', not 'element a' and 'element b'. Try:
#hash.sort_by{|key,value| value.last}.each{|key,value| puts value.last}
Also, see Frederick Cheung's excellent explanation of #sort vs #sort_by.

Resources