Is order of a Ruby hash literal guaranteed? - ruby

Ruby, since v1.9, supports a deterministic order when looping through a hash; entries added first will be returned first.
Does this apply to literals, i.e. will { a: 1, b: 2 } always yield a before b?
I did a quick experiment with Ruby 2.1 (MRI) and it was in fact consistent, but to what extent is this guaranteed by the language to work on all Ruby implementations?

There are a couple of locations where this could be specified, i.e. a couple of things that are considered "The Ruby Language Specification":
the ISO Ruby Language Specification
the RubySpec project
the YARV testsuite
The Ruby Programming Language book by matz and David Flanagan
The ISO spec doesn't say anything about Hash ordering: it was written in such a way that all existing Ruby implementations are automatically compliant with it, without having to change, i.e. it was written to be descriptive of current Ruby implementations, not prescriptive. At the time the spec was written, those implementations included MRI, YARV, Rubinius, JRuby, IronRuby, MagLev, MacRuby, XRuby, Ruby.NET, Cardinal, tinyrb, RubyGoLightly, SmallRuby, BlueRuby, and others. Of particular interest are MRI (which only implements 1.8) and YARV (which only implements 1.9 (at the time)), which means that the spec can only specify behavior which is common to 1.8 and 1.9, which Hash ordering is not.
The RubySpec project was abandoned by its developers out of frustration that the ruby-core developers and YARV developers never recognized it. It does, however, (implicitly) specify that Hash literals are ordered left-to-right:
new_hash(1 => 2, 4 => 8, 2 => 4).keys.should == [1, 4, 2]
That's the spec for Hash#keys; however, the other specs test that Hash#values has the same order as Hash#keys, that Hash#each_value and Hash#each_key have the same order as those, and that Hash#each_pair and Hash#each have the same order as well.
I couldn't find anything in the YARV testsuite that specifies that ordering is preserved. In fact, I couldn't find anything at all about ordering in that testsuite, quite the opposite: the tests go to great length to avoid depending on ordering!
The Flanagan/matz book kinda-sorta implicitly specifies Hash literal ordering in section 9.5.3.6 Hash iterators. First, it uses much the same formulation as the docs:
In Ruby 1.9, however, hash elements are iterated in their insertion order, […]
But then it goes on:
[…], and that is the order shown in the following examples:
And in those examples, it actually uses a literal:
h = { :a=>1, :b=>2, :c=>3 }
# The each() iterator iterates [key,value] pairs
h.each {|pair| print pair } # Prints "[:a, 1][:b, 2][:c, 3]"
# It also works with two block arguments
h.each do |key, value|
print "#{key}:#{value} " # Prints "a:1 b:2 c:3"
end
# Iterate over keys or values or both
h.each_key {|k| print k } # Prints "abc"
h.each_value {|v| print v } # Prints "123"
h.each_pair {|k,v| print k,v } # Prints "a1b2c3". Like each
In his comment, @mu is too short mentioned that
h = { a: 1, b: 2 } is the same as h = { }; h[:a] = 1; h[:b] = 2
and in another comment that
nothing else would make any sense
Unfortunately, that is not true:
module HashASETWithLogging
def []=(key, value)
puts "[]= was called with [#{key.inspect}] = #{value.inspect}"
super
end
end
class Hash
prepend HashASETWithLogging
end
h = { a: 1, b: 2 }
# prints nothing
h = { }; h[:a] = 1; h[:b] = 2
# []= was called with [:a] = 1
# []= was called with [:b] = 2
So, depending on how you interpret that line from the book and depending on how "specification-ish" you judge that book, yes, ordering of literals is guaranteed.

From the documentation:
Hashes enumerate their values in the order that the corresponding keys
were inserted.
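As a minimal sketch of what that sentence implies for literals (assuming MRI 1.9 or later; the variable names are made up for illustration), a literal and a hash built by successive assignment end up with the same key order, and only deleting and re-adding a key changes its position:

```ruby
# Same entries, written once as a literal and once via successive assignment.
literal = { b: 2, a: 1, c: 3 }

built = {}
built[:b] = 2
built[:a] = 1
built[:c] = 3

# Both hashes enumerate their keys as [:b, :a, :c].
same_order = literal.keys == built.keys

# Reassigning an existing key keeps its original position...
literal[:b] = 20
after_update = literal.keys

# ...while deleting and re-adding a key moves it to the end.
literal.delete(:b)
literal[:b] = 2
after_readd = literal.keys

p same_order, after_update, after_readd
```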

Related

Ruby Set with custom class to equal basic strings

I want to be able to find a custom class in my set given just a string. Like so:
require 'set'
Rank = Struct.new(:name, keyword_init: true) {
def hash
name.hash
end
def eql?(other)
hash == other.hash
end
def ==(other)
hash == other.hash
end
}
one = Rank.new(name: "one")
two = Rank.new(name: "two")
set = Set[one, two]
But while one == "one" and one.eql?("one") are both true, set.include?("one") is still false. What am I missing?
thanks!
Set is built upon Hash, and Hash considers two objects the same if:
[...] their hash value is identical and the two objects are eql? to each other.
What you are missing is that eql? isn't necessarily commutative. Making Rank#eql? recognize strings doesn't change the way String#eql? works:
one.eql?('one') #=> true
'one'.eql?(one) #=> false
Therefore it depends on which object is the hash key and which is the argument to include?:
Set['one'].include?(one) #=> true
Set[one].include?('one') #=> false
In order to make two objects a and b interchangeable hash keys, 3 conditions have to be met:
a.hash == b.hash
a.eql?(b) == true
b.eql?(a) == true
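The three conditions can be checked mechanically. The following is only a sketch: interchangeable_keys? is a hypothetical helper name, and Rank is a trimmed copy of the struct from the question.

```ruby
# Hypothetical helper: true only if a and b can stand in for each other as hash keys.
def interchangeable_keys?(a, b)
  a.hash == b.hash && a.eql?(b) && b.eql?(a)
end

# The Rank struct from the question, whose eql? only compares hash values.
Rank = Struct.new(:name, keyword_init: true) do
  def hash
    name.hash
  end

  def eql?(other)
    hash == other.hash
  end
end

one = Rank.new(name: "one")

two_strings = interchangeable_keys?("one", "one".dup)           # equal strings
two_ranks   = interchangeable_keys?(one, Rank.new(name: "one")) # ranks with the same name
rank_string = interchangeable_keys?(one, "one")                 # "one".eql?(one) is false

p two_strings, two_ranks, rank_string
```

Only the last pair fails, and it fails on the third condition alone, which is why the lookup works in one direction but not in the other.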
But don't try to modify String#eql? – fiddling with Ruby's core classes isn't recommended and monkey-patching probably won't work anyway because Ruby usually calls the C methods directly for performance reasons.
In fact, making both hash and eql? mimic name doesn't seem like a good idea in the first place. It makes the object's identity ambiguous, which can lead to very strange behavior and hard-to-find bugs:
h = { one => 1, 'one' => 1 }
#=> {#<struct Rank name="one">=>1, "one"=>1}
# vs
h = { 'one' => 1, one => 1 }
#=> {"one"=>1}
what am i missing?
What you are missing is that "one" isn't in your set. one is in your set, but "one" isn't.
Therefore, the answer Ruby is giving you is perfectly correct.
All that your implementation of Rank achieves is that any two ranks with the same name are considered the same by a Hash, a Set, or Array#uniq. But a Rank is not the same as a String.
If you want to be able to have a set-like data structure where you can look up things by one of their attributes, you will have to write it yourself.
Something like (untested):
class RankSet < Set
def [](*args)
super(*args.map(&:name))
end
def each
return enum_for(__callee__) unless block_given?
super {|e| yield e.name }
end
end
might get you started.
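Another possible shape for such a structure, as a minimal sketch rather than a finished design, is to wrap a Hash keyed by the attribute instead of subclassing Set. The class name RankIndex and its methods are hypothetical, and Rank here is a plain struct without the custom hash and eql? from the question:

```ruby
Rank = Struct.new(:name, keyword_init: true)

# Hypothetical set-like container that looks ranks up by their name.
class RankIndex
  include Enumerable

  def initialize(*ranks)
    @by_name = {}
    ranks.each { |rank| add(rank) }
  end

  # Adding a rank with an already known name replaces the old entry.
  def add(rank)
    @by_name[rank.name] = rank
    self
  end

  # Membership test by name, e.g. index.include?("one").
  def include?(name)
    @by_name.key?(name)
  end

  # Fetch the stored rank for a name, or nil if there is none.
  def [](name)
    @by_name[name]
  end

  # Yields the stored Rank objects in insertion order.
  def each(&block)
    return enum_for(__method__) unless block_given?
    @by_name.each_value(&block)
    self
  end
end

one = Rank.new(name: "one")
two = Rank.new(name: "two")
index = RankIndex.new(one, two)

p index.include?("one")
p index["two"]
```

Because the string is the key, no custom hash or eql? is needed on Rank, so a Rank never has to pretend to be a String.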
Or, instead of writing your own set, you can just use the fact that any arbitrary rank with the right name can be used for lookup:
set.include?(Rank.new(name: "one"))
#=> true
# even though it is a *different* `Rank` object

Why would #each_with_object and #inject switch the order of block parameters?

#each_with_object and #inject can both be used to build a hash.
For example:
matrix = [['foo', 'bar'], ['cat', 'dog']]
some_hash = matrix.inject({}) do |memo, arr|
memo[arr[0]] = arr
memo # no implicit conversion of String into Integer (TypeError) if commented out
end
p some_hash # {"foo"=>["foo", "bar"], "cat"=>["cat", "dog"]}
another_hash = matrix.each_with_object({}) do |arr, memo|
memo[arr[0]] = arr
end
p another_hash # {"foo"=>["foo", "bar"], "cat"=>["cat", "dog"]}
One of the key differences between the two is #each_with_object keeps track of memo through the entire iteration while #inject sets memo equal to the value returned by the block on each iteration.
Another difference is the order of the block parameters.
Is there some intention being communicated here? It doesn't make sense to reverse the block parameters of two similar methods.
They have a different ancestry.
each_with_object was added to Ruby 1.9 in 2007
inject goes back to Smalltalk in 1980
I guess if the language had been designed with both methods from the beginning, they would likely expect arguments in the same order. But this is not how it happened: inject has been around since the beginning of Ruby, whereas each_with_object was added only 10 years later.
inject expects arguments in the same order as Smalltalk's inject:into:
collection inject: 0 into: [ :memo :each | memo + each ]
which does a left fold. You can think of the collection as a long strip of paper that is folded up from the left; the sliding window of the fold function is always the part that has already been folded plus the next element of the remaining paper strip:
# (memo = 0 and 1), 2, 3, 4
# (memo = 1 and 2), 3, 4
# (memo = 3 and 3), 4
# (memo = 6 and 4)
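The same trace can be reproduced in Ruby by recording what the block receives at each step (a small sketch; the steps array exists only for illustration):

```ruby
steps = []

sum = [1, 2, 3, 4].inject(0) do |memo, each|
  steps << [memo, each] # what the fold function sees at this step
  memo + each           # becomes memo for the next step
end

p steps # [[0, 1], [1, 2], [3, 3], [6, 4]]
p sum   # 10
```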
Following the Smalltalk convention made sense back then since all the initial methods in Enumerable are taken from Smalltalk and Matz did not want to confuse people who are familiar with Smalltalk.
Nobody could have foreseen back then that each_with_object would be introduced with Ruby 1.9 in 2007, with an argument order that reflects the lexical order of the method name: each ... object.
And hence these two methods expect arguments in different orders for historical reasons.
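Beyond parameter order, the two methods also treat the block's return value differently. A short sketch (variable names are illustrative) shows both behaviors side by side:

```ruby
words = %w[foo bar foo]

# inject: the block's return value becomes the next memo,
# so the hash has to be returned explicitly.
counts_inject = words.inject(Hash.new(0)) do |memo, word|
  memo[word] += 1
  memo
end

# each_with_object: the block's return value is ignored and the
# same object is passed in on every iteration.
counts_ewo = words.each_with_object(Hash.new(0)) do |word, memo|
  memo[word] += 1
end

# With an immutable accumulator only inject makes progress, because
# each_with_object cannot rebind memo to a new Integer.
sum_inject = [1, 2, 3].inject(0) { |memo, n| memo + n }
sum_ewo    = [1, 2, 3].each_with_object(0) { |n, memo| memo + n }

p counts_inject, counts_ewo, sum_inject, sum_ewo
```

The last line returns the untouched initial 0, which is why each_with_object is a good fit for mutable accumulators such as hashes and arrays, while inject suits values such as numbers.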

Ruby Hash destructive vs. non-destructive method

Could not find a previous post that answers my question...I'm learning how to use destructive vs. non-destructive methods in Ruby. I found an answer to the exercise I'm working on (destructively adding a number to hash values), but I want to be clear on why some earlier solutions of mine did not work. Here's the answer that works:
def modify_a_hash(the_hash, number_to_add_to_each_value)
the_hash.each { |k, v| the_hash[k] = v + number_to_add_to_each_value}
end
These two solutions come back as non-destructive (since they all use "each" I cannot figure out why. To make something destructive is it the equals sign above that does the trick?):
def modify_a_hash(the_hash, number_to_add_to_each_value)
the_hash.each_value { |v| v + number_to_add_to_each_value}
end
def modify_a_hash(the_hash, number_to_add_to_each_value)
the_hash.each { |k, v| v + number_to_add_to_each_value}
end
The terms "destructive" and "non-destructive" are a bit misleading here. Better is to use the conventional "in-place modification" vs. "returns a copy" terminology.
Generally methods that modify in-place have ! at the end of their name to serve as a warning, like gsub! for String. Some methods that pre-date this convention do not have them, like push for Array.
The = performs an assignment within the loop. Your other examples don't actually do anything useful since each returns the original object being iterated over regardless of any results produced.
If you wanted to return a copy you'd do this:
def modify_a_hash(the_hash, number_to_add)
Hash[
the_hash.collect do |k, v|
[ k, v + number_to_add ]
end
]
end
That would return a copy. The inner operation collect transforms key-value pairs into new key-value pairs with the adjustment applied. No = is required since there's no assignment.
The outer method Hash[] transforms those key-value pairs into a proper Hash object. This is then returned and is independent of the original.
Generally a non-destructive or "return a copy" method needs to create a new, independent version of the thing it's manipulating for the purpose of storing the results. This applies to String, Array, Hash, or any other class or container you might be working with.
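For this particular exercise, Ruby 2.4 and later ship both flavors as Hash#transform_values and Hash#transform_values!. The sketch below assumes such a Ruby version; the variable names are made up:

```ruby
prices = { apple: 1, pear: 2 }

# each returns the receiver and ignores the block's result, so nothing changes.
result = prices.each { |_key, value| value + 10 }
each_returned_receiver = result.equal?(prices)
after_each = prices.dup

# transform_values returns a new Hash and leaves the original alone.
copy = prices.transform_values { |value| value + 10 }
after_copy = prices.dup

# transform_values! modifies the receiver in place.
prices.transform_values! { |value| value + 10 }

p after_each, copy, prices
```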
Maybe this slightly different example will be helpful.
We have a hash:
2.0.0-p481 :014 > hash
=> {1=>"ann", 2=>"mary", 3=>"silvia"}
Then we iterate over it and change all the letters to the uppercase:
2.0.0-p481 :015 > hash.each { |key, value| value.upcase! }
=> {1=>"ANN", 2=>"MARY", 3=>"SILVIA"}
The original hash has changed because we used the upcase! method.
Compare this to the method without the ! sign, which doesn't modify the hash values:
2.0.0-p481 :017 > hash.each { |key, value| value.downcase }
=> {1=>"ANN", 2=>"MARY", 3=>"SILVIA"}

Difference between map and collect in Ruby?

I have Googled this and got patchy / contradictory opinions - is there actually any difference between doing a map and doing a collect on an array in Ruby/Rails?
The docs don't seem to suggest any, but are there perhaps differences in method or performance?
There's no difference; in fact, map is implemented in C as rb_ary_collect and enum_collect (i.e. there is a difference between map on an array and on any other enum, but no difference between map and collect).
Why do both map and collect exist in Ruby? The map function has many naming conventions in different languages. Wikipedia provides an overview:
The map function originated in functional programming languages but is today supported (or may be defined) in many procedural, object oriented, and multi-paradigm languages as well: In C++'s Standard Template Library, it is called transform, in C# (3.0)'s LINQ library, it is provided as an extension method called Select. Map is also a frequently used operation in high level languages such as Perl, Python and Ruby; the operation is called map in all three of these languages. A collect alias for map is also provided in Ruby (from Smalltalk) [emphasis mine]. Common Lisp provides a family of map-like functions; the one corresponding to the behavior described here is called mapcar (-car indicating access using the CAR operation).
Ruby provides an alias for programmers from the Smalltalk world to feel more at home.
Why is there a different implementation for arrays and enums? An enum is a generalized iteration structure, which means that there is no way in which Ruby can predict what the next element can be (you can define infinite enums, see Prime for an example). Therefore it must call a function to get each successive element (typically this will be the each method).
Arrays are the most common collection, so it is reasonable to optimize their performance. Since Ruby knows a lot about how arrays work, it doesn't have to call each but can use simple pointer manipulation instead, which is significantly faster.
Similar optimizations exist for a number of Array methods like zip or count.
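One way to see the split between the Array-specific and the generic implementation from Ruby itself is to ask where each method is defined. This is a sketch for MRI; the lazy enumerator at the end illustrates a source that can only be consumed element by element:

```ruby
# Array brings its own map/collect; Range falls back to the generic Enumerable one.
array_map_owner     = Array.instance_method(:map).owner
array_collect_owner = Array.instance_method(:collect).owner
range_map_owner     = Range.instance_method(:map).owner

# An infinite source cannot be mapped eagerly; a lazy enumerator
# pulls elements on demand through each.
first_doubles = (1..Float::INFINITY).lazy.map { |n| n * 2 }.first(3)

p array_map_owner, array_collect_owner, range_map_owner, first_doubles
```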
I've been told they are the same.
Actually they are documented in the same place on ruby-doc.org:
http://www.ruby-doc.org/core/classes/Array.html#M000249
ary.collect {|item| block } → new_ary
ary.map {|item| block } → new_ary
ary.collect → an_enumerator
ary.map → an_enumerator
Invokes block once for each element of self.
Creates a new array containing the values returned by the block.
See also Enumerable#collect.
If no block is given, an enumerator is returned instead.
a = [ "a", "b", "c", "d" ]
a.collect {|x| x + "!" } #=> ["a!", "b!", "c!", "d!"]
a #=> ["a", "b", "c", "d"]
The collect and collect! methods are aliases to map and map!, so they can be used interchangeably. Here is an easy way to confirm that:
Array.instance_method(:map) == Array.instance_method(:collect)
=> true
I did a benchmark test to try and answer this question, then found this post, so here are my findings (which differ slightly from the other answers).
Here is the benchmark code:
require 'benchmark'
h = { abc: 'hello', 'another_key' => 123, 4567 => 'third' }
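a = 1..10 # note: this is a Range, not an Array, so the "array" reports below go through Enumerable#collect / Enumerable#map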
many = 500_000
Benchmark.bm do |b|
GC.start
b.report("hash keys collect") do
many.times do
h.keys.collect(&:to_s)
end
end
GC.start
b.report("hash keys map") do
many.times do
h.keys.map(&:to_s)
end
end
GC.start
b.report("array collect") do
many.times do
a.collect(&:to_s)
end
end
GC.start
b.report("array map") do
many.times do
a.map(&:to_s)
end
end
end
And the results I got were:
user system total real
hash keys collect 0.540000 0.000000 0.540000 ( 0.570994)
hash keys map 0.500000 0.010000 0.510000 ( 0.517126)
array collect 1.670000 0.020000 1.690000 ( 1.731233)
array map 1.680000 0.020000 1.700000 ( 1.744398)
Perhaps an alias isn't free?
Ruby aliases the method Array#map to Array#collect; they can be used interchangeably. (Ruby Monk)
In other words, same source code:
static VALUE
rb_ary_collect(VALUE ary)
{
long i;
VALUE collect;
RETURN_SIZED_ENUMERATOR(ary, 0, 0, ary_enum_length);
collect = rb_ary_new2(RARRAY_LEN(ary));
for (i = 0; i < RARRAY_LEN(ary); i++) {
rb_ary_push(collect, rb_yield(RARRAY_AREF(ary, i)));
}
return collect;
}
http://ruby-doc.org/core-2.2.0/Array.html#method-i-map
#collect is actually an alias for #map. That means the two methods can be used interchangeably, and effect the same behavior.

What is the "right" way to iterate through an array in Ruby?

PHP, for all its warts, is pretty good on this count. There's no difference between an array and a hash (maybe I'm naive, but this seems obviously right to me), and to iterate through either you just do
foreach (array/hash as $key => $value)
In Ruby there are a bunch of ways to do this sort of thing:
array.length.times do |i|
end
array.each
array.each_index
for i in array
Hashes make more sense, since I just always use
hash.each do |key, value|
Why can't I do this for arrays? If I want to remember just one method, I guess I can use each_index (since it makes both the index and value available), but it's annoying to have to do array[index] instead of just value.
Oh right, I forgot about array.each_with_index. However, this one sucks because it goes |value, key| and hash.each goes |key, value|! Is this not insane?
This will iterate through all the elements:
array = [1, 2, 3, 4, 5, 6]
array.each { |x| puts x }
# Output:
1
2
3
4
5
6
This will iterate through all the elements giving you the value and the index:
array = ["A", "B", "C"]
array.each_with_index {|val, index| puts "#{val} => #{index}" }
# Output:
A => 0
B => 1
C => 2
I'm not quite sure from your question which one you are looking for.
I think there is no one right way. There are a lot of different ways to iterate, and each has its own niche.
each is sufficient for many usages, since I don't often care about the indexes.
each_with_index acts like Hash#each: you get the value and the index.
each_index - just the indexes. I don't use this one often. Equivalent to "length.times".
map is another way to iterate, useful when you want to transform one array into another.
select is the iterator to use when you want to choose a subset.
inject is useful for generating sums or products, or collecting a single result.
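A compact side-by-side of the iterators listed above, as a sketch with made-up variable names:

```ruby
numbers = [1, 2, 3, 4]

# each: visit every element
seen = []
numbers.each { |n| seen << n }

# each_with_index: element first, index second
pairs = []
numbers.each_with_index { |n, i| pairs << [n, i] }

# each_index: just the indexes
indexes = numbers.each_index.to_a

# map: transform one array into another
doubled = numbers.map { |n| n * 2 }

# select: choose a subset
evens = numbers.select(&:even?)

# inject: collect a single result
total = numbers.inject(0) { |sum, n| sum + n }

p seen, pairs, indexes, doubled, evens, total
```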
It may seem like a lot to remember, but don't worry, you can get by without knowing all of them. But as you start to learn and use the different methods, your code will become cleaner and clearer, and you'll be on your way to Ruby mastery.
I'm not saying that Array -> |value,index| and Hash -> |key,value| is not insane (see Horace Loeb's comment), but I am saying that there is a sane way to expect this arrangement.
When I am dealing with arrays, I am focused on the elements in the array (not the index because the index is transitory). The method is each with index, i.e. each+index, or |each,index|, or |value,index|. This is also consistent with the index being viewed as an optional argument, e.g. |value| is equivalent to |value,index=nil| which is consistent with |value,index|.
When I am dealing with hashes, I am often more focused on the keys than the values, and I am usually dealing with keys and values in that order, either key => value or hash[key] = value.
If you want duck-typing, then either explicitly use a defined method as Brent Longborough showed, or an implicit method as maxhawkins showed.
Ruby is all about accommodating the language to suit the programmer, not about the programmer accommodating to suit the language. This is why there are so many ways. There are so many ways to think about something. In Ruby, you choose the closest and the rest of the code usually falls out extremely neatly and concisely.
As for the original question, "What is the “right” way to iterate through an array in Ruby?", well, I think the core way (i.e. without powerful syntactic sugar or object oriented power) is to do:
for index in 0 ... array.size
puts "array[#{index}] = #{array[index].inspect}"
end
Ruby, however, is all about powerful syntactic sugar and object oriented power. Anyway, here is the equivalent for hashes, where the keys can be ordered or not:
for key in hash.keys.sort
puts "hash[#{key.inspect}] = #{hash[key].inspect}"
end
So, my answer is, "The “right” way to iterate through an array in Ruby depends on you (i.e. the programmer or the programming team) and the project.". The better Ruby programmer makes the better choice (of which syntactic power and/or which object oriented approach). The better Ruby programmer continues to look for more ways.
Now, I want to ask another question, "What is the “right” way to iterate through a Range in Ruby backwards?"! (This question is how I came to this page.)
It is nice to do (for the forwards):
(1..10).each{|i| puts "i=#{i}" }
but I don't like to do (for the backwards):
(1..10).to_a.reverse.each{|i| puts "i=#{i}" }
Well, I don't actually mind doing that too much, but when I am teaching going backwards, I want to show my students a nice symmetry (i.e. with minimal difference, e.g. only adding a reverse, or a step -1, but without modifying anything else).
You can do (for symmetry):
(a=*1..10).each{|i| puts "i=#{i}" }
and
(a=*1..10).reverse.each{|i| puts "i=#{i}" }
which I don't like much, but you can't do
(*1..10).each{|i| puts "i=#{i}" }
(*1..10).reverse.each{|i| puts "i=#{i}" }
#
(1..10).step(1){|i| puts "i=#{i}" }
(1..10).step(-1){|i| puts "i=#{i}" }
#
(1..10).each{|i| puts "i=#{i}" }
(10..1).each{|i| puts "i=#{i}" } # I don't want this though. It's dangerous
You could ultimately do
class Range
def each_reverse(&block)
self.to_a.reverse.each(&block)
end
end
but I want to teach pure Ruby rather than object oriented approaches (just yet). I would like to iterate backwards:
without creating an array (consider 0..1000000000)
working for any Range (e.g. Strings, not just Integers)
without using any extra object oriented power (i.e. no class modification)
I believe this is impossible without defining a pred method, which means modifying the Range class to use it. If you can do this please let me know, otherwise confirmation of impossibility would be appreciated though it would be disappointing. Perhaps Ruby 1.9 addresses this.
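For Integer bounds specifically, Integer#downto comes close to the wish list: it counts backwards without building an array and needs no class modification. It is only a partial answer, since it starts from the integer rather than from a Range and does not cover String ranges. A minimal sketch:

```ruby
# Counts 10, 9, ..., 1 without reversing an array.
countdown = []
10.downto(1) { |i| countdown << i }

# Without a block, downto returns an enumerator, so even a huge
# span can be consumed a few elements at a time.
first_three = 1_000_000_000.downto(1).first(3)

p countdown, first_three
```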
(Thanks for your time in reading this.)
Use each_with_index when you need both.
ary.each_with_index { |val, idx| # ...
The other answers are just fine, but I wanted to point out one other peripheral thing: Arrays are ordered, whereas Hashes are not in 1.8. (In Ruby 1.9, Hashes are ordered by insertion order of keys.) So it wouldn't make sense prior to 1.9 to iterate over a Hash in the same way/sequence as Arrays, which have always had a definite ordering. I don't know what the default order is for PHP associative arrays (apparently my google fu isn't strong enough to figure that out, either), but I don't know how you can consider regular PHP arrays and PHP associative arrays to be "the same" in this context, since the order for associative arrays seems undefined.
As such, the Ruby way seems more clear and intuitive to me. :)
Here are the four options listed in your question, arranged by freedom of control. You might want to use a different one depending on what you need.
Simply go through values:
array.each
Simply go through indices:
array.each_index
Go through values, with a loop variable that remains in scope after the loop:
for i in array
Control loop count + index variable:
array.length.times do | i |
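A small sketch of how for ... in behaves compared with each: the loop variable receives each element (not an index) and stays defined after the loop, whereas a block parameter does not. The variable names are illustrative.

```ruby
array = %w[a b c]

visited = []
for item in array
  visited << item # item is an element of the array
end
leaked = item     # still defined after the loop; holds the last element

array.each { |element| element }
# element was local to the block, so it is not visible here
block_var_visible = !defined?(element).nil?

p visited, leaked, block_var_visible
```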
Trying to do the same thing consistently with arrays and hashes might just be a code smell, but, at the risk of my being branded as a codorous half-monkey-patcher, if you're looking for consistent behaviour, would this do the trick?:
class Hash
def each_pairwise
self.each { | x, y |
yield [x, y]
}
end
end
class Array
def each_pairwise
self.each_with_index { | x, y |
yield [y, x]
}
end
end
["a","b","c"].each_pairwise { |x,y|
puts "#{x} => #{y}"
}
{"a" => "Aardvark","b" => "Bogle","c" => "Catastrophe"}.each_pairwise { |x,y|
puts "#{x} => #{y}"
}
I'd been trying to build a menu (in Camping and Markaby) using a hash.
Each item has 2 elements: a menu label and a URL, so a hash seemed right, but the '/' URL for 'Home' always appeared last (as you'd expect for a hash), so menu items appeared in the wrong order.
Using an array with each_slice does the job:
['Home', '/', 'Page two', 'two', 'Test', 'test'].each_slice(2) do|label,link|
li {a label, :href => link}
end
Adding extra values for each menu item (e.g. like a CSS ID name) just means increasing the slice value. So, like a hash but with groups consisting of any number of items. Perfect.
So this is just to say thanks for inadvertently hinting at a solution!
Obvious, but worth stating: I suggest checking if the length of the array is divisible by the slice value.
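What happens without that check can be sketched as follows (the menu data is made up and deliberately one item short):

```ruby
items = ['Home', '/', 'Page two', 'two', 'Test'] # the last link is missing
slice_size = 2

# Guard: every group is complete only if the length divides evenly.
complete = (items.length % slice_size).zero?

# each_slice does not complain; the final group is simply shorter,
# so a |label, link| block would receive nil for link.
groups = items.each_slice(slice_size).to_a

p complete, groups
```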
If you use the Enumerable mixin (as Rails does) you can do something similar to the PHP snippet listed. Just use the each_slice method and flatten the hash.
require 'enumerator'
['a',1,'b',2].to_a.flatten.each_slice(2) {|x,y| puts "#{x} => #{y}" }
# is equivalent to...
{'a'=>1,'b'=>2}.to_a.flatten.each_slice(2) {|x,y| puts "#{x} => #{y}" }
Less monkey-patching required.
However, this does cause problems when you have a recursive array or a hash with array values. In Ruby 1.9 this problem is solved with a parameter to the flatten method that specifies how deep to recurse.
# Ruby 1.8
[1,2,[1,2,3]].flatten
=> [1,2,1,2,3]
# Ruby 1.9
[1,2,[1,2,3]].flatten(0)
=> [1,2,[1,2,3]]
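Applied to the hash case, a depth of 1 is the value that keeps array values intact while still producing a flat key/value sequence. A sketch with made-up data:

```ruby
menu = { 'a' => [1, 2], 'b' => 3 }

# Unlimited flattening mixes the array value into the key/value stream.
fully_flat = menu.to_a.flatten

# Flattening one level only removes the pair wrappers.
one_level = menu.to_a.flatten(1)

pairs = one_level.each_slice(2).to_a

p fully_flat, one_level, pairs
```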
As for the question of whether this is a code smell, I'm not sure. Usually when I have to bend over backwards to iterate over something I step back and realize I'm attacking the problem wrong.
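each_with_index does not appear in the Ruby 2.1 Array documentation because it is defined in Enumerable, which Array includes; it has not been removed and still works on arrays.
If you only need the indices, you can use each_index instead.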
Example:
a = [ "a", "b", "c" ]
a.each_index {|x| print x, " -- " }
produces:
0 -- 1 -- 2 --
The right way is the one you feel most comfortable with and which does what you want it to do. In programming there is rarely one 'correct' way to do things, more often there are multiple ways to choose.
If you are comfortable with a certain way of doing things, just do it, unless it doesn't work; then it is time to find a better way.
Using the same method for iterating through both arrays and hashes makes sense, for example to process the nested hash-and-array structures that often result from parsers, from reading JSON files, etc.
One clever way that has not yet been mentioned is how it's done in the Ruby Facets library of standard library extensions. From here:
class Array
# Iterate over index and value. The intention of this
# method is to provide polymorphism with Hash.
#
def each_pair #:yield:
each_with_index {|e, i| yield(i,e) }
end
end
There is already Hash#each_pair, an alias of Hash#each. So after this patch, we also have Array#each_pair and can use it interchangeably to iterate through both Hashes and Arrays. This fixes the OP's observed insanity that Array#each_with_index has the block arguments reversed compared to Hash#each. Example usage:
my_array = ['Hello', 'World', '!']
my_array.each_pair { |key, value| pp "#{key}, #{value}" }
# result:
"0, Hello"
"1, World"
"2, !"
my_hash = { '0' => 'Hello', '1' => 'World', '2' => '!' }
my_hash.each_pair { |key, value| pp "#{key}, #{value}" }
# result:
"0, Hello"
"1, World"
"2, !"
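If reopening Array is not an option, the same polymorphism can be had from a plain method. This is only a sketch; each_key_value is a hypothetical helper name, not part of Facets or the standard library:

```ruby
# Hypothetical helper: yields (key, value) for hashes and (index, value) for arrays.
def each_key_value(collection)
  case collection
  when Hash
    collection.each_pair { |key, value| yield key, value }
  when Array
    collection.each_with_index { |value, index| yield index, value }
  else
    raise ArgumentError, "expected a Hash or an Array, got #{collection.class}"
  end
end

from_array = []
each_key_value(['Hello', 'World']) { |key, value| from_array << [key, value] }

from_hash = []
each_key_value('x' => 1, 'y' => 2) { |key, value| from_hash << [key, value] }

p from_array, from_hash
```

The trade-off is a slightly less fluent call site in exchange for leaving the core classes untouched.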
