Ruby Hash destructive vs. non-destructive method - ruby

I could not find a previous post that answers my question. I'm learning how to use destructive vs. non-destructive methods in Ruby. I found an answer to the exercise I'm working on (destructively adding a number to hash values), but I want to be clear on why some of my earlier solutions did not work. Here's the answer that works:
def modify_a_hash(the_hash, number_to_add_to_each_value)
  the_hash.each { |k, v| the_hash[k] = v + number_to_add_to_each_value }
end
These two solutions come back as non-destructive. Since they all use "each", I cannot figure out why. Is it the equals sign above that makes something destructive?
def modify_a_hash(the_hash, number_to_add_to_each_value)
  the_hash.each_value { |v| v + number_to_add_to_each_value }
end
def modify_a_hash(the_hash, number_to_add_to_each_value)
  the_hash.each { |k, v| v + number_to_add_to_each_value }
end

The terms "destructive" and "non-destructive" are a bit misleading here. Better is to use the conventional "in-place modification" vs. "returns a copy" terminology.
Generally, methods that modify in place have ! at the end of their name to serve as a warning, like gsub! for String. By convention the ! marks the more dangerous of a pair of methods, so in-place methods that have no non-destructive counterpart, like push for Array, do not have one.
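The = performs an assignment within the loop: the_hash[k] = ... stores the new value back into the hash. In your other examples, v + number_to_add_to_each_value computes a new number and then discards it, and each returns the original object being iterated over regardless of any results the block produces.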
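A minimal sketch of both behaviours, using a throwaway hash named prices (a hypothetical example, not from the question): the first each discards the block's result, the second writes it back with indexed assignment.

```ruby
prices = { apple: 1, pear: 2 }

# The block computes v + 10 but nothing stores it; each ignores the
# block's return value and returns the receiver itself.
returned = prices.each { |_k, v| v + 10 }
puts returned.equal?(prices)           # prints true
puts prices == { apple: 1, pear: 2 }   # prints true: nothing changed

# Indexed assignment writes the new value back into the same hash.
# (Reassigning existing keys during each is allowed; adding new keys is not.)
prices.each { |k, v| prices[k] = v + 10 }
puts prices == { apple: 11, pear: 12 } # prints true
```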
If you wanted to return a copy you'd do this:
def modify_a_hash(the_hash, number_to_add)
  Hash[
    the_hash.collect do |k, v|
      [ k, v + number_to_add ]
    end
  ]
end
That would return a copy. The inner operation collect transforms key-value pairs into new key-value pairs with the adjustment applied. No = is required since there's no assignment.
The outer method Hash[] transforms those key-value pairs into a proper Hash object. This is then returned and is independent of the original.
Generally a non-destructive or "return a copy" method needs to create a new, independent version of the thing it's manipulating for the purpose of storing the results. This applies to String, Array, Hash, or any other class or container you might be working with.
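On Ruby 2.4 or later, the same copy vs. in-place split is built into Hash as transform_values and transform_values!, which follow the ! naming convention. A minimal sketch, assuming Ruby 2.4+; with it, the exercise could also be written as the_hash.transform_values! { |v| v + number_to_add_to_each_value }.

```ruby
# Hash#transform_values and #transform_values! need Ruby 2.4 or later.
original = { "a" => 1, "b" => 2 }

# Returns a new hash; the receiver is untouched.
copy = original.transform_values { |v| v + 5 }

# The bang version modifies the receiver in place.
in_place = { "a" => 1, "b" => 2 }
in_place.transform_values! { |v| v + 5 }

puts original == { "a" => 1, "b" => 2 } # prints true
puts copy == { "a" => 6, "b" => 7 }     # prints true
puts in_place == copy                   # prints true
```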

Maybe this slightly different example will be helpful.
We have a hash:
2.0.0-p481 :014 > hash
=> {1=>"ann", 2=>"mary", 3=>"silvia"}
Then we iterate over it and change all the letters to the uppercase:
2.0.0-p481 :015 > hash.each { |key, value| value.upcase! }
=> {1=>"ANN", 2=>"MARY", 3=>"SILVIA"}
The values in the original hash have changed because upcase! modifies each string object in place.
Compare with the method without the !, which returns a new string and leaves the hash values unchanged:
2.0.0-p481 :017 > hash.each { |key, value| value.downcase }
=> {1=>"ANN", 2=>"MARY", 3=>"SILVIA"}
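The distinction is between mutating the object the hash holds and rebinding the block variable. A small sketch with a hypothetical names hash; note that under a # frozen_string_literal: true magic comment the literals would be frozen and upcase! would raise FrozenError.

```ruby
names = { 1 => "ann", 2 => "mary" }

# Rebinding the block variable creates a new string and points the local
# name at it; the hash still references the original strings.
names.each { |_key, value| value = value.upcase }
unchanged = (names == { 1 => "ann", 2 => "mary" })
puts unchanged                            # prints true

# upcase! mutates the very String objects the hash holds.
names.each { |_key, value| value.upcase! }
puts names == { 1 => "ANN", 2 => "MARY" } # prints true
```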

Related

How would I remove multiple nested values from a hash

I've got a really confusing question that I can't seem to figure out.
Say I have this hash
hash = {
"lock_version"=>1,
"exhibition_quality"=>false,
"within"=>["FID6", "S5"],
"representation_file"=>{
"lock_version"=>0,
"created_by"=>"admin",
"within"=>["FID6", "S5"]
}
}
How can I delete from "within"=>["FID6", "S5"] a value with the pattern FID<Number> (in this example FID6)?
I've thought about it a bit and tried using .delete and .reject! on the within key, but I realised this was deleting the whole key-value pair, which is not what I want. Thanks a lot for any help that can put me on the right path.
You could use a method or a proc/lambda to achieve the result. This solution splits the logic into two parts: removing the actual FID<Number> strings from the array, and recursively applying that to the correct keys.
remove_fid = ->(array) { array.grep_v(/\AFID\d+\z/) }
remove_nested_fid = lambda do |hash|
  hash.merge(
    hash.slice('within').transform_values(&remove_fid),
    hash.slice('representation_file').transform_values(&remove_nested_fid)
  )
end
pp hash.then(&remove_nested_fid) # or remove_nested_fid.call(hash)
# {"lock_version"=>1,
# "exhibition_quality"=>false,
# "within"=>["S5"],
# "representation_file"=>
# {"lock_version"=>0, "created_by"=>"admin", "within"=>["S5"]}}
grep_v returns the elements of the array that do not match the given regex, i.e. it removes the strings that do match.
slice creates a new hash only containing the given keys. If a key is missing it will not be present in the resulting hash.
transform_values maps each value of a hash to a new value (similar to map for Array), returning a new hash.
merge creates a new hash, merging the hashes together.
This solution does not mutate the original hash structure. It needs Ruby 2.6 or later, for then and for merge with multiple arguments.
You're going to need to recurse over the Hash and, in cases where the value is a Hash, process it with the same function. Deep recursion can be a problem in Ruby (very deep nesting raises SystemStackError), but recursion is the most natural way to express this type of problem.
def filter_fids(h)
  h.each_pair do |k, v|
    if v.is_a? Hash
      filter_fids v
    elsif k == "within"
      # drop only FID<Number> entries such as "FID6"
      v.reject! { |x| /\AFID\d+\z/.match?(x) }
    end
  end
end
This will mutate the original data structure in place.
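If the nesting can be deeper or can include arrays of hashes, a single recursive function that walks any structure is one option. This is a sketch with a hypothetical helper named without_fids; like the first answer it builds new objects instead of mutating, and it assumes the pattern is FID followed by digits.

```ruby
FID_PATTERN = /\AFID\d+\z/

# Returns a new structure in which every "within" array has its
# FID<digits> strings removed; hashes and arrays are walked recursively.
# The input is not modified.
def without_fids(node)
  case node
  when Hash
    node.each_with_object({}) do |(key, value), result|
      result[key] =
        if key == "within" && value.is_a?(Array)
          value.reject { |item| item.is_a?(String) && FID_PATTERN.match?(item) }
        else
          without_fids(value)
        end
    end
  when Array
    node.map { |item| without_fids(item) }
  else
    node
  end
end
```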

How to convert only strings hash values that are numbers to integers

I have rows of hashes imported from several different XML database dumps that look like this (but with varying keys):
{"Id"=>"1", "Name"=>"Cat", "Description"=>"Feline", "Count"=>"123"}
I tried using #to_i but it converts a non-number string to 0:
"Feline".to_i
# => 0
But what I'd like is a way for "Feline" to remain a string, while Id and Count in the above example become integers 1 and 123.
Is there an easy way to convert only the strings values that are numbers into integers?
One-line answers:
Using a regex:
h.merge(h) { |k, v| v.match(/\A[+-]?\d+\z/) ? v.to_i : v }
Using Integer:
h.merge(h) { |k, v| Integer(v) rescue v }
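One caveat with Kernel#Integer: with its default base it honours Ruby's literal prefixes, so Integer("010") is 8 (octal) and Integer("0x1A") is 26. A sketch, assuming Ruby 2.6+ for the exception: false keyword, that forces base 10; the "Code" key is a hypothetical addition to show the difference.

```ruby
row = { "Id" => "1", "Name" => "Cat", "Description" => "Feline",
        "Count" => "123", "Code" => "010" }

# Base 10 treats every string as decimal, so "010" becomes 10, not 8.
# exception: false (Ruby 2.6+) returns nil instead of raising, and
# || v then keeps the original string.
converted = row.transform_values do |v|
  Integer(v, 10, exception: false) || v
end

puts Integer("010")      # prints 8: default base honours the 0 prefix
puts converted["Code"]   # prints 10
```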
Use Kernel#Integer:
my_hash = {"Id"=>"1", "Name"=>"Cat", "Description"=>"Feline", "Count"=>"123"}
Hash[my_hash.map { |a, b|
  [a,
   begin
     Integer(b)
   rescue ArgumentError
     b
   end]
}]
ADDED LATER: With my y_support gem, you can make hash operations even more concise.
require 'y_support/core_ext/hash'
my_hash.with_values { |v|
  begin
    Integer(v)
  rescue ArgumentError
    v
  end
}
YSupport can be installed with gem install y_support and also offers Hash#with_keys, Hash#with_values! and Hash#with_keys!, which do what you would expect, and Hash#modify, which expects a binary block returning a pair of values and modifies the hash in place. There have been proposals to add such methods directly to the Ruby core in the future.
I think you know what fields should be integers (your consuming code probably depends on it), so I would recommend you convert the specific fields.
c = Hash[h.map { |k,v| [k, %w(Id Count).include?(k) ? Integer(v) : v ] }]
I had a similar problem to solve, where results for pesticide analysis came into the system in a heterogeneous (bad design!) format: negative integers as special codes (not detected, not tested, not quantified, etc.), nil as a synonym for not detected, floats for quantified compounds and strings for pass/fail booleans. Hold your horses: this is a 10-year-old app, running in production, never greenfielded and heavily patched ;)
Two things that I learned from top Rubyists:
0) DON'T ITERATE-MODIFY AN ENUMERABLE (return a copy)
1) YOUR REGEX WON'T COVER ALL CASES
While I am not a big fan of rescue, I think it fits the purpose of keeping the code clean. So, I've been using this to mitigate my input:
ha = {
  "p_permethrin" => nil,
  "p_acequinocyl"=>"0.124",
  "p_captan"=>"2.12",
  "p_cypermethrin"=>"-6",
  "p_cyfluthrin"=>"-6",
  "p_fenhexamid"=>"-1",
  "p_spinetoram"=>"-6",
  "p_pentachloronitrobenzene"=>"-6",
  "p_zpass"=>"true"
}
Hash[ha.map { |k, v| [k, (Float(v) rescue v)] }]      # keeps nil as nil
Hash[ha.map { |k, v| [k, (Float(v) rescue v.to_s)] }] # turns nil into an empty string
I would even add a method to Hash:
class Hash
  # return a copy of the hash in which every value that Float() can parse
  # (e.g. "2.12" or "-6") becomes a Float; other values are left unchanged
  def evaluate_values
    Hash[self.map { |k, v| [k, (Float(v) rescue v)] }]
  end
end
Using a regex and the ternary operator, you could incorporate this into the logic somewhere:
string =~ /\A\d+\z/ ? string.to_i : string
This will handle not only integers but all numbers.
my_hash = {"Id"=>"1", "Name"=>"Cat", "Description"=>"Feline", "Count"=>"123"}
result = my_hash.inject({}) { |acc, (key, value)|
  if value.match(/^\s*[+-]?((\d+_?)*\d+(\.(\d+_?)*\d+)?|\.(\d+_?)*\d+)(\s*|([eE][+-]?(\d+_?)*\d+)\s*)$/)
    # decimals and exponents need to_f; to_i would truncate "1.5" or "1e3" to 1
    acc[key.to_sym] = value =~ /[.eE]/ ? value.to_f : value.to_i
  else
    acc[key.to_sym] = value
  end
  acc
}
Thanks to the question "Determine if a string is a valid float value" for the regexp.
Define a new method for String: String#to_number
class String
  def to_number
    Integer(self) rescue Float(self) rescue self
  end
end
Test it:
"1".to_number   # => 1
"Cat".to_number # => "Cat"

Idiomatic way of detecting duplicate keys in Ruby?

I've just noticed that Ruby doesn't raise an exception or even supply a warning if you supply duplicate keys to a hash:
$VERBOSE = true
key_value_pairs_with_duplicates = [[1,"a"], [1, "b"]]
# No warning produced
Hash[key_value_pairs_with_duplicates] # => {1=>"b"}
# Also no warning
hash_created_by_literal_with_duplicate_keys = {1 => "a", 1=> "b"} # => {1=>"b"}
For key_value_pairs_with_duplicates, I could detect duplicate keys by doing
keys = key_value_pairs_with_duplicates.map(&:first)
raise "Duplicate keys" unless keys.uniq == keys
Or by doing
procedurally_produced_hash = {}
key_value_pairs_with_duplicates.each do |key, value|
  raise "Duplicate key" if procedurally_produced_hash.has_key?(key)
  procedurally_produced_hash[key] = value
end
Or
hash = Hash[key_value_pairs_with_duplicates]
raise "Duplicate keys" unless hash.length == key_value_pairs_with_duplicates.length
But is there an idiomatic way to do it?
Hash#merge takes an optional block to define how to handle duplicate keys.
http://www.ruby-doc.org/core-1.9.3/Hash.html#method-i-merge
Taking advantage of the fact this block is only called on duplicate keys:
>> a = {a: 1, b: 2}
=> {:a=>1, :b=>2}
>> a.merge(c: 3) { |key, old, new| fail "Duplicate key: #{key}" }
=> {:a=>1, :b=>2, :c=>3}
>> a.merge(b: 10, c: 3) { |key, old, new| fail "Duplicate key: #{key}" }
RuntimeError: Duplicate key: b
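The same block can detect duplicates while building a hash from pairs. A sketch with a hypothetical helper strict_hash: merge!'s block runs only when the key is already present, so reaching it means a duplicate.

```ruby
# Builds a hash from [key, value] pairs, raising on the first repeated key.
def strict_hash(pairs)
  pairs.each_with_object({}) do |(key, value), hash|
    hash.merge!(key => value) do |dup_key, *|
      raise KeyError, "Duplicate key: #{dup_key.inspect}"
    end
  end
end

strict_hash([[1, "a"], [2, "b"]]) # no duplicates, returns the hash
begin
  strict_hash([[1, "a"], [1, "b"]])
rescue KeyError => e
  puts e.message # prints Duplicate key: 1
end
```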
I think there are two idiomatic ways to handle this:
Use one of the Hash extensions that allow multiple values per key, or
Extend Hash (or patch w/ flag method) and implement []= to throw a dupe key exception.
You could also just decorate an existing hash with a []= that throws, or use alias_method; either way, it's straightforward and pretty Ruby-ish.
I would simply build a hash from the array, checking for an existing key before overwriting it. This way you avoid creating any unnecessary temporary collections.
def make_hash(key_value_pairs_with_duplicates)
  result = {}
  key_value_pairs_with_duplicates.each do |pair|
    key, value = pair
    raise "Duplicate key" if result.has_key?(key)
    result[key] = value
  end
  result
end
But no, I don't think there is an "idiomatic" way of doing this. Hash construction simply follows a last-one-wins rule, and if you don't like that, it's up to you to fix it.
In the literal form you are probably out of luck. But in the literal form why would you need to validate this? You are not getting it from a dynamic source if it's literal, so if you choose to dupe keys, it's your own fault. Just, uh... don't do that.
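If you would rather report every repeated key than stop at the first one, counting the keys first is one option. A sketch assuming Ruby 2.7+ for Enumerable#tally; the pairs data is made up.

```ruby
pairs = [[1, "a"], [2, "b"], [1, "c"], [3, "d"], [2, "e"]]

# tally counts how often each key appears; keep the keys seen more than once.
duplicate_keys = pairs.map(&:first).tally.select { |_key, count| count > 1 }.keys

puts "Duplicate keys: #{duplicate_keys.inspect}" unless duplicate_keys.empty?
# prints Duplicate keys: [1, 2]
```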
In other answers I've already stated my opinion that Ruby needs a standard method to build a hash from an enumerable. So, as you need your own abstraction for the task anyway, let's just take Facets' mash with the implementation you like the most (Enumerable#inject + Hash#update looks good to me) and add the check:
module Enumerable
  def mash
    inject({}) do |hash, item|
      key, value = block_given? ? yield(item) : item
      fail("Repeated key: #{key}") if hash.has_key?(key) # <- new line
      hash.update(key => value)
    end
  end
end
I think most people here overthink the problem. To deal with duplicate keys, I'd simply do this:
arr = [ [:a,1], [:b,2], [:c,3] ]
hsh = {}
arr.each do |k,v|
  raise("Whoa! I already have :#{k} key.") if hsh.has_key?(k)
  hsh[k] = v
end
Or make a method out of this, maybe even extend a Hash class with it. Or create a child of Hash class (UniqueHash?) which would have this functionality by default.
But is it worth it? (I don't think so.) How often do we need to deal with duplicate keys in hash like this?
Recent Ruby versions do warn about duplicate keys in a hash literal. However, they still go ahead and assign the duplicate's value to the key, which is not always the desired behaviour. IMO, the best way to deal with this is to override the construction/assignment methods, e.g. #[]=:
class MyHash < Hash
  def []=(key,val)
    if self.has_key?(key)
      puts("key: #{key} already has a value!")
    else
      super(key,val)
    end
  end
end
So when you run:
h = MyHash.new
h[:A] = ['red']
h[:B] = ['green']
h[:A] = ['blue']
it will output
key: A already has a value!
{:A=>["red"], :B=>["green"]}
Of course you can tailor the overridden behaviour any which way you want.
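One caveat: overriding []= only affects []=. Hash#store and Hash#merge! are separate methods implemented in C, so they still overwrite silently. A sketch with a hypothetical NoOverwriteHash that raises instead of printing:

```ruby
class NoOverwriteHash < Hash
  def []=(key, value)
    raise KeyError, "key #{key.inspect} already has a value" if key?(key)
    super
  end
end

h = NoOverwriteHash.new
h[:a] = "red"

begin
  h[:a] = "blue"
rescue KeyError => e
  puts e.message # prints key :a already has a value
end

# These do not go through the overridden []=, so they overwrite silently.
h.store(:a, "green")
h.merge!(a: "yellow")
puts h[:a] # prints yellow
```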
I would avoid using an array to model a hash at all. In other words, don't construct the array of pairs in the first place. I'm not being facetious or dismissive. I'm speaking as someone who has used arrays of pairs and (even worse) balanced arrays many times, and always regretted it.

Search ruby hash for empty value

I have a ruby hash like this
h = {"a" => "1", "b" => "", "c" => "2"}
Now I have a Ruby function which evaluates this hash and returns true if it finds a key with an empty value. The following function always returns true, even when none of the values in the hash is empty:
def hash_has_blank(hsh)
  hsh.each do |k,v|
    if v.empty?
      return true
    end
  end
  return false
end
What am I doing wrong here?
Try this:
def hash_has_blank hsh
  hsh.values.any?(&:empty?)
end
Or:
def hash_has_blank hsh
  hsh.values.any? { |i| i.empty? }
end
If you are using an old 1.8.x Ruby
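If "blank" should also cover nil or whitespace-only strings, converting with to_s and stripping first handles all three. A sketch, assuming that broader definition of blank (the question only checks empty?); Hash#any? also stops at the first match without building a values array.

```ruby
# Treats nil, "" and whitespace-only strings as blank.
# to_s turns nil into "", strip removes surrounding whitespace.
def hash_has_blank(hsh)
  hsh.any? { |_key, value| value.to_s.strip.empty? }
end

puts hash_has_blank({ "a" => "1", "b" => "", "c" => "2" }) # prints true
puts hash_has_blank({ "a" => "1", "c" => "2" })            # prints false
puts hash_has_blank({ "a" => nil, "b" => "  " })           # prints true
```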
I hope you're ready to learn some Ruby magic here. I wouldn't define such a function globally like you did. If it's an operation on a hash, then it should be an instance method on the Hash class. You can do it like this:
class Hash
  def has_blank?
    # keep only the nil or zero-length values, then check whether any remain
    self.reject { |k, v| !v.nil? && v.length > 0 }.size > 0
  end
end
reject will return a new hash with all the nil or empty values, and then it checks how big this new hash is.
A possibly more efficient way (any? stops at the first blank value, so it doesn't always go through every value):
class Hash
  def has_blank?
    self.values.any? { |v| v.nil? || v.length == 0 }
  end
end
But this will still traverse the whole hash if there is no empty value.
I've replaced empty? with the nil? and length checks because I don't know how your empty? method works.
If you just want to check if any of the values is an empty string you could do
h.has_value?('')
but your function seems to work fine.
I'd consider refactoring your model domain. Obviously the hash represents something tangible. Why not make it an object? If the item can be completely represented by a hash, you may wish to subclass Hash. If it's more complicated, the hash can be an attribute.
Secondly, the reason for which you are checking blanks can be named to better reflect your domain. You haven't told us the "why", but let's assume that your Item is only valid if it doesn't have any blank values.
class MyItem < Hash
  def valid?
    !invalid?
  end
  def invalid?
    values.any? { |i| i.empty? }
  end
end
The point is, if you can establish a vocabulary that makes sense in your domain, your code will be cleaner and more understandable. Using a Hash is just a means to an end and you'd be better off using more descriptive, domain-specific terms.
Using the example above, you'd be able to do:
my_item = MyItem["a" => "1", "b" => "", "c" => "2"]
my_item.valid? #=> false

What is the "right" way to iterate through an array in Ruby?

PHP, for all its warts, is pretty good on this count. There's no difference between an array and a hash (maybe I'm naive, but this seems obviously right to me), and to iterate through either you just do
foreach (array/hash as $key => $value)
In Ruby there are a bunch of ways to do this sort of thing:
array.length.times do |i|
end
array.each
array.each_index
for i in array
Hashes make more sense, since I just always use
hash.each do |key, value|
Why can't I do this for arrays? If I want to remember just one method, I guess I can use each_index (since it makes both the index and value available), but it's annoying to have to do array[index] instead of just value.
Oh right, I forgot about array.each_with_index. However, this one sucks because it goes |value, key| and hash.each goes |key, value|! Is this not insane?
This will iterate through all the elements:
array = [1, 2, 3, 4, 5, 6]
array.each { |x| puts x }
# Output:
1
2
3
4
5
6
This will iterate through all the elements giving you the value and the index:
array = ["A", "B", "C"]
array.each_with_index {|val, index| puts "#{val} => #{index}" }
# Output:
A => 0
B => 1
C => 2
I'm not quite sure from your question which one you are looking for.
I think there is no one right way. There are a lot of different ways to iterate, and each has its own niche.
each is sufficient for many usages, since I don't often care about the indexes.
each_with_index acts like Hash#each - you get the value and the index.
each_index - just the indexes. I don't use this one often. Equivalent to "length.times".
map is another way to iterate, useful when you want to transform one array into another.
select is the iterator to use when you want to choose a subset.
inject is useful for generating sums or products, or collecting a single result.
It may seem like a lot to remember, but don't worry, you can get by without knowing all of them. But as you start to learn and use the different methods, your code will become cleaner and clearer, and you'll be on your way to Ruby mastery.
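A quick sketch of the iterators listed above, run on one small, made-up array:

```ruby
numbers = [3, 1, 4, 1, 5]

squares = numbers.map { |n| n * n }             # [9, 1, 16, 1, 25]
odds    = numbers.select(&:odd?)                # [3, 1, 1, 5]
total   = numbers.inject(0) { |sum, n| sum + n } # 14
indexed = numbers.each_with_index.map { |n, i| "#{i}:#{n}" }
# indexed is ["0:3", "1:1", "2:4", "3:1", "4:5"]
```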
I'm not saying that Array -> |value,index| and Hash -> |key,value| is not insane (see Horace Loeb's comment), but I am saying that there is a sane way to expect this arrangement.
When I am dealing with arrays, I am focused on the elements in the array (not the index because the index is transitory). The method is each with index, i.e. each+index, or |each,index|, or |value,index|. This is also consistent with the index being viewed as an optional argument, e.g. |value| is equivalent to |value,index=nil| which is consistent with |value,index|.
When I am dealing with hashes, I am often more focused on the keys than the values, and I am usually dealing with keys and values in that order, either key => value or hash[key] = value.
If you want duck-typing, then either explicitly use a defined method as Brent Longborough showed, or an implicit method as maxhawkins showed.
Ruby is all about accommodating the language to suit the programmer, not about the programmer accommodating to suit the language. This is why there are so many ways. There are so many ways to think about something. In Ruby, you choose the closest and the rest of the code usually falls out extremely neatly and concisely.
As for the original question, "What is the “right” way to iterate through an array in Ruby?", well, I think the core way (i.e. without powerful syntactic sugar or object oriented power) is to do:
for index in 0 ... array.size
  puts "array[#{index}] = #{array[index].inspect}"
end
Ruby is all about powerful syntactic sugar and object oriented power, but anyway, here is the equivalent for hashes (the keys can be sorted or not):
for key in hash.keys.sort
  puts "hash[#{key.inspect}] = #{hash[key].inspect}"
end
So, my answer is, "The “right” way to iterate through an array in Ruby depends on you (i.e. the programmer or the programming team) and the project.". The better Ruby programmer makes the better choice (of which syntactic power and/or which object oriented approach). The better Ruby programmer continues to look for more ways.
Now, I want to ask another question, "What is the “right” way to iterate through a Range in Ruby backwards?"! (This question is how I came to this page.)
It is nice to do (for the forwards):
(1..10).each{|i| puts "i=#{i}" }
but I don't like to do (for the backwards):
(1..10).to_a.reverse.each{|i| puts "i=#{i}" }
Well, I don't actually mind doing that too much, but when I am teaching going backwards, I want to show my students a nice symmetry (i.e. with minimal difference, e.g. only adding a reverse, or a step -1, but without modifying anything else).
You can do (for symmetry):
(a=*1..10).each{|i| puts "i=#{i}" }
and
(a=*1..10).reverse.each{|i| puts "i=#{i}" }
which I don't like much, but you can't do
(*1..10).each{|i| puts "i=#{i}" }
(*1..10).reverse.each{|i| puts "i=#{i}" }
#
(1..10).step(1){|i| puts "i=#{i}" }
(1..10).step(-1){|i| puts "i=#{i}" }
#
(1..10).each{|i| puts "i=#{i}" }
(10..1).each{|i| puts "i=#{i}" } # I don't want this though. It's dangerous
You could ultimately do
class Range
  def each_reverse(&block)
    self.to_a.reverse.each(&block)
  end
end
but I want to teach pure Ruby rather than object oriented approaches (just yet). I would like to iterate backwards:
without creating an array (consider 0..1000000000)
working for any Range (e.g. Strings, not just Integers)
without using any extra object oriented power (i.e. no class modification)
I believe this is impossible without defining a pred method, which means modifying the Range class to use it. If you can do this please let me know, otherwise confirmation of impossibility would be appreciated though it would be disappointing. Perhaps Ruby 1.9 addresses this.
(Thanks for your time in reading this.)
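None of the following satisfies all three constraints for String ranges, but they are standard alternatives worth knowing. A sketch: Integer#downto and Numeric#step iterate integers backwards without building an array, and step gives a forward/backward pair that differs only in the bounds and the step sign; Enumerable#reverse_each works for any Range, but for a String range it collects the elements into an array first.

```ruby
# Integer#downto counts down without building an array.
10.downto(1) { |i| puts "i=#{i}" }

# Numeric#step(limit, step): forwards and backwards differ only in the
# bounds and the sign of the step.
1.step(10, 1)  { |i| puts "i=#{i}" }
10.step(1, -1) { |i| puts "i=#{i}" }

# Enumerable#reverse_each works on any Range, including String ranges,
# but here it builds a temporary array of the elements first.
("a".."e").reverse_each { |s| puts "s=#{s}" }
```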
Use each_with_index when you need both.
ary.each_with_index { |val, idx| # ...
The other answers are just fine, but I wanted to point out one other peripheral thing: Arrays are ordered, whereas Hashes are not in 1.8. (In Ruby 1.9, Hashes are ordered by insertion order of keys.) So it wouldn't make sense prior to 1.9 to iterate over a Hash in the same way/sequence as Arrays, which have always had a definite ordering. I don't know what the default order is for PHP associative arrays (apparently my google fu isn't strong enough to figure that out, either), but I don't know how you can consider regular PHP arrays and PHP associative arrays to be "the same" in this context, since the order for associative arrays seems undefined.
As such, the Ruby way seems more clear and intuitive to me. :)
Here are the four options listed in your question, arranged by freedom of control. You might want to use a different one depending on what you need.
Simply go through values:
array.each
Simply go through indices:
array.each_index
Go through values with a for loop (the loop variable stays in scope after the loop):
for i in array
Control loop count + index variable:
array.length.times do | i |
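One practical difference between the for loop and each: for does not open a new scope, so its loop variable stays defined afterwards, while an each block variable does not. A minimal sketch:

```ruby
array = %w[x y z]

for item in array
  # the body could use item here
end
# for does not create a new scope, so item is still defined here.
puts item                      # prints z

array.each { |element| element }
# element only existed inside the block.
puts defined?(element).inspect # prints nil
```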
Trying to do the same thing consistently with arrays and hashes might just be a code smell, but, at the risk of my being branded as a codorous half-monkey-patcher, if you're looking for consistent behaviour, would this do the trick?:
class Hash
  def each_pairwise
    self.each { | x, y |
      yield [x, y]
    }
  end
end
class Array
  def each_pairwise
    self.each_with_index { | x, y |
      yield [y, x]
    }
  end
end
["a","b","c"].each_pairwise { |x,y|
  puts "#{x} => #{y}"
}
{"a" => "Aardvark","b" => "Bogle","c" => "Catastrophe"}.each_pairwise { |x,y|
  puts "#{x} => #{y}"
}
I'd been trying to build a menu (in Camping and Markaby) using a hash.
Each item has 2 elements: a menu label and a URL, so a hash seemed right, but the '/' URL for 'Home' always appeared last (as you'd expect for a hash), so menu items appeared in the wrong order.
Using an array with each_slice does the job:
['Home', '/', 'Page two', 'two', 'Test', 'test'].each_slice(2) do |label, link|
  li { a label, :href => link }
end
Adding extra values for each menu item (e.g. like a CSS ID name) just means increasing the slice value. So, like a hash but with groups consisting of any number of items. Perfect.
So this is just to say thanks for inadvertently hinting at a solution!
Obvious, but worth stating: I suggest checking if the length of the array is divisible by the slice value.
If you use the enumerable mixin (as Rails does) you can do something similar to the php snippet listed. Just use the each_slice method and flatten the hash.
require 'enumerator'
['a',1,'b',2].to_a.flatten.each_slice(2) {|x,y| puts "#{x} => #{y}" }
# is equivalent to...
{'a'=>1,'b'=>2}.to_a.flatten.each_slice(2) {|x,y| puts "#{x} => #{y}" }
Less monkey-patching required.
However, this does cause problems when you have a recursive array or a hash with array values. In ruby 1.9 this problem is solved with a parameter to the flatten method that specifies how deep to recurse.
# Ruby 1.8
[1,2,[1,2,3]].flatten
=> [1,2,1,2,3]
# Ruby 1.9
[1,2,[1,2,3]].flatten(0)
=> [1,2,[1,2,3]]
As for the question of whether this is a code smell, I'm not sure. Usually when I have to bend over backwards to iterate over something I step back and realize I'm attacking the problem wrong.
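A sketch of the flatten depth issue with a hypothetical menu hash whose values are arrays: a full flatten makes the slices drift, while flatten(1) removes only the outer level so each_slice(2) still lines keys up with values.

```ruby
menu = { "Home" => ["/", "home-id"], "Test" => ["test", "test-id"] }

# flatten with no argument flattens every level, so the slices drift.
misaligned = menu.to_a.flatten.each_slice(2).to_a
# misaligned is [["Home", "/"], ["home-id", "Test"], ["test", "test-id"]]

# flatten(1) removes only the nesting added by to_a, keeping each value intact.
aligned = menu.to_a.flatten(1).each_slice(2).to_a
# aligned is [["Home", ["/", "home-id"]], ["Test", ["test", "test-id"]]]
```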
each_with_index was not removed in Ruby 2.1 and is still available, but if you only need the indexes, you can use each_index instead.
Example:
a = [ "a", "b", "c" ]
a.each_index {|x| print x, " -- " }
produces:
0 -- 1 -- 2 --
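When you need both, each_with_index is still the tool; if the index should start somewhere other than 0, Enumerator#with_index takes an offset. A sketch with a made-up array:

```ruby
letters = ["a", "b", "c"]

# each_with_index yields |value, index|, with the index starting at 0.
letters.each_with_index { |letter, index| puts "#{index}: #{letter}" }

# map.with_index(1) starts counting at 1 and collects the results.
numbered = letters.map.with_index(1) { |letter, position| "#{position}. #{letter}" }
p numbered # prints ["1. a", "2. b", "3. c"]
```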
The right way is the one you feel most comfortable with and which does what you want it to do. In programming there is rarely one 'correct' way to do things, more often there are multiple ways to choose.
If you are comfortable with a certain way of doing things, just do it, unless it doesn't work; then it is time to find a better way.
Using the same method for iterating through both arrays and hashes makes sense, for example to process nested hash-and-array structures that often result from parsers, from reading JSON files, etc.
One clever way that has not yet been mentioned is how it's done in the Ruby Facets library of standard library extensions. From here:
class Array
  # Iterate over index and value. The intention of this
  # method is to provide polymorphism with Hash.
  #
  def each_pair #:yield:
    each_with_index {|e, i| yield(i,e) }
  end
end
There is already Hash#each_pair, an alias of Hash#each. So after this patch, we also have Array#each_pair and can use it interchangeably to iterate through both Hashes and Arrays. This fixes the OP's observed insanity that Array#each_with_index has the block arguments reversed compared to Hash#each. Example usage:
my_array = ['Hello', 'World', '!']
my_array.each_pair { |key, value| pp "#{key}, #{value}" }
# result:
"0, Hello"
"1, World"
"2, !"
my_hash = { '0' => 'Hello', '1' => 'World', '2' => '!' }
my_hash.each_pair { |key, value| pp "#{key}, #{value}" }
# result:
"0, Hello"
"1, World"
"2, !"

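If the global patch is a concern, Ruby 2.0+ refinements can scope the same Array#each_pair to files that opt in with using. A sketch; the module name ArrayEachPair is hypothetical, and this is a variant of the Facets code above rather than Facets' own API.

```ruby
module ArrayEachPair
  refine Array do
    # Yields |index, element| so arrays match Hash#each_pair's |key, value|.
    def each_pair
      each_with_index { |element, index| yield index, element }
    end
  end
end

# Activates the refinement for the rest of this file only.
using ArrayEachPair

['Hello', 'World', '!'].each_pair { |key, value| puts "#{key}, #{value}" }
```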