Updating Hash Values in Ruby Clarified - ruby

I was going to comment on the original question but I don't have the reputation to do so yet....
I too was wondering how to easily update all the values in a hash, or if there was some kind of equivalent .map! method for hashes. Someone put up this elegant solution:
hash.update(hash){|key,v1| expression}
on this question:
Ruby: What is the easiest method to update Hash values?
My question is: how does the block know to iterate over each element in the hash? For example, I'd normally have to call .each on a hash to access each element, so why isn't it something like:
hash.update(hash.each) do |key, value|
  value += 1
end
In the block with {|key, value| expression} I am accessing each individual hash element yet I don't have to explicitly tell the system this? Why not? Thank you very much.

Hash#update is an alias for Hash#merge! which is more descriptive.
When calling the method with a block, the following happens (excerpt from the docs):
If [a] block is specified, [...] the value of each duplicate key is
determined by calling the block with the key [...]
So, the above code works like this:
The hash is merged with itself, and for each duplicate key the block is called. As we merge the hash with itself, every incoming key is a duplicate and therefore the block is invoked for each one. The result is that every value in the hash gets replaced by the result of expression.
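As a small sketch of the self-merge described above (the helper name increment_values! is made up for illustration), note that the block actually receives three arguments: the key, the value already in the receiver, and the incoming value. When a hash is merged with itself the last two are identical.

```ruby
# Hypothetical helper: bump every value in place by merging the hash with itself.
# Every key collides with itself, so the block runs once per key.
def increment_values!(hash)
  hash.update(hash) { |_key, old_value, _new_value| old_value + 1 }
end

counts = { a: 1, b: 2, c: 3 }
increment_values!(counts)
# counts now maps a to 2, b to 3 and c to 4
```

On Ruby 2.4 and later, hash.transform_values! { |v| v + 1 } expresses the same intent more directly.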

Hash#update takes a hash as the first parameter, and an optional block as the second parameter. If the second parameter is left out, the method will internally loop on each key-value pair in the supplied hash and use them to merge into the original hash.
If the block (second parameter) is supplied, the method does exactly the same thing. It loops over each key-value in the supplied hash and merges it in. The only difference is where a collision is found (the original hash already has an entry for a specific key). In this case the block is called to help resolve the conflict.
Based on this understanding, simply passing the hash into itself will cause it to loop over every key-value because that's how update always works. Calling .each would be redundant.
To see this more clearly, take a look at the source code for the #update method, and note the internal call to rb_hash_foreach in either logic branch.
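To make the "block only runs on collisions" point concrete, here is a minimal sketch that merges two different hashes and records which keys reached the block. The method name restock and the sample data are assumptions for illustration only.

```ruby
# Hypothetical example: merge a delivery into existing stock,
# recording which keys triggered the conflict block.
def restock(stock, delivery)
  collisions = []
  stock.update(delivery) do |key, old_value, new_value|
    collisions << key          # only reached for keys present in both hashes
    old_value + new_value
  end
  collisions
end

stock = { apples: 3, pears: 5 }
collisions = restock(stock, { pears: 2, plums: 7 })
# collisions contains only :pears; :plums was simply added, :apples untouched
```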

Related

How to add a value to an existing hash without a key

I need to assign the numbers 0-100000 to a hash without giving a key.
Ruby uses Murmur as its hash function. How can I add a value without having a key, as in C, and let Ruby handle collisions and the other details? Is it possible? Can I give just the value to the hash and let it compute the key, then insert the value itself?
In a normal hashing operation, we have a hash function and a table. We pass the value to the hash function and get a key in return. The value is inserted at the key's location in the table (if a collision happens, double hashing or some other strategy is used).
Is it possible to do the same type of hashing in Ruby? Or am I stuck with default ways? Can I just throw the value into a function, then it evaluates the key, and inserts the value to the hash table or not?
Just store into the hash using the calculated hash of the key, rather than the key itself:
hash[hash_func(key)] = value
That is, instead of mapping key -> value directly, this maps hash_func(key) -> value. The implementation may pass your hashed key value through another hash function internally, but you needn't care about that.
However, in comments it now comes to light that you want to apply the hash function to the value, not any other key. In that case, just use a set and be done with it. Then, all you need to do is add values to the set:
s = Set.new
s.add(value)
There's no need to calculate the hash of anything; Set will take care of it for you.
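A minimal sketch of the Set approach for the 0 to 100000 range from the question (the helper name build_set is made up for illustration):

```ruby
require "set"

# Hypothetical helper: collect any enumerable of values into a Set.
def build_set(values)
  set = Set.new
  values.each { |value| set.add(value) }
  set
end

numbers = build_set(0..100_000)
numbers.add(42)                      # already present, so the set is unchanged
found   = numbers.include?(42)       # membership test, hashed internally
missing = numbers.include?(100_001)
```

Set.new(0..100_000) would build the same set in one call; the loop is spelled out only to mirror the s.add(value) usage above.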
In short, this seems to be a case of the XY Problem. You needed to store a set of values in a data structure (and presumably be able to check if those values were stored in an efficient manner). Instead of asking about this, you asked about hash functions and tables. If you had asked about what you really needed, instead of asking about something else that you thought you could use to solve the original problem, you would have had a useful answer much more quickly.
The common solution is to simply use the value as a key. Hence:
value = "xxx"
hash[value] = 1
This way, you clearly document that the actual values (all 1) of this particular hash are of no use, and you will get de-duplicated values. Hash will do the usual hashing internally, you don't need to worry about it at all.
I use 1 as the value here, but the actual value is completely irrelevant. I don't use nil because that is the default return value of hash[nonexistent_value].
If your values are more complex, check out http://docs.ruby-lang.org/en/2.0.0/Hash.html for specifics about them.

Maximum arity of ruby function?

I am looking to make an efficient function to clear out a redis-based cache.
I have a method call that returns a number of keys from redis:
$redis.keys("foo:*")
That returns all the keys that start with "foo:". Next, I'd like to delete all the values for these keys.
One (memory-intensive) way to do this is:
$redis.keys("foo:*").each do |key|
  $redis.del(key)
end
I'd like to avoid loading all the keys into memory, and then making numerous requests to the redis server.
Another way that I like is to use the splat operator:
keys = $redis.keys("foo:*")
$redis.del(*keys)
The problem is that I don't know the maximum arity of the $redis.del method, or of any Ruby method, and I can't seem to find it online.
What is the maximum arity?
@muistooshort had a good suggestion in the comments that turned out to be right: the redis driver knows what to do with an array argument:
# there are 1,000,000 keys of the form "foo:#{number}"
keys = $redis.keys("foo:*")
$redis.del(keys) # => 1000000
Simply pass an array of keys to $redis.del

Can't all or most cases of `each` be replaced with `map`?

The difference between Enumerable#each and Enumerable#map is whether it returns the receiver or the mapped result. Getting back to the receiver is trivial, and you usually do not need to continue a method chain after each, as in each{...}.another_method (I don't think I have ever seen such a case, and even if you want to get back to the receiver, you can do that with tap). So I think all or most cases where Enumerable#each is used can be replaced by Enumerable#map. Am I wrong? If I am right, what is the purpose of each? Is map slower than each?
Edit:
I know that there is a common practice to use each when you are not interested in the return value. I am not interested in whether such practice exists, but am interested in whether such practice makes sense other than from the point of view of convention.
The difference between map and each is more important than whether one returns a new array and the other doesn't. The important difference is in how they communicate your intent.
When you use each, your code says "I'm doing something for each element." When you use map, your code says "I'm creating a new array by transforming each element."
So while you could use map in place of each, performance notwithstanding, the code would now be lying about its intent to anyone reading it.
The choice between map or each should be decided by the desired end result: a new array or no new array. The result of map can be huge and/or silly:
p ("aaaa".."zzzz").map{|word| puts word} #huge and useless array of nil's
I agree with what you said. Enumerable#each simply returns the original object it was called on, while Enumerable#map collects the return value of the block for each element and returns a new array of those results.
Since Enumerable#each simply returns the original object itself, it is the better choice over map when you only need to iterate over the elements.
In fact, Enumerable#each is a simple and universal way of doing a traditional iterating for loop, and each is much preferred over for loops in Ruby.
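The return-value difference discussed in these answers can be checked directly. This is just an illustrative sketch with made-up sample data:

```ruby
numbers = [1, 2, 3]

each_result = numbers.each { |n| n * 2 }  # block results are discarded
map_result  = numbers.map  { |n| n * 2 }  # block results are collected

same_object = each_result.equal?(numbers) # each hands back the receiver itself
# map_result is a new array, [2, 4, 6]; numbers is still [1, 2, 3]
```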
You can see the significant difference between map and each when you're composing these enumerators.
For example, suppose you need a new array of pairs that include each element's index:
array.each.with_index.map { |element, index| [index, element] }
Or for example you just need to apply some method to all elements in array and print result without changing the original array:
m = 2.method(:+)
[1,2,3].each { |a| puts m.call(a) } #=> prints 3, 4, 5
There are plenty of other examples where the difference between each and map matters when writing code in a functional style.

How does "(1..4).inject(&:+)" work in Ruby

I find this code in Ruby to be pretty intriguing
(1..4).inject(&:+)
Ok, I know what inject does, and I know this code is basically equivalent to
(1..4).inject(0) {|a,n| a + n}
but how exactly does it work?
Why is &:+ the same as writing the block {|a,n| a + n}?
Why doesn't it need an initial value? I'm OK with the initial value being 0, but (1..4).inject(&:*) also works, and there the initial value must be 1...
From Ruby documentation:
If you specify a symbol instead, then each element in the collection will be passed to the named method of memo
So, specifying a symbol is equivalent to passing the following block:
{|memo, a| memo.send(sym, a)}
If you do not explicitly specify an initial value for memo, then the first element of the collection is used as the initial value of memo.
So, there is no magic, Ruby simply takes the first element as the initial value and starts injecting from the second element. You can check it by writing [].inject(:+): it returns nil as opposed to [].inject(0, :+) which returns 0.
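The behaviour described above can be verified with a few calls. This sketch only uses the range and operators from the question:

```ruby
sum     = (1..4).inject(:+)   # memo starts at 1, then 1+2, 3+3, 6+4
product = (1..4).inject(:*)   # memo starts at 1, then 1*2, 2*3, 6*4

# The symbol form behaves like this explicit block:
manual  = (1..4).inject { |memo, n| memo.send(:+, n) }

empty_without_seed = [].inject(:+)     # no first element to use as memo
empty_with_seed    = [].inject(0, :+)  # the explicit seed is returned
```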
Edit: I didn't notice the ampersand. You don't need it; inject will work with a plain symbol. But if you do write it, the symbol is converted to a block (via Symbol#to_proc), which is useful with other methods that only accept a block.
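As a rough illustration of what the ampersand does: it calls to_proc on the symbol, and the resulting proc sends that method to its first argument, passing along any remaining arguments.

```ruby
add = :+.to_proc
three = add.call(1, 2)              # same as 1.+(2)

with_ampersand = (1..4).inject(&:+) # the proc is passed as the block
shouted = %w[a b].map(&:upcase)     # handy where only a block is accepted
```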

ruby hash autovivification (facets)

Here is a clever trick to enable hash autovivification in ruby (taken from facets):
# File lib/core/facets/hash/autonew.rb, line 19
def self.autonew(*args)
  leet = lambda { |hsh, key| hsh[key] = new( &leet ) }
  new(*args,&leet)
end
Although it works (of course), I find it really frustrating that I can't figure out how this two liner does what it does.
leet is passed in as the default. After that, just accessing h['new_key'] somehow invokes it and creates 'new_key' => {}.
Now, I'd expect h['new_key'] to return the default value object rather than evaluate it. That is, I'd expect 'new_key' => {} not to be created automatically. So how does leet actually get called? Especially with two parameters?
The standard new method for Hash accepts a block. This block is called in the event of trying to access a key in the Hash which does not exist. The block is passed the Hash itself and the key that was requested (the two parameters) and should return the value that should be returned for the requested key.
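A small sketch contrasting the two forms of Hash.new may help here. The variable names and the upcase logic are made up for illustration. A default object is simply returned on a miss, while a default block is called with the hash and the key and can store a value.

```ruby
# Default object: returned for missing keys, nothing is stored.
plain = Hash.new("fallback")
plain_value  = plain[:k]
plain_stored = plain.key?(:k)

# Default block: called with (hash, key) on a miss; here it also stores the result.
lookups = []
cache = Hash.new do |hash, key|
  lookups << key
  hash[key] = key.to_s.upcase
end
first  = cache[:a]  # miss: the block runs and stores "A"
second = cache[:a]  # hit: the stored value is returned, the block is not called
```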
You will notice that the leet lambda does 2 things. It returns a new Hash with leet itself as the block for handling defaults. This is the behaviour which allows autonew to work for Hashes of arbitrary depth. It also assigns this new Hash to hsh[key] so that next time you request the same key you will get the existing Hash rather than a new one being created.
It's also worth noting that this code can be made into a one-liner as follows:
def self.autonew(*args)
  new(*args){|hsh, key| hsh[key] = Hash.new(&hsh.default_proc) }
end
The call to Hash#default_proc returns the proc that was used to create the parent, so we have a nice recursive setup here.
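To try the recursive default_proc idea without reopening Hash, here is a standalone sketch. The method name autovivifying_hash is an assumption for illustration, not part of facets.

```ruby
# Hypothetical standalone version of the one-liner: each missing key gets a
# new hash that inherits the same default block from its parent.
def autovivifying_hash
  Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }
end

tree = autovivifying_hash
tree[:a][:b][:c] = 1        # intermediate hashes are created on access

untouched = tree.key?(:z)   # key? does not trigger the default block
tree[:z]                    # plain access does, so :z now exists
created = tree.key?(:z)
```

Note the side effect in the last lines: merely reading a missing key creates it, which is the trade-off of this technique.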
I talk about a similar case to this on my blog.
Alternatively, you might consider my xkeys gem. It's a module that you can use to extend arrays or hashes to facilitate nested access.
If you look for something that doesn't exist yet, you get a nil value (or another value or an exception if you prefer) without creating anything by looking. It can also append to the end of arrays.
You can opt to autovivify either hashes or arrays for integer keys (but just once for the entire structure).
