Cleaner way of mapping a hash in ruby

Let's assume I need to do a trivial task on every element of a Hash, e.g. increment its value by 1, or change value into an array containing that value. I've been doing it like this
hash.map{ |k, v| [k, v+1] }.to_h
v+1 is just an example, it can be anything.
Is there any cleaner way to do this? I don't really like mapping a hash to an array of 2-sized arrays, then remembering to convert it to hash again.
Example of what might be nicer:
hash.hash_map{ |v| v+1 }
This way something like string conversion (to_s) might be simplified to
hash.hash_map(&:to_s)
Duplication clarification:
I'm not looking for Hash[...] or .to_h, I'm asking if anyone knows a more compact and cleaner solution.

That's just the way Ruby's collection framework works. There is one map method in Enumerable which doesn't know anything about hashes or arrays or lists or sets or trees or streams or whatever else you may come up with. All it knows is that there is a method named each which will yield one single element per iteration. That's it.
Note that this is the same way the collections frameworks of Java and .NET work, too. All collections operations always return the same type: in .NET, that's IEnumerable, in Ruby, that's Array.
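To make this concrete, here is a minimal sketch of a custom collection. The NumberBag class is made up for illustration and is not part of Ruby. It implements only each, mixes in Enumerable, and gets map, sort and the rest for free, but the results come back as an Array rather than a NumberBag.

```ruby
# Hypothetical collection, used only for illustration.
class NumberBag
  include Enumerable

  def initialize(*numbers)
    @numbers = numbers
  end

  # The only method Enumerable relies on: yield each element in turn.
  def each
    return enum_for(:each) unless block_given?
    @numbers.each { |n| yield n }
    self
  end
end

bag = NumberBag.new(3, 1, 2)
doubled = bag.map { |n| n * 2 }
p doubled        # [6, 2, 4]
p doubled.class  # Array, not NumberBag
p bag.sort       # [1, 2, 3]
```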
Another design approach is that collections operations are type-preserving, i.e. mapping a set will produce a set, etc. That's the way it is done in Smalltalk, for example. However, in Smalltalk it is achieved by copy&pasting almost identical methods into each and every different collection. I.e. if you want to implement your own collection, in Ruby, you only have to implement each, and you get everything else for free, whereas in Smalltalk, you have to implement every single collection method separately. (In Ruby, that would be over 40 methods.)
Scala is the first language that managed to provide a collections framework with type-preserving operations without code duplication, but it took until Scala 2.8 (released in 2010) to figure that out. (The key is the idea of collection builders.) Ruby's collections library was designed in 1993, 17 years before we had figured out how to do type-preserving collections operations without code duplication. Plus, Scala depends heavily on its sophisticated static type system and type-level metaprogramming to find the correct collection builder at compile time. This is not necessary for the scheme to work, but having to look up the builder for every operation at runtime may incur a hefty runtime cost.
What you could do is add new methods that are not part of the standard Enumerable protocol, for example similar to Scala's mapValues and mapKeys.

AFAIK, this does not exist in Hash out of the box, but here is a simple monkeypatch to achieve what you want:
▶ class Hash
▷ def hash_map &cb
▷ keys.zip(values.map(&cb)).to_h
▷ end
▷ end
There are more readable ways to achieve the requested functionality, but this one calls the built-in map on the values only once, so it should be close to the fastest implementation I can think of.
▶ h = {a: 1, b: 2}
#⇒ { :a => 1, :b => 2 }
▶ h.hash_map do |v| v + 5 end
#⇒ { :a => 6, :b => 7 }
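For readers on newer Rubies, a monkeypatch like the one above may not be needed: Hash#transform_values (Ruby 2.4 and later) behaves much like the hypothetical hash_map from the question, and Hash#transform_keys (Ruby 2.5 and later) does the same for keys. A short sketch, assuming Ruby 2.5 or newer:

```ruby
hash = { a: 1, b: 2 }

# Map over the values only; the keys are left untouched.
incremented = hash.transform_values { |v| v + 1 }
p incremented  # => { a: 2, b: 3 }

# Symbol-to-proc works as well, as the question hoped.
p hash.transform_values(&:to_s)  # => { a: "1", b: "2" }

# The key-side counterpart.
p hash.transform_keys(&:to_s)    # => { "a" => 1, "b" => 2 }

# The original hash is not modified.
p hash  # => { a: 1, b: 2 }
```

Both methods return a new Hash, so there is no intermediate array of pairs to convert back.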

Related

Procedural and Data abstraction in ruby

I'm new to Ruby and I'm learning the abstraction principle. As I understand it, procedural abstraction is hiding the implementation details from the user, or simply concentrating on the essentials and ignoring the details.
My concern is how to implement it
1) Is it a simple function calling just like this
# function to sort array
# #params array[Array] to be sort
def my_sort(array)
return array if array.size <= 1
swapped = true
while swapped
swapped = false
0.upto(array.size-2) do |i|
if array[i] > array[i+1]
array[i], array[i+1] = array[i+1], array[i]
swapped = true
end
end
end
array
end
and calling like this
sorted_array = my_sort([12,34,123,43,90,1])
2) How does Data Abstraction differ from Encapsulation
As I understood Data Abstraction is just hiding some member data from other classes.
Data abstraction is fundamental to most object oriented language - wherein the classes are designed to encapsulate data and provide methods to control how that data is modified (if at all), or helper methods to derive meaning of that data.
Ruby's Array class is an example of Data Abstraction. It provides a mechanism to manage an array of Objects, and provides operations that can be performed on that array, without you having to care how internally it is organized.
arr = [1,3,4,5,2,10]
p arr.class # Prints Array
p arr.sort # Prints [1,2,3,4,5,10]
Procedural abstraction is about hiding the implementation details of a procedure from the user. In the above example, you don't really need to know what sorting algorithm the sort method uses internally; you just use it, assuming that the nice folks in the Ruby Core team picked the best one for you.
At the same time, Ruby may not always know how to compare two items in the Array. For example, the code below does not run because Ruby does not know how to compare strings and numbers.
[1,3,4,5,"a","c","b", 2,10].sort
#=> `sort': comparison of Fixnum with String failed (ArgumentError)
Ruby allows us to hook into the implementation and help with the comparison, even though the underlying sorting algorithm remains the same (as it is abstracted from the user).
[1,3,4,5,"a","c","b", 2,10].sort { |i,j|
if i.class == String and j.class == String
i <=> j
elsif i.class == Fixnum and j.class == Fixnum
i <=> j
else
0
end
}
#=> [1, 3, 4, 5, 2, 10, "a", "b", "c"]
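A variation on the same idea, as a sketch rather than the answer's own method: instead of a comparison block, sort_by can be given a key that never requires comparing a number with a string. It uses Integer rather than Fixnum because Ruby 2.4 folded Fixnum into Integer; the two-element key is an assumption made for this example.

```ruby
mixed = [1, 3, 4, 5, "a", "c", "b", 2, 10]

# Sort key: [group, value]. Integers get group 0 and strings group 1.
# Keys from different groups are decided by the group alone, so an
# Integer is never compared with a String.
sorted = mixed.sort_by { |item| [item.is_a?(String) ? 1 : 0, item] }

p sorted  # [1, 2, 3, 4, 5, 10, "a", "b", "c"]
```

The sorting algorithm is still whatever Ruby uses internally; only the notion of order is supplied from outside.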
When writing code for your own problems, you can use procedural abstraction to break a procedure's problem down into sub-problems and solve each sub-problem in a separate procedure. This allows certain aspects to be extended later (in the case above, the comparison could be extended; thanks to Ruby blocks, this was much easier). The template method pattern is a good technique for achieving this.
You are returning an array from the method. Data structures are implementation details. If you change the data structure used in the method, you will break the client code. So your example does not hide the implementation details. It does not encapsulate the design decisions so that the clients are insulated from the internal implementation details.
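As a rough sketch of the template method pattern mentioned here (the Report and SalesReport classes are invented for illustration): the base class fixes the order of the steps, and a subclass fills in only the step that varies.

```ruby
# Hypothetical classes, for illustration only.
class Report
  # Template method: defines the overall procedure in terms of sub-steps.
  def render
    [header, body, footer].join("\n")
  end

  private

  def header
    "== Report =="
  end

  # The step that subclasses are expected to supply.
  def body
    raise NotImplementedError, "#{self.class} must implement body"
  end

  def footer
    "== End =="
  end
end

class SalesReport < Report
  private

  def body
    "Sales were up."
  end
end

puts SalesReport.new.render
```

Callers only ever see render; how each step is produced stays hidden and can be changed per subclass.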
Definition of 'Abstraction' : the quality of dealing with ideas rather than events.
Referring to the answer to "difference between abstraction and encapsulation?" and my own understanding, the method my_sort in your code fully qualifies as encapsulation, since it encapsulates the behavior related to sorting any one-dimensional array. However, it lacks abstraction, because my_sort knows the type of data it is going to process.
It would have qualified as abstraction if it had not known or cared about the type of data that comes in via its parameters. In other words, it should sort any objects that come in, no matter whether it is a list of Fixnum or String or other sortable datatypes.
Encapsulation:
We normally use access modifiers (public, private, ...) to separate the data and behavior that are exposed to clients from those that are used internally. The public interface (exposed to clients) should change as little as possible. The private parts, however, are behaviors that can change, and changing them should never affect the behavior that clients rely upon.
We also move sensitive data and behavior into private or protected scope to prevent accidental modification or misuse. This keeps clients from relying on the portion of the code that might change frequently.
So one always needs to keep the core logic in private scope.
Abstraction:
Example:
In case of church there is an abstraction between the confessor and the father / priest. The confessor should not have any idea about the name or any detail of the priest and vice-versa. Anyone can confess and yet hide his/her identity no matter how big mistakes/crimes he/she had committed.

Use case for using array as ruby hash key

Ruby allows using an array as a hash key as shown below:
hash1 = {1 => "one", [2] => 'two', [3,4] => ['three', 'four']}
I am not clear on what common use case for this would be. If people can share some real-world scenarios where this is useful, I would appreciate it.
A great example why you'd want to store arrays as hash keys is for memoizing.
This is an example of how an array as a hash key is useful:
def initialize(*args)
@memoizer ||= {}
return @memoizer[args] if @memoizer[args]
# do what you will with the args in this initializer,
# then create a new instance for the future.
@memoizer[args] = some_calculation(args)
end
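A self-contained sketch of the same memoizing idea, with names invented for illustration (DistanceTable, slow_distance): the argument list arrives as an Array and is used directly as the cache key, which works because arrays with equal contents hash equally.

```ruby
# Hypothetical class; slow_distance stands in for any expensive calculation.
class DistanceTable
  attr_reader :calls

  def initialize
    @cache = {}
    @calls = 0
  end

  def distance(*points)
    # points is an Array such as [[0, 0], [3, 4]] and serves as the hash key.
    @cache.fetch(points) { @cache[points] = slow_distance(*points) }
  end

  private

  def slow_distance(from, to)
    @calls += 1
    Math.hypot(to[0] - from[0], to[1] - from[1])
  end
end

table = DistanceTable.new
p table.distance([0, 0], [3, 4])  # 5.0
p table.distance([0, 0], [3, 4])  # 5.0 again, served from the cache
p table.calls                     # 1
```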
I think you're overcomplicating things here. It isn't that Arrays are allowed as keys, it is that almost any object can be a key. From the fine manual:
A Hash is a dictionary-like collection of unique keys and their values. Also called associative arrays, they are similar to Arrays, but where an Array uses integers as its index, a Hash allows you to use any object type.
[...]
A user-defined class may be used as a hash key if the hash and eql? methods are overridden to provide meaningful behavior.
Note that both hash and eql? are in Object so almost everything you'll come across will have them and so can be a key in a Hash. The default implementations may not be terribly meaningful for some arbitrary object but they'll still be there.
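As a small sketch of what the manual describes (the Point class is made up for this example), overriding hash and eql? lets two distinct but equal objects find the same entry:

```ruby
# Hypothetical value class, for illustration only.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Hash uses eql? to decide whether two keys are the same key.
  def eql?(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
  alias_method :==, :eql?

  # Equal points must produce equal hash codes.
  def hash
    [x, y].hash
  end
end

labels = { Point.new(1, 2) => "start" }
p labels[Point.new(1, 2)]  # "start"
p labels[Point.new(9, 9)]  # nil
```

Without the two overrides, the default identity-based implementations would treat every Point.new(1, 2) as a different key.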
Sometimes generality is easier than artificially limiting your options to only those that the language designer can see a use for. Not even Java is that strict.
I guess what I'm trying to say is that I think you're asking the wrong question. The question you should be asking is:
Why should you be forbidden from using an Array as a Hash key?
This is Ruby where (almost) everything is allowed by default so the answer to that question is that we don't want to artificially limit your options, here's a big pile of possibilities, go do something wonderful and unexpected with it.
If the key has some structure, you may want to directly use that:
{
%w[John Travolta] => :foo,
%w[Olivia Newton John] => :bar,
}
Initial state of Othello/Reversi board
Hash.new(:green).merge({
[4, :d] => :white,
[4, :e] => :black,
[5, :d] => :black,
[5, :e] => :white,
})
From your example it seems like it could be used to more efficiently store large and/or sparse matrices. As you can see, if 3 and 4 both share the same values, they can be "compacted" into a single reference. There may be more formal data structures that would use this, but it's been a while since I used "formal" data structures, so I can't think of any off hand.

Why return an enumerator?

I'm curious about why ruby returns an Enumerator instead of an Array for something that seems like Array is an obvious choice. For example:
'foo'.class
# => String
Most people think of a String as an array of chars.
'foo'.chars.class
# => Enumerator
So why does String#chars return an Enumerable instead of an Array? I'm assuming somebody put a lot of thought into this and decided that Enumerator is more appropriate but I don't understand why.
If you want an Array, call #to_a. The difference between Enumerable and Array is that one is lazy and the other eager. It's the good old memory (lazy) vs. cpu (eager) optimization. Apparently they chose lazy, also because
str = "foobar"
chrs = str.chars
chrs.to_a # => ["f", "o", "o", "b", "a", "r"]
str.sub!('r', 'z')
chrs.to_a # => ["f", "o", "o", "b", "a", "z"]
Abstraction - the fact that something may be an Array is an implementation detail you don't care about for many use cases. For those where you do, you can always call .to_a on the Enumerable to get one.
Efficiency - Enumerators are lazy, in that Ruby doesn't have to build the entire list of elements all at once, but can do so one at a time as needed. So only the number you need is actually computed. Of course, this leads to more overhead per item, so it's a trade-off.
Extensibility - the reason chars returns an Enumerable is because it is itself implemented as an enumerator; if you pass a block to it, that block will be executed once per character. That means there's no need for e.g. .chars.each do ... end; you can just do .chars do ... end. This makes it easy to construct operation chains on the characters of the string.
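The three points above can be sketched in a few lines. One caveat, stated as an assumption about the reader's environment: since Ruby 2.0, String#chars returns an Array, so this sketch uses String#each_char, which still returns an Enumerator when called without a block.

```ruby
str = "foobar"

chars = str.each_char   # no block given, so an Enumerator is returned
p chars.class           # Enumerator
p chars.first(2)        # ["f", "o"]; iteration stops after two characters
p chars.to_a.length     # 6; an Array is one call away when you need it

# With a block, the same method simply iterates.
str.each_char { |c| print c.upcase }  # prints FOOBAR
puts

# Enumerators chain with other iteration helpers.
p str.each_char.with_index.select { |_c, i| i.even? }.map(&:first)  # ["f", "o", "a"]
```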
This is completely in accordance with the spirit of 1.9: to return enumerators whenever possible. String#bytes, String#lines, String#codepoints, but also methods like Array#permutation all return an enumerator.
In ruby 1.8 String#to_a resulted in an array of lines, but the method is gone in 1.9.
'Most people think of a String as an array of chars' ... only if you think like C or other languages. IMHO, Ruby's object orientation is much more advanced than that. Most Array operations tend to be more Enumerable like, so it probably makes more sense that way.
An array is great for random access to different indexes, but strings are rarely accessed by a particular index. (And if you are trying to access a particular index, I suspect you are probably doing school work.)
If you are trying to inspect each character, Enumerable works. With Enumerable, you have access to map, each, inject, among others. Also, for substitution, there are string functions and regular expressions.
Frankly, I can't think of a real world need for an array of chars.
Maybe a string in ruby is mutable? Then having an Array isn't really an obvious choice - the length could change, for instance. But you will still want to enumerate the characters...
Also, you don't really want to be passing around the actual storage for the characters of a string, right? I mean, I don't remember much ruby (it's been a while), but if I were designing the interface, I'd only hand out "copies" for the .chars method/attribute/whatever. Now... Do you want to allocate a new array each time? Or just return a little object that knows how to enumerate the characters in the string? Thus, keeping the implementation hidden.
So, no. Most people don't think of a string as an array of chars. Most people think of a string as a string. With a behavior defined by the library/language/runtime. With an implementation you only need to know when you want to get nasty and all private with stuff below the abstraction belt.
Actually 'foo'.chars passes each character in str to the given block, or returns an enumerator if no block is given.
Check it :
irb(main):017:0> 'foo'.chars
=> #<Enumerable::Enumerator:0xc8ab35 @__args__=[], @__object__="foo", @__method__=:chars>
irb(main):018:0> 'foo'.chars.each {|p| puts p}
f
o
o
=> "foo"

Why were ruby loops designed that way?

As is stated in the title, I was curious to know why Ruby decided to go away from classical for loops and instead use the array.each do ...
I personally find it a little less readable, but that's just my personal opinion. No need to argue about that. On the other hand, I suppose they designed it that way on purpose, there should be a good reason behind.
So, what are the advantages of putting loops that way? What is the "raison d'etre" of this design decision?
This design decision is a perfect example of how Ruby combines the object oriented and functional programming paradigms. It is a very powerful feature that can produce simple readable code.
It helps to understand what is going on. When you run:
array.each do |el|
#some code
end
you are calling the each method of the array object, which, if you believe the variable name, is an instance of the Array class. You are passing in a block of code to this method (a block is equivalent to a function). The method can then evaluate this block and pass in arguments either by using block.call(args) or yield args. each simply iterates through the array and for each element it calls the block you passed in with that element as the argument.
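To illustrate the two calling styles mentioned above, here is a rough sketch of how an each-like method could be written in plain Ruby. The method names are invented for this example; the real Array#each is implemented inside the interpreter.

```ruby
# Variant 1: the block is implicit and invoked with yield.
def each_with_yield(array)
  i = 0
  while i < array.length
    yield array[i]          # hand the current element to the caller's block
    i += 1
  end
  array
end

# Variant 2: the block is captured as an object and invoked with call.
def each_with_call(array, &block)
  i = 0
  while i < array.length
    block.call(array[i])
    i += 1
  end
  array
end

each_with_yield([1, 2, 3]) { |el| puts el * 10 }  # prints 10, 20, 30
each_with_call([1, 2, 3])  { |el| puts el + 1 }   # prints 2, 3, 4
```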
If each was the only method to use blocks, this wouldn't be that useful, but many other methods use them too, and you can even create your own. Arrays, for example, have a few iterator methods including map, which does the same as each but returns a new array containing the return values of the block, and select, which returns a new array that only contains the elements of the old array for which the block returns a true value. These sorts of things would be tedious to do using traditional looping methods.
Here's an example of how you can create your own method with a block. Let's create an every method that acts a bit like map but only for every n items in the array.
class Array #extending the built in Array class
def every n, &block #&block causes the block that is passed in to be stored in the 'block' variable. If no block is passed in, block is set to nil
i = 0
arr = []
while i < self.length
arr << ( block.nil? ? self[i] : block.call(self[i]) )#use the plain value if no block is given
i += n
end
arr
end
end
This code would allow us to run the following:
[1,2,3,4,5,6,7,8].every(2) # => [1,3,5,7] (called without a block)
[1,2,3,4,5,6,7,8,9,10].every(3) {|el| el + 1 } # => [2,5,8,11] (called with a block)
Blocks allow for expressive syntax (often called internal DSLs), for example, the Sinatra web microframework.
Sinatra uses methods with blocks to succinctly define http interaction.
eg.
get '/account/:account' do |account|
# code to serve a page for this account
end
This sort of simplicity would be hard to achieve without Ruby's blocks.
I hope this has allowed you to see how powerful this language feature is.
I think it was mostly because Matz was interested in exploring what a fully object oriented scripting language would look like when he built it; this feature is based heavily on the CLU programming language's iterators.
It has turned out to provide some interesting benefits; a class that provides an each method can 'mix in' the Enumerable module to provide a huge variety of pre-made iteration routines to clients, which reduces the amount of tedious boiler-plate array/list/hash/etc iteration code that must be written. (Ever see java 4 and earlier iterators?)
I think you are kind of biased when you ask that question. Another might ask "why were C for loops designed that way?". Think about it - why would I need to introduce counter variable if I only want to iterate through array's elements? Say, compare these two (both in pseudocode):
for (i = 0; i < len(array); i++) {
elem = array[i];
println(elem);
}
and
for (elem in array) {
println(elem);
}
Why would the first feel more natural than the second, except for historical (almost sociological) reasons?
And Ruby, highly object-oriented as is, takes this even further, making it an array method:
array.each do |elem|
puts elem
end
By making that decision, Matz made the language lighter by one superfluous syntax construct (the foreach loop), delegating its job to ordinary methods and blocks (closures). I appreciate Ruby the most for this very reason: it is rational and economical with language features while retaining expressiveness.
I know, I know, we have for in Ruby, but most people consider it unnecessary.
The do ... end blocks (or { ... }) form a so-called block (almost a closure, IIRC). Think of a block as an anonymous method, that you can pass as argument to another method. Blocks are used a lot in Ruby, and thus this form of iteration is natural for it: the do ... end block is passed as an argument to the method each. Now you can write various variations to each, for example to iterate in reverse or whatnot.
There's also the syntactic sugar form:
for element in array
# Do stuff
end
Blocks are also used for example to filter an array:
array = (1..10).to_a
even = array.select do |element|
element % 2 == 0
end
# "even" now contains [2, 4, 6, 8, 10]
I think it's because it emphasizes the "everything is an object" philosophy behind Ruby: the each method is called on the object.
Then switching to another iterator is much smoother than changing the logic of, for example, a for loop.
Ruby was designed to be expressive, to read as if it was being spoken... Then I think it just evolved from there.
This comes from Smalltalk, which implements control structures as methods, thus reducing the number of keywords and simplifying the parser. This allows control structures to serve as a proof of concept for the language definition.
In ST, even if conditions are methods, in the fashion:
boolean.ifTrue ->{executeIfBody()}, :else=>-> {executeElseBody()}
In the end, if you ignore your cultural bias, what is easier for the machine to parse will also be easier for you to parse.

Ruby equivalent of C#'s 'yield' keyword, or, creating sequences without preallocating memory

In C#, you could do something like this:
public IEnumerable<int> GetItems()
{
for (int i=0; i<10000000; i++) {
yield return i;
}
}
This returns an enumerable sequence of 10 million integers without ever allocating a collection in memory of that length.
Is there a way of doing an equivalent thing in Ruby? The specific example I am trying to deal with is the flattening of a rectangular array into a sequence of values to be enumerated. The return value does not have to be an Array or Set, but rather some kind of sequence that can only be iterated/enumerated in order, not by index. Consequently, the entire sequence need not be allocated in memory concurrently. In .NET, this is IEnumerable and IEnumerable<T>.
Any clarification on the terminology used here in the Ruby world would be helpful, as I am more familiar with .NET terminology.
EDIT
Perhaps my original question wasn't really clear enough -- I think the fact that yield has very different meanings in C# and Ruby is the cause of confusion here.
I don't want a solution that requires my method to use a block. I want a solution that has an actual return value. A return value allows convenient processing of the sequence (filtering, projection, concatenation, zipping, etc).
Here's a simple example of how I might use get_items:
things = obj.get_items.select { |i| !i.thing.nil? }.map { |i| i.thing }
In C#, any method returning IEnumerable that uses a yield return causes the compiler to generate a finite state machine behind the scenes that caters for this behaviour. I suspect something similar could be achieved using Ruby's continuations, but I haven't seen an example and am not quite clear myself on how this would be done.
It does indeed seem possible that I might use Enumerable to achieve this. A simple solution would be to use an Array (which includes module Enumerable), but I do not want to create an intermediate collection with N items in memory when it's possible to just provide them lazily and avoid any memory spike at all.
If this still doesn't make sense, then consider the above code example. get_items returns an enumeration, upon which select is called. What is passed to select is an instance that knows how to provide the next item in the sequence whenever it is needed. Importantly, the whole collection of items hasn't been calculated yet. Only when select needs an item will it ask for it, and the latent code in get_items will kick into action and provide it. This laziness carries along the chain, such that select only draws the next item from the sequence when map asks for it. As such, a long chain of operations can be performed on one data item at a time. In fact, code structured in this way can even process an infinite sequence of values without any kinds of memory errors.
So, this kind of laziness is easily coded in C#, and I don't know how to do it in Ruby.
I hope that's clearer (I'll try to avoid writing questions at 3AM in future.)
It's supported by Enumerator since Ruby 1.9 (and back-ported to 1.8.7). See Generator: Ruby.
Cliche example:
fib = Enumerator.new do |y|
y.yield i = 0
y.yield j = 1
while true
k = i + j
y.yield k
i = j
j = k
end
end
100.times { puts fib.next() }
Your specific example is equivalent to 10000000.times, but let's assume for a moment that the times method didn't exist and you wanted to implement it yourself, it'd look like this:
class Integer
def my_times
return enum_for(:my_times) unless block_given?
i=0
while i<self
yield i
i += 1
end
end
end
10000.my_times # Returns an Enumerable which will let
# you iterate over the numbers from 0 to 10000 (exclusive)
Edit: To clarify my answer a bit:
In the above example my_times can be (and is) used without a block and it will return an Enumerable object, which will let you iterate over the numbers from 0 to n. So it is exactly equivalent to your example in C#.
This works using the enum_for method. The enum_for method takes as its argument the name of a method, which will yield some items. It then returns an instance of class Enumerator (which includes the module Enumerable), which when iterated over will execute the given method and give you the items which were yielded by the method. Note that if you only iterate over the first x items of the enumerable, the method will only execute until x items have been yielded (i.e. only as much as necessary of the method will be executed) and if you iterate over the enumerable twice, the method will be executed twice.
In 1.8.7+ it has become common to define methods that yield items so that, when called without a block, they return an Enumerator which lets the user iterate over those items lazily. This is done by adding the line return enum_for(:name_of_this_method) unless block_given? to the beginning of the method, like I did in my example.
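One thing to watch for when chaining, shown here as a sketch that assumes Ruby 2.0 or later: select and map called directly on an Enumerator are eager and build Arrays, so a chain like the one in the question needs .lazy to process one item at a time. The naturals method below is a made-up example that yields integers forever.

```ruby
# Hypothetical generator method: yields 0, 1, 2, ... without end.
def naturals
  return enum_for(:naturals) unless block_given?
  n = 0
  loop do
    yield n
    n += 1
  end
end

# .lazy makes select and map pull one element at a time through the chain;
# nothing runs until first asks for results.
squares_of_evens = naturals.lazy.select(&:even?).map { |n| n * n }

p squares_of_evens.first(3)  # [0, 4, 16]
```

Without .lazy, naturals.select would try to collect every even number into an Array and never return.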
Without having much ruby experience, what C# does in yield return is usually known as lazy evaluation or lazy execution: providing answers only as they are needed. It's not about allocating memory, it's about deferring computation until actually needed, expressed in a way similar to simple linear execution (rather than the underlying iterator-with-state-saving).
A quick google turned up a ruby library in beta. See if it's what you want.
C# ripped the 'yield' keyword right out of Ruby- see Implementing Iterators here for more.
As for your actual problem, you have presumably an array of arrays and you want to create a one-way iteration over the complete length of the list? Perhaps worth looking at array.flatten as a starting point - if the performance is alright then you probably don't need to go too much further.
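If flatten turns out to allocate too much, the enum_for pattern from the earlier answer can be applied to the rectangular array directly. This is a sketch only; each_cell is a name invented for this example.

```ruby
# Hypothetical helper: walks a rectangular array row by row and yields
# each value, without building a flattened copy.
def each_cell(grid)
  return enum_for(:each_cell, grid) unless block_given?
  grid.each do |row|
    row.each { |cell| yield cell }
  end
end

grid = [[1, 2, 3], [4, 5, 6]]
cells = each_cell(grid)

p cells.first(4)                           # [1, 2, 3, 4]
p cells.lazy.map { |c| c * 10 }.first(2)   # [10, 20]
p cells.select(&:even?)                    # [2, 4, 6]
```

The returned Enumerator includes Enumerable, so filtering, projection and zipping are available on it without the grid ever being copied into one long Array.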
