Is map just a more powerful each? [duplicate] - ruby

This question already has answers here:
Can't all or most cases of `each` be replaced with `map`?
(4 answers)
What is the difference between map, each, and collect? [duplicate]
(2 answers)
Closed 8 years ago.
If you want a method that collects an array without modifying it, you can use map, and you'll have something that works the same as each. For example, you could do this:
array.each do |x|
x += 10
print "#{x}"
end
but you could just as easily do this:
array.map{|x| print (x + 10).to_s}
and it would have the exact same result. While each can only do that, map can alter its function using the !, so I don't see why I would use each anymore. Could you explain why I should ever use each instead of map if map seems more versatile?

No. Use each for side-effects; use map for a (side-effect free) transformation.
While they both iterate the enumerable (at some point [1]), map collects the transformed results, which should then be used. To say map is a more powerful each is like saying a method that returns an unused value is more powerful than a method that does not return a value: it's not a matter of being more powerful, it's about using the correct tool.
Thus, while map can "do" what each does (by evaluation of supplied block), it does more and is useful for a different task: when the transformation, and not the side-effect, is desired. It is often considered poor practice to perform side-effects in a map (excluding, perhaps, the mutation of the mapped objects).
[1] Furthermore, map and each are not strictly interchangeable. In lazy vs. eager situations, a transformation like map can be lazy while each is only useful for side-effects and is never lazy. (It is not possible for each to be lazy because there is no resulting sequence to "observe" and force the evaluation later.)
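As a rough illustration of the distinction above (a sketch only; the variable names are made up for this example): each hands back the receiver, map builds a new array from the block's results, and a lazy map does nothing until it is forced.

```ruby
array = [1, 2, 3]

# each runs the block for its side effects and returns the receiver itself.
printed = []
each_result = array.each { |x| printed << x + 10 }

# map collects the block's return values into a new array.
map_result = array.map { |x| x + 10 }

# A lazy map records the transformation but does not run the block yet.
evaluated = []
lazy = array.lazy.map { |x| evaluated << x; x + 10 }
before_force = evaluated.dup   # still empty: nothing has been evaluated

# Asking for elements forces evaluation of what is needed.
first_two = lazy.first(2)
```

Here each_result is the original array, so anything the block computed is lost unless the block itself stores it somewhere (as printed does).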

Related

SystemStackError when array destructuring with splat operator

I have an application that gathers a (large-ish) amount of data into an array and appends it to an existing array. When I use the splat operator (with Array#push), I get a SystemStackError: stack level too deep message. 'Large' is in the range of 150k entries (each entry contains additional objects).
What is the preferred method to merge large arrays in Ruby?
gathered_info = function_that_returns_a_large_array_of_hashes()
dump.push(*gathered_info)
If you want to add a bunch of things to an array then the splat will need to expand those as individual arguments, each of which takes stack space. That's bad for large lists for reasons you've discovered.
You can always just use concat on the array directly:
dump.concat(gathered_info)
That's far less cumbersome.
You normally use a splat because there's no alternative that takes an array instead, but that's not the case here. concat does exactly what you need.
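A small sketch of the concat approach, using a generated array as a stand-in for function_that_returns_a_large_array_of_hashes (the hash contents here are invented for illustration):

```ruby
dump = [{ id: 0 }]

# Stand-in for the large result set from the question (about 150k hashes).
gathered_info = Array.new(150_000) { |i| { id: i + 1 } }

# concat receives the array as a single argument, so no element
# has to be expanded into a separate method argument.
dump.concat(gathered_info)

# For small arrays the splat form produces the same kind of result.
small = [1, 2]
small.push(*[3, 4])
```

Note that concat mutates dump in place, just as push does, so it is a drop-in replacement here.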

Why must we call to_a on an enumerator object?

The chaining of each_slice and to_a confuses me. I know that each_slice is a member of Enumerable and therefore can be called on enumerable objects like arrays, and chars does return an array of characters.
I also know that each_slice will slice the array in groups of n elements, which is 2 in the below example. And if a block is not given to each_slice, then it returns an Enumerator object.
'186A08'.chars.each_slice(2).to_a
But why must we call to_a on the enumerator object if each_slice has already grouped the array by n elements? Why doesn't ruby just evaluate what the enumerator object is (which is a collection of n elements)?
The purpose of enumerators is lazy evaluation. When you call each_slice, you get back an enumerator object. This object does not calculate the entire grouped array up front. Instead, it calculates each “slice” as it is needed. This helps save on memory, and also allows you quite a bit of flexibility in your code.
This Stack Overflow post has a lot of information in it that you’ll find useful:
What is the purpose of the Enumerator class in Ruby
To give you a cut-and-dried answer to your question “Why must I call to_a when...”: each_slice hasn’t grouped anything yet. It hasn’t looped through the array at all. So far it has just defined an object that says that when it goes through the array, you’re going to want elements two at a time. You then have the freedom to either force it to do the calculation on all elements in the enumerable (by calling to_a), or you could alternatively use next or each to go through and then stop partway through (maybe calculate only half of them as opposed to calculating all of them and throwing the second half away).
It’s similar to how the Range class does not build up the list of elements in the range. (1..100000) doesn’t make an array of 100000 numbers, but instead defines an object with a min and max and certain operations can be performed on that. For example (1..100000).cover?(5) doesn’t build a massive array to see if that number is in there, but instead just sees if 5 is greater than or equal to 1 and less than or equal to 100000.
The purpose of this all is performance and flexibility.
It may be worth considering whether your implementation actually needs to make an array up front, or whether you can actually keep your RAM consumption down a bit by iterating over the enumerator. (If your real world scenario is as simple as you described, an enumerator won’t help much, but if the array actually is large, an enumerator could help you a lot).
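To make this concrete, here is a short sketch using the string from the question; the variable names are only for illustration:

```ruby
# No block is given, so each_slice returns an Enumerator instead of slices.
enum = '186A08'.chars.each_slice(2)
enum_class = enum.class

# Pull a single slice on demand.
first_pair = enum.next

# Force every slice into an array.
all_pairs = enum.to_a

# Range#cover? only compares against the endpoints; no array is built.
in_range = (1..100_000).cover?(5)
```

Calling to_a always walks the whole collection from the start, regardless of how far next has advanced.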

Python dictionary or map in elisp

What is the equivalent of a python dictionary like {'a':1, 'b':2} in elisp?
And again, does elisp have any map-reduce api?
Besides association lists (whose algorithmic complexity is OK for small tables but not for large ones), there are hash tables, which you can construct with make-hash-table and puthash, or, if you prefer immediate values, write as #s(hash-table data (a 1 b 2)).
Association lists are the most commonly used associative containers in elisp. An association list is just a list of key-value cons cells like this: ((key . value)). You can use the assoc function to find the cell for a given key (its cdr is the value) and rassoc to find the cell with a given value (its car is the key).
Elisp comes with the built-in function mapcar, which does map, but AFAIK there is no good fold facility. You could emulate it using any of the looping facilities provided. However, the better solution is to use cl-lib and slip into Common Lisp land. In particular, it supplies cl-mapcar and cl-reduce.

Why does Ruby allow me to push an array on itself? [duplicate]

This question already has answers here:
What are recursive arrays good for?
(2 answers)
Closed 8 years ago.
This code is valid in Ruby
a = [5,10,15]
# => [5, 10, 15]
a.push a
# => [5, 10, 15, [...]]
Resulting in the fourth array slot pointing to the array itself, (seemingly) infinitely. Why does Ruby allow this and does the functionality offer any practical applications?
Since everything in Ruby is an object, variables just point to objects (more strictly speaking, to memory locations). An array is a collection of such pointers, which means it can store a pointer to itself. This is not an extra feature added to Ruby; it would actually take an extra feature to disallow it.
As for application, check out "What are recursive arrays good for?" (directed graph representation).
Note however, that such an array is not infinite:
a = []
a << a
a.length # => 1
Since Ruby is a dynamic language, an array is, in essence, a collection of "any object" so you can push anything you want into it, including other arrays, including (in this case) a reference to itself. It's like an ArrayList<Object> in Java, which can do the same thing (you can add it to itself, but why?)
It might be sometimes useful to have recursive structures, though nothing comes to mind.
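A quick sketch of what the self-reference looks like in practice, reusing the array from the question (the extra variable names are only for illustration):

```ruby
a = [5, 10, 15]
a.push(a)

# The array holds four references; the last one points back at a itself.
length = a.length
same_object = a[3].equal?(a)

# Following the self-reference any number of times lands on the same array.
deep = a[3][3][3][0]

# inspect detects the cycle and prints [...] instead of recursing forever.
shown = a.inspect
```

No copying happens at any point, which is why the structure is finite in memory even though it can be traversed indefinitely.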

Fastest data structure with default values for undefined indexes?

I'm trying to create a 2d array that returns the stored value when I access an index. However, if an undefined index is accessed, it should call a callback, fill the index with the callback's result, and then return that value.
The array will have negative indexes, too, but I can overcome that by using 4 arrays (one for each quadrant around 0,0).
You can create a Matrix class that relies on tuples and a dictionary, with the following behavior:
from collections import namedtuple
# Key type: the (x, y) coordinates of a cell
Matrix2DEntry = namedtuple("Matrix2DEntry", ["x", "y"])
matrix = {}
default_value = 0
# add entry at (0, 1)
matrix[Matrix2DEntry(0, 1)] = 10.0
# get value at (0, 1), falling back to the default if it is missing
key = Matrix2DEntry(0, 1)
value = matrix.get(key, default_value)
Cheers
This question is probably too broad for Stack Overflow. There is no generic "one size fits all" solution for this, and the results depend a lot on the language used (and its standard library).
There are several problems in this question. First of all let us consider a 2d array, we say this is simply already part of the language and that such an array grows dynamically on access. If this isn't the case, the question becomes really language dependent.
Now often when allocating memory the language automatically initializes the spots (again, how this happens and what the best method is depends on the language; look into RAII). Though I can foresee that the actual calculation of a specific cell might be costly (compared to allocation). In that case an interesting option might be so-called "two-phase construction". The array has to be filled with tuples/objects. The default construction of an object sets a bit/boolean to false, indicating that the value is not ready. Then on access (i.e. a get() method or an operator(), depending on the language), if this bit is false it constructs the value, else it just reads it.
Another method is to use a dictionary/key-value map, where the key is the coordinates and the value is the cell's value. This has the advantage that construct-on-access is inherent to the data structure (though again language dependent). The drawback of using maps, however, is that with a tree-based map the lookup cost changes from O(1) to O(log n); a hash-based map keeps lookups at O(1) on average but with a larger constant than array indexing. (The actual time differs widely depending on the language, though.)
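As one language-specific instance of the dictionary approach, here is a sketch in Ruby, where a Hash can be given a block that runs on a missing key. The compute lambda and its formula are hypothetical stand-ins for the question's callback:

```ruby
calls = 0

# Hypothetical callback: derives a cell's value from its coordinates.
compute = ->(x, y) { calls += 1; x * 10 + y }

# The block runs only when a key is missing; it stores the computed
# value under that key and returns it.
grid = Hash.new { |hash, key| hash[key] = compute.call(*key) }

first = grid[[-2, 3]]    # missing: the callback runs and the result is stored
second = grid[[-2, 3]]   # present: read straight from the hash
```

Because the keys are coordinate pairs rather than array offsets, negative indexes need no special handling, so the four-quadrant workaround is unnecessary with this layout.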
Finally, I hope you understand that how to do this depends on more specific requirements, the language you use, and other libraries. In the end there is only a single data structure that exists in every language: a long sequence of unallocated values. Anything more advanced than that depends on the language.
