Why does Ruby allow me to push an array on itself? [duplicate] - ruby

This question already has answers here:
What are recursive arrays good for?
(2 answers)
Closed 8 years ago.
This code is valid in Ruby
a = [5,10,15]
[5,10,15]
a.push a
[5,10,15,[...]]
Resulting in the fourth array slot pointing to the array itself, (seemingly) infinitely. Why does Ruby allow this and does the functionality offer any practical applications?

Since in Ruby everything is an object, variables just point to the objects (more strictly speaking, memory locations). An array is a collection of such a pointers, which means it can store a pointer to itself. It is not an extra feature added in Ruby, it would be actually an extra feature not to allow it.
As for application, check out "What are recursive arrays good for?" (directed graph representation).
Note however, that such an array is not infinite:
a = []
a << a
a.length = 1

Since Ruby is a dynamic language, an array is, in essence, a collection of "any object" so you can push anything you want into it, including other arrays, including (in this case) a reference to itself. It's like an ArrayList<Object> in Java, which can do the same thing (you can add it to itself, but why?)
It might be sometimes useful to have recursive structures, though nothing comes to mind.

Related

Mutable data types that use stack allocation

Based on my earlier question, I understand the benefit of using stack allocation. Suppose I have an array of arrays. For example, A is a list of matrices and each element A[i] is a 1x3 matrix. The length of A and the dimension of A[i] are known at run time (given by the user). Each A[i] is a matrix of Float64 and this is also known at run time. However, through out the program, I will be modifying the values of A[i] element by element. What data structure can also allow me to use stack allocation? I tried StaticArrays but it doesn't allow me to modify a static array.
StaticArrays defines MArray (MVector, MMatrix) types that are fixed-size and mutable. If you use these there's a higher chance of the compiler determining that they can be stack-allocated, but it's not guaranteed. Moreover, since the pattern you're using is that you're passing the mutable state vector into a function which presumably modifies it, it's not going to be valid or helpful to stack allocate that anyway. If you're going to allocate state once and modify it throughout the program, it doesn't really matter if it is heap or stack allocated—stack allocation is only a big win for objects that are allocated, used locally and then don't escape the local scope, so they can be “freed” simply by popping the stack.
From the code snippet you showed in the linked question, the state vector is allocated in the outer function, test_for_loop, which shouldn't be a big deal since it's done once at the beginning of execution. Using a variably sized state vector to index into an array with a splat (...) might be an issue, however, and that's done in test_function. Using something with fixed size like MVector might be better for that. It might, however, be better still, to use a state tuple and return a new rather than mutated state tuple at the end. The compiler is very good at turning that kind of thing into very efficient code because of immutability.
Note that by convention test_function should be called test_function! since it modifies its M argument and even more so if it modifies the state vector.
I would also note that this isn't a great question/answer pair since it's not standalone at all and really just a continuation of your other question. StackOverflow isn't very good for this kind of iterative question/discussion interaction, I'm afraid.

Is there a way to get the Ruby runtime to combine frozen identical objects into a single instance?

I have data in memory, especially strings, that have large numbers of duplicates. We're hitting the ceiling with memory sometimes and are trying to reduce our footprint. I thought that if I froze the strings, then the Ruby runtime would combine them into single objects in memory. So I thought that this code would return a lower number, ideally, 1, but it did not:
a = Array.new(1000) { 'foo'.dup.freeze } # create separate objects, but freeze them
sleep 5 # give the runtime some time to combine the objects
a.map(&:object_id).uniq.size # => 1000
I guess this makes sense, because if there was a reference to the duplicated object (e.g. object id #202), and all of the frozen strings are combined to use #200, then dereferencing #202 will fail. So maybe my question doesn't make sense.
I guess the best strategy for me to save memory might be to convert the strings to symbols. I am aware that they will never be garbage collected, there would be a small enough number of them that this would not be a problem. Is there a better way?
You basically have the right idea, but in my opinion you found a big gotcha in Ruby. You are correct that Ruby can dedup frozen strings to save memory but in general frozen ≠ deduped!!!
tl;dr the reason is because the two operations have different semantics. Always use String#-# if you want it deduped.
Recall that freeze is a method of Object, so it has to work with every class. In English, freeze is "make it so no further changes can be made to this object and also return the same object so that I can keep calling methods on it". In particular, it would be odd if x.freeze != x. Imagine if I had two arrays that I was modifying, then decided to freeze them. Would it make sense for the interpreter to then iterate through both arrays to see if their contents are equal and to decide to completely throw away one of them? That could be very expensive. So in general freeze does not promise this behavior and always returns the same object, just frozen.
Deduping works very differently because when you call -myStr you're actually saying "return the unique frozen version of this string in memory". In most cases the whole point is to get a different object than the one in myStr (so that the GC can clean up that string and only keep the frozen one).
Unfortunately, the distinction is muddled since if you call freeze on a string literal, Ruby will dedup it automatically! This is sensible because there's no way to get a reference to the original literal object; the fact that the interpreter is allowing x.freeze != x doesn't matter, so we might as well save some memory. But it might also give the impression that freeze does guarantee deduping, when in fact it does not.
This gotcha was discussed when string deduping was first introduced, so it is definitely an intentional design decision by the Ruby developers.

Why must we call to_a on an enumerator object?

The chaining of each_slice and to_a confuses me. I know that each_slice is a member of Enumerable and therefore can be called on enumerable objects like arrays, and chars does return an array of characters.
I also know that each_slice will slice the array in groups of n elements, which is 2 in the below example. And if a block is not given to each_slice, then it returns an Enumerator object.
'186A08'.chars.each_slice(2).to_a
But why must we call to_a on the enumerator object if each_slice has already grouped the array by n elements? Why doesn't ruby just evaluate what the enumerator object is (which is a collection of n elements)?
The purpose of enumerators is lazy evaluation. When you call each_slice, you get back an enumerator object. This object does not calculate the entire grouped array up front. Instead, it calculates each “slice” as it is needed. This helps save on memory, and also allows you quite a bit of flexibility in your code.
This stack overflow post has a lot of information in it that you’ll find useful:
What is the purpose of the Enumerator class in Ruby
To give you a cut and dry answer to your question “Why must I call to_a when...”, the answer is, it hasn’t. It hasn’t yet looped through the array at all. So far it’s just defined an object that says that when it goes though the array, you’re going to want elements two at a time. You then have the freedom to either force it to do the calculation on all elements in the enumerable (by calling to_a), or you could alternatively use next or each to go through and then stop partway through (maybe calculate only half of them as opposed to calculating all of them and throwing the second half away).
It’s similar to how the Range class does not build up the list of elements in the range. (1..100000) doesn’t make an array of 100000 numbers, but instead defines an object with a min and max and certain operations can be performed on that. For example (1..100000).cover?(5) doesn’t build a massive array to see if that number is in there, but instead just sees if 5 is greater than or equal to 1 and less than or equal to 100000.
The purpose of this all is performance and flexibility.
It may be worth considering whether your implementation actually needs to make an array up front, or whether you can actually keep your RAM consumption down a bit by iterating over the enumerator. (If your real world scenario is as simple as you described, an enumerator won’t help much, but if the array actually is large, an enumerator could help you a lot).

Is map just a more powerful each? [duplicate]

This question already has answers here:
Can't all or most cases of `each` be replaced with `map`?
(4 answers)
What is the difference between map, each, and collect? [duplicate]
(2 answers)
Closed 8 years ago.
If you want a method that collects an array without modifying it, you can use map, and you'll have something that works the same as each. For example, you could do this:
array.each do |x|
x += 10
print "#{x}"
end
but you could just as easily do this:
array.map{|x| print (x + 10).to_s}
and it would have the exact same result. While each can only do that, map can alter its function using the !, so I don't see why I would use each anymore. Could you explain why I should ever use each instead of map if map seems more versatile?
No. Use each for side-effects; use map for a (side-effect free) transformation.
While they both iterate the enumerable (at some point1), map collects the transformed results which should be used. To say map is a more powerful each is like saying a method that returns an unused value is more powerful than a method does not return a value - it's not of being more powerful, it's about using the correct tool.
Thus, while map can "do" what each does (by evaluation of supplied block), it does more and is useful for a different task: when the transformation, and not the side-effect, is desired. It is often considered poor practice to perform side-effects in a map (excluding, perhaps, the mutation of the mapped objects).
1Furthermore, map and each are not strictly interchangeable. In lazy vs. eager situations, a transformation like map can be lazy while each is only useful for side-effects and is never lazy. (It is not possible for each to be lazy because there is no resulting sequence to "observe" and force the evaluation later.)

Fastest data structure with default values for undefined indexes?

I'm trying to create a 2d array where, when I access an index, will return the value. However, if an undefined index is accessed, it calls a callback and fills the index with that value, and then returns the value.
The array will have negative indexes, too, but I can overcome that by using 4 arrays (one for each quadrant around 0,0).
You can create a Matrix class that relies on tuples and dictionary, with the following behavior :
from collections import namedtuple
2DMatrixEntry = namedtuple("2DMatrixEntry", "x", "y", "value")
matrix = new dict()
defaultValue = 0
# add entry at 0;1
matrix[2DMatrixEntry(0,1)] = 10.0
# get value at 0;1
key = 2DMatrixEntry(0,1)
value = {defaultValue,matrix[key]}[key in matrix]
Cheers
This question is probably too broad for stackoverflow. - There is not a generic "one size fits all" solution for this, and the results depend a lot on the language used (and standard library).
There are several problems in this question. First of all let us consider a 2d array, we say this is simply already part of the language and that such an array grows dynamically on access. If this isn't the case, the question becomes really language dependent.
Now often when allocating memory the language automatically initializes the spots (again language dependent on how this happens and what the best method is, look into RAII). Though I can foresee that actual calculation of the specific cell might be costly (compared to allocation). In that case an interesting thing might be so called "two-phase construction". The array has to be filled with tuples/objects. The default construction of an object sets a bit/boolean to false - indicating that the value is not ready. Then on acces (ie a get() method or a operator() - language dependent) if this bit is false it constructs, else it just reads.
Another method is to use a dictionary/key-value map. Where the key would be the coordinates and the value the value. This has the advantage that the problem of construct-on-access is inherit to the datastructure (though again language dependent). The drawback of using maps however is that lookup speed of a value changes from O(1) to O(logn). (The actual time is widely different depending on the language though).
At last I hope you understand that how to do this depends on more specific requirements, the language you used and other libraries. In the end there is only a single data structure that is in each language: a long sequence of unallocated values. Anything more advanced than that depends on the language.

Resources