SystemStackError when array destructuring with splat operator - ruby

I have an application that gathers a (large-ish) amount of data into an array and then appends it to an existing array. When I use the splat operator (to use with Array#push), I get a SystemStackError: stack level too deep message. 'Large' here is in the range of 150k entries (each entry contains additional objects).
What is the preferred method to merge large arrays in Ruby?
gathered_info = function_that_returns_a_large_array_of_hashes()
dump.push(*gathered_info)

If you want to add a bunch of things to an array, the splat has to expand them into individual arguments, each of which takes stack space. That's bad for large lists, for the reasons you've discovered.
You can always just use concat on the array directly:
dump.concat(gathered_info)
That's far less cumbersome.
You normally use a splat because there's no alternative that takes an array instead, but that's not the case here. concat does exactly what you need.
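A quick sketch to make that concrete (the 150k hashes below just stand in for whatever your function actually returns):
dump = []
gathered_info = Array.new(150_000) { |i| { id: i } }  # stand-in for the real data

# dump.push(*gathered_info)  # the splat expands 150k separate arguments onto the stack
dump.concat(gathered_info)   # appends the elements in place, no argument expansion

dump.size  # => 150000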

Related

Why must we call to_a on an enumerator object?

The chaining of each_slice and to_a confuses me. I know that each_slice is a member of Enumerable and therefore can be called on enumerable objects like arrays, and chars does return an array of characters.
I also know that each_slice will slice the array in groups of n elements, which is 2 in the below example. And if a block is not given to each_slice, then it returns an Enumerator object.
'186A08'.chars.each_slice(2).to_a
But why must we call to_a on the enumerator object if each_slice has already grouped the array into groups of n elements? Why doesn't Ruby just evaluate what the enumerator object is (which is a collection of n-element groups)?
The purpose of enumerators is lazy evaluation. When you call each_slice, you get back an enumerator object. This object does not calculate the entire grouped array up front. Instead, it calculates each “slice” as it is needed. This helps save on memory, and also allows you quite a bit of flexibility in your code.
This stack overflow post has a lot of information in it that you’ll find useful:
What is the purpose of the Enumerator class in Ruby
To give you a cut-and-dried answer to your question "Why must I call to_a when...", the answer is: it hasn't grouped anything yet. It hasn't looped through the array at all. So far it has just defined an object that says that when it does go through the array, you're going to want elements two at a time. You then have the freedom either to force it to do the calculation on all elements in the enumerable (by calling to_a), or to use next or each to walk through it and stop partway (perhaps calculating only half of the slices instead of calculating all of them and throwing the second half away).
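A quick sketch of that on-demand behavior, using your own example:
pairs = '186A08'.chars.each_slice(2)  # => #<Enumerator: ...> -- nothing grouped yet

pairs.next  # => ["1", "8"]  (only the first slice is computed)
pairs.next  # => ["6", "A"]

# Forcing the whole thing materializes every slice at once:
'186A08'.chars.each_slice(2).to_a  # => [["1", "8"], ["6", "A"], ["0", "8"]]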
It’s similar to how the Range class does not build up the list of elements in the range. (1..100000) doesn’t make an array of 100000 numbers, but instead defines an object with a min and max and certain operations can be performed on that. For example (1..100000).cover?(5) doesn’t build a massive array to see if that number is in there, but instead just sees if 5 is greater than or equal to 1 and less than or equal to 100000.
The purpose of this all is performance and flexibility.
It may be worth considering whether your implementation actually needs to build an array up front, or whether you can keep your RAM consumption down a bit by iterating over the enumerator. (If your real-world scenario is as simple as you described, an enumerator won't help much, but if the array really is large, an enumerator could help you a lot.)

Stream<double[]> vs DoubleStream

I have to convert a double value array into a stream.
What is the difference between the following two approaches? Which one is better?
double[] dArray = {1.2, 2.3, 3.4, 4.5};
Stream<double[]> usingStream = Stream.of(dArray); //approach 1
DoubleStream usingArrays = Arrays.stream(dArray); //approach 2
Obviously, Stream.of(dArray) gives you a Stream<double[]> whose single element is the input array, which is probably not what you want. You could use that approach if your input was a Double[] instead of a primitive array, since then you would have gotten a Stream<Double> of the elements of the array.
Therefore Arrays.stream(dArray) is the way to go when you need to transform an array of doubles to a stream of doubles.
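A quick sketch of the difference, reusing the dArray from the question (and assuming the usual java.util.Arrays / java.util.stream imports):
Stream<double[]> wholeArray = Stream.of(dArray);
System.out.println(wholeArray.count());   // 1 -- a single element: the array itself

DoubleStream elements = Arrays.stream(dArray);
System.out.println(elements.sum());       // ~11.4 -- the four double values, summed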
Besides the fact that they are different?
DoubleStream can be thought of as a Stream<Double> (but holding primitives), while Stream<double[]> is a Stream of arrays.
Stream.of and Arrays.stream are entirely different things for different purposes and hence should not be compared.
Stream.of, when passed a single-dimensional array as in your example, will yield a stream with a single element: the array itself. In the majority of cases that is not what you want.
Arrays.stream, as the name suggests, operates on arrays, whereas Stream.of is more general.
It would have been better and more entertaining had you asked what’s the difference between DoubleStream.of(dArray) and Arrays.stream(dArray).
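(For what it's worth, a rough sketch of that comparison: for a whole array the two calls are equivalent, and Arrays.stream additionally has an overload for a sub-range.)
DoubleStream a = DoubleStream.of(dArray);      // varargs overload: the four values
DoubleStream b = Arrays.stream(dArray);        // the same four values
DoubleStream c = Arrays.stream(dArray, 1, 3);  // just the sub-range 2.3, 3.4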

Enumerator::Lazy and Garbage Collection

I am using Ruby's built in CSV parser against large files.
My approach is to separate the parsing from the rest of the logic. To achieve this I am creating an array of hashes. I also want to take advantage of Ruby's Enumerator::Lazy to prevent loading the entire file into memory.
My question is: when I'm actually iterating through the array of hashes, does the garbage collector clean things up as I go, or will it only clean up once the entire array can be discarded, essentially keeping everything in memory anyway?
I'm not asking if it will clean each element as I finish with it, only if it will clean it before the entire enum is actually evaluated.
When you iterate over a plain old array, the garbage collector has no chance to do anything.
You can help the garbage collector by writing nil into the array position after you no longer need the element, so that the object in this position may now be free for collection.
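For instance, with a plain array of hashes (rows and process here are stand-ins for your own data and per-row logic):
rows.each_with_index do |row, i|
  process(row)   # whatever work you do per hash
  rows[i] = nil  # drop the reference so this hash becomes collectable
end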
When you use a lazy enumerator correctly, you are not iterating over an array of hashes. Instead, you enumerate over the hashes one after the other, and each one is read on demand.
So you have the chance to use much less memory (depending on your further processing, and provided it does not hold on to the objects anyway).
The structure may look like this (a sketch using CSV.foreach; csv_path and do_something stand in for your own file path and processing):
require 'csv'

enum = Enumerator.new do |yielder|
  CSV.foreach(csv_path, headers: true) do |row|
    yielder.yield row.to_h   # hand each row over as a hash, one at a time
  end
end

enum.lazy.map { |hash| do_something(hash); nil }.count
You also need to make sure that you are not generating the array again in the last step of the chain (which is why the chain above ends with count rather than to_a).

Iterate through Matlab axis array without extra variables?

The "for a=SomeArray" code template works well for iterating through arrays (e.g. number, characters, cells). It doesn't work for an array of axes e.g.,
faxes = get(gcf, 'Children')
class(faxes)
for a = faxes
    class(a)
    size(a)
end
You basically need to explicitly index into faxes using a counter. If that were not the case, you'd be able to avoid faxes and the indexing variable, leading to much simpler code. Is there some coding detail that I'm missing that prevents this?
I've posted this to:
Usenet
Stack Overflow
According to Usenet, the array faxes needs to be a row vector. Since get(gcf,'Children') is a column vector, it needs to be transposed:
faxes = get(gcf, 'Children')
class(faxes)
for a = faxes'
    class(a)
    size(a)
end

Is map just a more powerful each? [duplicate]

This question already has answers here:
Can't all or most cases of `each` be replaced with `map`?
(4 answers)
What is the difference between map, each, and collect? [duplicate]
(2 answers)
Closed 8 years ago.
If you want a method that collects an array without modifying it, you can use map, and you'll have something that works the same as each. For example, you could do this:
array.each do |x|
  x += 10
  print "#{x}"
end
but you could just as easily do this:
array.map{|x| print (x + 10).to_s}
and it would have the exact same result. While each can only do that, map can also modify the array in place using map!, so I don't see why I would use each anymore. Could you explain why I should ever use each instead of map, if map seems more versatile?
No. Use each for side-effects; use map for a (side-effect free) transformation.
While they both iterate the enumerable (at some point¹), map collects the transformed results, which are meant to be used. To say map is a more powerful each is like saying a method that returns an unused value is more powerful than a method that does not return a value: it's not a matter of being more powerful, it's about using the correct tool.
Thus, while map can "do" what each does (by evaluating the supplied block), it does more and is useful for a different task: when the transformation, and not the side-effect, is desired. It is often considered poor practice to perform side-effects in a map (excluding, perhaps, the mutation of the mapped objects).
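A small illustration of the difference in return values:
numbers = [1, 2, 3]

numbers.each { |x| x + 10 }   # => [1, 2, 3]     (block results discarded, receiver returned)
numbers.map  { |x| x + 10 }   # => [11, 12, 13]  (new array of transformed values)

numbers.map! { |x| x + 10 }   # mutates the receiver in place
numbers                       # => [11, 12, 13]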
¹ Furthermore, map and each are not strictly interchangeable. In lazy vs. eager situations, a transformation like map can be lazy, while each is only useful for side-effects and is never lazy. (It is not possible for each to be lazy because there is no resulting sequence to "observe" and force the evaluation of later.)
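For example, map can take part in a lazy chain over an infinite source, where an each over the same source would never return:
(1..Float::INFINITY).lazy.map { |x| x * 2 }.first(3)  # => [2, 4, 6]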
