Learning Ruby: Making infinite dimensional array via block abuse - ruby

Somebody tell me what's going on here:
a = [0,1,2]
a.each {|x| a[x] = a}
The result is [[...], [...], [...]]. And if I evaluate a[0] I get [[...], [...], [...]]. And if I evaluate a[0][0] I get [[...], [...], [...]] ad infinitum.
Have I created an array of infinite dimensionality? How/Why should this possibly work?

Basically you've modified every element in a to reference the list itself. The list is recursively referencing itself:
a[0] # => a
a[0][0] # => a[0], which is a
a[0][0][0] # => a[0][0], which is a[0], which is a
...
(# => is a Rubyism for "this line evaluates to")
Depending on how you look at it it is not infinite. It's more or less just like a piece of paper with the words "please turn over" written on both sides.
The reason that Ruby prints [...] is that it is clever enough to discover that the list is recursive, and avoids going into an infinite loop.
By the way, your usage of each is a bit non-idiomatic. each returns the list, and you usually don't assign this return value to a variable (since you already have a variable referencing it, a in this case). In other words, your code assigns [0,1,2] to a, then loops over a (setting each element to a), then assigns a to a.

I think it's a self-referential data structure. The a[x]=a puts a's pointer in a[x].

Related

Ruby map gives weird results

Consider the following code
a="123456789"
t=[[1,4],[3,4],[4,5],[1,2]]
p t.map{|x,y|
a[x],a[y]=a[y],a[x]
#p a
a
}
I know ruby map method collects the last expression of the given block but when using the above code to swap the chars in a using the indexes in t won't succeeds.My intention was to collect the state of a after each swap in the index of t.But map always gives the array of a which is in the last state ie)["135264789", "135264789", "135264789", "135264789"].
The results shows that the map method have collected the final result of a after completing each indexes in t.But when printing the a after each swap prints correct value of a at each state.
Is this the correct behavior or am i missing something?
This is because the String#[]= method mutates the string.
Quick fix would be something like this:
a="123456789"
t=[[1,4],[3,4],[4,5],[1,2]]
p t.map{|x,y]
b = "#{a}" # IMPORTANT - this builds a new string
b[x],b[y]=b[y],b[x] # this mutates the new string
#p b
b
}
An alternative to "#{a}" would be to say a.clone, it does the same thing in this case.
The reason this works, is because instead of directly modifying a with a[x],a[y]=a[y],a[x], you're making a temporary copy of a and modifying that instead
edit - I misread the question - if you want to show the result of chaining each operation on the previous result, use dup/clone after the modification as Stefan said in his answer
Is my understanding correct?
Yes, I believe it is. I second what Max says, and I'll also elaborate a bit in case it helps.
Each b is a newly created object because it gets created inside the block, so it gets recreated with every new iteration. The a is created outside the block, so the same object (a) keeps getting referenced inside the block for each iteration.
You can better understand how this works by experimenting with #object_id. Try running this code:
a="123456789"
t=[[1,4],[3,4],[4,5],[1,2]]
p t.map { |x,y|
b = "#{a}" # IMPORTANT - this builds a new string
b[x],b[y]=b[y],b[x]
p "a.object_id = #{a.object_id}"
p "b.object_id = #{b.object_id}"
b
}
You will notice that a is the same object for each iteration of the #map method, while b is a new one.
This is an example of the concept of a closure. A closure is some sort of enclosed code structure that retains access to whatever state is available in the context in which it was created, while that context doesn't have access to its, the enclosed code's, state. Sort of like a "one way mirror": the enclosed code can see outside, but the outside can't see into the enclosed code.
In Ruby, closures are implemented as blocks: blocks are closures. So, everything that is visible to whatever context a block is created in (in this case, main) is also visible to that block, although the reverse is not true — for example, you can't reference b from outside the block. (Methods are not closures: if your block were a method, it wouldn't be able to see a unless you passed it in as an argument to your method.)
So, as Max says, when you make changes to a inside your block, you are actually changing (mutating) the same a that you defined up top each time.
Side topic
Now, if you are referencing individual characters in strings it's important to understand that the underlying structure of strings differs from that of arrays. Also, arrays behave differently when you mutate their elements from strings when you mutate their characters.
I'm mentioning this because I have this vague feeling that you are thinking of string character references as pretty much analogous to array element references. This is pretty much only true with respect to syntax.
You may find the results of running this code interesting:
a = '123456789'
p a.object_id
p a[0].object_id
p a[1].object_id
a[0] = '7'
p a.object_id
p a[0].object_id
p a[1].object_id
puts
a = '123456789'.chars
p a.object_id
p a[0].object_id
p a[1].object_id
a[0] = '7'
p a.object_id
p a[0].object_id
p a[1].object_id
In particular, a comparison of the four outputs of a[1].object_id should be instructive, because it shows where strings and arrays differ. If you reassign an element in an array, that element and only that element gets a new object id. If you reassign a character in a string, the string object itself remains the same, but every character in the string gets recreated.
Since you are returning a from map, the result will contain a four times. Those a's all refer to the same object.
You probably want to return a copy of a to preserve its current state:
a = '123456789'
t = [[1, 4], [3, 4], [4, 5], [1, 2]]
r = t.map { |x, y|
a[x], a[y] = a[y], a[x]
a.dup
}
r #=> ["153426789", "153246789", "153264789", "135264789"]

How does Ruby Array #count handle multiple block arguments

When I execute the following:
[[1,1], [2,2], [3,4]].count {|a,b| a != b} # => 1
the block arguments a, b are assigned to the first and the second values of each inner array respectively. I don't understand how this is accomplished.
The only example given in the documentation for Array#count and Enumerable#count with a block uses a single block argument:
ary.count {|x| x % 2 == 0} # => 3
Just like assignments, there's a (not-so-) secret shortcut. If the right-hand-side is an array and the left-hand-side has multiple variables, the array is splatted, so the following two lines are identical:
a, b, c = [1, 2, 3]
a, b, c = *[1, 2, 3]
While not the same thing, blocks have something in the same vein, when the yielded value is an array, and there are multiple parameters. Thus, these two blocks will act the same when you yield [1, 2, 3]:
do |a, b, c|
...
end
do |(a, b, c)|
...
end
So, in your case, the value gets deconstructed, as if you wrote this:
[[1,1], [2,2], [3,4]].count {|(a,b)| a != b} # => 1
If you had another value that you are passing along with the array, you would have to specify the structure explicitly, as the deconstruction of the array would not be automatic in the way we want:
[[1,1], [2,2], [3,4]].each.with_index.count {|e,i| i + 1 == e[1] }
# automatic deconstruction of [[1,1],0]:
# e=[1,1]; i=0
[[1,1], [2,2], [3,4]].each.with_index.count {|(a,b),i| i + 1 == b }
# automatic deconstruction of [[1,1],0], explicit deconstruction of [1,1]:
# a=1; b=1; i=0
[[1,1], [2,2], [3,4]].each.with_index.count {|a,b,i| i + 1 == b }
# automatic deconstruction of [[1,1],0]
# a=[1,1]; b=0; i=nil
# NOT what we want
I have looked at the documentation for Array.count and Enumerable.count and the only example given with a block uses a single block argument ...
Ruby, like almost all mainstream programming languages, does not allow user code to change the fundamental semantics of the language. In other words, you won't find anything about block formal parameter binding semantics in the documentation of Array#count, because block formal parameter binding semantics are specified by the Ruby Language Specification and Array#count cannot possibly change that.
What I don't understand is how this is accomplished.
This has nothing to do with Array#count. This is just standard block formal parameter binding semantics for block formal parameters.
Formal parameter binding semantics for block formal parameters are different from formal parameter binding semantics for method formal parameters. In particular, they are much more flexible in how they handle mismatches between the number of formal parameters and actual arguments.
If there is exactly one block formal parameter and you yield more than one block actual argument, the block formal parameter gets bound to an Array containing the block actual arguments.
If there are more than one block formal parameters and you yield exactly one block actual argument, and that one actual argument is an Array, then the block formal parameters get bound to the individual elements of the Array. (This is what you are seeing in your example.)
If you yield more block actual arguments than the block has formal parameters, the extra actual arguments get ignored.
If you pass fewer actual arguments than the block has formal parameters, then those extra formal parameters are defined but not bound, and evaluate to nil (just like defined but unitialized local variables).
If you look closely, you can see that the formal parameter binding semantics for block formal parameters are much closer to assignment semantics, i.e. you can imagine an assignment with the block formal parameters on the left-hand side of the assignment operator and the block actual arguments on the right-hand side.
If you have a block defined like this:
{|a, b, c|}
and are yielding to it like this:
yield 1, 2, 3, 4
you can almost imagine the block formal parameter binding to work like this:
a, b, c = 1, 2, 3, 4
And if, as is the case in your question, you have a block defined like this:
{|a, b|}
and are yielding to it like this:
yield [1, 2]
you can almost imagine the block formal parameter binding to work like this:
a, b = [1, 2]
Which of course, as you well know, will have this result:
a #=> 1
b #=> 2
Fun fact: up to Ruby 1.8, block formal parameter binding was using actual assignment! You could, for example, define a constant, an instance variable, a class variable, a global variable, and even an attribute writer(!!!) as a formal parameter, and when you yielded to that block, Ruby would literally perform the assignment:
class Foo
def bar=(value)
puts "`#{__method__}` called with `#{value.inspect}`"
#bar = value
end
attr_reader :bar
end
def set_foo
yield 42
end
foo = Foo.new
set_foo {|foo.bar|}
# `bar=` called with `42`
foo.bar
#=> 42
Pretty crazy, huh?
The most widely-used application of these block formal parameter binding semantics is when using Hash#each (or any of the Enumerable methods with a Hash instance as the receiver). The Hash#each method yields a single two-element Array containing the key and the value as an actual argument to the block, but we almost always treat it as if it were yielding the key and value as separate actual arguments. Usually, we prefer writing
hsh.each do |k, v|
puts "The key is #{k} and the value is #{v}"
end
over
hsh.each do |key_value_pair|
k, v = key_value_pair
puts "The key is #{k} and the value is #{v}"
end
And that is exactly equivalent to what you are seeing in your question. I bet you have never asked yourself why you can pass a block with two block formal parameters to Hash#each even though it only yields a single Array? Well, this case is exactly the same. You are passing a block with two block formal parameters to a method that yields a single Array per iteration.

Why is `loop` not a keyword? What is the use case for `loop` without a block?

I was wondering why loop is a Kernel method rather than a keyword like while and until. There are cases where I want to do unconditional loop, but since loop, being a method, is slower than while true, I chose to do the latter when performance is important. But writing true here looks ugly, and is not Rubish; loop looks better. Here is a dilemma.
My guess is that it is because there is a usage of loop that does not take a block and returns an enumerator. But to me, it looks that an unconditional loop can easily be created on the spot, and does not make sense to create such an instance of Enumerator and later use it. I cannot think of a use case.
Is my guess regarding my wonder correct? If not, why is loop a method rather than a keyword?
What is the use case for an enumerator created by loop without a block?
Only Ruby's developers can answer your first question, but your guess seems reasonable. As to your second question, sure there are use cases. The whole point of Enumerables is that you can pass them around, which, as you know, you can't do with a while or for structure.
As a trivial example, here's a Fibonacci sequence method that takes an Enumerable as an argument:
def fib(enum)
a, b = nil, nil
enum.each do
a, b = b || 0, a ? a + b : 1
puts a
end
puts "DONE"
end
Now suppose you want to print out the first seven Fibonacci numbers. You can use any Enumerable that yields seven times, like the one returned by 7.times:
fib 7.times
# => 0
# 1
# 1
# 2
# 3
# 5
# 8
# DONE
But what if you want to print out Fibonacci numbers forever? Well, just pass it the Enumerable returned by loop:
fib loop
# => 0
# 1
# ...
# (Never stops)
Like I said, this is a silly example that clearly is a terrible way to generate Fibonacci numbers, but hopefully it helps you understand that there are times—albeit perhaps rarely—when it's useful to have an Enumerable that never ends, and why loop is a nice convenience for those cases.

Ruby: Range is empty, but slicing with it produces elements

I'm learning Ruby and have just gotten into some stuff about arrays and ranges. I've run into something about slices that, while it makes sense at first glance, confuses me a bit when I look deeper into it.
IRB says that (2..-1).to_a is an empty array, meaning no values in the range, right?
But if I use that same range in [:a, :b, :c, :d, :e][2..-1], I get back [:c, :d, :e] rather than an empty array.
Now, I'm aware that -1 represents the last element of the array, so it kind of makes sense that what got picked, did. But if the range itself would be empty, how is it selecting anything?
This is a fascinating question. The answer is that it's not the individual elements of the range that are inspected when slicing the array, but the first and last elements. Specifically:
>> (2..-1).to_a
=> []
>> (2..-1).first
=> 2
>> (2..-1).last
=> -1
Thus the example works, since it slices the array from the [2] element to the [-1] element.
If you want a consistent way to think about this, consider that (2..-1).to_a outputs the integers found between 2 and -1 (of which there are none), but that [2..-1] means from the 2 index to the -1 index.
(Source: array.c and range.c in the Ruby source.)
And, the complicated bonus part: to get the meaning you were thinking about, you could use
>> [:a, :b, :c, :d, :e].values_at *(2..-1).to_a
=> []
In the slicing operation it's not seeing the range as a Range per se, it's just looking at the values of the endpoints, in this case 2 and -1. Since -1 has a special meaning (i.e last item in the list) it just returns everything from item at index 2 to the end of the list. Don't think of it as a Range here, just think of it as a convenient way of passing in two numbers (expressing two end points).
The Array#[] method does not use the range as a range (that is, it does not call include? on it or anything like that); it just takes the two numbers out of it and checks whether it's exclusive.
You can imagine it to look somewhat like this (though the real [] is implemented in C, not in Ruby, and of course also handles arguments other than ranges):
def [](range)
start = range.begin
length = range.end - start
length -= 1 if range.exclude_end?
slice(start, length)
end

How can I perform an idiomatic non-recursive flatten in ruby?

I have a method that returns an array of arrays.
For convenience I use collect on a collection to gather them together.
arr = collection.collect {|item| item.get_array_of_arrays}
Now I would like to have a single array that contains all the arrays.
Of course I can loop over the array and use the + operator to do that.
newarr = []
arr.each {|item| newarr += item}
But this is kind of ugly, is there a better way?
There is a method for flattening an array in Ruby: Array#flatten:
newarr = arr.flatten(1)
From your description it actually looks like you don't care about arr anymore, so there is no need to keep the old value of arr around, we can just modify it:
arr.flatten!(1)
(There is a rule in Ruby that says that if you have two methods that do basically the same thing, but one does it in a somewhat surprising way, you name that method the same as the other method but with an exlamation point at the end. In this case, both methods flatten an array, but the version with the exclamation point does it by destroying the original array.)
However, while in this particular case there actually is a method which does exactly what you want, there is a more general principle at work in your code: you have a sequence of things and you iterate over it and try to "reduce" it down into a single thing. In this case, it is hard to see, because you start out with an array and you end up with an array. But by changing just a couple of small details in your code, it all of the sudden becomes blindingly obvious:
sum = 0
arr.each {|item| sum += item } # assume arr is an array of numbers
This is exactly the same pattern.
What you are trying to do is known as a catamorphism in category theory, a fold in mathematics, a reduce in functional programming, inject:into: in Smalltalk and is implemented by Enumerable#inject and its alias Enumerable#reduce (or in this case actually Array#inject and Array#reduce) in Ruby.
It is very easy to spot: whenever you initialize an accumulator variable outside of a loop and then assign to it or modify the object it references during every iteration of the loop, then you have a case for reduce.
In this particular case, your accumulator is newarr and the operation is adding an array to it.
So, your loop could be more idiomatically rewritten like this:
newarr = arr.reduce(:+)
An experienced Rubyist would of course see this right away. However, even a newbie would eventually get there, by following some simple refactoring steps, probably similar to this:
First, you realize that it actually is a fold:
newarr = arr.reduce([]) {|acc, el| acc += el }
Next, you realize that assigning to acc is completely unnecessary, because reduce overwrites the contents of acc anyway with the result value of each iteration:
newarr = arr.reduce([]) {|acc, el| acc + el }
Thirdly, there is no need to inject an empty array as the starting value for the first iteration, since all the elements of arr are already arrays anyway:
newarr = arr.reduce {|acc, el| acc + el }
This can, of course, be further simplified by using Symbol#to_proc:
newarr = arr.reduce(&:+)
And actually, we don't need Symbol#to_proc here, because reduce and inject already accept a symbol parameter for the operation:
newarr = arr.reduce(:+)
This really is a general pattern. If you remember the sum example above, it would look like this:
sum = arr.reduce(:+)
There is no change in the code, except for the variable name.
arr.inject([]) { |main, item| main += item }
I don't seem to understand the question fully... Is Array#flatten what you are looking for?
[[:a,:b], [1,2,3], 'foobar'].flatten
# => [:a, :b, 1, 2, 3, 'foobar']

Resources