Ruby forgets local variables during a while loop? - ruby

I'm processing a record-based text file: so I'm looking for a starting string which constitutes the start of a record: there is no end-of-record marker, so I use the start of the next record to delimit the last record.
So I have built a simple program to do this, but I see something which suprises me: it looks like Ruby is forgetting local variables exist - or have I found a programming error ? [although I don't think I have : if I define the variable 'message' before my loop I don't see the error].
Here's a simplified example with example input data and error message in comments:
flag=false
# message=nil # this is will prevent the issue.
while line=gets do
if line =~/hello/ then
if flag==true then
puts "#{message}"
end
message=StringIO.new(line);
puts message
flag=true
else
message << line
end
end
# Input File example:
# hello this is a record
# this is also part of the same record
# hello this is a new record
# this is still record 2
# hello this is record 3 etc etc
#
# Error when running: [nb, first iteration is fine]
# <StringIO:0x2e845ac>
# hello
# test.rb:5: undefined local variable or method `message' for main:Object (NameError)
#

From the Ruby Programming Language:
(source: google.com)
Blocks and Variable Scope
Blocks define a new variable scope: variables created within a block exist only within that block and are undefined outside of the block. Be cautious, however; the local variables in a method are available to any blocks within that method. So if a block assigns a value to a variable that is already defined outside of the block, this does not create a new block-local variable but instead assigns a new value to the already-existing variable. Sometimes, this is exactly the behavior we want:
total = 0
data.each {|x| total += x } # Sum the elements of the data array
puts total # Print out that sum
Sometimes, however, we do not want to alter variables in the enclosing scope, but we do so inadvertently. This problem is a particular concern for block parameters in Ruby 1.8. In Ruby 1.8, if a block parameter shares the name of an existing variable, then invocations of the block simply assign a value to that existing variable rather than creating a new block-local variable. The following code, for example, is problematic because it uses the same identifier i as the block parameter for two nested blocks:
1.upto(10) do |i| # 10 rows
1.upto(10) do |i| # Each has 10 columns
print "#{i} " # Print column number
end
print " ==> Row #{i}\n" # Try to print row number, but get column number
end
Ruby 1.9 is different: block parameters are always local to their block, and invocations of the block never assign values to existing variables. If Ruby 1.9 is invoked with the -w flag, it will warn you if a block parameter has the same name as an existing variable. This helps you avoid writing code that runs differently in 1.8 and 1.9.
Ruby 1.9 is different in another important way, too. Block syntax has been extended to allow you to declare block-local variables that are guaranteed to be local, even if a variable by the same name already exists in the enclosing scope. To do this, follow the list of block parameters with a semicolon and a comma-separated list of block local variables. Here is an example:
x = y = 0 # local variables
1.upto(4) do |x;y| # x and y are local to block
# x and y "shadow" the outer variables
y = x + 1 # Use y as a scratch variable
puts y*y # Prints 4, 9, 16, 25
end
[x,y] # => [0,0]: block does not alter these
In this code, x is a block parameter: it gets a value when the block is invoked with yield. y is a block-local variable. It does not receive any value from a yield invocation, but it has the value nil until the block actually assigns some other value to it. The point of declaring these block-local variables is to guarantee that you will not inadvertently clobber the value of some existing variable. (This might happen if a block is cut-and-pasted from one method to another, for example.) If you invoke Ruby 1.9 with the -w option, it will warn you if a block-local variable shadows an existing variable.
Blocks can have more than one parameter and more than one local variable, of course. Here is a block with two parameters and three local variables:
hash.each {|key,value; i,j,k| ... }

Contrary to some of the other answers, while loops don't actually create a new scope. The problem you're seeing is more subtle.
Prerequisite knowledge: a brief scoping demo
To help show the contrast, blocks passed to a method call DO create a new scope, such that a newly assigned local variable inside the block disappears after the block exits:
### block example - provided for contrast only ###
[0].each {|e| blockvar = e }
p blockvar # NameError: undefined local variable or method
But while loops (like your case) are different, because a variable defined in the loop will persist:
arr = [0]
while arr.any?
whilevar = arr.shift
end
p whilevar # prints 0
Summary of the "problem"
The reason you get the error in your case is because the line that uses message:
puts "#{message}"
appears before any code that assigns message.
It's the same reason this code raises an error if a wasn't defined beforehand:
# Note the single (not double) equal sign.
# At first glance it looks like this should print '1',
# because the 'a' is assigned before (time-wise) the puts.
puts a if a = 1
Not Scoping, but Parsing-visibility
The so-called "problem" - i.e. local-variable visibility within a single scope - is due to ruby's parser. Since we're only considering a single scope, scoping rules have nothing to do with the problem. At the parsing stage, the parser decides at which source locations a local variable is visible, and this visibility does not change during execution.
When determining if a local variable is defined (i.e. defined? returns true) at any point in the code, the parser checks the current scope to see if any code has assigned it before, even if that code has never run (the parser can't know anything about what has or hasn't run at the parsing stage). "Before" meaning: on a line above, or on the same line and to the left-hand side.
An exercise to determine if a local is defined (i.e. visible)
Note that the following only applies to local variables, not methods. (Determining whether a method is defined in a scope is more complex, because it involves searching included modules and ancestor classes.)
A concrete way to see the local variable behavior is to open your file in a text editor. Suppose also that by repeatedly pressing the left-arrow key, you can move your cursor backward through the entire file. Now suppose you're wondering whether a certain usage of message will raise the NameError. To do this, position your cursor at the place you're using message, then keep pressing left-arrow until you either:
reach the beginning of the current scope (you must understand ruby's scoping rules in order to know when this happens)
reach code that assigns message
If you've reached an assignment before reaching the scope boundary, that means your usage of message won't raise NameError. If you don't reach any assignment, the usage will raise NameError.
Other considerations
In the case a variable assignment appears in the code but isn't run, the variable is initialized to nil:
# a is not defined before this
if false
# never executed, but makes the binding defined/visible to the else case
a = 1
else
p a # prints nil
end
While loop test case
Here's a small test case to demonstrate the oddness of the above behavior when it happens in a while loop. The affected variable here is dest_arr.
arr = [0,1]
while n = arr.shift
p( n: n, dest_arr_defined: (defined? dest_arr) )
if n == 0
dest_arr = [n]
else
dest_arr << n
p( dest_arr: dest_arr )
end
end
which outputs:
{:n=>0, :dest_arr_defined=>nil}
{:n=>1, :dest_arr_defined=>nil}
{:dest_arr=>[0, 1]}
The salient points:
The first iteration is intuitive, dest_arr is initialized to [0].
But we need to pay close attention in the second iteration (when n is 1):
At the beginning, dest_arr is undefined!
But when the code reaches the else case, dest_arr is visible again, because the interpreter sees that it was assigned 2 lines above.
Notice also, that dest_arr is only hidden at the start of the loop; its value is never lost (as evidenced by the final contents being [0, 1]).
This also explains why assigning your local before the while loop fixes the problem. The assignment doesn't need to be executed; it only needs to appear in the source code.
Lambda example
f1 = ->{ f2 }
f2 = ->{ f1 }
p f2.call()
# The following fails because the body of f1 tries to access f2 before an assignment for f2 was seen by the parser.
p f1.call() # undefined local variable or method `f2'.
Fix this by putting an f2 assignment before f1's body. Remember, the assignment doesn't actually need to be executed!
f2 = nil # Could be replaced by: if false; f2 = nil; end
f1 = ->{ f2 }
f2 = ->{ f1 }
p f2.call()
p f1.call() # ok
Method-masking gotcha
Things get really hairy if you have a local variable with the same name as a method:
def dest_arr
:whoops
end
arr = [0,1]
while n = arr.shift
p( n: n, dest_arr: dest_arr )
if n == 0
dest_arr = [n]
else
dest_arr << n
p( dest_arr: dest_arr )
end
end
Outputs:
{:n=>0, :dest_arr=>:whoops}
{:n=>1, :dest_arr=>:whoops}
{:dest_arr=>[0, 1]}
A local variable assignment in a scope will "mask"/"shadow" a method call of the same name. (You can still call the method by using explicit parentheses or an explicit receiver.) So this is similar to the previous while loop test, except that instead of becoming undefined above the assignment code, the dest_arr method becomes "unmasked"/"unshadowed" so that the method is callable w/o parentheses. But any code after the assignment will see the local variable.
Some best-practices we can derive from all this
Don't name local variables the same as method names in the same scope
Don't put the initial assignment of a local variable inside the body of a while or for loop, or anything that causes execution to jump around within a scope (calling lambdas or Continuation#call can do this too). Put the assignment before the loop.

I think this is because message is defined inside the loop. At the end of the loop iteration "message" goes out of scope. Defining "message" outside of the loop stops the variable from going out of scope at the end of each loop iteration. So I think you have the right answer.
You could output the value of message at the beginning of each loop iteration to test whether my suggestion is correct.

Why do you think this is a bug? The interpreter is telling you that message may be undefined when that particular piece of code executes.

I'm not sure why you're surprised: on line 5 (assuming that the message = nil line isn't there), you potentially use a variable that the interpreter has never heard of before. The interpreter says "What's message? It's not a method I know, it's not a variable I know, it's not a keyword..." and then you get an error message.
Here's a simpler example to show you what I mean:
while line = gets do
if line =~ /./ then
puts message # How could this work?
message = line
end
end
Which gives:
telemachus ~ $ ruby test.rb < huh
test.rb:3:in `<main>': undefined local variable or method `message' for main:Object (NameError)
Also, if you want to prepare the way for message, I would initialize it as message = '', so that it's a string (rather than nil). Otherwise, if your first line doesn't match hello, you end up tring to add line to nil - which will get you this error:
telemachus ~ $ ruby test.rb < huh
test.rb:4:in `<main>': undefined method `<<' for nil:NilClass (NoMethodError)

you can simply do this :
message=''
while line=gets do
if line =~/hello/ then
# begin a new record
p message unless message == ''
message = String.new(line)
else
message << line
end
end
# hello this is a record
# this is also part of the same record
# hello this is a new record
# this is still record 2
# hello this is record 3 etc etc

Related

Loop resets variable

I am encountering the following unexpected behavior while running the following loop:
outside_var = 'myString'
loop do
inside_var ||= outside_var
result = SomeCalculation.do_something(inside_var)
inside_var = result[:new_inside_var_value]
end
Now, on the first iteration inside_var gets set to outside_var, which is the expected behavior. Just before the next iteration I set inside_var to something else (depending on the result I got from the calculation inside the loop). This assignment works (printing inside_var at the very bottom of the loop confirms that). On the next iteration, however, inside var goes back to the original state, which is something I didn't anticipate. Why is it doing that and how can I set this variable inside this loop?
I am running Ruby 2.6.5 with Rails 6.
This is a scoping issue. inside_var is scoped to the block. One might check the binding, it changes.
outside_var = 'myString'
2.times do
puts "INSIDE 1: #{defined?(inside_var).nil?} → #{binding}"
inside_var ||= outside_var
puts "INSIDE 2: #{inside_var}"
end
#⇒ INSIDE 1: true → #<Binding:0x000055a3936ee0b0>
# INSIDE 2: myString
# INSIDE 1: true → #<Binding:0x000055a3936edc50>
# INSIDE 2: myString
That said, every time the execution enters the block, the binding is reset, that’s why one should not expect the variables from another scope (with another binding) to exist.
When you do a new iteration inside the loop, you are going to reset everything. I suggest you to modify the var outside the loop to preserve the value inside. Something like this:
result_var = 'myString' # value on the first iteration
loop do
result = SomeCalculation.do_something(result_var)
result_var = result[:new_inside_var_value] # at the end of the first iteration you are already overriding this value
end

Variable scope in Ruby

Consider the following two snippets of ruby code.
puts "One"
if false
d = 1
end
puts "Two"
puts d
puts "Three"
This prints the following
One
Two
Three
Now, consider the following
[].each do |i|
flag = false
end
puts "Two"
puts flag
puts "Three"
This gives the following
Two
'<main>': undefined local variable or method 'flag' for main:Object (NameError)
Why is it that in the first case a blank is printed and the 2nd case an error is thrown ?
Thanks
The difference is that the if block isn't actually a separate scope like it is in other languages such as Java. Variables declared within the if block have the same scope as the surrounding environment. Now, in your case, that if block won't actually be executed, so you'd normally expect d to be undefined (resulting in the same error you got in the second example). But ruby is a little "smrt" in that the interpreter will set up a variable with that label the moment it sees it, regardless whether it is actually executed, because it essentially doesn't know just yet whether that branch will indeed execute. This is explained in "The Ruby Programming Language" by David Flanagan and Yukihiro Matsumoto (can't copy paste the text, adding screenshot instead):
In the case of the .each loop, that do...end you've written in is actually a block, and it does have its own local scope. In other words, variables declared within a block are local to that block only.
However, blocks "inherit" the scope of the environment in which they're declared, so what you can do is declare flag outside of the .each iteration block, and then the block will be able to access it and set its value. Note that in the example you've given, that won't happen because you're attempting to iterate an empty array, but at the very least you won't receive an error any more.
Some additional reading:
Block variable scope in ruby
Understanding scopes in ruby
In Ruby when you do an assignment to a variable (in your case it's d) anywhere in False-branch of If-statement it declares this variable unless method d= is defined. Basically b = bla-bla-bla in False-branch makes this: b = nil.
When you use block on an empty array nothing happens. And if an array is not empty variable still be local for current iteration of a block unless it was defined outside block scope, for example:
[1,2,3,4].each do |i|
a=i
end
puts a
NameError: undefined local variable or method `a' for main:Object
a=1
[1,2,3,4].each do |i|
a=i
end
puts a
4
Also you have an option to use a as a local one inside a block if it has been previously defined:
a=1
[1,2,3,4].each do |i; a|
a=i
end
puts a
1

Understanding Ruby Closures

I'm trying to better understand Ruby closures and I came across this example code which I don't quite understand:
def make_counter
n = 0
return Proc.new { n = n + 1 }
end
c = make_counter
puts c.call # => this outputs 1
puts c.call # => this outputs 2
Can someone help me understand what actually happens in the above code when I call c = make_counter? In my mind, here's what I think is happening:
Ruby calls the make_counter method and returns a Proc object where the code block associated with the Proc will be { n = 1 }. When the first c.call is executed, the Proc object will execute the block associated with it, and returns n = 1. However, when the second c.call is executed, doesn't the Proc object still execute the block associated with it, which is still { n = 1 }? I don't get why the output will change to 2.
Maybe I'm not understanding this at all, and it would be helpful if you could provide some clarification on what's actually happening within Ruby.
The block is not evaluated when make_counter is called. The block is evaluated and run when you call the Proc via c.call. So each time you run c.call, the expression n = n + 1 will be evaluated and run. The binding for the Proc will cause the n variable to remain in scope since it (the local n variable) was first declared outside the Proc closure. As such, n will keep incrementing on each iteration.
To clarify this further:
The block that defines a Proc (or lambda) is not evaluated at initialization - the code within is frozen exactly as you see it.
Ok, the code is actually 'evaluated', but not for the purpose of changing the frozen code. Rather, it is checked for any variables that are currently in scope that are being used within the context of the Proc's code block. Since n is a local variable (as it was defined the line before), and it is used within the Proc, it is captured within the binding and comes along for the ride.
When the call method is called on the Proc, it will execute the 'frozen' code within the context of that binding that had been captured. So the n that had been originally been assigned as 0, is incremented to 1. When called again, the same n will increment again to 2. And so on...
I always feel like to understand whats going on, its always important to revisit the basics. No one ever answered the question of what is a Proc in Ruby which to a newbie reading this post, that would be crucial and would help in answering this question.
At a high-level, procs are methods that can be stored inside variables.
Procs can also take a code block as its parameter, in this case it took n = n + 1. In other programming languages a block is called a closure. Blocks allow you to group statements together and encapsulate behavior.
There are two ways to create blocks in Ruby. The example you provide is using curly braces syntax.
So why use Procs if you can use methods to perform the same functionality?
The answer is that Procs give you more flexibility than methods. With Procs you can store an entire set of processes inside a variable and then call the variable anywhere else in your program.
In this case, Proc was written inside a method and then that method was stored inside a variable called c and then called with puts each time incrementing the value of n.
Similar to Procs, Lambdas also allow you to store functions inside a variable and call the method from other parts of a program.
This here:
return Proc.new { n = n + 1 }
Actually, returns a proc object which has a block associated with it. And Ruby creates a binding with blocks! So the execution context is stored for later use and hence why we can increment n. Let me go a bit further into explaining Ruby Closures, so you can have a more broader idea.
First, we need to clarify the technical term 'binding'. In Ruby, a binding object encapsulates the execution context at some particular scope in a program and retains this context for future use in the program. This execution context includes arguments passed to a method and any local variables defined in the method, any associated blocks, the return stack and the value of self. Take this example:
class SomeClass
def initialize
#ivar = 'instance variable'
end
def m(param)
lvar = 'local variable'
binding
end
end
b = SomeClass.new.m(100) { 'block executed' }
=> #<Binding:0x007fb354b7aca0>
eval "puts param", b
=> 100
eval "puts lvar", b
=> local variable
eval "puts yield", b
=> block executed
eval "puts self", b
=> #<SomeClass:0x007fb354ad82e8>
eval "puts #ivar", b
instance variable
The last statement might seem a little tricky but it's not. Remember binding holds execution context for later use. So when we invoke yield, it is invoking yield as if it was still in that execution context and hence it invokes the block.
It's interesting, you can even reassign the value of the local variables in the closure:
eval "lvar = 'changed in eval'", b
eval "puts lvar", b
=> changed in eval
Now this is all cute, but not so useful. Bindings are really useful as it pertains to blocks. Ruby associates a binding object with a block. So when you create a proc or a lambda, the resulting Proc object holds not just the executable block but also bindings for all the variables used by the block.
You already know that blocks can use local variables and method arguments that are defined outside the block. In the following code, for example, the block associated with the collect iterator uses the method argument n:
# multiply each element of the data array by n
def multiply(data, n)
data.collect {|x| x*n }
end
puts multiply([1,2,3], 2) # Prints 2,4,6
What is more interesting is that if the block were turned into a proc or lambda, it could access n even after the method to which it is an argument had returned. That's because there is a binding associated to the block of the lambda or proc object! The following code demonstrates:
# Return a lambda that retains or "closes over" the argument n
def multiplier(n)
lambda {|data| data.collect{|x| x*n } }
end
doubler = multiplier(2) # Get a lambda that knows how to double
puts doubler.call([1,2,3]) # Prints 2,4,6
The multiplier method returns a lambda. Because this lambda is used outside of the scope in which it is defined, we call it a closure; it encapsulates or “closes over” (or just retains) the binding for the method argument n.
It is important to understand that a closure does not just retain the value of the variables it refers to—it retains the actual variables and extends their lifetime. Another way to say this is that the variables used in a lambda or proc are not statically bound when the lambda or proc is created. Instead, the bindings are dynamic, and the values of the variables are looked up when the lambda or proc is executed.

Why does Ruby seem to hoist variable declarations from inside a case statement even if that code path is not executed? [duplicate]

This question already has answers here:
Why can I refer to a variable outside of an if/unless/case statement that never ran?
(3 answers)
Closed 5 years ago.
We define a function foo:
def foo(s)
case s
when'foo'
x = 3
puts x.inspect
when 'bar'
y = 4
puts y.inspect
end
puts x.inspect
puts y.inspect
end
We then call it as follows:
1.9.3p194 :017 > foo('foo')
in foo scope
3
in outer scope
3
nil
=> nil
1.9.3p194 :018 > foo('bar')
in bar scope
3
in outer scope
nil
3
=> nil
Why does the function not throw an error about an unregistered local variable in either case? In the first case, the variable y seems like it should not exist, so you can't call inspect on it in the outer scope; the same for x in the second case.
Here's another similar example:
def test1
x = 5 if false
puts x.inspect
end
def test2
puts x.inspect
end
And then:
1.9.3p194 :028 > test1
nil
=> nil
1.9.3p194 :029 > test2
NameError: undefined local variable or method `x' for main:Object
What's going on here? It seems like Ruby is hoisting the variable declaration into the outer scope, but I wasn't aware that this is something Ruby does. (Searching for "ruby hoisting" only turns up results about JavaScript hoisting.)
When the Ruby parser sees the sequence identifier, equal-sign, value,
as in this expression
x = 1
it allocates space for a local variable called x. The creation of the
variable—not the assignment of a value to it, but the internal
creation of a variable—always takes place as a result of this kind of
expression, even if the code isn’t executed! Consider this example:
if false
x = 1
end
p x # Output: nil
p y # Fatal Error: y is unknown
The assignment to x isn’t executed, because it’s wrapped in a failing
conditional test. But the Ruby parser sees the sequence x = 1, from
which it deduces that the program involves a local variable x. The
parser doesn’t care whether x is ever assigned a value. Its job is
just to scour the code for local variables for which space needs to
be allocated. The result is that x inhabits a strange kind of variable limbo.
It has been brought into being and initialized to nil.
In that respect, it differs from a variable that has no existence at
all; as you can see in the example, examining x gives you the value
nil, whereas trying to inspect the non-existent variable y results
in a fatal error. But although x exists, it has not played any role in
the program. It exists only as an artifact of the parsing process.
Well-Grounded Rubyist chapter 6.1.2
The ruby parser goes through every lines and set to nil all variable =. The code being actually executed or not does not matter.
See Why can I refer to a variable outside of an if/unless/case statement that never ran?

Can someone explain Ruby's use of pipe characters in a block?

Can someone explain to me Ruby's use of pipe characters in a block? I understand that it contains a variable name that will be assigned the data as it iterates. But what is this called? Can there be more than one variable inside the pipes? Anything else I should know about it? Any good links to more information on it?
For example:
25.times { | i | puts i }
Braces define an anonymous function, called a block. Tokens between the pipe are the arguments of this block. The number of arguments required depends on how the block is used. Each time the block is evaluated, the method requiring the block will pass a value based on the object calling it.
It's the same as defining a method, only it's not stored beyond the method that accepts a block.
For example:
def my_print(i)
puts i
end
will do the same as this when executed:
{|i| puts i}
the only difference is the block is defined on the fly and not stored.
Example 2:
The following statements are equivalent
25.times &method(:my_print)
25.times {|i| puts i}
We use anonymous blocks because the majority of functions passed as a block are usually specific to your situation and not worth defining for reuse.
So what happens when a method accepts a block? That depends on the method. Methods that accept a block will call it by passing values from their calling object in a well defined manner. What's returned depends on the method requiring the block.
For example: In 25.times {|i| puts i} .times calls the block once for each value between 0 and the value of its caller, passing the value into the block as the temporary variable i. Times returns the value of the calling object. In this case 25.
Let's look at method that accepts a block with two arguments.
{:key1 => "value1", :key2 => "value2"}.each {|key,value|
puts "This key is: #{key}. Its value is #{value}"
}
In this case each calls the block ones for each key/value pair passing the key as the first argument and the value as the second argument.
The pipes specify arguments that are populated with values by the function that calls your block. There can be zero or more of them, and how many you should use depends on the method you call.
For example, each_with_index uses two variables and puts the element in one of them and the index in the other.
here is a good description of how blocks and iterators work
Block arguments follow all the same conventions as method parameters (at least as of 1.9): you can define optional arguments, variable length arg lists, defaults, etc. Here's a pretty decent summary.
Some things to be aware of: because blocks see variables in the scope they were defined it, if you pass in an argument with the same name as an existing variable, it will "shadow" it - your block will see the passed in value and the original variable will be unchanged.
i = 10
25.times { | i | puts i }
puts i #=> prints '10'
Will print '10' at the end. Because sometimes this is desirable behavior even if you are not passing in a value (ie you want to make sure you don't accidentally clobber a variable from surrounding scope) you can specify block-local variable names after a semicolon after the argument list:
x = 'foo'
25.times { | i ; x | puts i; x = 'bar' }
puts x #=> prints 'foo'
Here, 'x' is local to the block, even though no value is passed in.

Resources