When is a line of Ruby parsed? evaluated? executed?

I was a little surprised to discover that person is defined by the following line of code, even when params[:person_id] doesn't exist:
person = Person.find(params[:person_id]) if params[:person_id]
I kind of expected that Ruby would first check the if statement and only define person then. In practice it seems person is defined earlier than that but remains nil.
While investigating that I tried the following:
irb> foo
# NameError (undefined local variable or method `foo' for main:Object)
irb> if false
irb> foo = 'bar'
irb> end
irb> foo
# => nil
Initially foo is undefined. But then it gets defined, even though it's only referenced inside an if block that isn't evaluated.
I'm now guessing that the whole program gets parsed(?) and that a foo node is added to the Abstract Syntax Tree (i.e. defined). The program is then executed(?), but that particular line is skipped (not evaluated(?)) and so foo is nil (defined but not set to a value).
I'm not sure how to confirm or refute that hunch though. How does one go about learning and digging into the Ruby internals and finding out what happens in this particular scenario?

Answering my own question, Jay's answer to a similar question linked to a section of the docs where it is explained:
The local variable is created when the parser encounters the assignment, not when the assignment occurs
There is a deeper analysis of this in the Ruby Hacking Guide (no section links available, search or scroll to the "Local Variable Definitions" section):
By the way, it is defined when “it appears”, this means it is defined even though it was not assigned. The initial value of a defined [but not yet assigned] variable is nil.
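Both quotes can be checked directly with defined?. The following is a minimal sketch; parse_time_definition is a hypothetical helper name, not something from the docs or the guide. It wraps the irb experiment in a method so the result does not depend on irb's own evaluation of each line.

```ruby
# Hypothetical helper: reports what `defined?` says about `foo` before and
# after the parser has seen an assignment that never runs.
def parse_time_definition
  before = defined?(foo)   # no assignment seen yet, so `foo` is looked up as a method
  if false
    foo = 'bar'            # never executed, but the parser registers `foo` as a local
  end
  after = defined?(foo)    # the parser now treats `foo` as a local variable
  [before, after, foo]
end

p parse_time_definition
```

The first element is nil (nothing called foo exists yet), the second is "local-variable", and the third is nil because the assignment itself never ran.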
That answers the initial question but not how to learn more.
Jay and simonwo both suggested Ruby Under a Microscope by Pat Shaughnessy which I am keen to read.
Additionally, the rest of the Ruby Hacking Guide covers a lot of detail and actually examines the underlying C code. The Objects and Parser chapters were particularly relevant to the original question about variable assignment (not so much the Variables and constants chapter, it simply refers you back to the Objects chapter).
I also found that a useful tool to see how the parser works is the Parser gem. Once it is installed (gem install parser) you can start to examine different bits of code to see what the parser is doing with them.
That gem also bundles the ruby-parse utility which lets you examine the way Ruby parses different snippets of code. The -E and -L options are most interesting to us and the -e option is necessary if we just want to process a fragment of Ruby such as foo = 'bar'. For example:
> ruby-parse -E -e "foo = 'bar'"
foo = 'bar'
^~~ tIDENTIFIER "foo" expr_cmdarg [0 <= cond] [0 <= cmdarg]
foo = 'bar'
^ tEQL "=" expr_beg [0 <= cond] [0 <= cmdarg]
foo = 'bar'
^~~~~ tSTRING "bar" expr_end [0 <= cond] [0 <= cmdarg]
foo = 'bar'
^ false "$eof" expr_end [0 <= cond] [0 <= cmdarg]
(lvasgn :foo
(str "bar"))
> ruby-parse -L -e "foo = 'bar'"
s(:lvasgn, :foo,
s(:str, "bar"))
foo = 'bar'
~~~ name
~ operator
~~~~~~~~~~~ expression
s(:str, "bar")
foo = 'bar'
~ end
~ begin
~~~~~ expression
Both of the references linked to at the top highlight an edge case. The Ruby docs use the example p a if a = 0.zero? while the Ruby Hacking Guide uses an equivalent example, p(lvar) if lvar = true. Both raise a NameError.
Sidenote: Remember = means assign, == means compare. The if foo = true construct in the edge case tells Ruby to check if the expression foo = true evaluates to true. In other words, it assigns the value true to foo and then checks if the result of that assignment is true (it will be). That's easily confused with the far more common if foo == true which simply checks whether foo compares equally to true. Because the two are so easily confused, Ruby will issue a warning if we use the assignment operator in a conditional: warning: found `= literal' in conditional, should be ==.
Using the ruby-parse utility let's compare the original example, foo = 'bar' if false, with that edge case, foo if foo = true:
> ruby-parse -L -e "foo = 'bar' if false"
s(:if,
s(:false),
s(:lvasgn, :foo,
s(:str, "bar")), nil)
foo = 'bar' if false
~~ keyword
~~~~~~~~~~~~~~~~~~~~ expression
s(:false)
foo = 'bar' if false
~~~~~ expression
s(:lvasgn, :foo,
s(:str, "bar"))
foo = 'bar' if false # Line 13
~~~ name # <-- `foo` is a name
~ operator
~~~~~~~~~~~ expression
s(:str, "bar")
foo = 'bar' if false
~ end
~ begin
~~~~~ expression
As you can see above on lines 13 and 14 of the output, in the original example foo is a name (that is, a variable).
> ruby-parse -L -e "foo if foo = true"
s(:if,
s(:lvasgn, :foo,
s(:true)),
s(:send, nil, :foo), nil)
foo if foo = true
~~ keyword
~~~~~~~~~~~~~~~~~ expression
s(:lvasgn, :foo,
s(:true))
foo if foo = true # Line 10
~~~ name # <-- `foo` is a name
~ operator
~~~~~~~~~~ expression
s(:true)
foo if foo = true
~~~~ expression
s(:send, nil, :foo)
foo if foo = true # Line 18
~~~ selector # <-- `foo` is a selector
~~~ expression
In the edge case example, the second foo is also a variable (lines 10 and 11), but when we look at lines 18 and 19 we see the first foo has been identified as a selector (that is, a method).
This shows that it is the parser that decides whether a thing is a method or a variable and that it parses the line in a different order to how it will later be evaluated.
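The same distinction can be cross-checked without installing a gem by using Ripper from Ruby's standard library. This is only a sketch: node_types is a hypothetical helper, and Ripper's node names differ from the Parser gem's (it reports an identifier that might be a method call as :vcall and a known local as :var_ref).

```ruby
require 'ripper'

# Hypothetical helper: collects every node type (the leading Symbol of each
# nested array) from the S-expression that Ripper.sexp returns.
def node_types(sexp)
  return [] unless sexp.is_a?(Array)

  head = sexp.first.is_a?(Symbol) ? [sexp.first] : []
  head + sexp.flat_map { |child| node_types(child) }
end

# Edge case: the first `foo` is read before any assignment has been parsed.
puts node_types(Ripper.sexp('foo if foo = true')).include?(:vcall)

# Assignment first: by the time `foo` is read it is already a known local.
puts node_types(Ripper.sexp('foo = true; foo')).include?(:vcall)
```

The presence of :vcall in the first tree and its absence in the second mirrors the selector versus name difference in the ruby-parse output.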
Considering the edge case...
When the parser runs:
it first sees the whole line as a single expression
it then breaks it up into two expressions separated by the if keyword
the first expression foo starts with a lower case letter so it must be a method or a variable. It isn't an existing variable and it IS NOT followed by an assignment operator so the parser concludes it must be a method
the second expression foo = true is broken up as expression, operator, expression. Again, the expression foo also starts with a lower case letter so it must be a method or a variable. It isn't an existing variable but it IS followed by an assignment operator so the parser knows to add it to the list of local variables.
Later when the evaluator runs:
it will first assign true to foo
it will then execute the conditional and check whether the result of that assignment is true (in this case it is)
it will then call the foo method (which will raise a NameError, unless we handle it with method_missing).
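The runtime consequence of those steps can be sketched as follows. The method names are hypothetical, and Ruby may print the "found = literal in conditional" warning mentioned earlier when it parses this code.

```ruby
# Edge case: the first `foo` was parsed as a method call, so it raises at runtime
# even though the assignment in the condition has already run.
def modifier_edge_case
  foo if foo = true
rescue NameError => e
  e
end

# Conventional order: the assignment is parsed first, so `foo` is a local throughout.
def block_order
  if foo = true
    foo
  end
end

p modifier_edge_case.class
p block_order
```

The first call yields a NameError, while the second returns true, even though both forms evaluate the condition before the body.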

Related

Void value expression error in return statement [duplicate]

This is just fine:
def foo
a or b
end
This is also fine:
def foo
return a || b
end
This fails with a void value expression error:
def foo
return a or b
end
Why? It doesn't even get executed; it fails the syntax check. What does void value expression mean?
return a or b is interpreted as (return a) or b, and so the value of return a is necessary to calculate the value of (return a) or b, but since return never leaves a value in place (because it escapes from that position), it is not designed to return a valid value in the original position. And hence the whole expression is left with (some_void_value) or b, and is stuck. That is what it means.
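That the failure happens at compile time rather than at run time can be checked with a small sketch. compiles? is a hypothetical helper, and RubyVM::InstructionSequence is specific to CRuby, so this assumes the standard interpreter.

```ruby
# Hypothetical helper: true if CRuby can compile the source, false on SyntaxError.
# Nothing in the source is executed; it is only parsed and compiled.
def compiles?(source)
  RubyVM::InstructionSequence.compile(source)
  true
rescue SyntaxError
  false
end

puts compiles?("def foo; a or b; end")         # plain `or` expression
puts compiles?("def foo; return a || b; end")  # `||` binds tighter than `return`
puts compiles?("def foo; return a or b; end")  # (return a) or b
```

Only the last snippet is rejected, and a and b never need to exist because none of the methods is ever called.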
In "Does `return` have precedence to certain operators in Ruby?" which I asked, Stefan explained in a comment that or and and are really control-flow operators and should not be used in place of the boolean operators || and &&.
He also referenced "Using “and” and “or” in Ruby":
and and or originate (like so much of Ruby) in Perl. In Perl, they were largely used to modify control flow, similar to the if and unless statement modifiers. (...)
They provide the following examples:
and
foo = 42 && foo / 2
This is parsed as:
foo = 42 && (foo / 2) # => NoMethodError: undefined method `/' for nil:NilClass
The division binds tighter than &&, and && binds tighter than =, so foo / 2 runs while foo is still nil. The goal is to assign a number to foo and then compute half of that value. The and operator is useful here because of its low precedence: the assignment foo = 42 completes first, and only then is foo / 2 evaluated:
foo = 42 and foo / 2 # => 21
It can also be used as a reverse if statement in a loop:
next if widget = widgets.pop
Which is equivalent to:
widget = widgets.pop and next
or
useful for chaining expressions together
If the first expression fails, execute the second one and so on:
foo = get_foo() or raise "Could not find foo!"
It can also be used as a:
reversed unless statement modifier:
raise "Not ready!" unless ready_to_rock?
Which is equivalent to:
ready_to_rock? or raise "Not ready!"
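The precedence difference behind these examples can be seen in a short sketch. The method names are hypothetical and chosen only for illustration.

```ruby
# `or` binds more loosely than `=`, while `||` binds more tightly.
def precedence_demo
  x = false or true   # parsed as (x = false) or true
  y = false || true   # parsed as y = (false || true)
  [x, y]
end

# With `and`, the assignment completes before the right-hand side is evaluated.
def halve_with_and
  result = (foo = 42 and foo / 2)   # (foo = 42) and (foo / 2)
  [result, foo]
end

p precedence_demo
p halve_with_and
```

precedence_demo returns [false, true]. halve_with_and returns [21, 42]: the whole expression evaluates to 21, but foo itself keeps the value 42.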
Therefore as sawa explained the a or b expression in:
return a or b
Has lower precedence than return a which, when executed, escapes the current context and does not provide any value (void value). This then triggers the error (repl.it execution):
(repl):1: void value expression
puts return a or b
^~
This answer was made possible by Stefan's comments.
Simply because or has lower precedence than ||, which means return a is executed before or b; or b is therefore unreachable.

Strange meaning of || and ||= in Ruby (2.0, 1.9.3, jruby 1.7.4)

Consider the following irb snippet from a freshly-started session:
irb:01> baz # => NameError, baz is not defined
irb:02> baz || baz = 0 # => NameError, baz is not defined
irb:03> baz # => nil
baz was an undefined variable and trying to evaluate it produced a NameError. Yet, somehow, after this operation, baz was defined, and has a value of nil. Seemingly, the value nil was assigned to the variable baz even though no one (explicitly) asked for it to be. Is there an underlying language reason why this behavior is desirable?
What is the rule that explains this behavior and other similarly confusing constructs, such as these:
irb:04> true if foo # => NameError
irb:05> foo # => NameError; name still undefined
irb:06> foo = (true if foo) # => nil
irb:07> foo # => nil; name defined as nil
irb:08> true || i = 0 || j = 2 # => i and j are nil; || appears nonlazy
irb:09> raise || quux = 1 # => RuntimeError, quux is nil
I don't know if it is desirable, but it comes from how Ruby parses the code. Whenever you have a piece of code that assigns a local variable, that local variable is assigned nil even if that piece of code is not evaluated. In your code line 2:
baz || baz = 0
the first baz returned an error because no such variable was assigned. Hence the assignment baz = 0 that follows it was not evaluated, but nevertheless it was parsed, so in the context to follow, a local variable baz was created, and is initialized to nil.
With your second code chunk, foo is not assigned during true if foo and foo. After that, foo = (true if foo) contains an assignment to foo, so even though (true if foo) is evaluated before the assignment to foo, no error is raised on that line.
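The rule can be reproduced outside irb with a minimal sketch. The method names are hypothetical, and Ruby may emit warnings about unused expressions when run with -w.

```ruby
# The first `baz` is parsed as a method call and raises; the assignment never
# runs, yet the parser has already registered `baz` as a local variable.
def baz_demo
  begin
    baz || baz = 0
  rescue NameError
    # the assignment was skipped
  end
  [defined?(baz), baz]
end

# `||` is still lazy: neither assignment runs, but both locals exist as nil.
def lazy_or_demo
  true || i = 0 || j = 2
  [i, j]
end

p baz_demo
p lazy_or_demo
```

baz_demo returns ["local-variable", nil] and lazy_or_demo returns [nil, nil], which shows that || short-circuits as usual and that the nil values come from parse-time definition, not from evaluation.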

Position of "if" condition

I thought that:
do_something if condition
were equivalent to
if condition
do_something
end
I found code that does not follow this rule.
if !(defined? foo)
foo = default_value
end
Here, foo takes default_value.
foo = default_value if !(defined? foo)
Here, foo takes nil. In the former code, I think if is executed first, and should be equivalent to:
foo = (default_value if !(defined? foo))
Is there any way to set to default_value if the variable is not defined?
General answer:
Several comments suggest using the ||= operator, which does not do what I want if foo is nil:
foo ||= default_value
will assign the default value even though foo is defined.
I insist on testing with "not defined?", which is not the same as testing for nil.
The Ruby way is
foo ||= default_value
But, of course
if (defined? foo)
foo = default_value
end
and
foo = default_value if !(defined? foo)
are different. You're not comparing the same thing.
In one you compare (defined? foo) and the other you compare !(defined? foo)
I think what you're really after is the following
if !(defined? foo)
foo = default_value
end
The two pieces of code are equivalent syntactically, but are different from the point of view of parsing. You are partially right that, "if is executed first", but that is only regarding syntax. Within parsing, the parsing order follows the linear order of the tokens. In Ruby, when you have an assignment:
foo = ...
then foo is assigned nil even if that portion of code is not syntactically evaluated, and that affects the result of defined?.
To write it inline without that problem, I use and, or, &&, or ||:
defined?(foo) or foo = default_value
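The three forms can be compared side by side in a short sketch. The method names are hypothetical, and each method starts with a fresh scope in which foo has not been seen.

```ruby
# Modifier form: the parser sees `foo =` first, so `defined? foo` is already truthy.
def modifier_form(default_value)
  foo = default_value if !(defined? foo)
  foo
end

# Block form: `defined? foo` is parsed before any assignment to foo.
def block_form(default_value)
  if !(defined? foo)
    foo = default_value
  end
  foo
end

# Inline form with `or`: the token order matches the block form.
def or_form(default_value)
  defined?(foo) or foo = default_value
  foo
end

p modifier_form(5)
p block_form(5)
p or_form(5)
```

Only modifier_form returns nil; the other two return the default, because in both of them defined? appears before the assignment in the source text.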

Using `defined?` at one-level expansion

The defined? keyword reports on the verbatim expression given as its argument. For example, the result of
defined? foo
is sensitive to whether foo is literally a defined variable or method. It makes no difference whether foo holds a string that is a valid (existing) expression:
foo = "Array"
or not:
foo = "NonExistingConstant"
Is it possible to make defined? be sensitive to the given argument expanded one level? That is, for foo = "Array", it should return "constant" and for foo = "NonExistingConstant", it should return nil. If so, how?
Since you need to check only constants:
['Array', 'NonExistentClass'].each do |name|
puts Object.const_defined?(name)
end
# >> true
# >> false
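To get the return values asked for in the question ("constant" or nil), the check can be wrapped in a small helper. This is a sketch under assumptions: expanded_defined is a hypothetical name, it handles constants only, and it treats strings that are not valid constant names as undefined instead of letting const_defined? raise.

```ruby
# Hypothetical helper: mimics `defined?` for a constant whose name is held in a
# string. Returns "constant" if it exists under Object, otherwise nil.
def expanded_defined(name)
  Object.const_defined?(name) ? "constant" : nil
rescue NameError
  # const_defined? raises NameError for strings that are not valid constant names
  nil
end

p expanded_defined("Array")
p expanded_defined("NonExistingConstant")
p expanded_defined("not a constant")
```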
