How can I predict how Ruby will parse things?
I came across a really surprising parsing error in Ruby while trying to concatenate strings.
> "every".capitalize +"thing"
=> NoMethodError: undefined method `+@' for "thing":String
Of course, if you put the extra space in there, it works as intended:
> "every".capitalize + "thing"
=> "Everything"
This error will occur if I have anything.any_method +"any string". What Ruby does is assume that we have elided parentheses, and are trying to give an argument to the method;
"every".capitalize( +"thing" )
It notices that we haven't defined the unary operator +@ on strings, and throws that error.
My question is, what principles should I use to predict the behavior of the Ruby parser? I only figured this error out after a lot of googling. It's notable that .capitalize takes no parameters ever (not even in the C source code). If you use a method that doesn't apply to the previous object, it still throws the +@ error instead of an undefined method 'capitalize' for "every":String error. So this parsing is obviously high-level. I'm not knowledgeable enough to read through Matz's parser.y. I've come across other similarly surprising errors. Can anyone tell me Ruby's parsing priority?
If you want to see how Ruby is parsing your code, you can dump the parse tree, e.g.
ruby -e '"every".capitalize +"thing"' --dump parsetree
# # NODE_SCOPE (line: 1)
# +- nd_tbl: (empty)
# +- nd_args:
# | (null node)
# +- nd_body:
# # NODE_CALL (line: 1)
# +- nd_mid: :capitalize
# +- nd_recv:
# | # NODE_STR (line: 1)
# | +- nd_lit: "every"
# +- nd_args:
# # NODE_ARRAY (line: 1)
# +- nd_alen: 1
# +- nd_head:
# | # NODE_CALL (line: 1)
# | +- nd_mid: :+@
# | +- nd_recv:
# | | # NODE_STR (line: 1)
# | | +- nd_lit: "thing"
# | +- nd_args:
# | (null node)
# +- nd_next:
# (null node)
I like to use explainruby sometimes too, because it's much easier on my eyes :)
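The two parses can also be compared from inside Ruby. Here is a minimal sketch using Ripper from the standard library. It only checks which node types show up, since the exact shape of the s-expression differs between Ruby versions. The run-time part is deliberately vague about the error class: as far as I know, later Rubies define String#+@ and let capitalize take options, so the failure there is no longer the NoMethodError quoted in the question.

```ruby
require 'ripper'

# Flatten each s-expression so we can simply ask which node types occur.
no_space   = Ripper.sexp('"every".capitalize +"thing"').flatten
with_space = Ripper.sexp('"every".capitalize + "thing"').flatten

# Without the space, +"thing" is a unary plus passed as an argument.
puts no_space.include?(:unary)
puts no_space.include?(:command_call)

# With the space, + is an ordinary binary operator.
puts with_space.include?(:binary)

# The no-space form still fails at run time; the error class depends on the Ruby version.
error_class = begin
  "every".capitalize +"thing"
rescue StandardError => e
  e.class
end
puts error_class
```

The parser decides between the two readings purely from the whitespace around +, before it knows anything about capitalize.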
I am new to Ruby and a bit confused about how the ternary operator, ?:, works.
According to the book Engineering Software as a Service: An Agile Approach Using Cloud Computing:
every operation is a method call on some object and returns a value.
In this sense, if the ternary operator represents an operation, it is a method call on an object with two arguments. However, I can't find any method that the ternary operator represents in Ruby's documentation. Does a ternary operator represent an operation in Ruby? Is the above claim made by the book wrong? Is the ternary operator in Ruby really just syntactic sugar for if ... then ... else ... end statements?
Please note:
My question is related to How do I use the conditional operator (? :) in Ruby? but is not the same as that one. I know how to use the ternary operator in the way described in that post. My question is about where the ternary operator is defined in Ruby and whether it is defined as a method or methods.
Is the ternary operator in Ruby really just a syntactic sugar for if ... then ... else ... end statements?
Yes.
From doc/syntax/control_expressions.rdoc
You may also write a if-then-else expression using ? and :. This ternary if:
input_type = gets =~ /hello/i ? "greeting" : "other"
Is the same as this if expression:
input_type =
if gets =~ /hello/i
"greeting"
else
"other"
end
"According to this book, "every operation is a method call on some object and returns a value." In this sense, if the ternary operator represents an operation, it is a method call on an object with two arguments."
if, unless, while, and until are not operators; they are control structures. Their modifier versions appear in the operator precedence table because they need to have precedence in order to be parsed. They simply check whether their condition is true or false. In Ruby this is simple: only false and nil are false. Everything else is true.
Operators are things like !, +, *, and []. They are unary or binary. You can see a list of them by calling .methods.sort on various objects. For example...
2.4.3 :004 > 1.methods.sort
=> [:!, :!=, :!~, :%, :&, :*, :**, :+, :+@, :-, :-@, :/, :<, :<<, :<=, :<=>, :==, :===, :=~, :>, :>=, :>>, :[], :^, :__id__, :__send__, etc...
Note that in Smalltalk, from which Ruby borrows heavily, everything really is a method call. Including the control structures.
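To make the distinction concrete: operators such as + and unary minus can be invoked like any other method, while the ternary has no method behind it. A small sketch (the :"?:" symbol is just a made-up name to probe with, not something Ruby defines):

```ruby
# Binary operators are methods on the left operand.
sum_via_operator = 1 + 2
sum_via_method   = 1.public_send(:+, 2)

# Unary minus is the method named -@.
negated = 5.public_send(:-@)

puts sum_via_operator == sum_via_method   # true
puts negated                              # -5
puts 1.respond_to?(:+)                    # true
puts 1.respond_to?(:-@)                   # true
puts 1.respond_to?(:"?:")                 # false
```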
Is the ternary operator in Ruby really just a syntactic sugar for if ... then ... else ... end statements?
(another) yes.
Here's the parse tree for a ? b : c:
$ ruby --dump=parsetree -e 'a ? b : c'
###########################################################
## Do NOT use this node dump for any purpose other than ##
## debug and research. Compatibility is not guaranteed. ##
###########################################################
# # NODE_SCOPE (line: 1)
# +- nd_tbl: (empty)
# +- nd_args:
# | (null node)
# +- nd_body:
# # NODE_PRELUDE (line: 1)
# +- nd_head:
# | (null node)
# +- nd_body:
# | # NODE_IF (line: 1)
# | +- nd_cond:
# | | # NODE_VCALL (line: 1)
# | | +- nd_mid: :a
# | +- nd_body:
# | | # NODE_VCALL (line: 1)
# | | +- nd_mid: :b
# | +- nd_else:
# | # NODE_VCALL (line: 1)
# | +- nd_mid: :c
# +- nd_compile_option:
# +- coverage_enabled: false
Here's the parse tree for if a then b else c end:
$ ruby --dump=parsetree -e 'if a then b else c end'
###########################################################
## Do NOT use this node dump for any purpose other than ##
## debug and research. Compatibility is not guaranteed. ##
###########################################################
# # NODE_SCOPE (line: 1)
# +- nd_tbl: (empty)
# +- nd_args:
# | (null node)
# +- nd_body:
# # NODE_PRELUDE (line: 1)
# +- nd_head:
# | (null node)
# +- nd_body:
# | # NODE_IF (line: 1)
# | +- nd_cond:
# | | # NODE_VCALL (line: 1)
# | | +- nd_mid: :a
# | +- nd_body:
# | | # NODE_VCALL (line: 1)
# | | +- nd_mid: :b
# | +- nd_else:
# | # NODE_VCALL (line: 1)
# | +- nd_mid: :c
# +- nd_compile_option:
# +- coverage_enabled: false
They are identical.
In many languages ?: is an expression whereas if-then-else is a statement. In Ruby, both are expressions.
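A quick sketch of what "both are expressions" means in practice; the variable names are only for illustration:

```ruby
x = 5

# Both forms produce a value that can be assigned.
via_ternary = x > 3 ? "big" : "small"
via_if      = if x > 3 then "big" else "small" end

# An if with no else evaluates to nil when its condition is false.
no_else = (if x > 10 then "huge" end)

puts via_ternary   # big
puts via_if        # big
p no_else          # nil
```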
Simple code which I can not explain to myself:
puts a if a = 1
This results in
warning: found = in conditional, should be ==
NameError: undefined local variable or method 'a' for main:Object
Yet on checking a afterwards, we can see that it has been defined:
a #=> 1
Why does a get assigned to 1 despite the exception thrown?
From the docs:
The confusion comes from the out-of-order execution of the expression.
First the local variable is assigned-to then you attempt to call a
nonexistent method [a].
This part is still confusing: why does the interpreter not detect the already defined local variable a, and instead try to call a "nonexistent" method? Should it not check for local variables as well, find the defined local variable a, and print 1?
Let's take a look at Ruby's abstract syntax tree for modifier if:
$ ruby --dump=parsetree -e 'puts a if a = 1'
# # NODE_SCOPE (line: 1, code_range: (1,0)-(1,15))
# +- nd_tbl: :a
# +- nd_args:
# | (null node)
# +- nd_body:
# # NODE_PRELUDE (line: 1, code_range: (1,0)-(1,15))
# +- nd_head:
# | (null node)
# +- nd_body:
# | # NODE_IF (line: 1, code_range: (1,0)-(1,15))
# | +- nd_cond:
# | | # NODE_DASGN_CURR (line: 1, code_range: (1,10)-(1,15))
# | | +- nd_vid: :a
# | | +- nd_value:
# | | # NODE_LIT (line: 1, code_range: (1,14)-(1,15))
# | | +- nd_lit: 1
# | +- nd_body:
# | | # NODE_FCALL (line: 1, code_range: (1,0)-(1,6))
# | | +- nd_mid: :puts
# | | +- nd_args:
# | | # NODE_ARRAY (line: 1, code_range: (1,5)-(1,6))
# | | +- nd_alen: 1
# | | +- nd_head:
# | | | # NODE_VCALL (line: 1, code_range: (1,5)-(1,6))
# | | | +- nd_mid: :a
# | | +- nd_next:
# | | (null node)
# | +- nd_else:
# | (null node)
# +- nd_compile_option:
# +- coverage_enabled: false
And for standard if:
$ ruby --dump=parsetree -e 'if a = 1 then puts a end'
# # NODE_SCOPE (line: 1, code_range: (1,0)-(1,24))
# +- nd_tbl: :a
# +- nd_args:
# | (null node)
# +- nd_body:
# # NODE_PRELUDE (line: 1, code_range: (1,0)-(1,24))
# +- nd_head:
# | (null node)
# +- nd_body:
# | # NODE_IF (line: 1, code_range: (1,0)-(1,24))
# | +- nd_cond:
# | | # NODE_DASGN_CURR (line: 1, code_range: (1,3)-(1,8))
# | | +- nd_vid: :a
# | | +- nd_value:
# | | # NODE_LIT (line: 1, code_range: (1,7)-(1,8))
# | | +- nd_lit: 1
# | +- nd_body:
# | | # NODE_FCALL (line: 1, code_range: (1,14)-(1,20))
# | | +- nd_mid: :puts
# | | +- nd_args:
# | | # NODE_ARRAY (line: 1, code_range: (1,19)-(1,20))
# | | +- nd_alen: 1
# | | +- nd_head:
# | | | # NODE_DVAR (line: 1, code_range: (1,19)-(1,20))
# | | | +- nd_vid: :a
# | | +- nd_next:
# | | (null node)
# | +- nd_else:
# | (null node)
# +- nd_compile_option:
# +- coverage_enabled: false
The only difference is the method argument for puts:
# | | | # NODE_VCALL (line: 1, code_range: (1,5)-(1,6))
# | | | +- nd_mid: :a
vs:
# | | | # NODE_DVAR (line: 1, code_range: (1,19)-(1,20))
# | | | +- nd_vid: :a
With modifier if, the parser treats a as a method call and creates a NODE_VCALL. This instructs the interpreter to make a method call (although there is a local variable a), resulting in a NameError because there is no method a.
With standard if, the parser treats a as a local variable and creates a NODE_DVAR. This instructs the interpreter to look up a local variable which works as expected.
As you can see, Ruby recognizes local variables at the parser level. That's why the documentation says: (emphasis added)
the modifier and standard versions [...] are not exact transformations of each other due to parse order.
Ruby parses code left-to-right. Local variables get defined when the first assignment to them is being parsed. At puts a, no assignment to a has been parsed yet, thus the local variable a doesn't exist yet, and Ruby assumes a is a method call. The local variable only exists to the right and below the assignment.
At runtime, Ruby has to evaluate the condition in order to figure out whether to execute the puts, so a gets initialized to 1.
You seem to be executing that code within some kind of REPL. Usually, REPLs rescue exceptions instead of terminating, which is why your code keeps executing instead of terminating, and since we are now below the assignment, the variable is defined, and since the assignment was executed, the variable is initialized.
If the distinction between definition and initialization of a variable is unclear to you, meditate on this:
foo
# NameError
if false
foo = 42
end
foo
#=> nil
foo = :bar
foo
#=> :bar
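The behaviour described in this answer can also be reproduced outside a REPL by rescuing the exception yourself. A minimal sketch; modifier_if_demo and source are made-up names, and the value is taken from an array only so that Ruby does not print the "found = in conditional" warning it reserves for literals:

```ruby
def modifier_if_demo
  source = [1].first   # non-literal value, to avoid the "found = in conditional" warning
  begin
    # `a` in `puts a` is parsed before the assignment, so it becomes a method call.
    puts a if a = source
  rescue NameError => e
    rescued = e
  end
  # We are now below the assignment, so `a` is a local variable, and it was initialized.
  [rescued.class, rescued.name, a]
end

p modifier_if_demo
```

The condition ran (so a is 1), the body raised, and every later reference to a is a plain local variable lookup.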
Say -1, is - parsed as part of the literal as a sign, or an operator to turn the value into its negative counterpart?
It is parsed as part of the literal, and makes the literal a negative literal.
Here's the reference in the parser source code. If you search the file for tUMINUS_NUM and tUMINUS you'll see where the - token is defined.
In addition to Simone Carletti's answer:
$ ruby --dump=parsetree -e "-1"
###########################################################
## Do NOT use this node dump for any purpose other than ##
## debug and research. Compatibility is not guaranteed. ##
###########################################################
# # NODE_SCOPE (line: 1)
# +- nd_tbl: (empty)
# +- nd_args:
# | (null node)
# +- nd_body:
# # NODE_LIT (line: 1)
# +- nd_lit: -1
As opposed to -(1), -+1 or - 1 (with a space in between) which invoke Fixnum#-@:
$ ruby --dump=parsetree -e "-(1)"
###########################################################
## Do NOT use this node dump for any purpose other than ##
## debug and research. Compatibility is not guaranteed. ##
###########################################################
# # NODE_SCOPE (line: 1)
# +- nd_tbl: (empty)
# +- nd_args:
# | (null node)
# +- nd_body:
# # NODE_CALL (line: 1)
# +- nd_mid: :-@
# +- nd_recv:
# | # NODE_LIT (line: 1)
# | +- nd_lit: 1
# +- nd_args:
# (null node)
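The same difference can be observed at run time by tracing calls to the unary minus method. This is only a sketch: it assumes Ruby 2.4 or later (where Fixnum was merged into Integer), tracer and calls are made-up names, and prepending to Integer is a global patch that is fine for an experiment but not something to ship.

```ruby
calls = []

# Hypothetical tracing module: records each receiver of Integer#-@.
tracer = Module.new do
  define_method(:-@) do
    calls << self
    super()
  end
end
Integer.prepend(tracer)   # global patch, for demonstration only

literal = -1     # the sign is part of the literal: no method call
negated = -(1)   # unary minus applied to 1: calls Integer#-@

p literal
p negated
p calls
```

Only the parenthesized form should show up in calls, since the literal -1 never goes through method dispatch.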
In 10 Things You Didn't Know Ruby Could Do, slide 30, James Edward Gray II mentions
ruby -e 'puts { is_this_a_block }' --dump parsetree
which produces
###########################################################
## Do NOT use this node dump for any purpose other than ##
## debug and research. Compatibility is not guaranteed. ##
###########################################################
# # NODE_SCOPE (line: 1)
# +- nd_tbl: (empty)
# +- nd_args:
# | (null node)
# +- nd_body:
# # NODE_ITER (line: 1)
# +- nd_iter:
# | # NODE_FCALL (line: 1)
# | +- nd_mid: :puts
# | +- nd_args:
# | (null node)
# +- nd_body:
# # NODE_SCOPE (line: 1)
# +- nd_tbl: (empty)
# +- nd_args:
# | (null node)
# +- nd_body:
# # NODE_VCALL (line: 1)
# +- nd_mid: :is_this_a_block
Is the information outputted here available at runtime? If so, does the information represent merely what code has been written down, or does it also have the results of any metaprogramming that has been done?
Yep. You can use the Ripper library (which ships with MRI since 1.9) to generate an AST (abstract syntax tree) for a given string of code (via Ripper.sexp). However, because of architectural changes in MRI 1.9, once your code is parsed and translated into YARV bytecode, both the original source and the AST are dropped, and you can no longer get this information. If you pass any code that you would have generated via metaprogramming to Ripper.sexp, though, you can get the AST of the result. You can also use some of the other tricks shown in JEG2's talk to read in the source file and generate an AST for it (although any metaprogrammed code will not be parsed, as it does not exist yet).
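A minimal sketch of the Ripper.sexp approach. The generated string is a made-up stand-in for source text you might build via metaprogramming, and the checks only look for node types rather than relying on the exact tree shape:

```ruby
require 'ripper'

# AST of code as written: the braces are parsed as a block, not a hash.
block_sexp = Ripper.sexp('puts { is_this_a_block }')
puts block_sexp.flatten.include?(:brace_block)

# Source text produced programmatically can be parsed the same way.
generated = %w[foo bar].map { |name| "def #{name}; :#{name}; end" }.join("\n")
generated_sexp = Ripper.sexp(generated)
puts generated_sexp.flatten.count(:def)
```

Note that this parses a string of source; it does not recover the tree of code that is already loaded.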
I've just read here (http://ruby.runpaint.org/programs#lexical) that comments are tokens. I've never thought of comments as tokens as they're either annotations or for a post-processor.
Are comments really tokens or is this source wrong?
Yes, they should be tokens, but ignored by the parser later on. If you do ruby --dump parsetree foo.rb with a file that looks like this
# this is a comment
1+1
# another comment
this is what you'll get:
# # NODE_SCOPE (line: 3)
# +- nd_tbl: (empty)
# +- nd_args:
# | (null node)
# +- nd_body:
# # NODE_CALL (line: 2)
# +- nd_mid: :+
# +- nd_recv:
# | # NODE_LIT (line: 2)
# | +- nd_lit: 1
# +- nd_args:
# # NODE_ARRAY (line: 2)
# +- nd_alen: 1
# +- nd_head:
# | # NODE_LIT (line: 2)
# | +- nd_lit: 1
# +- nd_next:
# (null node)
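You can see both halves of this from Ruby itself with Ripper: the lexer reports the comments as tokens, while the parse result has no trace of them. A small sketch using the same three-line file as above:

```ruby
require 'ripper'

source = "# this is a comment\n1+1\n# another comment\n"

# Ripper.lex returns one entry per token: [[line, col], type, text, ...].
token_types = Ripper.lex(source).map { |token| token[1] }
puts token_types.count(:on_comment)

# The parse result carries no trace of the comment text.
sexp_strings = Ripper.sexp(source).flatten.grep(String)
puts sexp_strings.any? { |text| text.include?("comment") }
```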
Yeah, they're tokens to the parser. Usually, if you use a parser generator, this is the definition of a comment:
{code} short_comment = '//' not_cr_lf* eol | '#' not_cr_lf* eol;
{code} long_comment = '/*' not_star* '*'+ (not_star_slash not_star* '*'+)* '/'; /* '4vim */
Ignored Tokens
short_comment,
long_comment;
This is a SableCC grammar. They're usually ignored tokens.
Remember that everything you write in source code is a token; tokenizing is always the first step. The parser then builds the abstract syntax tree from those tokens.