ANTLR: line number of the end of the code block given CommonTree - antlr3

Is there any way to find the line number where a code block ends
Example: for the following input
21) synchronized(Lock.class){
22) a.getAndIncrement(); //some code
23)
24) }
the corresponding AST is
synchronized
PARENTESIZED_EXPR
EXPR
.
Lock
class
BLOCK_SCOPE
EXPR
METHOD_CALL
.
g
getAndIncrement
ARGUMENT_LIST
for the above code given the CommonTree is there any way to retrieve the line number where the "synchronized" block ends. The output for the above code should be 24(as the synchronized block ends at line number 24).

Yes, through the following technique:
Make sure the } does not get omitted from the AST.
If you are using the rewrite operator ->, this means the } token needs to appear on the right hand side.
If you are using the AST operators ^ and !, this means you can't use the ! operator on your } token.
Find the CommonTree corresponding to the } token, and call getLine() on the token to get the line number.
Edit: Here is the current block rule in the grammar:
block
: LCURLY blockStatement* RCURLY
-> ^(BLOCK_SCOPE[$LCURLY, "BLOCK_SCOPE"] blockStatement*)
;
As you can see, the rewrite rule does not include the RCURLY token, so information about the position of the end of the block is omitted. The rule can be modified to include the token:
block
: LCURLY blockStatement* RCURLY
-> ^(BLOCK_SCOPE[$LCURLY, "BLOCK_SCOPE"] blockStatement* RCURLY)
;
Note that this requires updating the corresponding in the tree grammar as well.
block
: ^(BLOCK_SCOPE blockStatement* RCURLY)
;

Related

It is applied to too many arguments; maybe you forgot a `;'

I am trying to write a code that calculate the size of a list.
Here is what I've done:
let rec l = function
| [] -> 0
| t::q -> 1 + l q
print_int(l ([1;2;3;4]))
The problem is that it's saying me :
It is applied to too many arguments; maybe you forgot a `;'.
When I put the double semicolon ;; at the end of the definition of l it works well, yet I've read that ;; is not useful at all if you are not coding in the REPL, so here I don't see why it's giving me this error.
The following
print_int(l [1;2;3;4])
is a toplevel expression. Such expression needs to be preceded by ;;:
;; print_int(l [1;2;3;4])
Another option is to make this toplevel expression a binding with
let () = print_int(l [1;2;3;4])
When parsing the code the parser advances until it hits l q. At this point there could be more arguments that should get applied to the function l. So the parser keeps going and the next thing it finds is the value print_int. Another argument to l. Which gives you your error.
The parser has no way of knowing that you had finished the code for the function l. In the top level the special token ;; is used to tell the parser that the input is finished and it should evaluate the code now. After that it starts paring the remaining input again.
Now why doesn't compiled code also have the ';;' token?
Simply because its not needed. In compiled code the line print_int(l [1;2;3;4]) is not valid input. That would be a statement you want to execute and functional languages have no such thing. Instead print_int(l [1;2;3;4]) is an expression that returns a value, () in this case, and you have to tell the compiler what to do with that value. A let () = tells the compiler to match it against (). And the let ... also tells the compiler that the previous let rec l ... has finished. So no special ;; token is needed.
Or think of it this way: In the top level there is an implicit let _ = if your input doesn't start with let. That way you can just type in some expression and see what it evaluates to without having to type let _ = every time. The ';;' token still means "evaluate now" though and is still needed.

Is there any difference between the terms line of code and statement?

Is there any difference between the terms 'line of code' and 'statement' in programming languages?
Yes, check following:
int a =0;
while(a<100)
{
cout<<
"Is it ok"<<
a <<
"is the current value of 'a'"<<endl;
a++;
}
Above code snippet is having:
Line of code: 9
Simple Statements: 3
Compound Statements: 1
Line of code is basically how many end points you are using. A Statement is the group of code that you produce to create an expected output.
For example in a conditional statement using if else, you can right that in multiple line of core or using only one line via Ternary method.
Sample:
if($var == 0) {
echo "this is zero";
} else {
echo "not a zero";
}
that basically created 5 line of code. In ternary you can go like this
$var == 0 ? echo "this is zero" : echo "not a zero";
As you see the result is the same only it is created in 1 line of code.
Hope this helps you moving forward
Lines of code presumably refers to lines of content in a source file, i.e. the number of lines in a file. On the other hand, a given statement of code can, and often does, exceed a single line. For example, consider the following Java statement from the Javadoc for streams:
int sum = widgets.stream()
.filter(b -> b.getColor() == RED)
.mapToInt(b -> b.getWeight())
.sum();
One statement spans four physical lines in the editor. However, we could inline the whole thing into a single line.

Huge number of calls to ANTLR38BitConsume library function

I'm writing a lexer for ASN1 using ANTLR3 for C target, using the version 3.4 of the library. Actually the lexer is really slow, so I performed a perf record on the execution, and I found that the bottleneck is the library function ANTLR38BitConsume, which is pretty simple.
static void
antlr38BitConsume(pANTLR3_INT_STREAM is)
{
pANTLR3_INPUT_STREAM input;
input = ((pANTLR3_INPUT_STREAM) (is->super));
if ((pANTLR3_UINT8)(input->nextChar) < (((pANTLR3_UINT8)input->data) + input->sizeBuf))
{
/* Indicate one more character in this line
*/
input->charPositionInLine++;
if ((ANTLR3_UCHAR)(*((pANTLR3_UINT8)input->nextChar)) == input->newlineChar)
{
/* Reset for start of a new line of input
*/
input->line++;
input->charPositionInLine = 0;
input->currentLine = (void *)(((pANTLR3_UINT8)input->nextChar) + 1);
}
/* Increment to next character position
*/
input->nextChar = (void *)(((pANTLR3_UINT8)input->nextChar) + 1);
}
}
After an investigation, I have figured out that on a file with size 1.4KB, this function, which should be called at most one per bytes I think, it's called 24.5M times. Do you know if this is a known issue or if there is another explanation for this weird behaviour?
EDITED
After some trials, I have figured out the rules which cause this issue. I have some rules in order to recognize specific object identifiers, which are very simple:
OID1 : {counter==6}?=> 2 3 4 5 840 {counter=0;}
and a rule which matches every byte in the content field of ASN1 encoding:
VALUE : ({counter>0}?=> '\u0000'..'\u00FF' {counter--;})+
Since the lexer uses Longest Matching rule, OID1 will never be matched, because VALUE can match an arbitrary long value. Hence, in order to fool the lexer, I edit OID1 rule in this way:
OID1 : {counter==6}?=> 2 3 4 5 840 {counter=0;} VALUE?
The VALUE token at the end of the rule will never be matched, since that rule is active only when counter is greater than 0, but in this way the lexer will consider also OID1 in the matching, because it can be longer than VALUE.
However, this rule caused that huge number of calls to BitConsume function! As soon as I delete the VALUE? part, I got a number of calls precisely equal to the number of bytes of input file.
I think that this is an implementation bug of ANTLR, since it should try to match the input following OID1 with VALUE, but it should immediately stop and skip the token because counter is 0.

Why an empty string seems emitted instead of a custom node, sometimes, in a Treetop grammar?

I'd like your advice concerning a recurring problem concerning my usage of Treetop,that I cannot fix...from time to time. I'm probably missing something.
I suspect many of you have the right idiom or habits to solve that.
I generally use Treetop like the following :
I define my grammar in a .tt file
I modify it to emit custom parse tree objets
(that inherit Treetop::Runtime::SyntaxNode). These classes are
defined in a "parsetree.rb" file.
these custom objects have a
to_ast method to turn them recursively into "pure"
Treetop-independent classes (that constitute my final AST). I have
two separate modules (ParseTree & AST) for that.
However, I hit a classical error message, that I cannot generally fix :
parsetree.rb:380:in `to_ast': undefined method `to_ast' for SyntaxNode
offset=149, "":Treetop::Runtime::SyntaxNode (NoMethodError)
I am puzzled here because an empty string "" seems to be emitted instead of one of my custom nodes.
In this example, on this line 380 I have the following code ( it is about a finite state machine)
# in parsetree.rb
class Next < Tree
def to_ast
ret=Ldl::Ast::Next.new
ret.name=ns.to_ast
if cond
ret.condition=cond.c.to_ast
end
ret.actions=acts.to_ast # <==== line 380
ret
end
end
class NextActions < Tree
def to_ast
eqs.elements.collect{|eq| eq.to_ast}
end
end
And my piece of grammar concerned by the error is :
rule nextstate
space? 'next' space ns:identifier space? cond:('?' space? c:expression)? space
acts:next_actions? <Ldl::ParseTree::Next>
end
rule next_actions
space? eqs:equation+ space 'end' space <Ldl::ParseTree::NextActions>
end
Your problem is with the behaviour of optional expressions.
The acts:next_actions is optional. If this optional element is unmatched in the input, you don't get a NextActions node but an epsilon. You should detect that by saying something like:
ret.actions = acts.empty? ? [] : acts.to_ast
The same problem may occur because the sequence named by the tag cond: is optional. If this sequence doesn't exist in the input, then it has no content "c". In that case "cond" will still be defined and your "if" statement will be true, but cond.c.to_ast will fail.
Short answer: When you use an optional expression, you should tag it and test the tag for empty? before trying to use the content.

What are the precise rules for when you can omit parenthesis, dots, braces, = (functions), etc.?

What are the precise rules for when you can omit (omit) parentheses, dots, braces, = (functions), etc.?
For example,
(service.findAllPresentations.get.first.votes.size) must be equalTo(2).
service is my object
def findAllPresentations: Option[List[Presentation]]
votes returns List[Vote]
must and be are both functions of specs
Why can't I go:
(service findAllPresentations get first votes size) must be equalTo(2)
?
The compiler error is:
"RestServicesSpecTest.this.service.findAllPresentations
of type
Option[List[com.sharca.Presentation]]
does not take parameters"
Why does it think I'm trying to pass in a parameter? Why must I use dots for every method call?
Why must (service.findAllPresentations get first votes size) be equalTo(2) result in:
"not found: value first"
Yet, the "must be equalTo 2" of
(service.findAllPresentations.get.first.votes.size) must be equalTo 2, that is, method chaining works fine? - object chain chain chain param.
I've looked through the Scala book and website and can't really find a comprehensive explanation.
Is it in fact, as Rob H explains in Stack Overflow question Which characters can I omit in Scala?, that the only valid use-case for omitting the '.' is for "operand operator operand" style operations, and not for method chaining?
You seem to have stumbled upon the answer. Anyway, I'll try to make it clear.
You can omit dot when using the prefix, infix and postfix notations -- the so called operator notation. While using the operator notation, and only then, you can omit the parenthesis if there is less than two parameters passed to the method.
Now, the operator notation is a notation for method-call, which means it can't be used in the absence of the object which is being called.
I'll briefly detail the notations.
Prefix:
Only ~, !, + and - can be used in prefix notation. This is the notation you are using when you write !flag or val liability = -debt.
Infix:
That's the notation where the method appears between an object and it's parameters. The arithmetic operators all fit here.
Postfix (also suffix):
That notation is used when the method follows an object and receives no parameters. For example, you can write list tail, and that's postfix notation.
You can chain infix notation calls without problem, as long as no method is curried. For example, I like to use the following style:
(list
filter (...)
map (...)
mkString ", "
)
That's the same thing as:
list filter (...) map (...) mkString ", "
Now, why am I using parenthesis here, if filter and map take a single parameter? It's because I'm passing anonymous functions to them. I can't mix anonymous functions definitions with infix style because I need a boundary for the end of my anonymous function. Also, the parameter definition of the anonymous function might be interpreted as the last parameter to the infix method.
You can use infix with multiple parameters:
string substring (start, end) map (_ toInt) mkString ("<", ", ", ">")
Curried functions are hard to use with infix notation. The folding functions are a clear example of that:
(0 /: list) ((cnt, string) => cnt + string.size)
(list foldLeft 0) ((cnt, string) => cnt + string.size)
You need to use parenthesis outside the infix call. I'm not sure the exact rules at play here.
Now, let's talk about postfix. Postfix can be hard to use, because it can never be used anywhere except the end of an expression. For example, you can't do the following:
list tail map (...)
Because tail does not appear at the end of the expression. You can't do this either:
list tail length
You could use infix notation by using parenthesis to mark end of expressions:
(list tail) map (...)
(list tail) length
Note that postfix notation is discouraged because it may be unsafe.
I hope this has cleared all the doubts. If not, just drop a comment and I'll see what I can do to improve it.
Class definitions:
val or var can be omitted from class parameters which will make the parameter private.
Adding var or val will cause it to be public (that is, method accessors and mutators are generated).
{} can be omitted if the class has no body, that is,
class EmptyClass
Class instantiation:
Generic parameters can be omitted if they can be inferred by the compiler. However note, if your types don't match, then the type parameter is always infered so that it matches. So without specifying the type, you may not get what you expect - that is, given
class D[T](val x:T, val y:T);
This will give you a type error (Int found, expected String)
var zz = new D[String]("Hi1", 1) // type error
Whereas this works fine:
var z = new D("Hi1", 1)
== D{def x: Any; def y: Any}
Because the type parameter, T, is inferred as the least common supertype of the two - Any.
Function definitions:
= can be dropped if the function returns Unit (nothing).
{} for the function body can be dropped if the function is a single statement, but only if the statement returns a value (you need the = sign), that is,
def returnAString = "Hi!"
but this doesn't work:
def returnAString "Hi!" // Compile error - '=' expected but string literal found."
The return type of the function can be omitted if it can be inferred (a recursive method must have its return type specified).
() can be dropped if the function doesn't take any arguments, that is,
def endOfString {
return "myDog".substring(2,1)
}
which by convention is reserved for methods which have no side effects - more on that later.
() isn't actually dropped per se when defining a pass by name paramenter, but it is actually a quite semantically different notation, that is,
def myOp(passByNameString: => String)
Says myOp takes a pass-by-name parameter, which results in a String (that is, it can be a code block which returns a string) as opposed to function parameters,
def myOp(functionParam: () => String)
which says myOp takes a function which has zero parameters and returns a String.
(Mind you, pass-by-name parameters get compiled into functions; it just makes the syntax nicer.)
() can be dropped in the function parameter definition if the function only takes one argument, for example:
def myOp2(passByNameString:(Int) => String) { .. } // - You can drop the ()
def myOp2(passByNameString:Int => String) { .. }
But if it takes more than one argument, you must include the ():
def myOp2(passByNameString:(Int, String) => String) { .. }
Statements:
. can be dropped to use operator notation, which can only be used for infix operators (operators of methods that take arguments). See Daniel's answer for more information.
. can also be dropped for postfix functions
list tail
() can be dropped for postfix operators
list.tail
() cannot be used with methods defined as:
def aMethod = "hi!" // Missing () on method definition
aMethod // Works
aMethod() // Compile error when calling method
Because this notation is reserved by convention for methods that have no side effects, like List#tail (that is, the invocation of a function with no side effects means that the function has no observable effect, except for its return value).
() can be dropped for operator notation when passing in a single argument
() may be required to use postfix operators which aren't at the end of a statement
() may be required to designate nested statements, ends of anonymous functions or for operators which take more than one parameter
When calling a function which takes a function, you cannot omit the () from the inner function definition, for example:
def myOp3(paramFunc0:() => String) {
println(paramFunc0)
}
myOp3(() => "myop3") // Works
myOp3(=> "myop3") // Doesn't work
When calling a function that takes a by-name parameter, you cannot specify the argument as a parameter-less anonymous function. For example, given:
def myOp2(passByNameString:Int => String) {
println(passByNameString)
}
You must call it as:
myOp("myop3")
or
myOp({
val source = sourceProvider.source
val p = myObject.findNameFromSource(source)
p
})
but not:
myOp(() => "myop3") // Doesn't work
IMO, overuse of dropping return types can be harmful for code to be re-used. Just look at specification for a good example of reduced readability due to lack of explicit information in the code. The number of levels of indirection to actually figure out what the type of a variable is can be nuts. Hopefully better tools can avert this problem and keep our code concise.
(OK, in the quest to compile a more complete, concise answer (if I've missed anything, or gotten something wrong/inaccurate please comment), I have added to the beginning of the answer. Please note this isn't a language specification, so I'm not trying to make it exactly academically correct - just more like a reference card.)
A collection of quotes giving insight into the various conditions...
Personally, I thought there'd be more in the specification. I'm sure there must be, I'm just not searching for the right words...
There are a couple of sources however, and I've collected them together, but nothing really complete / comprehensive / understandable / that explains the above problems to me...:
"If a method body has more than one
expression, you must surround it with
curly braces {…}. You can omit the
braces if the method body has just one
expression."
From chapter 2, "Type Less, Do More", of Programming Scala:
"The body of the upper method comes
after the equals sign ‘=’. Why an
equals sign? Why not just curly braces
{…}, like in Java? Because semicolons,
function return types, method
arguments lists, and even the curly
braces are sometimes omitted, using an
equals sign prevents several possible
parsing ambiguities. Using an equals
sign also reminds us that even
functions are values in Scala, which
is consistent with Scala’s support of
functional programming, described in
more detail in Chapter 8, Functional
Programming in Scala."
From chapter 1, "Zero to Sixty: Introducing Scala", of Programming Scala:
"A function with no parameters can be
declared without parentheses, in which
case it must be called with no
parentheses. This provides support for
the Uniform Access Principle, such
that the caller does not know if the
symbol is a variable or a function
with no parameters.
The function body is preceded by "="
if it returns a value (i.e. the return
type is something other than Unit),
but the return type and the "=" can be
omitted when the type is Unit (i.e. it
looks like a procedure as opposed to a
function).
Braces around the body are not
required (if the body is a single
expression); more precisely, the body
of a function is just an expression,
and any expression with multiple parts
must be enclosed in braces (an
expression with one part may
optionally be enclosed in braces)."
"Functions with zero or one argument
can be called without the dot and
parentheses. But any expression can
have parentheses around it, so you can
omit the dot and still use
parentheses.
And since you can use braces anywhere
you can use parentheses, you can omit
the dot and put in braces, which can
contain multiple statements.
Functions with no arguments can be
called without the parentheses. For
example, the length() function on
String can be invoked as "abc".length
rather than "abc".length(). If the
function is a Scala function defined
without parentheses, then the function
must be called without parentheses.
By convention, functions with no
arguments that have side effects, such
as println, are called with
parentheses; those without side
effects are called without
parentheses."
From blog post Scala Syntax Primer:
"A procedure definition is a function
definition where the result type and
the equals sign are omitted; its
defining expression must be a block.
E.g., def f (ps) {stats} is
equivalent to def f (ps): Unit =
{stats}.
Example 4.6.3 Here is a declaration
and a de?nition of a procedure named
write:
trait Writer {
def write(str: String)
}
object Terminal extends Writer {
def write(str: String) { System.out.println(str) }
}
The code above is implicitly completed
to the following code:
trait Writer {
def write(str: String): Unit
}
object Terminal extends Writer {
def write(str: String): Unit = { System.out.println(str) }
}"
From the language specification:
"With methods which only take a single
parameter, Scala allows the developer
to replace the . with a space and omit
the parentheses, enabling the operator
syntax shown in our insertion operator
example. This syntax is used in other
places in the Scala API, such as
constructing Range instances:
val firstTen:Range = 0 to 9
Here again, to(Int) is a vanilla
method declared inside a class
(there’s actually some more implicit
type conversions here, but you get the
drift)."
From Scala for Java Refugees Part 6: Getting Over Java:
"Now, when you try "m 0", Scala
discards it being a unary operator, on
the grounds of not being a valid one
(~, !, - and +). It finds that "m" is
a valid object -- it is a function,
not a method, and all functions are
objects.
As "0" is not a valid Scala
identifier, it cannot be neither an
infix nor a postfix operator.
Therefore, Scala complains that it
expected ";" -- which would separate
two (almost) valid expressions: "m"
and "0". If you inserted it, then it
would complain that m requires either
an argument, or, failing that, a "_"
to turn it into a partially applied
function."
"I believe the operator syntax style
works only when you've got an explicit
object on the left-hand side. The
syntax is intended to let you express
"operand operator operand" style
operations in a natural way."
Which characters can I omit in Scala?
But what also confuses me is this quote:
"There needs to be an object to
receive a method call. For instance,
you cannot do “println “Hello World!”"
as the println needs an object
recipient. You can do “Console
println “Hello World!”" which
satisfies the need."
Because as far as I can see, there is an object to receive the call...
I find it easier to follow this rule of thumb: in expressions spaces alternate between methods and parameters. In your example, (service.findAllPresentations.get.first.votes.size) must be equalTo(2) parses as (service.findAllPresentations.get.first.votes.size).must(be)(equalTo(2)). Note that the parentheses around the 2 have a higher associativity than the spaces. Dots also have higher associativity, so (service.findAllPresentations.get.first.votes.size) must be.equalTo(2)would parse as (service.findAllPresentations.get.first.votes.size).must(be.equalTo(2)).
service findAllPresentations get first votes size must be equalTo 2 parses as service.findAllPresentations(get).first(votes).size(must).be(equalTo).2.
Actually, on second reading, maybe this is the key:
With methods which only take a single
parameter, Scala allows the developer
to replace the . with a space and omit
the parentheses
As mentioned on the blog post: http://www.codecommit.com/blog/scala/scala-for-java-refugees-part-6 .
So perhaps this is actually a very strict "syntax sugar" which only works where you are effectively calling a method, on an object, which takes one parameter. e.g.
1 + 2
1.+(2)
And nothing else.
This would explain my examples in the question.
But as I said, if someone could point out to be exactly where in the language spec this is specified, would be great appreciated.
Ok, some nice fellow (paulp_ from #scala) has pointed out where in the language spec this information is:
6.12.3:
Precedence and associativity of
operators determine the grouping of
parts of an expression as follows.
If there are several infix operations in an expression, then
operators with higher precedence bind
more closely than operators with lower
precedence.
If there are consecutive infix operations e0 op1 e1 op2 . . .opn en
with operators op1, . . . , opn of the
same precedence, then all these
operators must have the same
associativity. If all operators are
left-associative, the sequence is
interpreted as (. . . (e0 op1 e1) op2
. . .) opn en. Otherwise, if all
operators are rightassociative, the
sequence is interpreted as e0 op1 (e1
op2 (. . .opn en) . . .).
Postfix operators always have lower precedence than infix operators. E.g.
e1 op1 e2 op2 is always equivalent to
(e1 op1 e2) op2.
The right-hand operand of a
left-associative operator may consist
of several arguments enclosed in
parentheses, e.g. e op (e1, . . .
,en). This expression is then
interpreted as e.op(e1, . . . ,en).
A left-associative binary operation e1
op e2 is interpreted as e1.op(e2). If
op is rightassociative, the same
operation is interpreted as { val
x=e1; e2.op(x ) }, where x is a fresh
name.
Hmm - to me it doesn't mesh with what I'm seeing or I just don't understand it ;)
There aren't any. You will likely receive advice around whether or not the function has side-effects. This is bogus. The correction is to not use side-effects to the reasonable extent permitted by Scala. To the extent that it cannot, then all bets are off. All bets. Using parentheses is an element of the set "all" and is superfluous. It does not provide any value once all bets are off.
This advice is essentially an attempt at an effect system that fails (not to be confused with: is less useful than other effect systems).
Try not to side-effect. After that, accept that all bets are off. Hiding behind a de facto syntactic notation for an effect system can and does, only cause harm.

Resources