Strange behaviour of ANTLR3 - antlr3

Why does the grammar presented in this answer https://stackoverflow.com/a/1932664/5613768 accept an expression like 2(38)? I understand why 12*(5-6) is accepted and why 12*(5-6 is not, but I can't explain this behaviour.

It doesn't accept the entire input. It stops parsing after the 2 because the eval rule:
eval
: additionExp
;
matches 2 as an additionExp and then stops, since the rest of the input cannot be matched.
If you "anchor" the eval rule so that it must consume the entire token stream like this:
eval
: additionExp EOF
;
you will see an error on your console.

Related

Unexpected behavior with ANTLR3

I am experiencing an unexpected behavior with ANTLR3. This is my grammar:
grammar Onto;
****parser rules****
predicate
: VERB
;
****lexer rules****
VERB
: 'VB' WS
;
PREPOSITION
: 'TO' WS
;
WS
: (' ' | '\t' | '\r'| '\n')
;
When I parse the string "VB TO", ANTLR3 exits without flagging an error. This is unexpected because the given string does not match any rule in the grammar.
However when I retry the same after removing the PREPOSITION rule from the grammar, ANTLR3 flags the following error which is the expected result:
line 1:3 no viable alternative at character 'T'
line 1:4 no viable alternative at character 'O'
You made a classic mistake. Your main rule has no EOF at the end, so your parser matches only part of your input and treats that as valid. In your case it matches VERB and then expects nothing more. The PREPOSITION rule matching your "TO" input is part of this behavior, because the lexer returns a PREPOSITION token to the parser. But since the parser is already satisfied with the VERB input, it considers the parse successful.
Without the PREPOSITION lexer rule, however, the lexer cannot match that input and reports an error, which is what the messages above are about.

How does : <<'END' work in bash to create a multi-line comment block?

I found a great answer for how to comment in bash script (by @sunny256):
#!/bin/bash
echo before comment
: <<'END'
bla bla
blurfl
END
echo after comment
The ' and ' around the END delimiter are important, otherwise things inside the block like for example $(command) will be parsed and executed.
This may be ugly, but it works and I'm keen to know what it means. Can anybody explain it simply? I did already find an explanation for : that it is no-op or true. But it does not make sense to me to call no-op or true anyway....
I'm afraid this explanation is less "simple" and more "thorough", but here we go.
The goal of a comment is to be text that is not interpreted or executed as code.
Originally, the UNIX shell did not have a comment syntax per se. It did, however, have the null command : (once an actual binary program on disk, /bin/:), which ignores its arguments and does nothing but indicate successful execution to the calling shell. Effectively, it's a synonym for true that looks like punctuation instead of a word, so you could put a line like this in your script:
: This is a comment
It's not quite a traditional comment; it's still an actual command that the shell executes. But since the command doesn't do anything, surely it's close enough: mission accomplished! Right?
The problem is that the line is still treated as a command beyond simply being run as one. Most importantly, lexical analysis - parameter substitution, word splitting, and such - still takes place on those destined-to-be-ignored arguments. Such processing means you run the risk of a syntax error in a "comment" crashing your whole script:
: Now let's see what happens next
echo "Hello, world!"
#=> hello.sh: line 1: unexpected EOF while looking for matching `''
That problem led to the introduction of a genuine comment syntax: the now-familiar # (which was first introduced in the C shell created at BSD). Everything from # to the end of the line is completely ignored by the shell, so you can put anything you like there without worrying about syntactic validity:
# Now let's see what happens next
echo "Hello, world!"
#=> Hello, world!
And that's How The Shell Got Its Comment Syntax.
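To make the difference concrete, here is a minimal sketch (the variable names counter and other are made up for illustration) showing that the arguments of : are still expanded, side effects included, while text after # is never looked at:

```shell
# Arguments to ':' are expanded before ':' ignores them,
# so an expansion with a side effect still takes place.
unset counter
: "${counter:=5}"          # assigns 5 to counter as a side effect
echo "counter=$counter"

# Text after '#' is discarded by the shell, so nothing is expanded.
unset other
# "${other:=5}"
echo "other=${other:-unset}"
```

The first echo should print counter=5 and the second other=unset.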
However, you were looking for a multi-line (block) comment, of the sort introduced by /* (and terminated by */) in C or Java. Unfortunately, the shell simply does not have such a syntax. The normal way to comment out a block of consecutive lines - and the one I recommend - is simply to put a # in front of each one. But that is admittedly not a particularly "multi-line" approach.
Since the shell supports multi-line string-literals, you could just use : with such a string as an argument:
: 'So
this is all
a "comment"
'
But that has all the same problems as single-line :. You could also use backslashes at the end of each line to build a long command line with multiple arguments instead of one long string, but that's even more annoying than putting a # at the front, and more fragile since trailing whitespace breaks the line-continuation.
The solution you found uses what is called a here-document. The syntax some-command <<whatever causes the following lines of text - from the line immediately after the command, up to but not including the next line containing only the text whatever - to be read and fed as standard input to some-command. Here's an alternate shell implementation of "Hello, world" which takes advantage of this feature:
cat <<EOF
Hello, world
EOF
If you replace cat with our old friend :, you'll find that it ignores not only its arguments but also its input: you can feed whatever you want to it, and it will still do nothing (and still indicate that it did that nothing successfully).
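A small sketch of that claim, assuming nothing beyond a POSIX-style shell (colon_status is just an illustrative variable name):

```shell
# ':' is handed the here-document on standard input,
# reads none of it, prints nothing, and exits successfully.
: <<EOF
This text is fed to the null command and discarded.
EOF
colon_status=$?
echo "exit status: $colon_status"
```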
However, the contents of a here-document do undergo string processing. So just as with the single-line : comment, the here-document version runs the risk of syntax errors inside what is not meant to be executable code:
#!/bin/sh -e
: <<EOF
(This is a backtick: `)
EOF
echo 'In modern shells, $(...) is preferred over backticks.'
#=> ./demo.sh: line 2: bad substitution: no closing "`" in `
The solution, as seen in the code you found, is to quote the end-of-document "sentinel" (the EOF or END or whatever) on the line introducing the here document (e.g. <<'EOF'). Doing this causes the entire body of the here-document to be treated as literal text - no parameter expansion or other processing occurs. Instead, the text is fed to the command unchanged, just as if it were being read from a file. So, other than a line consisting of nothing but the sentinel, the here-document can contain any characters at all:
#!/bin/sh -e
: <<'EOF'
(This is a backtick: `)
EOF
echo 'In modern shells, $(...) is preferred over backticks.'
#=> In modern shells, $(...) is preferred over backticks.
(It is worth noting that the way you quote the sentinel doesn't matter - you can use <<'EOF', <<E"OF", or even <<EO\F; all have the same result. This is different from the way here-documents work in some other languages, such as Perl and Ruby, where the content is treated differently depending on the way the sentinel is quoted.)
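The effect of quoting the sentinel can be checked directly. The following sketch uses read instead of : so the body can be inspected afterwards; the variable names are illustrative only:

```shell
name=world

# Unquoted sentinel: the body undergoes parameter expansion.
IFS= read -r unquoted <<EOF
Hello, $name
EOF

# Quoted sentinel: the body is taken literally.
IFS= read -r quoted <<'EOF'
Hello, $name
EOF

# Quoting only part of the sentinel has the same effect.
IFS= read -r partly_quoted <<E"OF"
Hello, $name
EOF

echo "unquoted:      $unquoted"
echo "quoted:        $quoted"
echo "partly quoted: $partly_quoted"
```

Only the first variable should end up containing the expanded value; the other two keep the literal text $name.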
Notwithstanding any of the above, I strongly recommend that you instead just put a # at the front of each line you want to comment out. Any decent code editor will make that operation easy - even plain old vi - and the benefit is that nobody reading your code will have to spend energy figuring out what's going on with something that is, after all, intended to be documentation for their benefit.
It is called a Here Document. It is a block of text that the shell feeds to another command or program as its standard input.
The string following the << is the marker determining the end of the block. If you send commands to no-op, nothing happens, which is why you can use it as a comment block.
That's heredoc syntax. It's a way of defining multi-line string literals.
As the answer at your link explains, the single quotes around the END disable interpolation, similar to the way single-quoted strings disable interpolation in regular bash strings.

How Does Bash Tokenize Scripts?

Coming from C++, it always seems like magic to me that some whitespace has an effect on the validity or semantics of the script. Here's an example:
echo a 2 > &1
bash: syntax error near unexpected token `&'
echo a 2 >&1
a 2
echo a 2>&1
a
echo a 2>& 1
a
Looking at this didn't help much. My main problem is that it does not feel consistent; and I am in a state of confusion.
I'm trying to find out how bash tokenizes its scripts. A general description thereof to clear up any confusion would be appreciated.
Edit:
I am NOT looking for redirections specifically. They just came up as example. Other examples:
A="something"
A = "something"
if [$x = $y];
if [ $x = $y ];
Why isn't there a space necessary between ] and ;? Why does assignment require an immediate equal sign? ...
2>&1 is a redirection operator with a file descriptor number attached, so whitespace between the 2 and the > changes the meaning of the command. The shell further breaks the operator down to determine what exactly it does. The general form is n>&m, where n is the file descriptor you are redirecting and m is the descriptor being duplicated. In this case, you are saying that the standard error (2) of the command should go to wherever standard output (1) currently points.
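The cases from the question can be captured in variables to see which words echo actually receives. This is only a sketch; the variable names are made up:

```shell
# '2' directly before '>&' is the file descriptor number of the redirection.
attached=$(echo a 2>&1)

# With a space, '2' is an ordinary argument and '>&1' redirects FD 1.
separated=$(echo a 2 >&1)

# Whitespace after '>&' is allowed; '2' is still the descriptor number.
space_after=$(echo a 2>& 1)

echo "attached:    [$attached]"
echo "separated:   [$separated]"
echo "space after: [$space_after]"
```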
The examples you gave have the behavior they do for good reason.
Redirection sources default to FD 1. Thus, >&1 is legitimate syntax on its own -- it redirects FD 1 to FD 1 -- meaning allowing whitespace before the > would result in an ambiguous syntax: The parser couldn't tell if the preceding token was its own word or a redirection source.
Nothing other than a FD number is valid under >&, unless you're in a very new bash which allows a variable to be dereferenced to retrieve a FD number. In any event, anything immediately following >& is known to be a file descriptor, so allowing optional whitespace creates no ambiguity there.
a = 1 is parsed as a legitimate command, not a syntax error: It runs the command a with the first argument = and the second argument 1. Disallowing whitespace within assignments eliminates this ambiguity. Similarly, a= foo has a separate and distinct meaning: It exports an environment variable a with an empty value while running the command foo. Relaxing the whitespace rules would disallow both of these legitimate commands.
[ is a command, not special syntax known to the parser; thus, [foo tries to find a command (named, say, /usr/bin/[foo), requiring whitespace.
; takes precedence in the parser as a statement separator, rather than being treated as part of a word, unless quoted or escaped. The same is true of & (another separator), or a newline.
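A short sketch of the assignment and [ points above, assuming sh is available on the PATH (child_view, parent_view and verdict are illustrative names):

```shell
# 'a= command' puts an empty variable 'a' into the environment of that
# one command only; the current shell is left untouched.
unset a
child_view=$(a= sh -c 'echo "${a+set}:[$a]"')
parent_view=${a+set}
echo "child sees:  $child_view"
echo "parent sees: ${parent_view:-unset}"

# '[' is a command, so it needs whitespace around its arguments,
# while ';' ends the command even without a preceding space.
x=1
y=1
if [ "$x" = "$y" ]; then verdict=equal; else verdict=different; fi
echo "verdict: $verdict"
```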
The thing is, there's no single general rule which will explain all this; you need to read and learn the language syntax. Fortunately, there's not very much syntax: Almost all commands are "simple commands", which follow very simple and clear rules. You're asking about, and we're explaining, some of the exceptions to that; there are other exceptions, such as [[ ]] in bash, but they're small enough in total that they can be learned.
Other suggested resources:
http://aosabook.org/en/bash.html (The Architecture of Open Source Applications; chapter on bash)
http://mywiki.wooledge.org/BashParser (Wooledge wiki high-level description of the parser -- though this focuses more on expansion rules than tokenization)
http://mywiki.wooledge.org/BashGuide (an introductory guide to bash syntax in general, written with more of a focus on accuracy and best practices than some competing materials).

Kornshell variable definition: What is ?FOO?

I found a line of code in a kornshell script:
foo=`basename ?BAR?`
What do the question marks mean?
Thank you
touch BAR ABAR ABARZ
ls ?BAR?
ABARZ
? is normally a shell wildcard char that matches exactly 1 character; that position cannot be empty, as shown in the example above. It's like a 1-char version of '*'. Notice that if you change to
ls ?BAR*
You get output like
ABAR ABARZ
Your code shows the same behavior
foo=$(basename ?BAR?)
echo $foo
ABARZ
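If you want to try this without cluttering your current directory, here is a sketch that does the same thing in a scratch directory (it assumes mktemp is available; the variable names are illustrative):

```shell
# Create the three files in a throwaway directory.
dir=$(mktemp -d)
touch "$dir/BAR" "$dir/ABAR" "$dir/ABARZ"

# '?' must match exactly one character on each side of BAR.
question_both=$(cd "$dir" && echo ?BAR?)

# '*' may also match nothing, so ABAR qualifies as well.
question_star=$(cd "$dir" && echo ?BAR*)

rm -rf "$dir"

echo "?BAR? -> $question_both"
echo "?BAR* -> $question_star"
```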
Does that make sense? Not really. Given the small amount of context you have provided, the other possible interpretation is that the original script writer is using ?BAR? as a placeholder, telling you to "change this to a real/meaningful value".
Others may have other ideas.
IHTH

confusing case of a bash completion script

I'm having trouble understanding what the following code does in a bash completion script:
case "$last" in
+\(--import|-i\))
_filedir '+(txt|html)';;
When is that case ever met? I thought the second line above would be something like
--import|-i)
which does make sense to me. I grepped my bash_completion.d directory for '+\\(' but that one was the only one that came up so I guess it's not that common.
This code is indeed puzzling without context. As it is, it matches two literal strings -
$ case "+(--import" in +\(--import|-i\)) echo match ;; esac
match
$ case "-i)" in +\(--import|-i\)) echo match ;; esac
match
It looks similar to the extended glob pattern +(--import|-i), but in this form it's neither a match for the literal pattern (would need to escape the pipe) nor the actual pattern (would need to unescape the parentheses). I'd guess "bug", but bash completion is a minefield of crazy metaprogramming, so it's impossible to say without seeing the entire script.
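To see when that branch is taken, the pattern can be wrapped in a small helper (classify is a hypothetical name, not part of the completion script) and fed the plausible values of $last:

```shell
# Hypothetical helper that reuses the pattern from the question verbatim.
classify() {
  case "$1" in
    +\(--import|-i\)) echo "match" ;;
    *)                echo "no match" ;;
  esac
}

long_option=$(classify "--import")     # what the author probably intended
short_option=$(classify "-i")
odd_long=$(classify "+(--import")      # literal first alternative
odd_short=$(classify "-i)")            # literal second alternative

echo "--import   -> $long_option"
echo "-i         -> $short_option"
echo "+(--import -> $odd_long"
echo "-i)        -> $odd_short"
```

With the parentheses escaped, the real option names fall through to the default branch, and only the two odd literal strings match.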
From bash(1)
If the extglob shell option is enabled using the shopt builtin,
several extended pattern matching operators are recognized. In the
following description, a pattern-list is a list of one or more
patterns separated by a |. Composite patterns may be formed using one
or more of the following sub-patterns:
[...]
+(pattern-list)
Matches one or more occurrences of the given patterns
