Printing the incoming token in the antlr3 grammar?

I am using antlr3 and have my parser and lexer files. I want to print the incoming token (as given by the user's input) in the parser file. I have tried input.LT(1).
It prints the input token nicely, but then I can't use this lookahead token for further analysis. So is there any other command or instruction that can print the incoming tokens?
(For example, if my input is 1+2 and my token for '+' is PLUS, then I must print '+', not 'PLUS'.)

To get the actual text of the token ("+"), use token.getText().
To get the text representing the token name ("PLUS"), you'll have to ask the parser: <YourParserClass>.tokenNames[token.getType()].
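For example (a minimal sketch, assuming an ANTLR3-generated Java parser named MyParser, with org.antlr.runtime.Token visible inside the action):
// inside a parser rule action: LT(1) peeks at the next token without consuming it
Token t = input.LT(1);
System.out.println(t.getText());                      // prints "+"
System.out.println(MyParser.tokenNames[t.getType()]); // prints "PLUS"
Since LT(1) only peeks at the token stream, the token is still available for the normal match that follows.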

Related

JMeter: how to assert a randomly generated customer ID that contains special characters

I have an API that returns a customer ID value, which is generated randomly and contains special characters.
I need to assert this customer ID using a Response Assertion, but due to the special characters the comparison fails.
Example:
API body:
{ Customer_ID : "rzrzlk#kad9$l11zr#zz9dr1" }
My response assertion:
"data":[{"customer_id":"rzrzlk#kad9$l11zr#zz9dr1",
Result:
Assertion failed
I know I should use a backslash, but when you don't know where to use it inside the values, it is useless.
You should use a backslash only if you're escaping regular expression meta characters.
If you just need to ensure that the response contains the given value, use the Substring mode:
Contains and Matches modes expect a regular expression;
Equals and Substring modes expect a plain-text string.
More information: How to Use JMeter Assertions in Three Easy Steps
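To see why the regex-based modes fail on this value, here is a minimal Java sketch (JMeter's Contains/Matches modes use Perl5-style regexes; java.util.regex behaves the same way for these meta characters, and the class name is invented):

import java.util.regex.Pattern;

public class AssertionDemo {
    public static void main(String[] args) {
        String id = "rzrzlk#kad9$l11zr#zz9dr1";
        String body = "{\"data\":[{\"customer_id\":\"" + id + "\"}]}";

        // Contains mode: the value is treated as a regex, so '$' acts as an
        // end-of-input anchor in the middle of the pattern and the match fails
        System.out.println(Pattern.compile(id).matcher(body).find()); // false

        // Substring mode compares plain text; quoting the value has the same effect
        System.out.println(body.contains(id));                                        // true
        System.out.println(Pattern.compile(Pattern.quote(id)).matcher(body).find()); // true
    }
}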

"Shell Command Language" document - a counterintuitive sentence in "Token Recognition" section

In section 2.3 ("Token Recognition") of this document: https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/utilities/V3_chap02.html
the following sentence appears:
"If it is indicated that a token is delimited, and no characters have been included in a token, processing shall continue until an actual token is delimited"
What's this supposed to mean? How can a token be marked as delimited before any character has been included in it?
That sentence does appear ambiguous. Reworded, it says: "If a token is empty, processing continues until a non-empty token is given."
A token is 'marked as delimited' when its beginning and end are known. So from the last delimiter to the newly found delimiter.
A token can be empty if there are 2 delimiters next to each other.
For example, say the delimiter is comma.
fashl,gasdf,agasdf,aasdf,,ghask
Of the six tokens, the fifth token is empty. Its beginning and end are established by the fourth and fifth commas, therefore it is 'delimited'. But no characters were included in it.
However, the sentence does go on to be ambiguous: 'processing shall continue' does not specify what happens to the empty token, or rather, which logical action or path should be taken. So it could mean either:
If an empty token is found, continue to read until the next delimiter and consider that the token.
If an empty token is found, ignore the token and continue reading.
Though in the end, the two readings may not make a practical difference.
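A small, hypothetical Java sketch contrasting the two readings (the delimiter is a comma, as in the example above):

import java.util.ArrayList;
import java.util.List;

public class TokenDemo {
    // splits on ','; skipEmpty selects between the two readings of the spec
    static List<String> tokenize(String s, boolean skipEmpty) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == ',') {                        // a token is delimited here
                if (current.length() > 0 || !skipEmpty) tokens.add(current.toString());
                current.setLength(0);              // if empty and skipped, just keep reading
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0 || !skipEmpty) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        String input = "fashl,gasdf,agasdf,aasdf,,ghask";
        System.out.println(tokenize(input, false)); // [fashl, gasdf, agasdf, aasdf, , ghask]
        System.out.println(tokenize(input, true));  // [fashl, gasdf, agasdf, aasdf, ghask]
    }
}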

Parslet: How to Buffer/parse incrementally data

I'm writing an HTTP/1 response parser with parslet. It works, but only when I send the full payload.
I have something like this:
rule(:response) {
  response_line >> crlf >>
  header.repeat.as(:headers) >> crlf >>
  data.as(:data)
}
root :response
But if I pass an incomplete payload, I get:
parser.parse("HTTP/1.1 200 OK\r\n")
#=> Parslet::ParseFailed: Failed to match sequence (RESPONSE_LINE CRLF headers:(HEADER{0, }) CRLF data:DATA) at line 1 char 16.
I'd like to be able to feed bytes to the parser without failing, at least if they don't break the expectations. Is there a way to somehow "buffer" until some rule is broken, or all expectations are met?
Parslet matches a grammar against a whole document.
If you want it to parse a partial document, you need to define your grammar such that the missing parts are optional.
One approach is to define a grammar that matches any one element of the header, plus a 'the_rest' capture group that matches 'any.repeat'.
Then you can call the parser again each time you receive more of the document, passing "the rest" plus anything more you have read in.
Each call returns one more part of the header.
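A rough Parslet sketch of that idea (the rule names and the source of incoming bytes are hypothetical; this only peels off complete CRLF-terminated lines and keeps the unparsed tail):

require 'parslet'

class PartialResponse < Parslet::Parser
  rule(:crlf)     { str("\r\n") }
  rule(:line)     { match['^\r\n'].repeat(1).as(:line) >> crlf }
  rule(:the_rest) { any.repeat.as(:the_rest) }
  rule(:partial)  { line.repeat.as(:lines) >> the_rest }
  root :partial
end

buffer = ""
parser = PartialResponse.new
# each time more bytes arrive (e.g. from a socket), append and re-parse:
#   buffer << chunk
#   tree = parser.parse(buffer)
#   consume tree[:lines], then keep tree[:the_rest].to_s as the new buffer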

how to test my Grammar antlr4 successfully? [duplicate]

I have started using ANTLR and have noticed that it is pretty fickle with its lexer rules. An extremely frustrating example is the following:
grammar output;
test: FILEPATH NEWLINE TITLE ;
FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\\'|'/'|' '|'-'|'_'|'.')+ ;
NEWLINE: '\r'? '\n' ;
TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;
This grammar will not match something like:
c:\test.txt
x
Oddly, if I change TITLE to TITLE: 'x' ; it still fails, this time giving an error message saying "mismatched input 'x' expecting 'x'", which is highly confusing. Even more oddly, if I replace the usage of TITLE in test with FILEPATH, the whole thing works (although FILEPATH will match more than I am looking to match, so in general it isn't a valid solution for me).
I am highly confused as to why ANTLR is giving such extremely strange errors and then suddenly working for no apparent reason when shuffling things around.
This seems to be a common misunderstanding of ANTLR:
Language Processing in ANTLR:
Language processing is done in two strictly separated phases:
Lexing, i.e. partitioning the text into tokens
Parsing, i.e. building a parse tree from the tokens
Since lexing must precede parsing, there is a consequence: the lexer is independent of the parser; the parser cannot influence lexing.
Lexing
Lexing in ANTLR works as follows:
all rules with an uppercase first character are lexer rules
the lexer starts at the beginning of the input and tries to find the rule that best matches the current input
a best match is a match of maximum length, i.e. the token that results from appending the next input character to the maximum-length match is not matched by any lexer rule
tokens are generated from matches:
if exactly one rule yields the maximum-length match, the corresponding token is pushed into the token stream
if multiple rules yield the maximum-length match, the token defined first in the grammar is pushed into the token stream
Example: What is wrong with your grammar
Your grammar has two rules that are critical:
FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\\'|'/'|' '|'-'|'_'|'.')+ ;
TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;
Every string matched by TITLE will also be matched by FILEPATH, and FILEPATH is defined before TITLE: so each token that you expect to be a TITLE will come out as a FILEPATH.
A few hints for dealing with that:
keep your lexer rules disjunct (no token should match a superset of another);
if your tokens intentionally match the same strings, put them into the right order (in your case this will be sufficient; see the sketch after this list);
if you need a parser-driven lexer, you have to switch to another parser generator: PEG parsers or GLR parsers will do that (but of course this can introduce other problems).
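For the grammar in the question, the reordering fix would look like this (a sketch; note the trade-off that a purely alphabetic path such as "test" would now lex as TITLE):

grammar output;
test: FILEPATH NEWLINE TITLE ;
TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;
FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\\'|'/'|' '|'-'|'_'|'.')+ ;
NEWLINE: '\r'? '\n' ;

Now "c:\test.txt" still lexes as FILEPATH (it is the longer match), while the single "x" matches both rules at the same length, and TITLE wins because it is defined first.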
This was not directly the OP's problem, but for those who get the same error message, here is something you could check.
I had the same vague mismatched input 'x' expecting 'x' error message when I introduced a new keyword. The reason was that I had placed the new keyword after my VARNAME lexer rule, so it was lexed as a variable name instead of as the new keyword. I fixed it by putting the keywords before the VARNAME rule.
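A hypothetical fragment of that ordering (rule names invented for illustration):

// keywords first, so 'loop' is lexed as a keyword ...
LOOP    : 'loop' ;
// ... and only everything else falls through to the identifier rule
VARNAME : [a-zA-Z_] [a-zA-Z0-9_]* ;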

Programming idiom to parse a string in multiple-passes

I'm working on a Braille translation library, and I need to translate a string of text into braille. I plan to do this in multiple passes, but I need a way to keep track of which parts of the string have been translated and which have not, so I don't retranslate them.
I could always create a class which would track the ranges of positions in the string which had been processed, and then design my search/replace algorithm to ignore them on subsequent passes, but I'm wondering if there isn't a more elegant way to accomplish the same thing.
I would imagine that multi-pass string translation isn't all that uncommon, I'm just not sure what the options are for doing it.
A more usual approach would be to tokenize your input, then work on the tokens. For example, start by tokenizing the string into a token for each character. Then, in a first pass generate a straightforward braille mapping, token by token. In subsequent passes, you can replace more of the tokens - for example, by replacing sequences of input tokens with a single output token.
Because your tokens are objects or structs, rather than simple characters, you can attach additional information to each - such as the source token(s) you translated (or rather, transliterated) the current token from.
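A minimal Java sketch of that token structure (all names are hypothetical, not any particular library's API; the braille cell in main is illustrative):

import java.util.ArrayList;
import java.util.List;

public class BrailleDemo {
    static final class Tok {
        final String text;        // current output text
        final String source;      // original input this token came from
        final boolean translated; // true once a pass has handled it
        Tok(String text, String source, boolean translated) {
            this.text = text; this.source = source; this.translated = translated;
        }
    }

    // first pass: one token per input character, nothing translated yet
    static List<Tok> tokenize(String input) {
        List<Tok> out = new ArrayList<>();
        for (char c : input.toCharArray())
            out.add(new Tok(String.valueOf(c), String.valueOf(c), false));
        return out;
    }

    // a later pass: collapse an untranslated run equal to 'from' into one token,
    // skipping tokens already marked translated so they are never retranslated
    static List<Tok> replace(List<Tok> in, String from, String to) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < in.size(); ) {
            if (matches(in, i, from)) {
                out.add(new Tok(to, from, true));
                i += from.length();
            } else {
                out.add(in.get(i++));
            }
        }
        return out;
    }

    static boolean matches(List<Tok> in, int i, String from) {
        if (i + from.length() > in.size()) return false;
        for (int k = 0; k < from.length(); k++) {
            Tok t = in.get(i + k);
            if (t.translated || !t.text.equals(String.valueOf(from.charAt(k)))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Tok> toks = replace(tokenize("the cat"), "the", "\u282E"); // UEB "the" groupsign
        for (Tok t : toks) System.out.print(t.text); // prints the cell followed by " cat"
    }
}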
Check out some basic compiler theory:
Lexical Analysis
Parsing/Syntax Analysis
