Antlr3 behaves differently in 2 different rules when the rule's intent is effectively same - antlr3

While working with Antlr3 grammar, I have come across a situation when the rule's intent is effectively same but behaves differently.
I have created a small example-
I want to parse a qualified object name which may be 3-part or 2-part or unqualified (Dot is the separator).
Test Input-
1. SCH.LIB.TAB1;
2. LIB.TAB1;
3. TAB1;
I changed the below rule from having optionals to having alternatives (ORed rules).
Before State-
qualified_object_name
:
( identifier ( ( DOT identifier )? DOT identifier )? )
;
After State-
qualified_object_name_new
:
( identifier DOT identifier DOT identifier ) // 3 part name
| ( identifier DOT identifier ) // 2 part name
| ( identifier ) // 1 part name
;
Input 1 is parsed correctly by both the rules, but the new rule gives error while parsing input 2 and 3.
line 1:22 no viable alternative at input ';'
I assumed that Antlr will try to match against alternative 1 of qualified_object_name_new, but when does not match alternative 1 fully, then would try to match alternative 2 and so on.
So, for input 'LIB.TAB1' it would finally match against alternative 2 of qualified_object_name_new.
However, it is not working this way and gives error while paring 2-part name or unqualified name.
Interestingly, when I set option k = 1, then all 3 inputs are parsed correctly by the new rule.
But with any other value of k, it gives error.
I want to understand why Antlr behaves this way and is this correct.

You probably have not increased the lookahead size (which is 1 by default in ANTLR3). With one token lookahead the new object name rule cannot resolve the ambiquity (all alts start with the same token). You should have gotten a warning about this too.
You have 3 options to solve this problem with ANTLR3 (though I also recommend to switch to version 4):
Enable backtracking (see the backgtrack option), though I'm not 100% sure if that really helps.
Increase lookahead size (see the k option).
Use syntactic predicates to force ANTLR to look ahead the entire alt.
For more details read the ANTLR3 documentation.

Related

My flow fails for no reason: Invalid Template Language? This is what I do

Team,
Occasionally my flow fails and its enough test it manually to running again. However, I want to avoid that this error ocurrs again to stay in calm.
The error that appears is this:
Unable to process template language expressions in action 'Periodo' inputs at line '0' and column '0': 'The template language function 'split' expects its first parameter to be of type string. The provided value is of type 'Null'. Please see https://aka.ms/logicexpressions#split for usage details.'.
And it appears in 2 of the 4 variables that I create:
Client and Periodo
The variable Clientlooks this:
The same scenario to "Periodo".
The variables are build in the same way:
His formula:
trim(first(split(first(skip(split(outputs('Compos'),'client = '),1)),'indicator')))
His formula:
trim(first(split(first(skip(split(outputs('Compos'),'period = '),1)),'DATA_REPORT_DELIVERY')))
The same scenario to the 4 variables. 4 of them strings (numbers).
Also I attached email example where I extract the info:
CO NIV ICE REFRESCOS DE SOYA has finished successfully.CO NIV ICE REFRESCOS DE SOYA
User
binary.struggle#mail.com
Parameters
output = 7
country = 170
period = 202204012
DATA_REPORT_DELIVERY = NO
read_persistance = YES
write_persistance = YES
client = 18277
indicator_group = SALES
Could you give some help? I reach some attepmpts succeded but it fails for no apparent reason:
Thank you.
I'm not sure if you're interested but I'd do it a slightly different way. It's a little more verbose but it will work and it makes your expressions a lot simpler.
I've just taken two of your desired outputs and provided a solution for those, one being client and the other being country. You can apply the other two as need be given it's the same pattern.
If I take client for example, this is the concept.
Initialize Data
This is your string that you provided in your question.
Initialize Split Lines
This will split up your string for each new line. The expression for this step is ...
split(variables('Data'), '\n')
However, you can't just enter that expression into the editor, you need to do it and then edit in in code view and change it from \\n to \n.
Filter For 'client'
This will filter the array created from the split line step and find the item that contains the word client.
`contains(item(), 'client')`
On the other parallel branches, you'd change out the word to whatever you're searching for, e.g. country.
This should give us a single item array with a string.
Initialize 'client'
Finally, we want to extract the value on the right hand side of the equals sign. The expression for this is ...
trim(split(body('Filter_For_''client''')[0], '=')[1])
Again, just change out the body name for the other action in each case.
I need to put body('Filter_For_''client''')[0] and specify the first item in an array because the filter step returns an array. We're going to assume the length is always 1.
Result
You can see from all of that, you have the value as need be. Like I said, it's a little more verbose but (I think) easier to follow and troubleshoot if something goes wrong.

What is the meaning of MAKEINTRESOURCE((id>>4)+1)?

I am trying to mimic the behavior of CString::LoadString(HINSTANCE hInst, DWORD id, WORD langID) without introducing a dependency on MFC into my app. So I walked through the source. The first thing it does is to immediately call AtlGetStringResourceImage(hInst, id, langID), and then this in turn contains the following line of code:
hResource = ::FindResourceExW(hInst, (LPWSTR)RT_STRING, MAKEINTRESOURCEW((id>>4)+1), langID);
(It's not verbatim like this, but I trimmed out some unimportant stuff).
What is the meaning of shifting the ID by 4 and adding 1? According to the documentation of FindResourceEx, you should pass in MAKEINTRESOURCE(id), and I can't find any example code that is manipulating the id before passing it to MAKEINTRESOURCE. At the same time, if I make my code call MAKEINTRESOURCE(id) then it doesn't work and FindResourceEx returns null, whereas if I use the above shift + add, then it does work.
Can anyone explain this?
From the STRINGTABLE resource documentation:
RC allocates 16 strings per section and uses the identifier value to determine which section is to contain the string. Strings whose identifiers differ only in the bottom 4 bits are placed in the same section.
The code you are curious about locates the section a given string identifier is stored in by ignoring the low 4 bits.

Getting First and Follow metadata from an ANTLR4 parser

Is it possible to extract the first and follow sets from a rule using ANTLR4? I played around with this a little bit in ANTLR3 and did not find a satisfactory solution, but if anyone has info for either version, it would be appreciated.
I would like to parse user input up the user's cursor location and then provide a list of possible choices for auto-completion. At the moment, I am not interested in auto-completing tokens which are partially entered. I want to display all possible following tokens at some point mid-parse.
For example:
sentence:
subjects verb (adverb)? '.' ;
subjects:
firstSubject (otherSubjects)* ;
firstSubject:
'The' (adjective)? noun ;
otherSubjects:
'and the' (adjective)? noun;
adjective:
'small' | 'orange' ;
noun:
CAT | DOG ;
verb:
'slept' | 'ate' | 'walked' ;
adverb:
'quietly' | 'noisily' ;
CAT : 'cat';
DOG : 'dog';
Given the grammar above...
If the user had not typed anything yet the auto-complete list would be ['The'] (Note that I would have to retrieve the FIRST and not the FOLLOW of rule sentence, since the follow of the base rule is always EOF).
If the input was "The", the auto-complete list would be ['small', 'orange', 'cat', 'dog'].
If the input was "The cat slept, the auto-complete list would be ['quietly', 'noisily', '.'].
So ANTLR3 provides a way to get the set of follows doing this:
BitSet followSet = state.following[state._fsp];
This works well. I can embed some logic into my parser so that when the parser calls the rule at which the user is positioned, it retrieves the follows of that rule and then provides them to the user. However, this does not work as well for nested rules (For instance, the base rule, because the follow set ignores and sub-rule follows, as it should).
I think I need to provide the FIRST set if the user has completed a rule (this could be hard to determine) as well as the FOLLOW set of to cover all valid options. I also think I will need to structure my grammar such that two tokens are never subsequent at the rule level.
I would have break the above "firstSubject" rule into some sub rules...
from
firstSubject:
'The'(adjective)? CAT | DOG;
to
firstSubject:
the (adjective)? CAT | DOG;
the:
'the';
I have yet to find any information on retrieving the FIRST set from a rule.
ANTLR4 appears to have drastically changed the way it works with follows at the level of the generated parser, so at this point I'm not really sure if I should continue with ANTLR3 or make the jump to ANTLR4.
Any suggestions would be greatly appreciated.
ANTLRWorks 2 (AW2) performs a similar operation, which I'll describe here. If you reference the source code for AW2, keep in mind that it is only released under an LGPL license.
Create a special token which represents the location of interest for code completion.
In some ways, this token behaves like the EOF. In particular, the ParserATNSimulator never consumes this token; a decision is always made at or before it is reached.
In other ways, this token is very unique. In particular, if the token is located at an identifier or keyword, it is treated as though the token type was "fuzzy", and allowed to match any identifier or keyword for the language. For ANTLR 4 grammars, if the caret token is located at a location where the user has typed g, the parser will allow that token to match a rule name or the keyword grammar.
Create a specialized ATN interpreter that can return all possible parse trees which lead to the caret token, without looking past the caret for any decision, and without constraining the exact token type of the caret token.
For each possible parse tree, evaluate your code completion in the context of whatever the caret token matched in a parser rule.
The union of all the results found in step 3 is a superset of the complete set of valid code completion results, and can be presented in the IDE.
The following describes AW2's implementation of the above steps.
In AW2, this is the CaretToken, and it always has the token type CARET_TOKEN_TYPE.
In AW2, this specialized operation is represented by the ForestParser<TParser> interface, with most of the reusable implementation in AbstractForestParser<TParser> and specialized for parsing ANTLR 4 grammars for code completion in GrammarForestParser.
In AW2, this analysis is performed primarily by GrammarCompletionQuery.TaskImpl.runImpl(BaseDocument).

Format statement with unknown columns

I am attempting to use fortran to write out a comma-delimited file for import into another commercial package. The issue is that I have an unknown number of data columns. My output needs to look like this:
a_string,a_float,a_different_float,float_array_elem1,float_array_elem2,...,float_array_elemn
which would result in something that might look like this:
L1080,546876.23,4325678.21,300.2,150.125,...,0.125
L1090,563245.1,2356345.21,27.1245,...,0.00983
I have three issues. One, I would prefer the elements to be tightly grouped (variable column width), two, I do not know how to define a variable number of array elements in the format statement, and three, the array elements can span a large range--maybe 12 orders of magnitude. The following code conceptually does what I want, but the variable 'n' and the lack of column-width definition throws an error (of course):
WRITE(50,900) linenames(ii),loc(ii,1:2),recon(ii,1:n)
900 FORMAT(A,',',F,',',F,n(',',F))
(I should note that n is fixed at run-time.) The write statement does what I want it to when I do WRITE(50,*), except that it's width-delimited.
I think this thread almost answered my question, but I got quite confused: SO. Right now I have a shell script with awk fixing the issue, but that solution is...inelegant. I could do some manipulation to make the output a string, and then just write it, but I would rather like to avoid that option if at all possible.
I'm doing this in Fortran 90 but I like to try to keep my code as backwards-compatible as possible.
the format close to what you want is f0.3, this will give no spaces and a fixed number of decimal places. I think if you want to also lop off trailing zeros you'll need to do a good bit of work.
The 'n' in your write statement can be larger than the number of data values, so one (old school) approach is to put a big number there, eg 100000. Modern fortran does have some syntax to specify indefinite repeat, i'm sure someone will offer that up.
----edit
the unlimited repeat is as you might guess an asterisk..and is evideltly "brand new" in f2008
In order to make sure that no space occurs between the entries in your line, you can write them separately in character variables and then print them out using theadjustl() function in fortran:
program csv
implicit none
integer, parameter :: dp = kind(1.0d0)
integer, parameter :: nn = 3
real(dp), parameter :: floatarray(nn) = [ -1.0_dp, -2.0_dp, -3.0_dp ]
integer :: ii
character(30) :: buffer(nn+2), myformat
! Create format string with appropriate number of fields.
write(myformat, "(A,I0,A)") "(A,", nn + 2, "(',',A))"
! You should execute the following lines in a loop for every line you want to output
write(buffer(1), "(F20.2)") 1.0_dp ! a_float
write(buffer(2), "(F20.2)") 2.0_dp ! a_different_float
do ii = 1, nn
write(buffer(2+ii), "(F20.3)") floatarray(ii)
end do
write(*, myformat) "a_string", (trim(adjustl(buffer(ii))), ii = 1, nn + 2)
end program csv
The demonstration above is only for one output line, but you can easily write a loop around the appropriate block to execute it for all your output lines. Also, you can choose different numerical format for the different entries, if you wish.

How to generate new unique name for context?

What is the best way to generate a name for some temporary context which is guaranteed to be unique (context with this name must not exist in the system)?
The following expression will generate a context name that is guaranteed not to conflict with any loaded context:
First#Contexts[] //.
c_ /; MemberQ[Contexts[], c] :>
"Context"~~ToString[RandomInteger[1000000]]~~"`"
It makes no attempt to account for contexts that are not yet loaded. As written, this expression could be used up to 1,000,000 times before running out of names. Adjust the fixed string ("Context") and name count (1000000) to suit your taste.
Update
As #Leonid points out in a comment, empty contexts will not be listed in Contexts[]. Therefore, it is strictly speaking possible that this expression could return the name of an existing empty context.
UUIDs
For all practical purposes, generating a name from a number randomly selected from a large enough range would work, e.g.
"Context"~~ToString[RandomInteger[2^128]]~~"`"
In a similar vein, one could use a UUID. UUIDs are routinely used as identifiers that are phenomenally likely to be unique across all network nodes as well:
Needs["JLink`"]
LoadJavaClass["java.util.UUID"]
"Context"~~
StringReplace[JavaBlock#java`util`UUID`randomUUID[]#toString[], "-" -> ""]~~
"`"
I can suggest a function I used here:
Clear[unique];
unique[sym_] :=
ToExpression[
ToString[Unique[sym]] <>
StringReplace[StringJoin[ToString /# Date[]], "." :> ""]];
You can replace the ToExpression by StringJoin[...,"`"] to tailor it to your needs.
Another option would be to look at all starting contexts (before the first backquote), find their string length and then generate a string (maybe random, but that isn't necessary) that is at least one character longer than the others. This is guaranteed to be unique, and there isn't even a theoretical possibility of a collision as in some of the other solutions.
sl = (StringSplit[#, "`"][[1]] & /# Contexts[] // StringLength // Max )
Out[349]= 30
In[353]:= "c" ~~ ToString[10^sl] ~~ "`"
Out[353]= "c1000000000000000000000000000000`"
A disadvantage of this method would be that the context names get longer after each repeated application of this method. ;-) If that's a problem we could create a unique name based on the set of longest context names using Cantor's diagonal procedure.
Is Unique perhaps what you're looking for?
This is really an example illustrating Alexey's response to Sjoerd's answer/question. From a fresh kernel on my machine, the following code
Begin["myContext3`"];
Unique["myContext"]
Yields "myContext3". Thus, clearly, Unique (the first thing I thought of) does not work.
Incidentally, I would have just added a comment to Sjoerd's response, but I don't know how to include the accent symbol used to denote a context inline. Does anyone here know how to do this?

Resources