I need help defining some rules for a grammar in CUP. The rules in question belong to the declaration block, which consists of the declaration of 0 or more constants, 0 or more record types, and 0 or more variables. An example of code to parse:
x: constant := True;
y: constant := 32;
type Tpersona is record
dni: Integer;
edad : Integer;
casado : Boolean;
end record;
type Tfecha is record
dia: Integer;
mes : Integer;
anyo : Integer;
end record;
type Tcita is record
usuario:Tpersona;
fecha:Tfecha;
end record;
a: Integer;
x,y: Boolean;
x,y: Boolean;
x,y: Boolean;
The order between them must be respected, but any of them may be absent. This last property is what generates a shift/reduce conflict with the following rules.
declaration_block ::= const_block types_block var_block;
// Constant declaration
const_block ::= dec_const const_block | ;
dec_const ::= IDEN TWOPOINT CONSTANT ASSIGN const_values SEMICOLON;
//Types declaration
types_block ::= dec_type types_block | ;
dec_type ::= TYPE IDEN IS RECORD
reg_list
END RECORD SEMICOLON;
reg_list ::= dec_reg reg_list | dec_reg;
dec_reg ::= IDEN TWOPOINT valid_types SEMICOLON;
//Variable declaration
var_block ::= dec_var var_block | ;
dec_var ::= iden_list TWOPOINT valid_types SEMICOLON;
iden_list ::= IDEN | IDEN COMMA iden_list;
// common use
const_values ::= INT | booleans;
booleans ::= TRUE | FALSE;
valid_types ::= primitive_types | IDEN;
primitive_types ::= INTEGER | BOOLEAN;
The idea is that any X_block can be empty. I understand the shift-reduce conflict: when the parser starts and receives an identifier (IDEN), it doesn't know whether to reduce const_block ::= <empty> and take IDEN as part of dec_var, or to shift and take the IDEN token as part of const_block. If I remove the empty/epsilon production in const_block or in types_block, the conflict disappears, but the grammar would be incorrect: the list of constants would never terminate, and the parser would give a syntax error at the reserved word "type".
So I may have an ambiguity caused because both constants and variables can go at the beginning and start with "id:" and either block can appear first. How can I rewrite the rules to resolve the ambiguities and the shift/reduce conflict they cause?
I tried to do something like:
declaration_block ::= const_block types_block var_block | const_block types_block | const_block var_block | types_block var_block | types_block | var_block | ;
but I have the same problem.
Another attempt was to create new rules to identify whether it is a constant or a variable, but the ambiguity of the empty rule in const_block does not disappear.
dec_const ::= start_const ASSIGN const_values SEMICOLON;
start_const ::= IDEN TWOPOINT CONSTANT;
// dec_var ::= start_var SEMICOLON;
// start_var ::= iden_list TWOPOINT valid_types;
If I reduce the problem to something simpler, without taking into account types and only allowing one declaration of a constant or a variable, the fact that these blocks can be empty produces the problem:
dec_var ::= iden_list TWOPOINT valid_types SEMICOLON | ;
iden_list ::= IDEN | IDEN COMMA iden_list;
I would like to rewrite the rules in a way that solves this conflict, and to learn how to deal with similar problems in the future.
Thanks so much
To start with, your grammar is not ambiguous. But it does have a shift-reduce conflict (in fact, two of them), which indicates that it cannot be parsed deterministically with only one lookahead token.
As it happens, you could solve the problem (more or less) by just increasing the lookahead, if you had a parser generator which allowed you to do that. However, such parser generators are pretty rare, and CUP isn't one of them. There are parser generators which allow arbitrary lookahead, either by backtracking (possibly with memoisation, such as ANTLR4), or by using an algorithm which allows multiple alternatives to be explored in parallel (GLR, for example). But I don't know of a parser generator which can produce a deterministic transition table which uses two lookahead tokens (which would suffice, in this case).
So the solution is to add some apparent redundancy to the grammar in order to factor out the cases which require more than one lookahead token.
The fundamental problem is the following set of possible inputs:
...; a : constant := 3 ; ...
...; a : Integer ; ...
There's no ambiguity here whatsoever. The first one can only be a constant declaration; the second can only be a variable declaration. But observe that we don't discover that fact until we see either the keyword constant (as in the first case), or an identifier which could be a type (as in the second case).
What that means is that we need to avoid forcing the parser to make any decision involving the a and the : until the next token is available. In particular, we cannot force it to decide whether the a is just an IDEN, or the first (or only) element in an iden_list.
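To make the lookahead requirement concrete, here is a minimal Python sketch (not CUP code). It assumes the input has already been turned into a list of token-kind strings named after the grammar's terminals; the function name classify_declaration is made up for illustration. The point is only that the decision cannot be made until the token after the ':' has been inspected.

```python
def classify_declaration(tokens):
    """Classify a declaration that starts with IDEN.

    `tokens` is a list of token-kind strings such as "IDEN" or "TWOPOINT"
    (names borrowed from the grammar; the list representation is an assumption).
    """
    # A comma right after the identifier can only mean an identifier list,
    # so one token of lookahead is enough in that case.
    if len(tokens) >= 2 and tokens[0] == "IDEN" and tokens[1] == "COMMA":
        return "variable"
    # After IDEN ':' the answer depends on the token that follows the ':'.
    if len(tokens) >= 3 and tokens[0] == "IDEN" and tokens[1] == "TWOPOINT":
        return "constant" if tokens[2] == "CONSTANT" else "variable"
    return "unknown"
```

A one-token-lookahead parser is in the position of having to answer this question after seeing only IDEN (or IDEN and ':'), which is why the grammar has to be rearranged so that no reduction is needed at that point.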
iden_list is needed to parse
...; a , b : Integer ; ...
but that's not a problem since the , is a definite sign that we have a list. So the resolution has to include handling a : Integer without reducing a to an iden_list. And that requires an (apparently) redundant production:
var_block ::=
| dec_var var_block ;
dec_var ::= iden_list TWOPOINT type SEMICOLON
| IDEN TWOPOINT type SEMICOLON ;
iden_list ::= IDEN COMMA IDEN
| iden_list COMMA IDEN ;
(Note: I changed valid_types to type because valid is redundant -- only valid syntaxes are parsed -- and because I think you should never use a plural name for a singular object; it confuses the reader.)
That's not quite enough, though, because we also need to avoid forcing the parser to decide whether the const_block needs to be reduced before the variable declaration. For that, we need something like the attempt you already made to remove the empty block definitions, and instead provide eight different declaration_block productions, one for each of the eight possible combinations of empty and non-empty clauses. That will work fine, as long as you change the block definitions to be left-recursive rather than right-recursive. The right-recursive definition forces the parser to perform a reduction at the end of const_block, which means that it needs to know exactly where const_block ends with only one lookahead token.
On the whole, if you're going to use a bottom-up parser like CUP, you should make it a habit to use left-recursion unless you have a good reason not to (like defining a right-associative operator). There are a few exceptions, but on the whole left-recursion will produce fewer surprises, and in addition it will not burn through the parser stack on long inputs.
Making all those changes, we end up with something like this, where:
The block definitions were changed to left-recursive definitions with a non-empty base case;
iden_list was forced to have at least two elements, and a "redundant" production was added for the one-identifier case;
The start production was divided into eight possible combinations in order to allow each of the three subclauses to be empty;
A few minor name changes were made.
declaration_block ::=
| var_block
| types_block
| types_block var_block
| const_block
| const_block var_block
| const_block types_block
| const_block types_block var_block
;
// Constant declaration
const_block ::= dec_const
| const_block dec_const ;
dec_const ::= IDEN TWOPOINT CONSTANT ASSIGN const_value SEMICOLON;
//Types declaration
types_block ::= dec_type
| types_block dec_type ;
dec_type ::= TYPE IDEN IS RECORD
reg_list
END RECORD SEMICOLON;
reg_list ::= dec_reg
| reg_list dec_reg;
dec_reg ::= IDEN TWOPOINT type SEMICOLON;
//Variable declaration
var_block ::= dec_var
| var_block dec_var;
dec_var ::= iden_list TWOPOINT type SEMICOLON
| IDEN TWOPOINT type SEMICOLON ;
iden_list ::= IDEN COMMA IDEN
| iden_list COMMA IDEN ;
// common use
const_value ::= INT | boolean;
boolean ::= TRUE | FALSE;
type ::= primitive_type | IDEN;
primitive_type ::= INTEGER | BOOLEAN;
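If you want to sanity-check which inputs the grammar is meant to accept, a small hand-written recogniser can help. The following is only a sketch in Python under my own assumptions: it is a recursive-descent style checker, not the LALR(1) parser that CUP generates, the tokenizer is a guess based on your example, and the names tokenize and recognize are made up. It accepts zero or more constants, then zero or more record types, then zero or more variables.

```python
import re

KEYWORDS = {"constant": "CONSTANT", "type": "TYPE", "is": "IS", "record": "RECORD",
            "end": "END", "True": "TRUE", "False": "FALSE",
            "Integer": "INTEGER", "Boolean": "BOOLEAN"}
SYMBOLS = {":=": "ASSIGN", ":": "TWOPOINT", ";": "SEMICOLON", ",": "COMMA"}
TOKEN_RE = re.compile(r"\s*(:=|[:;,]|\d+|[A-Za-z_]\w*)")

def tokenize(text):
    """Turn source text into a list of token kinds (assumed lexical rules)."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at offset {pos}")
        lexeme = m.group(1)
        if lexeme in SYMBOLS:
            tokens.append(SYMBOLS[lexeme])
        elif lexeme.isdigit():
            tokens.append("INT")
        else:
            tokens.append(KEYWORDS.get(lexeme, "IDEN"))
        pos = m.end()
    return tokens

def recognize(tokens):
    """Return True if tokens are constants*, then types*, then variables*."""
    toks = tokens + ["EOF"]
    i = 0

    def expect(*kinds):
        nonlocal i
        if toks[i] not in kinds:
            raise SyntaxError(f"expected {kinds}, got {toks[i]}")
        i += 1

    try:
        # Constants: the check peeks past IDEN ':' to find CONSTANT.
        while toks[i] == "IDEN" and toks[i + 1:i + 3] == ["TWOPOINT", "CONSTANT"]:
            for kind in ("IDEN", "TWOPOINT", "CONSTANT", "ASSIGN"):
                expect(kind)
            expect("INT", "TRUE", "FALSE")
            expect("SEMICOLON")
        # Record types, each with one or more fields.
        while toks[i] == "TYPE":
            for kind in ("TYPE", "IDEN", "IS", "RECORD"):
                expect(kind)
            while True:
                expect("IDEN")
                expect("TWOPOINT")
                expect("INTEGER", "BOOLEAN", "IDEN")
                expect("SEMICOLON")
                if toks[i] != "IDEN":
                    break
            for kind in ("END", "RECORD", "SEMICOLON"):
                expect(kind)
        # Variables: one identifier or a comma-separated list.
        while toks[i] == "IDEN":
            expect("IDEN")
            while toks[i] == "COMMA":
                expect("COMMA")
                expect("IDEN")
            expect("TWOPOINT")
            expect("INTEGER", "BOOLEAN", "IDEN")
            expect("SEMICOLON")
        expect("EOF")
        return True
    except SyntaxError:
        return False
```

Note that the constants loop looks two tokens past the identifier. That is exactly the extra lookahead CUP does not have, which is why the grammar above is restructured instead.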
I have the following BNFC code:
GFDefC. GoalForm ::= Constraint ;
GFDefT. GoalForm ::= True ;
GFDefA. GoalForm ::= GoalForm "," GoalForm ;
GFDefO. GoalForm ::= GoalForm ";" GoalForm ;
ConFr. Constraint ::= Var "#" Term ;
TVar. Term ::= UnVar;
TFun. Term ::= Fun ;
FDef. Fun ::= FunId "(" [Arg] ")" ;
ADecl. Arg ::= Term ;
separator Arg "," ;
...
However, the following is not parsed
fun(X)
while it parses the one below
x # fun(Y)
So, to sum up, it parses the function as part of a constraint, but not on its own.
It should parse both of them.
Could anyone point out why?
You should set your entrypoints properly.
As you're parsing x # fun(Y) successfully, I assume you have set your entrypoints to Constraint and are using the generated pConstraint function to parse your expressions. Then, you can change your rules for Constraint to
ConNoVar. Constraint ::= Term ;
ConFr. Constraint ::= Var "#" Term ;
Alternatively, you can add Term to your entrypoints and invoke pTerm to parse your function terms.
I have a vendor-provided MIB file where the same object name/descriptor is defined in two different tables in the same MIB. Unfortunately, I think the MIB is proprietary and can't post it here in its entirety. So I've created a similar sample Foobar.mib file that I've included at the end of this post.
My question is: Is there any way such a MIB is legal or could be considered valid?
Net::SNMP can print the tree of it and it looks like this:
+--foobar(12345678)
|
+--foo(1)
| |
| +--fooTable(1)
| |
| +--fooEntry(1)
| | Index: fooIndex
| |
| +-- -R-- INTEGER fooIndex(1)
| +-- -R-- String commonName(2)
|
+--bar(2)
|
+--barTable(1)
|
+--barEntry(1)
| Index: barIndex
|
+-- -R-- INTEGER barIndex(1)
+-- -R-- String commonName(2)
Note how commonName is defined under both fooTable and barTable in the very same MIB (see below in my sample Foobar.mib).
This confuses Net::SNMP, since FooBarMib::commonName can now mean two different OIDs.
It would be grand to include a link to an RFC in a bug report for the vendor.
I've found that RFC 1155 - Structure and identification of management information for TCP/IP-based internets says:
Each OBJECT DESCRIPTOR corresponding to an object type in the
internet-standard MIB shall be a unique, but mnemonic, printable
string. This promotes a common language for humans to use when
discussing the MIB and also facilitates simple table mappings for
user interfaces.
Does this only apply to "internet-standard MIB"s and hence not to vendor MIBs?
I've also found RFC 2578 - Structure of Management Information Version 2 (SMIv2) that says:
For all descriptors appearing in an information module, the descriptor shall be unique and mnemonic, and shall not exceed 64 characters in length.
But does a MIB for an SNMP v1 agent also have to adhere to RFC 2578? The SNMP agent
implementing the MIB only supports SNMP v1 for whatever reason. And the RFC
2578 has SMIv2 in the title, where the 2 worries me a little. However the MIB itself does import from SMIv2 FWIW.
I've found two internet references that say that object names / descriptors must be unique within a MIB, but without a source reference:
Andrew Komiagin in "SNMP OID with non-unique node names" here on SO says:
MIB Object names must be unique within entire MIB file.
and Dave Shield on the Net::SNMP mailing list says:
Within a given MIB module, all object names must be unique.
Both the objects defined within that MIB, and objects explicitly
IMPORTed. You can't have two objects with the same name,
both referenced in the same MIB.
I'd love to get a standards / RFC reference for either of those two equivalent statements.
Sample Foobar.mib
This defines commonName as both ::={ fooEntry 2 } and further down as ::={ barEntry 2 } also:
-- I've changed the MIB module name.
FooBarMib DEFINITIONS ::= BEGIN
IMPORTS
sysName, sysLocation FROM SNMPv2-MIB
enterprises, OBJECT-TYPE FROM SNMPv2-SMI;
-- I've provided a fake name and enterprise ID here
foobar OBJECT IDENTIFIER::= {enterprises 12345678}
foo OBJECT IDENTIFIER::={ foobar 1 }
fooTable OBJECT-TYPE
SYNTAX SEQUENCE OF FooEntry
MAX-ACCESS not-accessible
STATUS current
::={ foo 1 }
fooEntry OBJECT-TYPE
SYNTAX FooEntry
MAX-ACCESS not-accessible
STATUS current
INDEX { fooIndex }
::={ fooTable 1 }
FooEntry ::= SEQUENCE{
fooIndex INTEGER,
commonName OCTET STRING,
-- other leaves omitted
}
fooIndex OBJECT-TYPE
SYNTAX INTEGER
MAX-ACCESS read-only
STATUS current
::={ fooEntry 1 }
commonName OBJECT-TYPE
SYNTAX OCTET STRING
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Label for the commonEntry"
::={ fooEntry 2 }
bar OBJECT IDENTIFIER::={ foobar 2 }
barTable OBJECT-TYPE
SYNTAX SEQUENCE OF BarEntry
MAX-ACCESS not-accessible
STATUS current
::={ bar 1 }
barEntry OBJECT-TYPE
SYNTAX BarEntry
MAX-ACCESS not-accessible
STATUS current
INDEX { barIndex }
::={ barTable 1 }
BarEntry ::= SEQUENCE{
barIndex INTEGER,
commonName OCTET STRING,
-- other leaves omitted
}
barIndex OBJECT-TYPE
SYNTAX INTEGER
MAX-ACCESS read-only
STATUS current
::={ barEntry 1 }
commonName OBJECT-TYPE
SYNTAX OCTET STRING
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"Label for the commonEntry"
::={ barEntry 2 }
END
Unfortunately, enterprises can do whatever they want. If they want to play nice, they are advised to adhere to the rules. Details at https://www.rfc-editor.org/rfc/rfc2578#section-3
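If it helps with the bug report, duplicated descriptors are easy to list mechanically. Below is a rough Python sketch, not a real MIB parser: it assumes every definition starts at the beginning of a line with a lowercase descriptor followed by OBJECT-TYPE or OBJECT IDENTIFIER, and the function name duplicate_descriptors is made up.

```python
import re
from collections import Counter

# Assumption: a definition looks like "<descriptor> OBJECT-TYPE" or
# "<descriptor> OBJECT IDENTIFIER" at the start of a line.
DEF_RE = re.compile(
    r"^\s*([a-z][A-Za-z0-9-]*)\s+(?:OBJECT-TYPE|OBJECT\s+IDENTIFIER)\b",
    re.MULTILINE,
)

def duplicate_descriptors(mib_text):
    """Return the sorted descriptors that are defined more than once."""
    counts = Counter(DEF_RE.findall(mib_text))
    return sorted(name for name, n in counts.items() if n > 1)
```

Run against a MIB shaped like the sample in the question, this would be expected to report commonName; macros other than OBJECT-TYPE are ignored by this sketch.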
I am trying to write some BNF (not EBNF) to describe the various elements of the following code fragment which is in no particular programming language but would be syntactically correct in VBA.
If Temperature <=0 Then
Description = "Freezing"
End If
So far I have come up with the BNF at the bottom of this post (I have not yet described string, number or identifier).
What perplexes me is the second line of code, Description = "Freezing", in which I am assigning a string literal to an identifier. How should I deal with this in my BNF?
I am tempted to simply adjust my definition of a factor like this...
<factor> ::= <identifier> | <number> | <string_literal> | (<expression>)
...after all, in VBA an arithmetic expression containing a string or a string variable would be syntactically correct and not picked up until run time. For example (4+3)*(6-"hi") would not be picked up as a syntax error. Is this the right approach?
Or should I leave the production for a factor as it is and redefine the assignment like this...?
<assignment> ::= <identifier> = <expression> | <identifier> = <string_literal>
I am not trying to define a whole language in my BNF, rather, I just want to cover most of the productions that describe the code fragment. Suggestions would be much appreciated.
BNF so far...
<string> ::= …
<number> ::= …
<identifier> ::= …
<assignment> ::= <identifier> = <expression>
<statements> ::= <statement> | <statement> <statements>
<statement> ::= <assignment> | <if_statement> | <for_statement> | <while_statement> | …
<expression> ::= <expression> + <term> | <expression> - <term> | <term>
<term> ::= <term> * <factor> | <term> / <factor> | <factor>
<factor> ::= <identifier> | <number> | (<expression>)
<relational_operator> ::= < | > | <= | >= | =
<condition> ::= <expression> <relational_operator> <expression>
<if_statement> ::= If <condition> Then <statement>
| If <condition> Then <statements> End If
| If <condition> Then <statements> Else <statements> End If
Consider the code sample:
X = "hi"
Y = 6 - X
The 6 - X expression is an error, but you can't make it a syntax error using just a context-free grammar. Similarly for:
If Temperature <= X Then ...
Instead of catching such type errors via the grammar, you'll have to catch them later, either statically or dynamically. And given that you have to do that analysis anyway, there's not much point trying to catch any type errors (express any type constraints) in the grammar.
So go with your first solution, adding <string_literal> to <factor>.
While you don't provide any details about your language, it seems reasonable to believe that a language which has string literals and string variables also has some operations on strings, at least function calls taking strings as arguments and probably certain operators. (In VB, as I understand it, both + and & function as string concatenation operators.)
In that case, assignment to a string variable is not limited to assigning a string literal, and the grammar would be expected to allow expressions including string literals.
It is always tempting to attempt to enforce type coherency in a grammar, on the basis that some type errors (such as 6 - "hi") can be detected immediately. But there are many other very similar errors (6 - HiStringVariable) which cannot be detected until type deduction (or even until runtime, for dynamic languages). The contortions necessary to do partial type checking during the parse are almost never worth the trouble.
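To illustrate what "catching it later" can look like, here is a minimal Python sketch of a type check that runs over an already-parsed expression. Everything in it is an assumption made for illustration: the tuple-based tree shape, the two type names, the rule that + concatenates two strings, and the function name type_of.

```python
def type_of(node, env):
    """Return "number" or "string" for an expression tree, or raise TypeError.

    A node is ("num", value), ("str", value), ("var", name) or (op, left, right).
    `env` maps variable names to their declared types (hypothetical).
    """
    kind = node[0]
    if kind == "num":
        return "number"
    if kind == "str":
        return "string"
    if kind == "var":
        return env[node[1]]
    op, left, right = node
    left_type, right_type = type_of(left, env), type_of(right, env)
    if left_type == right_type == "number":
        return "number"
    # Assumption: + on two strings is concatenation; nothing else mixes.
    if op == "+" and left_type == right_type == "string":
        return "string"
    raise TypeError(f"cannot apply {op!r} to {left_type} and {right_type}")
```

The grammar accepts both 6 - "hi" and 6 - X; a pass like this rejects the first outright, and the second as soon as X is known to be a string, which the grammar alone could never do.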
I'm trying to figure out a grammar rule(s) for any mathematical expression.
I'm using EBNF (wiki article linked below) for deriving syntax rules.
I've managed to come up with one that worked for a while, but the grammar rule fails with onScreenTime + (((count) - 1) * 0.9).
The rule is as follows:
math ::= MINUS? LPAREN math RPAREN
| mathOperand (mathRhs)+
mathRhs ::= mathOperator mathRhsGroup
| mathOperator mathOperand mathRhs?
mathRhsGroup ::= MINUS? LPAREN mathOperand (mathRhs | (mathOperator mathOperand))+ RPAREN
You can safely assume mathOperand are positive or negative numbers, or variables.
You can also assume mathOperator denotes any mathematical operator like + or -.
Also, LPAREN and RPAREN are '(' and ')' respectively.
EBNF:
https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form
EDIT
Forgot to mention that it fails on (count) - 1. It says RPAREN expected instead of - 1.
EDIT 2 My revised EBNF now looks like this:
number ::= NUMBER_LITERAL //positive integer
mathExp ::= term_ ((PLUS | MINUS) term_)* // * is zero-or-more.
private term_ ::= factor_ ((ASTERISK | FSLASH) factor_)*
private factor_ ::= PLUS factor_
| MINUS factor_
| primary_
private primary_ ::= number
| IDENTIFIER
| LPAREN mathExp RPAREN
Have a look at the expression grammar of any programming language:
expression
: term
| expression '+' term
| expression '-' term
;
term
: factor
| term '*' factor
| term '/' factor
| term '%' factor
;
factor
: primary
| '-' factor
| '+' factor
;
primary
: IDENTIFIER
| INTEGER
| FLOATING_POINT_LITERAL
| '(' expression ')'
;
Exponentiation left as an exercise for the reader: note that the exponentiation operator is right-associative. This is in yacc notation. NB You are using EBNF, not BNF.
EDIT My non-left-recursive EBNF is not as strong as my yacc, but to factor out the left recursion you need a scheme like this, for example:
expression
::= term ((PLUS|MINUS) term)*
term
::= factor ((FSLASH|ASTERISK) factor)*
etc., where * means 'zero or more'. My comments on this below are mostly incorrect and should be ignored.
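For what it's worth, the iterative EBNF form maps directly onto a recursive descent parser, with each ( ... )* becoming a loop. Here is a small Python sketch that evaluates expressions this way; the tokenizer, the function names and the env dictionary for variables are my own assumptions, not anything from your tool.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+\.\d+|\d+|[A-Za-z_]\w*|[-+*/()])")

def tokenize(text):
    """Split the input into numbers, identifiers, operators and parentheses."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def evaluate(text, env):
    """Evaluate an arithmetic expression; `env` maps variable names to values."""
    tokens = tokenize(text) + ["<eof>"]
    pos = 0

    def peek():
        return tokens[pos]

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expression():  # expression ::= term ((PLUS|MINUS) term)*
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():  # term ::= factor ((ASTERISK|FSLASH) factor)*
        value = factor()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():  # factor ::= PLUS factor | MINUS factor | primary
        if peek() == "-":
            advance()
            return -factor()
        if peek() == "+":
            advance()
            return factor()
        return primary()

    def primary():  # primary ::= number | IDENTIFIER | LPAREN expression RPAREN
        tok = advance()
        if tok == "(":
            value = expression()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok[0].isdigit():
            return float(tok)
        if tok[0].isalpha() or tok[0] == "_":
            return env[tok]
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expression()
    if peek() != "<eof>":
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

Because the loops fold operands as they go, - and / come out left-associative, and the nested case onScreenTime + (((count) - 1) * 0.9) goes through primary three times without any special rule for groups.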
You may want to take a look at the expression grammars of languages that are typically implemented with recursive descent parsers; these need LL(1) grammars, which do not allow left recursion. Most if not all of Wirth's languages fall into this group. Below is an example from the grammar of classic Modula-2. EBNF links are shown next to each rule.
http://modula-2.info/m2pim/pmwiki.php/SyntaxDiagrams/PIM4NonTerminals#expression
In the example MIB entry below:
--
-- Logging configuration
--
nsLoggingTable OBJECT-TYPE
SYNTAX SEQUENCE OF NsLoggingEntry
MAX-ACCESS not-accessible
STATUS current
DESCRIPTION
"A table of individual logging output destinations, used to control
where various levels of output from the agent should be directed."
::= { nsConfigLogging 1 }
nsLoggingEntry OBJECT-TYPE
SYNTAX NsLoggingEntry
MAX-ACCESS not-accessible
STATUS current
DESCRIPTION
"A conceptual row within the logging table."
INDEX { nsLogLevel, IMPLIED nsLogToken }
::= { nsLoggingTable 1 }
NsLoggingEntry ::= SEQUENCE {
nsLogLevel INTEGER,
nsLogToken DisplayString,
nsLogType INTEGER,
nsLogMaxLevel INTEGER,
nsLogStatus RowStatus
}
Here the RowStatus entry is the last one in NsLoggingEntry. Can we put this RowStatus entry anywhere else in NsLoggingEntry (e.g. after "nsLogToken DisplayString")?
Moving the entry nsLogStatus RowStatus to a different location within the sequence of NsLoggingEntry is possible but you need to update the order of the columnar objects to match the order of the sequence.
To give a little more detail, NsLoggingEntry ::= SEQUENCE is defining the columns that will make up entries in the nsLoggingTable. The MIB file should have further definition for each of those columns that will look something like,
nsLogStatus OBJECT-TYPE
SYNTAX RowStatus
MAX-ACCESS read-only
STATUS current
DESCRIPTION "<Some great description of this column>"
::= { nsLoggingEntry 5 }
The key part of that definition is the ::= { nsLoggingEntry 5 } line, which asserts that nsLogStatus will be the fifth column in rows of nsLoggingTable. If you change the order of the NsLoggingEntry sequence, you should make sure that the individual column definitions follow that sequence.
For example, if you changed the order to be,
NsLoggingEntry ::= SEQUENCE {
nsLogLevel INTEGER,
nsLogToken DisplayString,
nsLogStatus RowStatus,
nsLogType INTEGER,
nsLogMaxLevel INTEGER
}
the OID assignments for each of the columns should become,
nsLogLevel ::= { nsLoggingEntry 1 }
nsLogToken ::= { nsLoggingEntry 2 }
nsLogStatus ::= { nsLoggingEntry 3 }
nsLogType ::= { nsLoggingEntry 4 }
nsLogMaxLevel ::= { nsLoggingEntry 5 }
There is one more thing to keep in mind: the index for the table should be the first column in the sequence, so nsLogLevel should remain in its current location, as should nsLogToken.
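If you do reorder things, a quick consistency check can catch a column you forgot to renumber. This is just a Python sketch of the convention described above (column number equals position in the SEQUENCE); the function name and the two input structures are made up, and the "old" numbers for nsLogType and nsLogMaxLevel in the usage example are my assumption from the original order.

```python
def check_column_order(sequence_members, column_numbers):
    """Return the names whose column number differs from their 1-based position.

    sequence_members: names in the order they appear in the SEQUENCE.
    column_numbers:   name -> last sub-identifier of its OID assignment.
    """
    return [name
            for position, name in enumerate(sequence_members, start=1)
            if column_numbers.get(name) != position]

# Usage: the reordered SEQUENCE checked against the (assumed) old numbering.
reordered = ["nsLogLevel", "nsLogToken", "nsLogStatus", "nsLogType", "nsLogMaxLevel"]
old_numbers = {"nsLogLevel": 1, "nsLogToken": 2, "nsLogType": 3,
               "nsLogMaxLevel": 4, "nsLogStatus": 5}
mismatches = check_column_order(reordered, old_numbers)
```

An empty list means the column definitions follow the sequence; anything else names the columns that still need attention.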