Why is layout around optional parts of a production causing ambiguity?

In Rascal, why does layout at the position of an optional part of a production cause ambiguity? For example, "{ }" is ambiguous as Start1, while it parses fine as Start2 in the following grammar, which I would have expected to be equivalent.
layout Layout = " "?;
start syntax Start1 = "{" "c"? "}";
start syntax Start2 = "{" "c" "}"
| "{" "}";
In addition, I would like to know if there is a way other than Start1 to represent Start2 without duplication that does not cause the same ambiguity.
Obviously there is not much duplication in this code and Start2 is a good option here, but this is just an example. I am working with a grammar with many productions that contain three or four optional parts, and in the latter case the notation used in Start2 already requires duplicating the non-optional parts of the production 2^4=16 times, which is really troublesome in my opinion.

Before a parser is generated, your grammar is first expanded into something similar to this:
layout Layout = " "?;
syntax " "? = | " ";
syntax Start1 = "{" Layout "c"? Layout "}";
syntax "c"? = | "c";
lexical " " = [\ ];
lexical "c" = [c];
lexical "{" = [{];
lexical "}" = [}];
syntax Start2 = "{" Layout "c" Layout "}"
| "{" Layout "}";
syntax start[Start1] = Layout Start1 Layout;
syntax start[Start2] = Layout Start2 Layout;
So for an input like { } (with a space between the curly braces), the space can be derived either by the first instance of Layout on the right-hand side of the Start1 rule or by the second. Since the parser produces all derivation trees, and there are two here, the parse is ambiguous.
Typically the ambiguity is resolved by making layout greedy with a follow restriction, like so:
layout Layout = " "? !>> " ";
or (equivalently) like so:
layout Layout = " "? !>> [\ ];
The restriction acts as a constraint on the Layout rule: it will not derive anything (not even the empty string) if a space follows it. Only the first derivation then remains valid, the one where the space goes into the first Layout instance of Start1. The second Layout instance derives the empty string and is followed by }, which satisfies the constraint, so the parse is unambiguous.
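The effect of the follow restriction can be illustrated outside Rascal with a small brute-force sketch in Python. This is not how Rascal's parser works internally; it only enumerates the ways an input can be split over "{" Layout "c"? Layout "}", where Layout matches zero or one space. The function name derivations and the greedy flag are made up for this illustration.

```python
def derivations(text, greedy):
    """List the ways `text` matches  "{" Layout "c"? Layout "}",
    where Layout is an optional single space.

    With greedy=True, a Layout instance is rejected when the character
    right after it is a space (a model of the `!>> " "` restriction).
    """
    results = []
    for l1 in ("", " "):
        for c in ("", "c"):
            for l2 in ("", " "):
                if "{" + l1 + c + l2 + "}" != text:
                    continue
                if greedy:
                    # Index of the character following each Layout instance.
                    after_l1 = 1 + len(l1)
                    after_l2 = after_l1 + len(c) + len(l2)
                    if text[after_l1] == " " or text[after_l2] == " ":
                        continue
                results.append((l1, c, l2))
    return results


print(derivations("{ }", greedy=False))  # two ways to place the space
print(derivations("{ }", greedy=True))   # only the first Layout takes it
```

Without the restriction the space can go to either Layout instance; with it, the split that leaves the first Layout empty is rejected because a space follows that empty Layout.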

Related

Parsing comment in Rascal

I have a very basic question about parsing a fragment that contains a comment.
First we import my favorite language, Pico:
import lang::pico::\syntax::Main;
Then we execute the following:
parse(#Id,"a");
gives, as expected:
Id: (Id) `a`
However,
parse(#Id,"a\n%% some comment\n");
gives a parse error.
What am I doing wrong here?
There are multiple problems.
Id is a lexical, which means layout (including comments) is never inserted into it.
Layout is only inserted between the elements of a production, and the Id lexical has only a character class, so there is no place to insert layout.
Even if Id were a syntax non-terminal with multiple elements, comments would be parsed between those elements, not before or after them.
For more on the difference between syntax, lexical, and layout see: Rascal Syntax Definitions.
If you want to parse comments around a non-terminal, use the start modifier on that non-terminal. Normally, layout is only inserted between the elements of a production; with start, it is also inserted before and after it.
For example, take this grammar:
layout L = [\t\ ]* !>> [\t\ ];
lexical AB = "A" "B"+;
syntax CD = "C" "D"+;
start syntax EF = "E" "F"+;
this will be transformed into this grammar:
AB = "A" "B"+;
CD' = "C" L "D"+;
EF' = L "E" L "F"+ L;
"B"+ = "B"+ "B" | "B";
"D"+ = "D"+ L "D" | "D";
"F"+ = "F"+ L "F" | "F";
So, in particular if you'd want to parse a string with layout around it, you could write this:
lexical Id = [a-z]+;
start syntax P = Id i;
layout L = [\ \n\t]*;
parse(#start[P], "\naap\n").top // parses and returns the P node
parse(#start[P], "\naap\n").top.i // parses and returns the Id node
parse(#P, "\naap"); // parse error at 0 because the start wrapper is not around P
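The layout-insertion rule described above (nothing for lexical, between elements for syntax, also before and after for start syntax) can be mimicked with a short Python sketch. It is a simplification that only handles the top-level elements of one production, not the expansion of regular symbols such as "D"+, and the function name insert_layout is hypothetical.

```python
def insert_layout(symbols, kind, layout="L"):
    """Return the production elements with layout symbols interleaved.

    kind is one of "lexical", "syntax" or "start" (for `start syntax`).
    """
    if kind == "lexical":
        # Lexicals never get layout inserted.
        return list(symbols)
    result = []
    for index, symbol in enumerate(symbols):
        if index > 0:
            result.append(layout)  # layout between adjacent elements
        result.append(symbol)
    if kind == "start":
        # The start wrapper also allows layout before and after.
        result = [layout] + result + [layout]
    return result


print(insert_layout(['"A"', '"B"+'], "lexical"))
print(insert_layout(['"C"', '"D"+'], "syntax"))
print(insert_layout(['"E"', '"F"+'], "start"))
```

The three calls correspond to the AB, CD' and EF' rules in the transformed grammar.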

What is the empty statement in Golang?

In Python we can use the pass statement as a placeholder.
What is the equivalent in Golang?
A ; or something else?
The Go Programming Language Specification
Empty statements
The empty statement does nothing.
EmptyStmt = .
Notation
The syntax is specified using Extended Backus-Naur Form (EBNF):
Production = production_name "=" [ Expression ] "." .
Expression = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term = production_name | token [ "…" token ] | Group | Option | Repetition .
Group = "(" Expression ")" .
Option = "[" Expression "]" .
Repetition = "{" Expression "}" .
Productions are expressions constructed from terms and the following
operators, in increasing precedence:
| alternation
() grouping
[] option (0 or 1 times)
{} repetition (0 to n times)
Lower-case production names are used to identify lexical tokens.
Non-terminals are in CamelCase. Lexical tokens are enclosed in double
quotes "" or back quotes ``.
The form a … b represents the set of characters from a through b as
alternatives. The horizontal ellipsis … is also used elsewhere in the
spec to informally denote various enumerations or code snippets that
are not further specified. The character … (as opposed to the three
characters ...) is not a token of the Go language.
The empty statement is literally empty. In EBNF (Extended Backus–Naur Form), EmptyStmt = . means the production derives the empty string.
For example,
for {
}
var no bool
if true {
} else {
	no = true
}

Syntax error ... Jython/Python

Hi, I'm doing my first programming assignment (it's with Jython, i.e. Python running on Java) and I've run into a syntax error on line 14 (bolded below). I've tried changing the variable name to something less useful like "L" or "I", but it still gives the error. It's annoying because it makes no sense. I have tried indenting again and adding comments around it.
This is a program that outputs a picture of a soccer ball factory. It's as much an artistic project as a comp sci project, so the printing looks complicated, but it is really just a long checklist for building the picture.
def prettyPic():
    #building materials and parts
    spacer = " "
    ceiling_part = "-"
    ball = "o"
    wheel = ""
    door_joint = "#"
    left_half_arch = "/"
    right_half_arch = "\\"
    ladder = "\\"
    wall = "|"
    glass = (
    #biox
    **left_box = "u"
    right_box = "u"**
    #begin printing
    print (spacer*30 + ceiling_part*30)
    print (spacer*32 + wall*1) + (spacer*47 + wall*1)
    #three balls, leaving space for drop
    print (spacer*32 + wall*1) + (ball*27) + (wall*1)
    #arches, not touching ceiling
    etc, etc
The problem is in this line:
glass = (
This opens a parenthesis, so Python expects an expression such as a tuple to be assigned to glass, like:
glass = (1, 2, 'some string')
The Python interpreter looks for the closing parenthesis of the expression that was just opened, but all it finds is code that is not valid in this context, so the error is reported on a later line.
Remove or comment out the line with glass, or assign some value to the glass variable.
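A minimal way to reproduce this, assuming a reduced three-line version of the code from the question, is to hand the source to the built-in compile function and watch for SyntaxError. The value "#" given to glass in the fixed version is only a placeholder chosen for this sketch, not something from the original program.

```python
# Reduced version of the broken code: the parenthesis is never closed.
broken = 'glass = (\nleft_box = "u"\nright_box = "u"\n'

try:
    compile(broken, "<example>", "exec")
    error = None
except SyntaxError as exc:
    error = exc

print("broken version raises:", type(error).__name__)

# Same lines with glass given a value (placeholder chosen for this sketch).
fixed = 'glass = "#"\nleft_box = "u"\nright_box = "u"\n'
compile(fixed, "<example>", "exec")  # compiles without raising
print("fixed version compiles")
```

The exact message and the line the error is attributed to differ between interpreter versions, which is why the sketch only checks the exception type.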

CSV parsing with a context free grammar

I'd like to parse a CSV file using a context-free grammar. I already have an implementation in C++, but I want to scale CFGs up to harder problems, so first I need to solve an easy one.
So here's what I have so far (my syntax is similar to Boost.Spirit):
A CSV consists of one or more rows
Start >> +Line
A row consists of comma-separated symbols plus EOL
Line >> Symbol >> *(',' >> Symbol) >> EOL
An EOL delimiter can be either windows or unix style
EOL >> -'\r' >> '\n'
Here is where I am stuck handling quoted strings:
Symbol >>
string |
????
Example of a complex quoted strings that must be properly parsed:
"This, is a ""complex"" example of a CSV string!"
"This, is a more """"""complex"""""" but theoretically possible example of a CSV string!"
I am new to CFGs and cannot figure out how to express this in a CFG. Basically, you need to ignore the commas and doubled double quotes once the state enters quote mode.
UPDATE:
I just realized that I need to add more states to my conceptual finite state machine, based on an insight from automata theory that a CFG can be recognized by a pushdown automaton:
Symbol -->
string
" doublequotemode "
' singlequotemode '
doublequotemode -->
*"" string *""
The question is how does this work with boost and greedy/non-greedy parsing?
This will process (double-)quoted strings replacing "" with a single quote " inside the quoted string:
noquote = char_ - '"';
quoted_string = lexeme[lit('"') >> *(noquote | '"' >> char_('"')) >> '"'];
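The same quoting logic can be sketched as a hand-written scanner in Python, which may make the "quote mode" state from the question easier to see. This is an illustration under simplifying assumptions (one line at a time, no EOL handling, no single-quote mode), not a translation of the Spirit rule; the function name split_csv_line is made up here.

```python
def split_csv_line(line):
    """Split one CSV line into fields, honoring double-quoted fields
    in which "" stands for a literal double quote."""
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    current.append('"')  # doubled quote -> one literal quote
                    i += 1               # skip the second quote of the pair
                else:
                    in_quotes = False    # closing quote
            else:
                current.append(ch)       # commas are ordinary characters here
        elif ch == '"':
            in_quotes = True             # opening quote
        elif ch == ",":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


print(split_csv_line('"This, is a ""complex"" example of a CSV string!"'))
print(split_csv_line('a,"b,""c""",d'))
```

Inside quote mode a comma is just data and a doubled quote collapses to one, which mirrors the alternative noquote | '"' >> char_('"') in the rule above.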

VB6 getting rid of large spaces

Hey all, I am trying to replace large spaces between text with just one. My output looks like this right now:
5964215">
This is just the first example of the spaces
5964478">
This would be the 2nd example of showing how many spaces this thing has in each sentence.
5964494">
That comes from a textbox that has MultiLine set to True. Here is what it looks like when it doesn't have MultiLine set to True.
http://www.june3rdsoftware.com/forums/vb6.jpg
I can not seem to get the spaces to go away! BTW, this text is from a webpage if that makes any difference.
David
According to the suggestion of MvanGeest, here is some VB code to replace blocks of white spaces:
Sub test()
    Dim x As String, y As String
    x = "abcd defg 1233"
    Dim re As New RegExp
    re.Pattern = "\s+"
    re.Global = True
    y = re.Replace(x, " ")
    Debug.Print y
End Sub
To make this work, you will have to add a reference to "Microsoft VBScript Regular Expressions" to your project.
Assuming no regex support, why not set up a simple state machine that sets state=1 when a space is found and state=0 once a non-space is encountered? You copy characters only while state=0, so just the first space of each series of spaces is copied.
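The state machine idea can be sketched in a few lines of Python (shown here instead of VB6 only for illustration; the function name collapse_spaces is made up). The boolean in_space_run plays the role of state=1 / state=0 in the description.

```python
def collapse_spaces(text):
    """Copy text character by character, keeping one space per run."""
    out = []
    in_space_run = False  # corresponds to state=1 when True
    for ch in text:
        if ch == " ":
            if not in_space_run:
                out.append(ch)   # first space of a run is copied
            in_space_run = True  # later spaces in the run are skipped
        else:
            out.append(ch)
            in_space_run = False
    return "".join(out)


print(collapse_spaces("abcd      defg    1233"))
```

This makes a single pass over the input, so long runs of spaces cost no more than any other characters.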
Also assuming no regex, you could try something like
str = "long text with spaces "
i = LenB(str)
str = Replace(str, "  ", " ")
Do While LenB(str) <> i
    i = LenB(str)
    str = Replace(str, "  ", " ")
Loop
Of course, this code could be optimized for long space sequences, but it might be all you need.

Resources