I'm very new to lex. This is for a project for my Programming Languages class.
Consider a language built over the following grammar:
<program> ::= <statement> | <program> <statement>
<statement> ::= <assignStmt> | <ifStmt> | <whileStmt> | <printStmt>
<assignStmt> ::= <id> = <expr> ;
<ifStmt> ::= if ( <expr> ) then <stmt>
<whileStmt> ::= while ( <expr> ) do <stmt>
<printStmt> ::= print <expr> ;
<expr> ::= <term> | <expr> <addOp> <term>
<term> ::= <factor> | <term> <multOp> <factor>
<factor> ::= <id> | <number> | - <factor> | ( <expr> )
<id> ::= <letter> | <id> <letter>
<letter> ::= a | b | c | d | e | f | g | h | i | j
| k | l | m | n | o | p | r | s | t
| u | v | w | x | y | z
<number> ::= <digit> | <number> <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<addOp> ::= + | -
<multOp> ::= * | / | %
Implement a lex-based C program that scans for all the tokens of the language (keywords, identifiers, numbers, operators, and so on).
My problem is that I get an "l7t2.l:32: unrecognized rule" error. I believe it stems from the declaration of "word", but I'm not sure how to fix it.
Here's my lex file, l7t2.l:
%option noyywrap
%{
#include "l7t2.h"
int totDol = 0;
int *outword;
%}
digit [0-9]
number {digit}*
letter [a-zA-Z]
word ({letter}{[a-zA-Z0-9]}+)
%%
"if" {return IF;}
"then" {return THEN;}
"while" {return WHILE;}
"do" {return DO;}
"+" {return PLUSOP;}
"-" {return MINUSOP;}
"*" {return MULTOP;}
"/" {return DIVOP;}
"%" {return MODOP;}
";" {return SEMICOLON;}
"=" {return EQUAL;}
"print" {return PRINT;}
[ \t\n]+ ;
{word} {strcpy(outword, yytext);}
\${number} {totDol = 0; totDol += strtod(yytext+1, NULL); return totDol;}
%%
word ({letter}{[a-zA-Z0-9]}+)
The problem is here. Braces are used only to expand previously defined names, so a character class must not be wrapped in them. It should be:
word ({letter}[a-zA-Z0-9]+)
On line 32 (the {word} rule), surely you should be returning a value from that rule?
NB You can get rid of all the single-special-character rules and have a final cover-all rule:
. return yytext[0];
This also means you can use the special characters directly in the grammar, e.g. '+' instead of PLUSOP. It also saves you from having to handle illegal characters at all in the lexer: the parser does it.
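As a rough illustration of the scanner's intent (not lex itself), here is a minimal Ruby sketch using the standard library's StringScanner. The names KEYWORDS and tokenize and the token tags are hypothetical. It assumes identifiers are a letter followed by zero or more letters or digits, so that one-letter names such as x are accepted, and it ends with a catch-all branch that hands back any other single character, in the spirit of the final `.` rule.

```ruby
require 'strscan'

# Keywords of the language in the question.
KEYWORDS = %w[if then while do print].freeze

# Split source text into [tag, text] pairs.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next # whitespace produces no token
    elsif (text = scanner.scan(/[a-zA-Z][a-zA-Z0-9]*/))
      # Scan the whole word first, then decide whether it is a keyword,
      # so that "ifx" is an identifier rather than "if" followed by "x".
      tokens << (KEYWORDS.include?(text) ? [text.upcase.to_sym, text] : [:ID, text])
    elsif (text = scanner.scan(/[0-9]+/))
      tokens << [:NUMBER, text]
    else
      # Catch-all: any other single character is its own token.
      tokens << [scanner.getch, nil]
    end
  end
  tokens
end
```

Scanning the word before checking the keyword table stands in for lex's longest-match behaviour, which this sketch does not otherwise reproduce.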
In school we have been studying metalanguages, in particular, railroad diagrams and EBNF. I received a question where an imaginary programming language (winston) was described in EBNF. Here it is:
Digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
LCase = a | b | c | d
UCase = A | B | C | D | E | F | G | H | I | J
Operator = + | - | * | /
Logical = < | > | <= | >= | <>
Constant = [-] <Digit>{<Digit>}
Identifier = <UCase>{<LCase> | <Digit>}
Assignment = Set <Identifier> to <Constant> | <Identifier>
{<Operator>(<Constant> | <Identifier>)}
Condition = <Identifier> <Logical> (<Identifier> | <Constant>)
{(and | or) <Identifier> <Logical> (<Identifier> | <Constant>)}
When = (<Assignment> | <Condition> {<Assignment> | <Condition>})
Statement = <Input> | <Output> | <Assignment> | <Condition> | <When> | <Pretest> | <Posttest>
Program = Start <Statement> {! <Statement>} Stop
The program written below was made with winston but doesn't execute properly. Use the EBNF descriptions to identify the error.
Start
Input J1
Input J2
When (J1 = J2, Set A3 to 0), (J1 < J2, Set A3 to -1), Set A3 to 1
Output A3
Stop
My working so far: To me, this program seems legitimate. It is a program, so it must start with "Start" and end with "Stop", which it does. The statements in the middle all seem to be allowed. Can someone point me in the right direction?
Also, can someone tell me what <Statement> means in the EBNF description of a program?
I think it means statements like when and if, but I'm not too sure. Thanks for the help :)
The When statement is comma-separated, but the grammar doesn't specify commas at all.
J1 = J2 -- there is no = comparison operator in the grammar (see Logical), so J1 = J2 is neither an Assignment nor a Condition and is thus invalid.
<Statement> -- the grammar wraps symbols in angle brackets on the right-hand side, e.g. Identifier on the left-hand side and, later, <Identifier> in the Assignment rule -- that doesn't look like valid EBNF.
Let's say I have the following EBNF:
ProductNo ::= Digitgroup "-" Lettergroup;
Digitgroup ::= Digit Digit? Digit? Digit?;
Digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
Lettergroup ::= Letter Letter? Letter? Letter? Letter?;
Letter ::= "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z";
Now I want to limit ProductNo to a maximum of 5 tokens.
Example:
Input : 1-A (EBNF valid and Token = 3 <= 5)
Input : 023-A (EBNF valid and Token = 5 <= 5)
Input : 0231-ABI (currently EBNF valid but Token = 8 > 5 so this should not be valid)
Input : 022-ABCDE (currently EBNF valid but Token = 9 > 5 so this should not be valid)
As you can see in these example inputs, the combination of digits and letters can vary as long as it conforms to the EBNF (min 1 digit, max 4 digits; min 1 letter, max 5 letters), but the total number of tokens, including the "-", has to be <= 5.
Question: Is there a way other than writing down every valid combination of Letter and Digit?
My current solution:
ProductNo ::= Token Token Token Token? Token?;
Token ::= Digit | Letter | "-";
Digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
Letter ::= "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z";
Problem: The composition of ProductNo (Digitgroup, "-", Lettergroup) is not reproduced. So I need to combine the two EBNFs into one, but I really can't figure out how to do this.
I'm assuming you are using the W3C notation: http://www.w3.org/TR/REC-xml/#sec-notation , not the standard ISO notation: http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form .
If I'm wrong, then please specify which EBNF you're using!
In the W3C notation you can use this:
Digit ::= [0-9]
Letter ::= [A-Z]
GoodFormat ::= Digit+ "-" Letter+
Token ::= Digit | Letter | "-"
TooLong ::= Token Token Token Token Token Token+
ProductNo ::= GoodFormat - TooLong
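The same "good format minus too long" idea can be sketched outside EBNF as a shape check plus a length check. The names GOOD_FORMAT, MAX_TOKENS, and product_no? are hypothetical, and the sketch assumes each character counts as one token.

```ruby
# Shape: one or more digits, a hyphen, one or more uppercase letters.
GOOD_FORMAT = /\A[0-9]+-[A-Z]+\z/
# Upper bound on the number of characters, hyphen included.
MAX_TOKENS = 5

# True when the text has the right shape and is not too long.
def product_no?(text)
  GOOD_FORMAT.match?(text) && text.length <= MAX_TOKENS
end
```

With the length capped at 5 and at least one digit, one hyphen, and one letter required, the per-group maxima follow automatically: there can never be more than three digits or three letters.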
I think that there is a smarter solution than writing every valid combination down:
ProductNo ::= Case1 | Case2 | Case3
Case1 ::= Digit Digit? Digit? "-" Letter;
Case2 ::= Digit "-" Letter Letter? Letter?;
Case3 ::= Digit Digit? "-" Letter Letter?;
Digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
Letter ::= "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z";
But I don't know if there is any smarter way to do this. I hope this solution helps a bit.
I have this grammar in EBNF for a sub-language with arithmetic and logical expressions, variable assignment, and printing.
start ::= (print | assign)*
print ::= print expr ;
assign ::= ID = expr ;
expr ::= andExpr (|| andExpr)*
andExpr ::= relExpr (&& relExpr)*
relExpr ::= addExpr ( == addExpr | != addExpr | <= addExpr | >= addExpr | < addExpr | > addExpr)?
addExpr ::= mulExpr (+ mulExpr | - mulExpr)*
mulExpr ::= unExpr (* unExpr | / unExpr)*
unExpr ::= + unExpr | - unExpr | ! unExpr | primary
primary ::= ( expr ) | ID | NUM | true | false
Unfortunately, I just can't figure out what these two rules:
unExpr ::= + unExpr
unExpr ::= - unExpr
actually do, or why I should need them, since I seem to be able to derive every phrase of the language without applying them. Any idea?
thanks a lot :-)
If you are not planning to allow expressions like:
a=-1
(where "a" is an ID and "1" is a NUM)
in your language, then you don't need those two rules.
Otherwise you have to implement them.
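To see what the two rules buy you, here is a minimal recursive-descent sketch covering only the arithmetic part of the grammar (addExpr, mulExpr, unExpr, and a primary limited to NUM and parentheses). The class name MiniParser and its method names are hypothetical. Without the '+' and '-' branches in un_expr, an input such as -1 would fall through to primary and be rejected.

```ruby
# Evaluates integer expressions built from + - * / ( ) and unary + and -.
class MiniParser
  def initialize(text)
    # Numbers and single-character operators; anything else is ignored.
    @tokens = text.scan(/\d+|[-+*\/()]/)
    @pos = 0
  end

  def parse
    value = add_expr
    raise ArgumentError, 'unexpected trailing input' if peek
    value
  end

  private

  def peek
    @tokens[@pos]
  end

  def advance
    @pos += 1
    @tokens[@pos - 1]
  end

  # addExpr ::= mulExpr (+ mulExpr | - mulExpr)*
  def add_expr
    value = mul_expr
    while ['+', '-'].include?(peek)
      value = advance == '+' ? value + mul_expr : value - mul_expr
    end
    value
  end

  # mulExpr ::= unExpr (* unExpr | / unExpr)*
  def mul_expr
    value = un_expr
    while ['*', '/'].include?(peek)
      value = advance == '*' ? value * un_expr : value / un_expr
    end
    value
  end

  # unExpr ::= + unExpr | - unExpr | primary
  def un_expr
    case peek
    when '+'
      advance
      un_expr
    when '-'
      advance
      -un_expr
    else
      primary
    end
  end

  # primary ::= ( addExpr ) | NUM   (subset of the original rule)
  def primary
    token = advance
    if token == '('
      value = add_expr
      raise ArgumentError, "expected ')'" unless advance == ')'
      value
    elsif token&.match?(/\A\d+\z/)
      token.to_i
    else
      raise ArgumentError, "unexpected token #{token.inspect}"
    end
  end
end
```

For example, MiniParser.new('2 * -3').parse goes through un_expr for the right operand, which is where the leading minus is consumed.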
I need to write a program to draw all possible paths in a given matrix that can be made by moving only left, right, and up.
One should not cross the same location more than once. Note also that on a particular path, we may or may not use motion in all possible directions.
Path will start in the bottom-left corner in the matrix and will reach the top-right corner.
Following symbols are used to denote the direction of the motion in the current position:
+---+
| > | right
+---+
+---+
| ^ | up
+---+
+---+
| < | left
+---+
The symbol * is used in the final location to indicate end of path.
Example:
For a 5x8 matrix, using left, right and up directions, 2 different paths are shown below.
Path 1:
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   | * |
+---+---+---+---+---+---+---+---+
|   |   | > | > | > | > | > | ^ |
+---+---+---+---+---+---+---+---+
|   |   | ^ | < | < |   |   |   |
+---+---+---+---+---+---+---+---+
|   | > | > | > | ^ |   |   |   |
+---+---+---+---+---+---+---+---+
| > | ^ |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
Path 2:
+---+---+---+---+---+---+---+---+
|   |   |   | > | > | > | > | * |
+---+---+---+---+---+---+---+---+
|   |   |   | ^ | < | < |   |   |
+---+---+---+---+---+---+---+---+
|   |   |   |   |   | ^ |   |   |
+---+---+---+---+---+---+---+---+
|   |   | > | > | > | ^ |   |   |
+---+---+---+---+---+---+---+---+
| > | > | ^ |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
Can anyone help me with this?
I tried to solve this using lists, but I soon realized that I was making a mess of it. Here is the code I tried:
solution x y = travel (1,1) (x,y)
travelRight (x,y) = zip [1..x] [1,1..] ++ [(x,y)]
travelUp (x,y) = zip [1,1..] [1..y] ++ [(x,y)]
minPaths = [[(1,1),(2,1),(2,2)],[(1,1),(1,2),(2,2)]]
travel startpos (x,y) = rt (x,y) ++ up (x,y)
rt (x,y) | odd y = map (++[(x,y)]) (furtherRight (3,2) (x,2) minPaths)
| otherwise = furtherRight (3,2) (x,2) minPaths
up (x,y) | odd x = map (++[(x,y)]) (furtherUp (2,3) (2,y) minPaths)
| otherwise = furtherUp (2,3) (2,y) minPaths
furtherRight currpos endpos paths | currpos == endpos = (travelRight currpos) : map (++[currpos]) paths
| otherwise = furtherRight (nextRight currpos) endpos ((travelRight currpos) : (map (++[currpos]) paths))
nextRight (x,y) = (x+1,y)
furtherUp currpos endpos paths | currpos == endpos = (travelUp currpos) : map (++[currpos]) paths
| otherwise = furtherUp (nextUp currpos) endpos ((travelUp currpos) : (map(++[currpos]) paths))
nextUp (x,y) = (x,y+1)
identify lst = map (map iden) lst
iden (x,y) = (x,y,1)
arrows lst = map mydir lst
mydir (ele:[]) = "*"
mydir ((x1,y1):(x2,y2):lst) | x1==x2 = '>' : mydir ((x2,y2):lst)
| otherwise = '^' : mydir ((x2,y2):lst)
surroundBox lst = map (map createBox) lst
bar = "+ -+"
mid x = "| "++ [x] ++" |"
createBox chr = bar ++ "\n" ++ mid chr ++ "\n" ++ bar ++ "\n"
These ASCII grids are much more confusing than enlightening. Let me describe a better way to represent each possible path.
Each non-top row will have exactly one cell with UP. I claim that once the UP cell of each row has been chosen, the LEFT, RIGHT, and EMPTY cells are determined. I also claim that any cell in a non-top row can be the UP cell, in any combination across rows.
Each path is thus isomorphic to a (rows-1) length list of numbers in the range (1..columns) that determine the UP cells. The number of allowed paths is thus columns^(rows-1) and enumerating the possible paths in this format should be easy.
Then you could make a printer that converts this format to the ASCII art. This may be annoying, depending on skill level.
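A minimal sketch of that representation, assuming columns are numbered 1..cols and each path is given as its list of UP columns from the bottom row upward. The names all_paths and render are hypothetical, and render prints only the cell rows, not the +---+ borders.

```ruby
# Every path is one choice of UP column per non-top row (bottom row first).
def all_paths(rows, cols)
  (1..cols).to_a.repeated_permutation(rows - 1).to_a
end

# Convert one list of UP columns into printable rows, top row first.
def render(up_cols, cols)
  entry = 1 # the path starts in the bottom-left cell
  lines = up_cols.map do |up|
    row = Array.new(cols, ' ')
    lo, hi = [entry, up].minmax
    # Walk horizontally from the entry column towards the UP column.
    (lo..hi).each { |c| row[c - 1] = (up > entry ? '>' : '<') }
    row[up - 1] = '^'
    entry = up # the next row is entered directly above
    row
  end
  # Top row: move right from the entry column to the goal.
  top = Array.new(cols, ' ')
  (entry...cols).each { |c| top[c - 1] = '>' }
  top[cols - 1] = '*'
  (lines << top).reverse.map { |row| '| ' + row.join(' | ') + ' |' }
end
```

Under this encoding, Path 1 from the question corresponds to the list [2, 5, 3, 8] and Path 2 to [3, 6, 6, 4], so `puts render([2, 5, 3, 8], 8)` prints the cell rows of Path 1.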
This looks like homework, so I will try to give just enough hints.
First, try filling in the number of paths from each cell to your goal.
So
+---+---+---+---+---+---+---+---+
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | * |
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
The thing to note here is that from any cell in the top row there is always exactly one path to the *.
The number of possible paths is the same from every cell in a given row. There is no down move, so every path must eventually move up, and the cell from which it moves up can be reached from any cell in the current row.
Intuitively, the paths from the current cell are related to the paths from the cells to its left, to its right, and above it. But it is enough to find all paths from just one cell in a row: the paths from any other cell in that row are some moves within the row followed by a suffix of the paths from that cell.
Here is an example:
+---+---+---+
| 1 | 1 | * |
+---+---+---+
|   |   |   |
+---+---+---+
|   |   |   |
+---+---+---+
You know all possible paths from the cells in the top row. You need to find the same for the second row, so a good strategy is to do it for the rightmost cell first:
+---+---+---+
| > | > | * |
+---+---+---+
| ^ | < | < |
+---+---+---+
|   |   |   |
+---+---+---+

+---+---+---+
|   | > | * |
+---+---+---+
|   | ^ | < |
+---+---+---+
|   |   |   |
+---+---+---+

+---+---+---+
|   |   | * |
+---+---+---+
|   |   | ^ |
+---+---+---+
|   |   |   |
+---+---+---+
Now finding this for the rest of the cells in the same row is trivial using these, as described above.
In the end, for an m x n matrix (m rows, n columns), the number of paths from the bottom-left corner to the top-right corner is n^(m-1).
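The n^(m-1) count can be cross-checked on small grids with a brute-force search. This is only a sketch: count_paths is a hypothetical name, cells are zero-based [row, column] pairs with row 0 at the bottom, and a path is assumed to stop as soon as it reaches the top-right cell.

```ruby
# Count all left/right/up paths from the bottom-left to the top-right cell
# that never visit a cell twice.
def count_paths(rows, cols)
  goal = [rows - 1, cols - 1]
  walk = lambda do |r, c, seen|
    return 1 if [r, c] == goal
    # Try left, right, and up from the current cell.
    [[r, c - 1], [r, c + 1], [r + 1, c]].sum do |nr, nc|
      next 0 if nr >= rows || nc < 0 || nc >= cols || seen.include?([nr, nc])
      walk.call(nr, nc, seen + [[nr, nc]])
    end
  end
  walk.call(0, 0, [[0, 0]])
end
```

The search is exponential, so it is only meant for confirming the formula on grids of a few rows and columns.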
Another way
This approach is far from optimal but is easy to implement. Consider an m x n grid.
Find the longest path. You don't need the exact path, just the number of <, >, and ^ moves in it.
You can find a direct formula in terms of m and n.
For example:
^ = m - 1
< = (n-1) * floor((m-1)/2)
> = (n-1) * (floor((m-1)/2) + 1)
Any valid path will be a prefix of one of the permutations of these moves, which you can search exhaustively. Use permutations from Data.List to get all possible permutations. Then write a function that, given a permutation, strips a valid path from it. Map this over the list of permutations and remove duplicates. Note that a path is only a prefix of a permutation, so several permutations can yield the same path.
Can you create that matrix and define the "fields"? Even if you can't (a specific matrix is given), you can map an [(Int, Int)] matrix (which sounds reasonable for this kind of task) to your own representation.
Since you didn't specify what your skill level was, I hope you don't mind that I suggest that you first try to create some kind of a grid in order to have something to work on:
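-- The constructors are prefixed with "Go" because Left and Right would be
-- ambiguous with the Prelude's Either constructors when used.
data Status = Free | GoLeft | GoRight | GoUp
  deriving (Read, Show, Eq)
type Position = (Int, Int)
type Field = (Position, Status)
type Grid = [Field]
-- A 10x10 grid in which every cell starts out Free.
grid :: Grid
grid = [((x, y), Free) | x <- [1..10], y <- [1..10]]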
Of course there are other ways to achieve this. Afterwards you can define some movement, map Position to Grid index and Statuses to printable characters... Try to fiddle with it and you might get some ideas.
Does anyone know the rules for valid Ruby variable names? Can it be matched using a RegEx?
UPDATE: This is what I could come up with so far:
^[_a-z][a-zA-Z0-9_]+$
Does this seem right?
Identifiers are pretty straightforward. They begin with letters or an underscore, and contain letters, underscore and numbers. Local variables can't (or shouldn't?) begin with an uppercase letter, so you could just use a regex like this.
/^[a-z_][a-zA-Z_0-9]*$/
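A quick way to try this out is sketched below. In Ruby, ^ and $ match at line boundaries rather than at the ends of the string, so the sketch swaps in \A and \z to reject multi-line input. The names LOCAL_VAR and local_var_name? are hypothetical.

```ruby
# ASCII-only pattern for local variable names, anchored to the whole string.
LOCAL_VAR = /\A[a-z_][a-zA-Z_0-9]*\z/

# True when the whole string looks like a local variable name.
def local_var_name?(name)
  LOCAL_VAR.match?(name)
end
```

This covers only ASCII names and does not exclude reserved words such as `end` or `class`.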
It's possible for variable names to be unicode letters, in which case most of the existing regexes don't match.
varname = "\u2211" # => "∑"
eval(varname + '= "Tony the Pony"') # => "Tony the Pony"
puts varname # => ∑
local_variable_identifier = /Insert large regular expression here/
varname =~ local_variable_identifier # => nil
See also "Fun with Unicode" in either the Ruby 1.9 Pickaxe or at Fun with Unicode.
According to http://rubylearning.com/satishtalim/ruby_names.html a Ruby variable consists of:
A name is an uppercase letter, lowercase letter, or an underscore ("_"), followed by Name characters (this is any combination of upper- and lowercase letters, underscore and digits).
In addition, global variables begin with a dollar sign, instance variables with a single at-sign, and class variables with two at-signs.
A regular expression to match all that would be:
%r{
(\$|@{1,2})? # optional leading punctuation
[A-Za-z_] # at least one upper case, lower case, or underscore
[A-Za-z0-9_]* # optional characters (including digits)
}x
Hope that helps.
I like @aboutruby's answer, but just to complete it, here's the equivalent using POSIX bracket expressions.
/^[_[:lower:]][_[:alnum:]]*$/
Or, since a-z is actually shorter than [:lower:]:
/^[_a-z][_[:alnum:]]*$/
I think /^(\$){0,1}[_a-zA-Z][a-zA-Z0-9_]*([?!]){0,1}$/ is a bit closer to what you will need...
It depends on whether you want to match method names as well.
If you are trying to match a name that might be encountered in an expression, then it might start with $ and it might end with ? or !. If you know for sure that it is just a local variable then the rule will be much simpler.
I was trying to figure one out for a Rails patch, and Matthew Draper wrote this one, using the Ruby parser as a reference:
/\A(?![A-Z0-9])(?:[[:alnum:]_]|[^\0-\177])+\z/
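A small sketch of how that pattern behaves, including on a non-ASCII name. The names NAME_PATTERN and variable_like? are hypothetical, and the sample strings are illustrative only.

```ruby
# The pattern quoted above: no leading uppercase letter or digit, then any
# run of alphanumerics, underscores, or non-ASCII characters.
NAME_PATTERN = /\A(?![A-Z0-9])(?:[[:alnum:]_]|[^\0-\177])+\z/

# True when the whole string is accepted by the pattern.
def variable_like?(name)
  NAME_PATTERN.match?(name)
end
```

Unlike the ASCII-only patterns, this one accepts a name such as "\u2211" because of the [^\0-\177] alternative.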
And here it is, straight from the horse's mouth. (The horse in this case is the Draft ISO Ruby Specification):
local-variable-identifier → ( lowercase-character | _ ) identifier-character *
identifier-character → lowercase-character | uppercase-character | decimal-digit | _
uppercase-character → A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
lowercase-character → a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
decimal-digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
In Ruby 1.9, using named groups, you can translate this literally:
local_variable_identifier = %r{
(?<uppercase_character> A | B | C | D | E | F | G | H | I | J | K | L | M
| N | O | P | Q | R | S | T | U | V | W | X | Y | Z
){0}
(?<lowercase_character> a | b | c | d | e | f | g | h | i | j | k | l | m
| n | o | p | q | r | s | t | u | v | w | x | y | z
){0}
(?<decimal_digit> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9){0}
(?<identifier_character> \g<lowercase_character>
| \g<uppercase_character>
| \g<decimal_digit>
| _
){0}
( \g<lowercase_character> | _ ) \g<identifier_character>*
}x
Of course, this is not how you would really write it.