Let's say I have the following EBNF:
ProductNo ::= Digitgroup "-" Lettergroup;
Digitgroup ::= Digit Digit? Digit? Digit?;
Digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
Lettergroup ::= Letter Letter? Letter? Letter? Letter?;
Letter ::= "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z";
Now I want to limit ProductNo to a maximum of 5 tokens.
Example:
Input: 1-A (EBNF valid, 3 tokens <= 5)
Input: 023-A (EBNF valid, 5 tokens <= 5)
Input: 0231-ABI (currently EBNF valid, but 8 tokens > 5, so this should not be valid)
Input: 022-ABCDE (currently EBNF valid, but 9 tokens > 5, so this should not be valid)
As you can see from these example inputs, the combination of digits and letters can vary as long as it conforms to the EBNF (min 1 digit, max 4 digits; min 1 letter, max 5 letters), but the total number of tokens has to be <= 5, including the "-".
Question: Is there a way other than writing down every valid combination of Letter and Digit?
My current solution:
ProductNo ::= Token Token Token Token? Token?;
Token ::= Digit | Letter | "-";
Digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
Letter ::= "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z";
Problem: The structure of ProductNo (Digitgroup, "-", Lettergroup) is no longer enforced. So I need to combine the two EBNFs into one, but I really can't figure out how to do this.
I'm assuming you are using the W3C notation: http://www.w3.org/TR/REC-xml/#sec-notation , not the standard ISO notation: http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form .
If I'm wrong, then please specify which EBNF you're using!
In the W3C notation you can use this:
Digit ::= [0-9]
Letter ::= [A-Z]
GoodFormat ::= Digit+ "-" Letter+
Token ::= Digit | Letter | "-"
TooLong ::= Token Token Token Token Token Token+
ProductNo ::= GoodFormat - TooLong
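Outside of EBNF, the same "good format minus too long" idea can be sketched as a single regular expression. This is only an illustration, assuming a regex engine with lookahead (Ruby here); PRODUCT_NO is a made-up name, and the lookahead plays the role of the length limit while the rest checks the format.

```ruby
# Assumed translation of the grammar, not part of the EBNF itself:
# the lookahead caps the total length at 5, the rest enforces the format.
PRODUCT_NO = /\A(?=.{1,5}\z)[0-9]{1,4}-[A-Z]{1,5}\z/

%w[1-A 023-A 0231-ABI 022-ABCDE].each do |input|
  verdict = PRODUCT_NO.match?(input) ? "valid" : "invalid"
  puts "#{input}: #{verdict}"
end
```

The first two inputs from the question come out valid and the last two invalid.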
I think that there is a smarter solution than writing every valid combination down:
ProductNo ::= Case1 | Case2 | Case3;
Case1 ::= Digit Digit? Digit? "-" Letter;
Case2 ::= Digit "-" Letter Letter? Letter?;
Case3 ::= Digit Digit? "-" Letter Letter?;
Digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
Letter ::= "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z";
But I don't know if there is an even smarter way to do this. I hope this solution helps a bit.
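To check that the three cases really cover every allowed combination, one can compare the (digit count, letter count) pairs they generate with the pairs permitted by the original grammar plus the 5-token limit. A small sketch; ALLOWED and COVERED are names made up for this check.

```ruby
# Length pairs allowed by the original grammar (1..4 digits, 1..5 letters)
# once the "-" is counted and the total is capped at 5 tokens.
ALLOWED = (1..4).to_a.product((1..5).to_a)
                .select { |digits, letters| digits + 1 + letters <= 5 }
                .sort

# Length pairs produced by Case1, Case2 and Case3.
case1 = (1..3).map { |digits| [digits, 1] }   # Digit Digit? Digit? "-" Letter
case2 = (1..3).map { |letters| [1, letters] } # Digit "-" Letter Letter? Letter?
case3 = [1, 2].product([1, 2])                # Digit Digit? "-" Letter Letter?
COVERED = (case1 + case2 + case3).uniq.sort

p ALLOWED            # the six permitted pairs
p ALLOWED == COVERED # true when the cases cover exactly the allowed pairs
```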
I have:
hash = {"1"=>["A", "B", "C", ... "Z"], "2"=>["B", "C"], "3"=>["A", "C"]}
My goal is to use hash as the source for a CSV whose columns are named after the letters of the alphabet and whose rows are the hash keys (1, 2, 3, etc.).
I built row1 from all of hash.values with "" unshifted onto the front; it serves as row 1 (the column labels).
desired output:
| A | B | C | ... | Z |
1| A | B | C | ... | Z |
2| | B | C | ....... |
3| A | | C | ....... |
Creating CSV:
CSV.open("groups.csv", 'w') do |csv|
csv << row1
hash.each do |v|
csv << v.flatten
end
end
This makes the CSV look almost like what I want, but there is no spacing to get the columns to align.
Any advice on how to write a method that compares the full [A-Z] list against the values of each hash key (the rows) and inserts empty strings to provide the spacing?
Can the CSV class do it better?
Something like this?
require 'csv'
ALPHA = ('A'..'Z').to_a.freeze
hash={"1"=>ALPHA, "2"=>["B", "C"], "3"=>["A", "C"]}
csv = CSV.generate("", col_sep: "|") do |csv|
csv << [" "] + ALPHA # header
hash.each do |k, v|
alphabet = ALPHA.map { |el| [el, 0] }.to_h
v.each { |el| alphabet[el] += 1 }
csv << [k, *alphabet.map { |k, val| val == 1 ? k : " " }]
end
end
csv.split("\n").each { |row| puts row }
output:
|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
1|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
2| |B|C| | | | | | | | | | | | | | | | | | | | | | |
3|A| |C| | | | | | | | | | | | | | | | | | | | | | |
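If the goal is a real comma-separated file rather than an aligned printout, a variant of the same idea is to emit nil for the missing letters, which CSV writes as empty fields. A minimal sketch, assuming the default comma separator; groups_to_csv and ALPHABET are made-up names.

```ruby
require 'csv'

ALPHABET = ('A'..'Z').to_a.freeze

# Hypothetical helper: one row per hash key, one column per letter,
# nil (written as an empty field) where the key lacks that letter.
def groups_to_csv(groups)
  CSV.generate do |csv|
    csv << [nil, *ALPHABET] # header row with an empty corner cell
    groups.each do |key, letters|
      csv << [key, *ALPHABET.map { |letter| letter if letters.include?(letter) }]
    end
  end
end

puts groups_to_csv({ "1" => ALPHABET, "2" => %w[B C], "3" => %w[A C] })
```

Every row then has the same number of fields, so a spreadsheet lines the columns up by itself.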
If your values are truly single characters and don't need the CSV escaping, then I recommend bypassing CSV altogether and building the string in plain Ruby.
Assuming you want to align your lines correctly regardless of the number of digits in the row number (e.g. 1, 10, and 100), you can use printf-style formatting to guarantee horizontal alignment (as long as the width of your row numbers never exceeds ROWNUM_WIDTH).
By the way, I changed the hash's keys to integers, hope that's ok.
#!/usr/bin/env ruby
FIELDS = ('A'..'Z').to_a
DATA = { 1 => FIELDS, 2 => %w(B C), 3 => %w(A C) }
ROWNUM_WIDTH = 3
output = ' ' * ROWNUM_WIDTH + " | #{FIELDS.join(' | ')} |\n"
DATA.each do |rownum, values|
line = "%*d | " % [ROWNUM_WIDTH, rownum]
FIELDS.each do |field|
char = values.include?(field) ? field : ' '
line << "#{char} | "
end
output << line << "\n"
end
puts output
=begin
Outputs:
| A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
1 | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
2 | | B | C | | | | | | | | | | | | | | | | | | | | | | | |
3 | A | | C | | | | | | | | | | | | | | | | | | | | | | | |
=end
all = [*?A..?Z]
hash = {"1"=>[*?A..?Z], "2"=>["B", "C"], "3"=>["A", "C"]}
hash.map do |k, v|
[k, *all.map { |k| v.include?(k) ? k : ' ' }]
end.unshift([' ', *all]).
map { |row| row.join('|') }
#⇒ [
# [0] " |A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z",
# [1] "1|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z",
# [2] "2| |B|C| | | | | | | | | | | | | | | | | | | | | | | ",
# [3] "3|A| |C| | | | | | | | | | | | | | | | | | | | | | | "
# ]
I'm very new to lex. This is for a project for my Programming Languages class.
Consider a language built over the following grammar:
<program> ::= <statement> | <program> <statement>
<statement> ::= <assignStmt> | <ifStmt> | <whileStmt> | <printStmt>
<assignStmt> ::= <id> = <expr> ;
<ifStmt> ::= if ( <expr> ) then <stmt>
<whileStmt> ::= while ( <expr> ) do <stmt>
<printStmt> ::= print <expr> ;
<expr> ::= <term> | <expr> <addOp> <term>
<term> ::= <factor> | <term> <multOp> <factor>
<factor> ::= <id> | <number> | - <factor> | ( <expr> )
<id> ::= <letter> | <id> <letter>
<letter> ::= a | b | c | d | e | f | g | h | i | j
| k | l | m | n | o | p | r | s | t
| u | v | w | x | y | z
<number> ::= <digit> | <number> <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<addOp> ::= + | -
<multOp> ::= * | / | %
Implement a lex-based C program that scans for all the tokens of the language (keywords, identifiers, numbers, operators, and so on).
My problem is that I get an "l7t2.l:32: unrecognized rule" error. I believe it stems from the definition of "word", but I'm not sure how to fix it.
Here's my lex file, l7t2.l:
%option noyywrap
%{
#include "l7t2.h"
int totDol = 0;
int *outword;
%}
digit [0-9]
number {digit}*
letter [a-zA-Z]
word ({letter}{[a-zA-Z0-9]}+)
%%
"if" {return IF;}
"then" {return THEN;}
"while" {return WHILE;}
"do" {return DO;}
"+" {return PLUSOP;}
"-" {return MINUSOP;}
"*" {return MULTOP;}
"/" {return DIVOP;}
"%" {return MODOP;}
";" {return SEMICOLON;}
"=" {return EQUAL;}
"print" {return PRINT;}
[ \t\n]+ ;
{word} {strcpy(outword, yytext);}
\${number} {totDol = 0; totDol += strtod(yytext+1, NULL); return totDol;}
%%
word ({letter}{[a-zA-Z0-9]}+)
The problem is here. Braces {} are only for referring to a previously defined name, so they must not wrap a character class. It should be:
word ({letter}[a-zA-Z0-9]+)
On line 32, surely you should be returning a value from that rule?
NB You can get rid of all the single-special-character rules and have a final cover-all rule:
. return yytext[0];
This also means you can use the special characters directly in the grammar, e.g. '+' instead of PLUSOP. It also saves you from having to handle illegal characters at all in the lexer: the parser does it.
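For readers more at home in Ruby than in lex, the cover-all idea can be sketched with StringScanner from the standard library. This is only an illustration, not a replacement for the lex file; tokenize, KEYWORDS and the token type symbols are made-up names.

```ruby
require 'strscan'

KEYWORDS = %w[if then while do print].freeze

# Returns an array of [type, text] pairs for the given source string.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next # whitespace produces no token
    elsif (text = scanner.scan(/[a-z]+/))
      tokens << [KEYWORDS.include?(text) ? :keyword : :id, text]
    elsif (text = scanner.scan(/[0-9]+/))
      tokens << [:number, text]
    else
      # Cover-all branch, like the final `.` rule: pass any other single
      # character through and let the parser decide whether it is legal.
      tokens << [:char, scanner.getch]
    end
  end
  tokens
end

p tokenize("while (x) do print x + 1;")
```

The branch order matters in the same way rule order does in lex: the specific patterns come first and the single-character fallback comes last.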
Here's what I'm trying to do. I'm iterating through an array of strings, each of which may contain [] multiple times. I want to use String#match as many times as necessary to process every occurrence. So I have an Array#each block, and nested within that is an infinite loop that should break only when I run out of matches for a given string.
def follow(c, dir)
case c
when "\\"
dir.rotate!
when "/"
dir.rotate!.map! {|x| -x}
end
return dir
end
def parse(ar)
ar.each_with_index { |s, i|
idx = 0
while true do
t = s.match(/\[[ ]*([a-zA-Z]+)[ ]*\]/) { |m|
idx = m.end 0
r = m[1]
s[m.begin(0)...idx] = " "*m[0].size
a = [i, idx]
dir = [0, 1]
c = ar[a[0]][a[1]]
while !c.nil? do
dir = follow c, dir
ar[a[0]][a[1]] = " "
a[0] += dir[0]; a[1] += dir[1]
c = ar[a[0]][a[1]]
if c == ">" then
ar[a[0]][a[1]+1] = r; c=nil
elsif c == "<" then
ar[a[0]][a[1]-r.size] = r; c=nil
end
end
ar[a[0]][a[1]] = " "
puts ar
}
if t == nil then break; end
end
}
end
parse File.new("test", "r").to_a
Contents of test:
+--------+----------+-------------+
| Colors | Foods | Countries |
+--------+----------+-------------+
| red | pizza | Switzerland |
/--> /----> | |
| |[kale]/ | hot dogs | Brazil |
| | <----------------------\ |
| | orange |[yellow]\ | [green]/ |
| +--------+--------|-+-------------+
\-------------------/
Goal:
+--------+----------+-------------+
| Colors | Foods | Countries |
+--------+----------+-------------+
| red | pizza | Switzerland |
yellow kale | |
| | hot dogs | Brazil |
| green |
| orange | | |
+--------+-------- -+-------------+
Actual output of program:
+--------+----------+-------------+
| Colors | Foods | Countries |
+--------+----------+-------------+
| red | pizza | Switzerland |
yellow kale | |
| | hot dogs | Brazil |
| <----------------------\ |
| orange | | [green]/ |
+--------+-------- -+-------------+
(Since the array is modified in place, I figure there is no need to update the match index.) The inner loop should run again for each pair of [] I find, but instead I only get one turn per array entry. (It seems that the t=match... bit isn't the problem, but I could be wrong.) How can I fix this?
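One thing worth checking here: when String#match is given a block, it returns the block's value rather than the MatchData. In the code above the block's last expression is puts ar, and puts returns nil, so t is nil even after a successful match and the loop breaks. A minimal sketch of that behaviour; LINE and PATTERN are made up for the demonstration.

```ruby
PATTERN = /\[ *([a-zA-Z]+) *\]/
LINE = "| orange |[yellow]\\ | [green]/ |"

# Block whose last expression is puts: the match succeeds, yet nil comes back.
ends_in_puts = LINE.match(PATTERN) { |m| puts m[1] }
# Block whose last expression is a value: that value comes back.
ends_in_value = LINE.match(PATTERN) { |m| m[1] }

p ends_in_puts  # nil
p ends_in_value # "yellow"
```

If that is the cause, making the block end in a non-nil value (for example true), or testing the result of match before doing the work, should let the loop continue to the next bracket pair.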
I have this grammar in EBNF for a sub-language with arithmetic & logical expressions, variables assignment and printing.
start ::= (print | assign)*
print ::= print expr ;
assign ::= ID = expr ;
expr ::= andExpr (|| andExpr)*
andExpr ::= relExpr (&& relExpr)*
relExpr ::= addExpr ( == addExpr | != addExpr | <= addExpr | >= addExpr | < addExpr | > addExpr)?
addExpr ::= mulExpr (+ mulExpr | - mulExpr)*
mulExpr ::= unExpr (* unExpr | / unExpr)*
unExpr ::= + unExpr | - unExpr | ! unExpr | primary
primary ::= ( expr ) | ID | NUM | true | false
Unfortunately, I just can't figure out what these two rules:
unExpr ::= + unExpr
unExpr ::= - unExpr
actually do, or why I should need them, since I seem to be able to derive every phrase of the language without applying them. Any idea?
thanks a lot :-)
If you don't plan to allow expressions like:
a=-1
(where "a" is an ID and "1" is a NUM)
in your language, then you don't need those two rules.
Otherwise you have to implement them.
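To see what the two rules buy you, here is a tiny sketch of just that part of the grammar, with primary reduced to NUM for brevity. eval_unary is a made-up helper working on an already tokenized input.

```ruby
# Hypothetical evaluator for: unExpr ::= + unExpr | - unExpr | NUM
def eval_unary(tokens)
  case tokens.first
  when "+" then eval_unary(tokens.drop(1))  # unary plus leaves the value alone
  when "-" then -eval_unary(tokens.drop(1)) # unary minus negates it
  else Integer(tokens.first)                # NUM
  end
end

p eval_unary(%w[- 1])     # -1
p eval_unary(%w[- - + 2]) # 2
```

Because the rules are recursive, any chain of signs in front of a primary is accepted. Without them, a=-1 has no derivation, since addExpr only allows - between two operands.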
Does anyone know the rules for valid Ruby variable names? Can it be matched using a RegEx?
UPDATE: This is what I could come up with so far:
^[_a-z][a-zA-Z0-9_]+$
Does this seem right?
Identifiers are pretty straightforward. They begin with letters or an underscore, and contain letters, underscore and numbers. Local variables can't (or shouldn't?) begin with an uppercase letter, so you could just use a regex like this.
/^[a-z_][a-zA-Z_0-9]*$/
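Two details are easy to miss when comparing this with the regex in the question. The + quantifier rejects one-character names, and in Ruby ^ and $ match at line boundaries, so a multi-line string can slip through. A short sketch; the constant names are made up, and STRICT is just the same pattern with \A and \z swapped in.

```ruby
PLUS_VERSION  = /^[_a-z][a-zA-Z0-9_]+$/   # from the question
LINE_ANCHORED = /^[a-z_][a-zA-Z_0-9]*$/   # from this answer
STRICT        = /\A[a-z_][a-zA-Z_0-9]*\z/ # whole-string anchors

p PLUS_VERSION.match?("x")           # false: + needs a second character
p LINE_ANCHORED.match?("x")          # true
p LINE_ANCHORED.match?("1bad\nfoo")  # true: the line "foo" matches
p STRICT.match?("1bad\nfoo")         # false: the whole string must match
```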
It's possible for variable names to be unicode letters, in which case most of the existing regexes don't match.
varname = "\u2211" # => "∑"
eval(varname + '= "Tony the Pony"') # => "Tony the Pony"
puts varname # => ∑
local_variable_identifier = /Insert large regular expression here/
varname =~ local_variable_identifier # => nil
See also "Fun with Unicode" in either the Ruby 1.9 Pickaxe or at Fun with Unicode.
According to http://rubylearning.com/satishtalim/ruby_names.html a Ruby variable consists of:
A name is an uppercase letter, lowercase letter, or an underscore ("_"), followed by Name characters (this is any combination of upper- and lowercase letters, underscore and digits).
In addition, global variables begin with a dollar sign, instance variables with a single at-sign, and class variables with two at-signs.
A regular expression to match all that would be:
%r{
(\$|@{1,2})? # optional leading punctuation
[A-Za-z_] # first character: upper case, lower case, or underscore
[A-Za-z0-9_]* # optional characters (including digits)
}x
Hope that helps.
I like @aboutruby's answer, but just to complete it, here's the equivalent using POSIX bracket expressions.
/^[_[:lower:]][_[:alnum:]]*$/
Or, since a-z is actually shorter than [:lower:]:
/^[_a-z][_[:alnum:]]*$/
I think /^(\$){0,1}[_a-zA-Z][a-zA-Z0-9_]*([?!]){0,1}$/ is a bit closer to what you will need...
It depends on whether you want to match method names as well.
If you are trying to match a name that might be encountered in an expression, then it might start with $ and it might end with ? or !. If you know for sure that it is just a local variable then the rule will be much simpler.
I was trying to figure one out for a Rails patch, and Matthew Draper wrote this one, using the Ruby parser as a reference:
/\A(?![A-Z0-9])(?:[[:alnum:]_]|[^\0-\177])+\z/
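A quick way to see what this pattern accepts is to run it over a few names. In this sketch LOCAL_VAR is a made-up constant holding the regex above, and the sample names are chosen only for illustration; the [^\0-\177] branch is what lets non-ASCII names such as ∑ through.

```ruby
LOCAL_VAR = /\A(?![A-Z0-9])(?:[[:alnum:]_]|[^\0-\177])+\z/

%w[foo _tmp1 ∑ Foo 9lives].each do |name|
  puts "#{name}: #{LOCAL_VAR.match?(name)}"
end
```

The first three names match. Foo and 9lives are rejected by the negative lookahead, which rules out a leading uppercase ASCII letter or digit.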
And here it is, straight from the horse's mouth. (The horse in this case is the Draft ISO Ruby Specification):
local-variable-identifier → ( lowercase-character | _ ) identifier-character *
identifier-character → lowercase-character | uppercase-character | decimal-digit | _
uppercase-character → A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
lowercase-character → a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
decimal-digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
In Ruby 1.9, using named groups, you can translate this literally:
local_variable_identifier = %r{
(?<uppercase_character> A | B | C | D | E | F | G | H | I | J | K | L | M
| N | O | P | Q | R | S | T | U | V | W | X | Y | Z
){0}
(?<lowercase_character> a | b | c | d | e | f | g | h | i | j | k | l | m
| n | o | p | q | r | s | t | u | v | w | x | y | z
){0}
(?<decimal_digit> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9){0}
(?<identifier_character> \g<lowercase_character>
| \g<uppercase_character>
| \g<decimal_digit>
| _
){0}
( \g<lowercase_character> | _ ) \g<identifier_character>*
}x
Of course, this is not how you would really write it.