Can I 'unmatch' a rule programmatically in treetop? - ruby

Is it possible to skip a rule by validating it with Ruby code in Treetop?
Say there is something like this:
rule short_words
  [a-z]+ {
    def method1
      text_value
    end
    ...
  }
end
I want the word length to be between 2 and 5 letters. Can I exit the rule if I find that the length of text_value is not between 2 and 5?

Treetop's syntax supports {min,max} bounds on repetitions (excerpt from http://treetop.rubyforge.org/syntactic_recognition.html):
Repetition count
A generalised repetition count (minimum, maximum) is also available.
* 'foo' 2.. matches 'foo' two or more times
* 'foo' 3..5 matches 'foo' from three to five times
* 'foo' ..4 matches 'foo' from zero to four times
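Note that a PEG repetition is greedy and has no notion of a word boundary: [a-z] 2..5 will happily consume the first five letters of a longer word. To reject longer words, the rule probably also needs a negative lookahead, for example [a-z] 2..5 ![a-z]. The plain-Ruby sketch below (no Treetop gem involved; SHORT_WORD and short_word_prefix are hypothetical names) expresses the same bounds as a regexp so the behaviour is easy to check:

```ruby
# Hypothetical regexp counterpart of the Treetop expression  [a-z] 2..5 ![a-z]
# anchored at the start of the input: 2 to 5 letters, NOT followed by a letter.
SHORT_WORD = /\A[a-z]{2,5}(?![a-z])/

# Returns the matched short word, or nil if the input doesn't start with one.
def short_word_prefix(input)
  m = SHORT_WORD.match(input)
  m && m[0]
end

p short_word_prefix("abc")      # "abc"
p short_word_prefix("a")        # nil (too short)
p short_word_prefix("abcdefg")  # nil (too long)
p short_word_prefix("abc def")  # "abc"
```

Without the lookahead, "abcdefg" would yield "abcde" instead of failing.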

Related

Treetop parser: how to handle spaces?

Good morning everyone,
I'm currently trying to describe some basic Ruby grammar, but I'm stuck on parsing spaces.
I can handle x = 1 + 1,
but I can't parse x=1+1.
How can I parse spaces?
I have tried adding space after every terminal,
but it still can't parse and returns nil.
How can I fix it?
Thank you very much, have a nice day.
grammar Test
  rule main
    s assign
  end
  rule assign
    name:[a-z]+ s '=' s expression s
    {
      def to_ast
        Assign.new(name.text_value.to_sym, expression.to_ast)
      end
    }
  end
  rule expression
    add
  end
  rule add
    left:brackets s '+' s right:add s
    {
      def to_ast
        Add.new(left.to_ast, right.to_ast)
      end
    }
    /
    minus
  end
  rule minus
    left:brackets s '-' s right:minus s
    {
      def to_ast
        Minus.new(left.to_ast, right.to_ast)
      end
    }
    /
    brackets
  end
  rule brackets
    '(' s expression ')' s
    {
      def to_ast
        expression.to_ast
      end
    }
    /
    term
  end
  rule term
    number / variable
  end
  rule number
    [0-9]+ s
    {
      def to_ast
        Number.new(text_value.to_i)
      end
    }
  end
  rule variable
    [a-z]+ s
    {
      def to_ast
        # text_value includes the trailing space matched by s, so strip it
        Variable.new(text_value.strip.to_sym)
      end
    }
  end
  rule newline
    s "\n"+ s
  end
  rule s
    [ \t]*
  end
end
This code works.
Problem solved!
It's not enough to define the space rule; you have to use it anywhere there might be space. Because this comes up so often, I usually use a short rule name, S, for mandatory space, and the lowercase s for optional space.
Then, as a principle, I skip optional space first in my top rule, and again after every terminal that can be followed by space. Terminals here are strings, character sets, and so on. So at the start of assign, before the {} block on variable, boolean and number, and after your '=', '-' and '+' literals, add a call to the rule s to skip any spaces.
This policy works well for me. It's a good idea to have one test case with minimum space and another with maximum space (in every possible place).
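The same policy can be seen outside Treetop in a tiny hand-written parser. This is only a sketch, not Treetop and not the asker's grammar: TinyAssignParser and its methods are hypothetical, it uses Ruby's standard StringScanner, and it skips optional space once at the start and again after every terminal. The last two lines are the minimum-space and maximum-space test cases suggested above.

```ruby
require "strscan"

# Hypothetical hand-written parser, NOT Treetop: it only mimics the policy
# "skip optional space in the top rule, then after every terminal".
class TinyAssignParser
  def initialize(src)
    @ss = StringScanner.new(src)
  end

  # assign <- s name '=' expression; the whole input must be consumed
  def parse
    s
    name = terminal(/[a-z]+/) or return nil
    terminal(/=/) or return nil
    value = expression or return nil
    @ss.eos? ? [:assign, name.to_sym, value] : nil
  end

  private

  # Like the grammar's rule s: zero or more spaces or tabs.
  def s
    @ss.skip(/[ \t]*/)
  end

  # Match one terminal, then skip any optional space that follows it.
  def terminal(pattern)
    token = @ss.scan(pattern) or return nil
    s
    token
  end

  # Folds + and - left to right (unlike the right-recursive add/minus rules).
  def expression
    left = term or return nil
    while (op = terminal(/[+-]/))
      right = term or return nil
      left = [op == "+" ? :add : :minus, left, right]
    end
    left
  end

  def term
    if (digits = terminal(/[0-9]+/))
      [:number, digits.to_i]
    elsif (letters = terminal(/[a-z]+/))
      [:variable, letters.to_sym]
    end
  end
end

minimal = TinyAssignParser.new("x=1+y").parse
maximal = TinyAssignParser.new("  x  =  1  +  y  ").parse
p minimal             # [:assign, :x, [:add, [:number, 1], [:variable, :y]]]
p minimal == maximal  # true
```

Because every terminal eats its own trailing space, no rule ever has to anticipate space before a token, apart from the single s at the very start.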

Matching strings with variations in spaces

I have a log file with a lot of arbitrary spaces:
Number of active files:
20
Missing files:
10
I am trying to determine if a certain string like this:
Expected_string = "Number of active files: 20"
is contained in the log file. Is there an easy way to compare the strings disregarding the variation in spaces?
I am using a method that looks like this:
def isStringInLog?(string)
  if open(@log_full_path).grep(/#{string}/).size > 0
    return true
  end
  return false
end
However, this only works if the strings match exactly.
You can use Ruby's gsub method to turn all instances of one or more whitespace characters (including newlines) into a single space:
def string_in_log?(str)
  File.read(@log_full_path).gsub(/\s+/, " ").include?(str)
end
gsub uses the regular expression /\s+/ to replace all groups of whitespace with one space.
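Also, by convention, Ruby variables (other than constants) and method names begin with a lowercase letter and use snake_case, not camelCase. (A name such as Expected_string, which starts with a capital letter, is actually a constant.)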
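A variant of the gsub approach that normalizes the string being searched for as well, so extra spaces on either side don't matter. To keep the sketch self-contained it takes the text itself rather than a path; normalize_space and string_in_text? are hypothetical names, and for a real log you would pass in the file's contents (e.g. from File.read).

```ruby
# Hypothetical helper: collapse every run of whitespace (spaces, tabs,
# newlines) into a single space and trim both ends.
def normalize_space(text)
  text.gsub(/\s+/, " ").strip
end

# Normalize BOTH sides, so extra spaces in the search string don't matter either.
def string_in_text?(text, str)
  normalize_space(text).include?(normalize_space(str))
end

log = "Number of active files:\n     20\nMissing files:\n   10\n"
p string_in_text?(log, "Number of active files: 20")    # true
p string_in_text?(log, "Number of active files:   20")  # true
p string_in_text?(log, "Missing files: 20")             # false
```

Keep in mind that include? is a substring test, so "Number of active files: 2" would also match this log; use an anchored regexp if that distinction matters.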
Maybe split the log on whitespace and rejoin it with single spaces?
log = <<-EOF
Number of active files:
20
Missing files:
10
EOF
pattern = 'Number of active files: 20'
puts log.split.join(' ').include?(pattern) # true

Why doesn't this loop stop?

I'm trying to take the string "xxxyyyzzz" and split it up into an array that groups the same letters. So I want the output to be ["xxx","yyy","zzz"]. I'm not sure why this code keeps on looping. Any suggestions?
def split_up(str)
  i = 1
  result = []
  array = str.split("")
  until array == []
    if array[i] == array[i-1]
      i += 1
    else
      result << array.shift(i).join("")
    end
    i = 1
  end
  result
end
puts split_up("xxxyyyzzz")
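The loop never ends because your until condition never becomes true. You increment i when successive characters match, but at the end of every pass you reset i to 1, so with "xxxyyyzzz" the first group of x's is never shifted off the array.
If you edit this section and add a line:
until array == []
  puts i # new line
Then you'll see that i is always 1, and the code prints 1 forever.
Move the i = 1 line into the else branch, so that i is reset only after a group has been shifted off. (Simply deleting the line happens to work for "xxxyyyzzz", because every group there is three letters long, but for input such as "xxyyyz" i is left too large and the loop again runs forever.)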
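A sketch of the asker's method with the i = 1 reset moved into the else branch, so it happens only after a run has been shifted off. array.empty? replaces array == [] but behaves the same.

```ruby
# Groups runs of identical characters, e.g. "xxxyyyzzz" -> ["xxx", "yyy", "zzz"].
def split_up(str)
  i = 1
  result = []
  array = str.split("")
  until array.empty?
    if array[i] == array[i - 1]
      i += 1                              # the current run continues
    else
      result << array.shift(i).join("")   # run ended: move it into result
      i = 1                               # reset only after shifting
    end
  end
  result
end

p split_up("xxxyyyzzz")  # ["xxx", "yyy", "zzz"]
p split_up("xxyyyz")     # ["xx", "yyy", "z"]
```

The second call uses groups of different lengths, which is exactly the case that still loops forever if the reset is simply deleted.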
Also, you may be interested in reading about Ruby's String#scan method, regexp capture groups, and zero-width look-ahead and look-behind assertions, which can match boundaries.
Here is how I would personally accomplish splitting a string at letter boundaries:
"xxxyyyzzz".scan(/(.)(\1*)/).map{|a,b| a+b }
=> ["xxx", "yyy", "zzz"]
The scan method is doing this:
. matches any character, e.g. "x", and the parentheses capture it.
\1* matches the previous capture any number of times, e.g. "xx", and the parentheses capture this.
So the first capture group holds the first character "x" and the second holds all the repeats "xx".
scan returns the captures of each match as a pair such as ["x", "xx"], and the map block concatenates each pair into "xxx".
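To see the pairs the map block receives, print the raw result of scan before joining (same regexp as above):

```ruby
pairs = "xxxyyyzzz".scan(/(.)(\1*)/)
p pairs   # [["x", "xx"], ["y", "yy"], ["z", "zz"]]

# A lone character yields an empty second capture, so a + b still works.
p "aab".scan(/(.)(\1*)/)  # [["a", "a"], ["b", ""]]

groups = pairs.map { |first, repeats| first + repeats }
p groups  # ["xxx", "yyy", "zzz"]
```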
As mentioned above, this can be solved using scan like this:
def split_up(string)
  repeat_alphabets = /(\w)(\1*)/
  string.scan(repeat_alphabets).map do |match|
    match[0] << match[1]
  end
end
Explanation:
The regular expression matches runs of repeated characters; because it has two capture groups, each match comes back as a pair: the character and its remaining repeats.
match[0] << match[1] joins the pair into the required string.
map collects these strings into an array, which the method returns because it is the last expression.

ruby stringing with text

I am trying to create a program where the first three characters of a string are repeated a given number of times, like this:
foo('Chocolate', 3) # => 'ChoChoCho'
foo('Abc', 3) # => 'AbcAbcAbc'
I know I can use length to count characters, but how do I specify the length of the string to be output? Also, how can I specify the number of times?
def foo(str, n)
  str[0..2] * n
end
You can use something like this.
def print_first_three_x_times(string, x)
  # remove everything but the first three chars
  string.slice!(3..string.length)
  # print x times
  x.times { print string }
end
Output:
Hunter@Hunter-PC ~
$ irb
irb(main):008:0> print_first_three_x_times("Hunter",5)
HunHunHunHunHun=> 5
irb(main):009:0>
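One thing to watch: slice! modifies the string it is called on, so print_first_three_x_times truncates the caller's variable, and the method returns x (the value of x.times), which is the => 5 in the irb session. A sketch of a non-mutating variant that returns the string instead (first_three_repeated is a hypothetical name):

```ruby
# Returns the first three characters repeated x times and leaves the
# argument untouched: string[0, 3] is a new string, unlike slice!.
def first_three_repeated(string, x)
  string[0, 3] * x
end

name = "Hunter"
p first_three_repeated(name, 5)  # "HunHunHunHunHun"
p name                           # "Hunter" (unchanged)

# For comparison, slice! removes the matched part from its receiver.
s = "Hunter".dup                 # dup gives a mutable copy
p s.slice!(3..s.length)          # "ter"
p s                              # "Hun"
```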

How to define {min,max} matches in treetop peg

With Ruby's regular expressions I could write /[0-9]{3,}/, but I can't figure out how to write this in Treetop other than:
rule at_least_three_digit_number
  [0-9] [0-9] [0-9]+
end
Is there a 'match [at least|most] n' rule for treetop?
It looks like PEGs lack some of the regular-expression convenience operators, but in return you get a much more powerful expression matcher. That said, Treetop's documentation does describe bounded repetition:
http://treetop.rubyforge.org/syntactic_recognition.html
A generalised repetition count (minimum, maximum) is also available.
* 'foo' 2.. matches 'foo' two or more times
* 'foo' 3..5 matches 'foo' from three to five times
* 'foo' ..4 matches 'foo' from zero to four times
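As a rough cross-check, the three repetition counts map onto familiar regexp quantifiers, and the asker's [0-9] [0-9] [0-9]+ workaround accepts exactly the same digit strings as the bounded rule [0-9] 3.. would. This sketch uses plain Ruby regexps (no Treetop gem), anchored with \A and \z so the whole string must match; PEG repetition never backtracks, so treat the correspondence as an approximation that holds for simple operands like these.

```ruby
# Regexp counterparts of Treetop's repetition counts (whole-string match).
FOO_2_OR_MORE = /\A(?:foo){2,}\z/   # 'foo' 2..
FOO_3_TO_5    = /\A(?:foo){3,5}\z/  # 'foo' 3..5
FOO_UP_TO_4   = /\A(?:foo){0,4}\z/  # 'foo' ..4

p FOO_2_OR_MORE.match?("foo" * 2)   # true
p FOO_3_TO_5.match?("foo" * 6)      # false
p FOO_UP_TO_4.match?("")            # true

# The asker's workaround vs. the bounded form, on the same inputs.
WORKAROUND = /\A[0-9][0-9][0-9]+\z/  # like [0-9] [0-9] [0-9]+
BOUNDED    = /\A[0-9]{3,}\z/         # like [0-9] 3..
%w[12 123 12345 12a].each do |s|
  puts "#{s}: #{WORKAROUND.match?(s)} #{BOUNDED.match?(s)}"
end
```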
