I am working on a new programming language rip, and I'm having trouble getting to the bottom of some infinite loops. Is there a way to print out each rule as it gets called, such that I can see the rules that are recursing? I've tried walking through the code in my head, and I just don't see it. Any help would be much appreciated.
To flesh out Raving Genius’s answer:
The method to patch is actually Parslet::Atoms::Context#lookup. View it on GitHub (permalink to current version). In your own code, you can patch that method to print obj like this:
class Parslet::Atoms::Context
  def lookup(obj, pos)
    p obj                       # print the atom (rule) being looked up
    @cache[pos][obj.object_id]  # then do the normal cache lookup
  end
end
Run that code any time before you call parse on your parser, and it will take effect. Sample output:
>> parser = ConsistentNewlineTextParser.new
=> LINES
>> parser.parse("abc")
LINES
(line_content:LINE_CONTENT NEWLINE){0, } line_content:LINE_CONTENT
(line_content:LINE_CONTENT NEWLINE){0, }
line_content:LINE_CONTENT NEWLINE
LINE_CONTENT
WORD
\\w{0, }
\\w
\\w
\\w
\\w
NEWLINE
dynamic { ... }
FIRST_NEWLINE
'? '
'
'?
'
'
'
LINE_CONTENT
=> {:line_content=>"abc"@0}
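Instead of reopening the class and copying the method body (which may differ between Parslet versions), you could wrap the method with Module#prepend: the wrapper logs its arguments and then calls the original implementation with super, so caching keeps working. The sketch below shows the pattern on a small hypothetical stand-in class, MemoTable, rather than on Parslet itself. MemoTable, TraceLookup and trace are illustrative names and are not part of Parslet.

```ruby
# Hypothetical stand-in for a memoizing parse context (not Parslet's class).
class MemoTable
  def initialize
    @cache = Hash.new { |hash, pos| hash[pos] = {} }
  end

  def store(obj, pos, value)
    @cache[pos][obj.object_id] = value
  end

  def lookup(obj, pos)
    @cache[pos][obj.object_id]
  end
end

# Logs every lookup, then calls the original method via super,
# so the cache keeps working while you trace.
module TraceLookup
  def lookup(obj, pos)
    trace << [obj, pos]
    warn "lookup #{obj.inspect} at #{pos}"
    super
  end

  def trace
    @trace ||= []
  end
end

MemoTable.prepend(TraceLookup)

table = MemoTable.new
table.store(:word, 0, "abc")
table.lookup(:word, 0)     # logs "lookup :word at 0", returns "abc"
table.lookup(:newline, 3)  # logs "lookup :newline at 3", returns nil
```

To use this with Parslet, you would presumably prepend a similar module to Parslet::Atoms::Context. That assumes your Parslet version still defines lookup(obj, pos) on that class, and it requires Ruby 2.1 or later, where Module#prepend is public.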
I figured it out: editing Parslet::Atoms::Context#lookup to output the obj parameter shows each rule as it is being called.
My branch of Parslet automatically detects endless loops and exits, reporting the expression that is repeating without consuming anything.
https://github.com/nigelthorne/parslet
See the question "Parse markdown indented code block" for an example.
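The idea behind detecting such loops can be shown without Parslet. If a rule is entered again at the same input position while an earlier call to it is still active, nothing has been consumed, and the recursion can never terminate. Below is a minimal sketch of that check in a hand-written recursive-descent parser. The grammar and the Tracer, LeftRecursiveParser and LeftRecursionError names are hypothetical, and this is not how the linked branch is implemented.

```ruby
# Raised when a rule calls itself again without consuming any input.
class LeftRecursionError < StandardError; end

# Hypothetical tracer: logs each rule as it is entered (indented by depth)
# and refuses to re-enter a rule at a position where it is already active.
class Tracer
  attr_reader :log

  def initialize
    @active = {}  # [rule, pos] => true while that call is on the stack
    @log = []
  end

  def enter(rule, pos)
    key = [rule, pos]
    if @active[key]
      raise LeftRecursionError, "#{rule} re-entered at #{pos} without consuming input"
    end

    @log << "#{'  ' * @active.size}#{rule} @#{pos}"
    @active[key] = true
    begin
      yield
    ensure
      @active.delete(key)
    end
  end
end

# Deliberately left-recursive grammar:  expr := expr "+" num | num
class LeftRecursiveParser
  def initialize(input, tracer)
    @input = input
    @tracer = tracer
  end

  def expr(pos)
    @tracer.enter(:expr, pos) do
      after = expr(pos)  # same rule, same position: would never terminate
      @input[after] == "+" ? num(after + 1) : after
    end
  end

  def num(pos)
    @tracer.enter(:num, pos) do
      pos += 1 while @input[pos]&.match?(/\d/)
      pos
    end
  end
end

tracer = Tracer.new
begin
  LeftRecursiveParser.new("1+2", tracer).expr(0)
rescue LeftRecursionError => e
  puts tracer.log  # prints "expr @0"
  puts e.message   # prints "expr re-entered at 0 without consuming input"
end
```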
I want to insert data into a string via interpolation. I want to check if @call.transferred_from is nil, and if so, output @call.transfer_from_other; else output @call.transferred_from.try(:facility_name) along with @call.transferred_from.try(:facility_address).
Here is my code example:
"#{if @call.transferred_from.nil? @call.transfer_from_other else @call.transferred_from.try(:facility_name) @call.transferred_from.try(:facility_address) end}"
Doing this gives me the following error:
syntax error, unexpected keyword_else, expecting keyword_then or ';' or '\n'
I'm not sure where to go. Any help would be appreciated.
Update: 08/04/14
I moved the conditional into a private controller method as follows:
def transfer_from_address
  if @call.transferred_from.nil?
    @call.transfer_from_other
  else
    @call.transferred_from.try(:facility_name) + ' ' + @call.transferred_from.try(:facility_address)
  end
end
Then I call the following using string interpolation.
#{transfer_from_address}
This seems to work, but I'm not sure that it's proper Ruby.
I know this is not really answering your question, but I'd caution against putting this much logic in an interpolation. While it's totally doable, it makes your code very hard to understand.
The fundamental issue I see is that you're trying to return two things, but you're just putting them next to each other, which is not valid Ruby.
Assuming this has to stay in an interpolation, you'd want to combine them into a single value:
#{
  @call.transferred_from.nil? ?
    @call.transfer_from_other :
    @call.transferred_from.try(:facility_name) + ' ' + @call.transferred_from.try(:facility_address)
}
I'd really suggest moving this into a variable or a method, though, and just referencing it in the interpolation.
This could look something like:
facility_name_and_address = @call.transferred_from.nil? ? @call.transfer_from_other : @call.transferred_from.try(:facility_name) + ' ' + @call.transferred_from.try(:facility_address)
{
:body => facility_name_and_address
}
If I understand what you are trying to do, I would suggest adding a method to the Call model (the class of @call) that does the job:
class Call
def transfer_text
return transfer_from_other if transferred_from.nil?
"#{transferred_from.try(:facility_name)} #{transferred_from.try(:facility_address)}"
end
end
Then simply calling @call.transfer_text should provide the needed text.
If you want to be more sophisticated, and you don't want trailing white-space in case facility_name or facility_address are nil, you can create a list of them, and join them with white space:
[transferred_from.try(:facility_name), transferred_from.try(:facility_address)].compact.join(' ')
This ensures that spaces appear only between two non-nil elements. If both are nil, the result is an empty string (rather than a space), and if one is nil, there is no leading or trailing space.
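The method-on-the-model approach and the compact.join refinement can be tried outside Rails. The sketch below uses Struct stand-ins for the Call model and the facility record, with field names taken from the question; the structs themselves are assumptions. After the nil guard, plain method calls are enough, so ActiveSupport's try is not needed here.

```ruby
# Hypothetical stand-ins for the ActiveRecord models in the question.
Facility = Struct.new(:facility_name, :facility_address)

Call = Struct.new(:transferred_from, :transfer_from_other) do
  # Free-text fallback when there is no facility; otherwise name and
  # address joined by a single space, skipping any nil parts.
  def transfer_text
    return transfer_from_other if transferred_from.nil?

    [transferred_from.facility_name, transferred_from.facility_address].compact.join(' ')
  end
end

call = Call.new(Facility.new("General Hospital", "1 Main St"), nil)
body = "Transferred from: #{call.transfer_text}"
```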
Why not use:
:body => "#{@call.transferred_from.nil? ? @call.transfer_from_other : "#{@call.transferred_from.try(:facility_name)} #{@call.transferred_from.try(:facility_address)}"}"
That said, for maintainability I would not use this compact syntax.
You just need to put either a semicolon or the keyword then right after the condition.
if @call.transferred_from.nil?; @call.transfer_from_other ...
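Here is a small, self-contained illustration of that fix. The source and fallback parameters are illustrative stand-ins for @call.transferred_from and @call.transfer_from_other. Both forms are valid Ruby inside #{...}.

```ruby
# source and fallback are illustrative stand-ins for
# @call.transferred_from and @call.transfer_from_other.
def describe(source, fallback)
  with_then      = "From: #{if source.nil? then fallback else source end}"
  with_semicolon = "From: #{if source.nil?; fallback else source end}"
  [with_then, with_semicolon]
end

describe(nil, "other")      # => ["From: other", "From: other"]
describe("Clinic", "other") # => ["From: Clinic", "From: Clinic"]
```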
But in your case, there is not much point in putting the entire condition inside a string interpolation. It is better to do the condition outside the string.
By the way, if you fix your first error, then you might encounter the next error:
@call.transferred_from.try(:facility_name) @call.transferred_from.try(:facility_address)
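To fix that as well, I think you should do something like this (instance_eval would not work here, because it changes self, so @call inside the block would no longer refer to your @call):
(e = @call.transferred_from).nil? ?
  @call.transfer_from_other.to_s :
  "#{e.try(:facility_name)} #{e.try(:facility_address)}"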
I'm just starting with ruby and parslet, so this might be obvious to others (hopefully).
I want to get all the words up until a delimiter (^) without consuming it.
The following rule works (but consumes the delimiter), with a result of {:wrd=>"otherthings"@0, :delim=>"^"@11}
require 'parslet'
class Mini < Parslet::Parser
rule(:word) { match('[a-zA-Z]').repeat}
rule(:delimeter) { str('^') }
rule(:othercontent) { word.as(:wrd) >> delimeter.as(:delim) }
root(:othercontent)
end
puts Mini.new.parse("otherthings^")
I tried using present?:
require 'parslet'
class Mini < Parslet::Parser
rule(:word) { match('[a-zA-Z]').repeat}
rule(:delimeter) { str('^') }
rule(:othercontent) { word.as(:wrd) >> delimeter.present? }
root(:othercontent)
end
puts Mini.new.parse("otherthings^")
but this throws an exception:
Failed to match sequence (wrd:WORD &DELIMETER) at line 1 char 12. (Parslet::ParseFailed)
At a later stage I'll want to inspect the word to the right of the delimiter to build up a more complex grammar, which is why I don't want to consume the delimiter.
I'm using parslet 1.5.0.
Thanks for your help!
TL;DR:
If you care what is before the "^" you should parse that first.
--- longer answer ---
A parser will always consume all the text. If it can't consume everything, then the document is not fully described by the grammar; that is why your present? version fails at char 12, where the ^ is left unconsumed. Rather than thinking of it as something performing "splits" on your text, think of it as a clever state machine consuming a stream of text.
So, as your full grammar needs to consume the whole document, you can't make your parser parse some part and leave the rest. You want it to transform your document into a tree so you can manipulate it into its final form.
If you really wanted to just consume all text before a delimiter, then you could do something like this...
Say I was going to parse a '^' separated list of things.
I could have the following rules
rule(:thing) { (str("^").absent? >> any).repeat(1) } # anything that's not a ^
rule(:list) { thing >> ( str("^") >> thing).repeat(0) } #^ separated list of things
This would work as follows
parse("thing1^thing2") #=> "thing1^thing2"
parse("thing1") #=> "thing1"
parse("thing1^") #=> ERROR ... nothing after the ^ there should be a 'thing'
This means list matches a string that neither starts nor ends with '^'. To be useful, however, I need to pull out the bits that are the values, using as:
rule(:thing) { (str("^").absent? >> any).repeat(1).as(:thing) }
rule(:list) { thing >> ( str("^") >> thing).repeat(0) }
Now when list matches a string I get an array of hashes of "things".
parse("thing1^thing2") #=> [ {:thing=>"thing1"@0} , {:thing=>"thing2"@7} ]
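If you want to experiment with the same "consume everything or fail" behaviour without installing Parslet, here is a hand-written analogue of the list grammar built on Ruby's StringScanner. This is not Parslet. StringScanner#check peeks without advancing, much like a lookahead, while skip and scan consume input. The parse_list and parse_thing names are hypothetical, and the results are plain strings rather than Parslet slices carrying positions.

```ruby
require 'strscan'

# Analogue of:  rule(:thing) { (str("^").absent? >> any).repeat(1).as(:thing) }
def parse_thing(ss)
  start = ss.pos
  value = ss.scan(/[^\^]+/)  # one or more characters that are not "^"
  raise ArgumentError, "expected a thing at #{start}" if value.nil?
  { thing: value }
end

# Analogue of:  rule(:list) { thing >> (str("^") >> thing).repeat(0) }
def parse_list(text)
  ss = StringScanner.new(text)
  things = [parse_thing(ss)]
  # check peeks without moving; skip then consumes the "^"
  while ss.check(/\^/)
    ss.skip(/\^/)
    things << parse_thing(ss)
  end
  # Every character is either part of a thing or a "^", so the loop
  # only ends once the whole input has been consumed.
  things
end

parse_list("thing1^thing2")  # two hashes, for "thing1" and "thing2"
parse_list("thing1^") rescue puts "error: nothing after the ^"
```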
In reality however you probably care what a 'thing' is... not just anything will go there.
In that case, you should start by defining those rules, because you don't want to use the parser to split on "^" and then re-parse the pieces to work out what they are made of.
For example:
parse("6 + 4 ^ 2")
# => [ {:thing=>"6 + 4 "@0}, {:thing=>" 2"@7} ]
And I probably want to ignore the whitespace around the "thing"s, and I probably want to deal with the 6, the + and the 4 separately. When I do that, I am going to have to throw away my "all things that aren't '^'" rule.
I am currently writing a Ruby parser in Ruby, more precisely with Parslet, since I think it is far easier to use than Treetop or Citrus. I create my rules from the official specification, but there are some statements I cannot write, because they "exclude" some syntax and I do not know how to express that. Here is an example to show what I mean.
Here is a basic rule:
foo::=
any-character+ BUT NOT (foo* escape_character barbar*)
# Knowing that (foo* escape_character barbar*) is included in any-character
How could I translate that using Parslet? Would the absent?/present? atoms help here?
Thank you very much; I hope someone has an idea.
Have a nice day!
EDIT:
I tried what you said, so here's my translation into Ruby language using parslet:
rule(:line_comment){(source_character.repeat >> line_terminator >> source_character.repeat).absent? >> source_character.repeat(1)}
However, it does not seem to work. After some tests, I concluded that the sequence in parentheses is what's wrong.
Here is a much simpler example; consider these rules:
# Parslet rules
rule(:source_character) {any}
rule(:line_terminator){ str("\n") >> str("\r").maybe }
rule(:not){source_character.repeat >> line_terminator }
# Which looks like what I try to "detect" up there
I tested these rules with this code:
# Code to test :
code = "test
"
But I get this:
Failed to match sequence (SOURCE_CHARACTER{0, } LINE_TERMINATOR) at
line 2 char 1. - Failed to match sequence (SOURCE_CHARACTER{0, }
LINE_TERMINATOR) at line 2 char 1.- Failed to match sequence (' '
' '?) at line 2 char 1.
`- Premature end of input at line 2 char 1. nil
If this sequence doesn't work, my 'complete' rule up there won't ever work... If anyone has an idea, it would be great.
Thank you !
You can do something like this:
rule(:word) { match['^")(\\s'].repeat(1) } # normal word
rule(:op) { str('AND') | str('OR') | str('NOT') }
rule(:keyword) { str('all:') | str('any:') }
rule(:searchterm) { keyword.absent? >> op.absent? >> word }
In this case, absent? performs a lookahead to make sure the next token is not a keyword. If it isn't, it then checks that it's not an operator, and if that also holds, it finally checks whether it's a valid word.
An equivalent rule would be:
rule(:searchterm) { (keyword | op).absent? >> word }
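Ruby's own regular expressions have the same kind of zero-width lookahead, which makes them handy for sanity-checking what such a rule accepts. The pattern below is a plain Regexp analogue of (keyword | op).absent? >> word, not Parslet code. SEARCHTERM and searchterm? are hypothetical names.

```ruby
# (?!...) is a negative lookahead: it fails if the text ahead matches,
# but consumes nothing, much like Parslet's absent?.
SEARCHTERM = /\A(?!all:|any:|AND|OR|NOT)[^")(\s]+\z/

def searchterm?(token)
  SEARCHTERM.match?(token)
end

searchterm?("ruby")    # => true
searchterm?("AND")     # => false
searchterm?("ORANGE")  # => false: the lookahead sees "OR" at the start
```

Because the lookahead only inspects the start of the token, a word such as ORANGE is rejected too. The Parslet rule behaves the same way, since str('OR') matches the first two characters.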
Parslet matching is greedy by nature. This means that when you repeat something like
foo.repeat
parslet will match foo until it fails. If foo is
rule(:foo) { any }
you will be on the path to fail, since any.repeat always matches the entire rest of the document!
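The same greedy, non-backtracking behaviour can be reproduced with Ruby's possessive quantifier *+, which, like a PEG repetition, never gives characters back. The patterns below are a regex analogue of the failing rule from the question (source_character.repeat >> line_terminator), simplified to a bare \n terminator. They are not Parslet itself, and the constant names are illustrative.

```ruby
CODE = "test\n"

# Ordinary greedy regex: backtracks one character so \n can still match.
BACKTRACKING = /\A[\s\S]*\n/
# Possessive *+: once [\s\S]*+ has eaten the newline it never gives it back,
# so the trailing \n has nothing left to match (like any.repeat >> line_terminator).
POSSESSIVE = /\A[\s\S]*+\n/
# Fix: only repeat characters that are NOT a line terminator, mirroring
# (line_terminator.absent? >> source_character).repeat >> line_terminator
EXCLUDING = /\A(?:(?!\n)[\s\S])*+\n/

BACKTRACKING.match?(CODE)  # => true
POSSESSIVE.match?(CODE)    # => false
EXCLUDING.match?(CODE)     # => true
```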
What you're looking for is something like the string matcher in examples/string_parser.rb (parslet source tree):
rule :string do
str('"') >>
(
(str('\\') >> any) |
(str('"').absent? >> any)
).repeat.as(:string) >>
str('"')
end
What this says is: 'match ", then match either a backslash followed by any character at all, or match any other character, as long as it is not the terminating ".'
So .absent? is really a way to exclude things from a match that follows:
str('foo').absent? >> (str('foo') | str('bar'))
will only match 'bar'. If you understand that, I assume you will be able to resolve your difficulties. Although those will not be the last on your way to a Ruby parser...
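For a quick cross-check, both ideas can be expressed as Ruby regular expressions, with (?!...) playing the role of absent? and a possessive *+ standing in for Parslet's non-backtracking repeat. This is an analogue for experimentation, not the Parslet example itself, and the constant names are made up.

```ruby
# str('foo').absent? >> (str('foo') | str('bar'))
FOO_OR_BAR_NOT_FOO = /\A(?!foo)(?:foo|bar)\z/

# Analogue of the string rule: a quote, then any number of
# (backslash + any char) or (any char that is not a quote), then a quote.
# *+ never backtracks, so an escaped quote cannot later be reused as the closing one.
STRING_LITERAL = /\A"((?:\\[\s\S]|(?!")[\s\S])*+)"\z/

FOO_OR_BAR_NOT_FOO.match?("bar")         # => true
FOO_OR_BAR_NOT_FOO.match?("foo")         # => false
STRING_LITERAL.match('"say \"hi\""')[1]  # => the text between the outer quotes
```

Without the possessive quantifier, a plain regex could backtrack, take the backslash through the second alternative, and accept an unterminated string such as "abc\" as if its escaped quote closed it.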
I am trying to write a method that is the same as mysqli_real_escape_string in PHP. It takes a string and escapes any 'dangerous' characters. I have looked for a method that will do this for me but I cannot find one. So I am trying to write one on my own.
This is what I have so far (I tested the pattern at Rubular.com and it worked):
# Finds the following characters and escapes them by preceding them with a backslash. Characters: ' " . * / \ -
def escape_characters_in_string(string)
pattern = %r{ (\'|\"|\.|\*|\/|\-|\\) }
string.gsub(pattern, '\\\0') # <-- Trying to take the currently found match and add a \ before it I have no idea how to do that).
end
And I am using start_string as the string I want to change, and correct_string as what I want start_string to turn into:
start_string = %("My" 'name' *is* -john- .doe. /ok?/ C:\\Drive)
correct_string = %(\"My\" \'name\' \*is\* \-john\- \.doe\. \/ok?\/ C:\\\\Drive)
Can somebody try and help me determine why I am not getting my desired output (correct_string) or tell me where I can find a method that does this, or even better tell me both? Thanks a lot!
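Your pattern isn't defined correctly in your example: the spaces inside %r{ ... } are part of the pattern (there is no x flag), so it only matches characters that have a space on each side. This is as close as I can get to your desired output.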
Output
"\\\"My\\\" \\'name\\' \\*is\\* \\-john\\- \\.doe\\. \\/ok?\\/ C:\\\\Drive"
It's going to take some tweaking on your part to get it 100% but at least you can see your pattern in action now.
def self.escape_characters_in_string(string)
  pattern = /(\'|\"|\.|\*|\/|\-|\\)/
  string.gsub(pattern) { |match| "\\" + match } # prefix each match with a backslash
end
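Here is a sketch of the escaping method with the pattern written without spaces (inside %r{ ... } they are matched literally) and as a character class. It shows both the block form used above and the replacement-string form the question was attempting. In a gsub replacement string, \\ is a literal backslash and \0 is the whole match, which '\\\\\0' spells out once Ruby's single-quote escaping is applied. The method and constant names are hypothetical.

```ruby
# Characters to escape: ' " . * / \ -
ESCAPE_PATTERN = /['".*\/\\-]/

# Block form: the block's return value replaces each match.
def escape_with_block(string)
  string.gsub(ESCAPE_PATTERN) { |match| "\\" + match }
end

# Replacement-string form: backslash followed by the whole match (\0).
def escape_with_backref(string)
  string.gsub(ESCAPE_PATTERN, '\\\\\0')
end

puts escape_with_block(%q("My" -john- C:\Drive))  # prints \"My\" \-john\- C:\\Drive
```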
I changed the function above like this:
def self.escape_characters_in_string(string)
  pattern = /(\'|\"|\.|\*|\/|\-|\\|\)|\$|\+|\(|\^|\?|\!|\~|\`)/
  string.gsub(pattern) { |match| "\\" + match }
end
This works well for my regex use case.
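If the goal is specifically to escape text for use inside a regular expression, Ruby already ships Regexp.escape (also available as Regexp.quote). It handles the regex metacharacters without a hand-maintained list. It is not an SQL escaping function; for database input, the sanitization methods mentioned in the other answers are the relevant tools. literal_regex is a hypothetical helper name.

```ruby
# Build a regex that matches the given text literally, metacharacters and all.
def literal_regex(text)
  Regexp.new(Regexp.escape(text))
end

literal_regex("1+1=2? (maybe)").match?("Is 1+1=2? (maybe) true")  # => true
literal_regex("a.b").match?("axb")                                # => false
```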
This should get you started:
print %("'*-.).gsub(/["'*.-]/){ |s| '\\' + s }
\"\'\*\-\.
Take a look at the ActiveRecord sanitization methods: http://api.rubyonrails.org/classes/ActiveRecord/Base.html#method-c-sanitize_sql_array
Take a look at the escape_string / quote methods of the Mysql class here
Ruby 1.9.1, OSX 10.5.8
I'm trying to write a simple app that parses a bunch of Java-based HTML template files and replaces a period (.) with an underscore if it's contained within a specific tag. I use Ruby all the time for these kinds of utility apps, and thought it would be no problem to whip up something using Ruby's regex support. So I create a Regexp.new... object, open a file, read it in line by line, and match each line against the pattern. If I get a match, I create a new string using replaceString = currentMatch.gsub(/\./, '_'), then build an escaped version of the whole match with newReplaceRegex = Regexp.escape(currentMatch), and finally replace it back into the current line with line.gsub(newReplaceRegex, replaceString). Code below, of course, but first...
The problem I'm having is that when accessing the indexes within the returned MatchData object, I'm getting the first result twice, and the second substring it should be finding is missing. Stranger still, when testing the same pattern and the same test text on rubular.com, it works as expected. See results here
My pattern:
(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>))
Test text:
<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText
Here's the relevant code:
tagRegex = Regexp.new('(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>))+')
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each{|htmlLine|
lineCount += 1
puts ("Current line: #{htmlLine} at line num: #{lineCount}")
tagMatch = tagRegex.match(htmlLine)
if(tagMatch)
matchesArray = tagMatch.to_a
firstMatch = matchesArray[0]
secondMatch = matchesArray[1]
puts "First match: #{firstMatch} and second match #{secondMatch}"
tagMatch.captures.each {|lineMatchCapture|
puts "Current capture for tagMatches: #{lineMatchCapture} of total match count #{matchesArray.size}"
#create a new regex using the match results; make sure to use auto escape method
originalPatternString = Regexp.escape(lineMatchCapture)
replacementRegex = Regexp.new(originalPatternString)
#replace any periods with underscores in a copy of lineMatchCapture
periodToUnderscoreCorrection = lineMatchCapture.gsub(/\./, '_')
#replace original match with underscore replaced copy within line
htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
puts "The modified htmlLine is now: #{htmlLine}"
}
end
}
I would expect to get the first tag in matchData[0] and the second tag in matchData[1]. Or rather, since I don't know how many matches I'll get within any given line, what I'm really doing is matchData.to_a.each. In this case, matchData has two entries, but they're both the first tag match,
which is: <WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>
So, what the heck am I doing wrong, why does rubular test give me the expected results?
You want to use String#scan instead of Regexp#match:
tag_regex = /<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>)/
lines = "<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText\
<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText"
lines.scan(tag_regex)
# => ["<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>", "<WEBOBJECT NAME=admin.SecondLineMatch>"]
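Once every tag can be found, the period-to-underscore rewrite can be done in a single pass: String#gsub with a block replaces each match with the block's return value, so there is no need to build an escaped regex for each match. Here is a sketch assuming the tag pattern from this answer, made case-insensitive with the i flag. TAG_REGEX and underscore_tags are hypothetical names.

```ruby
TAG_REGEX = /<webobject name=(?:\w+\.)+\w+>/i

# Replace "." with "_" inside each matching tag, leaving other text alone.
def underscore_tags(line)
  line.gsub(TAG_REGEX) { |tag| tag.tr('.', '_') }
end

puts underscore_tags("<WEBOBJECT NAME=admin.SecondLineMatch>Rest.Of.Text")
# prints <WEBOBJECT NAME=admin_SecondLineMatch>Rest.Of.Text
```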
A few recommendations for your next Ruby questions:
newlines and spaces are your friends; you don't lose points for using more lines in your code ;-)
use do...end for multi-line blocks instead of {}; it improves readability a lot
declare variables in snake case (hello_world) instead of camel case (helloWorld)
Hope this helps
I ended up using the String#scan approach. The only tricky point was figuring out that it returns an array of arrays (because the pattern contains a capture group), not a MatchData object, so there was some initial confusion on my part, mostly due to my Ruby green-ness, but it's working as expected now. Also, I trimmed the regex per Trevoke's suggestion. But snake case? Never... ;-) Anyway, here goes:
tagRegex = /(<(?:webobject) (?:name)=(?:\w+\.)+(?:\w+)(?:>))/i
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each do |htmlLine|
lineCount += 1
puts ("Current line: #{htmlLine} at line num: #{lineCount}")
oldMatches = htmlLine.scan(tagRegex) #oldMatches thusly named due to not explicitly using Regexp or MatchData, as in "the old way..."
if(oldMatches.size > 0)
oldMatches.each_index do |index|
arrayMatch = oldMatches[index]
aMatch = arrayMatch[0]
#create a new regex using the match results; make sure to use auto escape method
replacementRegex = Regexp.new(Regexp.escape(aMatch))
#replace any periods with underscores in a copy of aMatch
periodToUnderscoreCorrection = aMatch.gsub(/\./, '_')
#replace original match with underscore replaced copy within line, matching against the new escaped literal regex
htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
puts "The modified htmlLine is now: #{htmlLine}"
end # I kind of still prefer the brackets...;-)
end
end
Now, why does MatchData work the way it does? Its behavior seems like a bug, really, and it's certainly not very useful in general if you can't get it to provide a simple means of accessing all the matches. Just my $.02
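This behaviour is documented rather than buggy. Regexp#match returns only the first match, and a capture group repeated with + keeps only its last iteration. That is why the question's pattern, wrapped in (...)+, reported the first tag twice: once as the overall match and once as group 1. If you do want a MatchData for every occurrence, one option is to call Regexp#match repeatedly with a start position. all_matches is a hypothetical helper name.

```ruby
# A repeated group keeps only its last iteration.
demo = "abc".match(/(\w)+/)
# demo[0] is "abc", demo[1] is "c"

# Collect a MatchData object for every non-overlapping match.
def all_matches(string, regex)
  matches = []
  pos = 0
  while (m = regex.match(string, pos))
    matches << m
    # Advance past this match; step one char on empty matches to avoid looping forever.
    pos = m[0].empty? ? m.end(0) + 1 : m.end(0)
  end
  matches
end

all_matches("<a.b>x<c.d>y", /<(\w+)\.(\w+)>/).map { |m| m[0] }  # => ["<a.b>", "<c.d>"]
```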
Small bits:
This regexp helps you get "normalMode" .. But not "secondLineMatch":
<webobject name=\w+\.((?:\w+)).+> (with option 'i', for "case insensitive")
This regexp helps you get "secondLineMatch" ... But not "normalMode":
<webobject name=\w+\.((?:\w+))> (with option 'i', for "case insensitive").
I'm not really good at regexps, but I'll keep toiling at it.. :)
And I don't know if this helps you at all, but here's a way to get both:
<webobject name=admin.(\w+) (with option 'i').