Recognize Ruby code in Treetop grammar

I'm trying to use Treetop to parse an ERB file. I need to be able to handle lines like the following:
<% ruby_code_here %>
<%= other_ruby_code %>
Since Treetop is written in Ruby, and you write Treetop grammars in Ruby, is there already some existing way in Treetop to say "hey, look for Ruby code here, and give me its breakdown" without me having to write out separate rules to handle all parts of the Ruby language? I'm looking for a way, in my .treetop grammar file, to have something like:
rule erb_tag
  "<%" ruby_code "%>" {
    def content
      ...
    end
  }
end
Where ruby_code is handled by some rules that Treetop provides.
Edit: someone else parsed ERB using Ruby-lex, but I got errors trying to reproduce what he did. The rlex program did not produce a full class when it generated the parser class.
Edit: right, so you lot are depressing, but thanks for the info. :) For my Master's project, I'm writing a test case generator that needs to work with ERB as input. Fortunately, for my purposes, I only need to recognize a few things in the ERB code, such as if statements and other conditionals as well as loops. I think I can come up with Treetop grammar to match that, with the caveat that it isn't complete for Ruby.

As far as I know, nobody has yet created a Treetop grammar for Ruby. (In fact, nobody has ever been able to create any grammar for Ruby other than the YACC grammar that ships with MRI and YARV.) I know that the author of Treetop has been working on one for several years, but it's not a trivial undertaking. Getting the ANTLR grammar which is used in XRuby right took about 5 years, and it is still not fully compliant.
Ruby's syntax is insanely, mindbogglingly complex.

No
I don't think so. Specifying the complex and subtle Ruby grammar in treetop would be a major accomplishment, but it should be possible.
The actual Ruby grammar is written in yacc. Now, yacc is a legendary tool, but Treetop generates a more powerful class of parsers, so it should be possible, and perhaps someone has done it.
It's not an afternoon project.

Maybe I'm kidding, but if yacc is less complex than Ruby, then you could implement yacc in Treetop, which could then use the Ruby grammar written for yacc.

For your purposes, you can probably get away without parsing all of Ruby. What you actually need is a way to detect the %> that closes off a Ruby block. If you don't ever want to fail when the Ruby code contains those closing characters, you must detect anywhere those characters can occur inside the Ruby text; which means you need to detect all forms of literals.
However, for your purposes you can probably get away with recognising the most likely cases where %> would occur in Ruby text, and ignoring just those cases. This assumes, of course, that any remaining failure can be handled by getting your user to write the ERB a little differently.
For what it's worth, Treetop itself "parses" Ruby blocks this way; it just counts { and } characters until the closing one is found. So if your block contains a } in a literal string, you're broken (but you can work around by including the matching one in a comment).
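The "recognise the most likely cases" idea can be sketched in plain Ruby, outside Treetop. This is only an illustration under stated assumptions: erb_ruby_snippets is a hypothetical helper name, it skips just single- and double-quoted string literals, and it ignores heredocs, %q() forms, regexps and comments, so an unbalanced quote (an apostrophe in a comment, say) can still fool it.

```ruby
require 'strscan'

# Hypothetical helper: returns the Ruby text found inside <% ... %> and
# <%= ... %> tags. A %> that sits inside a quoted string literal does not
# close the tag. Other literal forms are deliberately not handled.
def erb_ruby_snippets(source)
  scanner = StringScanner.new(source)
  snippets = []
  # skip_until returns nil once no further opening tag exists
  while scanner.skip_until(/<%=?/)
    code = String.new
    until scanner.eos?
      if scanner.scan(/%>/)
        snippets << code.strip
        break
      elsif (literal = scanner.scan(/"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'/))
        code << literal        # keep the whole string literal, %> included
      else
        code << scanner.getch  # ordinary character of Ruby code
      end
    end
  end
  snippets
end

p erb_ruby_snippets('<% if x %>a<%= "%>" + y %>')
```

An unterminated tag is silently dropped here; a real tool would probably want to report it instead.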

Related

What does the "translate" keyword do in Ruby

Short question:
What is the word 'translate' doing, and why is it colored as special in my IDE?
Long question:
I am doing the Odin Project, and code in 04_pig_latin Ruby and RSpec exercise should look like this:
def translate(string)
  # some code
end
Per the spec which I need to pass:
describe "#translate" do
  it "translates a word beginning with a vowel" do
    s = translate("apple")
    expect(s).to eq("appleay")
  end
end
In my Cloud9 IDE the word translate is colored blue (like require or render), so I assume that I can't use it as a method name and will need to change the given RSpec test to pass it. However, I saw that others doing this task are naming this method translate without any issues.
I haven't found anything about this "keyword" that would make it unique. I don't know what it's really doing, and I don't know whether its uniqueness comes from Ruby or Cloud9.
Link to exercises repo
Each Ruby syntax highlighting library often includes common phrases that are used in things like Rails. For example, belongs_to, while not a special keyword in a Ruby sense, is very common in Rails applications so it's often highlighted.
translate might be a special phrase as well as it's used by a lot of I18N libraries.
The only way to find out for sure is to look at the rules for syntax highlighting your editor uses. Usually there's a list of special method names in there.
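One way to check from the Ruby side, independent of any editor, is to ask Ruby's own lexer how it classifies the word. A minimal sketch using Ripper from the standard library (MRI is assumed; token_type is a hypothetical helper name):

```ruby
require 'ripper'

# Ripper.lex returns entries of the form [position, token_type, text, state].
# Hypothetical helper: the token type Ruby's lexer assigns to a single word.
def token_type(word)
  Ripper.lex(word).first[1]
end

puts token_type("def")        # on_kw: a real keyword
puts token_type("translate")  # on_ident: an ordinary identifier
puts token_type("require")    # on_ident: highlighted by many editors, still just a method
```

Anything reported as on_ident can be used as a method name, whatever color the editor gives it.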

Treetop parser error handling mechanism providing useless output

I've been experimenting with Treetop lately to create a simple parser for a CFG DSL for one of my clients. I managed to implement all the features he required, but working with Treetop turned out to be quite a painful experience.
The problem is that I was not able to get any usable error message from Treetop. The only output I am getting is
parser.rb:22:in `parse': Parser error at offset: 0 (Exception)
Error:
#<TranLanParser:0x007f960c852f60>
from parser.rb:28:in `<class:Parser>'
from parser.rb:10:in `<main>'
which always points to the first character in the file. This makes it really hard to find any error in the parsed language. How am I supposed to develop my parser incrementally if I can't find out what's wrong?
I tried to change my grammar to contain recursive rules, because I thought that this would help the parser to create AST nodes as soon as possible, but it didn't help.
My question is:
Am I doing something wrong? Is there a good example of how to create PEG grammars for Treetop that provide meaningful error messages on partially derived trees? Or is it a bug in the Treetop library?
Thanks for any opinion.
Did you try printing parser.failure_reason? This prints the list of terminals that would have allowed advancing beyond the right-most position the parser reached (before it back-tracked).
Did you try a single token or ultra-simple grammar, working up as you go?
Did you try setting parser.consume_all_input = false, to see whether it was parsing correctly but not to the end of the input?
There are a few more "traps for young players" but you haven't given us enough information to go on. Once you "get it", developing in Treetop is a breeze, but it can take a little while to get to that point.
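To make the first suggestion concrete, here is a small sketch of how the failure details could be surfaced instead of a bare exception. parse_or_explain is a hypothetical helper, and parser is assumed to be an instance of a Treetop-generated parser class (the TranLanParser in the question, for example); failure_reason, failure_line and failure_column are the accessors Treetop's compiled parsers provide.

```ruby
# Hypothetical helper: returns the syntax tree on success, otherwise raises
# with the position and reason recorded by the parser.
# Assumes `parser` is a Treetop-generated parser instance.
def parse_or_explain(parser, input)
  tree = parser.parse(input)
  return tree if tree

  raise ArgumentError,
        "Parse failed at line #{parser.failure_line}, " \
        "column #{parser.failure_column}: #{parser.failure_reason}"
end
```

Setting parser.consume_all_input = false before the call, as suggested above, helps to tell "the grammar rejects the start of the input" apart from "the grammar stops before the end of the input".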

Where is Ruby code, generated via metaprogramming, stored, and is it viewable/printable?

I've just started learning about metaprogramming in Ruby, and found myself wondering if it was possible to view (in someway) the code that had been generated. I want to, as a coding exercise, write a short method that will generate a Ruby file containing either a few method definitions or, ideally, an entire class or module definition.
I was thinking that perhaps just building up a string representation of the file and then merely writing it out might be a way to accomplish that, but that way doesn't really necessitate the use of metaprogramming, and since my goal is a metaprogramming exercise, I would like to figure out a way to incorporate it into that process or else do it another way.
I guess, if I was to take the string-building approach, I would like to start with something like
klass_string = "class GeneratedClass\n\t<BODY>\nend"
and then somehow save the output of something like this
define_method(:initialize) do
  instance_variable_set("@var", "some_value")
end
in a string that could replace '<BODY>' in klass_string and then be written out to a file. I know I could just put the above code snippet directly into the string, and it would work out fine, but I would like to have the output in a more standard format, as if it had been written by hand and not generated:
class GeneratedClass
  def initialize
    @var = 'some_value'
  end
end
Could someone point me in the right direction?
I agree with your comment that this question isn't really about metaprogramming so much as dynamic code generation / execution and introspection. Those are interesting topics, but not really metaprogramming. In particular, your question about outputting Ruby code to strings is about introspection, whereas your string injection question is about dynamic code (just to try to give you the words to google for what you're interested in).
Since your question is general and really about introspection and dynamic code, I'm going to refer you to some canonical and useful projects that can help you learn more.
ParseTree & Ruby Parser and Sourcify
Ruby Parser is a pure Ruby implementation of ParseTree, so I'd recommend starting there to learn how to examine and "stringify" Ruby code. Play around with all of those, and in particular learn how they examine code in Ruby to generate their results. You'll learn a ton about how things work under the hood. Eric Hodel, among others, is really smart about this stuff. Be warned though, this is really advanced stuff, but if that's where you want to build expertise, hopefully those references will help!
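For the string-building route described in the question, a minimal sketch that needs none of those libraries could look like this. generate_class_source is a hypothetical helper, and the class and variable names are taken from the question; it only covers an initializer that assigns literal values.

```ruby
# Hypothetical generator: builds readable source for a class whose
# initializer assigns the given instance variables.
def generate_class_source(class_name, ivars)
  assignments = ivars.map { |name, value| "    @#{name} = #{value.inspect}" }
  [
    "class #{class_name}",
    "  def initialize",
    *assignments,
    "  end",
    "end"
  ].join("\n") + "\n"
end

source = generate_class_source("GeneratedClass", var: "some_value")
puts source  # File.write("generated_class.rb", source) would save it instead
```

Because the output is ordinary source text, it can be written to a file and later loaded with require, or evaluated directly to check that it is valid Ruby.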

What Ruby parser would you suggest to parse Ruby sources?

The parser I'm looking for should:
be Ruby parsing friendly,
be elegant by rule design,
produce user-friendly parsing errors,
have user documentation that goes beyond a calculator example,
UPD: allow optional whitespace to be omitted when writing a grammar.
Fast parsing is not an important feature.
I tried Citrus but the lack of documentation and need to specify every space in rules just turned me away from it.
Treetop
Ragel
Or in case you want to parse Ruby itself:
parse_tree and ruby_parser
Edit:
I just saw your last comment about needing a subset of Ruby for your project, in that case I'd also recommend having a look at tinyrb.

Ruby Parser

I want to know whether it is possible to parse the Ruby language using just a deterministic parser, with no backtracking at all?
Instead of actually having to write a parser, you can always leverage the existing interpreter to do what you want.
For example: ruby2ruby
http://seattlerb.rubyforge.org/ruby2ruby/
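As another illustration of leaning on the interpreter's own parser, Ripper (bundled with MRI) exposes the parse result as nested arrays. The sketch below is not what ruby2ruby does; control_structures and CONTROL_NODES are hypothetical names, and the list of node types is an assumption that covers only common conditionals and loops.

```ruby
require 'ripper'

# Node types treated as conditionals or loops (an illustrative subset).
CONTROL_NODES = %i[if unless while until for case
                   if_mod unless_mod while_mod until_mod].freeze

# Hypothetical helper: lists the control-flow node types found in a piece
# of complete Ruby code, in source order. Returns nil on a syntax error.
def control_structures(ruby_code)
  sexp = Ripper.sexp(ruby_code)
  return nil if sexp.nil? # Ripper.sexp returns nil when parsing fails

  found = []
  walk = lambda do |node|
    next unless node.is_a?(Array)
    found << node[0] if CONTROL_NODES.include?(node[0])
    node.each { |child| walk.call(child) }
  end
  walk.call(sexp)
  found
end

p control_structures("if a then b end")
```

Note that this needs complete Ruby code; a fragment such as the inside of a single ERB tag (for example just "if x") does not parse on its own.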
I don't know any specific details about parsing Ruby, or why you insist on "no backtracking". My guess is that you believe the Ruby grammar isn't LALR(1), e.g., isn't processable by YACC or equivalents.
Regardless, if the problem is to parse a language whose grammar is context-free, one can do this using a GLR parser, which does not backtrack:
http://en.wikipedia.org/wiki/GLR_parser
I've used this to build production parsers for many real languages.

Resources