Which Ruby parser would you suggest for parsing Ruby sources? - ruby

A parser I'm looking for should:
be Ruby parsing friendly,
be elegant by rule design,
produce user friendly parsing errors,
have user documentation that goes beyond a calculator example,
UPD: allow optional whitespace to be omitted when writing a grammar.
Fast parsing is not an important feature.
I tried Citrus, but the lack of documentation and the need to specify every space in the rules turned me away from it.

Treetop
Ragel
Or in case you want to parse Ruby itself:
parse_tree and ruby_parser
Edit:
I just saw your last comment about needing a subset of Ruby for your project, in that case I'd also recommend having a look at tinyrb.

Related

Determining the length of sections of code

Is there a tool that parses a Ruby script (MRI/YARV) and gives statistics of how many lines each module/class/method definition is?
Saikuro will do this. It's also included in metric_fu, which makes it easy to run Saikuro and many other code metrics tools.
(Be careful, the saikuro gem is probably not what you want, instead it's Saikuro with a capital "S".)
What do you mean by MRI/YARV? A script doesn't have an implementation associated with it. The tool may be associated with a particular implementation, though.
There may be such a tool in the code metrics section of Ruby Toolbox.
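If no ready-made tool fits, a rough count can also be obtained from the interpreter's own parser. The following is only a sketch, assuming MRI 2.6 or later, where the experimental, MRI-only RubyVM::AbstractSyntaxTree API is available. The helper name definition_lengths is hypothetical, and this is not how Saikuro works internally.

```ruby
# Sketch: report how many source lines each module, class and method spans.
# Assumes MRI 2.6+ (RubyVM::AbstractSyntaxTree is experimental and MRI-only).
def definition_lengths(source)
  results = []
  walk = lambda do |node|
    case node
    when Array
      node.each { |child| walk.call(child) }
    when RubyVM::AbstractSyntaxTree::Node
      if %i[MODULE CLASS DEFN].include?(node.type)
        first = node.children.first
        # DEFN stores the name directly; MODULE/CLASS store a constant-path node.
        name = first.is_a?(Symbol) ? first : first.children.last
        results << { kind: node.type, name: name,
                     lines: node.last_lineno - node.first_lineno + 1 }
      end
      node.children.each { |child| walk.call(child) }
    end
  end
  walk.call(RubyVM::AbstractSyntaxTree.parse(source))
  results
end
```

Calling definition_lengths(File.read(path)) returns one hash per definition, with the line count running from the opening keyword to the matching end.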

How to build AST by S-expression in Ruby?

I have no idea how to build an S-expression.
I want to do it because I need to build an AST for my language.
At the beginning I used RubyParser to parse it into a sexp and then generated code.
But then the language has to be a subset of Ruby, I think. I can't define the language I want.
Now I need to implement a parser for my language.
So could anyone recommend a Ruby tool that builds an AST from S-expressions?
Thanks!
It is not very clear from your question what exactly you need, but a simple Google search gives some interesting links to check. If these links do not answer your question, you can edit the question to make it more precise and concrete.
http://thingsaaronmade.com/blog/writing-an-s-expression-parser-in-ruby.html
https://github.com/aarongough/sexpistol
You might try the sxp-ruby gem at http://github.com/bendiken/sxp-ruby. I use it for SPARQL S-Expressions (SSE) and similar methods for managing Abstract Syntax Trees in Ruby.
Maybe you could have a look at this gem named Astrapi.
This is just an experiment:
describe your language elements (concepts) in a "mm" file (abstract syntax)
run astrapi on this file
astrapi generates a parser that is able to fill up your AST, from your input source expressed in s-expression (concrete syntax of your concepts).
I have put a modest documentation here.
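For a sense of what the tools above do, a hand-written S-expression reader is fairly small. The following is a minimal sketch, not the implementation of sexpistol, sxp-ruby or Astrapi. The method names tokenize_sexp, read_sexp and parse_sexp are hypothetical, and escape sequences inside strings are kept as written rather than decoded.

```ruby
# Minimal S-expression reader: turns text such as "(add 1 (mul 2 3))"
# into nested Ruby arrays of symbols, numbers and strings.
def tokenize_sexp(text)
  # Parentheses, double-quoted strings, or runs of other non-space characters.
  text.scan(/[()]|"(?:\\.|[^"\\])*"|[^\s()]+/)
end

def read_sexp(tokens)
  raise ArgumentError, 'unexpected end of input' if tokens.empty?

  token = tokens.shift
  case token
  when '('
    list = []
    until tokens.first == ')'
      raise ArgumentError, 'missing closing parenthesis' if tokens.empty?
      list << read_sexp(tokens)
    end
    tokens.shift # drop the ')'
    list
  when ')'
    raise ArgumentError, 'unexpected closing parenthesis'
  when /\A-?\d+\z/ then token.to_i
  when /\A-?\d+\.\d+\z/ then token.to_f
  when /\A"/ then token[1..-2] # strip the surrounding quotes
  else token.to_sym
  end
end

def parse_sexp(text)
  tokens = tokenize_sexp(text)
  tree = read_sexp(tokens)
  raise ArgumentError, 'trailing tokens after expression' unless tokens.empty?
  tree
end
```

The nested arrays returned by parse_sexp can serve directly as a simple AST, or be mapped onto node classes of your own language in a second pass.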

Safe to parse user submitted code using Ripper?

I'm using the Ruby 1.9 Ripper library to analyze specific parts of source code by building its sexp tree. From what I know, Ripper just uses a lexer / parser to do this.
Is it safe to run Ripper on a user submitted code?
Since it does not actually evaluate any code, yes it is safe.
If you are talking about taking those s-expressions and evaluating them, then most certainly the answer seems to be: Not without cleaning it first. That cleaning process could be especially tricky though.
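The difference between parsing and evaluating can be checked with a small experiment. This is a sketch using only the standard library's Ripper.sexp; the input string and the global variable name are made up for illustration.

```ruby
require 'ripper'

# Hypothetical user-submitted snippet with an obvious side effect.
untrusted = '$ripper_demo_ran = true; x = 1 + 2'

# Parsing only builds a tree; the assignment above is never executed.
tree = Ripper.sexp(untrusted)

# Input that does not parse yields nil instead of a tree.
broken = Ripper.sexp('x = (1 + ')
```

After running this, tree is a nested array whose first element is :program, $ripper_demo_ran is still nil, and broken is nil.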

recognize Ruby code in Treetop grammar

I'm trying to use Treetop to parse an ERB file. I need to be able to handle lines like the following:
<% ruby_code_here %>
<%= other_ruby_code %>
Since Treetop is written in Ruby, and you write Treetop grammars in Ruby, is there already some existing way in Treetop to say "hey, look for Ruby code here, and give me its breakdown" without me having to write out separate rules to handle all parts of the Ruby language? I'm looking for a way, in my .treetop grammar file, to have something like:
rule erb_tag
  "<%" ruby_code "%>" {
    def content
      ...
    end
  }
end
Where ruby_code is handled by some rules that Treetop provides.
Edit: someone else parsed ERB using Ruby-lex, but I got errors trying to reproduce what he did. The rlex program did not produce a full class when it generated the parser class.
Edit: right, so you lot are depressing, but thanks for the info. :) For my Master's project, I'm writing a test case generator that needs to work with ERB as input. Fortunately, for my purposes, I only need to recognize a few things in the ERB code, such as if statements and other conditionals as well as loops. I think I can come up with Treetop grammar to match that, with the caveat that it isn't complete for Ruby.
As far as I know, nobody has yet created a Treetop grammar for Ruby. (In fact, nobody has ever been able to create any grammar for Ruby other than the YACC grammar that ships with MRI and YARV.) I know that the author of Treetop has been working on one for several years, but it's not a trivial undertaking. Getting the ANTLR grammar which is used in XRuby right took about 5 years, and it is still not fully compliant.
Ruby's syntax is insanely, mindbogglingly complex.
No
I don't think so. Specifying the complex and subtle Ruby grammar in treetop would be a major accomplishment, but it should be possible.
The actual Ruby grammar is written in yacc. Now, yacc is a legendary tool, but Treetop generates a more powerful class of parsers, so it should be possible, and perhaps someone has done it.
It's not an afternoon project.
Maybe I'm kidding, but if yacc is less complex than Ruby, then you could implement yacc in Treetop, which could then use the Ruby grammar written for yacc.
For your purposes, you can probably get away without parsing all of Ruby. What you actually need is a way to detect the %> that closes off a Ruby block. If you don't ever want to fail when the Ruby code contains those closing characters, you must detect anywhere those characters can occur inside the Ruby text; which means you need to detect all forms of literals.
However, for your purposes you can probably get away with recognising the most likely cases where %> would occur in Ruby text, and ignore just those cases. This assumes of course that any remaining failure can be handled by getting your user to write the ERB a little differently.
For what it's worth, Treetop itself "parses" Ruby blocks this way; it just counts { and } characters until the closing one is found. So if your block contains a } in a literal string, you're broken (but you can work around by including the matching one in a comment).
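The approach of looking for the closing %> while skipping the most likely literals can be sketched with the standard library's StringScanner. This is a hedged illustration, not a Treetop grammar: the helper name erb_tag_bodies is hypothetical, and only single- and double-quoted strings are recognised, so %q literals, regexps, heredocs and comments containing %> will still break it.

```ruby
require 'strscan'

# Returns the Ruby code found inside each <% ... %> or <%= ... %> tag.
# A %> inside a simple quoted string does not end the tag.
def erb_tag_bodies(template)
  scanner = StringScanner.new(template)
  bodies = []
  while scanner.skip_until(/<%=?/)
    body = +''
    until scanner.eos?
      if scanner.scan(/%>/)
        bodies << body.strip
        break
      elsif (literal = scanner.scan(/"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'/))
        body << literal # copy the whole string literal, including any %>
      else
        body << scanner.getch
      end
    end
  end
  bodies
end
```

Each returned body could then be matched against a small set of rules for if statements, other conditionals and loops, without needing a complete Ruby grammar. A tag that is never closed is silently dropped in this sketch.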

Ruby Parser

I want to know whether it is possible to parse the Ruby language using just a deterministic parser with no backtracking at all.
Instead of actually having to write a parser, you can always leverage the existing interpreter to do what you want.
For example: ruby2ruby
http://seattlerb.rubyforge.org/ruby2ruby/
I don't know any specific details about parsing Ruby, or why you insist on "no backtracking". My guess is that you believe the Ruby grammar isn't LALR(1), e.g., isn't processable by YACC or equivalents.
Regardless, if the problem is to parse a language whose grammar is context-free, one can do this using a GLR parser, which does not backtrack:
http://en.wikipedia.org/wiki/GLR_parser
I've used this to build production parsers for many real languages.

Resources