What grammar-based parser-generator tools exist for Ruby? - ruby

What open source (preferably gem-based) parser-generator options do I have in Ruby?
I've used (flex&bison)|(lex&yacc) from C in the past, and I'm comfortable with BNF-style specifications.
I've heard of treetop, but it looks a bit alien and verbose compared to yacc...
Purpose: I want to convert my text markup language to a BNF and generate the parsing code.

Have you looked at rex and racc, the gem versions of lex and yacc?

There's also parslet if you want a PEG-based processor

Citrus is an option - similar but not identical to Treetop in its grammar.
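To give a feel for what any of these tools produce from a BNF, here is a hand-written sketch (not output from racc, Treetop, Parslet or Citrus) of a recursive-descent parser for a toy markup grammar. The grammar, the class name TinyMarkupParser and the node shapes are all made up for illustration; it only assumes the strscan library that ships with Ruby.

```ruby
require 'strscan'

# Hypothetical grammar for a toy markup language (not from the question):
#   document ::= inline*
#   inline   ::= bold | text
#   bold     ::= "*" text "*"
#   text     ::= one or more characters other than "*"
class TinyMarkupParser
  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  # document ::= inline*
  def parse
    nodes = []
    nodes << parse_inline until @scanner.eos?
    nodes
  end

  private

  # inline ::= bold | text
  def parse_inline
    @scanner.check(/\*/) ? parse_bold : parse_text
  end

  # bold ::= "*" text "*"
  def parse_bold
    @scanner.scan(/\*/)
    content = @scanner.scan(/[^*]+/) or
      raise ArgumentError, "empty bold span at offset #{@scanner.pos}"
    @scanner.scan(/\*/) or
      raise ArgumentError, "unterminated bold span at offset #{@scanner.pos}"
    [:bold, content]
  end

  # text ::= one or more characters other than "*"
  def parse_text
    [:text, @scanner.scan(/[^*]+/)]
  end
end

ast = TinyMarkupParser.new("plain *strong* tail").parse
# ast is a flat list of [:text, ...] and [:bold, ...] nodes
```

Each grammar rule maps to one method, which is roughly the structure a PEG-based tool derives for you; an LALR tool like racc builds a table-driven parser instead.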

Related

A List of Google Prettify Language Codes

I cannot find this anywhere, and I swear I used to be able to find it easily without much digging. Can anyone help me? Thanks. I would appreciate it. Also, does prettify support Batch?
I thought it would be helpful to have an actual list rather than just a link. I found it in the loader directory that @MikeSamuel linked to from "Javascript code prettifier". As the readme states, the prettify.js comments are the authoritative source. However, "What is syntax highlighting and how does it work?" provided a better-formatted list, so I will copy that below. Refer to the links for the most up-to-date information.
If you are using the Prettify codes to markup Stack Overflow code, you use
<!-- language: lang-or-tag-here -->
your code
Language Codes:
Let Prettify interpret the code and guess.
default
Explicitly do not use any syntax highlighting.
lang-none
Bash and other Shell scripting
lang-bash, lang-bsh, lang-csh, lang-sh
C, C++, et al
lang-c, lang-cc, lang-cpp, lang-cxx, lang-cyc, lang-m
C#
lang-cs
Clojure
lang-clj
CoffeeScript
lang-coffee
CSS
lang-css
Dart
lang-dart
Delphi
lang-pascal
Erlang
lang-erl, lang-erlang
Go
lang-go
Haskell
lang-hs
HTML
lang-html
Java
lang-java
JavaScript
lang-js, lang-javascript
JSON
lang-json
LaTeX and TeX
lang-latex, lang-tex
Lisp and Scheme
lang-cl, lang-el, lang-lisp, lang-lsp, lang-scm, lang-ss, lang-rkt
Lua
lang-lua
OCaml, SML, F#, et al
lang-fs, lang-ml
Pascal
lang-pascal
Perl
lang-pl, lang-perl
PHP
lang-php
Protocol buffers
lang-proto
Python
lang-py, lang-python, lang-cv
R and S
lang-r, lang-s
Regex
lang-regex
Ruby
lang-rb, lang-ruby
Rust
lang-rc, lang-rs, lang-rust
Scala
lang-scala
SQL
lang-sql
VHDL
lang-vhdl, lang-vhd
Visual Basic
lang-vb, lang-vbs
XML
lang-xml
You can find a table in the FAQ, under the header For which languages does it work?:
The comments in prettify.js are authoritative but the lexer should work on a number of languages including C and friends, Java, Python, Bash, SQL, HTML, XML, CSS, Javascript, Makefiles, and Rust. It works passably on Ruby, PHP, VB, and Awk and a decent subset of Perl and Ruby, but, because of commenting conventions, doesn't work on Smalltalk.
Other languages are supported via extensions: ...
You can find the handlers, with their extensions in the loader directory
For the mapping from extensions to builtin languages, see the registerLangHandler calls in prettify.js

How to build AST by S-expression in Ruby?

I have no idea how to build S-expressions.
I want to do it because I need to build an AST for my language.
At first I used RubyParser to parse the source into a sexp and then ran code generation on it.
But I think the input then has to be a subset of Ruby, so I can't define the language I want.
Now I need to implement a parser for my language.
Can anyone recommend a Ruby tool that builds an AST from S-expressions?
Thanks!
It is not very clear from your question what exactly you need, but a simple Google search gives some interesting links to check. If these links do not answer your question, you can edit the question to make it more precise and concrete.
http://thingsaaronmade.com/blog/writing-an-s-expression-parser-in-ruby.html
https://github.com/aarongough/sexpistol
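For reference, reading S-expressions into nested Ruby arrays takes very little code. The following is a minimal sketch of my own, not code taken from either link; the class name SexpReader and the choice of Array/Symbol/Integer/String as node types are assumptions made for illustration. It only uses strscan from the standard library.

```ruby
require 'strscan'

# Minimal S-expression reader (illustrative sketch):
# lists become Arrays, integers become Integers,
# double-quoted strings become Strings, any other atom becomes a Symbol.
class SexpReader
  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  # Reads one expression starting at the current position.
  def read
    skip_space
    if @scanner.scan(/\(/)
      read_list
    elsif @scanner.scan(/"((?:[^"\\]|\\.)*)"/)
      @scanner[1].gsub(/\\(.)/) { $1 }   # drop the backslash of each escape
    elsif @scanner.scan(/-?\d+(?=[\s()]|\z)/)
      @scanner.matched.to_i
    elsif @scanner.scan(/[^\s()"]+/)
      @scanner.matched.to_sym
    else
      raise ArgumentError, "unexpected input at offset #{@scanner.pos}"
    end
  end

  private

  # Reads expressions until the matching ")" is found.
  def read_list
    items = []
    loop do
      skip_space
      return items if @scanner.scan(/\)/)
      raise ArgumentError, "missing closing parenthesis" if @scanner.eos?
      items << read
    end
  end

  def skip_space
    @scanner.skip(/\s+/)
  end
end

ast = SexpReader.new('(define (square x) (* x x))').read
# ast is a nested Array of Symbols mirroring the parentheses
```

The nested arrays are already a usable AST: the first element of each list is the node type and the rest are its children, so a code generator can walk it with a simple case statement.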
You might try the sxp-ruby gem at http://github.com/bendiken/sxp-ruby. I use it for SPARQL S-Expressions (SSE) and similar methods for managing Abstract Syntax Trees in Ruby.
Maybe you could have a look at this gem named Astrapi.
This is just an experiment:
describe your language elements (concepts) in an "mm" file (abstract syntax)
run astrapi on this file
astrapi generates a parser that is able to fill in your AST from an input source expressed as S-expressions (the concrete syntax of your concepts).
I have put some modest documentation here.

Parsing XML, how is this actually done? [duplicate]

So, just as a fun project, I decided I'd write my own XML parser. No, not to parse a specific document, and no, not using an XML parser library. I mean writing code to parse out any XML document into a usable data structure. Just because I like the challenge. :-)
With that said, so far it's proved to be... interesting. It's not as easy to parse (especially when you start taking into account special characters, CDATA, empty tags, comments, etc.) as it initially looked.
Are there any well documented XML parsing algorithms or explanations anywhere that anyone knows of? It seems like there are well-documented Queue and Stack and BTree and etc. etc. etc. implementations everywhere, but I'm not sure I've ever seen a simple, well-documented XML parser algorithm...
I repeat: I am not looking for a pre-built parser library! I am looking for information on how to create my own pre-built parser library! Do not tell me "use expat" or "use SAX" or whatever. That's not what I'm asking for.
Antlr offers a tutorial on parsing XML. It breaks the process down into phases: lexing, parsing, tree parsing, etc. Looks pretty interesting.
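The phases mentioned above can be sketched by hand for a small subset of XML. This is not taken from the ANTLR tutorial; the names XmlNode, xml_tokens and xml_tree are invented for illustration, and the sketch assumes double-quoted attributes only and ignores entities, DTDs and namespaces. A lexer turns the text into open/close/text tokens, then a stack turns the tokens into a tree.

```ruby
require 'strscan'

XmlNode = Struct.new(:name, :attributes, :children)

# Phase 1 (lexing): turn the input into a flat list of tokens.
def xml_tokens(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.scan(/<!--.*?-->/m) || scanner.scan(/<\?.*?\?>/m)
      next # comments and processing instructions are dropped in this sketch
    elsif scanner.scan(/<!\[CDATA\[(.*?)\]\]>/m)
      tokens << [:text, scanner[1]]          # CDATA content is kept verbatim
    elsif scanner.scan(%r{</([\w:.\-]+)\s*>})
      tokens << [:close, scanner[1]]
    elsif scanner.scan(%r{<([\w:.\-]+)((?:\s+[\w:.\-]+\s*=\s*"[^"]*")*)\s*(/?)>})
      attrs = scanner[2].scan(/([\w:.\-]+)\s*=\s*"([^"]*)"/).to_h
      tokens << [:open, scanner[1], attrs]
      # An empty tag like <b/> is treated as <b></b>.
      tokens << [:close, scanner[1]] unless scanner[3].empty?
    elsif scanner.scan(/[^<]+/)
      tokens << [:text, scanner.matched] unless scanner.matched.strip.empty?
    else
      raise ArgumentError, "malformed markup at offset #{scanner.pos}"
    end
  end
  tokens
end

# Phase 2 (parsing): use a stack of open elements to build the tree.
def xml_tree(tokens)
  root = XmlNode.new(:document, {}, [])
  stack = [root]
  tokens.each do |type, value, attrs|
    case type
    when :open
      node = XmlNode.new(value, attrs, [])
      stack.last.children << node
      stack.push(node)
    when :close
      raise ArgumentError, "unexpected </#{value}>" unless stack.last.name == value
      stack.pop
    when :text
      stack.last.children << value
    end
  end
  raise ArgumentError, "unclosed <#{stack.last.name}>" if stack.size > 1
  root
end

doc = xml_tree(xml_tokens('<a id="1"><!-- note --><b/>hi <![CDATA[<raw>]]></a>'))
```

The stack is the whole trick: an opening tag pushes a node, a closing tag must match the top of the stack and pops it, and everything else is appended to whichever node is currently on top.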
I don't know if it would be "cheating" in your book, but you could try parsing your XML with a ready-built all-purpose language parser like ANTLR. The result would be a list of tokens (if you just use the lexer) or a parse tree (if you include the parser) and you could then re-build the parse tree almost 1:1 into an XML structure.
Maybe. I haven't thought about the ways in which XML might be different from "normal" ANTLR fodder like programming languages, and whether you would be able to define a suitable grammar.
VTD-XML is probably the simplest parsing technique possible...
http://expat.sourceforge.net/
Expat is an XML parser library written in C. It is a stream-oriented parser in which an application registers handlers for things the parser might find in the XML document (like start tags). An introductory article on using Expat is available on xml.com.

Which is the best counterpart to ANTLR to create parsers in ruby?

I've used antlr and javacc/freecc for a while.
Now I need to write a bunch of parsers using ANTLR grammars, but the parsers need to be written in Ruby.
I googled but found nothing. Is there any Ruby parser generator that takes ANTLR grammars and creates a parser? If there are several, which is the best one in your opinion?
TIA
Paolo
You might get away easy by using JRuby and keeping your ANTLR parsers in java.
If PEGs are enough for your job, treetop and the newer citrus are common tools used by rubyists.
Other parsers I dug up while researching for a project are: peggy, Kanocc, Racc.
For my project I chose treetop (citrus did not exist yet).
Why not use the ANTLR Ruby target: http://www.antlr.org/wiki/display/ANTLR3/Antlr3RubyTarget (http://split-s.blogspot.com/2005/12/antlr-for-ruby.html)
There is also some beta here: http://rubyforge.org/projects/antlr3/
You could also generate the parser with ANTLR for Java or C and call it from your Ruby program with JRuby or FFI.
This should also give you a performance boost which might be a big advantage if you have a lot of input to parse.

Ruby Parser

I want to know whether it is possible to parse the Ruby language using just a deterministic parser, with no backtracking at all.
Instead of actually having to write a parser, you can always leverage the existing interpreter to do what you want.
For example: ruby2ruby (http://seattlerb.rubyforge.org/ruby2ruby/)
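As a related illustration of reusing an existing parser, MRI also ships with the Ripper library, which exposes the interpreter's own parser and returns an S-expression. Ripper is a different tool from ruby2ruby, and the exact shape of the tree can vary between Ruby versions, so treat this as a sketch; the helper node_types is made up for the example.

```ruby
require 'ripper'

source = "total = 1 + 2"

# Ripper.sexp returns a nested Array describing the program,
# or nil when the source has a syntax error.
sexp = Ripper.sexp(source)

# Hypothetical helper: collect the node type found at the head of each sub-array.
def node_types(node, found = [])
  return found unless node.is_a?(Array)
  found << node.first if node.first.is_a?(Symbol)
  node.each { |child| node_types(child, found) }
  found
end

types = node_types(sexp).uniq
# types lists the kinds of nodes that appear in the tree
```

Walking the returned arrays like this is usually enough for analysis tasks, without writing a Ruby grammar yourself.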
I don't know any specific details about parsing Ruby, or why you insist on "no backtracking". My guess is that you believe the Ruby grammar isn't LALR(1), e.g., isn't processable by YACC or equivalents.
Regardless, if the problem is to parse a language whose grammar is context-free, one can do this using a GLR parser, which does not backtrack:
http://en.wikipedia.org/wiki/GLR_parser
I've used this to build production parsers for many real languages.

Resources