Could anyone teach me how to develop an AST(Abstract Syntax Tree) for XPath?
Totally confused.
Thanks!
Using ANTLR is a fine way to get started doing this.
BUT: build something simple, like an expression parser, first. This will make sure you understand the basics. You'll likely find dozens of examples of this for ANTLR, I suspect even here on stack overflow.
After you get an expression grammar and an AST for it, then consider XPath. You'll discover this is a LOT more complicated, mostly because the definition of XPath was created by one committee, building on the giant XML construction of another committee. So your problem will be a little bit knowing how to build ASTs with parsers, and a LOT with reading all the gook that those two committees wrote to define XPath.
Related
I am working on a research project involving makefiles. Basically I am trying to construct a language that will explain makefiles in an easy to understand fashion.
Before I can start constructing this language I need to gather examples of makefiles that would be good to illustrate the use of my language. As the title suggests I am looking for makefiles that at first glace might appear to be simple, but after diving deeper into them they are actually confusing, and complex.
An ideal example in this case would be one that makes use of composition; ie using many parts of the makefile to compose the whole. Any examples of this would be much appreciated.
Is there any API for WP7 to get an arithmetic expression in a string and eval it?
Searched for it on google, but nothing comes up..
Thanks in advance.
Searching for "C# parse arithmetic from string" I found a good bunch of resources, with this one the most appealing to me. I haven't tried, but I suppose it will compile on WP7.
Anyways, there's always the possibility of building your own parser. You can approach this using expression trees: read about this in this article (scroll down to the section Expression Trees) (in Java, but explains the concept pretty well and the language is similar enough to C# to be comprehensible).
You can also use the Shunting Yard algorithm, which uses a stack to evaluate the expression. These are only two approachs I remember now, I'm sure there are a lot more. It's harder than using third party libraries, but you can adapt them to your own requeriments.
I have decided to write a small interpreter as my next project, in Ruby. What knowledge/skills will I need to have to be successful?
I haven't decided on the language to interpret yet, but I am looking for something that is not a toy language, but would be relatively easy to write an interpreter for.
Thanks in advance.
You will have to learn at least:
lexical analysis (grouping characters into tokens)
parsing (grouping tokens together into structure)
abstract syntax trees (representing program structure in a data structure)
data representation (assuming your language will have variables)
an evaluation loop that "runs" your program
An excellent introduction to some of these topics can be found in the introductory text Structure and Interpretation of Computer Programs. The language used in that book is Scheme, which is a robust, well-specified language that is ideally suited for your first interpreter implementation. Highly recommended.
I haven't decided on the language to interpret yet, but I am looking for
something that is not a toy language, but would be relatively easy to write an
interpreter for. Thanks in advance.
Try some dialect of Lisp like Scheme or Clojure. (Now there's an idea: Clojure-in-Ruby, which integrates with Ruby as well as Clojure does with Java.)
With Lisp, there is no need to bother with idiosyncracies of syntax, as Lisp's syntax is much closer to the abstract syntax tree.
This SICP chapter shows how to write a Lisp interpreter in Lisp (a metacircular evaluator). In my opinion this is the best place to start. Then you can move on to Lisp in Small Pieces to learn how to write advanced interpreters and compilers for Lisp. The advantage of implementing a language like Lisp (in Lisp itself!) is that you get the lexical analyzer, parser, AST, data/program representation and REPL for free. You can concentrate on the task of getting your great language working!
There is Tree top project wich can be helpful for you http://treetop.rubyforge.org/
You can checkout Ruby Draft Specification http://ruby-std.netlab.jp/
I had a similar idea a couple of days ago. LISP is by far the easiest to implement because the syntax is so simple, and the data structures that the language manipulates are the same structures that the code is written in. Hence you need only a minimal implementation, and can define the rest in terms of itself.
However, if you are trying to learn about parsing, you may want to do a more complex language with Abstract Syntax Trees, etc.
If you want to check out my (literally two days old) Java implementation of lisp, check out mylisp.googlecode.com. I'm still working on it but it is incredible how short a time it took to get the existing stuff working.
It's not sooo hard. here's a LISP interpreter in ruby and the source is so small you are supposed to copy/paste it. but are you gonna learn LISP now? hehe.
If you're just doing this for fun, make up your own, simple language and just try it. My recommendation would be something like a really simple classic BASIC (no visual basic or object oriented stuff). With line numbers, GOTO, INPUT and PRINT and that's it. You get to do the basics, and you get a better understanding of how things work.
The knowledge you'll need?
Tokenizing (turning that huge chunk of characters into something more efficiently readable, effectively splitting it up into 'words')
Parsing (going over the tokens and building a data structure from it)
Interpreting (looping over the data structure and executing each command)
And for that last one you'll also need a way to keep around variables. Usually you'd just implement a "stack", one huge block of data where you can mark off an area at the end.
It's not implemented in Lisp, but I found Write Yourself A Scheme in 48 Hours to be a very useful document while I was starting out with Haskell (though I didn't get anywhere near finishing it after 48 hours; YMMV). It also gives you a lot of insight into interpreters in general.
I can recommend this book. It discusses patterns for writing parsers and interpreters and more:
http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=language+implementation+patterns&x=0&y=0
If I wanted to learn about pattern recognition in general what would be a good place to start (recommend a book)?
Also, does anybody have any experience/knowledge on how to go about applying these algorithms to find abstraction patterns in programs? (repeated code, chunks of code that do the same thing, but in slightly different ways, etc.)
Thanks
Edit: I don't mind mathematically intensive books. In fact, that would be a good thing.
If you are reasonably mathematically confident then either of Chris Bishop's books "Pattern Recognition and Machine Learning" or "Neural Networks for Pattern Recognition" are very good for learning about pattern recognition.
It helps if you have access to the parse tree generated during compilation. This way you can look for pieces of the tree which are similar, ignoring the nodes which are deeper than what you are looking at, this way you can pick out e.g. nodes which multiply together two sub-expressions, ignoring the contents of the sub-expressions. You can apply the same logic to a collection of nodes, e.g. you want to find a multiplication of two sub-expressions where those two sub-expressions are additions of more sub-expressions. You first look for multiplies, then check if the two nodes underneath the multiply are additions, ignoring anything any deeper.
I'd suggest looking at the code of some open source project (e.g. FindBugs or SIM)
that does the kind of thing you're talking about.
If you're working in one of the supported languages, IntelliJ idea has a really smart structural search and replace that would fit your problem.
Other interesting projects are PMD and Eclipse.
Eclipse uses AST (abstract syntax trees) for all source code in any project. Tools can then register for certain types of ASTs (like Java source) and get a preprocessed view where they can add additional information (like links to documentation, error markers, etc).
Another project you can look into is Duplo - it's an open-source/GPL project, so you can pore over their approach by grabbing the code from SourceForge.
This is specific to .Net and visual studio, but it finds duplicate code in your project. It does report some false positives I've found but it could be a good place to start.
Clone Detective
One kind of pattern is code that has been cloned by copy and paste methods. See CloneDR for a tool that automatically finds such code in spite of variations in layout and even changes in the body of the clone, by comparing abstract syntax trees for the language in question.
CloneDR works with a variety of langauges: C, C++, C#, Java, JavaScript, PHP, COBOL, Python, ... The website shows clone detection reports for a variety of programming languages.
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
In my day job I, and others on my team write a lot of hardware models in Verilog-AMS, a language supported primarily by commercial vendors and a few opensource simulator projects.
One thing that would make supporting each others code more helpful would be a LINTER that would check our code for common problems and assist with enforcing a shared code formatting style.
I of course want to be able to add my own rules and, after I prove their utility to myself, promote them to the rest of the team..
I don't mind doing the work that has to be done, but of course also want to leverage the work of other existing projects.
Does having the allowed language syntax in a yacc or bison format give me a leg up?
or should I just suck each language statement into a perl string, and use pattern matching to find the things I don't like?
(most syntax and compilation errors are easily caught by the commercial tools.. but we have some of our own extentions.)
lex/flex and yacc/bison provide easy-to-use, well-understood lexer- and parser-generators, and I'd really recommend doing something like that as opposed to doing it procedurally in e.g. Perl. Regular expressions are powerful stuff for ripping apart strings with relatively-, but not totally-fixed structure. With any real programming language, the size of your state machine gets to be simply unmanageable with anything short of a Real Lexer/Parser (tm). Imagine dealing with all possible interleavings of keywords, identifiers, operators, extraneous parentheses, extraneous semicolons, and comments that are allowed in something like Verilog AMS, with regular expressions and procedural code alone.
There's no denying that there's a substantial learning curve there, but writing a grammar that you can use for flex and bison, and doing something useful on the syntax tree that comes out of bison, will be a much better use of your time than writing a ton of special-case string-processing code that's more naturally dealt with using a syntax-tree in the first place. Also, what you learn writing it this way will truly broaden your skillset in ways that writing a bunch of hacky Perl code just won't, so if you have the means, I highly recommend it ;-)
Also, if you're lazy, check out the Eclipse plugins that do syntax highlighting and basic refactoring for Verilog and VHDL. They're in an incredibly primitive state, last I checked, but they may have some of the code you're looking for, or at least a baseline piece of code to look at to better inform your approach in rolling your own.
I've written a couple verilog parsers and I would suggest PCCTS/ANTLR if your favorite programming language is C/C++/Java. There is a PCCTS/ANTLR Verilog grammar that you can start with. My favorite parser generator is Zebu which is based on Common Lisp.
Of course the big job is to specify all the linting rules. It makes sense to make some kind of language to specify the linting rules as well.
Don't underestimate the amount of work that goes into a linter. Parsing is the easy part because you have tools (bison, flex, ANTLR/PCCTS) to automate much of it.
But once you have a parse, then what? You must build a semantic tree for the design. Depending on how complicated your inputs are, you must elaborate the Verilog-AMS design (i.e. resolving parameters, unrolling generates, etc. If you use those features). And only then can you try to implement rules.
I'd seriously consider other possible solutions before writing a linter, unless the number of users and potential time savings thereby justify the development time.
In trying to find my answer, I found this on ANTLR - might be of use
If you use Java at all (and thus IDEA), the IDE's extensions for custom languages might be of use
yacc/bison definitely gives you a leg up, since good linting would require parsing the program. Regex (true regex, at least) might cover trivial cases, but it is easy to write code that the regexes don't match but are still bad style.
ANTLR looks to be an alternative path to the more common (OK I heard about them before) YACC/BISON approach, which it turns out also commonly use LEX/FLEX as a front end.
a Quick read of the FLEX man page kind of make me think It could be the framework for that regex type of idea..
Ok.. I'll let this stew a little longer, then see how quickly I can build a prototype parser in one or the other.
and a little bit longer