Is there an open-source framework for writing phrase structure rules? - ruby

I've worked with the Xerox toolchain so far, which is powerful, not open source, and a bit overkill for my current problem. Are there libraries that allow me to implement a phrase structure grammar? Preferably in Ruby or Lisp.

AFAIK, there's no open-source Lisp phrase structure parser available.
But since a parser is actually a black box, it's not so hard to make your application work with a parser written in any language, especially as they produce S-expressions as output. For example, with something like pfp you can just pipe your sentences as strings to it, then read and process the resulting trees. Or you can wrap a socket server around it and you'll get a distributed system :)
There's also cl-langutils, which may be helpful for some basic NLP tasks, like tokenization and, maybe, POS tagging. But overall it's much less mature and feature-rich than the commonly used packages, like Stanford's or OpenNLP.
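To illustrate the "read and process the resulting trees" step, here is a minimal Ruby sketch that reads a bracketed tree into nested arrays and pulls out the words. The sample tree and the helper names (tokenize_tree, read_tree, leaves) are made up for illustration; the exact output format of any particular parser is an assumption you would need to check.

```ruby
# Read a bracketed parse tree into nested Ruby arrays.
# The sample tree at the bottom is invented for this sketch.

# Split the string into "(", ")" and bare symbols.
def tokenize_tree(text)
  text.scan(/[()]|[^\s()]+/)
end

# Consume tokens recursively: "(" opens a new node, ")" closes it.
def read_tree(tokens)
  token = tokens.shift
  raise ArgumentError, "unexpected end of input" if token.nil?
  raise ArgumentError, "unexpected )" if token == ")"
  return token unless token == "("

  node = []
  node << read_tree(tokens) until tokens.first == ")" || tokens.empty?
  raise ArgumentError, "missing )" if tokens.empty?
  tokens.shift # drop the closing ")"
  node
end

# Collect the words (leaves) of a tree whose nodes look like [label, child, ...].
def leaves(node)
  return [node] if node.is_a?(String)

  node.drop(1).flat_map { |child| leaves(child) }
end

tree = read_tree(tokenize_tree("(S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))"))
p tree         # => ["S", ["NP", ["DT", "the"], ["NN", "cat"]], ["VP", ["VBZ", "sleeps"]]]
p leaves(tree) # => ["the", "cat", "sleeps"]
```

In a real setup the string would come from the external parser's standard output, for instance via Ruby's Open3.capture2, rather than from a literal.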

Related

Writing portable scheme code. Is anything "standard" beyond R5RS itself?

I'm learning Scheme and until now have been using guile. I'm really just learning as a way to teach myself a functional programming language, but I'd like to publish an open source project of some sort to reinforce the study. I'm not sure what yet... I'm a web developer, so probably something webby.
It's becoming apparent that publishing scheme code isn't very easy to do, with all these different implementations and no real standards beyond the core of the language itself (R5RS). For example, I'm almost certainly going to need to do basic IO on disk and over a TCP socket, along with string manipulation, such as scanning/regex, which seems not to be covered by R5RS, unless I'm not seeing it in the document. It seems like Scheme is more of a "concept" than a practical language... is this a fair assessment? Perhaps I should look to something like Haskell if I want to learn a functional programming language that lends itself more to use in open source projects?
In reality, how much pain do the differing scheme implementations pose when you want to publish an open source project? I don't really fancy having to maintain 5 different functions for basic things like string manipulation under various mainstream implementations (Chicken, guile, MIT, DrRacket). How many people actually write scheme for cross-implementation compatibility, as opposed to being tightly coupled with the library functions that only exist in their own scheme?
I have read http://www.ccs.neu.edu/home/dorai/scmxlate/scheme-boston/talk.html, which doesn't fill me with confidence ;)
EDIT | Let's re-define "standard" as "common".
I believe that in Scheme, portability is a fool's errand, since Scheme implementations are more different than they are similar, and there is no single implementation that other implementations try to emulate (unlike Python and Ruby, for example).
Thus, portability in Scheme is analogous to using software rendering for writing games "because it's in the common subset between OpenGL and DirectX". In other words, it's a lowest common denominator—it can be done, but you lose access to many features that the implementation offers.
For this reason, while SRFIs generally have a portable reference implementation (where practical), some of them are accompanied by notes that a quality Scheme implementation should tailor the library to use implementation-specific features in order to function optimally.
A prime example is case-lambda (SRFI 16); it can be implemented portably, and the reference implementation demonstrates it, but it's definitely less optimal compared to a built-in case-lambda, since you're having to implement function dispatch in "user" code.
Another example is stream-constant from SRFI 41. The reference implementation uses an O(n) simulation of circular lists for portability, but any decent implementation should adapt that function to use real circular lists so that it's O(1).†
The list goes on. Many useful things in Scheme are not portable—SRFIs help make more features portable, but there's no way that SRFIs can cover everything. If you want to get useful work done efficiently, chances are pretty good you will have to use non-portable features. The best you can do, I think, is to write a façade to encapsulate those features that aren't already covered by SRFIs.
† There is actually now a way to implement stream-constant in an O(1) fashion without using circular lists at all. Portable and fast for the win!
Difficult question.
Most people decide to be pragmatic. If portability between implementations is important, they write the bulk of the program in standard Scheme and isolate non-standard parts in (smallish) libraries. There have been various approaches to how exactly to do this. One recent effort is SnowFort.
http://snow.iro.umontreal.ca/
An older effort is SLIB.
http://people.csail.mit.edu/jaffer/SLIB
If you look for, or ask about, libraries for regular expressions and lexers/parsers, you'll quickly find some.
Since the philosophy of R5RS is to include only those language features that all implementors agree on, the standard is small - but also very stable.
However for "real world" programming R5RS might not be the best fit.
Therefore R6RS (and R7RS?) include more "real world" libraries.
That said, if you only need portability because it seems to be the Right Thing, then reconsider carefully whether you really want to put the effort in.
I would simply write my program on the implementation I know the best. Then if necessary port it afterwards. This often turns out to be easier than expected.
I write a blog that uses Scheme as its implementation language. Because I don't want to alienate users of any particular implementation of Scheme, I write in a restricted dialect of Scheme that is based on R5RS plus syntax-case macros plus my Standard Prelude. I don't find that overly restrictive for the kind of algorithmic programs that I write, but your needs may be different. If you look at the various exercises on the blog, you will see that I wrote my own regular-expression matcher, that I've done a fair amount of string manipulation, and that I've snatched files from the internet by shelling out to wget (I use Chez Scheme -- users have to provide their own non-portable shell mechanism if they use anything else); I've even done some limited graphics work by writing ANSI terminal sequences.
I'll disagree just a little bit with Jens. Instead of porting afterwards, I find it easier to build in portability from the beginning. I didn't use to think that way, but my experience over the last three years shows that it works.
It's worth pointing out that modern Scheme implementations are themselves fairly portable; you can often port whole programs to new environments simply by bringing the appropriate Scheme along. That doesn't help library programmers much, though, and that's where R7RS-small, the latest Scheme definition, comes in. It's not widely implemented yet, but it provides a larger common core than R5RS.

Ruby Text Analysis

Is there any Ruby gem or other tool for text analysis? Word frequency, pattern detection and so forth (preferably with an understanding of French).
The generalization of word frequencies is the language model: uni-grams (= single word frequencies), bi-grams (= frequencies of word pairs), tri-grams (= frequencies of word triples), ..., and in general n-grams.
You should look for an existing toolkit for language models; reinventing the wheel here is not a good idea.
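As a small illustration of what these counts look like, here is a sketch in plain Ruby. The tokenizer is a deliberately naive assumption (lowercase, then runs of letters), and the helper names are made up; Enumerable#tally needs Ruby 2.7 or later.

```ruby
# Count n-grams in a text.
# Assumption: a naive tokenizer that lowercases the text and keeps runs of
# Unicode letters (optionally joined by apostrophes, as in "l'homme").
def tokens(text)
  text.downcase.scan(/\p{L}+(?:'\p{L}+)*/)
end

# Slide a window of n words over the list and count each distinct window.
def ngram_counts(words, n)
  words.each_cons(n).tally
end

words    = tokens("Le chat dort. Le chat mange, le chien dort.")
unigrams = ngram_counts(words, 1)
bigrams  = ngram_counts(words, 2)

p unigrams[["le"]]        # => 3
p bigrams[["le", "chat"]] # => 2
```

Note that this sketch lets n-grams run across sentence boundaries; a real language-model toolkit usually inserts sentence start and end markers instead.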
There are a few standard toolkits available, e.g. from the CMU Sphinx team, and also HTK.
These toolkits are typically written in C (for speed, because you have to process huge corpora) and generate output in the standard ARPA n-gram file format, which is a plain-text format.
Check the following thread, which contains more details and links:
Building openears compatible language model
Once you have generated your language model with one of these toolkits, you will need either a Ruby gem that makes the language model accessible in Ruby, or you will need to convert the ARPA format into your own format.
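If you go the conversion route, reading the ARPA text format yourself is not much work. The following is a minimal sketch, not a complete reader: parse_arpa and the sample model are made up for illustration, the numbers are invented, and real files separate fields with tabs (splitting on any whitespace handles both).

```ruby
# Minimal reader for the textual ARPA n-gram format.
# Returns { order => { [w1, ..., wn] => { logprob:, backoff: } } }.
# Probabilities and back-off weights in ARPA files are base-10 logarithms.
def parse_arpa(text)
  model = Hash.new { |hash, order| hash[order] = {} }
  order = nil
  text.each_line do |line|
    line = line.strip
    case line
    when "", "\\data\\", /\Angram \d+=\d+\z/
      next # blank lines and the header counts are skipped
    when /\A\\(\d+)-grams:\z/
      order = Regexp.last_match(1).to_i # start of the section for this order
    when "\\end\\"
      break
    else
      next if order.nil? # ignore anything before the first section
      fields  = line.split
      logprob = Float(fields[0])
      words   = fields[1, order]
      backoff = fields[order + 1] && Float(fields[order + 1]) # optional field
      model[order][words] = { logprob: logprob, backoff: backoff }
    end
  end
  model
end

# Invented sample data, only to show the layout of the format.
sample = <<~'ARPA'
  \data\
  ngram 1=2
  ngram 2=1

  \1-grams:
  -0.30 le -0.25
  -0.60 chat

  \2-grams:
  -0.12 le chat

  \end\
ARPA

model = parse_arpa(sample)
p model[1][["le"]]         # => {:logprob=>-0.3, :backoff=>-0.25} (Ruby 3.4+ prints {logprob: -0.3, backoff: -0.25})
p model[2][["le", "chat"]] # back-off is nil for the highest order
```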
adi92's post lists some more Ruby NLP resources.
You can also Google for "ARPA Language Model" for more info
Last but not least, check Google's online N-gram tool. They built n-grams based on the books they digitized, and these are also available in French and other languages!
The Mendicant Bug: NLP Resources for Ruby
contains lots of useful Ruby NLP links.
I had tried using the Ruby Linguistics stuff a long time ago, and remember having a lot of problems with it... I don't recommend jumping into that.
If most of your text analysis involves stuff like counting ngrams and naive Bayes, I recommend just doing it on your own. Ruby has pretty good basic libraries and awesome support for regexes, so this should not be that tricky, and it will be easier for you to adapt stuff to the idiosyncrasies of the problem you are trying to solve.
As with the Stanford parser gem, it's possible to use Java libraries that solve your problem from within Ruby, but this can be tricky, so it is probably not the best way to solve a problem.
I wrote the gem words_counted for this reason. You can see a demo on rubywordcount.com. It has a lot of the analysis features you mention, and a host more. The API is well documented and can be found in the readme on Github.

Is there any scripting language that's fast, easy to embed, and well-suited for high-level game-programming?

First off, I'm aware that there are many questions related to this, but none of them seemed to help my specific situation. In particular, lua and python don't fit my needs as well as I could hope. It may be that no language with my requirements exists, but before coming to that conclusion it'd be nice to hear a few more opinions. :)
As you may have guessed, I need such a language for a game engine I'm trying to create. The purpose of this game engine is to provide a user with the basic tools for building a game, while still giving her the freedom of creating many different types of games.
For this reason, the scripting language should be able to handle game concepts intuitively. Among other things, it should be easy to define a variety of types, sub-type them with slightly different properties, query and modify objects dynamically, and so on.
Furthermore, it should be possible for the game developer to handle every situation they come across in the scripting language. While basic components like the renderer and networking would be implemented in C++, game-specific mechanisms such as rotating a few hundred objects around a planet will be handled in the scripting language. This means that the scripting language has to be insanely fast, 1/10 C speed is probably the minimum.
Then there's the problem of debugging. Information about the function, stack trace and variable states that the error occurred in should be accessible.
Last but not least, this is a project done by a single person. Even if I wanted to, I simply don't have the resources to spend weeks on just the glue code. Integrating the language with my project shouldn't be much harder than integrating lua.
Of the two commonly suggested languages, lua and python, lua is fast (luajit) and easy to integrate, but its standard debugging facilities seem to be lacking. What's even worse, lua by default has no type system at all. Of course you can implement that on your own, but the syntax will always be weird and unintuitive.
Python, on the other hand, is very comfortable to use and has a basic class system. However, it's not that easy to integrate, its paradigm doesn't really involve type-checking and it's definitely not fast enough for more complex games. I'd again like to point out that everything would be done in python. I'm well aware that python would likely be fast enough for 90% of the code.
There's also Scala, which I haven't seen suggested so far. Scala seems to actually fulfill most of the requirements, but embedding the Java VM with C doesn't seem very easy, and it generally seems like java expects you to build your application around java rather than the other way around. I'm also not sure if Scala's functional paradigm would be good for intuitive game-development.
EDIT: Please note that this question isn't about finding a solution at any cost. If there isn't any language better than lua, I will simply compromise and use that (I actually already have the thing linked into my program). I just want to make sure I'm not missing something that'd be more suitable before doing so, seeing as lua is far from the perfect solution for me.
You might consider mono. I only know of one success story for this approach, but it is a big one: C++ engine with mono scripting is the approach taken in Unity.
Try the Ring programming language
http://ring-lang.net
It's a general-purpose, multi-paradigm scripting language that can be embedded in C/C++ projects, extended using C/C++ code and/or used as a standalone language. The supported programming paradigms are Imperative, Procedural, Object-Oriented, Functional, Meta programming, Declarative programming using nested structures, and Natural programming.
The language is simple, tries to be natural, encourages organization and comes with a transparent implementation. It has a compact syntax and a group of features that enable the programmer to create natural interfaces and declarative domain-specific languages in a fraction of the time. It is very small and fast, and comes with a smart garbage collector that puts memory under the programmer's control. It also comes with useful and practical libraries. The language is designed for productivity and for developing high-quality solutions that can scale.
The compiler + The Virtual Machine are 15,000 lines of C code
Embedding Ring Interpreter in C/C++ Programs
https://en.wikibooks.org/wiki/Ring/Lessons/Embedding_Ring_Interpreter_in_C/C%2B%2B_Programs
For embeddability, you might look into Tcl, or if you're into Scheme, check out SIOD or Guile. I would suggest Lua or Python in general, of course, but your question precludes them.
Since no one seems to know a combination better than lua/luajit, I think I will leave it at that. Thanks for everyone's input on this. I personally find lua to be very lacking as a high-level language for game-programming, but it's probably the best choice out there. So to whoever finds this question and has the same requirements (fast, easy to use, easy to embed): you'll either have to use lua/luajit or make your own. :)

How to write an interpreter?

I have decided to write a small interpreter as my next project, in Ruby. What knowledge/skills will I need to have to be successful?
I haven't decided on the language to interpret yet, but I am looking for something that is not a toy language, but would be relatively easy to write an interpreter for.
Thanks in advance.
You will have to learn at least:
lexical analysis (grouping characters into tokens)
parsing (grouping tokens together into structure)
abstract syntax trees (representing program structure in a data structure)
data representation (assuming your language will have variables)
an evaluation loop that "runs" your program
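To make the steps in this list concrete, here is a toy sketch in Ruby for arithmetic expressions with variables. The grammar, the class name Parser and the array-based tree shape are all assumptions chosen for brevity, not a recommended design for a full language.

```ruby
# A toy pipeline: characters -> tokens -> AST -> value.
# Grammar (an assumption made for this sketch):
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | IDENT | "(" expr ")"

# Lexical analysis: group characters into tokens.
# Characters that match no pattern are silently skipped in this sketch.
def lex(source)
  source.scan(/\d+|[A-Za-z_]\w*|[-+*\/()]/)
end

# Parsing: group tokens into nested arrays such as [:+, 1, [:*, 2, 3]].
class Parser
  def initialize(tokens)
    @tokens = tokens
  end

  def parse
    tree = expr
    raise ArgumentError, "unexpected token #{@tokens.first}" unless @tokens.empty?
    tree
  end

  private

  # Parse a left-associative chain of operands joined by the given operators.
  def binary(operators)
    node = yield
    while operators.include?(@tokens.first)
      operator = @tokens.shift.to_sym
      node = [operator, node, yield]
    end
    node
  end

  def expr
    binary(%w[+ -]) { term }
  end

  def term
    binary(%w[* /]) { factor }
  end

  def factor
    token = @tokens.shift
    case token
    when /\A\d+\z/ then token.to_i
    when /\A[A-Za-z_]\w*\z/ then [:var, token]
    when "("
      node = expr
      raise ArgumentError, "expected )" unless @tokens.shift == ")"
      node
    else
      raise ArgumentError, "unexpected token #{token.inspect}"
    end
  end
end

# Evaluation: walk the tree; variables live in a plain Hash.
def evaluate(node, variables)
  return node if node.is_a?(Integer)

  operator, left, right = node
  return variables.fetch(left) if operator == :var

  evaluate(left, variables).public_send(operator, evaluate(right, variables))
end

ast = Parser.new(lex("2 * (x + 3) - 4")).parse
p ast                         # => [:-, [:*, 2, [:+, [:var, "x"], 3]], 4]
value = evaluate(ast, "x" => 5)
p value                       # => 12
```

A real interpreter would add statements, error positions and a proper environment, but the division of labour between the three stages stays the same.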
An excellent introduction to some of these topics can be found in the introductory text Structure and Interpretation of Computer Programs. The language used in that book is Scheme, which is a robust, well-specified language that is ideally suited for your first interpreter implementation. Highly recommended.
Try some dialect of Lisp like Scheme or Clojure. (Now there's an idea: Clojure-in-Ruby, which integrates with Ruby as well as Clojure does with Java.)
With Lisp, there is no need to bother with idiosyncrasies of syntax, as Lisp's syntax is much closer to the abstract syntax tree.
This SICP chapter shows how to write a Lisp interpreter in Lisp (a metacircular evaluator). In my opinion this is the best place to start. Then you can move on to Lisp in Small Pieces to learn how to write advanced interpreters and compilers for Lisp. The advantage of implementing a language like Lisp (in Lisp itself!) is that you get the lexical analyzer, parser, AST, data/program representation and REPL for free. You can concentrate on the task of getting your great language working!
The Treetop project may be helpful for you: http://treetop.rubyforge.org/
You can also check out the Ruby Draft Specification: http://ruby-std.netlab.jp/
I had a similar idea a couple of days ago. LISP is by far the easiest to implement because the syntax is so simple, and the data structures that the language manipulates are the same structures that the code is written in. Hence you need only a minimal implementation, and can define the rest in terms of itself.
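To show how little is needed when code and data share one structure, here is a sketch of a tiny Lisp evaluator in Ruby. Programs are written directly as Ruby arrays and symbols, so the reader is left out; the set of special forms, the Env class and the lisp_eval name are choices made for this illustration only.

```ruby
# Programs are plain Ruby data: symbols, integers and nested arrays.
# A reader that turns "(+ 1 2)" into [:+, 1, 2] is left out of this sketch.

# A chain of scopes mapping symbols to values.
class Env
  def initialize(bindings = {}, parent = nil)
    @bindings = bindings
    @parent = parent
  end

  def lookup(name)
    return @bindings[name] if @bindings.key?(name)
    raise NameError, "unbound symbol #{name}" if @parent.nil?

    @parent.lookup(name)
  end

  def define(name, value)
    @bindings[name] = value
  end
end

def lisp_eval(expr, env)
  case expr
  when Symbol then env.lookup(expr)
  when Array
    head, *rest = expr
    case head
    when :quote then rest[0]
    when :if
      test, consequent, alternative = rest
      lisp_eval(lisp_eval(test, env) ? consequent : alternative, env)
    when :define then env.define(rest[0], lisp_eval(rest[1], env))
    when :lambda
      params, body = rest
      # Each call gets a fresh scope whose parent is the defining scope.
      ->(*args) { lisp_eval(body, Env.new(params.zip(args).to_h, env)) }
    else
      function = lisp_eval(head, env)
      function.call(*rest.map { |argument| lisp_eval(argument, env) })
    end
  else
    expr # numbers, strings, true and false evaluate to themselves
  end
end

# Primitives are ordinary Ruby lambdas.
global = Env.new({
  :+ => ->(a, b) { a + b },
  :- => ->(a, b) { a - b },
  :* => ->(a, b) { a * b },
  :< => ->(a, b) { a < b }
})

# (define fact (lambda (n) (if (< n 2) 1 (* n (fact (- n 1))))))
lisp_eval([:define, :fact,
           [:lambda, [:n],
            [:if, [:<, :n, 2], 1, [:*, :n, [:fact, [:-, :n, 1]]]]]], global)

p lisp_eval([:fact, 5], global) # => 120
```

Everything beyond these four special forms and a handful of primitives can then be defined in the interpreted language itself.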
However, if you are trying to learn about parsing, you may want to do a more complex language with Abstract Syntax Trees, etc.
If you want to check out my (literally two days old) Java implementation of lisp, check out mylisp.googlecode.com. I'm still working on it but it is incredible how short a time it took to get the existing stuff working.
It's not sooo hard. Here's a LISP interpreter in Ruby, and the source is so small you are supposed to copy/paste it. But are you gonna learn LISP now? Hehe.
If you're just doing this for fun, make up your own, simple language and just try it. My recommendation would be something like a really simple classic BASIC (no visual basic or object oriented stuff). With line numbers, GOTO, INPUT and PRINT and that's it. You get to do the basics, and you get a better understanding of how things work.
The knowledge you'll need?
Tokenizing (turning that huge chunk of characters into something more efficiently readable, effectively splitting it up into 'words')
Parsing (going over the tokens and building a data structure from it)
Interpreting (looping over the data structure and executing each command)
And for that last one you'll also need a way to keep around variables. Usually you'd just implement a "stack", one huge block of data where you can mark off an area at the end.
It's not implemented in Lisp, but I found Write Yourself A Scheme in 48 Hours to be a very useful document while I was starting out with Haskell (though I didn't get anywhere near finishing it after 48 hours; YMMV). It also gives you a lot of insight into interpreters in general.
I can recommend the book Language Implementation Patterns. It discusses patterns for writing parsers and interpreters and more:
http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=language+implementation+patterns&x=0&y=0

Dynamic languages - which one should I choose?

Dynamic languages are on the rise and there are plenty of them: e.g. Ruby, Groovy, Jython, Scala (static, but has the look and feel of a dynamic language) etc etc.
My background is in Java SE and EE programming and I want to extend my knowledge into one of these dynamic languages to be better prepared for the future.
But which dynamic language should I focus on learning and why? Which of these will be the preferred language in the near future?
Learning Ruby or Python (and Scala to a lesser extent) means you'll have very transferable skills - you could use the Java version, the native version or the .NET version (IronRuby/IronPython). Groovy is nice but JVM-specific.
Being "better prepared for the future" is tricky unless you envisage specific scenarios. What kind of thing do you want to work on? Do you have a project which you could usefully implement in a dynamic language? Is it small enough to try on a couple of them, to get a feeling of how they differ?
Scala is not a dynamic language at all. Type inference doesn't mean that it's untyped. However, it's a very nice language with a nice mixture of OOP and functional programming. The only problem is some gotchas that you encounter along the way.
Since you are already an experienced Java programmer, it will fit nicely into your skillset. Now, if you want to go all the way dynamic, both Ruby and Python are awesome languages. There is demand for both of them.
I would personally recommend Clojure. Clojure is an awesome new language that is growing in popularity faster than anything I've ever seen. Clojure is a powerful, simple, and fast Lisp implemented on the JVM. It has access to all Java libraries of course, just like Scala. It has a book written about it already, it's matured to version 1.0, and it has three IDE plugins in development, with all three very usable.
I would take a look at Scala. Why?
it's a JVM language, so you can leverage off your current Java skills
it now has a lot of tooling/IDE support (e.g. Intellij will handle Scala projects)
it has a functional aspect to it. Functional languages seem to be getting a lot of traction at the moment, and I think it's a paradigm worth learning for the future
My (entirely subjective) view is that Scala seems to be getting a lot of the attention that Groovy got a year or two ago. I'm not trying to be contentious here, or suggest that makes it a better language, but it seems to be the new JVM language du jour.
As an aside, a language that has some dynamic attributes is Microsoft's F#. I'm currently looking at this (and ignoring my own advice re. points 1 and 2 above!). It's a functional language with objects, built on .Net, and is picking up a lot of attention at the moment.
In the game industry, go with Lua. If you're an Adobe-based designer, Lua is also good, and if you're an embedded programmer, Lua is practically the only lightweight solution. But if you are looking into web development and general tool scripting, Python would be more practical.
I found Groovy to be a relatively easy jump from an extensive Java background -- it's sort of a more convenient version of Java. It integrates really nicely with existing Java code as well, if you need to do that sort of thing.
I'd recommend Python. It has a huge community and has a mature implementation (along with several promising not-so-mature-just-yet ones). Perl is, as far as I've seen, losing a lot of traction compared to the newer languages, presumably due to its "non-intuitiveness" (no, don't get me started on that).
When you've done a project or two in Python, go on to something else to get some broader perspective. If you've done a few non-trivial things in two different dynamic languages, you won't have any problems assimilating any other language.
JScript is quite useful, and it's certainly a dynamic language...
If you want a language with a good number of modules (for almost anything!), go for Perl. With its CPAN, you will always find what you want without reinventing the wheel.
Well, keeping your background in mind, I would recommend a language whose semantics are similar to what you already know. Hence a language like Scala, Fan or Groovy would be a good starting point. Once you get the hang of the basic semantics of using a functional language (and start loving it), you can move on to a language like Ruby. This way your turnaround time is reduced, and you move towards being a polyglot programmer.
I would vote +1 for Groovy (and Grails). You can write in Java style or Groovy style (you can also mix both without worrying about it). You can also use Java libs.
As a general rule, avoid dynamically typed languages. Compile-time checking and the self-documenting nature of strong, static typing are well worth the cost of putting type information into your source code. If the extra typing you need to do when writing your code is too great an effort, then a language with type inference (Scala, Haskell) might be of interest.
Having type information makes code much more readable, and readability should be your #1 criterion in coding. It is expensive for a person to read code, so anything that inhibits clear, accurate understanding by the reader is a bad thing. In OO languages it is even worse, because you are always making new types. A reader just getting familiar with the code will flounder because they do not know the types that are being passed around and modified. In Groovy, for example, the following is legal: def accountHistoryReport(in, out). Reading that, I have no idea what in and out are. When you are looking at 20 different report methods that look just like that, you can quickly go completely homicidal.
If you really think you have to have non-static typing, then a language like Clojure is a good compromise. Lisp-like languages are built on a small set of key abstractions, with a massive amount of capability on each of the abstractions. So in Clojure, I will create a map (hash) that has the attributes of my object. It is a bit reductionist, but I will not have to look through the whole code base for the implementation of some unnamed class.
My rule of thumb is that I write scripts in dynamic languages, and systems in compiled, statically typed languages.

Resources