software or algorithms to detect grammar units in texts - algorithm

I'm not sure this is the right fit for stackoverflow but maybe you guys would suggest where to put this question otherwise but here it is anyway. Suppose I have a few sentences of a text like this:
John reads newspapers everyday. Right now he has just finished reading
one. He will read another one and might even read a small book
tomorrow.
This small extract contains the following grammar units:
present simple (reads)
present perfect (has finished)
future simple (will read)
modal verb may
Do you know of any software, algorithm or study that defines rules for identifying these grammar patterns?

Read this also if you are going to use Ruby than you can use TreeTop or find the equivalent parser in other programming language.

NTLK is a natural language parser for python, it works by tagging words. You can look at some examples here. It creates a parse-tree, which are very useful for these types of problems.
I haven't seen it distinguish between simple and perfect, but it could be modified to do so.

Related

Convert Processing sketch to ruby-processing sketch

Having discovered Processing, and in the middle of trying to learn ruby, I naturally installed ruby-processing for Processing 2.2 (not 3 which requires JRubyArt as the alternative to ruby-processing as I understand).
I am hoping to kill two birds with one stone and create Processing sketches whilst learning more ruby.
However it would be really useful to translate example Processing sketches into ruby so I can then play with them. At the moment I am doing it manually. Does anyone know of a script to do this?
Short answer: no.
This kind of translation is not trivial. Generally, you can't really translate the syntax of one language into the syntax of a different language. You don't go line-by-line and simply change the code one line at a time. That makes what you're asking pretty difficult, so you probably won't find many tools that do stuff like that.
Instead, to translate a program from one language to another, you have to think about the semantics, not the syntax. You have to ask yourself "what does this program do?" and then you simply write a program that does the same thing in the target language. It's not a one-to-one syntax mapping.
In fact, if you're trying to learn, that's a pretty great exercise. The examples that come with Processing are pretty small, so it should be relatively straightforward. Here is what I would do if I were you:
Take an existing example Processing sketch.
Write down in English (not pseudocode) what the program does. Be as specific as possible. Break things down into a list of small steps, and break those steps down into even smaller sub-steps whenever possible. You should be able to hand your list to somebody who has never seen the sketch, and they should be able to tell you in their own words exactly what the sketch does.
Take that list and treat that as your programming assignment, and implement it in your target language. If you get stuck on one of the specific steps, post an MCVE and we'll go from there.
So, translating code involves a middle step of translating it into English first, which is why tools like what you're asking for are much more difficult than you might first think.

QA Algorithm for Q Processing

What Algorithm/method do I use for a Question Answering System's Question Processing?
I have been searching possible algorithms for my Question Answering System, the only thing that I think that would be possible to use is Parsing but I have asked about parsing in my last question and with the answers there i think its not possible to be used?(I'm not sure).
My idea of using Parsing is by Cutting the question into pieces word per word and then it will go through a Storage of Words that would determine what Kind of Word(noun,adjective,verb,etc) is being said. My purpose of using Parsing is to remove or rather to determine the Topic of the question.
The other idea of mine is the ChatterBot. A Chatterbot uses a query of words? Correct me if I'm not mistaken and those words are assigned to another Word. It would randomly choose a word from its Query.
Example: User's Statement: Hello > ChatterBot's Possible Replies: Hi,Hello,Hey
I'm not quite sure what is the possible method/algorithm to use in a Question Answering, I have read the Wikipedia post : http://en.wikipedia.org/wiki/Question_answering but I do not quite understand what algorithm to use in Question Processing.
Thank you.
PS: I'm developing in Javascript. Q = Question
You could use a naive bayes classifier in order to look at the questions and determine their subject. You'd need a lot of training data and a fairly narrow domain.
The sophisticated responses to this problem involve a lot of machine inference techniques which are a bit out of my skill level to explain extremely well. My idea is to use a markov network in which each word has an edge to one or two words next to it. A series of tests are applied to each word which indicate likely memberhood of that word to one of its possible meanings (For example, Mark is more likely a name if it's capitalized, but if the next word is 'a' it probably is used in the sense of a verb.) From there the machine can attempt to determine the actual meaning of the sentence, which will rely on the use of, again, unimaginably large amounts of training data.
Coursera's Probabilistic Graphical Models class (Probably their NLP class too) would probably be the best resource if you're interested in becoming skilled in this area. (PGM is the only reason I know anything about this!)
here's a great book, you may need to read to get a lot of stuff related to NLP, and Question answering systems http://www.amazon.com/Speech-Language-Processing-2nd-Edition/dp/0131873210
the book has a full section (V.Applications) that will help you a lot to develop a good system.
but note that the book is discussing theories and algorithms only (no code)
it's not about parsing text only, you'll need to understand the context to provide better answer. actually you need to extract some keywords and ignore everything else.
also you may read in topics Keywords (Bag of words), algorithms like (TF/IDF).

How to implement AIML in Prolog?

AIML files: http://www.alicebot.org/aiml/aaa/
I want to make these AIML files the knowledge base of my Prolog program.
Help me. Thanks in advance.
P.S. Excuse my bad english.
http://pycdep.sourceforge.net contains something AIML-like implemented in prolog.
Maybe it can serve as a starting point.
You might want to consult (rent it from your local library, don't buy the whole book) the following book:
An Introduction to Language Processing with Perl and Prolog
Pierre M. Nugues (Autor)
Text Book
Before delving into chart parsers and the like, the book contains two sections that deal with eliza like template matching. The sections are:
9.2 Word Spotting and Template Matching
9.3 Multiword Detection
It has Prolog code snipets. The code snipets are not optimized for speed or large volumes, but they show the general idea of the algorithms.
The book is good in computer linguistics, but you might want to consult additional sources for Q&A logic etc..
Best Regards
P.S.: Currently working as well on a Prolog port of a Java/Prolog hybrid extraction pipeline
CAT

How to write an interpreter?

I have decided to write a small interpreter as my next project, in Ruby. What knowledge/skills will I need to have to be successful?
I haven't decided on the language to interpret yet, but I am looking for something that is not a toy language, but would be relatively easy to write an interpreter for.
Thanks in advance.
You will have to learn at least:
lexical analysis (grouping characters into tokens)
parsing (grouping tokens together into structure)
abstract syntax trees (representing program structure in a data structure)
data representation (assuming your language will have variables)
an evaluation loop that "runs" your program
An excellent introduction to some of these topics can be found in the introductory text Structure and Interpretation of Computer Programs. The language used in that book is Scheme, which is a robust, well-specified language that is ideally suited for your first interpreter implementation. Highly recommended.
I haven't decided on the language to interpret yet, but I am looking for
something that is not a toy language, but would be relatively easy to write an
interpreter for. Thanks in advance.
Try some dialect of Lisp like Scheme or Clojure. (Now there's an idea: Clojure-in-Ruby, which integrates with Ruby as well as Clojure does with Java.)
With Lisp, there is no need to bother with idiosyncracies of syntax, as Lisp's syntax is much closer to the abstract syntax tree.
This SICP chapter shows how to write a Lisp interpreter in Lisp (a metacircular evaluator). In my opinion this is the best place to start. Then you can move on to Lisp in Small Pieces to learn how to write advanced interpreters and compilers for Lisp. The advantage of implementing a language like Lisp (in Lisp itself!) is that you get the lexical analyzer, parser, AST, data/program representation and REPL for free. You can concentrate on the task of getting your great language working!
There is Tree top project wich can be helpful for you http://treetop.rubyforge.org/
You can checkout Ruby Draft Specification http://ruby-std.netlab.jp/
I had a similar idea a couple of days ago. LISP is by far the easiest to implement because the syntax is so simple, and the data structures that the language manipulates are the same structures that the code is written in. Hence you need only a minimal implementation, and can define the rest in terms of itself.
However, if you are trying to learn about parsing, you may want to do a more complex language with Abstract Syntax Trees, etc.
If you want to check out my (literally two days old) Java implementation of lisp, check out mylisp.googlecode.com. I'm still working on it but it is incredible how short a time it took to get the existing stuff working.
It's not sooo hard. here's a LISP interpreter in ruby and the source is so small you are supposed to copy/paste it. but are you gonna learn LISP now? hehe.
If you're just doing this for fun, make up your own, simple language and just try it. My recommendation would be something like a really simple classic BASIC (no visual basic or object oriented stuff). With line numbers, GOTO, INPUT and PRINT and that's it. You get to do the basics, and you get a better understanding of how things work.
The knowledge you'll need?
Tokenizing (turning that huge chunk of characters into something more efficiently readable, effectively splitting it up into 'words')
Parsing (going over the tokens and building a data structure from it)
Interpreting (looping over the data structure and executing each command)
And for that last one you'll also need a way to keep around variables. Usually you'd just implement a "stack", one huge block of data where you can mark off an area at the end.
It's not implemented in Lisp, but I found Write Yourself A Scheme in 48 Hours to be a very useful document while I was starting out with Haskell (though I didn't get anywhere near finishing it after 48 hours; YMMV). It also gives you a lot of insight into interpreters in general.
I can recommend this book. It discusses patterns for writing parsers and interpreters and more:
http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=language+implementation+patterns&x=0&y=0

How to write a linter? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
In my day job I, and others on my team write a lot of hardware models in Verilog-AMS, a language supported primarily by commercial vendors and a few opensource simulator projects.
One thing that would make supporting each others code more helpful would be a LINTER that would check our code for common problems and assist with enforcing a shared code formatting style.
I of course want to be able to add my own rules and, after I prove their utility to myself, promote them to the rest of the team..
I don't mind doing the work that has to be done, but of course also want to leverage the work of other existing projects.
Does having the allowed language syntax in a yacc or bison format give me a leg up?
or should I just suck each language statement into a perl string, and use pattern matching to find the things I don't like?
(most syntax and compilation errors are easily caught by the commercial tools.. but we have some of our own extentions.)
lex/flex and yacc/bison provide easy-to-use, well-understood lexer- and parser-generators, and I'd really recommend doing something like that as opposed to doing it procedurally in e.g. Perl. Regular expressions are powerful stuff for ripping apart strings with relatively-, but not totally-fixed structure. With any real programming language, the size of your state machine gets to be simply unmanageable with anything short of a Real Lexer/Parser (tm). Imagine dealing with all possible interleavings of keywords, identifiers, operators, extraneous parentheses, extraneous semicolons, and comments that are allowed in something like Verilog AMS, with regular expressions and procedural code alone.
There's no denying that there's a substantial learning curve there, but writing a grammar that you can use for flex and bison, and doing something useful on the syntax tree that comes out of bison, will be a much better use of your time than writing a ton of special-case string-processing code that's more naturally dealt with using a syntax-tree in the first place. Also, what you learn writing it this way will truly broaden your skillset in ways that writing a bunch of hacky Perl code just won't, so if you have the means, I highly recommend it ;-)
Also, if you're lazy, check out the Eclipse plugins that do syntax highlighting and basic refactoring for Verilog and VHDL. They're in an incredibly primitive state, last I checked, but they may have some of the code you're looking for, or at least a baseline piece of code to look at to better inform your approach in rolling your own.
I've written a couple verilog parsers and I would suggest PCCTS/ANTLR if your favorite programming language is C/C++/Java. There is a PCCTS/ANTLR Verilog grammar that you can start with. My favorite parser generator is Zebu which is based on Common Lisp.
Of course the big job is to specify all the linting rules. It makes sense to make some kind of language to specify the linting rules as well.
Don't underestimate the amount of work that goes into a linter. Parsing is the easy part because you have tools (bison, flex, ANTLR/PCCTS) to automate much of it.
But once you have a parse, then what? You must build a semantic tree for the design. Depending on how complicated your inputs are, you must elaborate the Verilog-AMS design (i.e. resolving parameters, unrolling generates, etc. If you use those features). And only then can you try to implement rules.
I'd seriously consider other possible solutions before writing a linter, unless the number of users and potential time savings thereby justify the development time.
In trying to find my answer, I found this on ANTLR - might be of use
If you use Java at all (and thus IDEA), the IDE's extensions for custom languages might be of use
yacc/bison definitely gives you a leg up, since good linting would require parsing the program. Regex (true regex, at least) might cover trivial cases, but it is easy to write code that the regexes don't match but are still bad style.
ANTLR looks to be an alternative path to the more common (OK I heard about them before) YACC/BISON approach, which it turns out also commonly use LEX/FLEX as a front end.
a Quick read of the FLEX man page kind of make me think It could be the framework for that regex type of idea..
Ok.. I'll let this stew a little longer, then see how quickly I can build a prototype parser in one or the other.
and a little bit longer

Resources