Stanford NLP: How to draw a graphical tree

I have a tree and I want to draw it in graphical form. After that, I want to extend the drawing so I can add and delete nodes and edit the POS tags of the tree in the graphic. Can you give me some ideas on how to start? (Sorry for my bad English.)
example tree:
(ROOT
(S (NP (NNP John))
(VP (VBZ loves)
(NP (NNP Mary)))
(. .)))

You can take a look at PennTreeReader in CoreNLP for code to read the string form of a tree into a Tree object. Beyond that, the design of the visualization is up to you. A good place to start might be D3; there's even a CoreNLP Parse Tree demo.
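For a concrete starting point, here is a rough, untested sketch (assuming CoreNLP is on your classpath) that reads your example tree into a Tree object; Tree.valueOf(String) is a one-call shortcut over the same machinery:

import java.io.StringReader;
import edu.stanford.nlp.trees.PennTreeReader;
import edu.stanford.nlp.trees.Tree;

public class ReadTree {
    public static void main(String[] args) throws Exception {
        String s = "(ROOT (S (NP (NNP John)) (VP (VBZ loves) (NP (NNP Mary))) (. .)))";

        // Read the bracketed string into a Tree object.
        PennTreeReader reader = new PennTreeReader(new StringReader(s));
        Tree tree = reader.readTree();
        reader.close();

        // From here you can walk tree.children() recursively to lay out
        // nodes and edges for your own add/delete/edit GUI.
        tree.pennPrint();
    }
}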

Related

What does the NML tag convey?

Basic query:
Stanford parser version 4.0.0 uses the NML tag. I think it is a useful feature, but I do not fully understand it, so I would appreciate more information about it, e.g. its full form and the motivation for introducing it. Why does it treat "income tax proposal" and "fish tank water" differently? Has the parser learned the use of the NML tag correctly?
The following is optional; please read it only if you think I am making up a fictitious tag!
The following information is just to establish that this is a serious enquiry. My previous query about the NML tag was rejected because my guess at the meaning of the NML tag misled me and I somehow gave a wrong example; I am sorry for that.
Please see:
https://nlp.stanford.edu/nlp/javadoc/javanlp/index.html?edu/stanford/nlp/trees/ModCollinsHeadFinder.html
Under the heading Changes:
QUOTE
Added NML head rules, which are the same as for NP.
NP head rule: NP and NML are treated almost identically (NP has precedence)
NAC head rule: NML comes after NN/NNS but before NNP/NNPS
UNQUOTE
I am getting NML tags in several sentences while running Stanford parser version 4.0.0.
Here is just one example:
Parsing [sent. 1 len. 7]: The income tax proposal was rejected .
(ROOT
(S
(NP (DT The)
(NML (NN income) (NN tax))
(NN proposal))
(VP (VBD was)
(VP (VBN rejected)))
(. .)))
The NML label should be for a noun phrase that is modifying another word or phrase. So a good example would be "income tax proposal": "income tax" is an NML since it is serving as an adjective of "proposal"; it is describing the type of proposal.
Syntactically, "income tax proposal" and "marriage proposal" share the same high-level structure, a noun phrase describing another noun, so the point of NML is to mark that the phrase "income tax" is a complete unit and that it is modifying the word "proposal" to generate the final NP "income tax proposal".
If the actual statistical parser is inconsistent, as in the case of "fish tank water", that is more likely an error in the model itself, which is just something you have to accept: statistical parsers make lots of errors all the time.
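If you want to audit where the model puts NML nodes across your sentences, a small untested sketch (again assuming CoreNLP on the classpath) that walks a parse and prints every NML constituent:

import edu.stanford.nlp.trees.Tree;

public class FindNml {
    public static void main(String[] args) {
        Tree tree = Tree.valueOf(
            "(ROOT (S (NP (DT The) (NML (NN income) (NN tax)) (NN proposal))"
            + " (VP (VBD was) (VP (VBN rejected))) (. .)))");
        printNml(tree);
    }

    // Recursively visit every node; print subtrees labeled NML.
    static void printNml(Tree node) {
        if ("NML".equals(node.value())) {
            System.out.println(node);  // prints (NML (NN income) (NN tax))
        }
        for (Tree child : node.children()) {
            printNml(child);
        }
    }
}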

Algorithm for evaluating expression

I am writing a C++ program that must parse and then evaluate an expression held in a string like this:
((C<3) || (D>5)) and (B)
or something like
((A+4) > (B-2) || C) && ^D
The expression will always evaluate to true or false. I read about the shunting-yard algorithm, but order of operations isn't that important to me (I can just mandate left-to-right evaluation).
I'm thinking about building a tree to hold the components of the formula and then evaluating the tree recursively from the bottom left up. Each child of a node would be ANDed, and each node would be a test. If I reach the topmost node (while the current state is true), the expression must evaluate to true. This is a rough start... looking for advice.
Is there an algorithm design pattern on how to do this? (Seems like this problem has been solved many times before)
I recommend putting the time and effort into learning proper lexing and parsing tools that are designed for this: Flex for lexical analysis (getting individual tokens: variables, operators, parentheses, etc.) and then Bison for syntax analysis (building the syntax tree from the tokens).
Once you have the syntax tree, evaluation is easy from the bottom up, as you said.
I'm not sure how much you know about formal grammars, but you can always find good tutorials online; perhaps start here: How do I use C++ in flex and bison?
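If you'd rather not pull in Flex and Bison for something this small, a hand-written recursive-descent parser is the classic alternative. Below is a rough sketch (in Java rather than C++, and deliberately simplified: ||, &&, and ! instead of "and" and ^, single-letter variables looked up in a map, and integer literals); a real implementation would want proper tokenization and error handling:

import java.util.Map;

// Rough sketch of a recursive-descent evaluator for boolean expressions
// over integer variables, e.g. "((C<3)||(D>5))&&B". Grammar (whitespace-free):
//   expr   := term ('||' term)*
//   term   := factor ('&&' factor)*
//   factor := '!' factor | '(' expr ')' | comparison | variable
public class BoolEval {
    private final String src;
    private final Map<Character, Integer> vars;
    private int pos = 0;

    BoolEval(String src, Map<Character, Integer> vars) {
        this.src = src.replaceAll("\\s+", "");  // strip whitespace up front
        this.vars = vars;
    }

    boolean evaluate() { return expr(); }

    private boolean expr() {            // term ('||' term)*
        boolean v = term();
        while (match("||")) { boolean r = term(); v = v || r; }
        return v;
    }

    private boolean term() {            // factor ('&&' factor)*
        boolean v = factor();
        while (match("&&")) { boolean r = factor(); v = v && r; }
        return v;
    }

    private boolean factor() {
        if (match("!")) return !factor();
        if (match("(")) {
            boolean v = expr();
            expect(")");
            return v;
        }
        // a comparison, or a bare variable treated as a boolean
        int lhs = vars.get(src.charAt(pos++));
        if (match("<")) return lhs < operand();
        if (match(">")) return lhs > operand();
        return lhs != 0;                // bare variable: nonzero means true
    }

    private int operand() {             // a variable or an integer literal
        char c = src.charAt(pos);
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return Integer.parseInt(src.substring(start, pos));
        }
        pos++;
        return vars.get(c);
    }

    private boolean match(String tok) {
        if (src.startsWith(tok, pos)) { pos += tok.length(); return true; }
        return false;
    }

    private void expect(String tok) {
        if (!match(tok)) throw new IllegalStateException("expected " + tok + " at " + pos);
    }

    public static void main(String[] args) {
        Map<Character, Integer> vars = Map.of('B', 1, 'C', 2, 'D', 9);
        System.out.println(new BoolEval("((C<3)||(D>5))&&B", vars).evaluate()); // true
    }
}

The grammar is encoded directly in the mutually recursive methods, so the precedence of && over || falls out of the call structure rather than needing a shunting-yard pass.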

Parser output divergence for almost identical sentences -- why?

Why do I get such different parse trees when I run these two almost identical sentences through TreeAnnotation? The first one correctly returns an "(SQ (VBZ Does) ...)" pattern, but the second one shows "(S (NP (NNP Does) ...". Is this an error? Thanks.
Does he have time?
(ROOT (SQ (VBZ Does) (NP (PRP he)) (VP (VB have) (NP (NN time))) (. ?)))
Does John have time?
(ROOT (S (NP (NNP Does) (NNP John)) (VP (VBP have) (NP (NN time))) (. ?)))
Cute -- this looks like a POS tagging bug. In the first case, "Does" is correctly tagged as VBZ; in the second, it is incorrectly tagged as a proper noun (NNP). Likely, this is the sequence model in the POS tagger messing up: Since both "Does" and "John" are capitalized, it prefers tagging them both as proper nouns.
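To see the tagging difference directly, here is a short, untested sketch (assuming a recent CoreNLP release plus its models jar; CoreDocument is the newer convenience wrapper) that runs just the tagger on both sentences and prints the chosen tags:

import java.util.Properties;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class TagCompare {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        for (String text : new String[] {"Does he have time?", "Does John have time?"}) {
            CoreDocument doc = new CoreDocument(text);
            pipeline.annotate(doc);
            // Print each token with the tag the sequence model chose.
            for (CoreLabel tok : doc.tokens()) {
                System.out.print(tok.word() + "/" + tok.tag() + " ");
            }
            System.out.println();
        }
    }
}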

Algorithm to get the topic / focus of a sentence out of its words

Are there any well-known or successful algorithms for obtaining the topic and/or focus of a sentence (question) out of the words in the sentence?
If not, how would I go about getting the topic/focus of the question? It seems that the topic/focus of a question is usually a noun or a noun phrase.
So the first thing I would do is determine the nouns by part-of-speech tagging the question. But then how do I know whether I should get just the nouns, or the noun(s) and an adjective before them, or the noun and the adverb before it, or the noun(s) and verb?
For example:
In ' did the quick brown fox jump over the lazy dog ', get ' quick brown fox ', ' jump ', and ' lazy dog '.
In ' what is the population of japan ', get ' population ' and ' japan '
In ' what color is milk ' get ' color ' and ' milk '
In ' What is the height of Mt. Everest ' get ' Mt. Everest ' and ' height '.
While writing these out, I guess the easiest first step is removing stop words.
I think first of all that the problem is language-dependent.
Secondly, I think that if you have a set of words, you could run a check on their popularity/frequency in the language; e.g. the word "the" occurs much more often than the word "euphoric", so "euphoric" has a better chance of being a proper keyword.
Spelling, however, is crucial here. How do you deal with that? One idea is to apply distance algorithms such as Levenshtein to words that do not occur often (or do a Google search for the word and check whether you get results or a "did you mean" notification); see the sketch after this answer.
Some languages, though, are more structured than others. In English, to find nouns, you can first check for "a/an word" patterns and then for words that end in "s" to find possible noun candidates. Then make a comparison with a dictionary.
With adjectives you can perhaps assume that a possible adjective will be located right before the noun. Then just compare the possible adjective with the dictionary.
Then you could of course keep a black-list of words that are never allowed as keywords.
The best solution would perhaps be to have a self-learning neural system, but I'm not familiar enough with those to give any suggestions.
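To make the distance idea above concrete, here is a standard dynamic-programming Levenshtein implementation (a generic sketch in Java, not tied to any particular library):

public class Levenshtein {
    // Classic DP: d[i][j] = edit distance between a[0..i) and b[0..j).
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + sub);     // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("euphoric", "euforic")); // 2
    }
}

A rare word whose distance to a frequent dictionary word is 1 or 2 is a plausible misspelling of that word.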
This could be thought of as a parsing problem, and I personally find the Stanford NLP tools very effective.
Here is the link to the demo of the Stanford parser.
For the example ' did the quick brown fox jump over the lazy dog ', the output you get is
did/VBD
the/DT
quick/JJ
brown/JJ
fox/NN
jump/VB
over/RP
the/DT
lazy/JJ
dog/NN
From the output you can write an extractor to extract the nouns (and adjectives and adverbs if need be) and thus obtain the topics from the sentence.
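A trivial extractor over that word/TAG output might look like this sketch (plain string processing in Java, nothing Stanford-specific); it just keeps tokens whose tag starts with NN, plus JJ for adjectives:

import java.util.ArrayList;
import java.util.List;

public class NounExtractor {
    public static void main(String[] args) {
        String tagged = "did/VBD the/DT quick/JJ brown/JJ fox/NN "
                      + "jump/VB over/RP the/DT lazy/JJ dog/NN";
        List<String> keep = new ArrayList<>();
        for (String token : tagged.split("\\s+")) {
            String[] parts = token.split("/");   // word/TAG
            String word = parts[0], tag = parts[1];
            if (tag.startsWith("NN") || tag.startsWith("JJ")) {
                keep.add(word);
            }
        }
        System.out.println(keep);  // [quick, brown, fox, lazy, dog]
    }
}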
Moreover, the parse tree looks like
(ROOT
(SINV (VBD did)
(NP (DT the) (JJ quick) (JJ brown) (NN fox))
(VP (VB jump)
(PRT (RP over))
(NP (DT the) (JJ lazy) (NN dog)))))
If you take a closer look at the parse tree, the outputs you are expecting are both NPs (noun phrases): "the quick brown fox" and "the lazy dog".
I hope this helps!

How to parse a list of words according to a simplified grammar?

Just to clarify, this isn't homework. I've been asked for help on this and am unable to do it, so it turned into a personal quest to solve it.
Imagine you have a grammar for an English sentence like this:
S => NP VP | VP
NP => N | Det N | Det Adj N
VP => V | V NP
N => i you bus cake bear
V => hug love destroy am
Det => a the
Adj => pink stylish
I've searched for several hours and really am out of ideas.
I found articles talking about shallow parsing, depth-first backtracking and related things, and while I'm familiar with most of them, I still can't apply them to this problem. I tagged Lisp and Haskell because those are the languages I plan to implement this in, but I don't mind if you use other languages in your replies.
I'd appreciate hints, good articles and everything in general.
Here's a working Haskell example. It turns out there are a few tricks to learn before you can make it work! The zeroth thing to do is boilerplate: turn off the dreaded monomorphism restriction, import some libraries, and define some functions that aren't in the libraries (but should be):
{-# LANGUAGE NoMonomorphismRestriction #-}
import Control.Applicative ((<*))
import Control.Monad
import Text.ParserCombinators.Parsec
ensure p x = guard (p x) >> return x
singleToken t = tokenPrim id (\pos _ _ -> incSourceColumn pos 1) (ensure (==t))
anyOf xs = choice (map singleToken xs)
Now that the zeroth thing is done... first, we define a data type for our abstract syntax trees. We can just follow the shape of the grammar here. However, to make it more convenient, I've factored a few of the grammar's rules; in particular, the two rules
NP => N | Det N | Det Adj N
VP => V | V NP
are more conveniently written this way when it comes to actually writing a parser:
NP => N | Det (Adj | empty) N
VP => V (NP | empty)
Any good book on parsing will have a chapter on why this kind of factoring is a good idea. So, the AST type:
data Sentence
  = Complex NounPhrase VerbPhrase
  | Simple VerbPhrase
  deriving Show
data NounPhrase
  = Short Noun
  | Long Article (Maybe Adjective) Noun
  deriving Show
data VerbPhrase
  = VerbPhrase Verb (Maybe NounPhrase)
  deriving Show
type Noun = String
type Verb = String
type Article = String
type Adjective = String
Then we can make our parser. This one follows the (factored) grammar even more closely! The one wrinkle here is that we always want our parser to consume an entire sentence, so we have to explicitly ask for it to do that by demanding an "eof" -- or end of "file".
s = (liftM2 Complex np vp <|> liftM Simple vp) <* eof
np = liftM Short n <|> liftM3 Long det (optionMaybe adj) n
vp = liftM2 VerbPhrase v (optionMaybe np)
n = anyOf ["i", "you", "bus", "cake", "bear"]
v = anyOf ["hug", "love", "destroy", "am"]
det = anyOf ["a", "the"]
adj = anyOf ["pink", "stylish"]
The last piece is the tokenizer. For this simple application, we'll just tokenize based on whitespace, so the built-in words function works just fine. Let's try it out! Load the whole file in ghci:
*Main> parse s "stdin" (words "i love the pink cake")
Right (Complex (Short "i") (VerbPhrase "love" (Just (Long "the" (Just "pink") "cake"))))
*Main> parse s "stdin" (words "i love pink cake")
Left "stdin" (line 1, column 3):
unexpected "pink"
expecting end of input
Here, Right indicates a successful parse, and Left indicates an error. The "column" number reported in the error is actually the word number where the error occurred, due to the way we're computing source positions in singleToken.
There are several different approaches to syntactic parsing with a context-free grammar.
If you want to implement this yourself, you could start by familiarizing yourself with parsing algorithms: you can have a look here and here, or, if you prefer something on paper, the chapter on syntactic parsing in Jurafsky & Martin might be a good start.
I know that it is not too difficult to implement a simple syntactic parser in the Prolog programming language; just google for 'prolog shift reduce parser' or 'prolog scan predict parser'. I don't know Haskell or Lisp, but there might be similarities to Prolog, so maybe you can get some ideas from there.
If you don't have to implement the complete parser on your own, I'd have a look at the Python NLTK, which offers tools for CFG parsing. There is a chapter about this in the NLTK book.
Okay, there are a number of algorithms that you could use. Below are some popular dynamic programming algorithms:
1) CKY algorithm: the grammar should be in Chomsky Normal Form (CNF)
2) Earley algorithm
3) Chart parsing.
Please google to find implementations of these. Basically, given a sentence, these algorithms allow you to assign a context-free parse tree to it; a sketch of CKY follows below.
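To make the first of those concrete, here is a compact sketch (in Java) of a CKY recognizer for a grammar already in CNF. It only answers yes/no; the toy grammar and lexicon are made up for illustration, loosely following the question's grammar:

import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class Cky {
    public static void main(String[] args) {
        // Toy lexicon: word -> nonterminals ("i" is typed as a full NP
        // so the grammar stays in strict CNF).
        Map<String, Set<String>> lexical = Map.of(
            "i", Set.of("NP"),
            "love", Set.of("V"),
            "the", Set.of("Det"),
            "cake", Set.of("N"));
        // Binary rules A -> B C, keyed by "B C".
        Map<String, Set<String>> binary = Map.of(
            "NP VP", Set.of("S"),
            "Det N", Set.of("NP"),
            "V NP", Set.of("VP"));

        String[] w = "i love the cake".split(" ");
        int n = w.length;

        // chart[i][j] = nonterminals deriving words i..j inclusive.
        @SuppressWarnings("unchecked")
        Set<String>[][] chart = new Set[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                chart[i][j] = new HashSet<>();

        // Length-1 spans come straight from the lexicon.
        for (int i = 0; i < n; i++)
            chart[i][i].addAll(lexical.getOrDefault(w[i], Set.of()));

        // Longer spans: try every split point k and every binary rule.
        for (int len = 2; len <= n; len++)
            for (int i = 0; i + len - 1 < n; i++) {
                int j = i + len - 1;
                for (int k = i; k < j; k++)
                    for (String b : chart[i][k])
                        for (String c : chart[k + 1][j])
                            chart[i][j].addAll(
                                binary.getOrDefault(b + " " + c, Set.of()));
            }

        System.out.println("grammatical: " + chart[0][n - 1].contains("S"));
    }
}

Recovering an actual tree rather than a yes/no answer just means storing back-pointers alongside each nonterminal in the chart.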
You provided an example of a non-probabilistic grammar, so you can use tools like ANTLR, JFlex, Scala parser combinators, or the Parsers Python library to implement a parser for this grammar in code very similar to what you provided.
I think the problem for you might be that the way you'd parse a computer language is a lot different than the way you'd parse natural language.
Computer languages are designed to be unambiguous and relatively easily to get the exact meaning of from a computer.
Natural languages evolved to be compact and expressive and usually understood by people. You might manage to make the deterministic parsing that compilers use work for some very simple subset of English grammar, but it's nothing like what is used to parse real natural language.
