Parser output divergence for almost identical sentences -- why? - stanford-nlp

Why do I get such different parse trees when I run these two almost identical sentences through TreeAnnotation? The first one correctly returns an "SQ (VBZ Does) ...)" pattern, but the second one shows "S (NP (NNP Does) ...". Is this an error? Thanks.
Does he have time?
(ROOT (SQ (VBZ Does) (NP (PRP he)) (VP (VB have) (NP (NN time))) (. ?)))
Does John have time?
(ROOT (S (NP (NNP Does) (NNP John)) (VP (VBP have) (NP (NN time))) (. ?)))

Cute -- this looks like a POS tagging bug. In the first case, "Does" is correctly tagged as VBZ; in the second, it is incorrectly tagged as a proper noun (NNP). Most likely the sequence model in the POS tagger is messing up: since both "Does" and "John" are capitalized, it prefers tagging them both as proper nouns.
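To confirm that the divergence starts in the tagger rather than in the parser itself, you can run just the POS annotator on both sentences and compare the tags. A minimal sketch using the CoreNLP pipeline API (the class name and exact pipeline setup here are illustrative assumptions; adapt them to however you build your pipeline):

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class TagCheck {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    for (String text : new String[]{"Does he have time?", "Does John have time?"}) {
      Annotation doc = new Annotation(text);
      pipeline.annotate(doc);
      for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
        for (CoreLabel tok : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
          // print word/TAG pairs so the two sentences can be compared side by side
          System.out.printf("%s/%s ", tok.word(), tok.tag());
        }
        System.out.println();
      }
    }
  }
}

If the tags already differ at this stage (VBZ vs. NNP for "Does"), the parser is simply building the best tree it can over the wrong tags.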


What does NML tag convey?

Basic query:
Stanford parser version 4.0.0 uses the NML tag. I think it is a useful feature, but I do not fully understand it, so I would appreciate more information about it, e.g. what the abbreviation stands for, the motivation for introducing it, etc. Why does it treat "Income tax proposal" and "Fish tank water" differently? Has the parser learnt the use of the NML tag correctly?
The following is optional; please read it if you think that I am making up a fictitious tag!
The following information is just to establish that this is a serious enquiry. My previous query about the NML tag was rejected because my guess at the meaning of the NML tag misled me and somehow I gave a wrong example! I am sorry for that.
Please see:
https://nlp.stanford.edu/nlp/javadoc/javanlp/index.html?edu/stanford/nlp/trees/ModCollinsHeadFinder.html
Under the heading Changes:
QUOTE
Added NML head rules, which are the same as for NP.
NP head rule: NP and NML are treated almost identically (NP has precedence)
NAC head rule: NML comes after NN/NNS but after NNP/NNPS
UNQUOTE
I am getting NML tags in several sentences when running Stanford parser version 4.0.0. Here is just one example:
Parsing [sent. 1 len. 7]: The income tax proposal was rejected .
(ROOT
(S
(NP (DT The)
(NML (NN income) (NN tax))
(NN proposal))
(VP (VBD was)
(VP (VBN rejected)))
(. .)))
The NML label is for a noun phrase that modifies another word or phrase. So a good example is "income tax proposal": "income tax" is an NML since it serves as an adjective of "proposal", describing the type of proposal.
Syntactically, "income tax proposal" and "marriage proposal" share the same high-level structure: a noun phrase describing another noun. The point of NML is to mark that the phrase "income tax" is a complete unit in its own right, modifying the word "proposal" to produce the final NP "income tax proposal".
If the actual statistical parser is inconsistent, as in the case of "fish tank water", that is more likely an error in the model itself, which is just something you have to accept. Statistical parsers make lots of errors all the time.
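If you want to check which nominal compounds receive an NML node in your own sentences, you can print the constituency tree directly. A minimal sketch with the CoreNLP pipeline (the class name, annotator list, and example sentence are illustrative assumptions):

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class NmlDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    Annotation doc = new Annotation("The income tax proposal was rejected.");
    pipeline.annotate(doc);
    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      // prints the bracketed tree; an NML node appears wherever the model predicts one
      tree.pennPrint();
    }
  }
}

Whether a compound like "fish tank" comes out under an NML node depends on the trained model, not on any hand-written rule, which is why superficially similar phrases can be treated inconsistently.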

Stanford NLP: how to draw a parse tree graphically

I have a tree that I want to draw in graphical form. After that, I would like to extend the drawing so that nodes can be added and deleted and the POS tags of the tree can be edited on the graphic. Can you give me some ideas on how to start on this? Sorry for my bad English.
example tree:
(ROOT
(S (NP (NNP John))
(VP (VBZ loves)
(NP (NNP Mary)))
(. .)))
You can take a look at PennTreeReader in CoreNLP for code to read the string form of a tree into a Tree object. Beyond that, the design of the visualization is up to you. A good place to start might be D3; there's even a CoreNLP Parse Tree demo.
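For the reading step, turning the bracketed string back into a Tree is only a few lines. A minimal sketch (the class name is made up, and the walk at the end is just one way to visit the nodes for your drawing code):

import edu.stanford.nlp.trees.PennTreeReader;
import edu.stanford.nlp.trees.Tree;
import java.io.StringReader;

public class ReadTree {
  public static void main(String[] args) throws Exception {
    String s = "(ROOT (S (NP (NNP John)) (VP (VBZ loves) (NP (NNP Mary))) (. .)))";
    PennTreeReader reader = new PennTreeReader(new StringReader(s));
    Tree tree = reader.readTree();
    reader.close();
    // a Tree iterates over its subtrees, so you can visit every node,
    // read its label (S, NP, a word, ...) and lay it out however you like
    for (Tree node : tree) {
      System.out.println(node.label().value());
    }
  }
}

Once you have the Tree, adding or deleting a node or editing a POS tag amounts to mutating the tree and re-rendering your view of it.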

Pattern match function in Scheme Meta Circular Evaluator

I'm trying to add a pattern-matching function to an existing Scheme metacircular evaluator (this is homework), and I'm a bit lost on the wording of the instructions. I was hoping someone more skilled in this regard could help me interpret them.
The syntax for match should look like the following: (match a ((p1 v1) (p2 v2) (p3 v3)))
And it could be used to find length like so:
(define length
  (lambda (x)
    (match x (('() 0)
              (('cons a b) (+ 1 (length b)))))))
The pattern language in the function should contain numeric constants, quoted constants, variables, and cons. If patterns are exhausted without finding a match, an error should be thrown.
I thought I understood the concept of pattern matching but implementing it in a function this way has me a bit thrown off. Would anyone be willing to explain what the above syntax is doing (specifically, how match is used in length above) so I can get a better understanding of what this function should do?
(match x (('() 0)
          (('cons a b) (+ 1 (length b)))))
It may be most helpful to consider what this code would need to expand into. For each pattern, you'd need a test to determine whether the object you're trying to match matches, and you'd need code to figure out how to bind variables to its subparts. In this case, you'd want an expansion roughly like:
(if (equal? '() x)
    0
    (if (pair? x)
        (let ((a (car x))
              (b (cdr x)))
          (+ 1 (length b)))
        ;; indicate failure to match
        'NO-MATCH))
You could write that with a cond, too, of course, but if you have to procedurally expand this, it might be easier to use nested if forms.
If you're actually implementing this as a function (and not as a macro, i.e., a source transformation), then you'll need to specify exactly how you can work with environments, etc.
I suggest you read chapter four, Structured Types and the Semantics of Pattern Matching, from The Implementation of Functional Programming Languages. The chapter is written by Simon L. Peyton Jones and Philip Wadler.

Algorithm to get the topic/focus of a sentence from the words in the sentence

Are there any well-known or successful algorithms for obtaining the topic and/or focus of a sentence (question) from the words in that question?
If not, how would I go about getting the topic/focus of the question? It seems that the topic/focus is usually a noun or a noun phrase.
So the first thing I would do is find the nouns by part-of-speech tagging the question. But then how do I know whether I should take just the nouns, or the noun(s) together with a preceding adjective, adverb, or verb?
For example:
In "did the quick brown fox jump over the lazy dog", get "quick brown fox", "jump", and "lazy dog".
In "what is the population of japan", get "population" and "japan".
In "what color is milk", get "color" and "milk".
In "What is the height of Mt. Everest", get "Mt. Everest" and "height".
While writing these out, I guess the easiest first step is removing stop words.
I think, first of all, that the problem is language-dependent.
Secondly, I think that if you have a set of words, you could check their popularity/frequency in the language; e.g. the word "the" occurs much more often than the word "euphoric", so "euphoric" has a better chance of being a proper keyword.
Correct spelling is crucial here, however. How do you deal with that? One idea is to apply a distance algorithm such as Levenshtein to words that do not occur often (or do a Google search with the word and check whether you get results or a "did you mean" notification).
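Levenshtein distance itself is a small dynamic program. A sketch (the class name and example words are arbitrary):

public class Levenshtein {
  // classic dynamic-programming edit distance between two strings
  static int distance(String a, String b) {
    int[][] d = new int[a.length() + 1][b.length() + 1];
    for (int i = 0; i <= a.length(); i++) d[i][0] = i;   // deletions
    for (int j = 0; j <= b.length(); j++) d[0][j] = j;   // insertions
    for (int i = 1; i <= a.length(); i++) {
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,     // delete
                                    d[i][j - 1] + 1),    // insert
                           d[i - 1][j - 1] + cost);      // substitute
      }
    }
    return d[a.length()][b.length()];
  }

  public static void main(String[] args) {
    System.out.println(distance("euphoric", "euforic")); // prints 2
  }
}

A rare word that sits within a small distance of a common dictionary word is a good candidate for a misspelling rather than a genuine keyword.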
Some languages, though, are more structured than others. In English, to find nouns, you can first check for "a/an word" patterns, and then for words ending in "s", to find possible noun candidates, and then compare those with a dictionary.
With adjectives you can perhaps assume that a possible adjective will be located right before the noun. Then just compare the possible adjective with the dictionary.
You could of course also keep a blacklist of words that are never allowed as keywords.
The best solution would perhaps be a self-learning neural system, but I'm not familiar enough with those to give any suggestions.
This could be thought of as a parsing problem, and I personally find the Stanford NLP tools very effective.
There is an online demo of the Stanford parser.
For the example "did the quick brown fox jump over the lazy dog", the output you get is:
did/VBD
the/DT
quick/JJ
brown/JJ
fox/NN
jump/VB
over/RP
the/DT
lazy/JJ
dog/NN
From this output you can write an extractor to pull out the nouns (and adjectives and adverbs if need be) and thus obtain the topics from the sentence; see the sketch below.
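A minimal extractor along those lines, using the CoreNLP pipeline to tag the sentence and then keeping nouns and adjectives (the class name is made up, and the tag filter is just one plausible choice):

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class TopicWords {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    Annotation doc = new Annotation("did the quick brown fox jump over the lazy dog");
    pipeline.annotate(doc);
    List<String> topics = new ArrayList<>();
    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      for (CoreLabel tok : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
        String tag = tok.tag();
        // keep nouns (NN*) and adjectives (JJ*) as candidate topic words
        if (tag.startsWith("NN") || tag.startsWith("JJ")) {
          topics.add(tok.word());
        }
      }
    }
    System.out.println(topics); // e.g. [quick, brown, fox, lazy, dog]
  }
}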
Moreover, the parse tree looks like:
(ROOT
(SINV (VBD did)
(NP (DT the) (JJ quick) (JJ brown) (NN fox))
(VP (VB jump)
(PRT (RP over))
(NP (DT the) (JJ lazy) (NN dog)))))
If you take a closer look at the parse tree, the output you are expecting corresponds exactly to the two NPs (noun phrases): the quick brown fox and the lazy dog.
I hope this helps!

What are considered atoms in Scheme?

Can someone please explain what atoms are, or link me to any helpful resources that could help me understand them? (I couldn't find any threads on Google.)
Nowadays we consider an atom to be an element that is not a cons pair and is not null. That includes:
Numbers
Strings
Symbols
Booleans
Characters
This is best expressed with the following procedure, taken from the book The Little Schemer:
(define atom?
  (lambda (x)
    (and (not (pair? x)) (not (null? x)))))
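;; For example, the predicate above behaves as follows:
;; (atom? 'a)   => #t  ; a symbol
;; (atom? 42)   => #t  ; a number
;; (atom? '(a)) => #f  ; a pair
;; (atom? '())  => #f  ; null is excluded by the second test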
The term "atom" is used by several authors (McCarthy and Friedman/Felleisen, among others) to refer to a datum that is not a "cons" pair. I claim that these days, you'd be more likely to invert that, and test for "cons"-hood rather than "atom"-hood. Where are you seeing the term used?
