Stanford Parser - POS tagging - pronoun as noun? - stanford-nlp

I noticed that the Stanford Parser taggs "anyone" and "anybody" as a noun, whereas they are pronouns; I tried to set "anyone" in different contexts, I got the same result. Can anyone tell me if it hapenned to him/her and if there is a way to correct it (I mean perhaps some settings ?).
Thank you!

This is more a linguistics question than a code question. The analysis of these words is complex. Many linguists would describe them as being a fused determiner and noun, as they fairly transparently are at least historically.
In general we currently use the Penn Treebank standards for tokenization, part-of-speech, and phrasal labels for English. The Penn Treebank annotates these words as noun (NN) - rightly or wrongly - so that is what our tools currently do.
However, you might be pleased to know that under the Universal Dependencies guidelines, these words are indeed pronouns (PRON). We are sure to be moving towards greater use of Universal Dependencies in future releases.

Related

How can I use stanford-openIE for Chinese text

I am working on on Stanford-openIE but I do not know whether it supports Chinese text or not. If it supports Chinese language, How can I use stanford-openIE for Chinese text?
Any guidance will be appreciated.
Stanford's OpenIE system was developed for English. It's based off of universal dependencies, meaning that in theory it shouldn't be too hard to adapt to other languages; but, nonetheless, it's highly unlikely that it would work out of the box.
At minimum, the relation triple segmenter would have to be adapted for Chinese. For some of the more subtle functionality, the code to mark natural logic polarity and the code to score prepositional phrase deletions would have to be rewritten.

PCFG vs SR Parser

It looks like stanfordnlp has these SR models for some time.
I am really new to NLP but we are currently using PCFG parser and we are having serious performance issues( that we cut down the parse length to 35)
I was thinking if we could try using SR. I tried it with POS tagger from stanford(english-left3words-distsim.tagger)
Would you know how SR is on accuracy vs PCFG?
I also find sentence root detection issues with SR and dep parse: Example:
Michael Jeffrey Jordan, also known by his initials, MJ, is an American former professional basketball player, entrepreneur, and current majority owner and chairman of the Charlotte Bobcats
The PCFG is really accurate with the root and detects player as the root.
Would also appreciate a little insight on the NN people use e.g.(https://mailman.stanford.edu/pipermail/java-nlp-user/2014-November/006513.html) in above post.
Do I need to use another tagger like - left3words with this?
I am sorry if this sounds a little naive.
But all I want is a correct sentence root and its dependencies.
Does POS tagging upfront make it fast?
Thanks a lot in advance.
The English shift-reduce parser shipped with CoreNLP is actually slightly better than the PCFG parser on our test data. You can see performance metrics at the bottom of the shift-reduce parser homepage.
I've asked for clarification in a comment above.

How to use Stanford parser to generate the English sentence from given segmented nodes

I am trying to develop a Sinhala (My native language) to English translator.
Still I am thinking for an approach.
If I however parse a sentence of my language, then can use that for generating english sentence with the help of stanford parser. Or is there any other method you can recommend.
And I am thinking of a bottom up parser for my language, but still have no idea how to implement. Any suggestions for steps I can follow.
Thanks
Mathee
If you have enough amounts of bilingual corpora then the best thing you can do is to train a model using an existing statistical machine translation system like Moses. Modern phrase-based SMT systems resemble a reverse parsing mechanism in the sense that they find the most probable combination of target-language phrases for translating a specific source-language sentence. More information in this Wikipedia article.

How to reverse use a nlp parser to generate sentences

I am trying to develop a Sinhala (My native language) to English translator. Still I am thinking for an approach.
If I however parse a sentence of my language, then can I use that for generating english sentence with the help of stanford parser or any other parser. Or is there any other method you can recommend.
And I am thinking of a bottom up parser for my language, but still have no idea how to implement. Any suggestions for steps I can follow.
Thanks Mathee
This course on Coursera may help you implement a translator. From what I know based on that course, you can use a training set tagged by parts of speech (i.e. noun, verb, etc.) and use that training test to parse other sentence. I suggest looking into hidden Markov models.
My Pyramids parser is an unconventional single-sentence parser for English. (It is capable of parsing other languages too, but a grammar must be specified.) The parser can not only parse English into parse trees, but can convert back and forth between parse trees and word-level semantic graphs, which are graphs that describe the semantic relationships between all the words in a sentence. The correct word order is reconstructed based on the contents of the graph; all that needs to be provided, other than the words and their relationships, is the type of sentence (statement, question, command) and the linguistic category of each word (noun, determiner, verb, etc.). From there it is straight-forward to join the tokens of the parse tree into a sentence.
The parser is a (very early) alpha pre-release, but it is functional and actively maintained. I am currently using it to translate back and forth between English and an internal semantic representation used by a conversational agent (a "chat bot", but capable of more in-depth language understanding). If you decide to use the parser, do let me know. I will be happy to provide any assistance you might need with installing, using, or improving it.

Techniques for calculating adjective frequency [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
I need to calculate word frequencies of a given set of adjectives in a large set of customer support reviews. However I don't want to include those that are negated.
For example suppose my list of adjectives was: [helpful, knowledgeable, friendly]. I want to make sure "friendly" isn't counted in a sentence such as "The representative was not very friendly."
Do I need to do a full NLP parse of the text or is there an easier approach? I don't need super high accuracy.
I'm not at all familiar with NLP. I'm hoping for something that doesn't have such a steep learning curve and isn't so processor intensive.
Thanks
If all you want is adjective frequencies, then the problem is relatively simple, as opposed to some brutal, not-so-good machine learning solution.
Wat do?
Do POS tagging on your text. This annotates your text with part of speech tags, so you'll have 95% accuracy or more on that. You can tag your text using the Stanford Parser online to get a feel for it. The parser actually also gives you the grammatical structure, but you only care about the tagging.
You also want to make sure the sentences are broken up properly. For this you need a sentence breaker. That's included with software like the Stanford parser.
Then just break up the sentences, tag them, and count all things with the tag ADJ or whatever tag they use. If the tags don't make sense, look up the Penn Treebank tagset (Treebanks are used to train NLP tools, and the Penn Treebank tags are the common ones).
How?
Java or Python is the language of NLP tools. Python, use NLTK. It's easy, well documented and well understood.
For Java, you have GATE, LingPipe and the Stanford Parser among others. It's a complete pain in the ass to use the Stanford Parser, fortunately I've suffered so you do not have to if you choose to go that route. See my google page for some code (at the bottom of the page) examples with the Stanford Parser.
Das all?
Nah, you might want to stem the adjectives too- that's where you get the root form of a word:
cars -> car
I can't actually think of a situation where this is necessary with adjectives, but it might happen. When you look at your output it'll be apparent if you need to do this. A POS tagger/parser/etc will get you your stemmed words (also called lemmas).
More NLP Explanations
See this question.
It depends on the source of your data. If the sentences come from some kind of generator, you can probably split them automatically. Otherwise you will need NLP, yes.
Properly parsing natural language pretty much is an open issue. It works "largely" for English, in particular since English sentences tend to stick to the SVO order. German for example is quite nasty here, as different word orders convey different emphasis (and thus can convey different meanings, in particular when irony is used). Additionally, German tends to use subordinate clauses much more.
NLP clearly is the way to go. At least some basic parser will be needed. It really depends on your task, too: do you need to make sure every one is correct, or is a probabilistic approach good enough? Can "difficult" cases be discarded or fed to a human for review? etc.

Resources