Stanford's CoreNLP inconsistency with online demo

I'm using the latest version (3.8.0) of CoreNLP with the Python wrapper py-corenlp, and I've noticed an inconsistency between the output CoreNLP gives me with the annotators tokenize, ssplit, pos, depparse, parse and the output of the Online Demo. What's more, Stanford's Parser, whether I call it from my code or run it online, gives me the same results as CoreNLP.
For instance, I have the following question (borrowed from the Free917 question corpus):
at what institutions was Marshall Hall a professor
Using CoreNLP I get the following parsing:
(ROOT
  (SBAR
    (WHPP (IN at)
      (WHNP (WDT what)))
    (S
      (NP (NNS institutions))
      (VP (VBD was)
        (NP
          (NP (NNP Marshall) (NNP Hall))
          (NP (DT a) (NN professor)))))))
Same with Stanford's Parser:
[Tree('ROOT', [Tree('SBAR', [Tree('WHPP', [Tree('IN', ['at']), Tree('WHNP', [Tree('WP', ['what'])])]), Tree('S', [Tree('NP', [Tree('NNS', ['institutions'])]), Tree('VP', [Tree('VBD', ['was']), Tree('NP', [Tree('NP', [Tree('NNP', ['Marshall']), Tree('NNP', ['Hall'])]), Tree('NP', [Tree('DT', ['a']), Tree('NN', ['professor'])])])])])])])]
The Online Demo is the correct version though:
[Image: parse tree from the online demo]
How can I get the results I get using the Online Demo?
Thank you in advance!

The demo runs the shift-reduce parser, which is both faster and more accurate, at the expense of a much larger serialized model. See https://nlp.stanford.edu/software/srparser.shtml
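To get the demo's results from py-corenlp, you can point the server at the shift-reduce model via the parse.model property. A sketch, assuming a CoreNLP server on localhost:9000 with the separate srparser models jar on its classpath (the model path is the one from the srparser page):

```python
import json
import urllib.parse

# Properties asking the CoreNLP server to use the shift-reduce
# constituency parser instead of the default PCFG model.
SR_PROPS = {
    "annotators": "tokenize,ssplit,pos,parse",
    "parse.model": "edu/stanford/nlp/models/srparser/englishSR.ser.gz",
    "outputFormat": "json",
}

def srparser_url(base_url="http://localhost:9000"):
    """Build the request URL a wrapper like py-corenlp POSTs to."""
    return base_url + "/?properties=" + urllib.parse.quote(json.dumps(SR_PROPS))

# With py-corenlp this would look like:
#   nlp = StanfordCoreNLP("http://localhost:9000")
#   out = nlp.annotate("at what institutions was Marshall Hall a professor",
#                      properties=SR_PROPS)
print(srparser_url())
```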

Related

Writing My Prolog Code

I am writing my first Prolog code, and I am having some difficulty with it; I was wondering if anyone could help me out.
I am writing a program that needs to follow these rules:
In verb phrases, noun phrases come before transitive verbs.
Subjects (nominative noun phrases) are followed by ga.
Direct objects (accusative noun phrases) are followed by o.
It must be able to form these sentences from the words given in the code:
Adamu ga waraimasu (Adam laughs)
Iivu ga nakimasu (Eve cries)
Adamu ga Iivu o mimasu (Adam watches Eve)
Iivu ga Adamu o tetsudaimasu (Eve helps Adam)
Here is my code. It is mostly complete, except that I don't know whether the rules are correctly encoded:
japanese([adamu],[nounphrase],[adam],[entity]).
japanese([iivu],[nounphrase],[eve],[entity]).
japanese([waraimasu],[verb,intransitive],[laughs],[property]).
japanese([nakimasu],[verb,intransitive],[cries],[property]).
japanese([mimasu],[verb,transitive],[watches],[relation]).
japanese([tetsudaimasu],[verb,transitive],[helps],[relation]).

japanese(A,[verbphrase],B,[property]):-
    japanese(A,[verb,intransitive],B,[property]).

japanese(A,[nounphrase,accusative],B,[entity]):-
    japanese(C,[nounphrase],B,[entity]),
    append([ga],C,A).

japanese(A,[verbphrase],B,[property]):-
    japanese(C,[verb,transitive],D,[relation]),
    japanese(E,[nounphrase,accusative],F,[entity]),
    append(C,E,A),
    append(D,F,B).

japanese(A,[sentence],B,[proposition]):-
    japanese(C,[nounphrase],D,[entity]),
    japanese(E,[verbphrase],F,[property]),
    append(E,C,A),
    append(F,D,B).

Stanford Named Entity Recognizer (NER) functionality with NLTK

Is it possible to get functionality similar to the Stanford Named Entity Recognizer using just NLTK?
Is there any example?
In particular, I am interested in extracting the LOCATION part of the text. For example, from the text
The meeting will be held at 22 West Westin st., South Carolina, 12345
on Nov.-18
ideally I would like to get something like
(S
22/LOCATION
(LOCATION West/LOCATION Westin/LOCATION)
st./LOCATION
,/,
(South/LOCATION Carolina/LOCATION)
,/,
12345/LOCATION
.....
or simply
22 West Westin st., South Carolina, 12345
Instead, I am only able to get
(S
The/DT
meeting/NN
will/MD
be/VB
held/VBN
at/IN
22/CD
(LOCATION West/NNP Westin/NNP)
st./NNP
,/,
(GPE South/NNP Carolina/NNP)
,/,
12345/CD
on/IN
Nov.-18/-NONE-)
Note that if I enter my text into http://nlp.stanford.edu:8080/ner/process I get results that are far from perfect (the street number and zip code are still missing), but at least "st." is part of the LOCATION and South Carolina is tagged LOCATION rather than "GPE / NNP".
What am I doing wrong? How can I fix this so that NLTK extracts the location part of the text?
Many thanks in advance!
NLTK does have an interface for Stanford NER; check nltk.tag.stanford.NERTagger.
from nltk.tag.stanford import NERTagger
st = NERTagger('/usr/share/stanford-ner/classifiers/all.3class.distsim.crf.ser.gz',
'/usr/share/stanford-ner/stanford-ner.jar')
st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
output:
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]
However every time you call tag, nltk simply writes the target sentence into a file and runs Stanford NER command line tool to parse that file and finally parses the output back to python. Therefore the overhead of loading classifiers (around 1 min for me every time) is unavoidable.
If that's a problem, use Pyner.
First run Stanford NER as a server
java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer \
-loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -port 9191
then, from the Pyner folder:
import ner
tagger = ner.SocketNER(host='localhost', port=9191)
tagger.get_entities("University of California is located in California, United States")
# {'LOCATION': ['California', 'United States'],
# 'ORGANIZATION': ['University of California']}
tagger.json_entities("Alice went to the Museum of Natural History.")
#'{"ORGANIZATION": ["Museum of Natural History"], "PERSON": ["Alice"]}'
Hope this helps.
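Once you have per-token tags (from either NERTagger or Pyner's socket interface), pulling out just the LOCATION parts the question asks about is a small post-processing step. A sketch, assuming the [(token, tag), ...] output shape shown above:

```python
from itertools import groupby

def extract_locations(tagged):
    """Merge runs of consecutive LOCATION-tagged tokens into phrases."""
    phrases = []
    for tag, run in groupby(tagged, key=lambda pair: pair[1]):
        if tag == "LOCATION":
            phrases.append(" ".join(tok for tok, _ in run))
    return phrases

tagged = [("Rami", "PERSON"), ("Eid", "PERSON"), ("is", "O"),
          ("studying", "O"), ("at", "O"), ("Stony", "ORGANIZATION"),
          ("Brook", "ORGANIZATION"), ("University", "ORGANIZATION"),
          ("in", "O"), ("NY", "LOCATION")]
print(extract_locations(tagged))  # ['NY']
```

Multi-token locations like "South Carolina" come out as one string because consecutive tokens with the same tag are grouped together.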

How do I define a SemgrexPattern in Stanford API, to match nodes' text without using {lemma:...}?

I am working with the edu.stanford.nlp.semgrex and edu.stanford.nlp.trees.semgraph packages and am looking for a way to match a node's text by something other than the lemma: attribute.
I couldn't find all possible attribute names in javadoc for SemgrexPattern, only those for lemma, tag, and relational operators - is there a comprehensive list available?
For example, in the following sentence
My take-home pay is $20.
extracting the 'take-home' node is not possible:
(SemgrexPattern.compile( "{lemma:take-home}"))
.matcher( "My take-home pay is $20.").find()
yields false, because take-home is not treated as a lemma.
What do I need to do to match nodes with non-lemma, arbitrary text?
Thanks for any advice or comment.
Sorry - I realize that {word:take-home} would work in the example above.
Thanks.

Stanford Parser tags

I just started using the Stanford Parser, but I do not understand the tags very well. This might be a stupid question, but can anyone tell me what the SBARQ and SQ tags represent, and where I can find a complete list of them? I know what the Penn Treebank tags look like, but these are slightly different.
Sentence: What is the highest waterfall in the United States ?
(ROOT
(SBARQ
(WHNP (WP What))
(SQ (VBZ is)
(NP
(NP (DT the) (JJS highest) (NN waterfall))
(PP (IN in)
(NP (DT the) (NNP United) (NNPS States)))))
(. ?)))
I have looked at the Stanford Parser website and read a few of the journals listed there, but there is no explanation of the tags mentioned earlier. I found a manual describing all the dependencies used, but it doesn't explain what I am looking for. Thanks!
This reference looks to have an extensive list - not sure if it is complete or not.
Specifically, it lists the ones you're asking about as:
SBARQ - Direct question introduced by a wh-word or a wh-phrase. Indirect
questions and relative clauses should be bracketed as SBAR, not SBARQ.
SQ - Inverted yes/no question, or main clause of a wh-question,
following the wh-phrase in SBARQ.
To see the entire list just print the tagIndex of the parser
LexicalizedParser lp = LexicalizedParser.loadModel();
System.out.println(lp.tagIndex); // print the tag index
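If you just need the clause-level labels without loading a parser, they can also be kept in a small lookup table. A sketch whose descriptions paraphrase the Penn Treebank II bracketing guidelines (this is only the clause-level subset, not the full label set):

```python
# Clause-level bracket labels from the Penn Treebank II guidelines;
# phrase-level and word-level labels are documented there as well.
CLAUSE_LABELS = {
    "S":     "simple declarative clause",
    "SBAR":  "clause introduced by a (possibly empty) subordinating conjunction",
    "SBARQ": "direct question introduced by a wh-word or wh-phrase",
    "SINV":  "inverted declarative sentence",
    "SQ":    "inverted yes/no question, or main clause of a wh-question",
}

def describe(label):
    """Look up a clause-level label, falling back to a marker string."""
    return CLAUSE_LABELS.get(label, "unknown label: " + label)

print(describe("SBARQ"))  # direct question introduced by a wh-word or wh-phrase
```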

Parse out phrasal verbs

Has anyone ever tried parsing out phrasal verbs with Stanford NLP?
The problem is with separable phrasal verbs, e.g. climb up and do over: "We climbed that hill up." "I have to do this job over."
The first phrase looks like this in the parse tree:
(VP
  (VBD climbed)
  (ADVP (IN that)
    (NP (NN hill)))
  (ADVP (RB up)))
the second phrase:
(VB do)
(NP (DT this) (NN job))
(PP (IN over))
So it seems like reading the parse tree would be the right approach, but how do I know that a verb is phrasal?
Dependency parsing, dude. Look at the prt (phrasal verb particle) dependency in both sentences. See the Stanford typed dependencies manual for more info.
nsubj(climbed-2, We-1)
root(ROOT-0, climbed-2)
det(hill-4, that-3)
dobj(climbed-2, hill-4)
prt(climbed-2, up-5)
nsubj(have-2, I-1)
root(ROOT-0, have-2)
aux(do-4, to-3)
xcomp(have-2, do-4)
det(job-6, this-5)
dobj(do-4, job-6)
prt(do-4, over-7)
The Stanford parser gives you very nice dependency parses. I have code for programmatically accessing these if you need it: https://gist.github.com/2562754
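If you are reading the typed-dependency output as plain text (as printed above), spotting phrasal verbs is just a matter of collecting the prt relations. A minimal sketch over dependency strings in the format shown in this answer:

```python
import re

# Matches lines like "prt(climbed-2, up-5)" from typed-dependency output.
PRT_RE = re.compile(r"prt\((\w+)-\d+,\s*(\w+)-\d+\)")

def phrasal_verbs(dep_lines):
    """Return (verb, particle) pairs for every prt relation."""
    pairs = []
    for line in dep_lines:
        m = PRT_RE.match(line.strip())
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs

deps = ["nsubj(climbed-2, We-1)", "root(ROOT-0, climbed-2)",
        "det(hill-4, that-3)", "dobj(climbed-2, hill-4)",
        "prt(climbed-2, up-5)"]
print(phrasal_verbs(deps))  # [('climbed', 'up')]
```

The token indices in the relation make it easy to go back to the sentence and confirm which verb the particle attaches to.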
