I want to extract information from sentences. I am a newbie in this field. I have sentences such as:
"Andrew query pizza king what is today's deal"
"Andrew order flower shop to send my wife roses"
Format : <Name> <command> <company name> <connecting word> <action>
How can I extract the sentences in the format above with the help of the Stanford NLP parser? For example, after extracting, if I print the action of each sentence it should give {is today's deal, send my wife roses}.
That's a hard task. If you have a very, very restricted set of sentences you can try to use the parser's dependencies and model your problem with rules. However, I ran your sentence through the Stanford parser and got an obviously wrong result:
(ROOT
(FRAG
(NP
(NP (NNP Andrew) (NN query) (NN pizza) (NN king))
(SBAR
(WHNP (WP what))
(S
(VP (VBZ is)
(NP
(NP (NN today) (POS 's))
(NN deal))))))))
As you can see, it treats "Andrew query pizza king" as a single noun phrase; it would do the same with "Andrew dog carrot soup what is today's deal". Obviously it misses the verb "query", the target "pizza king", etc.
Even if that worked, the syntax parser models only syntax, ignoring semantics. You should look into Semantic Role Labeling, Named Entity Recognition, Relation Extraction, etc. For your specific task you would most probably have to define your own semantics and then use a statistical algorithm to analyze the text and extract the needed information.
Here is a nice article about approaches to building chatbots: https://techinsight.com.vn/language/en/three-basic-nlp-problems-one-develops-chatbot-system-typical-approaches/
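For the restricted, rule-based route mentioned above, even a token-level pattern over the expected format can work. Here is a minimal sketch in plain Java; note that the command list (query|order) and the connecting words (what|to) are assumptions drawn only from your two example sentences, and a real system would need a much broader lexicon:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleExtractor {
    // <Name> <command> <company name> <connecting word> <action>
    // The alternations below only cover the two example sentences.
    private static final Pattern FORMAT = Pattern.compile(
            "^(\\w+)\\s+(query|order)\\s+(.+?)\\s+(what|to)\\s+(.+)$");

    /** Returns the <action> part of a matching sentence, or null if no match. */
    public static String action(String sentence) {
        Matcher m = FORMAT.matcher(sentence);
        return m.matches() ? m.group(5) : null;
    }

    public static void main(String[] args) {
        System.out.println(action("Andrew query pizza king what is today's deal"));
        System.out.println(action("Andrew order flower shop to send my wife roses"));
    }
}
```

The reluctant `(.+?)` lets the company name absorb words until the first connecting word, which is exactly why such rules break as soon as "to" or "what" can occur inside a company name.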
Related
I have thousands of sentences in a database (highlights from Kindle books), and some of them are sentence fragments (e.g., "You can have the nicest, most"), which I am trying to filter out.
As per some definition I found, a sentence fragment is missing either its subject or its main verb.
I tried to find some kind of sentence fragment algorithm but without success.
But anyway, in the above example I can see the subject ("You") and the verb ("have"), yet it still doesn't look like a full sentence to me.
I thought about restricting by length (e.g., excluding strings shorter than 30 characters), but I don't think that's a good idea.
Any suggestion on how you would do it?
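Lacking a ready-made fragment detector, one crude surface heuristic is to flag highlights that lack terminal punctuation or end on a word that rarely closes a sentence. This is a sketch only, not a real grammaticality check, and the word list below is an invented starting point:

```java
import java.util.Set;

public class FragmentFilter {
    // Invented list of words that rarely end a complete sentence.
    private static final Set<String> BAD_ENDINGS = Set.of(
            "a", "an", "the", "and", "or", "but", "of", "to", "with", "most", "very");

    /** True if the text looks like a fragment by surface cues alone. */
    public static boolean looksLikeFragment(String text) {
        String t = text.trim();
        if (t.isEmpty()) return true;
        char last = t.charAt(t.length() - 1);
        if (last != '.' && last != '!' && last != '?') return true;  // no terminal punctuation
        String[] words = t.substring(0, t.length() - 1).toLowerCase().split("\\s+");
        String lastWord = words[words.length - 1].replaceAll("[^a-z']", "");
        return BAD_ENDINGS.contains(lastWord);  // ends on a non-closing word
    }

    public static void main(String[] args) {
        System.out.println(looksLikeFragment("You can have the nicest, most"));
        System.out.println(looksLikeFragment("The dog slept all afternoon."));
    }
}
```

A parse-based check (does the tree contain an S with both an NP subject and a VP?) would catch more, but as the example shows, subject and verb alone do not guarantee a complete sentence either.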
I have a large corpus of sentences (~1.1M) to parse with Stanford CoreNLP, but in the output I get more sentences than in the input; presumably the system splits some sentences beyond the given one-sentence-per-line segmentation.
To control what happens I would like to include "tags" into the input. These tags should be recognizable in the output and should not affect parsing.
Something like
<0001>
I saw a man with a telescope .
</0001>
or
#0001#
I saw a man with a telescope .
#/0001#
I have tried many formats, in all cases the "tag" has been parsed as if it were part of the text.
Is there some way to tell the parser "do not parse this, just keep it as is in the output"?
===A few hours later===
As I'm getting no answer, here is an example. I would like to process the sentence “Manon espérait secrètement y revoir un garçon qui l'avait marquée autrefois.”, which carries the tag 151_0_4. My idea was to write the tag between two rows of equals signs on a separate line, followed by a period, to be sure that the tag would, at worst, be processed as a separate sentence:
=====151_0_4======.
Manon espérait secrètement y revoir un garçon qui l'avait marquée autrefois.
=====151_0_4======.
And here is what this produced:
(ROOT (SENT (NP (SYM =)) (NP (SYM =) (PP (SYM =) (NP (SYM =) (PP (SYM =) (NP (NUM 151_0_4) (SYM =) (SYM =) (NP (SYM =) (PP (SYM =) (NP (SYM =) (SYM =))))))))) (PUNCT .)))
As you see, the tags are definitely considered part of the sentence; there is no way to separate them from it.
The same thing happened with XML-like tags such as <x151_0_4> or tags using the hash character...
If your current data is strictly one sentence per line, then by far the easiest thing to do is to just leave it like that and to give the option -ssplit.eolonly=true.
There unfortunately isn't an option to pass through certain kinds of metadata or delimiters without attempting to parse or process them. You can, however, indicate that they should not be made part of other sentences by means of the ssplit.boundaryTokenRegex or ssplit.boundaryMultiTokenRegex properties. Your choices are then either to delete them outright (see ssplit.tokenPatternsToDiscard) or to let them come through as weird sentences, which you'd then need to clean up.
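Assuming the input really is one sentence per line, the newline-only splitting suggested above can also be set in a properties file; the annotator list here is just an example:

```properties
# one input line == one output sentence; never merge or re-split
annotators = tokenize, ssplit, parse
ssplit.eolonly = true
```

With this in place the line number itself serves as the tag: output sentence N corresponds to input line N, so no in-band delimiter is needed.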
When I input the parameter "marry" into a function, it should return "this is a name"; when I input the parameter "MIT", it should return "this is an institution name". It is just a single word to recognize, not a whole sentence. Which NLP algorithm should I use? Thanks.
This looks like part-of-speech tagging and/or named entity recognition, BUT if you are processing English, single words without context are potentially ambiguous. Also, single words may not be informative: "new" on its own can be an adjective (POS), but "New York" is most likely a location (NER). Check some literature on both tasks and consider processing at least sentence-level features.
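To see why context matters, here is a toy gazetteer lookup (all entries invented, not a real NER system): the moment a surface form sits in two lists, a single word in isolation cannot be disambiguated.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class GazetteerLookup {
    // Toy, invented gazetteers; real NER uses statistical models plus context.
    private static final Map<String, Set<String>> GAZETTEERS = Map.of(
            "PERSON", Set.of("marry", "washington"),
            "ORGANIZATION", Set.of("mit", "stanford"),
            "LOCATION", Set.of("washington", "new york"));

    /** All labels whose gazetteer contains the word (sorted for stable output). */
    public static Set<String> labels(String word) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : GAZETTEERS.entrySet()) {
            if (e.getValue().contains(word.toLowerCase())) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(labels("MIT"));        // unambiguous
        System.out.println(labels("Washington")); // ambiguous: person or place?
    }
}
```

Only surrounding words ("Washington said..." vs. "flew to Washington") let a tagger or NER model pick one label, which is why sentence-level input is recommended.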
I need your help, please. I am doing a NER project using NetBeans v8.0.2.
I need to extract person names and places from any Arabic document file and categorize them as person name or place. I looked at all the Stanford tools (POS tagger, parser, and Stanford NER) and tried them all; the tagger works fine for me.
But I had problems with the parser, especially with this line of code:
LexicalizedParser lp = LexicalizedParser.loadModel(grammar, options);
from ParserDemo, and no output comes up. Do I need the parser first to tokenize the document and then use the POS tagger, or can I just use the POS tagger with some editing (like using an if statement to combine all NNPs together, and the same for places)?
So first of all: at the moment we do not have any Arabic NER models.
Secondly, here are some steps for running the Stanford parser on Arabic text:
Get the Stanford parser: http://nlp.stanford.edu/software/lex-parser.shtml
Compile ParserDemo.java; you need the jars present in the stanford-parser-full-2015-04-20 directory to compile.
I ran this command at the command line while in the stanford-parser-full-2015-04-20 directory (do the analogous thing in NetBeans):
java -cp ".:*" ParserDemo edu/stanford/nlp/models/lexparser/arabicFactored.ser.gz data/arabic-onesent-utf8.txt
You should get a proper parse of the Arabic example sentence.
So when you run ParserDemo in NetBeans, make sure you provide "edu/stanford/nlp/models/lexparser/arabicFactored.ser.gz" as the first argument to ParserDemo, so it knows to load the Arabic model.
For this input:
و نشر العدل من خلال قضاء مستقل
I get this output:
(ROOT
(S (CC و)
(VP (VBD نشر)
(NP (DTNN العدل))
(PP (IN من)
(NP (NN خلال)
(NP (NN قضاء) (JJ مستقل)))))
(PUNC .)))
I am happy to help you further, please let me know if you need any more info.
FYI here is some more info on the Arabic parser:
http://nlp.stanford.edu/software/parser-arabic-faq.shtml
Please, could anybody help me extract the text from a Tree?
e.g. : (NP (NP (DT the) (JJ main) (NN road)) (PP (IN of) (NP (NNP Rontau))))
The text: "the main road of Rontau"
I'm using the Stanford trees package.
You want the "yield" of a parse tree. You can access this as a list of Label instances using the method Tree#yield.
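With the Stanford classes that is roughly tree.yield(), plus a utility such as Sentence.listToString (SentenceUtils.listToString in newer releases) to join the labels back into a string. For illustration only, the yield is simply the left-to-right sequence of leaves, which a dependency-free sketch can recover from the bracketed string itself:

```java
public class TreeYield {
    // Collects the leaf tokens of a Penn-bracketed tree string.
    // A token is a leaf when it is not a parenthesis and not
    // immediately preceded by '(' (labels directly follow '(').
    public static String yieldOf(String bracketed) {
        String[] tokens = bracketed.replace("(", " ( ").replace(")", " ) ")
                                   .trim().split("\\s+");
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < tokens.length; i++) {
            boolean isLeaf = !tokens[i].equals("(") && !tokens[i].equals(")")
                    && !tokens[i - 1].equals("(");
            if (isLeaf) {
                if (out.length() > 0) out.append(' ');
                out.append(tokens[i]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(yieldOf(
            "(NP (NP (DT the) (JJ main) (NN road)) (PP (IN of) (NP (NNP Rontau))))"));
    }
}
```

In real code, prefer the library's own Tree#yield: it returns the Label list directly and handles escaped brackets and other edge cases that this sketch ignores.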