How to get the text back from a Tree (the reverse of parsing) - stanford-nlp

Could anybody please help me extract the text from a Tree?
For example, given: (NP (NP (DT the) (JJ main) (NN road)) (PP (IN of) (NP (NNP Rontau))))
I want the text: "the main road of Rontau"
I'm using the Stanford trees package.

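You want the "yield" of the parse tree, that is, its leaves read from left to right. You can get it as a list of Label instances with the method Tree#yield; joining the labels' values with spaces gives you back the text.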
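With CoreNLP on the classpath, the usual route is to call Tree#yield on the Tree (a Tree can be built from a bracketed string with Tree.valueOf) and join the results. To show what the yield actually is without depending on the library, here is a minimal standalone sketch that reads the leaves off a Penn-style bracketed string. The class and method names are made up, and it assumes words contain no spaces or parentheses (the Penn Treebank convention escapes brackets as -LRB- and -RRB-).

```java
import java.util.ArrayList;
import java.util.List;

public class TreeYield {
    // Splits a bracketed tree string into "(", ")" and atom tokens.
    static List<String> tokenize(String tree) {
        List<String> tokens = new ArrayList<>();
        StringBuilder atom = new StringBuilder();
        for (int i = 0; i < tree.length(); i++) {
            char c = tree.charAt(i);
            boolean bracket = c == '(' || c == ')';
            if (bracket || Character.isWhitespace(c)) {
                if (atom.length() > 0) {
                    tokens.add(atom.toString());
                    atom.setLength(0);
                }
                if (bracket) {
                    tokens.add(String.valueOf(c));
                }
            } else {
                atom.append(c);
            }
        }
        if (atom.length() > 0) {
            tokens.add(atom.toString());
        }
        return tokens;
    }

    // Returns the leaves from left to right: every atom that is not a node label.
    // An atom directly after "(" is a label (NP, DT, ...); any other atom is a word.
    public static List<String> leaves(String tree) {
        List<String> tokens = tokenize(tree);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            boolean bracket = t.equals("(") || t.equals(")");
            boolean label = i > 0 && tokens.get(i - 1).equals("(");
            if (!bracket && !label) {
                result.add(t);
            }
        }
        return result;
    }

    // Joins the leaves with single spaces, i.e. reads the sentence back off the tree.
    public static String text(String tree) {
        return String.join(" ", leaves(tree));
    }

    public static void main(String[] args) {
        String tree = "(NP (NP (DT the) (JJ main) (NN road)) (PP (IN of) (NP (NNP Rontau))))";
        System.out.println(text(tree)); // the main road of Rontau
    }
}
```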


Adding metadata to Stanford CoreNLP input

I have a large corpus of sentences (~1.1M) to parse with Stanford CoreNLP, but the output contains more sentences than the input; presumably the system splits some lines into several sentences, beyond the one-sentence-per-line segmentation I provide.
To keep track of what happens, I would like to include "tags" in the input. These tags should be recognizable in the output and should not affect parsing.
Something like
<0001>
I saw a man with a telescope .
</0001>
or
#0001#
I saw a man with a telescope .
#/0001#
I have tried many formats; in all cases the "tag" was parsed as if it were part of the text.
Is there some way to tell the parser "do not parse this, just keep it as is in the output"?
===A few hours later===
As I'm getting no answer, here is an example. I would like to process the sentence “Manon espérait secrètement y revoir un garçon qui l'avait marquée autrefois.” (roughly, “Manon secretly hoped to see a boy there again who had left his mark on her long ago”), which carries the tag 151_0_4. My idea was to write the tag on a separate line between two runs of equals signs, followed by a period, so that at worst the tag would be processed as a separate sentence:
=====151_0_4======.
Manon espérait secrètement y revoir un garçon qui l'avait marquée autrefois.
=====151_0_4======.
And here is what this produced:
(ROOT (SENT (NP (SYM =)) (NP (SYM =) (PP (SYM =) (NP (SYM =) (PP (SYM =) (NP (NUM 151_0_4) (SYM =) (SYM =) (NP (SYM =) (PP (SYM =) (NP (SYM =) (SYM =))))))))) (PUNCT .)))
As you can see, the tags are definitely treated as part of the sentence, with no way to separate them from it.
The same thing happened with XML-like tags such as <x151_0_4> and with tags using the hash character.
If your current data is strictly one sentence per line, then by far the easiest thing to do is to leave it like that and give the option -ssplit.eolonly=true, so that sentences are split only at line ends.
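If you drive CoreNLP from Java rather than from the command line, the same option goes into the Properties object handed to the pipeline. A minimal sketch using only java.util.Properties; the annotator list is an assumption (a French pipeline would also need the French model settings, not shown), and the StanfordCoreNLP constructor appears only in a comment so the snippet compiles without CoreNLP.

```java
import java.util.Properties;

public class EolOnlyConfig {
    // Pipeline settings that make the sentence splitter break only at newlines,
    // so each input line comes out as exactly one sentence.
    public static Properties eolOnlyProps() {
        Properties props = new Properties();
        // Assumed annotator list; use whatever your pipeline actually runs.
        props.setProperty("annotators", "tokenize,ssplit,pos,parse");
        props.setProperty("ssplit.eolonly", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = eolOnlyProps();
        // With CoreNLP on the classpath: StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        props.list(System.out);
    }
}
```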
Unfortunately, there isn't an option to pass through certain kinds of metadata or delimiters without attempting to parse or process them. You can, however, prevent them from being merged into other sentences by means of the ssplit.boundaryTokenRegex or ssplit.boundaryMultiTokenRegex properties. Your choices are then either to delete them (see ssplit.tokenPatternsToDiscard) or to process them as odd-looking sentences, which you would then need to clean up.
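If you keep the markers and let them come out as odd one-line sentences, the cleanup can be a single pass over the output sentences in order. A minimal sketch, assuming you have already extracted each output sentence as a string (for instance by joining its tokens) and that markers look like the =====151_0_4======. lines in the question, with the same marker opening and closing a block. TagCleanup and groupByTag are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagCleanup {
    // A marker: '=' signs, an id made of digits and underscores, '=' signs, optional final period.
    private static final Pattern MARKER = Pattern.compile("=+([0-9_]+)=+\\.?");

    // Maps each tag id to the sentences found between its opening and closing markers.
    // Marker sentences are dropped; sentences outside any tagged block are ignored.
    public static Map<String, List<String>> groupByTag(List<String> sentences) {
        Map<String, List<String>> byTag = new LinkedHashMap<>();
        String open = null; // id of the block we are currently inside, if any
        for (String sentence : sentences) {
            // Remove whitespace first: the tokenizer may have split "=====" into "= = = = =".
            Matcher m = MARKER.matcher(sentence.replaceAll("\\s+", ""));
            if (m.matches()) {
                String id = m.group(1);
                if (id.equals(open)) {
                    open = null; // closing marker
                } else {
                    open = id;   // opening marker
                    byTag.putIfAbsent(id, new ArrayList<>());
                }
            } else if (open != null) {
                byTag.get(open).add(sentence);
            }
        }
        return byTag;
    }

    public static void main(String[] args) {
        List<String> output = List.of(
                "= = = = = 151_0_4 = = = = = = .",
                "I saw a man with a telescope .",
                "= = = = = 151_0_4 = = = = = = .");
        System.out.println(groupByTag(output));
    }
}
```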

Extracting information from a sentence using NLP

I want to extract information from sentences; I am new to this field. I have sentences such as:
"Andrew query pizza king what is today's deal"
"Andrew order flower shop to send my wife roses"
Format: <Name> <command> <company name> <connecting word> <action>
How can I use the Stanford NLP parser to extract these parts according to the format above? For example, after extraction, printing the action of each sentence should give {is today's deal, me send my wife roses}.
That's a hard task. If you have a very, very restricted set of sentences, you can try to use the parser's dependencies and model your problem with rules. However, I ran your sentence through the Stanford parser and got an obviously wrong result:
(ROOT
  (FRAG
    (NP
      (NP (NNP Andrew) (NN query) (NN pizza) (NN king))
      (SBAR
        (WHNP (WP what))
        (S
          (VP (VBZ is)
            (NP
              (NP (NN today) (POS 's))
              (NN deal))))))))
As you can see, it treats "Andrew query pizza king" as a single noun phrase; it would do the same with "Andrew dog carrot soup what is today's deal". It misses the verb "query", the target "pizza king", and so on.
Even if that worked, a syntactic parser models only syntax and ignores semantics. You should look into Semantic Role Labeling, Named Entity Recognition, Relation Extraction, etc. For your specific task you would most likely have to define your own semantics and then use a statistical algorithm to analyze the text and extract the needed information.
Here is a nice article about approaches to building chatbots: https://techinsight.com.vn/language/en/three-basic-nlp-problems-one-develops-chatbot-system-typical-approaches/
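For the very restricted case mentioned at the start of the answer, a rule-based matcher can be tiny. The sketch below does not use the Stanford parser at all: it simply assumes that the second word is a command from a small closed list and that the company name ends at the first connecting word from another closed list. Both word lists, and the class and field names, are made up for illustration. Any sentence outside the template returns null, which is exactly the brittleness the answer warns about.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class TemplateExtractor {
    // Hypothetical closed vocabularies for the <command> and <connecting word> slots.
    private static final Set<String> COMMANDS = Set.of("query", "order");
    private static final Set<String> CONNECTORS = Set.of("what", "to");

    // The five slots of <Name> <command> <company name> <connecting word> <action>.
    public static final class Slots {
        public final String name, command, company, connector, action;

        Slots(String name, String command, String company, String connector, String action) {
            this.name = name;
            this.command = command;
            this.company = company;
            this.connector = connector;
            this.action = action;
        }
    }

    // Returns null when the sentence does not fit the template.
    public static Slots parse(String sentence) {
        List<String> w = Arrays.asList(sentence.trim().split("\\s+"));
        if (w.size() < 5 || !COMMANDS.contains(w.get(1).toLowerCase())) {
            return null;
        }
        // The company name is at least one word, starting at index 2 and ending before the
        // first connecting word; the action is everything after that word (at least one word).
        for (int i = 3; i < w.size() - 1; i++) {
            if (CONNECTORS.contains(w.get(i).toLowerCase())) {
                return new Slots(w.get(0), w.get(1),
                        String.join(" ", w.subList(2, i)), w.get(i),
                        String.join(" ", w.subList(i + 1, w.size())));
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (String s : List.of("Andrew query pizza king what is today's deal",
                                "Andrew order flower shop to send my wife roses")) {
            Slots slots = parse(s);
            System.out.println(slots == null ? "no match" : slots.action);
        }
    }
}
```

Run on the two example sentences, this prints "is today's deal" and "send my wife roses". A company name that itself contains "to" or "what" would already break it.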

Named Entity Recognition for Arabic documents

I need your help, please. I am doing an NER project using NetBeans 8.0.2.
I need to extract person names and places from any Arabic document and categorize them as Person or Place. I looked at all the Stanford tools: the POS tagger, the parser, and Stanford NER. I tried them all, and the tagger works fine for me.
But I had problems with the parser, especially with this line of code from ParserDemo:
LexicalizedParser lp = LexicalizedParser.loadModel(grammar, options);
No output comes up. Do I need the parser to tokenize the document first and then use the POS tagger, or can I just use the POS tagger with some post-processing (for example, an if statement that combines adjacent NNP-tagged words, and the same for places)?
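The post-processing idea in the question (merging adjacent NNP tokens) can be sketched as follows. This is only a heuristic, not NER: it cannot tell a person from a place, and it assumes the tagger marks proper nouns with NNP, as the question implies. Words and tags are passed as two parallel lists so the sketch does not depend on any particular tagger output format; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ProperNounChunker {
    // Merges runs of adjacent tokens carrying targetTag (e.g. "NNP") into single phrases.
    public static List<String> chunk(List<String> words, List<String> tags, String targetTag) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            if (tags.get(i).equals(targetTag)) {
                // Extend the current run.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words.get(i));
            } else if (current.length() > 0) {
                // A different tag ends the run.
                chunks.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            chunks.add(current.toString()); // run that reaches the end of the sentence
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> words = List.of("Ahmed", "Ali", "visited", "Cairo", "yesterday");
        List<String> tags = List.of("NNP", "NNP", "VBD", "NNP", "NN");
        System.out.println(chunk(words, tags, "NNP"));
    }
}
```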
First of all, at the moment we do not have any Arabic NER models.
Secondly, I'll post some steps for running the Stanford parser on Arabic text.
Get the Stanford parser: http://nlp.stanford.edu/software/lex-parser.shtml
Compile ParserDemo.java; you need the jars in the stanford-parser-full-2015-04-20 directory on the classpath to compile it.
I ran this command from within the stanford-parser-full-2015-04-20 directory (do the analogous thing in NetBeans):
java -cp ".:*" ParserDemo edu/stanford/nlp/models/lexparser/arabicFactored.ser.gz data/arabic-onesent-utf8.txt
You should get a proper parse of the Arabic example sentence.
So when you run ParserDemo in NetBeans, make sure you provide "edu/stanford/nlp/models/lexparser/arabicFactored.ser.gz" as the first argument to ParserDemo, so that it knows to load the Arabic model.
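Why the first argument matters: as far as I can tell from the ParserDemo.java shipped with the parser, it falls back to an English PCFG model when given no arguments and otherwise loads args[0] as the model. The standalone sketch below mirrors only that model choice so it runs without the parser jars; the English default path is an assumption, so check it against your copy of ParserDemo.java.

```java
public class ParserDemoArgs {
    // Assumed default: the English PCFG model that ParserDemo loads when given no arguments.
    static final String DEFAULT_MODEL = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";

    // Model to load: the first argument if present, otherwise the English default.
    public static String modelPath(String[] args) {
        return args.length > 0 ? args[0] : DEFAULT_MODEL;
    }

    public static void main(String[] args) {
        // With the parser jars on the classpath, the model would then be loaded with:
        // LexicalizedParser lp = LexicalizedParser.loadModel(modelPath(args));
        System.out.println("model: " + modelPath(args));
    }
}
```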
For this input (roughly, "and spread justice through an independent judiciary"):
و نشر العدل من خلال قضاء مستقل
I get this output:
(ROOT
  (S (CC و)
    (VP (VBD نشر)
      (NP (DTNN العدل))
      (PP (IN من)
        (NP (NN خلال)
          (NP (NN قضاء) (JJ مستقل)))))
    (PUNC .)))
I am happy to help further; please let me know if you need more information.
FYI here is some more info on the Arabic parser:
http://nlp.stanford.edu/software/parser-arabic-faq.shtml

emacs lisp (elisp): modifying indent-for-tab-command in actionscript-mode

I'm using actionscript-mode in Emacs, found here: http://www.emacswiki.org/emacs/ActionScriptMode. I would like to modify it. The problem is the behavior of the TAB command, specifically when the cursor is in the first column of a line that already contains code. Most code-editing modes in Emacs handle this case by (1) calculating and adjusting the indentation, and (2) moving the cursor to the first non-whitespace character. For some reason, actionscript-mode does (1) but not (2). How can I change that? I know a little Emacs Lisp, but not enough to follow the code. I won't post the entire actionscript-mode.el, but there may be a clue in this section:
(defun actionscript-indent-line ()
  "Indent current line of As3 code. Delete any trailing
whitespace. Keep point at same relative point in the line."
  (interactive)
  (save-excursion
    (end-of-line)
    (delete-horizontal-space))
  (let ((old-pos (point)))
    (back-to-indentation)
    (let ((delta (- old-pos (point)))
          (col (max 0 (as3-calculate-indentation))))
      (indent-line-to col)
      (forward-char delta))))
The comment here says "keep point at same relative point in the line." Maybe I can just turn off that part.
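I figured it out. The command back-to-indentation moves the cursor to the first non-whitespace character on the line; the function measures the cursor's offset from there (delta) and restores it after reindenting. When the cursor starts in the leading whitespace, delta is negative, so the cursor is moved back in front of the code. I need to modify a few things: I want to preserve the relative position when the cursor is in the middle of the line, past the first non-whitespace character, but if the cursor is positioned before the first non-whitespace character, I want it to end up where back-to-indentation would put it.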
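The rule described above boils down to clamping the offset at zero. Here is a language-neutral sketch of that rule in Java; the class and parameter names are made up, and it assumes spaces-only indentation so that columns and character offsets coincide. In the elisp above, the equivalent change would presumably be replacing (forward-char delta) with (forward-char (max 0 delta)), since indent-line-to leaves point at the end of the indentation; treat that as an untested suggestion.

```java
public class IndentCursor {
    // Where point should land after the line is reindented from oldIndent to newIndent.
    // oldPoint, oldIndent and newIndent are offsets from the start of the line.
    public static int newPoint(int oldPoint, int oldIndent, int newIndent) {
        // Same quantity as `delta` in actionscript-indent-line: negative when point
        // was inside the leading whitespace.
        int delta = oldPoint - oldIndent;
        // Clamp at zero: from the whitespace, jump to the first non-whitespace character;
        // otherwise keep the same position relative to that character.
        return newIndent + Math.max(0, delta);
    }

    public static void main(String[] args) {
        System.out.println(newPoint(0, 4, 4)); // point in column 0 of an indented line: 4
        System.out.println(newPoint(6, 2, 4)); // point inside the code keeps its offset: 8
    }
}
```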

String data from a double single-quoted symbol in Scheme

I read some answers here and googled, but had no luck.
I have this:
''a
in Scheme (Chez Scheme, to be exact), and I want to turn it into a string (it's a case in my to-string lambda).
Checking whether it's a symbol (with the symbol? predicate) gives a positive answer, so I know when to handle it, but I can't do anything after that, since I find no way to get at the inner quote itself.
So basically I can't find a way to turn ''a into "a".
Hopefully this is simple; any help will be appreciated!
This expression:
''a
is shorthand for this expression (why quote twice, by the way? A single quote suffices):
(quote (quote a))
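Evaluating it yields the two-element list (quote a), not a symbol; its second element is the symbol a. So to turn it into a string, simply do this: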
(symbol->string (cadr ''a))
=> "a"
