Quote Annotator get author - stanford-nlp

In the following text:
John said, "There's an elephant outside the window."
Is there a simple way to figure out that the quote "There's an elephant outside the window." belongs to John?

We've just added a module for handling this.
You'll need to get the latest code from GitHub.
Here is some sample code:
package edu.stanford.nlp.examples;
import edu.stanford.nlp.coref.*;
import edu.stanford.nlp.coref.data.*;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.util.*;
import edu.stanford.nlp.pipeline.*;
import java.util.*;
public class QuoteAttributionExample {
public static void main(String[] args) {
Annotation document =
new Annotation("John said, \"There's an elephant outside the window.\"");
Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions,quote,quoteattribution");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);
for (CoreMap quote : document.get(CoreAnnotations.QuotationsAnnotation.class)) {
System.out.println(quote);
System.out.println(quote.get(QuoteAttributionAnnotator.MentionAnnotation.class));
}
}
}
This is still under development, we will probably add some code to make it easier to get the actual text span that links to the quote soon.

Related

how to give flag option in stanford-nlp program?

The site suggest that i can use several flags
https://nlp.stanford.edu/software/openie.html
But how to use it, I tried doing it this way
import edu.stanford.nlp.ie.util.RelationTriple;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.naturalli.NaturalLogicAnnotations;
import edu.stanford.nlp.util.CoreMap;
import java.util.Collection;
import java.util.Properties;
/**
* A demo illustrating how to call the OpenIE system programmatically.
*/
public class OpenIEDemo {
public static void main(String[] args) throws Exception {
// Create the Stanford CoreNLP pipeline
Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
props.setProperty("openieformat","ollie");
props.setProperty("openieresolve_coref","1");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// Annotate an example document.
Annotation doc = new Annotation("Obama was born in Hawaii. He is our president.");
pipeline.annotate(doc);
// Loop over sentences in the document
for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
// Get the OpenIE triples for the sentence
Collection<RelationTriple> triples = sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);
// Print the triples
for (RelationTriple triple : triples) {
System.out.println(triple.confidence + "\t" +
triple.subjectLemmaGloss() + "\t" +
triple.relationLemmaGloss() + "\t" +
triple.objectLemmaGloss());
}
}
}
}
I have added
props.setProperty("openieformat","ollie");
props.setProperty("openieresolve_coref","1");
But its not working
For StanfordCoreNLP, flags/properties for individual annotators are set with an annotator.flag name. And boolean flags have value "false" or "true". So, what you have is close to right, but needs to be:
props.setProperty("openie.format","ollie");
props.setProperty("openie.resolve_coref","true");

Stanford CoreNLP Chinese coreference resolution

I want to use stanford coreNLP to process some Chinese Coreference resolution.My code is below:
import java.util.Properties;
import edu.stanford.nlp.coref.CorefCoreAnnotations;
import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.coref.data.Mention;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
public class CorefTest {
public static void main(String[] args) throws Exception {
StanfordCoreNLP pipline = new StanfordCoreNLP("StanfordCoreNLP-chinese.properties");
Annotation document = new Annotation("奥巴马出生在夏威夷,他是美国总统,他在2008年当选");
Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,mention,coref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);
System.out.println("---");
System.out.println("coref chains");
for (CorefChain cc : document.get(CorefCoreAnnotations.CorefChainAnnotation.class).values()) {
System.out.println("\t" + cc);
}
for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
System.out.println("---");
System.out.println("mentions");
for (Mention m : sentence.get(CorefCoreAnnotations.CorefMentionsAnnotation.class)) {
System.out.println("\t" + m);
}
}
}
}
And it give the result :
---
coref chains
---
mentions
奥巴马出生在夏威夷 , 他是美国总统 , 他在2008年当选
Where nothing in the coref chains,I do set the environment right, and it does support chinese , but how can I get coref chains right?
You are making the pipeline twice in your code.
Note: make sure to add this to your code:
import edu.stanford.nlp.util.StringUtils;
You code should be like this:
Properties props = StringUtils.argsToProperties("-props", "StanfordCoreNLP-chinese.properties");
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,mention,coref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

How to use StanfordNLP Chinese segmentor in Java?

I have tried the following code, however the code does not work and only outputs null.
String text = "我爱北京天安门。";
StanfordCoreNLP pipeline = new StanfordCoreNLP();
Annotation annotation = pipeline.process(text);
String result = annotation.get(CoreAnnotations.ChineseSegAnnotation.class);
System.out.println(result);
The result:
...
done [0.6 sec].
Using mention detector type: rule
null
How to use StanfordNLP Chinese segmentor correctly?
Some sample code:
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.util.StringUtils;
import java.util.*;
public class ChineseSegmenter {
public static void main (String[] args) {
// set the properties to the standard Chinese pipeline properties
Properties props = StringUtils.argsToProperties("-props", "StanfordCoreNLP-chinese.properties");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String text = "...";
Annotation annotation = new Annotation(text);
pipeline.annotate(annotation);
List<CoreLabel> tokens = annotation.get(CoreAnnotations.TokensAnnotation.class);
for (CoreLabel token : tokens)
System.out.println(token);
}
}
Note: Make sure the Chinese models jar is on your CLASSPATH. That file is available here: http://stanfordnlp.github.io/CoreNLP/download.html
The above code should print out the tokens created after the Chinese segmenter is run.

How to split the result of PTBTokenizer into sentences?

I know I could use DocumentPreprocessor to split a text into sentence. But it does not provide enough information if one wants to convert the tokenized text back to the original text. So I have to use PTBTokenizer, which has an invertible option.
However, PTBTokenizer simply returns an iterator of all the tokens (CoreLabels) in a document. It does not split the document into sentences.
The documentation says:
The output of PTBTokenizer can be post-processed to divide a text into sentences.
But this is obviously not trivial.
Is there a class in the Stanford NLP library that can take as input a sequence of CoreLabels, and output sentences? Here's what I mean exactly:
List<List<CoreLabel>> split(List<CoreLabel> documentTokens);
I would suggest you use the StanfordCoreNLP class. Here is some sample code:
import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.ling.CoreAnnotations.*;
import edu.stanford.nlp.util.*;
public class PipelineExample {
public static void main (String[] args) throws IOException {
// build pipeline
Properties props = new Properties();
props.setProperty("annotators","tokenize, ssplit, pos");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String text = " I am a sentence. I am another sentence.";
Annotation annotation = new Annotation(text);
pipeline.annotate(annotation);
System.out.println(annotation.get(TextAnnotation.class));
List<CoreMap> sentences = annotation.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
System.out.println(sentence.get(TokensAnnotation.class));
for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
System.out.println(token.after() != null);
System.out.println(token.before() != null);
System.out.println(token.beginPosition());
System.out.println(token.endPosition());
}
}
}
}

incompatible types: Object cannot be converted to CoreLabel

I'm trying to use the Stanford tokenizer with the following example from their website:
import java.io.FileReader;
import java.io.IOException;
import java.util.List;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.process.PTBTokenizer;
public class TokenizerDemo {
public static void main(String[] args) throws IOException {
for (String arg : args) {
// option #1: By sentence.
DocumentPreprocessor dp = new DocumentPreprocessor(arg);
for (List sentence : dp) {
System.out.println(sentence);
}
// option #2: By token
PTBTokenizer ptbt = new PTBTokenizer(new FileReader(arg),
new CoreLabelTokenFactory(), "");
for (CoreLabel label; ptbt.hasNext(); ) {
label = ptbt.next();
System.out.println(label);
}
}
}
}
and I get the following error when I try to compile it:
TokenizerDemo.java:24: error: incompatible types: Object cannot be converted to CoreLabel
label = ptbt.next();
Does anyone know what the reason might be? In case you are interested, I'm using Java 1.8 and made sure that CLASSPATH contains the jar file.
Try parameterizing the PTBTokenizer class. For example:
PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(new FileReader(arg),
new CoreLabelTokenFactory(), "");

Resources