Running
./corenlp.sh -annotators quote -outputFormat xml -file input.txt
on the modified input file
"Stanford University" is located in California. It is a great university, founded in 1891.
yields the following output:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="CoreNLP-to-HTML.xsl" type="text/xsl"?>
<root>
<document>
<sentences/>
</document>
</root>
Maybe I misunderstood the intended use of this annotator, but I expected it to mark the parts of the sentence that is between the ".
When I run the script with the "usual" annotators tokenize,ssplit,pos,lemma,ner, they are all working well, but adding quote does not change the output. I use the stanford-corenlp-full-2015-12-09 release.
How can I use the quote annotator and what is it meant to do?
If you build a StanfordCoreNLP object in Java code and run it with the quote annotator, the final Annotation object will have the quotes.
import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.trees.TreeCoreAnnotations.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.ling.CoreAnnotations.*;
import edu.stanford.nlp.util.*;
public class PipelineExample {
public static void main (String[] args) throws IOException {
// build pipeline
Properties props = new Properties();
props.setProperty("annotators","tokenize, ssplit, quote");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String text = "\"Stanford University\" is located in California. It is a great university, founded in 1891.";
Annotation annotation = new Annotation(text);
pipeline.annotate(annotation);
System.out.println(annotation.get(CoreAnnotations.QuotationsAnnotation.class));
}
}
Currently none of the outputters (json, xml, text, etc...) output the quotes. I'll make a note we should add this to the output for future versions.
Related
I have migrated an extension from quarkus 2.7.5 to quarkus 2.8.0.
After the migration, I run mvn clean install and the console shows me weird warnings about all (maybe 100) config properties (some of them are defined by me, other not like java.specification.version):
[WARNING] [io.quarkus.config] Unrecognized configuration key "my.property" was provided; it will be ignored; verify that the dependency extension for this configuration is set or that you did not make a typo
I think my integration-tests module causes this issue.
Here is my class in runtime folder:
import java.util.Optional;
import io.quarkus.runtime.annotations.ConfigItem;
import io.quarkus.runtime.annotations.ConfigPhase;
import io.quarkus.runtime.annotations.ConfigRoot;
#ConfigRoot(phase = ConfigPhase.RUN_TIME, prefix="", name = "myApp")
public class MyAppConfig {
#ConfigItem(defaultValue = "defaultValue")
String firstProperty;
#ConfigItem(defaultValue = "")
Optional<String> secondProperty;
#ConfigItem(defaultValue = "defaultValue")
String thirdProperty;
// Getters ...
}
Here is my test:
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.eclipse.microprofile.config.inject.ConfigProperty;
import org.junit.jupiter.api.Test;
import io.quarkus.test.junit.QuarkusTest;
#QuarkusTest
public class MyAppIntegrationTest {
#ConfigProperty(name="myApp.first-property")
String firstProperty;
#ConfigProperty(name="myApp.second-property")
String secondProperty;
#ConfigProperty(name="myApp.third-property")
String thirdProperty;
#Test
public void testConfig() {
assertEquals("firstValue", firstProperty);
assertEquals("secondValue", secondProperty);
assertEquals("thirdValue", thirdProperty);
}
}
Can someone help me on this ? For instance, do I need a BuildItem for that ?
Thanks for your help
After spending hours on those warnings, I found that they was caused by the empty prefix field of the ConfigRoot annotation.
Setting name field to "" and prefix to "myApp" fixed the issue
I noticed that corenlp.run can identify "10am tomorrow" and parse it out as time. But the training tutorial and the docs I've seen only allow for 1 word per line. How do I get it to understand a phrase.
On a related note, is there a way to tag compound entities?
Time related phrases like that are recognized by the SUTime library. More details can be found here: https://nlp.stanford.edu/software/sutime.html
There is functionality for extracting entities after the ner tagging has been done.
For instance if you have tagged a sentence: Joe Smith went to Hawaii . as PERSON PERSON O O LOCATION O you can extract out Joe Smith and Hawaii. This requires the entitymentions annotator.
Here is some example code:
package edu.stanford.nlp.examples;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.util.*;
import java.util.*;
public class EntityMentionsExample {
public static void main(String[] args) {
Annotation document =
new Annotation("John Smith visited Los Angeles on Tuesday.");
Properties props = new Properties();
//props.setProperty("regexner.mapping", "small-names.rules");
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);
for (CoreMap entityMention : document.get(CoreAnnotations.MentionsAnnotation.class)) {
System.out.println(entityMention);
//System.out.println(entityMention.get(CoreAnnotations.TextAnnotation.class));
System.out.println(entityMention.get(CoreAnnotations.EntityTypeAnnotation.class));
}
}
}
I want to combine consecutive tokens with the same named entity annotation (say, STANFORD UNIVERSITY, where both tokens "stanford" and "university" have NE "ORGANIZATION") into a single token, so that I just have "STANFORD UNIVERSITY" with NE "ORGANIZATION". Is there a way to do that with tokens regex?
So, this really is a two-part question:
1) How would you write the pattern for an unbroken sequence of tokens with the same NER?
2) How would you write the action to combine captured tokens into one (basically, do the opposite of the Split function)?
Thanks!
You want to use the entitymentions annotator, which will do this for you and extract full entities from the text.
sample code:
package edu.stanford.nlp.examples;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.util.*;
import java.util.*;
public class EntityMentionsExample {
public static void main(String[] args) {
Annotation document =
new Annotation("John Smith visted Los Angeles on Tuesday.");
Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);
for (CoreMap entityMention : document.get(CoreAnnotations.MentionsAnnotation.class)) {
System.out.println(entityMention);
}
}
}
When using the Spock #Stepwise annotation, is there any way to configure it to not fail the entire testsuite after a single test fails?
Decided to just create a new extension called #StepThrough. All I needed to do was subclass StepwiseExtension and take out the line of code that was failing the entire test suite. Pasted code below...
StepThrough.groovy
package com.test.SpockExtensions
import org.spockframework.runtime.extension.ExtensionAnnotation
import java.lang.annotation.ElementType
import java.lang.annotation.Retention
import java.lang.annotation.RetentionPolicy
import java.lang.annotation.Target
/**
* Created by jchertkov on 6/22/15.
*/
#Target(ElementType.TYPE)
#Retention(RetentionPolicy.RUNTIME)
#ExtensionAnnotation(StepThroughExtension.class)
public #interface StepThrough {}
StepThroughExtension.groovy
package com.test.SpockExtensions
import org.spockframework.runtime.extension.builtin.StepwiseExtension
import org.spockframework.runtime.model.SpecInfo
import java.lang.annotation.Annotation
/**
* Created by jchertkov on 6/22/15.
*/
public class StepThroughExtension extends StepwiseExtension {
public void visitSpecAnnotation(Annotation annotation, final SpecInfo spec) {
sortFeaturesInDeclarationOrder(spec);
includeFeaturesBeforeLastIncludedFeature(spec);
}
}
Notes:
I put the code into a package called com.test.SpockExtensions. You will need to do the same with whatever name you would like.
Java users - just change filetype from .groovy to .java
I have a YAML configuration file with a map of properties:
properties:
a.b.c: 1
Boot will parse this as:
{a:{b:{c:1}}}
However, what I desire is :
{'a.b.c': 1}
Is there anyway to coax it into "pass through" key mode? Quoting the key doesn't seem to help.
Update
Actual example below.
Java
import static com.google.common.collect.Maps.newLinkedHashMap;
import java.util.Map;
import javax.annotation.PostConstruct;
import lombok.Data;
import lombok.val;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Configuration;
#Data
#Configuration
#ConfigurationProperties("hadoop")
public class HadoopProperties {
private Map<Object, Object> properties = newLinkedHashMap();
}
YAML
application.yml:
hadoop:
properties:
fs.defaultFS: hdfs://localhost:8020
mapred.job.tracker: localhost:8021
Result
Calling toString() on the resulting object:
HadoopProperties(properties={fs={defaultFS=hdfs://localhost:8020}, mapred={job={tracker=localhost:8021}}})
I see. It's because you are binding to a very generic object, so Spring Boot thinks your period separators are map key dereferences. What happens if you bind to Map or Properties?
you can use as below
hadoop:
properties:
"[fs.defaultFS]" : hdfs://localhost:8020
"[mapred.job.tracker]": localhost:8021
reference - link