Stanford Parser: sentence by sentence on command line - stanford-nlp

Is there a way to call the Stanford Parser from the command line so that it parses one sentence at a time and, if it runs into trouble with a specific sentence, just moves on to the next one?
UPDATE:
I have been adapting the script posted by StanfordNLP Help below. However, I noticed that with the latest version of CoreNLP (2015-04-20) there are problems with the CCprocessed dependencies: collapsing just does not appear to take place (if I grep for prep_ in the output, I find nothing).
Collapsing works with the 2015-04-20 release and the PCFG parser, for example, so I assume the issue is model-specific.
If I use the very same Java class with CoreNLP 2015-01-29 (with depparse.model changed to parse.model, and the original-dependencies part removed), collapsing works just fine. Maybe I am just using the parser in the wrong way, which is why I am re-posting here instead of starting a new question. Here is the updated code of the class:
import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.TreeCoreAnnotations.*;
import edu.stanford.nlp.util.*;

public class StanfordSafeLineExample {

    public static void main(String[] args) throws IOException {
        // build pipeline
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, depparse");
        props.setProperty("ssplit.eolonly", "true");
        props.setProperty("tokenize.whitespace", "false");
        props.setProperty("depparse.model", "edu/stanford/nlp/models/parser/nndep/english_SD.gz");
        props.setProperty("parse.originalDependencies", "true");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // open file
        BufferedReader br = new BufferedReader(new FileReader(args[0]));
        // go through each sentence; a failure only skips that line
        for (String line = br.readLine(); line != null; line = br.readLine()) {
            try {
                Annotation annotation = new Annotation(line);
                pipeline.annotate(annotation);
                CoreMap sentence = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0);
                System.out.println("sentence: " + line);
                // print index, word, POS tag, and lemma for each token
                for (CoreLabel token : annotation.get(CoreAnnotations.TokensAnnotation.class)) {
                    Integer identifier = token.get(CoreAnnotations.IndexAnnotation.class);
                    String word = token.get(CoreAnnotations.TextAnnotation.class);
                    String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                    String lemma = token.get(CoreAnnotations.LemmaAnnotation.class);
                    System.out.println(identifier + "\t" + word + "\t" + pos + "\t" + lemma);
                }
                SemanticGraph tree = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
                SemanticGraph tree2 = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
                System.out.println("---BASIC");
                System.out.println(tree.toString(SemanticGraph.OutputFormat.READABLE));
                System.out.println("---CCPROCESSED---");
                System.out.println(tree2.toString(SemanticGraph.OutputFormat.READABLE) + "</s>");
            } catch (Exception e) {
                System.out.println("Error with this sentence: " + line);
                System.out.println("");
            }
        }
        br.close();
    }
}

There are many ways to handle this.
The way I'd do it is to run the Stanford CoreNLP pipeline.
Here is where you can get the appropriate jar:
http://nlp.stanford.edu/software/corenlp.shtml
After you cd into the directory stanford-corenlp-full-2015-04-20, you can issue this command:
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,parse -ssplit.eolonly -outputFormat text -file sample_sentences.txt
sample_sentences.txt would have the sentences you want to parse, one sentence per line.
This will put the results in sample_sentences.txt.out, which you can extract with some light scripting.
If you change -outputFormat to json instead of text, you will get JSON which you can easily load and pull the parses from.
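For example, here is a minimal sketch of reading the parses back out of that JSON in Java. It assumes the Jackson library is on the classpath and that the output file is sample_sentences.txt.json with a top-level "sentences" array whose entries carry a "parse" field; check both assumptions against your actual output:
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;

public class ReadParses {
    public static void main(String[] args) throws Exception {
        // parse the whole CoreNLP output file into a JSON tree
        JsonNode root = new ObjectMapper().readTree(new File("sample_sentences.txt.json"));
        // print the constituency parse of each sentence
        for (JsonNode sentence : root.get("sentences")) {
            System.out.println(sentence.get("parse").asText());
        }
    }
}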
If you have any issues with this approach, let me know and I can update the answer to assist further or clarify!
UPDATE:
I am not sure exactly how you are running things, but these options could be helpful.
If you use -fileList to run the pipeline on a list of files rather than on a single file, and then add the flag -continueOnAnnotateError, it should just skip any bad file. That is progress, though admittedly not quite skipping the bad sentence; a sample invocation is sketched below.
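For instance (filelist.txt is a placeholder for your own list of input files, one path per line):
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,parse -ssplit.eolonly -fileList filelist.txt -continueOnAnnotateError -outputFormat text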
I wrote some Java for doing exactly what you need, so I'll try to post that in the next 24 hours in case you just want to use my whipped-together code. I'm still looking it over...

Here is some sample code for your needs:
import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.TreeCoreAnnotations.*;
import edu.stanford.nlp.util.*;

public class StanfordSafeLineExample {

    public static void main(String[] args) throws IOException {
        // build pipeline
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, depparse");
        props.setProperty("ssplit.eolonly", "true");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // open file
        BufferedReader br = new BufferedReader(new FileReader(args[0]));
        // go through each sentence; a parse failure only skips that line
        for (String line = br.readLine(); line != null; line = br.readLine()) {
            try {
                Annotation annotation = new Annotation(line);
                pipeline.annotate(annotation);
                CoreMap sentence = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0);
                SemanticGraph tree = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
                System.out.println("---");
                System.out.println("sentence: " + line);
                System.out.println(tree.toString(SemanticGraph.OutputFormat.READABLE));
            } catch (Exception e) {
                System.out.println("---");
                System.out.println("Error with this sentence: " + line);
            }
        }
        br.close();
    }
}
Instructions:
1. Cut and paste this into a file called StanfordSafeLineExample.java
2. Put that file in the directory stanford-corenlp-full-2015-04-20
3. Compile it: javac -cp "*:." StanfordSafeLineExample.java
4. Add your sentences, one sentence per line, to a file called sample_sentences.txt
5. Run it: java -cp "*:." StanfordSafeLineExample sample_sentences.txt

Related

Stanford Return all matched expressions

Is there a way to return all the matched expressions?
Consider the following sentence:
John Snow killed Ramsay Bolton
where
John-NNP, Snow-NNP, killed-VBD, Ramsay-NNP, Bolton-NNP
And I am using the following tag combinations as rules:
NNP-NNP
NNP-VBD
VBD-NNP
and the expected matched words from the above rules are:
John Snow, Snow killed, killed Ramsay, Ramsay Bolton
But using the code below, I am getting only this as matched expressions:
[John Snow, killed Ramsay]
Is there a way in Stanford to get all the expected matching words from the sentence? Here are the code and the rule file I am using right now:
import com.factweavers.multiterm.SetNLPAnnotators;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor;
import edu.stanford.nlp.ling.tokensregex.Env;
import edu.stanford.nlp.ling.tokensregex.NodePattern;
import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.List;
import java.util.regex.Pattern;

public class StanfordTest {
    public static void main(String[] args) {
        String rulesFile = "en.rules";
        // set up the TokensRegex environment and load the rules
        Env env = TokenSequencePattern.getNewEnv();
        env.setDefaultStringMatchFlags(NodePattern.NORMALIZE);
        env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
        env.bind("collapseExtractionRules", false);
        CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(env, rulesFile);

        String content = "John Snow killed Ramsay Bolton";
        Annotation document = new Annotation(content);
        SetNLPAnnotators snlpa = new SetNLPAnnotators();
        StanfordCoreNLP pipeline = snlpa.setAnnotators("tokenize, ssplit, pos, lemma, ner");
        pipeline.annotate(document);

        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        sentences.parallelStream().forEach(sentence -> {
            System.out.println(extractor.extractExpressions(sentence));
        });
    }
}
en.rules:
{
  ruleType: "tokens",
  pattern: ( [{tag:/VBD/}] [{tag:/NNP/}] ),
  result: "result1"
}
{
  ruleType: "tokens",
  pattern: ( [{tag:/NNP/}] [{tag:/VBD/}] ),
  result: "result2"
}
{
  ruleType: "tokens",
  pattern: ( [{tag:/NNP/}] [{tag:/NNP/}] ),
  result: "result3"
}
I think you need to create different extractors for the different things you want to match.
The issue here is that when you have two part-of-speech tag rule sequences that overlap like this, the first one that matches absorbs the tokens, preventing the second pattern from matching.
So if (NNP, NNP) is the first rule, "John Snow" gets matched, but then "Snow" is no longer available to be matched in "Snow killed".
If you have a set of patterns that overlap like this, you should disentangle them and put them in separate extractors.
So you can have a (noun, verb) extractor and a separate (noun, noun) extractor, for instance, as sketched below.
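Here is a minimal sketch of that idea. The rule file names are made up (each file would contain just one of the rules from en.rules above), and the custom SetNLPAnnotators wrapper from the question is replaced with a plain StanfordCoreNLP setup so the example is self-contained:
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor;
import edu.stanford.nlp.ling.tokensregex.Env;
import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.*;

public class SeparateExtractorsTest {
    public static void main(String[] args) {
        // Hypothetical file names: each file holds just ONE of the rules from en.rules
        List<String> ruleFiles = Arrays.asList("nnp_nnp.rules", "nnp_vbd.rules", "vbd_nnp.rules");
        List<CoreMapExpressionExtractor> extractors = new ArrayList<>();
        for (String f : ruleFiles) {
            Env env = TokenSequencePattern.getNewEnv();
            extractors.add(CoreMapExpressionExtractor.createExtractorFromFiles(env, f));
        }
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation document = new Annotation("John Snow killed Ramsay Bolton");
        pipeline.annotate(document);
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            // each extractor matches independently, so overlapping spans can all be found
            for (CoreMapExpressionExtractor extractor : extractors) {
                System.out.println(extractor.extractExpressions(sentence));
            }
        }
    }
}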

Method to display containing folder, size, and time of last modification (Java)

I have a program I am writing for a class. I have the first part down, but I need help with the code for displaying the containing folder, size, and time of last modification; those are the steps I need help writing.
Here is the challenge:
1. Create a file using any word-processing program or text editor. Write an application that displays the file's name, containing folder, size, and time of last modification.
Below is my code so far:
import java.nio.file.*;
import java.nio.file.attribute.*;
import java.io.IOException;
import static java.nio.file.AccessMode.*;

public class FileStatistics {
    public static void main(String[] args) {
        Path filePath = Paths.get("C:\\Users\\John\\Desktop\\N Drive\\St Leo Master folder\\COM-209\\module 6\\sixtestfile.txt");
        System.out.println("Path is " + filePath.toString());
        try {
            // check that the file can be read and executed
            filePath.getFileSystem().provider().checkAccess(filePath, READ, EXECUTE);
            System.out.println("File can be read & executed");
        } catch (IOException e) {
            System.out.println("File cannot be used in this app");
        }
    }
}
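A minimal sketch of the missing steps, using Files.readAttributes from java.nio.file for the size and modification time, and Path for the name and containing folder (the path is shortened from the question's for readability):
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class FileStatisticsSketch {
    public static void main(String[] args) {
        Path filePath = Paths.get("sixtestfile.txt"); // shortened path for the sketch
        try {
            // read all the basic attributes in one call
            BasicFileAttributes attrs = Files.readAttributes(filePath, BasicFileAttributes.class);
            System.out.println("File name: " + filePath.getFileName());
            // toAbsolutePath() so getParent() works even for a relative path
            System.out.println("Containing folder: " + filePath.toAbsolutePath().getParent());
            System.out.println("Size (bytes): " + attrs.size());
            System.out.println("Last modified: " + attrs.lastModifiedTime());
        } catch (IOException e) {
            System.out.println("File cannot be used in this app");
        }
    }
}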

How do you extract coreferences in CoreNLP 3.6.0? Missing documentation?

I'm trying to run the code found here: https://stanfordnlp.github.io/CoreNLP/coref.html
public class CorefExample {
    public static void main(String[] args) throws Exception {
        Annotation document = new Annotation("Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008.");
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,mention,coref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        pipeline.annotate(document);
        System.out.println("---");
        System.out.println("coref chains");
        for (CorefChain cc : document.get(CorefCoreAnnotations.CorefChainAnnotation.class).values()) {
            System.out.println("\t" + cc);
        }
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            System.out.println("---");
            System.out.println("mentions");
            for (Mention m : sentence.get(CorefCoreAnnotations.CorefMentionsAnnotation.class)) {
                System.out.println("\t" + m);
            }
        }
    }
}
However, the three required imports aren't found for me:
import edu.stanford.nlp.coref.CorefCoreAnnotations;
import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.coref.data.Mention;
I could use these imports instead:
import edu.stanford.nlp.dcoref.CorefCoreAnnotations;
import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.Mention;
But then an annotation is missing, specifically:
CorefCoreAnnotations.CorefMentionsAnnotation.class
Additionally, document.get(CorefCoreAnnotations.CorefChainAnnotation.class).values() returns null...
I think the problem is that I am using CoreNLP version 3.6.0, while this tutorial is for 3.7.0, I believe. Is there a similar example that uses version 3.6.0? If not, what changes do I need to make? I have a large pipeline set up, and I'm not sure how hard it would be to upgrade.
Thanks for any help!
Hi, I would recommend just upgrading to Stanford CoreNLP 3.7.0; it should not cause too many things to break.
One of the main changes is that we created a new package named edu.stanford.nlp.coref and put the code from edu.stanford.nlp.hcoref into it.
For the most part, things should be the same if you upgrade.
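If you do need to stay on 3.6.0, the same classes may be available under the older package name mentioned above; this is an assumption based on that rename, so check it against your jar:
// Assumption: in 3.6.0 the newer coref code still lives under hcoref
// (it was moved to edu.stanford.nlp.coref in 3.7.0, as described above)
import edu.stanford.nlp.hcoref.CorefCoreAnnotations;
import edu.stanford.nlp.hcoref.data.CorefChain;
import edu.stanford.nlp.hcoref.data.Mention;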

Stanford ParserAnnotator doesn't generate annotations

I'm beginning to learn the Stanford CoreNLP Java API, and am trying to print the syntax tree of a sentence. The syntax tree is supposed to be generated by the ParserAnnotator. In my code (posted below), the ParserAnnotator runs without errors but doesn't generate anything. The error only shows up when the code tries to get the label of the tree's root node, and the tree is revealed to be null. The components that run before it generate their annotations without any problems.
There was one other person on SO who had a problem with the ParserAnnotator, but the issue was with memory. I've increased the memory that I allow Eclipse to use, but the behavior is the same. Running the code in the debugger also did not yield any errors.
Some background information: The sentence I used was "This is a random sentence." I recently upgraded from Windows 8.1 to Windows 10.
public static void main(String[] args) {
    String sentence = "This is a random sentence.";
    Annotation doc = initStanford(sentence);
    Tree syntaxTree = doc.get(TreeAnnotation.class);
    printTreePreorder(syntaxTree);
}

private static Annotation initStanford(String sentence) {
    StanfordCoreNLP pipeline = pipeline("tokenize, ssplit, parse");
    Annotation document = new Annotation(sentence);
    pipeline.annotate(document);
    return document;
}

private static StanfordCoreNLP pipeline(String components) {
    Properties props = new Properties();
    props.put("annotators", components);
    return new StanfordCoreNLP(props);
}

public static void printTreePreorder(Tree tree) {
    System.out.println(tree.label());
    for (int i = 0; i < tree.numChildren(); i++) {
        printTreePreorder(tree.getChild(i));
    }
}
You're trying to get the tree off of the document (Annotation) rather than off the sentences (CoreMap). You can get the tree from the first sentence with:
Tree tree = doc.get(SentencesAnnotation.class).get(0).get(TreeAnnotation.class);
I can also shamelessly plug the Simple CoreNLP API:
Tree tree = new Sentence("this is a sentence").parse();
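Applied to the question's code, the fix is just to change how the tree is obtained in main (a sketch, keeping the question's static-import style):
public static void main(String[] args) {
    String sentence = "This is a random sentence.";
    Annotation doc = initStanford(sentence);
    // get the parse tree from the first sentence, not from the document itself
    Tree syntaxTree = doc.get(SentencesAnnotation.class).get(0).get(TreeAnnotation.class);
    printTreePreorder(syntaxTree);
}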

Modify file using Files.lines

I'd like to read in a file and replace some text with new text. It would be simple using asm and int 21h, but I want to use the new Java 8 streams.
Files.write(outf.toPath(),
        (Iterable<String>) Files.lines(inf)::iterator,
        CREATE, WRITE, TRUNCATE_EXISTING);
Somewhere in there I'd like a lines.replace("/*replace me*/","new Code()\n"). The newlines are there because I want to test inserting a block of code somewhere.
Here's a toy example that compiles but doesn't work the way I want it to. I just need a way to intercept the lines from the iterator and replace certain phrases with code blocks.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import static java.nio.file.StandardOpenOption.*;
import java.util.Arrays;
import java.util.stream.Stream;

public class FileStreamTest {
    public static void main(String[] args) {
        String[] ss = new String[]{"hi", "pls", "help", "me"};
        Stream<String> stream = Arrays.stream(ss);
        try {
            Files.write(Paths.get("tmp.txt"),
                    (Iterable<String>) stream::iterator,
                    CREATE, WRITE, TRUNCATE_EXISTING);
        } catch (IOException ex) {}

        //// I'd like to hook this next part into the Files.write part. /////
        // reset the stream (a stream can only be consumed once)
        stream = Arrays.stream(ss);
        Iterable<String> it = stream::iterator;
        // I'd like to replace some text before writing to the file
        for (String s : it) {
            System.out.println(s.replace("me", "my\nreal\nname"));
        }
    }
}
Edit: I've gotten this far, and it works. I was trying with filter, but maybe it isn't really necessary.
Files.write(Paths.get("tmp.txt"),
        (Iterable<String>) (stream.map(s -> s.replace("me", "my\nreal\nname")))::iterator,
        CREATE, WRITE, TRUNCATE_EXISTING);
The Files.write(..., Iterable, ...) method seems tempting here, but converting the Stream to an Iterable makes this cumbersome. It also "pulls" from the Iterable, which is a bit odd. It would make more sense if the file-writing method could be used as the stream's terminal operation, within something like forEach.
Unfortunately, most things that write throw IOException, which isn't permitted by the Consumer functional interface that forEach expects. But PrintWriter is different. At least, its writing methods don't throw checked exceptions, although opening one can still throw IOException. Here's how it could be used.
Stream<String> stream = ... ;
try (PrintWriter pw = new PrintWriter("output.txt", "UTF-8")) {
stream.map(s -> s.replaceAll("foo", "bar"))
.forEachOrdered(pw::println);
}
Note the use of forEachOrdered, which prints the output lines in the same order in which they were read, which is presumably what you want!
If you're reading lines from an input file, modifying them, and then writing them to an output file, it would be reasonable to put both files within the same try-with-resources statement:
try (Stream<String> input = Files.lines(Paths.get("input.txt"));
PrintWriter output = new PrintWriter("output.txt", "UTF-8"))
{
input.map(s -> s.replaceAll("foo", "bar"))
.forEachOrdered(output::println);
}
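For completeness, here is the same approach wrapped as a self-contained class (input.txt, output.txt, and the foo/bar replacement are the placeholders used above):
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ReplaceLines {
    public static void main(String[] args) throws Exception {
        // read lazily, rewrite each line, and write in the original order
        try (Stream<String> input = Files.lines(Paths.get("input.txt"));
             PrintWriter output = new PrintWriter("output.txt", "UTF-8")) {
            input.map(s -> s.replaceAll("foo", "bar"))
                 .forEachOrdered(output::println);
        }
    }
}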
