Hi, I am using the pipeline object to parse forum posts. For each one I do the following:
Annotation document = new Annotation(post);
mPipeline.annotate(document); // annotate the post text
I would like each call to annotate to time out after a few seconds.
I have followed the example at line 65 of https://github.com/stanfordnlp/CoreNLP/blob/master/itest/src/edu/stanford/nlp/pipeline/ParserAnnotatorITest.java
So I am creating the pipeline object as follows:
Properties props = new Properties();
props.setProperty("parse.maxtime", "30");
props.setProperty("dcoref.maxtime", "30");
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
mPipeline = new StanfordCoreNLP(props);
However, when I add the maxtime properties I get the following exception:
Exception in thread "main" java.lang.NullPointerException
at edu.stanford.nlp.pipeline.SentenceAnnotator.annotate(SentenceAnnotator.java:64)
at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:68)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:412)
Without the maxtime option there is no exception.
How can I set maxtime properly?
Thanks
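One stop-gap, while maxtime misbehaves, is to enforce the timeout outside CoreNLP. Below is a minimal sketch using a plain Java executor (mPipeline and post are from the snippet above; the three-second limit is a placeholder):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Run annotate() on a worker thread and give up if it overruns.
ExecutorService executor = Executors.newSingleThreadExecutor();
Annotation document = new Annotation(post);
Future<?> task = executor.submit(() -> mPipeline.annotate(document));
try {
    task.get(3, TimeUnit.SECONDS); // placeholder limit
} catch (TimeoutException e) {
    task.cancel(true); // interrupt the worker and skip this post
}

Note that cancel(true) only interrupts the worker thread; if an annotator ignores interruption, its work continues in the background, so this is a workaround rather than a substitute for a working maxtime.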
I am trying to train custom relations in Stanford CoreNLP using the birthplace model.
I have gone through this documentation, which walks us through making a properties file (similar to roth.properties) as follows:
#Below are some basic options. See edu.stanford.nlp.ie.machinereading.MachineReadingProperties class for more options.
# Pipeline options
annotators = pos, lemma, parse
parse.maxlen = 100
# MachineReading properties. You need one class to read the dataset into correct format. See edu.stanford.nlp.ie.machinereading.domains.ace.AceReader for another example.
datasetReaderClass = edu.stanford.nlp.ie.machinereading.domains.roth.RothCONLL04Reader
#Data directory for training. The datasetReaderClass reads data from this path and makes corresponding sentences and annotations.
trainPath = "D:\\stanford-corenlp-full-2017-06-09\\birthplace.corp"
#Whether to crossValidate, that is evaluate, or just train.
crossValidate = false
kfold = 10
#Change this to true if you want to use CoreNLP pipeline generated NER tags. The default model generated with the relation extractor release uses the CoreNLP pipeline provided tags (option set to true).
trainUsePipelineNER=false
# where to save training sentences. uses the file if it exists, otherwise creates it.
serializedTrainingSentencesPath = "D:\\stanford-corenlp-full-2017-06-09\\rel\\sentences.ser"
serializedEntityExtractorPath = "D:\\stanford-corenlp-full-2017-06-09\\rel\\entity_model.ser"
# where to store the output of the extractor (sentence objects with relations generated by the model). This is what you will use as the model when using 'relation' annotator in the CoreNLP pipeline.
serializedRelationExtractorPath = "D:\\stanford-corenlp-full-2017-06-09\\rel\\roth_relation_model_pipeline.ser"
# uncomment to load a serialized model instead of retraining
# loadModel = true
#relationResultsPrinters = edu.stanford.nlp.ie.machinereading.RelationExtractorResultsPrinter,edu.stanford.nlp.ie.machinereading.domains.roth.RothResultsByRelation. For printing output of the model.
relationResultsPrinters = edu.stanford.nlp.ie.machinereading.RelationExtractorResultsPrinter
#In this domain, this is trivial since all the entities are given (or set using CoreNLP NER tagger).
entityClassifier = edu.stanford.nlp.ie.machinereading.domains.roth.RothEntityExtractor
extractRelations = true
extractEvents = false
#We are setting the entities beforehand so the model does not learn how to extract entities etc.
extractEntities = false
#Opposite of crossValidate.
trainOnly=true
# The set chosen by feature selection using RothCONLL04:
relationFeatures = arg_words,arg_type,dependency_path_lowlevel,dependency_path_words,surface_path_POS,entities_between_args,full_tree_path
# The above features plus the features used in Bjorne BioNLP09:
# relationFeatures = arg_words,arg_type,dependency_path_lowlevel,dependency_path_words,surface_path_POS,entities_between_args,full_tree_path,dependency_path_POS_unigrams,dependency_path_word_n_grams,dependency_path_POS_n_grams,dependency_path_edge_lowlevel_n_grams,dependency_path_edge-node-edge-grams_lowlevel,dependency_path_node-edge-node-grams_lowlevel,dependency_path_directed_bigrams,dependency_path_edge_unigrams,same_head,entity_counts
I am executing this command in my directory D:\stanford-corenlp-full-2017-06-09:
D:\stanford-corenlp-full-2017-06-09\stanford-corenlp-3.8.0\edu\stanford\nlp>java -cp classpath edu.stanford.nlp.ie.machinereading.MachineReading --arguments roth.properties
and I am getting this error:
Error: Could not find or load main class edu.stanford.nlp.ie.machinereading.MachineReading
Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.ie.machinereading.MachineReading
Also, I have tried to programmatically train the custom relation model with the C# code below:
using java.util;
using System.Collections.Generic;

namespace StanfordRelationDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            string jarRoot = @"D:\Stanford English Model\stanford-english-corenlp-2018-10-05-models\";
            string modelsDirectory = jarRoot + @"edu\stanford\nlp\models";
            string sutimeRules = modelsDirectory + @"\sutime\defs.sutime.txt,"
                //+ modelsDirectory + @"\sutime\english.holidays.sutime.txt,"
                + modelsDirectory + @"\sutime\english.sutime.txt";

            Properties props = new Properties();
            props.setProperty("annotators", "pos, lemma, parse");
            props.setProperty("parse.maxlen", "100");
            props.setProperty("datasetReaderClass", "edu.stanford.nlp.ie.machinereading.domains.roth.RothCONLL04Reader");
            props.setProperty("trainPath", "D://Stanford English Model//stanford-english-corenlp-2018-10-05-models//edu//stanford//nlp//models//birthplace.corp");
            props.setProperty("crossValidate", "false");
            props.setProperty("kfold", "10");
            props.setProperty("trainOnly", "true");
            props.setProperty("trainUsePipelineNER", "true");
            props.setProperty("serializedTrainingSentencesPath", "D://Stanford English Model//stanford-english-corenlp-2018-10-05-models//edu//stanford//nlp//models//rel//sentences.ser");
            props.setProperty("serializedEntityExtractorPath", "D://Stanford English Model//stanford-english-corenlp-2018-10-05-models//edu//stanford//nlp//models//rel//entity_model.ser");
            props.setProperty("serializedRelationExtractorPath", "D://Stanford English Model//stanford-english-corenlp-2018-10-05-models//edu//stanford//nlp//models//rel//roth_relation_model_pipeline.ser");
            props.setProperty("relationResultsPrinters", "edu.stanford.nlp.ie.machinereading.RelationExtractorResultsPrinter");
            props.setProperty("entityClassifier", "edu.stanford.nlp.ie.machinereading.domains.roth.RothEntityExtractor");
            props.setProperty("extractRelations", "true");
            props.setProperty("extractEvents", "false");
            props.setProperty("extractEntities", "false");
            props.setProperty("relationFeatures", "arg_words,arg_type,dependency_path_lowlevel,dependency_path_words,surface_path_POS,entities_between_args,full_tree_path");

            // Flatten the properties into a "-key value" argument array for MachineReading.
            var propertyKeys = props.keys();
            var propertyStringArray = new List<string>();
            while (propertyKeys.hasMoreElements())
            {
                var key = propertyKeys.nextElement();
                propertyStringArray.Add($"-{key}");
                propertyStringArray.Add(props.getProperty(key.ToString(), string.Empty));
            }

            var machineReader = edu.stanford.nlp.ie.machinereading.MachineReading.makeMachineReading(propertyStringArray.ToArray());
            var utestResultList = machineReader.run();
        }
    }
}
I am getting this exception:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Unhandled Exception: edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file) --->
java.io.IOException: Unable to open "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as class path, filename or URL
at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(String textFileOrUrl)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(Properties config, String modelFileOrUrl, Boolean printLoading)
--- End of inner exception stack trace ---
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(Properties config, String modelFileOrUrl, Boolean printLoading)
at edu.stanford.nlp.tagger.maxent.MaxentTagger..ctor(String modelFile, Properties config, Boolean printLoading)
at edu.stanford.nlp.tagger.maxent.MaxentTagger..ctor(String modelFile)
at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(String, Boolean)
at edu.stanford.nlp.pipeline.POSTaggerAnnotator..ctor(String annotatorName, Properties props)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.posTagger(Properties properties)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$42(Properties, AnnotatorImplementations)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<>Anon4.apply(Object, Object)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getDefaultAnnotatorPool$65(Entry, Properties, AnnotatorImplementations)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<>Anon27.get()
at edu.stanford.nlp.util.Lazy.3.compute()
at edu.stanford.nlp.util.Lazy.get()
at edu.stanford.nlp.pipeline.AnnotatorPool.get(String name)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(Properties, Boolean, AnnotatorImplementations, AnnotatorPool)
at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props, Boolean enforceRequirements, AnnotatorPool annotatorPool)
at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props, Boolean enforceRequirements)
at edu.stanford.nlp.ie.machinereading.MachineReading.makeMachineReading(String[] args)
at StanfordRelationDemo.Program.Main(String[] args) in C:\Users\m1039332\Documents\Visual Studio 2017\Projects\StanfordRelationDemo\StanfordRelationDemo\Program.cs:line 46
I am thus simply unable to train the custom relation model using CoreNLP. If there are any obvious mistakes I am making, I would appreciate it if anybody could point them out.
I don't think the machine reading code is distributed with the standard distribution.
You should build a jar from the full GitHub sources:
https://github.com/stanfordnlp/CoreNLP/tree/master/src/edu/stanford/nlp/ie/machinereading
I am trying to run a topology which has a windowed bolt, but I am getting the following exception:
Exception in thread "main" java.lang.NullPointerException
at org.apache.storm.topology.WindowedBoltExecutor.declareOutputFields(WindowedBoltExecutor.java:309)
at org.apache.storm.topology.TopologyBuilder.getComponentCommon(TopologyBuilder.java:432)
at org.apache.storm.topology.TopologyBuilder.createTopology(TopologyBuilder.java:120)
at Main.main(Main.java:23)
I have created a custom windowed bolt by extending BaseWindowedBolt.
Topology code:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("integer", new RandomIntegerSpout(), 1);
builder.setBolt("tumblingsum", new CustomTumblingSumWindow()
        .withTumblingWindow(new Duration(10, TimeUnit.SECONDS)), 1)
        .shuffleGrouping("integer");
builder.setBolt("final", new ResultBolt(), 1).shuffleGrouping("tumblingsum");

Config config = new Config();
config.put(Config.TOPOLOGY_WORKERS, 1);
StormSubmitter.submitTopology("Test-Windowing-Topology", config, builder.createTopology());
Storm version is 1.2.2.
If I run the above topology without the windowed bolt, it works.
Am I missing anything?
Thanks
The line you're getting an exception from is https://github.com/apache/storm/blob/v1.2.2/storm-core/src/jvm/org/apache/storm/topology/WindowedBoltExecutor.java#L309.
My guess would be that your bolt is returning null from getComponentConfiguration. This looks like a bug, but you can work around it by returning an empty map from getComponentConfiguration.
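For illustration, a minimal sketch of that workaround (assuming your bolt currently overrides getComponentConfiguration and returns null). Falling back to super's map rather than a bare empty one also preserves the window settings that BaseWindowedBolt stores via withTumblingWindow:

import java.util.HashMap;
import java.util.Map;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseWindowedBolt;
import org.apache.storm.windowing.TupleWindow;

public class CustomTumblingSumWindow extends BaseWindowedBolt {

    @Override
    public void execute(TupleWindow inputWindow) {
        // ... your existing summing logic ...
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // ... your existing field declarations ...
    }

    // Workaround: never return null here. WindowedBoltExecutor.declareOutputFields
    // dereferences this map, so a null triggers the NullPointerException above.
    @Override
    public Map<String, Object> getComponentConfiguration() {
        Map<String, Object> conf = super.getComponentConfiguration();
        return conf != null ? conf : new HashMap<String, Object>();
    }
}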
Raised https://issues.apache.org/jira/browse/STORM-3211 to fix it.
// args[0] is expected to be a tagger model file; args[1] is the text file to tag
MaxentTagger tagger = new MaxentTagger(args[0]);
TokenizerFactory<CoreLabel> ptbTokenizerFactory = PTBTokenizer.factory(new CoreLabelTokenFactory(),
"untokenizable=noneKeep");
BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream(args[1]), "utf-8"));
PrintWriter pw = new PrintWriter(new OutputStreamWriter(System.out, "utf-8"));
DocumentPreprocessor documentPreprocessor = new DocumentPreprocessor(r);
documentPreprocessor.setTokenizerFactory(ptbTokenizerFactory);
for (List<HasWord> sentence : documentPreprocessor) {
List<TaggedWord> tSentence = tagger.tagSentence(sentence);
pw.println(Sentence.listToString(tSentence, false));
}
It fails with the following exception:
Reading POS tagger model from C:\work\development\workspace\stanfordnlp\sample.txt ...
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:869)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:767)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:298)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:263)
at phoenix.TokenizerDemo.main(TokenizerDemo.java:42)
Caused by: java.io.StreamCorruptedException: invalid stream header: 416E6F74
at java.io.ObjectInputStream.readStreamHeader(Unknown Source)
at java.io.ObjectInputStream.<init>(Unknown Source)
at edu.stanford.nlp.tagger.maxent.TaggerConfig.readConfig(TaggerConfig.java:748)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:804)
... 4 more
The log should clearly indicate the problem:
Reading POS tagger model from C:\work\development\workspace\stanfordnlp\sample.txt ...
You are incorrectly instantiating the MaxentTagger instance. If you provide a single string argument to the constructor, that string is expected to provide a path to a tagger model file.
See the documentation for MaxentTagger for more information.
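For example, a corrected instantiation (a sketch; the model path below is the stock English tagger from the CoreNLP models jar, matching the path seen in the stack trace earlier in this document, and must be on the classpath):

// Load a real tagger model, not the input text file.
MaxentTagger tagger = new MaxentTagger(
        "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger");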
Which properties do I need to use in my code to get the same result as the online parser (http://nlp.stanford.edu:8080/parser/index.jsp)?
Sentence: Wonderful Doctor. I cant say enough good things about him.
Online parse:
compound(Doctor-2, Wonderful-1)
root(ROOT-0, Doctor-2)
nsubj(say-3, I-1)
advmod(say-3, cant-2)
root(ROOT-0, say-3)
advmod(good-5, enough-4)
amod(things-6, good-5)
dobj(say-3, things-6)
case(him-8, about-7)
nmod(things-6, him-8)
Code:
I am using Stanford CoreNLP 3.5.2. I tried several properties as below, but I am getting different results.
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, depparse");
//props.put("ner.model", "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz");
// props.put("annotators", "tokenize, ssplit, pos, lemma, ner, depparse", "parse","sentiment, dcoref");
//props.put("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
//props.put("parse.model","edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
//props.put("tokenize.options", "ptb3Escaping=false");
//props.put("parse.maxlen", "10000");
//props.put("ner.applyNumericClassifiers", "true");
//props.put("depparse.model", "edu/stanford/nlp/models/parser/nndep/english_SD.gz");
//props.put("depparse.extradependencies", "ref_only_uncollapsed");
props.put("depparse.extradependencies", "MAXIMAL");
//props.put("depparse.originalDependencies", false);
props.put("parse.originalDependencies", false);
Any advice would be appreciated.
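In the absence of a confirmed answer, here is a minimal sketch to compare against the demo output. It is an assumption (not verified) that the online demo runs the englishPCFG constituency parser, so the sketch uses the parse annotator and prints each sentence's basic dependencies:

import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class DepDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // "parse" runs the constituency parser and derives dependencies from it,
        // unlike the neural "depparse" annotator tried above.
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation doc = new Annotation("Wonderful Doctor. I cant say enough good things about him.");
        pipeline.annotate(doc);
        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            SemanticGraph deps = sentence.get(
                    SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
            System.out.println(deps.toList()); // prints relations like nsubj(say-3, I-1)
        }
    }
}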
If you look at the code below, I am creating a new Smooks object every time, because ftlName is populated dynamically.
try {
    Smooks smooks1 = new Smooks("smooks-config.xml");
    if (ftlName != null) {
        inputStream = new ByteArrayInputStream(xmlMessage.toString().getBytes());
        outStream = new ByteArrayOutputStream();
        smooks1.addVisitor(new FreeMarkerTemplateProcessor(
                new TemplatingConfiguration(ftlName)));
        smooks1.filterSource(new StreamSource(inputStream),
                new StreamResult(outStream));
        resultString = outStream.toString();
        inputStream.close();
        outStream.close();
    }
} catch (Exception ee) { }
This is really hurting performance, since a new Smooks object is created every time. When I try to use a single Smooks instance, I get the error below.
java.lang.UnsupportedOperationException: Unsupported call to Smooks instance configuration method after Smooks instance has created an ExecutionContext.
at org.milyn.Smooks.assertIsConfigurable(Smooks.java:588) [milyn-smooks-all-1.5.1.jar:]
at org.milyn.Smooks.addVisitor(Smooks.java:262) [milyn-smooks-all-1.5.1.jar:]
at org.milyn.Smooks.addVisitor(Smooks.java:241) [milyn-smooks-all-1.5.1.jar:]
Can you please advise?
Smooks version: 1.5.1
My guess (unverified) is that you can't configure the Smooks instance with an XML file (in the constructor) and then proceed to add more Visitor impls via addVisitor().
Is there a reason you're not configuring the freemarker template in the smooks config?
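If the template choice must stay in code, one hedged sketch (import paths assumed from the Milyn 1.5 jars) is to fully configure one Smooks instance per ftlName before its first filterSource call and cache it, since addVisitor is only rejected after an instance has created an ExecutionContext:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.milyn.Smooks;
import org.milyn.templating.TemplatingConfiguration;
import org.milyn.templating.freemarker.FreeMarkerTemplateProcessor;

public class SmooksCache {
    private final Map<String, Smooks> cache = new ConcurrentHashMap<>();

    // Build (at most once per template) a fully configured instance; addVisitor
    // runs before the instance ever creates an ExecutionContext.
    public Smooks forTemplate(String ftlName) {
        return cache.computeIfAbsent(ftlName, name -> {
            try {
                Smooks smooks = new Smooks("smooks-config.xml");
                smooks.addVisitor(new FreeMarkerTemplateProcessor(
                        new TemplatingConfiguration(name)));
                return smooks;
            } catch (Exception e) {
                throw new IllegalStateException("Failed to configure Smooks for " + name, e);
            }
        });
    }
}

Each cached instance is configured exactly once, so the assertIsConfigurable check never fires, and repeated filterSource calls reuse the already-parsed configuration instead of rebuilding it per message.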