Error while creating NER model on Stanford NER - stanford-nlp

While creating an NER model, I got the error message below:
Exception in thread "main" java.lang.RuntimeException: Got NaN for prob in CRFLogConditionalObjectiveFunction.calculate() - this may well indicate numeric underflow due to overly long documents.
at edu.stanford.nlp.ie.crf.CRFLogConditionalObjectiveFunction.calculate(CRFLogConditionalObjectiveFunction.java:427)
at edu.stanford.nlp.optimization.AbstractCachingDiffFunction.ensure(AbstractCachingDiffFunction.java:140)
at edu.stanford.nlp.optimization.AbstractCachingDiffFunction.valueAt(AbstractCachingDiffFunction.java:145)
at edu.stanford.nlp.optimization.QNMinimizer.lineSearchMinPack(QNMinimizer.java:1460)
at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:1008)
at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:857)
at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:851)
at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:93)
at edu.stanford.nlp.ie.crf.CRFClassifier.trainWeights(CRFClassifier.java:1919)
at edu.stanford.nlp.ie.crf.CRFClassifier.train(CRFClassifier.java:1726)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.train(AbstractSequenceClassifier.java:758)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.train(AbstractSequenceClassifier.java:746)
at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:3034)
To create the NER model, I simply used the command from the Stanford NER website.
The command was:
java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop 06012017_training.prop
Also, the TSV file used to train the model was 35.369 MB.
I am trying to create only one tag, with the title "SYS".
How do I overcome this error and successfully create an NER model?
Thank you in advance.
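One likely fix, given the hint about "overly long documents": CRFClassifier treats everything between blank lines in the training TSV as a single document, so a large file with no blank lines becomes one enormous document and the CRF underflows. Breaking the data into sentences with blank lines is the usual remedy. A hedged sketch of the expected TSV shape (the tokens are made up, columns are tab-separated, and this assumes the common map = word=0,answer=1 property in the .prop file):
The	O
logging	SYS
server	SYS
restarted	O
.	O

Disk	O
quota	O
exceeded	O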

Related

AutoMLSearch with EvalML returning an error

I am getting the following error message while trying to run AutoMLSearch with EvalML:
"All pipelines in the current AutoML batch produced a score of np.nan on the primary objective <evalml.objectives.standard_metrics.LogLossBinary object at 0x7f74defbe790>."
I tried the solution in the following issue to rectify this, but to no avail:
https://github.com/alteryx/evalml/issues/3154
Any suggestions?

log4j2: How do you create & set level for LevelMatchFilter in Java?

I'm trying to migrate the following Java code to log4j2:
private Filter setupWarningLevelFilter()
{
    LevelMatchFilter levelFilter = new LevelMatchFilter();
    levelFilter.setLevelToMatch(Level.ERROR.toString());
    levelFilter.setAcceptOnMatch(false);
    return levelFilter;
}
I have not found an example by googling. (XML configuration won't help, since the custom appender doesn't recognize it; I have to pass the filter and layout into the custom appender via Java instead.) I don't understand how to interpret the cryptic Apache documentation, and I have gone around all the hacking circles I can think of, using all kinds of combinations of build(), newBuilder(), etc. to create the LevelMatchFilter.
Related to this problem is a compilation error when I import both
org.apache.logging.log4j.filter.LevelMatchFilter.Builder and org.apache.logging.log4j.layout.PatternLayout.Builder:
the compiler rejects PatternLayout.Builder, stating "a type with the same simple name is already defined by the single-type-import of Builder".
Appreciate any insight/examples anyone can offer.
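For what it's worth, here is a hedged sketch of the log4j2 equivalent, assuming a recent 2.x release where LevelMatchFilter exposes newBuilder() (in log4j2 the class lives in org.apache.logging.log4j.core.filter, and the old acceptOnMatch=false semantics map to onMatch=DENY, onMismatch=NEUTRAL):

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.Filter;
import org.apache.logging.log4j.core.filter.LevelMatchFilter;

// Deny ERROR events and stay neutral on everything else,
// mirroring setLevelToMatch(ERROR) + setAcceptOnMatch(false) in log4j 1.x.
private Filter setupWarningLevelFilter()
{
    return LevelMatchFilter.newBuilder()
            .setLevel(Level.ERROR)
            .setOnMatch(Filter.Result.DENY)
            .setOnMismatch(Filter.Result.NEUTRAL)
            .build();
}

As for the Builder collision: import only the outer classes (LevelMatchFilter, PatternLayout) and refer to the nested builders by their qualified names, LevelMatchFilter.Builder and PatternLayout.Builder, rather than single-type-importing both nested Builder types.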

Converted tfjs model throwing model config null errors

I am trying to use a converted Keras model (https://github.com/idealo/image-quality-assessment/tree/master/models/MobileNet) in tfjs.
All of the Keras conversion examples mention h5 files, while these are hdf5 files. Is there a difference between them?
When I try to use the converted model, I get this error:
TypeError: Cannot read property 'model_config' of null
Is there a way to fix this?
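Two notes that may help here. First, .h5 and .hdf5 are just different extensions for the same HDF5 format, so the Keras conversion examples apply unchanged. Second, the model_config-of-null error typically means the HDF5 file holds only weights with no model topology; in that case the model has to be rebuilt in Keras, the weights loaded into it, and the full model saved before converting. A hedged sketch of the conversion step itself (paths are placeholders; assumes the tensorflowjs pip package is installed):

tensorflowjs_converter --input_format keras path/to/full_model.hdf5 path/to/tfjs_model

Then load the generated model.json in tfjs, not the .hdf5 file.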

Empty output when reproducing Chinese coreference results on Conll-2012 using CoreNLP Neural System

Following the instructions on this page, https://stanfordnlp.github.io/CoreNLP/coref.html#running-on-conll-2012, here's my code for reproducing the Chinese coreference results on CoNLL-2012:
import java.util.Properties;

import edu.stanford.nlp.coref.CorefSystem;
import edu.stanford.nlp.util.StringUtils;

public class TestCoref {
    public static void main(String[] args) throws Exception {
        Properties props = StringUtils.argsToProperties(args);
        props.setProperty("props", "edu/stanford/nlp/coref/properties/neural-chinese-conll.properties");
        props.setProperty("coref.data", "path-to/data/conll-2012");
        props.setProperty("coref.conllOutputPath", "path-to-output/conll-results");
        props.setProperty("coref.scorer", "path-to/reference-coreference-scorers/v8.01/scorer.pl");

        CorefSystem coref = new CorefSystem(props);
        coref.runOnConll(props);
    }
}
As output, I got three files, named like these:
date-time.coref.predicted.txt
date-time.coref.gold.txt
date-time.predicted.txt
but all of them are EMPTY!
I got my "conll-2012" data as follows:
First I downloaded train/dev/test-key data from this page http://conll.cemantix.org/2012/data.html, as well as the ontonote-release-5.0 from LDC. Then I ran the script skeleton2conll.sh provided with the official conll 2012 data which produced _conll files.
The model I used was downloaded from http://nlp.stanford.edu/software/stanford-chinese-corenlp-models-current.jar
While trying to find the problem, I noticed that the class CorefSystem has an "annotate" method that seems to do the real work, but it is never called: https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/coref/CorefSystem.java
I wonder whether there is a bug in the runOnConll function that keeps it from reading or annotating anything, or else how I can reproduce the coreference results.
PS:
I especially want to produce results on conversational data like "tc" and "bc" in CoNLL-2012. Using the coreference API, I find I can only parse textual data. Is there any other way to use the Neural Coref System on conversational data (where different speakers should be indicated), apart from running on CoNLL-2012?
Thanks in advance for any help!
As a start, why don't you run this command from the command line:
java -Xmx10g -cp stanford-corenlp-3.9.1.jar:stanford-chinese-corenlp-models-3.9.1.jar:* edu.stanford.nlp.coref.CorefSystem -props edu/stanford/nlp/coref/properties/neural-chinese-conll.properties -coref.data <path-to-conll-data> -coref.conllOutputPath <where-to-save-system-output> -coref.scorer <path-to-scoring-script>

How to put data into HBase without using Java

Is there any way to read data from a file and put it into an HBase table without using any Java? I tried to store data from a Pig script using:
sample = LOAD '/mapr/user/username/sample.txt' AS (all:chararray);
STORE sample INTO 'hbase://sampledata' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('mysampletable:intdata');
but this gave the following error message:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/hadoop/hbase/filter/WritableByteArrayComparable
ERROR org.apache.pig.tools.grunt.Grunt java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/filter/WritableByteArrayComparable
Pig seems like a good way to import data into HBase. Check what Armon suggested about setting $PIG_CLASSPATH.
Another possibility for bulk loading data into HBase is to use the bundled tools ImportTsv (Tab Separated Values) and CompleteBulkLoad.
http://hbase.apache.org/book/ops_mgt.html#importtsv
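A hedged sketch of an ImportTsv invocation (the table and column names are illustrative and assume the input is tab-separated and already in HDFS):

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,mysampletable:intdata sampledata /user/username/sample.txt

This skips Java entirely; ImportTsv maps each tab-separated column to the HBase column you name, with HBASE_ROW_KEY marking the row-key column.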
Well, there's the Stargate REST interface, which is usable from any language. It's not perfect, but it's worth a look.
You just need to make sure that $PIG_CLASSPATH also points at hbase.jar
