Stanford 3.7 models on Maven?

I am using Stanford 3.7 for NER/RegexNER. I have the following Maven dependencies in my pom:
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>3.7.0</version>
</dependency>
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>3.7.0</version>
<classifier>models</classifier>
</dependency>
And I am using the Stanford CoreNLP API, following the documentation:
Properties props = new Properties();
props.put("regexner.mapping", "my_file_name");
props.put("regexner.ignorecase", "true");
props.put("annotators", "tokenize, ssplit, ner, regexner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation document = new Annotation(text);
pipeline.annotate(document);
However, I still get this RuntimeException:
java.io.IOException: Unable to open "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as class path, filename or URL
Does anyone know how to use the API with the models from version 3.7? Any advice welcome.

That file is in stanford-corenlp-3.7.0-models.jar, so it might be that, when you run your application, that jar is somehow not on the CLASSPATH. How are you running this Java code?
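One quick way to check is to look the model file up on the classpath at runtime, before building the pipeline; a minimal sketch (the resource path is copied from the exception message):
// Prints null if stanford-corenlp-3.7.0-models.jar is not on the runtime classpath.
String model = "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger";
java.net.URL url = Thread.currentThread().getContextClassLoader().getResource(model);
System.out.println(url == null ? "models jar not on classpath" : "found: " + url);
If this prints null when launched the same way you run the pipeline (IDE run configuration, java -cp, an app server, etc.), the models jar is missing from that particular launch command rather than from the POM.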

Related

TaskID.<init>(Lorg/apache/hadoop/mapreduce/JobID;Lorg/apache/hadoop/mapreduce/TaskType;I)V

val jobConf = new JobConf(hbaseConf)
jobConf.setOutputFormat(classOf[TableOutputFormat])
jobConf.set(TableOutputFormat.OUTPUT_TABLE, tablename)
// Build a small test RDD and turn each CSV line into an HBase Put keyed by the first field.
val indataRDD = sc.makeRDD(Array("1,jack,15", "2,Lily,16", "3,mike,16"))
val rdd = indataRDD.map(_.split(',')).map { arr =>
  val put = new Put(Bytes.toBytes(arr(0).toInt))
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes(arr(1)))
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("age"), Bytes.toBytes(arr(2).toInt))
  (new ImmutableBytesWritable, put)
}
rdd.saveAsHadoopDataset(jobConf)
When I run Hadoop or Spark jobs, I often run into this error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.TaskID.<init>(Lorg/apache/hadoop/mapreduce/JobID;Lorg/apache/hadoop/mapreduce/TaskType;I)V
at org.apache.spark.SparkHadoopWriter.setIDs(SparkHadoopWriter.scala:158)
at org.apache.spark.SparkHadoopWriter.preSetup(SparkHadoopWriter.scala:60)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1188)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1161)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1161)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1161)
at com.iteblog.App$.main(App.scala:62)
at com.iteblog.App.main(App.scala)
At first I thought it was a jar conflict, but I checked carefully and there are no duplicate jars. The Spark and Hadoop versions are:
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.0.1</version>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>2.6.0-mr1-cdh5.5.0</version>
I found that TaskID and TaskType are both in the hadoop-core jar, but not in the same package. Why can mapred.TaskID refer to mapreduce.TaskType?
Oh, I have resolved this problem by adding the Maven dependency:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.6.0-cdh5.5.0</version>
</dependency>
and the error disappears!
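If you want to confirm that the constructor from the stack trace is actually present at runtime, a quick reflection check helps (a sketch; the class names and parameter types are copied from the NoSuchMethodError message):
// Throws NoSuchMethodException when only the old MR1 hadoop-core version of TaskID is on the classpath.
Class<?> taskId = Class.forName("org.apache.hadoop.mapred.TaskID");
taskId.getConstructor(Class.forName("org.apache.hadoop.mapreduce.JobID"),
        Class.forName("org.apache.hadoop.mapreduce.TaskType"),
        int.class);
System.out.println("TaskID(mapreduce.JobID, mapreduce.TaskType, int) is available");
Run this before calling saveAsHadoopDataset; if it throws, the classpath still resolves TaskID from the MR1 jar rather than from hadoop-mapreduce-client-core.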
I have also faced this issue; it was basically due to a jar conflict.
Add the spark-core_2.10 jar from Maven:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>2.0.2</version>
</dependency>
After changing the jar, the error went away.

Elasticsearch Java API from web application - error: java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.threadpool.ThreadPool

I can't use the Elasticsearch Java API from a JSP. In the following, I try to explain what I have done.
I have installed Elasticsearch 2.3.3 on my system by following the Elastic instructions and run it from the command prompt; everything works perfectly. It may be useful to say that I have changed the parameters below in elasticsearch.yml:
cluster.name: cluster_233
node.name: node_233
bootstrap.mlockall: true
network.host: 127.0.0.1
Then, with NetBeans, I created a Maven project -> Web Application project and set the dependency below in pom.xml:
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>2.3.3</version>
<type>jar</type>
</dependency>
I also added the Guava 18 dependency to the project and then downloaded all the project dependencies by right-clicking on Dependencies and selecting Download Declared Dependencies. Then I created a Java class and wrote the code below:
package com.mycompany.esmaven;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;
public class aClass {

    public String test() throws Exception {
        return tryToIndex();
    }

    public String tryToIndex() throws Exception {
        // Start an embedded node that joins the local cluster, then index a single document.
        Node node = NodeBuilder.nodeBuilder().settings(
                Settings.builder()
                        .put("path.home", "d:/elasticsearch-2.3.3")
                        .put("cluster.home", "cluster_233")
        ).node();
        Client client = node.client();
        client.prepareIndex("kodcucom", "article", "1")
                .setSource(putJsonDocument("ElasticSearch: Java API",
                        "ElasticSearch provides the Java API, all operations "
                                + "can be executed asynchronously using a client object.",
                        new Date(),
                        new String[]{"elasticsearch"},
                        "Hüseyin Akdoğan")).execute().actionGet();
        node.close();
        return "Done";
    }

    public static Map<String, Object> putJsonDocument(String title,
            String content, Date postDate, String[] tags, String author) {
        Map<String, Object> jsonDocument = new HashMap<String, Object>();
        jsonDocument.put("title", title);
        jsonDocument.put("content", content);
        jsonDocument.put("postDate", postDate);
        jsonDocument.put("tags", tags);
        jsonDocument.put("author", author);
        return jsonDocument;
    }
}
Then, through a JSP page, I tried to call the test() function (I'm going to integrate Elasticsearch with a web application). After building the project, the error below always appears on the first page load:
java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;
and after refreshing the page the error changes to:
java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.threadpool.ThreadPool
This is the POM:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.ISTEX</groupId>
<artifactId>mvnESwebapp</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>war</packaging>
<name>mvnESwebapp</name>
<properties>
<endorsed.dir>${project.build.directory}/endorsed</endorsed.dir>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>2.3.3</version>
</dependency>
<dependency>
<groupId>javax</groupId>
<artifactId>javaee-web-api</artifactId>
<version>7.0</version>
<scope>provided</scope>
</dependency>
</dependencies>
</project>
Also, I would like to mention that with this POM I can index my JSON from the main function. The problem is that I don't know how to run the application through JSP pages.
I really appreciate you sharing your knowledge.
Regards,
Amin
FWIW, I ran into the same issue as described above -- that is, the error message I saw was the threadpool initialization error the author describes. The solution described in the link below solved the problem for me:
https://discuss.elastic.co/t/could-not-initialize-class-org-elasticsearch-threadpool-threadpool/47575
UPDATED per comment suggestion:
In my case, the fix was to add a guava dependency entry in my POM file. I used the dependency given in the webpage at the link above:
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>19.0</version>
</dependency>
That, at least in my case, solved the problem.
With Java applications, errors like
java.lang.NoSuchMethodError
java.lang.NoClassDefFoundError
generally indicate that you are missing a dependency or have conflicting dependencies. For example, Guava 18.0 and Guava 19.0 are completely different artifacts as far as Java is concerned, yet they share a lot of code. When both jars are on the classpath, one will naturally be loaded first, and any attempted use of the other will cause misleading errors like those above.
I also added the Guava 18 dependency to the project and then downloaded all the project dependencies by right-clicking on Dependencies and selecting Download Declared Dependencies
Elasticsearch 2.3.3 already depends on Guava 18.0. As such, it's a transitive dependency of the Elasticsearch project.
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>2.3.3</version>
<type>jar</type>
</dependency>
Your Maven dependency is probably creating a collision with one of your other dependencies. Take a look at your Netbeans dependencies, or more appropriately ask Maven to do it directly:
mvn dependency:tree -Dverbose
This will print out the dependency tree for you to find conflicts. Look for jars that are duplicated with different versions and stop the mismatch from happening.
As a side note, at the time of this answer, Guava 19 is the latest version. So even though ES 2.3.3 wants Guava 18, some other dependency of yours could easily and reasonably want a different version of Guava.
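A complementary check is to print where the JVM actually loaded the conflicting class from, since MoreExecutors.directExecutor() is the method named in the first error; a sketch:
// Prints the jar that Guava's MoreExecutors class was resolved from at runtime.
System.out.println(com.google.common.util.concurrent.MoreExecutors.class
        .getProtectionDomain().getCodeSource().getLocation());
If the printed location is a Guava jar older than 18.0 (directExecutor() appeared around that release), that is the jar winning the conflict.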

Servlet 500 error ClassNotFound exception

I'm building a web app using Vaadin and it needs to communicate with several REST APIs. I've set it up in IntelliJ with Maven. I was thinking for the REST client I would use GSON to parse the JSON objects I'd be receiving from the open APIs, however, the application crashes due to a servlet exception error.
Caused by:
java.lang.ClassNotFoundException: com.google.gwt.json.client.JSONObject
I've added the GSON dependency to the pom.xml:
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.3.1</version>
</dependency>
I have tried changing the module settings from Provided to Compile to Runtime, but with no change.
I'm just stumped as to why the GSON jar appears in the project dependencies within IntelliJ using Maven but fails at runtime. I've seen references to Eclipse and including the jar in the classpath, but, again, I'm using IntelliJ/Maven to build my Vaadin project and satisfy dependencies.
Any help is greatly appreciated!
Add the dependency below to your classpath:
<dependency>
<groupId>com.google.gwt</groupId>
<artifactId>gwt-user</artifactId>
<version>2.3.0</version>
</dependency>
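Note that com.google.gwt.json.client.JSONObject is GWT's JSON class, which the gwt-user jar above provides; it is unrelated to GSON. If the goal is only to parse REST responses with GSON, a small sketch of typical usage (the JSON string here is just an illustration):
import com.google.gson.Gson;
import com.google.gson.JsonObject;

// Parse a JSON string into GSON's own JsonObject (no GWT classes involved).
Gson gson = new Gson();
JsonObject obj = gson.fromJson("{\"city\":\"Dublin\",\"temp\":12}", JsonObject.class);
String city = obj.get("city").getAsString();
int temp = obj.get("temp").getAsInt();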

java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found

I am trying to use the Hadoop HDFS Java API to list all files in HDFS.
I am able to list the files on the remote HDFS by running the code from my local Eclipse.
But I get the exception
java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2290)
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2303)
org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:87)
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2342)
org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2324)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:163)
when I execute the code from a web server.
I have added the Maven dependencies below.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.0.0-cdh4.5.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-auth</artifactId>
<version>2.0.0-cdh4.5.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.0.0-cdh4.5.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>2.0.0-mr1-cdh4.5.0</version>
</dependency>
I have also embedded the required jars into the exported jar, and Maven has added them to the build path.
If anyone has encountered this issue before, please share the solution.
I am facing a similar issue with the Apache Hadoop 2.2.0 release. As a workaround, I run the code as a separate process:
// Launch the HDFS client code in a separate JVM and echo its error stream.
final Process p = Runtime.getRuntime().exec("java -jar {jarfile} {classfile}");
final Scanner output = new Scanner(p.getErrorStream());
while (output.hasNextLine()) {
    try {
        System.err.println(output.nextLine());
    } catch (final Exception e) {
        // ignore and keep draining the stream
    }
}
The jar file contains the implementation built against the Apache Hadoop 2.2.0 jars. I am still searching for an exact solution, though.
For me, hadoop-hdfs-2.6.0.jar was missing from the Zeppelin server's lib directory. I copied it into the Zeppelin lib folder and my problem was resolved. :)
Also add a dependency for hadoop-hdfs-2.6.0.jar in pom.xml.
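Another commonly reported cause of this particular ClassNotFoundException is building a fat jar in which the META-INF/services/org.apache.hadoop.fs.FileSystem entries from hadoop-common and hadoop-hdfs overwrite each other, so the hdfs scheme is never registered. A hedged sketch of the usual workaround, registering the implementations explicitly on the Configuration (the namenode URI is a placeholder):
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration conf = new Configuration();
// Re-register the schemes in case the service-loader files were lost during packaging.
conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
conf.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem");
FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:8020/"), conf);
Alternatively, the shade plugin's ServicesResourceTransformer (as used in the POM of the Elasticsearch question above) merges those service files instead of overwriting them.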

Maven fails to download CoreNLP models

When building the sample application from the Stanford CoreNLP website, I ran into a curious exception:
Exception in thread "main" java.lang.RuntimeException: edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model
at edu.stanford.nlp.pipeline.StanfordCoreNLP$4.create(StanfordCoreNLP.java:493)
…
Caused by: java.io.IOException: Unable to resolve "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as either class path, filename or URL
…
This only happened when the pos annotator and the ones after it were included in the annotators property.
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Here is the dependency from my pom.xml:
<dependencies>
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>3.2.0</version>
<scope>compile</scope>
</dependency>
</dependencies>
I actually found the answer in the problem description of another question on Stack Overflow.
Quoting W.P. McNeill:
Maven does not download the model files automatically, but only if you add a models line to the POM. Here is a POM snippet that fetches both the code and the models.
Here's what my dependencies look like now:
<dependencies>
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>3.2.0</version>
</dependency>
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>3.2.0</version>
<classifier>models</classifier>
</dependency>
</dependencies>
The important part to note is the <classifier>models</classifier> entry at the bottom. For Eclipse to maintain both references, you'll need to configure a dependency for each of stanford-corenlp-3.2.0 and stanford-corenlp-3.2.0-models.
In case you need to use the models for other languages (like Chinese, Spanish, or Arabic) you can add the following piece to your pom.xml file (replace models-chinese with models-spanish or models-arabic for these two languages, respectively):
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>3.8.0</version>
<classifier>models-chinese</classifier>
</dependency>
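Once a language-specific models jar is on the classpath, the pipeline can be configured from the properties file bundled inside it; as far as I know the Chinese jar ships a StanfordCoreNLP-chinese.properties, so a sketch looks like this:
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

// Loads annotator settings from the properties file inside the models-chinese jar.
StanfordCoreNLP pipeline = new StanfordCoreNLP("StanfordCoreNLP-chinese.properties");
Annotation document = new Annotation("今天天气很好。");
pipeline.annotate(document);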
With Gradle apparently you can use:
implementation 'edu.stanford.nlp:stanford-corenlp:3.9.2'
implementation 'edu.stanford.nlp:stanford-corenlp:3.9.2:models'
or, if you use compile (deprecated):
compile group: 'edu.stanford.nlp', name: 'stanford-corenlp', version: '3.9.2'
compile group: 'edu.stanford.nlp', name: 'stanford-corenlp', version: '3.9.2', classifier: 'models'
