I am running a Spark program written in Java, using the sample word count example. I have created a jar file, but when I submit the Spark job it throws an error.
$ spark-submit --class WordCount --master local /home/cloudera/workspace/sparksample/target/sparksample-0.0.1-SNAPSHOT.jar
I am getting the error below:
java.lang.ClassNotFoundException: wordCount
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.util.Utils$.classForName(Utils.scala:175)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Edit: I am also adding my pom.xml so that you can help.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.igi.sparksample</groupId>
<artifactId>sparksample</artifactId>
<version>0.0.1-SNAPSHOT</version>
<dependencies>
<dependency> <!-- Spark dependency -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.0</version>
</dependency>
</dependencies>
</project>
After trying many combinations and doing a bit of R&D, I solved my issue. The problem was in my spark-submit command; I changed it to this:
spark-submit --class com.xxx.sparksample.WordCount --master local /home/cloudera/workspace/sparksample/target/sparksample-0.0.1-SNAPSHOT.jar
and it worked.
It can't find the WordCount class. You probably need to include the package the class is in, so that you pass the fully qualified class name, i.e.:
--class <PACKAGE>.WordCount
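As a minimal sketch (the package name here is purely illustrative, not the asker's actual package), the value after --class is the package declaration plus the class name, matching case exactly:

// Hypothetical package, for illustration only; use whatever package your class declares.
package com.example.sparksample;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class WordCount {                      // class name: WordCount, capital W
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... word count logic ...
        sc.stop();
    }
}

For this layout, the matching flag would be --class com.example.sparksample.WordCount.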
The error you posted doesn't show any problem with Spark.
However, you must have a typo in your program. Java threw a ClassNotFoundException looking for wordCount, where it should most probably be WordCount, with a capital W.
Please check the names of your classes and your imports.
Make sure that the class name (wordcount, WordCount, or whatever) you pass to spark-submit exactly matches the one you have defined.
Make sure that packaging is correct.
To verify, open/extract your jar and see the class name and the package hierarchy.
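For example, assuming the JDK's jar tool is on the PATH and using the jar path from the question, you can list the entries and read the fully qualified name straight off the package hierarchy:

jar tf /home/cloudera/workspace/sparksample/target/sparksample-0.0.1-SNAPSHOT.jar | grep -i wordcount

The matching entry's path, with / replaced by . and the .class suffix dropped, is exactly the value spark-submit expects after --class.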
Related
I'm attempting to introduce high-availability mode (via JBoss Cache) in my server implementation (essentially an expanded version of the example server) by configuring my Maven project to use the jdiameter-ha-api and jdiameter-ha-impl dependencies instead of jdiameter-api and jdiameter-impl, in addition to adding the following extensions to jdiameter-config.xml:
<Extensions>
<SessionDatasource value="org.mobicents.diameter.impl.ha.data.ReplicatedSessionDatasource"/>
<TimerFacility value="org.mobicents.diameter.impl.ha.timer.ReplicatedTimerFacilityImpl"/>
</Extensions>
Now, when I run the server from Eclipse, it works fine, i.e. it starts up in clustered mode (with JBoss Cache). However, when I attempt to run the jar produced by mvn install, it throws the following error:
2018-10-11 18:24:13,899 - (-)(-)(-)(-)(-) Starting Mobicents DIAMETER Stack v1.7.0-SNAPSHOT (-)(-)(-)(-)(-)
2018-10-11 18:24:13,959 - Failure creating stack 'Server'
org.jdiameter.api.InternalException: java.lang.reflect.InvocationTargetException
at org.jdiameter.client.impl.StackImpl.init(StackImpl.java:135)
at com.company.charging.diameter.ocf.utilities.StackCreator.<init>(StackCreator.java:37)
at com.company.charging.diameter.ocf.utilities.StackCreator.<init>(StackCreator.java:71)
at com.company.charging.diameter.ocf.server.Ocf.<init>(Ocf.java:187)
at com.company.charging.diameter.ocf.server.Ocf.main(Ocf.java:157)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
at org.jdiameter.client.impl.StackImpl.init(StackImpl.java:129)
... 4 more
Caused by: java.lang.ClassNotFoundException: org.mobicents.diameter.impl.ha.timer.ReplicatedTimerFacilityImpl
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:582)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:190)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:499)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:291)
at org.jdiameter.client.impl.helpers.AssemblerImpl.fill(AssemblerImpl.java:139)
at org.jdiameter.client.impl.helpers.AssemblerImpl.<init>(AssemblerImpl.java:91)
... 9 more
Given that it starts up in Eclipse just fine, I'm assuming my POM file isn't managing dependencies properly, so that the final jar is missing these classes. Here's the relevant portion of my pom.xml:
<dependencies>
<dependency>
<groupId>org.mobicents.diameter</groupId>
<artifactId>jdiameter-ha-api</artifactId>
<version>${restcomm.diameter.jdiameter.version}</version>
</dependency>
<dependency>
<groupId>org.mobicents.diameter</groupId>
<artifactId>jdiameter-ha-impl</artifactId>
<version>${restcomm.diameter.jdiameter.version}</version>
</dependency>
<dependency>
<groupId>org.mobicents.diameter</groupId>
<artifactId>restcomm-diameter-mux-jar</artifactId>
<version>${restcomm.diameter.mux.version}</version>
</dependency>
</dependencies>
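I suspect the underlying issue is that a plain Maven jar contains only the project's own classes, so the HA implementation classes never make it into the artifact. Purely as a sketch of what bundling them might look like (not necessarily the right fix here; the main class is taken from the stack trace above), something like the assembly plugin's jar-with-dependencies descriptor builds a runnable jar that includes the dependencies:

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <!-- produces an extra *-jar-with-dependencies.jar next to the plain jar -->
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
          <manifest>
            <mainClass>com.company.charging.diameter.ocf.server.Ocf</mainClass>
          </manifest>
        </archive>
      </configuration>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>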
I am able to run wordcount on Alluxio with the example jar provided by Cloudera, using:
sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar wordcount -libjars /home/nn1/alluxio-1.2.0/core/client/target/alluxio-core-client-1.2.0-jar-with-dependencies.jar alluxio://nn1:19998/wordcount alluxio://nn1:19998/wc1
and it's a success.
But I can't run it when I use the jar created with the attached code, which is also a sample word count example:
sudo -u hdfs hadoop jar /home/nn1/HadoopWordCount-0.0.1-SNAPSHOT-jar-with-dependencies.jar edu.am.bigdata.C45TreeModel.C45DecisionDriver -libjars /home/nn1/alluxio-1.2.0/core/client/target/alluxio-core-client-1.2.0-jar-with-dependencies.jar alluxio://10.30.60.45:19998/abdf alluxio://10.30.60.45:19998/outabdf
The above code is built using Maven. The pom.xml file contains:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>2.6.0-mr1-cdh5.4.5</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.6.0-cdh5.4.5</version>
</dependency>
Could you please help me run my word count program on the Alluxio cluster? I hope no extra configuration needs to be added to the pom file to run it. I am getting the following error after running my jar:
java.lang.IllegalArgumentException: Wrong FS:
alluxio://10.30.60.45:19998/outabdf, expected: hdfs://10.30.60.45:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:657)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1215)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1211)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1211)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1412)
at edu.WordCount.run(WordCount.java:47)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at edu.WordCount.main(WordCount.java:23)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
The problem comes from the call to
FileSystem fs = FileSystem.get(conf);
on line 101. The FileSystem created by FileSystem.get(conf) will only support paths with the scheme defined by Hadoop's fs.defaultFS property. To fix the error, change that line to
FileSystem fs = FileSystem.get(URI.create("alluxio://nn1:19998/"), conf);
By passing a URI, you override fs.defaultFS, enabling the created FileSystem to support paths using the alluxio:// scheme.
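A minimal sketch of the corrected usage, assuming the Alluxio client jar is on the classpath, the host/port from this answer, and one of the output paths from the question (the class name is made up for illustration):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AlluxioFsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Bind the FileSystem to the Alluxio scheme explicitly instead of relying
        // on fs.defaultFS, which points at hdfs://... in the error above.
        FileSystem fs = FileSystem.get(URI.create("alluxio://nn1:19998/"), conf);
        Path out = new Path("alluxio://nn1:19998/outabdf");
        System.out.println("output exists: " + fs.exists(out)); // no more "Wrong FS"
    }
}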
You could also fix the error by modifying fs.defaultFS in your core-site.xml
<property>
<name>fs.defaultFS</name>
<value>alluxio://nn1:19998/</value>
</property>
However, this could impact other systems that share the core-site.xml file, so I recommend the first approach of passing an alluxio:// URI to FileSystem.get().
I'm definitely not an expert in Maven, but after two days of hacking around I'm just giving up. My workflow:
1.
mvn archetype:generate
-DarchetypeGroupId=org.apache.flink
-DarchetypeArtifactId=flink-quickstart-scala
-DarchetypeVersion=0.10.1
-DgroupId=org.apache.flink.quickstart
-DartifactId=flink-scala-project
-Dversion=0.1
-Dpackage=org.apache.flink.quickstart
-DinteractiveMode=false
2.
cd flink-scala-project
3.
mvn clean package
Here is a build log: https://gist.github.com/zavalit/1e78478ebdda827f3454. When I run
`java -jar target/flink-scala-project-0.1.jar`
I get
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/flink/api/scala/ExecutionEnvironment$
at org.apache.flink.quickstart.Job$.main(Job.scala:41)
at org.apache.flink.quickstart.Job.main(Job.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.scala.ExecutionEnvironment$
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 2 more
The fat jar you're building is not supposed to be run outside of a cluster environment; therefore, all Flink-related dependencies that the cluster environment provides are excluded from it. What you usually do with the generated fat jar is submit it to a local or remote cluster via bin/flink run -c org.example.MyJob myFatJar.jar. To quickly start a local cluster, run bin/start-local.sh; this starts a local cluster to which you can submit your job jar.
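For this particular project that would look roughly like the following (the paths into the Flink distribution are assumed; the class and jar names come from the stack trace and command above):

./bin/start-local.sh
./bin/flink run -c org.apache.flink.quickstart.Job /path/to/flink-scala-project/target/flink-scala-project-0.1.jar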
By default, the Flink libraries are not included in the fat jar, since they are provided by the Flink cluster at runtime. To fix that, change the scope of the dependencies in pom.xml from provided to compile:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>compile</scope>
</dependency>
link: Maven Doc
I have this code that saves the SchemaRDD (person) to a Hive table stored as Parquet (person_parquet):
hiveContext.sql("insert overwrite table person_parquet select * from person")
But it throws an error:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory
at org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:399)
at org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:867)
at org.apache.hadoop.hive.ql.session.SessionState.getUserFromAuthenticator(SessionState.java:589)
at org.apache.hadoop.hive.ql.metadata.Table.getEmptyTable(Table.java:174)
at org.apache.hadoop.hive.ql.metadata.Table.<init>(Table.java:116)
at org.apache.hadoop.hive.ql.metadata.Hive.newTable(Hive.java:2566)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:917)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1464)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:243)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:137)
at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.execute(InsertIntoHiveTable.scala:51)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
at com.example.KafkaConsumer$$anonfun$main$2.apply(KafkaConsumer.scala:114)
at com.example.KafkaConsumer$$anonfun$main$2.apply(KafkaConsumer.scala:83)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:529)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:529)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:42)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:171)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory
at org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthorizeProviderManager(HiveUtils.java:376)
at org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:381)
... 29 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthorizeProviderManager(HiveUtils.java:366)
... 30 more
Here is what I have tried:
1. I changed my hive-site.xml to the following, but it still throws the same exception:
<property>
<name>hive.security.authenticator.manager</name>
<value>org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator</value>
</property>
<property>
<name>hive.security.authorization.enabled</name>
<value>false</value>
</property>
<property>
<name>hive.security.authorization.manager</name>
<value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider</value>
</property>
2. (Same hive-site.xml as #1.) When I added hive-exec 1.0 to my dependencies, it threw a different exception (AbstractMethodError).
3. (Same hive-site.xml as #1.) I tried adding hive-exec 0.13 to my dependencies. The first run (insert) still throws an error, but the second and subsequent inserts succeed.
I am using Sandbox HDP 2.2 (Hive 0.14.0.2.2.0.0-2041) and Spark 1.2.0.
Dependencies:
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>0.13.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>1.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.2.0</version>
</dependency>
"SQLStdConfOnlyAuthorizerFactory" class has been added in hive 0.14.0 version (HIVE-8045) but Spark 1.2 depends on hive 0.13. Your hive-site.xml must be having "hive.security.authorization.manager" set as "org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory " and your classpath musn't be having hive-exec 0.14 JAR thats why its throwing ClassNotFoundException. So either include your hive-exec 0.14.0 JAR in classpath (and before Spark's own hive JARs) or change your entry in hive-site.xml to something like this :-
<property>
<name>hive.security.authorization.manager</name>
<value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider</value>
</property>
The former is not recommended, since similar issues may arise later due to the Hive version mismatch.
Changing the value of
hive.security.authorization.manager = org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider
in hive-site.xml worked.
I think this happens because you have duplicate jars on the classpath.
I'm using the Cloudera-VM. Hadoop version: Hadoop 2.0.0-cdh4.0.0.
I have written a custom input format; when the client calls the getSplits method, I get an exception:
IncompatibleClassChangeError found interface org.apache.hadoop.mapreduce.JobContext expecting
I'm using the classes from the mapreduce package, not mapred. However, when I look at the stack trace, I see that somewhere along the line the library switches to mapred:
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at com.hadoopApp.DataGeneratorFileInput.getSplits(DataGeneratorFileInput.java:27)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
at com.hadoopApp.HBaseApp.generateData(HBaseApp.java:54)
at com.hadoopApp.HBaseApp.run(HBaseApp.java:24)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.hadoopApp.HBaseApp.main(HBaseApp.java:19)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Not sure if this helps, but I'm using this in my Maven pom:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
</dependency>
Solved it, though I'm not sure why. I changed my pom to the following and it started working. I'm not sure why this solved it, so your input is appreciated:
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.0.0-cdh4.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.0.0-cdh4.2.0</version>
</dependency>
How can I get around this?
I've bumped into the same problem when using Hipi on cdh4.2.0.
The problem is caused by incompatibilities between Hadoop versions (jobs built with Hadoop 1 may not work on Hadoop 2). Initially you were building the job against Hadoop v1 and running it in a Hadoop 2.0.0 environment (Cloudera uses Hadoop 2.0.0). Fortunately, the Hadoop 1.x API is fully supported in Hadoop 2.x, so rebuilding the job against a newer version of Hadoop helps.
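If in doubt about which Hadoop line the cluster is actually running, it will tell you (assuming the hadoop binary is on the PATH):

hadoop version

Matching the hadoop-common / hadoop-client dependency versions to that output, as in the CDH coordinates above, avoids this class of IncompatibleClassChangeError.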