Creating a jar for running MapReduce on Hadoop 1.2.1

I am new to Hadoop and I have just set up Hadoop 1.2.1 on my Mac laptop (Mavericks). I then created a simple WordCount project in IntelliJ IDEA and was able to run the code on a dummy text file. I am having trouble creating a jar file that replicates what I run through the IDE. I get the following error:
$ java -jar ./out/artifacts/WordCount_jar/WordCount.jar test.txt out
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:146)
at neu.cs.parallelprogramming.WordCount.main(WordCount.java:48)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 2 more
FAIL: 1
Could anyone let me know what I am missing?

I guess you have to specify your main class (the one that sets up the Map/Reduce job).
E.g., $ java -jar ./WordCount.jar classWordCount input.txt output
or $ hadoop jar yourprogram.jar yourclass inputpath outputpath
Note that launching through hadoop jar also puts Hadoop's own dependencies (such as commons-logging, the class missing in your trace) on the classpath, which plain java -jar does not.
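For reference, here is a minimal driver sketch, assuming the new org.apache.hadoop.mapreduce API that ships with Hadoop 1.2.1; the package name is taken from the stack trace above, the rest is illustrative:

package neu.cs.parallelprogramming; // package name taken from the stack trace above

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE); // emit (word, 1)
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum)); // emit (word, total)
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        job.setJarByClass(WordCount.class); // ship the jar that contains this class
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into WordCount.jar, this would be started as hadoop jar WordCount.jar neu.cs.parallelprogramming.WordCount test.txt out, letting the hadoop launcher supply the framework's classpath.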

Related

Run generated jar file in apache flink

At the moment I'm trying to run my first Flink application. I already tested the Java file (KMeans.java) in the IDE and it works perfectly, but I can't get it to run as a jar from the command line.
The build was successfully created with mvn clean package.
But when I run the jar from the command line with flink run -c KMeans name.jar, this error message appears:
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: The program's entry point class 'KMeans' was not found in the jar file.
at org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:617)
at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:199)
at org.apache.flink.client.cli.CliFrontend.buildProgram(CliFrontend.java:856)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:206)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1044)
at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
Caused by: java.lang.ClassNotFoundException: KMeans
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:614)
... 10 more
I looked in my generated target folder and there is a KMeans.class file in the classes folder. So what am I doing wrong?
You need to specify the full class name, e.g., org.apache.flink.examples.java.clustering.KMeans.
Note that you only need to use the -c flag if the JAR file doesn't specify the class to run in its manifest.
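The mismatch is easy to see in code. In the hypothetical sketch below, the class file does land in the target folder, but its binary name includes the package, so the class loader only finds it under the fully qualified name:

// Hypothetical layout: the file lives under
// src/main/java/org/apache/flink/examples/java/clustering/KMeans.java
package org.apache.flink.examples.java.clustering;

public class KMeans {
    public static void main(String[] args) {
        // The binary name of this entry point is
        // "org.apache.flink.examples.java.clustering.KMeans", not "KMeans";
        // that is the name the -c flag (or the jar manifest) must carry.
        System.out.println(KMeans.class.getName());
    }
}

So flink run -c org.apache.flink.examples.java.clustering.KMeans name.jar would resolve, while -c KMeans cannot.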

Spark without Hadoop: Failed to Launch

I'm running Spark 2.1.0, Hive 2.1.1 and Hadoop 2.7.3 on Ubuntu 16.04.
I downloaded the Spark project from GitHub and built the "without hadoop" version:
./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"
When I run ./sbin/start-master.sh, I get the following exception:
Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /home/server/spark/conf/:/home/server/spark/jars/*:/home/server/hadoop/etc/hadoop/:/home/server/hadoop/share/hadoop/common/lib/:/home/server/hadoop/share/hadoop/common/:/home/server/hadoop/share/hadoop/mapreduce/:/home/server/hadoop/share/hadoop/mapreduce/lib/:/home/server/hadoop/share/hadoop/yarn/:/home/server/hadoop/share/hadoop/yarn/lib/ -Xmx1g org.apache.spark.deploy.master.Master --host ThinkPad-W550s-Lab --port 7077 --webui-port 8080
========================================
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
I edited SPARK_DIST_CLASSPATH according to the post "Where are hadoop jar files in hadoop 2?":
export SPARK_DIST_CLASSPATH=~/hadoop/share/hadoop/common/lib:~/hadoop/share/hadoop/common:~/hadoop/share/hadoop/mapreduce:~/hadoop/share/hadoop/mapreduce/lib:~/hadoop/share/hadoop/yarn:~/hadoop/share/hadoop/yarn/lib
But I'm still getting the same error.
I can see the slf4j jar file is under ~/hadoop/share/hadoop/common/lib.
How could I fix this error?
Thank you!
“Hadoop free” builds need to modify SPARK_DIST_CLASSPATH to include Hadoop’s package jars.
The most convenient place to do this is by adding an entry in conf/spark-env.sh:
export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)
See https://spark.apache.org/docs/latest/hadoop-provided.html
The reason your manual export did not work: a plain directory entry on a JVM classpath only picks up .class files, not the jars inside that directory, so pointing at ~/hadoop/share/hadoop/common/lib without a trailing /* never exposes the slf4j jar. Running hadoop classpath emits the correct wildcard entries for you.
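To confirm the fix took effect, here is a quick check, as a sketch (the ClasspathCheck class name is made up, not part of Spark or Hadoop):

// Hypothetical one-off helper: fails with ClassNotFoundException if slf4j
// is still missing from the classpath it is run with.
public class ClasspathCheck {
    public static void main(String[] args) throws Exception {
        Class<?> logger = Class.forName("org.slf4j.Logger");
        System.out.println("Found " + logger.getName()
                + " via " + logger.getClassLoader());
    }
}

Compile it with javac ClasspathCheck.java, then run it with the same classpath Spark will see, e.g. java -cp ".:$(/path/to/hadoop/bin/hadoop classpath)" ClasspathCheck.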

Error in launching Spark REPL

I have pre-built Spark 1.4.1 and I'm running HDP 2.6. When I try to run spark-shell, it gives me the following error message:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:111)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:97)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:107)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
What is the issue?
A ClassNotFoundException occurs when the class loader cannot find the required class on the classpath. So, basically, you should check your classpath and add the missing class to it.
Check whether hadoop-common-0.21.0.jar is added to your classpath.
Is it possible that your Hadoop home is not set, as in here?
Cannot find hadoop installation: $HADOOP_HOME must be set or hadoop must be in the path

Hadoop getting error Exception in thread "main" java.lang.NoClassDefFoundError:

I am very new to Hadoop and MapReduce programming.
I downloaded version 1.2.1 and was trying to run one of the examples with the command
bin/hadoop jar hadoop*example*.jar
With this command I am getting an exception. What is wrong here? Is there a problem with my installation?
Exception in thread "main" java.lang.NoClassDefFoundError: 1/2/1/hadoop-1/2/1/libexec////logs
Caused by: java.lang.ClassNotFoundException: 1.2.1.hadoop-1.2.1.libexec....logs
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:315)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:330)
at java.lang.ClassLoader.loadClass(ClassLoader.java:250)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:398)
The right command is:
bin/hadoop jar hadoop-*-examples.jar <program name>
If you are using your own custom MapReduce class, try the following configuration in your main method:
job.setJarByClass(WordCount.class);
Reference: http://mydailylearningblog.blogspot.com.br/2011/06/javalangclassnotfoundexception.html
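For context, here is a minimal sketch of where that call sits, assuming the old org.apache.hadoop.mapred API that the 1.2.1 examples use (the driver class name is illustrative; passing the class to the JobConf constructor is the old-API equivalent of setJarByClass):

// Hypothetical driver using the old org.apache.hadoop.mapred API.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MyJobDriver {
    public static void main(String[] args) throws Exception {
        // The constructor argument tells Hadoop which jar to ship to the
        // cluster: the one that contains MyJobDriver.class.
        JobConf conf = new JobConf(MyJobDriver.class);
        conf.setJobName("my-job");
        // Defaults to the identity Mapper/Reducer; set your own with
        // conf.setMapperClass(...) and conf.setReducerClass(...).
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}

Without that hint, Hadoop cannot tell which jar holds your classes, which is a common source of ClassNotFoundException at job submission.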

Hadoop inverted index program error

Can somebody tell me what this error means, and how I can get the output?
Exception in thread "main" java.lang.ClassNotFoundException: org.myorg.LineIndexer
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
This is the code I want to execute :
http://code.google.com/p/hadoop-excercise/source/browse/trunk/lineindexer/LineIndexer.java?spec=svn15&r=15
Is the class included in your jar? It seems not to be.
You have to include the class in the jar you are passing when starting the job.
I followed the following steps and it worked.
PS: Please make sure you have a sample.txt file in HDFS and LineIndexer.java in the current directory.
javac -classpath $HADOOP_HOME/hadoop-core.jar *.java
jar cvf li.jar *.class
hadoop jar li.jar LineIndexer sample.txt li1
hadoop fs -cat li1/part-00000