Is a Maven JAR runnable on Hadoop?

To produce a JAR from a Hadoop MapReduce program (the MapReduce WordCount example) I used Maven.
The 'clean' and 'install' goals completed successfully.
The project also builds and runs successfully as a Java Application when I supply the arguments (input and output), and it produces the expected result.
The problem is that it does not run on Hadoop. It gives the following error:
Exception in thread "main" java.lang.ClassNotFoundException: WordCount
Is a Maven JAR runnable on Hadoop?

Maven is a build tool that creates a Java artifact. Any JAR that contains the Hadoop dependencies and names the class with the main() method in its manifest should work with Hadoop.
Try running your JAR using the command below:
hadoop jar your-jar.jar wordcount input output
where "wordcount" is the name of the class with the main method, and "input" and "output" are the arguments.

Two things:
1) I think you are missing the package details before the class name. Copy your package name, put it before the class name, and it should work:
hadoop jar /home/user/examples.jar com.test.examples.WordCount /home/user/inputfolder /home/user/outputfolder
PS: If you are using a JAR for which you do not have the source code, you can do
jar -tvf /home/user/examples.jar
and it will print all the classes with their folder names. Replace the "/" with "." (dot) and you get the package name (a sample listing follows below). Note that this needs a JDK (not a JRE) on the PATH.
2) You are trying to run a MapReduce program from a Windows prompt. Are you sure you have Hadoop installed on your Windows machine?
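A sample jar -tvf listing, with a made-up size and date and the class name from the command above:
jar -tvf /home/user/examples.jar
1245 Mon Jun 10 10:00:00 UTC 2013 com/test/examples/WordCount.class
The entry com/test/examples/WordCount.class maps to the fully qualified class name com.test.examples.WordCount, which is exactly what the hadoop jar command expects.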

Related

How to add the Mahout XML classifier JAR to a Hadoop cluster, as I don't want to add that library to the Hadoop classpath

I am parsing an XML file using XmlInputFormat, which is present in the mahout-examples JAR, but while running my MapReduce JAR I get the error below:
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.mahout.classifier.bayes.XmlInputFormat not found
Please let me know how I can make these JARs available when running on a multi-node Hadoop cluster.
Include all the mahout-examples JARs in the "-libjars" command-line option of the hadoop jar ... command. The JARs will be placed in the distributed cache and made available to all of the job's task attempts. More specifically, you will find each JAR in one of the ${mapred.local.dir}/taskTracker/archive/${user.name}/distcache/… subdirectories on the local nodes.
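A hedged example of such a submission (the driver class, JAR names, and paths are illustrative, not from the original question):
hadoop jar myjob.jar com.example.MyXmlJob -libjars mahout-examples-0.9-job.jar /xmlinput /xmloutput
One caveat: -libjars is handled by GenericOptionsParser, so it only takes effect if your driver runs through ToolRunner (i.e., implements the Tool interface); a plain main() just receives -libjars as an ordinary argument and the JARs never reach the distributed cache.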

Can I run a JAR file which includes another JAR file under the lib folder in HDInsight?

Is it possible to run a JAR file in HDInsight which includes another JAR file under the lib folder?
JAR file
├── /folder1/subfolder1/myApp/…
│        └── .class file
└── lib/dependency.jar // library (jar file)
Thank you!
On HDInsight you should be able to run a Java MapReduce JAR which has a dependency on another JAR. There are a few ways to do this, but typically not by copying the second JAR into the lib folder on the headnode.
The reasons: depending on where the dependency is needed, you might have to copy the JAR into the lib folder of every worker node and headnode, which becomes a tedious task. Moreover, the change will be erased whenever a node gets re-imaged by Azure, so it is not a supported approach.
Now, there are two types of dependencies:
1. The MapReduce driver class has a dependency on another external JAR.
2. A map or reduce task has a dependency on another JAR, where the map or reduce function calls an API on the external JAR.
Scenario #1 (MapReduce driver class depends on another JAR):
We can use one of the following options:
a. Copy your dependency JAR to a local folder (like d:\test on Windows HDI) on the headnode and then use RDP to append this path to the HADOOP_CLASSPATH environment variable on the headnode. This is suitable for dev/test, where jobs are run directly from the headnode, but it won't work with remote job submissions, so it is not suitable for production scenarios.
b. Use a 'fat' (or 'uber') JAR that includes all the dependent JARs inside your JAR; you can build one with the Maven Shade plugin, as sketched below.
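A rough sketch of option (b), assuming the Shade plugin is configured in your pom.xml with your driver set as the main class (the artifact and class names below are placeholders):
mvn clean package
hadoop jar target/your-app-1.0.jar com.example.MyDriver /input /output
The exact artifact name under target/ depends on your Shade configuration, but the point is that the shaded JAR is self-contained, so no extra classpath setup is needed on the headnode.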
Scenario #2 (a map or reduce function calls an API on an external JAR):
Basically, use the -libjars option.
If you want to run the MapReduce JAR from the Hadoop command line:
a. Copy the MapReduce JAR to a local path (like d:\test).
b. Copy the dependent JAR to WASB.
Example of running a MapReduce JAR with a dependency:
hadoop jar D:\Test\BlobCount-0.0.1-SNAPSHOT.jar css.ms.BlobCount.BlobCounter -libjars wasb://mycontainername@azimwasb.blob.core.windows.net/mrdata/jars/microsoft-windowsazure-storage-sdk-0.6.0.jar -DStorageAccount=%StorageAccount% -DStorageKey=%StorageKey% -DContainer=%Container% /mcdpoc/mrinput /mcdpoc/mroutput
The example is using HDInsight windows – you can use similar approach on HDInsight Linux as well.
Using PowerShell or the .NET SDK (remote job submission): with PowerShell, you can use the -LibJars parameter to refer to dependent JARs.
You can review the following documentation, which has various examples of using PowerShell, SSH, etc.:
https://azure.microsoft.com/en-us/documentation/articles/hdinsight-use-mapreduce/
I hope it helps!
Thanks,
Azim

Hadoop job DocumentDB dependency JAR file

I have a Hadoop job which gets its input from Azure DocumentDB. I have put the DocumentDB JAR dependency files under a directory called 'lib'. However, when I run the job it gives me a ClassNotFoundException for one of the classes in the JAR file. I also tried adding the JAR files using the -libjars option, but that didn't work either. Does anyone have any idea what could be wrong?

Example JAR in Hadoop release

I am learning Hadoop with the book 'Hadoop in Action' by Chuck Lam. The first chapter says that the Hadoop installation will have an examples JAR, and that running 'hadoop jar hadoop-*-examples.jar' will show all the examples. But when I run the command it throws the error 'Could not find or load main class org.apache.hadoop.util.RunJar'. My guess is that the installed Hadoop doesn't have the examples JAR. I have installed 'hadoop-2.1.0-beta.tar.gz' on Cygwin on a Windows 7 laptop. Please suggest how to get the examples JAR.
Run the following command:
hadoop jar PathToYourJarFile wordcount inputPath outputPath
You can find the examples JAR file in your Hadoop installation directory.
What I can suggest here is that you go to the Hadoop installation directory manually and look for a JAR named something like hadoop-examples.jar yourself. Different distributions can have different names for the JAR.
If you are in Cygwin, while in the Hadoop installation directory you can also do an ls *examples*.jar to find it, narrowing the file listing down to any JAR file containing 'examples' as a string.
You can then use the JAR file name directly, like:
hadoop jar <exampleJarYouFound.jar>
Hope this takes you to a solution.
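One more data point, hedged because layouts differ between releases: in the Hadoop 2.x tarballs (including 2.1.0-beta) the examples JAR normally sits under share/hadoop/mapreduce rather than in the installation root, so from the installation directory something like this should work:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.1.0-beta.jar
Run with no further arguments, it prints the list of bundled example programs, which is the behaviour the book describes.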

Unable to execute Map/Reduce job

I've been trying to figure out how to execute my Map/Reduce job for almost 2 days now. I keep getting a ClassNotFoundException.
I've installed a Hadoop cluster on Ubuntu using Cloudera CDH4.3.0. The .java file (DemoJob.java, which is not inside any package) is inside a folder called inputs, and all the required JAR files are inside inputs/lib.
I followed http://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_topic_5_2.html for reference.
I compile the .java file using:
javac -cp "inputs/lib/hadoop-common.jar:inputs/lib/hadoop-map-reduce-core.jar" -d Demo inputs/DemoJob.java
(The link says -cp should be "/usr/lib/hadoop/:/usr/lib/hadoop/client-0.20/", but I don't have those folders on my system at all.)
I create the JAR file using:
jar cvf Demo.jar Demo
I move the 2 input files to HDFS.
(Now this is where I'm confused: do I need to move the JAR file to HDFS as well? The link doesn't say so. But if it is not in HDFS, how does the hadoop jar .. command work? I mean, how does it combine the JAR file, which is on the Linux filesystem, with the input files, which are in HDFS?)
I run my code using:
hadoop jar Demo.jar DemoJob /Inputs/Text1.txt /Inputs/Text2.txt /Outputs
I keep getting ClassNotFoundException: DemoJob.
Somebody please help.
A ClassNotFoundException only means that some class wasn't found when the class DemoJob was loaded. The missing class could be a class referenced (imported, for example) by DemoJob. I think the problem is that you don't have the /usr/lib/hadoop/:/usr/lib/hadoop/client-0.20/ folders (classes) on your classpath. It's the classes that should be there but aren't that are probably triggering the ClassNotFoundException.
Finally figured out what the problem was. Instead of creating the JAR file from a folder, I directly created the JAR file from the .class files using jar -cvf Demo.jar *.class
This resolved the ClassNotFoundException. But I don't understand why it was not working earlier. Even when I created the JAR file from the folder, I did mention the folder name when executing the class, as: hadoop jar Demo.jar Demo.DemoJob /Inputs/Text1.txt /Inputs/Text2.txt /Outputs
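A likely explanation, offered as an educated guess from the commands shown: DemoJob.java has no package declaration, so the class loader expects DemoJob.class at the root of the JAR. Jarring the Demo folder placed the entry at Demo/DemoJob.class, whose path makes it look like a class in package Demo, while the compiled class itself declares no package, so neither DemoJob nor Demo.DemoJob could be loaded. Comparing the two layouts makes the difference visible:
jar -tf Demo.jar
# broken layout:  Demo/DemoJob.class   (entry path implies package Demo)
# working layout: DemoJob.class        (matches the package-less class)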
