Selecting appropriate JAR files to run map reduce from eclipse - hadoop

I have Hadoop 2.0.0 (CDH 4.7.0) installed. I want to set up Eclipse to run my own MapReduce programs, but I am not sure which Hadoop JAR files to add as references. I tried adding all the JAR files, but that didn't work for me.
Can anyone help me with the list of JAR files, and their locations in CDH 4.7.0, needed to run an MR job from Eclipse?
Thanks

It depends on your program and which APIs you use. The image shows the JAR names that are required for a simple Hadoop MapReduce program.
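For reference, on a CDH4 package install the client-facing JARs are collected in a few well-known directories; a sketch of where to look (the paths are an assumption based on the standard package layout, so verify them on your node):

```shell
# CDH4 package layout (an assumption; adjust for parcel or tarball installs).
# These are the jars an Eclipse build path needs for a basic MapReduce program:
ls /usr/lib/hadoop/client/*.jar        # aggregated client jars, if present
# If that directory is absent, collect from the component directories instead:
ls /usr/lib/hadoop/*.jar /usr/lib/hadoop-hdfs/*.jar /usr/lib/hadoop-mapreduce/*.jar
```

In Eclipse, add the listed JARs via Build Path > Add External JARs.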

Related

Run Spark job with properties files

As a beginner with the Hadoop stack, I would like to run my Spark job with spark-submit via Oozie. Besides the JAR built from my project sources, I also have a set of properties files (about 20). I want my Spark job, when it runs, to load these properties files from a separate folder next to the folder containing the compiled JAR. I've tried:
In my Oozie job.properties, I added:
oozie.libpath=[path to the folder including all of my properties files]
and oozie.use.system.libpath=true.
On the spark-submit command line, I added --files or --properties-file, but it doesn't work (neither option accepts a folder).
Thanks for any suggestions or feel free to ask more if my question is not clear.
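One thing worth knowing here: spark-submit's --files option takes a comma-separated list of files, not a directory. A sketch of expanding a folder of properties files into that form (the folder, file, class, and JAR names below are made up for illustration):

```shell
# Sketch: turn a folder of properties files into the comma-separated list
# that --files expects. Names here are hypothetical.
mkdir -p conf && touch conf/app.properties conf/db.properties
PROPS=$(printf '%s,' conf/*.properties)
PROPS=${PROPS%,}                 # drop the trailing comma
echo "$PROPS"                    # conf/app.properties,conf/db.properties
# spark-submit --class com.example.MyJob --files "$PROPS" myjob.jar
```

Files passed via --files land in each executor's working directory, so the job can open them by bare filename.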

Can I run JAR file which includes another JAR file under lib folder in HDInsight?

Is it possible to run a JAR file in HDInsight which includes another JAR file under the lib folder?
JAR file
├── folder1/subfolder1/myApp/…
│   └── .class file
└── lib/dependency.jar   // bundled library (JAR file)
Thank you!
On HDInsight, you can run a Java MapReduce JAR that depends on another JAR. There are a few ways to do this, but copying the second JAR under a lib folder on the headnode is typically not one of them.
The reasons: depending on where the dependency is needed, you may have to copy the JAR under the lib folder of every worker node and headnode, which becomes tedious. Also, the change is erased when a node gets re-imaged by Azure, so it is not a supported approach.
Now, there are two types of dependencies:
1. The MapReduce driver class depends on an external JAR.
2. A map or reduce task depends on another JAR, i.e. the map or reduce function calls an API in the external JAR.
Scenario #1 (MapReduce driver class depends on another JAR):
We can use one of the following options:
a. Copy your dependency JAR to a local folder (like d:\test on Windows HDI) on the headnode, then use RDP to append this path to the HADOOP_CLASSPATH environment variable on the headnode. This is suitable for dev/test, running jobs directly from the headnode, but it won't work with remote job submission, so it is not suitable for production scenarios.
b. Build a 'fat' (uber) JAR that includes all the dependent JARs inside your JAR; you can use the Maven Shade plugin, example here
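A sketch of the Shade-plugin setup for option (b), added to the build/plugins section of the project's pom.xml (the version number is an assumption; pin whatever your build already uses):

```xml
<!-- Sketch: bundle all dependencies into one runnable "uber" jar at package time. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

After `mvn package`, the shaded JAR in target/ can be submitted on its own, with no -libjars needed.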
Scenario #2 (map or reduce function calls an API in an external JAR):
Basically, use the -libjars option. Note that -libjars is only honored when the driver parses its arguments through GenericOptionsParser, for example by implementing Tool and running via ToolRunner.
If you want to run the MapReduce JAR from the Hadoop command line:
a. Copy the MapReduce JAR to a local path (like d:\test)
b. Copy the dependent JAR to WASB
Example of running a MapReduce JAR with a dependency:
hadoop jar D:\Test\BlobCount-0.0.1-SNAPSHOT.jar css.ms.BlobCount.BlobCounter -libjars wasb://mycontainername@azimwasb.blob.core.windows.net/mrdata/jars/microsoft-windowsazure-storage-sdk-0.6.0.jar -DStorageAccount=%StorageAccount% -DStorageKey=%StorageKey% -DContainer=%Container% /mcdpoc/mrinput /mcdpoc/mroutput
The example is using HDInsight windows – you can use similar approach on HDInsight Linux as well.
Using PowerShell or the .NET SDK (remote job submission): with PowerShell, you can use the -LibJars parameter to refer to dependent JARs.
You can review the following documentation, which has various examples using PowerShell, SSH, etc.:
https://azure.microsoft.com/en-us/documentation/articles/hdinsight-use-mapreduce/
I hope it helps!
Thanks,
Azim

Missing of hadoop-mapreduce-client-core-[0-9.]*.jar in hadoop1.2.1

I have installed Hadoop 1.2.1 on a three-node cluster. While installing Oozie, when I try to generate a WAR file for the web console, I get this error:
hadoop-mapreduce-client-core-[0-9.]*.jar' not found in '/home/hduser/hadoop'
I believe the version of Hadoop I am using doesn't ship this JAR file (and I don't know where to find it). Can anyone tell me how to create the WAR file and enable the web console? Any help is appreciated.
You are correct. You have two options:
1. Download the individual JARs, put them inside your hadoop-1.2.1 directory, and generate the WAR file.
2. Download Hadoop 2.x, use it while creating the WAR file, and once it has been built, continue using your hadoop-1.2.1.
For example, from the oozie-3.3.2 directory:
bin/oozie-setup.sh prepare-war -hadoop hadoop-1.1.2 ~/hadoop-eco/hadoop-2.2.0 -extjs ~/hadoop-eco/oozie/oozie-3.3.2/webapp/src/main/webapp/ext-2.2
Here I built Oozie 3.3.2 for use with hadoop-1.1.2, but pointed the WAR build at the hadoop-2.2.0 installation.
HTH
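Option 1 above can be sketched as follows; the version number is an example, and the target path comes from the error message in the question (the Maven Central URL pattern is standard, but double-check the version you need):

```shell
# Sketch for option 1: fetch the missing client jar from Maven Central into
# the Hadoop directory that oozie-setup.sh scans. Version 2.2.0 is only an
# example; substitute the version your Oozie build expects.
V=2.2.0
wget -P /home/hduser/hadoop \
  "https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/${V}/hadoop-mapreduce-client-core-${V}.jar"
```

After the JAR is in place, rerun bin/oozie-setup.sh prepare-war.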

Example Jar in Hadoop release

I am learning Hadoop with the book 'Hadoop in Action' by Chuck Lam. In the first chapter the book says that the Hadoop installation includes an examples JAR, and that running 'hadoop jar hadoop-*-examples.jar' will list all the examples. But when I run the command, it throws the error 'Could not find or load main class org.apache.hadoop.util.RunJar'. My guess is that the installed Hadoop doesn't have the examples JAR. I have installed 'hadoop-2.1.0-beta.tar.gz' under Cygwin on a Windows 7 laptop. Please suggest how to get the examples JAR.
Run the following command:
hadoop jar PathToYourJarFile wordcount inputPath outputPath
You can find the examples JAR file in your Hadoop installation directory.
What I can suggest here is that you manually go to the Hadoop installation directory and look for a JAR with a name similar to hadoop-examples.jar. Different distributions can have different names for this JAR.
If you are in Cygwin, you can also run ls *examples*.jar in the Hadoop installation directory to narrow the file listing to any JAR whose name contains the string examples.
You can then use the JAR file name you found directly:
hadoop jar <exampleJarYourFound.jar>
Hope this takes you to a solution.
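In Hadoop 2.x tarballs the examples JAR moved out of the top-level directory, which may be why the wildcard in the book fails; a sketch of locating it (HADOOP_HOME below is an assumption, so point it at your unpacked tarball):

```shell
# In Hadoop 2.x the examples jar lives under share/hadoop/mapreduce,
# not in the top-level directory as in Hadoop 1.x.
HADOOP_HOME=${HADOOP_HOME:-$HOME/hadoop-2.1.0-beta}
find "$HADOOP_HOME/share/hadoop/mapreduce" -name 'hadoop-mapreduce-examples-*.jar'
# Then run one of the bundled examples with the jar the find prints:
# hadoop jar <jar-found-above> wordcount /input /output
```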

How does zookeeper determine the 'java.library.path' for a hadoop job?

I am running Hadoop jobs on a distributed cluster using Oozie, and I set 'oozie.libpath' for the Oozie jobs.
Recently, I deleted a few older-version JAR files from the library path Oozie uses and replaced them with newer versions. However, when I run my Hadoop job, both the older and newer versions of the JARs get loaded, and the MapReduce job still uses the older version.
I am not sure where ZooKeeper is loading the JAR files from. Are there any default locations it loads JARs from? There is only one library path in my HDFS, and it does not contain those JAR files.
I found what was going wrong: the wrong JAR was shipped with my job file. I should have checked there first.
