Run Spark job with properties files - hadoop

As a beginner with the Hadoop stack, I would like to run my Spark job with spark-submit via Oozie. I have a jar built from my project's compiled sources, and I also have a set of properties files (about 20). I want my Spark job, when it runs, to load these properties files from a separate folder next to the folder containing the compiled jar. I've tried the following:
In my Oozie job.properties, I added:
oozie.libpath=[path to the folder including all of my properties files]
and oozie.use.system.libpath=true.
On the spark-submit command, I added --files or --properties-file, but neither works (neither option accepts a folder).
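For reference, spark-submit's --files flag expects a comma-separated list of individual files (each one is copied into the working directory of every executor), not a folder, so the call has to enumerate the files; a minimal sketch with made-up paths and class names:
# join all ~20 properties files into a single comma-separated --files argument
spark-submit \
  --class com.example.MyJob \
  --files "$(ls /path/to/conf/*.properties | paste -sd, -)" \
  my-spark-job.jar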
Thanks for any suggestions, and feel free to ask for more details if my question is not clear.

Related

How does the example find the lib in Oozie best case?

Following the Oozie documentation, I am trying to run a MapReduce example on Oozie. As everyone knows, workflow.xml (and coordinator.xml) should be in HDFS.
Then I run the command: oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run. I also know that job.properties should be on the local file system.
But two things confuse me:
1. Why does the jar or class referenced in workflow.xml come from the lib directory in HDFS?
2. There is a picture showing the contents of oozie-examples-4.3.1.jar. This jar is in HDFS, so how can it import lib?
Forgive my poor English.
The highlighted red box (in your screenshot) is part of the Hadoop and Java default classpath. Any Java code that runs within YARN as part of MapReduce has access to the packages that appear when you run the hadoop classpath command. By the way, the mapred.* classes of Hadoop are almost all deprecated.
That has nothing to do with Oozie per se, but Oozie extends the Hadoop classpath with the Oozie ShareLib, which must be explicitly enabled with a property-file argument:
oozie.use.system.libpath=true
And in addition to that classpath, Oozie will ship the ${wf.application.path}/lib directory to all running jobs.
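Putting those two pieces together, a minimal job.properties for a workflow that ships its own jars might look like this (the host and paths are invented for illustration):
oozie.wf.application.path=hdfs://namenode:8020/user/me/apps/my-wf
oozie.use.system.libpath=true
# every jar placed under the application's lib/ directory, e.g.
#   hdfs://namenode:8020/user/me/apps/my-wf/lib/my-code.jar,
# is added automatically to the classpath of each launched action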

Hadoop confs for client application

I have a client application that uses the Hadoop conf files (hadoop-site.xml and hadoop-core.xml).
I don't want to check them into the resources folder, so I tried to add them via IDEA.
The problem is that the Hadoop configuration ignores my HADOOP_CONF_DIR and loads the default confs from the Hadoop package. Any idea?
I'm using Gradle.
I ended up solving it by putting the configuration files in the test resources folder, so they are not picked up when the jar gets built.
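For background: Hadoop's Configuration class loads core-site.xml and friends from the JVM classpath; HADOOP_CONF_DIR is only honoured by the hadoop launcher script, which prepends that directory to the classpath for you. A plain client JVM therefore needs the conf directory passed explicitly; a sketch with hypothetical jar and class names:
# put the conf directory itself (not the individual XML files) on the classpath
java -cp "build/libs/myapp.jar:/etc/hadoop/conf" com.example.client.Main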

Selecting appropriate JAR files to run map reduce from eclipse

I have Hadoop 2.0.0 (CDH 4.7.0) installed. I want to set up Eclipse to run my own MapReduce programs, but I am not sure which Hadoop JAR files to add as references. I added all the jar files, but that didn't work for me.
Can anyone please help me get the list of jar files and their locations in CDH 4.7.0 to run an MR job from Eclipse?
Thanks
That depends on your program and which classes it uses.
The image (a screenshot, not reproduced here) showed the jar names that are compulsory for a simple Hadoop MapReduce program.
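Rather than guessing, you can also ask the installation itself which jars are on the default classpath; the directories below are typical for a package-based CDH 4 install and may differ on your machine:
# print every directory and jar that Hadoop itself puts on the classpath
hadoop classpath
# on package-based CDH 4 installs the client jars usually live under these directories
ls /usr/lib/hadoop/*.jar /usr/lib/hadoop-mapreduce/*.jar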

Example Jar in Hadoop release

I am learning Hadoop with the book 'Hadoop in Action' by Chuck Lam. In the first chapter the book says that the Hadoop installation will have an examples jar, and that running 'hadoop jar hadoop-*-examples.jar' will show all the examples. But when I run the command it throws the error 'Could not find or load main class org.apache.hadoop.util.RunJar'. My guess is that the installed Hadoop doesn't have the examples jar. I have installed 'hadoop-2.1.0-beta.tar.gz' on Cygwin on a Windows 7 laptop. Please suggest how to get the examples jar.
Run the following command:
hadoop jar PathToYourJarFile wordcount inputPath outputPath
You can find the examples jar file in your Hadoop installation directory.
What I can suggest here is that you go to the Hadoop installation directory manually and look for a jar with a name similar to hadoop-examples.jar yourself. Different distributions can have different names for the jar.
If you are in Cygwin, while in the Hadoop installation directory you can also do an ls *examples*.jar to find it, narrowing the file listing down to any jar file containing 'examples' as a string.
You can then directly use the jar file name, like:
hadoop jar <exampleJarYouFound.jar>
Hope this takes you to a solution.
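For a Hadoop 2.x tarball such as hadoop-2.1.0-beta, the examples jar normally sits under share/hadoop/mapreduce; a sketch, assuming $HADOOP_HOME points at the unpacked tarball:
# search the whole installation for any examples jar
find $HADOOP_HOME -name '*examples*.jar'
# running the jar with no arguments prints the list of bundled example programs
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.1.0-beta.jar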

How to execute map reduce program(ex. wordcount) from HDFS and see the output?

I am new to Hadoop. I have a simple wordcount program in Eclipse which takes input files and then shows the output. But I need to execute the same program against HDFS. I have already created a JAR file for the wordcount program.
Can anyone please let me know how to proceed?
You need to have a cluster set up, even if it is a single-node cluster. Then you can run your .jar from the hadoop command line:
jar
Runs a jar file. Users can bundle their MapReduce code in a jar file and execute it using this command.
Usage: hadoop jar <jar> [mainClass] args...
Streaming jobs are run via this command; examples can be found in the Streaming examples docs.
The word count example is also run using the jar command; it can be found in the Wordcount example docs.
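As an illustration of the streaming usage quoted above, here is a minimal sketch (the streaming jar path assumes a Hadoop 2.x tarball layout; the input and output paths are placeholders):
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -input /user/me/in \
  -output /user/me/out \
  -mapper /bin/cat \
  -reducer /usr/bin/wc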
Initially you need to set up a Hadoop cluster, as discussed by Remus.
The Single Node Setup and Multi Node Setup guides are two good ways to start.
Once you have the setup done, start the Hadoop daemons and copy the input files into an HDFS directory.
Prepare the jar of your program.
Run the jar from the terminal using hadoop jar <your jar name> <your main class> <input path> <output directory path>
(The jar arguments depend on your program.)
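End to end, the steps above look something like the following (the user, jar, and class names are placeholders):
# copy local input into HDFS
hdfs dfs -mkdir -p /user/me/wordcount/input
hdfs dfs -put input.txt /user/me/wordcount/input
# run the job; the output directory must not exist yet
hadoop jar wordcount.jar WordCount /user/me/wordcount/input /user/me/wordcount/output
# inspect the result
hdfs dfs -cat /user/me/wordcount/output/part-r-00000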
