Running a jar on a Hadoop server as a service - hadoop

I made a jar which analyzes system logs. I can run it on the Hadoop server from the command line with "bin/hadoop jar log.jar",
but my problem is that I want to run this jar in the background, as a service, on the Ubuntu master machine.
Can anyone help me make the Hadoop jar run as a background service on the Ubuntu machine, executing every hour?

You have a few options; here are two:
Configure a crontab entry to run your job every hour, something like the following (you'll need to fully qualify the paths to hadoop and to the jar itself):
0 * * * * /usr/lib/hadoop/bin/hadoop jar /path/to/jar/log.jar
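To install the entry, a minimal sketch (assuming the job should run as the user that owns the Hadoop installation):
# Open that user's crontab in an editor, paste the line above, save
crontab -e
# Verify the entry was installed
crontab -l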
Run an Oozie server and configure a coordinator to submit the job on an hourly basis. More effort than the above suggestion, but worth a look.
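If you go the Oozie route, submission is done through the oozie CLI; a rough sketch, assuming an Oozie server at localhost:11000 and a job.properties that points at your coordinator.xml in HDFS:
# Submit and start the coordinator (server URL and properties file are assumptions)
oozie job -oozie http://localhost:11000/oozie -config job.properties -run
# List coordinators to confirm it is running
oozie jobs -oozie http://localhost:11000/oozie -jobtype coordinator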

Related

MapReduce not starting on Single Node

I need help. I have installed hadoop-2.6 on Ubuntu 14.04 with the HIPI configuration and all services are up and running. However, running the jar file with the following command does not execute a map-reduce job: hadoop jar faceCount.jar /project/input.hib /project/output
The job is scheduled but nothing happens.
This is the cluster web UI: http://macvaio-vpceh28fn:8088/cluster

Running Spark Jobs via Oozie

Is it possible to run Spark jobs, e.g. Spark-SQL jobs, via Oozie?
In the past we have used Oozie with Hadoop. Since we are now using Spark-SQL on top of YARN, I'm looking for a way to use Oozie to schedule jobs.
Thanks.
Yup, it's possible. The procedure is the same as well: you have to give Oozie a directory structure containing coordinator.xml, workflow.xml and a lib directory with your jar files.
But remember that Oozie starts the job with a java -cp command, not with spark-submit, so if you have to run it with Oozie, here is a trick.
Run your jar with spark-submit in the background:
nohup spark-submit --class package.to.MainClass /path/to/App.jar &
Then look for that process in the process list:
ps aux | grep '/path/to/App.jar'
It will be running under a java -cp command, but with some additional jars that were added by spark-submit. Add those jars to the CLASS_PATH, and that's it: now you can run your Spark applications through Oozie.
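To pull out just the jars from that process's java -cp command line, something like this rough sketch can help (the grep pattern matches the path used above; adjust it for your jar):
# Show the full java -cp ... command of the background driver
ps -ef | grep '[/]path/to/App.jar'
# Split on spaces and classpath separators and keep only the jars
ps -ef | grep '[/]path/to/App.jar' | tr ': ' '\n\n' | grep '\.jar$' | sort -u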
EDIT: You can also use the latest Oozie, which now has a Spark action as well.
To run Spark SQL via Oozie you need to use the Oozie Spark action.
You can locate oozie-examples.tar.gz in your distribution. In Cloudera distributions the Oozie examples are usually found at the path below:
$ locate oozie.gz
/usr/share/doc/oozie-4.1.0+cdh5.7.0+267/oozie-examples.tar.gz
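Unpacking that tarball gives you samples to start from; a sketch, with the caveat that the layout is from this CDH version and the Spark sample only exists in Oozie builds that include the Spark action:
# Unpack the bundled examples (path as found by locate above)
tar xzf /usr/share/doc/oozie-4.1.0+cdh5.7.0+267/oozie-examples.tar.gz -C ~/
# Look for the Spark action sample (job.properties and workflow.xml)
ls ~/examples/apps/spark/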
Spark SQL needs the hive-site.xml file at execution time, and you need to ship it from workflow.xml:
<spark-opts>--files /hive-site.xml</spark-opts>

Differences between hadoop jar and yarn jar

What's the difference between running a jar file with the command "hadoop jar" and with "yarn jar"?
I've used the "hadoop jar" command on my Mac successfully, but I want to be sure that execution is correct and parallel across my four cores.
Thanks!
Short Answer
They are probably identical for you, but even if they aren't, they should both utilize your cluster to the best of its ability.
Longer Answer
The /usr/bin/yarn script sets up the execution environment so that all of the yarn commands can be run. The /usr/bin/hadoop script isn't as concerned with YARN-specific functionality. However, if your cluster is set up to use YARN as the default implementation of MapReduce (MRv2), then hadoop jar will probably behave the same as yarn jar for a MapReduce job.
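If you want to confirm which framework the hadoop command will hand your job to, you can query the live configuration; a quick check, assuming your client is configured against the cluster:
# Prints "yarn" when MapReduce runs on YARN (MRv2), "classic" or "local" otherwise
hdfs getconf -confKey mapreduce.framework.name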
Either way you're probably fine, but you can always check the ResourceManager (or JobTracker) web interface to see how your job is distributed across the cluster (whether it's a single-node cluster or not).

Apache Giraph on EMR

Has anyone tried Apache Giraph on EMR?
It seems to me the only requirement to run on EMR is to add the proper bootstrap scripts to the Job Flow configuration. Then I should just need a standard Custom JAR launch step to launch the Giraph Runner with the appropriate arguments for my Giraph program.
Any documentation/tutorial, or just your experience with Giraph on EMR, would be much appreciated.
Yes, I run Giraph jobs on EMR regularly, but I don't use "Job Flows"; I manually log in to the master node and use it as a normal Hadoop cluster (I just submit the job with the hadoop jar command).
You are right: you need to add bootstrap scripts to run Zookeeper and to add the Zookeeper details to the core-site config. Here is how I did it:
Bootstrap actions:
Configure Hadoop: s3://elasticmapreduce/bootstrap-actions/configure-hadoop with arguments
--site-key-value, io.file.buffer.size=65536, --core-key-value, giraph.zkList=localhost:2181, --mapred-key-value, mapreduce.job.counters.limit=1200
Run if: s3://elasticmapreduce/bootstrap-actions/run-if instance.isMaster=true, s3://hpc-chikitsa/zookeeper_install.sh
The contents of zookeeper_install.sh are :
#!/bin/bash
# Download, unpack and start a single-node Zookeeper
wget --no-check-certificate http://apache.mesi.com.ar/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
tar zxvf zookeeper-3.4.5.tar.gz
cd zookeeper-3.4.5
mv conf/zoo_sample.cfg conf/zoo.cfg
sudo bin/zkServer.sh start
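A quick health check worth adding here (my addition, assuming netcat is available on the master): Zookeeper answers the four-letter ruok command with imok when it is up.
echo ruok | nc localhost 2181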
Then copy your Giraph jar file to the master node (using scp), ssh to the master node, and submit the job using the hadoop jar command, as in the sketch below.
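For illustration, the copy-and-submit steps might look like this; the key file, host name, computation class and input/output formats are placeholders, not details from an actual setup:
# Copy the job jar to the EMR master and log in (hypothetical key and host)
scp -i mykey.pem giraph-job.jar hadoop@ec2-master-node:~/
ssh -i mykey.pem hadoop@ec2-master-node
# Submit through GiraphRunner; classes, paths and worker count are placeholders
hadoop jar giraph-job.jar org.apache.giraph.GiraphRunner my.ExampleComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/hadoop/input \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/hadoop/output -w 1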
Hope that helps.
Here is a relevant mail thread on the giraph-user mailing list:
https://www.mail-archive.com/user%40giraph.apache.org/msg01240.html

Using different hadoop-mapreduce-client-core.jar to run hadoop cluster

I'm working on a Hadoop cluster with CDH4.2.0 installed and ran into this error. It's been fixed in later versions of Hadoop, but I don't have access to update the cluster. Is there a way to tell Hadoop to use this jar when running my job, via a command-line argument like
hadoop jar MyJob.jar -D hadoop.mapreduce.client=hadoop-mapreduce-client-core-2.0.0-cdh4.2.0.jar
where the new mapreduce-client-core.jar file is the patched jar from the ticket? Or must Hadoop be completely recompiled with this new jar? I'm new to Hadoop, so I don't know all the command-line options that are possible.
I'm not sure how that would work, since when you execute the hadoop command you're actually executing code in the client jar.
Can you not use MR1? The ticket says the issue only occurs when you're using MR2, so unless you really need YARN, you're probably better off using the MR1 library to run your map/reduce.
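If MR1 isn't an option, a workaround sometimes used (whether it works on CDH4.2.0 is an assumption you'd have to verify) is to put the patched jar ahead of the bundled one on the classpath rather than recompiling Hadoop:
# Prefer the patched client jar in the submitting JVM (path is a placeholder)
export HADOOP_CLASSPATH=/path/to/patched/hadoop-mapreduce-client-core.jar
export HADOOP_USER_CLASSPATH_FIRST=true
# Ask the tasks to prefer user jars too; property availability varies by version
hadoop jar MyJob.jar -Dmapreduce.job.user.classpath.first=true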
