Parallel Map Reduce Jobs in Hadoop

I have to run many jobs (maybe 12) in Hadoop 1.0.4. I want the first five to run in parallel; when they all finish, four other jobs should run in parallel; and finally the last three should run in parallel. How can I set this up in Hadoop 1.0.4? As far as I can see, all jobs run one after another, not in parallel.

The JobControl API can be used to express MR job dependencies. For complex workflows, Oozie or Azkaban is recommended; a comparison of Oozie vs Azkaban is worth reading.
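A minimal sketch of the JobControl approach, using the old-API classes that ship with Hadoop 1.0.4; the buildStage1Conf/buildStage2Conf helpers are hypothetical stand-ins for your real job configurations:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class StagedJobs {
    public static void main(String[] args) throws Exception {
        JobControl control = new JobControl("staged-jobs");

        // Stage 1: five independent jobs; JobControl submits all
        // dependency-free jobs, so these can run in parallel.
        Job[] stage1 = new Job[5];
        for (int i = 0; i < 5; i++) {
            stage1[i] = new Job(buildStage1Conf(i)); // hypothetical helper
            control.addJob(stage1[i]);
        }

        // Stage 2: four jobs, each depending on every stage-1 job.
        Job[] stage2 = new Job[4];
        for (int i = 0; i < 4; i++) {
            stage2[i] = new Job(buildStage2Conf(i)); // hypothetical helper
            for (Job dep : stage1) {
                stage2[i].addDependingJob(dep);
            }
            control.addJob(stage2[i]);
        }

        // Stage 3 (the last three jobs) would follow the same pattern,
        // with each job depending on every stage-2 job.

        // JobControl is a Runnable; it submits jobs as they become ready.
        Thread runner = new Thread(control);
        runner.start();
        while (!control.allFinished()) {
            Thread.sleep(5000);
        }
        control.stop();
    }

    private static JobConf buildStage1Conf(int i) { return new JobConf(); } // configure for real use
    private static JobConf buildStage2Conf(int i) { return new JobConf(); } // configure for real use
}

Note that whether the five stage-1 jobs actually execute simultaneously also depends on the cluster scheduler having free slots; JobControl only removes the artificial ordering between them.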

Related

YARN - executor for Spark job

Process spark = new SparkLauncher()
        .setAppResource("myApp.jar")
        .setMainClass("com.aa.bb.app")
        .setMaster("yarn")
        .setDeployMode("cluster")
        .addAppArgs(data)
        .launch();
This is how I submit my Spark jar to a YARN cluster. Here are some questions:
Does this run with a single executor? (Is it one spark-submit per YARN executor?)
How should I execute multiple Spark jobs concurrently? (Where should I set dynamic allocation, i.e. spark.dynamicAllocation.enabled?)
Where should I set the number of executors: in the Java code, or in a YARN XML file?
If I set the number of executors to 2 and process a single job, will one of the executors do nothing?
You don't need to do anything for this; executors are allocated automatically.
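If you do want to pin these settings per job, they can also be passed through the launcher. A sketch, assuming the same job as above (spark.executor.instances and spark.dynamicAllocation.enabled are standard Spark configuration keys; pick one approach or the other):

Process spark = new SparkLauncher()
        .setAppResource("myApp.jar")
        .setMainClass("com.aa.bb.app")
        .setMaster("yarn")
        .setDeployMode("cluster")
        // Option A: fixed executor count for this job.
        .setConf("spark.executor.instances", "2")
        // Option B: let Spark scale executors up and down instead
        // (dynamic allocation also requires the external shuffle service on YARN).
        // .setConf("spark.dynamicAllocation.enabled", "true")
        .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g") // per-executor memory
        .launch();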

Find running job priority

How can I find the priority of a job running in Hadoop?
I tried Hadoop commands such as hadoop job, yarn container, and mapred job, but couldn't find a way to get the priority of a running job.
You can use the getJobPriority() method in your MapReduce code.
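A minimal sketch of that approach with the old mapred client API (assuming a Hadoop 1.x classpath and a default-configured JobClient):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;

public class ListJobPriorities {
    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        // Print the priority of every job currently in the RUNNING state.
        for (JobStatus status : client.getAllJobs()) {
            if (status.getRunState() == JobStatus.RUNNING) {
                System.out.println(status.getJobID() + " -> " + status.getJobPriority());
            }
        }
    }
}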
Use:
hadoop job -list
This shows information on all running jobs, including their priority.
hadoop job -list all
This shows information on all jobs (running, succeeded, failed), including their priority.

How to run Pig jobs in sequence, one by one

I have a requirement to run Pig jobs in sequence without manual interaction.
Could you please advise me: is there any way to automate Pig jobs, using Pig or some other tool?
Assume the jobs:
JOB001
JOB002
JOB003
JOB004
JOB001 is my first job; after 'JOB001' runs successfully, it should trigger 'JOB002'.
After 'JOB002' runs successfully, it should trigger 'JOB003'.
After 'JOB003' runs successfully, it should trigger 'JOB004'.
Oozie is the tool for you.
Simply create a workflow connecting one Pig job to another.
Oozie is designed to schedule Hadoop jobs; see the Oozie documentation for running Pig jobs in Oozie. The quick-start guide will help you begin with Oozie.
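For illustration, a minimal sketch of such a workflow definition (Oozie workflows are XML; the script names and the ${jobTracker}/${nameNode} parameters are placeholders, and JOB003/JOB004 would chain on in the same way):

<workflow-app name="pig-sequence" xmlns="uri:oozie:workflow:0.4">
    <start to="job001"/>
    <action name="job001">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>job001.pig</script>
        </pig>
        <ok to="job002"/>   <!-- on success, trigger the next Pig job -->
        <error to="fail"/>
    </action>
    <action name="job002">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>job002.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig job failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>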

Differences between hadoop jar and yarn jar

What's the difference between running a jar file with the commands "hadoop jar" and "yarn jar"?
I've used the "hadoop jar" command on my Mac successfully, but I want to be sure that the execution is correct and runs in parallel on my four cores.
Thanks!!!
Short Answer
They are probably identical for you, but even if they aren't, they should both utilize your cluster to the best of its ability.
Longer Answer
The /usr/bin/yarn script sets up the execution environment so that all of the YARN commands can be run. The /usr/bin/hadoop script isn't as concerned with YARN-specific functionality. However, if your cluster is set up to use YARN as the default implementation of MapReduce (MRv2), then hadoop jar will probably act the same as yarn jar for a MapReduce job.
Either way you're probably fine, but you can always check the ResourceManager (or JobTracker) web interface to see how your job is distributed across the cluster (whether it's a single-node cluster or not).
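As a quick sanity check, you can submit the same example job both ways; on an MRv2 cluster they should behave identically (the examples jar below ships with Hadoop distributions, so its exact name and path may differ on your install):

hadoop jar hadoop-mapreduce-examples.jar pi 4 1000
yarn jar hadoop-mapreduce-examples.jar pi 4 1000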

Do default Mahout programs run over Hadoop in a cluster?

I have 3 operations from Mahout and I want them to run over a multi-node Hadoop cluster.
Can these operations run that way?
seq2sparse, trainnb, testnb
I tried to run them, but it seems that everything executes on one machine (the master).
