How to allocate a specific number of mappers to multiple jobs in Hadoop?

I am executing multiple Pig scripts, say script1, script2, script3, and script4. Script1 executes independently, and script2, script3, and script4 are supposed to execute in parallel after script1 finishes.
I am giving an input file of size 7-8 GB. After script1 finishes, I observe that instead of script2, script3, and script4 executing in parallel, only script2 executes, because it consumes 33-35 mappers. The others remain queued (script3 and script4 get no mapper allocation), and because of this it takes too much time to execute all the scripts.
So what I am thinking is that if I could set a limit on the number of mappers for each script, the time required to execute them all might be less, as every script would get some allocation of mappers.
So is there any way to allocate a specific number of mappers to multiple scripts?

If your map slot count is set correctly (according to your cores/node and disks/node values), then having one job consume all your map slots or having N jobs consume MapNumber / N slots each will give the same overall result. But if you really want to distribute your maps across a number of jobs, you can set the per-job map count (mapreduce.job.maps in mapred-site.xml, I think).
Provided you still have free map slots, there are some configuration options to enable parallel job execution, as discussed here: Running jobs parallely in hadoop
You can also set a map count for each job (even if I am not sure it really works) by providing a job.xml, in which you set your map count, to your hadoop command.
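For a plain MapReduce job (as opposed to a Pig script), a minimal sketch of that per-job setting in the driver might look like the following; mapreduce.job.maps is only a hint, and the InputFormat's split count usually decides the actual number of mappers. The class and job names here are made up for the example, and the Hadoop 2.x Job.getInstance API is assumed.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CappedMapsDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("mapreduce.job.maps", 8);   // hint: ask for roughly 8 map tasks for this job
        Job job = Job.getInstance(conf, "capped-maps-job");
        // ... set mapper, reducer, and input/output paths as usual, then submit ...
    }
}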

You can add the following line at the beginning of your script:
set mapred.map.tasks 8;
This will let all of your scripts run concurrently.
Please note that if your machine is saturated, this will not change how long it takes for all the scripts to run.

Related

Arrays and user job limitations on SLURM?

I'm a new SLURM user and I'm trying to figure out the best way to submit a job that requires the same command to run 400,000 times with different input files (approximately 200MB memory per CPU, 4 minutes for one instance, each instance runs independently).
I read through the documentation, and so far it seems that arrays are the way to go.
I can use up to 3 nodes on my HPC with 20 cores each, which means that I could run 60 instances of my command at the same time. However, the per-user limit for running jobs is 10 at a time, with 20 jobs allowed in the queue.
So far, everything I've tried runs each instance of the command as a separate job, thus limiting it to 10 instances in parallel.
How can I fully utilize all available cores in light of the job limits?
Thanks in advance for your help!
You can have a look at tools like GREASY that will allow you to run a single Slurm job and spawn multiple subtasks.
The documentation, which can be found here, explains how to install and use it.
You don't even need a job array to achieve this. First submit a job via the sbatch job_script command; inside job_script you can customise the job submission. You can use srun with & (backgrounding) inside a for loop to run as many task instances at a time as the allocation allows.
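As a rough illustration of that idea (the script name, resource values, input file pattern, and my_command are made up for the example, and the exact srun flags may need adjusting for your Slurm version):

#!/bin/bash
#SBATCH --job-name=many-tasks
#SBATCH --nodes=3
#SBATCH --ntasks=60              # 3 nodes x 20 cores
#SBATCH --mem-per-cpu=250M
#SBATCH --time=12:00:00

# Launch one task step per input file, keeping at most 60 in flight at a time.
# --exclusive keeps steps from sharing the same CPUs within the allocation.
for f in input_chunk_*.dat; do
    srun --ntasks=1 --exclusive ./my_command "$f" &
    while [ "$(jobs -rp | wc -l)" -ge 60 ]; do
        sleep 5                  # throttle until a running step finishes
    done
done
wait                             # wait for the remaining steps to complete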

Chaining Map Reduce Program

I have a situation: during a POC I want to chain MapReduce stages within one job. For example, map M1's output (O/P) goes to reducer R1, R1's output then goes to M2, and the final output comes either from M2 or from running R2 on M2's O/P.
Single job ID: M1 -> R1 -> M2 -> R2... The final output should end up in a single output file.
Can we do it without Oozie?
You can chain multiple jobs in your driver class. First, create a job for the first MapReduce stage by defining all the required configuration, then start the job as usual by calling:
job1.waitForCompletion(true);
This waits until the job is finished. Then check the final status of the first job, failed or succeeded, to decide on the appropriate next action.
If the first job completed successfully, launch the next MapReduce stage in the same way: first define the required parameters, then launch the job with:
job2.waitForCompletion(true);
The important point is that the output path of the first job becomes the input path of the second job. This is serial (sequential) job chaining, because the jobs run one after another, as sketched below.
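A minimal driver sketch of that pattern (M1, R1, M2, R2 stand for your own mapper and reducer classes; the key/value types, paths, and job names are placeholders, and the Hadoop 2.x Job.getInstance API is assumed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);
        Path intermediate = new Path(args[1]);   // output of stage 1, input of stage 2
        Path output = new Path(args[2]);

        // Stage 1: M1 -> R1
        Job job1 = Job.getInstance(conf, "stage-1");
        job1.setJarByClass(ChainDriver.class);
        job1.setMapperClass(M1.class);           // placeholder mapper
        job1.setReducerClass(R1.class);          // placeholder reducer
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job1, input);
        FileOutputFormat.setOutputPath(job1, intermediate);

        // Block until stage 1 finishes; stop the chain if it failed.
        if (!job1.waitForCompletion(true)) {
            System.exit(1);
        }

        // Stage 2: M2 -> R2, reading stage 1's output.
        Job job2 = Job.getInstance(conf, "stage-2");
        job2.setJarByClass(ChainDriver.class);
        job2.setMapperClass(M2.class);           // placeholder mapper
        job2.setReducerClass(R2.class);          // or set reducers to 0 for a map-only stage
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job2, intermediate);
        FileOutputFormat.setOutputPath(job2, output);

        System.exit(job2.waitForCompletion(true) ? 0 : 1);
    }
}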
You can also make use of JobControl, which lets you execute a number of MapReduce jobs in sequence. In your case there are two mappers and one or two reducers. You can have two MapReduce jobs, and for the second job you can set the number of reducers to zero if you don't require a reducer. A sketch follows.
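A rough sketch of the JobControl variant (org.apache.hadoop.mapreduce.lib.jobcontrol; the job names and the elided configuration are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

public class JobControlDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job1 = Job.getInstance(conf, "stage-1");
        // ... configure mapper M1, reducer R1, the input path, and an intermediate output path ...

        Job job2 = Job.getInstance(conf, "stage-2");
        job2.setNumReduceTasks(0);               // map-only second stage if no reducer is needed
        // ... configure mapper M2, stage-1's output path as input, and the final output path ...

        ControlledJob cJob1 = new ControlledJob(job1.getConfiguration());
        ControlledJob cJob2 = new ControlledJob(job2.getConfiguration());
        cJob2.addDependingJob(cJob1);            // stage 2 starts only after stage 1 succeeds

        JobControl control = new JobControl("chained-stages");
        control.addJob(cJob1);
        control.addJob(cJob2);

        Thread runner = new Thread(control);     // JobControl is a Runnable
        runner.start();
        while (!control.allFinished()) {
            Thread.sleep(5000);                  // poll until the whole chain is done
        }
        control.stop();
    }
}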

How to check the overall progress of PIG job

A Pig script can be translated into multiple MR jobs, and I am wondering whether there is an interface or a way to see the progress of the overall Pig script, such as how many jobs are scheduled, how many have executed, and so on.
We had the same problem at Twitter, as some of our Pig scripts spin up dozens of Map-Reduce jobs and it's sometimes hard to tell which of them is doing what, reason about efficiency of the plan, understand how many will run in parallel, etc.
So we created Twitter Ambrose: https://github.com/twitter/ambrose
It spins up a little Jetty server which gives you a nice web UI that shows the job DAG, colors the nodes as the jobs complete, gives you stats about the jobs, and tells you which relations each job is trying to calculate.
There is an illustrate command, but it throws an exception on my deployment, so I use another approach.
You can find out how many MR jobs are scheduled by using the explain command and looking at the Physical Plan section, which is at the end of the explain report. To get the number of MR jobs for the script, I do the following:
./pig -e 'explain -script ./script_name.pig' > ./explain.txt
grep MapReduce ./explain.txt | wc -l
Now we have the number of MR jobs planned. To monitor script execution, before you run it, access Hadoop's jobtracker page (via "http://(IP_or_node_name):50030/jobtracker.jsp") and write down the name of the last job in the Completed Jobs section. Then submit the script. Refresh the jobtracker page and count how many jobs are running and how many have completed after the one you noted. This gives you an idea of how many jobs are left to be executed.
Click on each job and see its statistics and progress.
A much simpler approach is to run the script on a small dataset and note down the number of jobs; it is displayed in the console output after the script finishes. Since Pig does not change its execution plan, the number will be the same for the big dataset. By looking at the stats of each job on Hadoop's jobtracker page (via "http://(IP_or_node_name):50030/jobtracker.jsp") you can get an idea of the proportion of time each MR job takes, and then use that to roughly extrapolate the execution time on the large dataset. If you have skewed data or Cartesian products, execution time prediction can become tricky.

hadoop FIFO scheduling does not make the submitted jobs run in parallel?

I have configured the map capacity to 4000 map slots and configured each job with 500 maps, based on my understanding of FIFO mode and this link:
Running jobs parallely in hadoop
If I submit 8 jobs, these 8 jobs should run in parallel, right? However, I still see the 8 jobs I submitted running sequentially, which seems strange to me.
Another option is to try the fair scheduler, but I am running into some other bugs with it...
How to make this run in parallel?
I am the only user now.
Question: what does the job tracker web UI show for total running jobs?
Actually I have submitted around 80 jobs, and all of them were submitted successfully, since I can see all 80 of them under the "Running Jobs" section, but they just run sequentially.
Question: how many input files are you currently processing? what does this relate to with regards to the number of mappers for the job?
For each job I configure 500 maps through the mapred-site.xml setting map.task.num=500.
Below is the information:
Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
map      1.40%        500         402       91        7          0        0 / 0
reduce   0.00%        1           1         0         0          0        0 / 0
Question: You can configure your Input format to only run 500 maps, but there are occasions where Hadoop ignores this value: if you have more than 500 input files, for example.
I am sure this is not happening, since I customized the InputFormat so that the number of mappers to run is exactly the number of mappers I configure in mapred-site.xml.
Question: When you start your job, how many files are you running over, what Input Format are you using, and what file compression, if any, are you using on the input files?
OK, I actually run over only one file, but this file is loaded in full by every map task, so I use the DistributedCache mechanism to let each map task load the whole file. I am not using compression currently.
Question: What does the job tracker show for the total number of configured mapper and reducer slots? Does this match up with your expected value of 5000?
Below is the information
Maps   Reduces   Total Submissions   Nodes   Map Task Capacity   Reduce Task Capacity   Avg. Tasks/Node   Blacklisted Nodes
83     0         80                  8       4000                80                     510.00            0
Whether you run the FairScheduler or the CapacityScheduler, you should still be able to run jobs in parallel, but there are some reasons why you may see your jobs running sequentially:
Are you the only person using the cluster? If not, how many other people are using it?
Question: what does the job tracker web UI show for total running jobs?
If yours are indeed the only jobs running on the cluster at a particular point in time, then check the Job Tracker web UI for your currently running job: how many input files are you currently processing, and how does this relate to the number of mappers for the job?
You can configure your Input format to only run 500 maps, but there are occasions where Hadoop ignores this value: if you have more than 500 input files, for example.
Question: When you start your job, how many files are you running over, what Input Format are you using, and what file compression, if any, are you using on the input files?
Question: What does the job tracker show for the total number of configured mapper and reducer slots? Does this match up with your expected value of 5000?
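If you end up moving off the default FIFO scheduler, a minimal sketch of enabling the FairScheduler on a classic (MR1) JobTracker is the following mapred-site.xml fragment; the fairscheduler jar has to be on the JobTracker's classpath, and the JobTracker must be restarted afterwards.

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>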

Unusual Hadoop error - tasks get killed on their own

When I run my hadoop job I get the following error:
Request received to kill task 'attempt_201202230353_23186_r_000004_0' by user
Task has been KILLED_UNCLEAN by the user
The logs appear to be clean. I run 28 reducers, and this doesn't happen for all of them; it happens for a select few, and those reducers then start again. I fail to understand this. Another thing I have noticed is that for a small dataset, I rarely see this error!
There are three things to try:
Setting a counter. If Hadoop sees a counter for the job progressing, then it won't kill it (see Arockiaraj Durairaj's answer). This seems to be the most elegant option, as it could give you more insight into long-running jobs and where the hangups may be.
Longer task timeouts. Hadoop tasks time out after 10 minutes by default. Changing the timeout is somewhat brute force, but it can work. Imagine analyzing audio files that are generally 5 MB (songs), but with a few 50 MB files (entire albums). Hadoop stores an individual file per block, so if your HDFS block size is 64 MB then a 5 MB file and a 50 MB file each take one block (64 MB) (see here http://blog.cloudera.com/blog/2009/02/the-small-files-problem/, and here: Small files and HDFS blocks). However, the 5 MB job would run faster than the 50 MB job. The task timeout can be increased in the job's configuration (mapred.task.timeout), per the answers to this similar question: How to fix "Task attempt_201104251139_0295_r_000006_0 failed to report status for 600 seconds."
Increase task attempts. Configure Hadoop to make more than the default 4 attempts (see Pradeep Gollakota's answer). This is the most brute-force method of the three: Hadoop will attempt the task more times, but you could be masking an underlying issue (small servers, large data blocks, etc.).
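For the second and third options, a rough sketch of the corresponding per-job settings (classic MR1 property names; the values are examples only, not recommendations, and the driver class is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TimeoutTuningDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Longer task timeouts: give tasks 30 minutes to report progress instead of the default 10.
        conf.setLong("mapred.task.timeout", 30 * 60 * 1000L);
        // Increase task attempts: allow more retries than the default 4 before the task is failed.
        conf.setInt("mapred.map.max.attempts", 8);
        conf.setInt("mapred.reduce.max.attempts", 8);

        Job job = Job.getInstance(conf, "long-running-job");
        // ... set mapper, reducer, and input/output paths as usual, then submit ...
    }
}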
Can you try using a counter (a Hadoop counter) in your reduce logic? It looks like Hadoop is not able to determine whether your reduce program is running or hanging; it waits for a few minutes and kills it, even though your logic may still be executing.
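A rough sketch of what that could look like with the new MapReduce API (the counter group/name and the sum logic are placeholders for your own reduce work):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class HeartbeatReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();   // stand-in for your real, possibly slow, per-record work
            // Signal liveness so the framework does not kill the attempt as hung:
            context.getCounter("HeartbeatReducer", "RECORDS_PROCESSED").increment(1);
            context.progress();
        }
        context.write(key, new IntWritable(sum));
    }
}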
