Hadoop FIFO scheduling does not make submitted jobs run in parallel?

I have configured the total map capacity to 4000 map slots and configured each job with 500 maps. Based on my understanding of FIFO mode and the link
Running jobs parallely in hadoop
if I submit 8 jobs, these 8 jobs should run in parallel, right? However, I still see that the 8 jobs I submitted run sequentially, which seems strange to me.
Another option is to try the Fair Scheduler, but I have run into some other bugs with it...
How can I make these jobs run in parallel?
I am the only user now.
Question: what does the job tracker web UI show for total running jobs?
Actually I have submitted about 80 jobs, and all of them were submitted successfully since I can see all 80 of them
under the "Running Jobs" section, but they just run sequentially
Question: how many input files are you currently processing, and how does this relate to the number of mappers for the job?
For each job I configure 500 maps through the mapred-site.xml setting map.task.num=500.
Below is the information:
Kind      % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
map       1.40%        500         402       91        7          0        0 / 0
reduce    0.00%        1           1         0         0          0        0 / 0
Question: You can configure your Input format to only run 500 maps, but there are occasions where Hadoop ignores this value: if you have more than 500 input files, for example.
I am sure this will not happen, since I customized the InputFormat so that the number of mappers that run is exactly the number I configure in mapred-site.xml.
Question: When you start your job, how many files are you running over, what's the Input Format you are using, and what, if any, file compression are you using on the input files?
OK, I actually run over only one file, but this file needs to be fully loaded by every map task, so I use the DistributedCache mechanism to let each map task load the whole file. I am not using compression currently.
Question: What does the job tracker show for the total number of configured mapper and reducer slots? Does this match up with your expected value of 5000?
Below is the information
Maps   Reduces   TotalSubmissions   Nodes   Map Task Capacity   Reduce Task Capacity   Avg. Tasks/Node   Blacklisted Nodes
83     0         80                 8       4000                80                     510.00            0

Whether you run the FairScheduler or the CapacityScheduler, you should still be able to run jobs in parallel, but there are some reasons why you may see your jobs run sequentially:
Are you the only person using the cluster? If not, how many other people are using it:
Question: what does the job tracker web UI show for total running jobs?
If yours are indeed the only job(s) running on the cluster at a particular point in time, then check the Job Tracker web UI for your currently running job - how many input files are you currently processing, and how does this relate to the number of mappers for the job?
You can configure your Input format to only run 500 maps, but there are occasions where Hadoop ignores this value: if you have more than 500 input files, for example.
Question: When you start your job, how many files are you running over, what's the Input Format you are using, and what, if any, file compression are you using on the input files?
Question: What does the job tracker show for the total number of configured mapper and reducer slots? Does this match up with your expected value of 5000?
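Since the question mentions wanting to try the Fair Scheduler, here is a minimal mapred-site.xml sketch of how it is typically enabled on a classic (MRv1) JobTracker; this assumes Hadoop 1.x property names, that the fair scheduler jar is on the JobTracker classpath, and the allocation file path is a placeholder:

    <!-- mapred-site.xml on the JobTracker (sketch, classic MRv1) -->
    <property>
      <!-- replace the default FIFO scheduler with the Fair Scheduler -->
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.FairScheduler</value>
    </property>
    <property>
      <!-- optional pool definitions; without this every user shares a default pool -->
      <name>mapred.fairscheduler.allocation.file</name>
      <value>/path/to/fair-scheduler.xml</value>
    </property>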

Related

Arrays and user job limitations on SLURM?

I'm a new SLURM user and I'm trying to figure out the best way to submit a job that requires the same command to run 400,000 times with different input files (approximately 200MB memory per CPU, 4 minutes for one instance, each instance runs independently).
I read through the documentation, and so far it seems that arrays are the way to go.
I can use up to 3 nodes on my HPC with 20 cores each, which means that I could run 60 instances of my command at the same time. However, the per-user limit is 10 jobs running at the same time, with 20 jobs in the queue.
So far, everything I've tried runs each instance of the command as a separate job, thus limiting it to 10 instances in parallel.
How can I fully utilize all available cores in light of the job limits?
Thanks in advance for your help!
You can have a look at tools like GREASY that will allow you to run a single Slurm job and spawn multiple subtasks.
The documentation, which explains how to install and use it, can be found here
You don't even need a job array to achieve this. First, submit a job via the sbatch job_script command; inside job_script you can customise how the work is launched. You can use srun with suitable parameters, together with & (backgrounding) and a for loop, to keep the maximum number of instances running at once, as sketched below.
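As a concrete illustration of that srun-in-a-loop approach, below is a minimal sbatch script sketch that packs many single-core instances into one job; the command name, input file pattern and resource numbers are placeholders, and the wait -n throttle needs bash 4.3 or later:

    #!/bin/bash
    #SBATCH --job-name=many-instances
    #SBATCH --nodes=3
    #SBATCH --ntasks=60            # 3 nodes x 20 cores = 60 concurrent instances
    #SBATCH --mem-per-cpu=250M     # headroom over the ~200MB each instance needs
    #SBATCH --time=24:00:00

    # Launch one single-core job step per input file. Backgrounding (&) keeps the
    # loop going, the while loop throttles to at most $SLURM_NTASKS steps in
    # flight, and --exclusive stops steps from piling onto the same cores.
    for f in input_dir/*.dat; do
        while [ "$(jobs -rp | wc -l)" -ge "$SLURM_NTASKS" ]; do
            wait -n
        done
        srun --nodes=1 --ntasks=1 --exclusive ./my_command "$f" &
    done
    wait    # do not let the batch script exit before all steps have finished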

Spark batches does not complete when running on Yarn cluster

Setting the scene
I am working to make a Spark streaming application (Spark 2.2.1 with Scala) run on a Yarn cluster (Hadoop 2.7.4).
So far I managed to submit the application to the Yarn cluster with spark-submit. I can see that the receiver task starts up correctly and fetches a lot of records from the database (Couchbase Server 5.0) and I can also see that the records are divided into batches.
The question
When I look at the Streaming Statistics on the Spark Web UI, however, I can see that my batches are never processed. I have seen batches with 0 records get processed and complete, but when a batch with records starts processing, it never completes. One time it even got stuck on a batch with 0 records.
I even tried simplifying the output operations on the StreamingContext as much as possible. But even with the very simple output operation print(), my batches are never processed. The logs do not show any warnings or errors.
Does anyone know what might be wrong? Any suggestions on how to solve this will be much appreciated.
More Info
The main class of the Spark application is built from this example (first one) from the Couchbase Spark Connector documentation combined with this example with checkpoint from the Spark Documentation.
Right now I have 3230 Active Batches (3229 queued and 1 processing) and 1 Completed Batch (which had 0 records), and the application has been running for 4 hours and 30 minutes... and another batch is added every 5 seconds.
If I look at the "thread dump" for the executors I see a lot of WAITING, TIMED WAITING and a few RUNNABLE threads. The list would fill 3 screenshots, so I will only post it if needed.
Below you will find some screenshots from the Web UI
Executor Overview
Spark Jobs Overview
Node Overview with resources
Capacity Scheduler Overview
Per the screenshot, you have 2 cores: 1 is being used for the driver and the other is being used for the receiver, so you don't have a core left for the actual processing to happen. Please increase the number of cores and try again.
Refer: https://spark.apache.org/docs/latest/streaming-programming-guide.html#input-dstreams-and-receivers
If you are using an input DStream based on a receiver (e.g. sockets, Kafka, Flume, etc.), then the single thread will be used to run the receiver, leaving no thread for processing the received data. Hence, when running locally, always use “local[n]” as the master URL, where n > number of receivers to run (see Spark Properties for information on how to set the master).
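Following that advice, on a YARN cluster the total number of cores is controlled through the spark-submit resource flags; a hedged sketch (the class name, jar name and sizes are placeholders, the key point is simply that total cores must exceed the number of receivers):

    # Sketch: 2 executors x 2 cores = 4 cores, so the single receiver
    # still leaves 3 cores free for processing the queued batches.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 2 \
      --executor-cores 2 \
      --executor-memory 2g \
      --class com.example.MyStreamingApp \
      my-streaming-app.jar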

Limit number of concurrent map tasks in Hadoop 2.6.0

I want to restrict the number of map tasks running simultaneously on one slave node.
In my case, when I submit my job, Hadoop generates 8 map tasks, and when I look at the Job History UI at port 19888, I always see that the 8 map tasks start at the same time on the same slave node.
I even tried to set the attribute mapreduce.tasktracker.map.tasks.maximum to 4 (from how to restrict the concurrent running map tasks?). It still didn't work for me.
Does anyone have experience dealing with this issue?
For Hadoop 2.6 it should be the newer API - see this mapred-default.xml:
set mapreduce.tasktracker.map.tasks.maximum to 4
The maximum number of map tasks that will be run simultaneously by a task tracker.
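For reference, a minimal mapred-site.xml sketch of that setting is below; as a caveat, this property is only honoured by the classic TaskTracker-based (MRv1) runtime, so on a pure YARN cluster the number of concurrent map containers per node is instead bounded by the NodeManager resources (e.g. yarn.nodemanager.resource.memory-mb) divided by the per-map container size (mapreduce.map.memory.mb):

    <!-- mapred-site.xml on each slave node (sketch) -->
    <property>
      <!-- only read by a classic MRv1 TaskTracker; has no effect under YARN -->
      <name>mapreduce.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>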

How to allocate specific number of mappers to multiple job in Hadoop?

I am executing multiple Pig scripts, say script1, script2, script3 and script4. script1 executes independently, and script2, script3 and script4 execute in parallel after script1 has finished.
I am giving an input file of size 7-8 GB. After executing script1, I observe that instead of script2, script3 and script4 executing in parallel, only script2 executes, as it consumes 33-35 mappers. The others remain queued (meaning script3 and script4 do not get any mapper allocation). Because of this, it takes too much time to execute all the scripts.
So what I am thinking is that if I am able to set a limit on the number of mappers for each script, then the time required to execute them all might be less, as all the scripts may get an allocation of mappers.
So is there any way to allocate a specific number of mappers to multiple scripts?
If your map number is correctly set (according to your cores/node and disks/node values), then having 1 job consume all your map slots or having N jobs each consume MapNumber / N maps will give the same overall result. But if you really want to distribute your maps across a number of jobs, you can set the per-job map number (mapreduce.job.maps in mapred-site.xml, I think).
Assuming you still have free map slots, there are some configuration options to enable parallel job execution, as discussed here: Running jobs parallely in hadoop
You can also set a map number for each job (even if I am not sure it really works) by providing a job.xml, in which you set your map number, to your hadoop command.
You can add the following line at the beginning of your script:
set mapred.map.tasks 8
and this will let all of your scripts run concurrently.
Please note that if your machine is saturated this will not affect how long all the scripts take to run.
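For completeness, a minimal sketch of what that looks like at the top of a Pig script; note that inside a script the set command needs a trailing semicolon, and mapred.map.tasks is only a hint, so the final number still depends on how the InputFormat computes splits:

    -- hint the number of map tasks for the jobs this script launches (sketch)
    SET mapred.map.tasks 8;
    SET mapreduce.job.maps 8;    -- newer name for the same hint on Hadoop 2.x
    -- ... rest of script2.pig ...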

Unusual Hadoop error - tasks get killed on their own

When I run my hadoop job I get the following error:
Request received to kill task 'attempt_201202230353_23186_r_000004_0' by user
Task has been KILLED_UNCLEAN by the user
The logs appear to be clean. I run 28 reducers, and this doesn't happen for all of them; it happens for a select few, and then the reducer starts again. I fail to understand this. Another thing I have noticed is that for a small dataset, I rarely see this error!
There are three things to try:
Setting a counter: If Hadoop sees a counter for the job progressing then it won't kill it (see Arockiaraj Durairaj's answer). This seems to be the most elegant approach, as it can give you more insight into long-running jobs and where the hang-ups may be.
Longer task timeouts: Hadoop tasks time out after 10 minutes by default. Changing the timeout is somewhat brute force, but it could work. Imagine analyzing audio files that are generally 5MB (songs), but you have a few 50MB files (entire albums). Hadoop stores an individual file per block, so if your HDFS block size is 64MB then a 5MB file and a 50MB file would each require 1 block (64MB) (see here http://blog.cloudera.com/blog/2009/02/the-small-files-problem/, and here Small files and HDFS blocks.) However, the 5MB task would run faster than the 50MB task. The task timeout can be increased in the code (mapred.task.timeout) for the job, per the answers to this similar question: How to fix "Task attempt_201104251139_0295_r_000006_0 failed to report status for 600 seconds."
Increase task attempts: Configure Hadoop to make more than the default 4 attempts (see Pradeep Gollakota's answer). This is the most brute-force method of the three. Hadoop will attempt the task more times, but you could be masking an underlying issue (small servers, large data blocks, etc.).
Can you try using a counter (a Hadoop counter) in your reduce logic? It looks like Hadoop is not able to determine whether your reduce program is running or hanging. It waits for a few minutes and kills it, even though your logic may still be executing.
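As a concrete illustration of that counter suggestion, here is a minimal sketch using the org.apache.hadoop.mapreduce API; the class name, key/value types and counter group are illustrative, not taken from the original job:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sketch: report liveness from a long-running reduce so the framework does
    // not kill the attempt after mapred.task.timeout (600s by default) of silence.
    public class SlowReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();    // the real (slow) work would go here
                // Incrementing a counter (or calling context.progress()) tells the
                // framework the task is still alive, so it is not KILLED_UNCLEAN.
                context.getCounter("Progress", "ValuesProcessed").increment(1);
            }
            context.write(key, new LongWritable(sum));
        }
    }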

Resources