Unusual Hadoop error - tasks get killed on their own

When I run my Hadoop job I get the following error:
Request received to kill task 'attempt_201202230353_23186_r_000004_0' by user
Task has been KILLED_UNCLEAN by the user
The logs appear to be clean. I run 28 reducers, and this doesn't happen for all of them; it happens for a select few, and the reducer then starts again. I fail to understand this. Another thing I have noticed is that for a small dataset, I rarely see this error!

There are three things to try:
Setting a Counter: If Hadoop sees a counter for the job progressing, then it won't kill it (see Arockiaraj Durairaj's answer). This seems to be the most elegant option, as it could give you more insight into long-running jobs and where the hangups may be.
Longer Task Timeouts: Hadoop tasks time out after 10 minutes by default. Changing the timeout is somewhat brute force, but it can work. Imagine analyzing audio files that are generally 5MB each (songs), but with a few 50MB files (entire albums). HDFS stores each file in its own block(s), so with a 64MB HDFS block size a 5MB file and a 50MB file would each occupy one block (see http://blog.cloudera.com/blog/2009/02/the-small-files-problem/, and Small files and HDFS blocks); however, the 5MB task would finish much faster than the 50MB task. The task timeout can be increased in the job configuration (mapred.task.timeout) per the answers to this similar question: How to fix "Task attempt_201104251139_0295_r_000006_0 failed to report status for 600 seconds."
Increase Task Attempts: Configure Hadoop to make more than the default 4 attempts (see Pradeep Gollakota's answer). This is the most brute-force method of the three: Hadoop will retry the task more times, but you could be masking an underlying issue (small servers, large data blocks, etc.).
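For the timeout and retry options, the change is pure configuration. A sketch using the old-style (pre-YARN, Hadoop 1.x) property names that match the era of this question; the values shown are illustrative, not recommendations:

```xml
<!-- mapred-site.xml, or set per job via JobConf/Configuration -->
<property>
  <name>mapred.task.timeout</name>
  <!-- milliseconds before an unreporting task is killed;
       600000 (10 minutes) is the default, 0 disables the timeout -->
  <value>1800000</value>
</property>
<property>
  <name>mapred.map.max.attempts</name>
  <value>8</value> <!-- default is 4 -->
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>8</value> <!-- default is 4 -->
</property>
```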

Can you try using a counter (a Hadoop counter) in your reduce logic? It looks like Hadoop is unable to determine whether your reduce program is running or hanging. It waits out the configured timeout (10 minutes by default) and kills the task, even though your logic may still be executing.
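If you happen to be using Hadoop Streaming rather than the Java API, a task can report a counter (and thereby progress) by writing a specially formatted line to stderr. A minimal sketch; the group name, counter name, and batching interval here are made up for illustration:

```python
import sys

def report_counter(group, counter, amount=1):
    """Emit a Hadoop Streaming counter update on stderr.

    Streaming treats stderr lines of the form
    'reporter:counter:<group>,<counter>,<amount>' as counter increments,
    which also count as task progress and reset the timeout clock.
    """
    sys.stderr.write("reporter:counter:%s,%s,%d\n" % (group, counter, amount))

def reduce_values(key, values):
    # Hypothetical reduce logic: sum the values, reporting progress
    # every 1000 records so the task is never considered hung.
    total = 0
    for i, value in enumerate(values, 1):
        total += value
        if i % 1000 == 0:
            report_counter("MyApp", "RecordsProcessed", 1000)
    return key, total

# e.g. reduce_values("k", [1, 2, 3]) returns ("k", 6)
```

In the Java API the equivalent is a `context.getCounter(...).increment(...)` call inside the reduce loop, which serves the same heartbeat purpose.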

Related

Writing high volume reducer output to HBase

I have a Hadoop MapReduce job whose output is a row-id with a Put/Delete operation for that row-id. Due to the nature of the problem, the output is rather high volume. We have tried several methods to get this data back into HBase and they have all failed...
Table Reducer
This is way too slow, since it seems that it must do a full round trip for every row. Due to how the keys sort for our reducer step, the row-id is unlikely to be on the same node as the reducer.
completebulkload
This seems to take a long time (it never completes) and there is no real indication of why. Both I/O and CPU show very low usage.
Am I missing something obvious?
I saw from your self-answer that you solved your problem, but for completeness I'd mention that there's another option: writing directly to HBase. We have a setup where we stream data into HBase, and with proper key and region splitting we get more than 15,000 1K records per second per node.
CompleteBulkLoad was the right answer. Per @DonaldMiner's suggestion I dug deeper and found out that the CompleteBulkLoad process was running as "hbase", which resulted in a permission-denied error when trying to move/rename/delete the source files. The implementation appears to retry for a long time before giving an error message; up to 30 minutes in our case.
Giving the hbase user write access to the files resolved the issue.

Why are map tasks killed for no apparent reason?

I am running a Pig job that loads around 8 million rows from HBase (several columns) using HBaseStorage. The job finishes successfully and seems to produce the right results, but when I look at the job details in the job tracker it says 50 map tasks were created, of which 28 were successful and 22 were killed. The reduce ran fine. Looking at the logs of the killed map tasks, there is nothing obvious to me as to why the tasks were killed. In fact, the logs of successful and killed tasks are practically identical, and both take a reasonable amount of time. Why are all these map tasks created and then killed? Is it normal, or is it a sign of a problem?
This sounds like speculative execution in Hadoop: it runs the same task on several nodes and kills the duplicates when at least one completes. See the explanation in this book: https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/task-execution
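If the killed duplicates are a problem (for example, tasks with external side effects), speculative execution can be switched off per task type. A sketch using the old-style Hadoop 1.x property names:

```xml
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```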

When do the results from a mapper task get deleted from disk?

When do the outputs for a mapper task get deleted from the local filesystem? Do they persist until the entire job completes or do they get deleted at an earlier time than that?
In addition to the map and reduce tasks, two further tasks are created: a job setup task
and a job cleanup task. These are run by tasktrackers and are used to run code to setup
the job before any map tasks run, and to cleanup after all the reduce tasks are complete.
The OutputCommitter that is configured for the job determines the code to be run, and
by default this is a FileOutputCommitter. For the job setup task it will create the final
output directory for the job and the temporary working space for the task output, and
for the job cleanup task it will delete the temporary working space for the task output.
Have a look at OutputCommitter.
If your hadoop.tmp.dir is set to a default setting (say, /tmp/), it will most likely be subject to tmpwatch and any default settings in your OS. I would suggest poking around in /etc/cron.d/, /etc/cron.daily/, /etc/cron.weekly/, etc., to see exactly what your OS defaults look like.
One thing to keep in mind about tmpwatch is that, by default, it will key on access time, not modification time (i.e., files that have not been 'touched' since X will be considered 'stale' and subject to removal). However, it's a common practice with Hadoop to mount filesystems with the noatime and nodiratime flags, meaning that access times will not get updated and thus skewing your tmpwatch behaviors.
Otherwise, Hadoop will purge task attempt logs older than 24 hours (after task completion), by default. While a few years old, this writeup has some great info on the default behaviors. Take a look in particular at the sections that refer to mapreduce.job.userlog.retain.hours.
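The 24-hour retention window mentioned above is itself configurable; a sketch, with an illustrative value:

```xml
<property>
  <name>mapreduce.job.userlog.retain.hours</name>
  <!-- hours to keep task attempt logs after job completion; 24 is the default -->
  <value>72</value>
</property>
```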
EDIT: responding to OP's comment, which clears up my misunderstanding of the question:
As far as the intermediate output of map tasks which is spilled to disk, used by any combiners, and copied to any reducers, the Hadoop Definitive Guide has this to say:
Tasktrackers do not delete map outputs from disk as soon as the first
reducer has retrieved them, as the reducer may fail. Instead, they
wait until they are told to delete them by the jobtracker, which is
after the job has completed.
Source
I've also +1'd @mgs's answer below, as they have linked the source code that controls this and described the job cleanup task.
So, yes, the map output data is deleted immediately after the job completes, successfully or not, and no sooner.
"Tasktrackers do not delete map outputs from disk as soon as the first reducer has retrieved them, as the reducer may fail. Instead, they wait until they are told to delete them by the jobtracker, which is after the job has completed"
Hadoop: The Definitive Guide ( Section 6.4)

Hadoop FIFO scheduling does not make submitted jobs run in parallel?

I have configured the map capacity with 4000 map slots, and configured each job with 500 maps, based on my understanding of FIFO mode and this link:
Running jobs parallely in hadoop
If I submit 8 jobs, these 8 jobs should run in parallel, right? However, I still see that the 8 jobs I submitted run sequentially, which seems strange to me.
Another way is to try the fair scheduler, but I ran into some other bugs with it...
How to make this run in parallel?
I am the only user now.
Question: what does the job tracker web UI show for total running jobs?
Actually I have submitted around 80 jobs, and all of them were submitted successfully, since I can see all 80 under the "Running Jobs" section, but they just run sequentially.
Question: how many input files are you currently processing? How does this relate to the number of mappers for the job?
For each job I configure 500 maps through the mapred-site.xml setting map.task.num=500.
below is the information
Kind % Complete Num Tasks Pending Running Complete Killed Failed/Killed Task Attempts
map 1.40% 500 402 91 7 0 0 / 0
reduce 0.00% 1 1 0 0 0 0 / 0
Question: You can configure your InputFormat to only run 500 maps, but there are occasions where Hadoop ignores this value: if you have more than 500 input files, for example.
I am sure this will not happen, since I customized the InputFormat so that the number of mappers to run is exactly the number I configure in mapred-site.xml.
Question: When you start your job, how many files are you running over, what InputFormat are you using, and what file compression, if any, are you using on the input files?
OK, I actually run over only one file, but this file needs to be fully loaded by every map task, so I use the DistributedCache mechanism to let each map task load this file fully. I am not using compression currently.
Question: What does the job tracker show for the total number of configured mapper and reducer slots? Does this match up with your expected value of 5000?
Below is the information
Maps Reduces TotalSubmissions Nodes Map Task Capacity Reduce Task Capacity Avg. Tasks/Node Blacklisted Nodes
83 0 80 8 4000 80 510.00 0
Whether you run the FairScheduler or the CapacityScheduler, you should still be able to run jobs in parallel, but there are some reasons that you may see that your jobs run sequentially:
Are you the only person using the cluster? If not, how many other people are using it?
Question: what does the job tracker web UI show for total running jobs?
If you are indeed the only one running jobs on the cluster at a particular point in time, then check the Job Tracker web UI for your currently running job: how many input files are you currently processing? How does this relate to the number of mappers for the job?
You can configure your InputFormat to only run 500 maps, but there are occasions where Hadoop ignores this value: if you have more than 500 input files, for example.
Question: When you start your job, how many files are you running over, what InputFormat are you using, and what file compression, if any, are you using on the input files?
Question: What does the job tracker show for the total number of configured mapper and reducer slots? Does this match up with your expected value of 5000?
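For completeness, switching the JobTracker from the default FIFO scheduler to the FairScheduler in Hadoop 1.x is itself just configuration (assuming the fair-scheduler contrib jar is on the JobTracker's classpath):

```xml
<!-- mapred-site.xml on the JobTracker -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
```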

Is it possible to restart a "killed" Hadoop job from where it left off?

I have a Hadoop job that processes log files and reports some statistics. This job died about halfway through the job because it ran out of file handles. I have fixed the issue with the file handles and am wondering if it is possible to restart a "killed" job.
As it turns out, there is not a good way to do this; once a job has been killed, there is no way to re-instantiate it and restart processing from just before the first failure. There are likely some really good reasons for this, but I'm not qualified to speak to the issue.
In my own case, I was processing a large set of log files and loading them into an index. Additionally, I was creating a report on the contents of these files at the same time. In order to make the job more tolerant of failures on the indexing side (a side effect; this isn't related to Hadoop at all), I altered my job to instead create many smaller jobs, each processing a chunk of these log files. When one of these jobs finishes, it renames the processed log files so that they are not processed again. Each job waits for the previous job to complete before running.
Chaining multiple MapReduce jobs in Hadoop
When one job fails, all of the subsequent jobs quickly fail afterward. Simply fixing whatever the issue was and re-submitting my job will, roughly, pick up processing where it left off. In the worst-case scenario, where a job was 99% complete at the time of its failure, that one job will be erroneously and wastefully re-processed.
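The rename-on-completion bookkeeping behind this workaround can be sketched outside of Hadoop; the `.done` suffix and helper names below are made up for illustration:

```python
import os

DONE_SUFFIX = ".done"

def pending_chunks(log_dir):
    """Return input files that have not yet been marked as processed."""
    return sorted(
        os.path.join(log_dir, name)
        for name in os.listdir(log_dir)
        if not name.endswith(DONE_SUFFIX)
    )

def run_chunked(log_dir, process_chunk):
    """Process each chunk; mark it done by renaming so a re-run skips it.

    If a run dies partway through, re-submitting simply resumes with the
    chunks that were never renamed; at worst, the single chunk that was
    in flight gets re-processed.
    """
    for path in pending_chunks(log_dir):
        process_chunk(path)              # e.g. submit one small MapReduce job
        os.rename(path, path + DONE_SUFFIX)
```

The key property is that "done" state lives in the filesystem rather than in job memory, so a crash between chunks loses nothing.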
