What can cause Hadoop to kill a reducer task and retry? - hadoop

My Hadoop job has a very high 'Killed Task Attempts' count on its reducer tasks. I checked the status of a killed task:
Request received to kill task 'attempt_201308122006_41526_r_000030_1' by user
-------
Task has been KILLED_UNCLEAN by the user
There are no stdout or stderr logs.
What could cause this, and how can I solve it?

If you have speculative execution turned on, you will potentially see a number of map/reduce task attempts that are 'killed'. This is because Hadoop runs long-running tasks on more than one task tracker, and the first attempt to complete 'wins' while the others are killed off.
In general I would only worry about the task attempts marked 'failed' in the job tracker.
Try turning speculative execution off:
mapred.map.tasks.speculative.execution = false
mapred.reduce.tasks.speculative.execution = false
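
A minimal sketch of setting this per job from the driver code (class and job names are illustrative; newer Hadoop releases use mapreduce.map.speculative / mapreduce.reduce.speculative, with the old names kept as deprecated aliases):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class DisableSpeculation {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // With speculation off, only one attempt runs per task,
    // so no attempts get killed as "losers" of a speculative race.
    conf.setBoolean("mapred.map.tasks.speculative.execution", false);
    conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);

    Job job = Job.getInstance(conf, "my-job"); // job name is illustrative
    // ... set mapper, reducer, input and output paths as usual, then:
    // System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}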

If it is not speculative execution, it could be that the Fair Scheduler kicked in, reclaiming task tracker slots for a pool with minMaps and minReduces set.

Related

How to remove/run a pending job in Nomad?

There are some pending jobs in the "$ nomad status" output. Is there a way to run a pending job?
$ nomad status
ID Type Priority Status Submit Date
5e74587392a49e5dca9c9c6d-0-build-1004018-z100-solid-octo-potato-14 batch 50 pending (stopped) 2020-03-20T14:45:24+09:00
5e74678bdb1df409005677d6-0-build-1004018-z100-solid-octo-potato-17 batch 50 pending 2020-03-20T15:49:48+09:00
5e746884db1df409005677dc-0-build-1004018-z100-solid-octo-potato-19 batch 50 pending 2020-03-20T15:53:56+09:00
5e746a02db1df409005677e3-0-build-1004018-z100-solid-octo-potato-20 batch 50 pending 2020-03-20T16:00:19+09:00
Best regards,
Pending means that the job is about to run but something is preventing it from doing so.
Try this:
Check the status of the job, e.g. nomad job status 5e74587392a49e5dca9c9c6d-0-build-1004018-z100-solid-octo-potato-14, to see if there are any messages that help you understand what is happening.
If the previous step doesn't help: Nomad creates allocations (the set of tasks in a job to be run on a particular node). Their IDs are visible in the nomad job status output. You can check an allocation's status with nomad alloc status $ID, where $ID is the ID of an allocation.
As for removal of jobs, you can run nomad job stop -purge 5e74587392a49e5dca9c9c6d-0-build-1004018-z100-solid-octo-potato-14 to remove the job from the job list.

Is preemption with Tez supported along with the YARN Fair Scheduler?

We've recently switched our 10-node cluster from MapReduce to Tez, and we have been experiencing resource-management issues since then. It seems like preemption does not work as expected:
a very resource-hungry job arrives and gets all free resources
a second job arrives and waits for resources to be freed by job 1
job 2 gets very little resource (5%) over a long time; its share increases very slowly and most of the time never reaches the fair share.
I'm assuming the preemption mechanism used by the YARN Fair Scheduler is not working as it should, and resources only get assigned to job 2 when some of job 1's containers finish.
I've looked into the Tez documentation, and my impression is that Tez was developed with the Capacity Scheduler as the de facto scheduler, but I can't find any guidance for the Fair Scheduler.
Some configuration variables in use that may help:
hive.server2.tez.default.queues=default
hive.server2.tez.initialize.default.sessions=false
hive.server2.tez.session.lifetime=162h
hive.server2.tez.session.lifetime.jitter=3h
hive.server2.tez.sessions.init.threads=16
hive.server2.tez.sessions.per.default.queue=10
hive.tez.auto.reducer.parallelism=false
hive.tez.bucket.pruning=false
hive.tez.bucket.pruning.compat=true
hive.tez.container.max.java.heap.fraction=0.8
hive.tez.container.size=-1
hive.tez.cpu.vcores=-1
hive.tez.dynamic.partition.pruning=true
hive.tez.dynamic.partition.pruning.max.data.size=104857600
hive.tez.dynamic.partition.pruning.max.event.size=1048576
hive.tez.enable.memory.manager=true
hive.tez.exec.inplace.progress=true
hive.tez.exec.print.summary=false
hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
hive.tez.input.generate.consistent.splits=true
hive.tez.log.level=INFO
hive.tez.max.partition.factor=2.0
hive.tez.min.partition.factor=0.25
hive.tez.smb.number.waves=0.5
hive.tez.task.scale.memory.reserve-fraction.min=0.3
hive.tez.task.scale.memory.reserve.fraction=-1.0
hive.tez.task.scale.memory.reserve.fraction.max=0.5
yarn.scheduler.fair.preemption=true
yarn.scheduler.fair.preemption.cluster-utilization-threshold=0.7
yarn.scheduler.maximum-allocation-mb=32768
yarn.scheduler.maximum-allocation-vcores=4
yarn.scheduler.minimum-allocation-mb=2048
yarn.scheduler.minimum-allocation-vcores=1
yarn.resourcemanager.scheduler.address=${yarn.resourcemanager.hostname}:8030
yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
yarn.resourcemanager.scheduler.client.thread-count=50
yarn.resourcemanager.scheduler.monitor.enable=false
yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy

Kafka Spark streaming job with many active jobs

I am running into a "many active jobs" issue when using direct Kafka streaming on YARN (Spark 1.5, Hadoop 2.6, CDH 5.5.1).
The problem happens when Kafka has almost NO traffic.
From the application UI, I see many 'active' jobs that keep running for hours, until finally the driver reports "Requesting 4 new executors because tasks are backlogged".
But when I look at the driver log of an 'active' job, the log says the job is finished. So why does the application UI show this job as active seemingly forever?
Thanks!
Here is the related log info about one of the 'active' jobs.
There are two stages: a reduceByKey that follows a flatMap. The log says both stages finish in ~20 ms and the job finishes in 64 ms.
Got job 6567
Final stage: ResultStage 9851(foreachRDD at
Parents of final stage: List(ShuffleMapStage 9850)
Missing parents: List(ShuffleMapStage 9850)
…
Finished task 0.0 in stage 9850.0 (TID 29551) in 20 ms
Removed TaskSet 9850.0, whose tasks have all completed, from pool
ShuffleMapStage 9850 (flatMap at OpaTransLogAnalyzeWithShuffle.scala:83) finished in 0.022 s
…
Submitting ResultStage 9851 (ShuffledRDD[16419] at reduceByKey at OpaTransLogAnalyzeWithShuffle.scala:83), which is now runnable
…
ResultStage 9851 (foreachRDD at OpaTransLogAnalyzeWithShuffle.scala:84) finished in 0.023 s
Job 6567 finished: foreachRDD at OpaTransLogAnalyzeWithShuffle.scala:84, took 0.064372 s
Finished job streaming job 1468592373000 ms.1 from job set of time 1468592373000 ms
I am facing a similar issue. Mine is a Spark streaming application whose only action is to write to a Cassandra table, and this write fails due to an SSL authentication problem. Ideally such batches should show as failed in Streaming, but they remain in the active state forever; inside the batch the jobs complete successfully, when ideally the batch should have been marked failed.

Run custom Speculator in Hadoop 2.6.0

I am writing my own custom speculator. I reviewed the documentation: the default is DefaultSpeculator.java, and it is set in the class MRAppMaster.java (function createSpeculator()) in the Hadoop core. I want to know whether the speculator can be updated/changed at runtime while executing my job, because I need to test about 5 different speculators.
Thanks!
The speculative execution can be turned on and off for map tasks and reduce tasks on a cluster-wide basis or on a per-job basis.
The speculator is instantiated in MRAppMaster (the MapReduce Application Master). As you mentioned in your question, the following piece of code in the MRAppMaster::serviceInit() function instantiates the speculator:
if (conf.getBoolean(MRJobConfig.MAP_SPECULATIVE, false)
    || conf.getBoolean(MRJobConfig.REDUCE_SPECULATIVE, false)) {
  // optional service to speculate on task attempts' progress
  speculator = createSpeculator(conf, context);
  addIfService(speculator);
}
It checks the job configuration to see if speculative execution is turned on for either map or reduce tasks, and then creates the speculator.
Since the speculator is created inside the MRAppMaster, you can configure your custom speculator per job (see the sketch after the property list below).
Following are the speculative execution properties:
mapreduce.map.speculative: Enable speculative execution for map tasks
mapreduce.reduce.speculative: Enable speculative execution for reduce tasks
yarn.app.mapreduce.am.job.speculator.class: Speculator class
yarn.app.mapreduce.am.job.task.estimator.class: Estimator class. This is used by the speculator to estimate the run time of a task.
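The speculator is created once when the AM starts, so rather than swapping it in the middle of a run, you would submit the same job once per speculator implementation and compare the results. A minimal driver sketch, assuming your implementations are on the job's classpath (com.example.MySpeculator, the class name and the job name are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpeculatorTestDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Speculation must be enabled, or the AM never instantiates a speculator
    // (see the serviceInit() snippet above).
    conf.setBoolean("mapreduce.map.speculative", true);
    conf.setBoolean("mapreduce.reduce.speculative", true);
    // Point the AM at one of your speculator implementations.
    conf.set("yarn.app.mapreduce.am.job.speculator.class",
             "com.example.MySpeculator");

    Job job = Job.getInstance(conf, "speculator-test"); // illustrative name
    // ... configure mapper, reducer and I/O as usual, then:
    // job.waitForCompletion(true);
  }
}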

Hadoop streaming job fails to report?

All jobs were running successfully using hadoop-streaming, but all of a sudden I started to see errors from one of the worker machines:
Hadoop job_201110302152_0002 failures on master
Attempt Task Machine State Error Logs
attempt_201110302152_0002_m_000037_0 task_201110302152_0002_m_000037 worker2 FAILED
Task attempt_201110302152_0002_m_000037_0 failed to report status for 622 seconds. Killing!
-------
Task attempt_201110302152_0002_m_000037_0 failed to report status for 601 seconds. Killing!
Questions:
- Why is this happening?
- How can I handle such issues?
Thank you
The description of mapred.task.timeout, which defaults to 600000 ms (600 s), says: "The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string."
Increasing the value of mapred.task.timeout might solve the problem, but you need to figure out whether more than 600 s is actually required for the map task to finish processing its input data, or whether there is a bug in the code that needs to be debugged.
According to the Hadoop best practices, on average a map task should take a minute or so to process an InputSplit.
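
If the task genuinely needs more time, the cleanest fix is to keep reporting progress while it works. Below is an illustrative Java sketch of that mechanism (the class name and doExpensiveWork are placeholders); a streaming mapper or reducer gets the same effect by writing reporter:status:<message> lines to stderr, and the timeout itself can be raised by passing -D mapred.task.timeout=<milliseconds> at job submission.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: during a long computation, periodically tell the
// framework the task is still alive so mapred.task.timeout does not fire.
public class SlowRecordMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    for (int step = 0; step < 100; step++) {
      doExpensiveWork(value, step);      // placeholder for the real work
      context.progress();                // heartbeat: resets the timeout clock
      context.setStatus("record " + key + ", step " + step);
    }
    context.write(new Text(value), new LongWritable(1L));
  }

  private void doExpensiveWork(Text value, int step) {
    // placeholder for the slow per-record processing
  }
}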
