Yarn app timeout and no error - hadoop

I am running a MapReduce job triggered through YARN's REST API. The YARN app starts and triggers another MapReduce job, but the YARN app itself times out after almost exactly 12 minutes.
These are the last log lines before it ends:
2016-09-01 13:22:53 DEBUG ProtobufRpcEngine:221 - Call: getJobReport took 0ms
2016-09-01 13:22:54 DEBUG Client:97 - stopping client from cache: org.apache.hadoop.ipc.Client#6bbe2511
There are literally no errors or exceptions. I don't know which setting in Hadoop is causing this.
The diagnostics say: Application *application_whatever* failed 1 times due to ApplicationMaster for attempt *appattempt_application_whatever* timed out. Failing the application.

Related

Hadoop: Job failed as tasks failed. failedMaps:1 failedReduces:0

I am trying to run a random job, but once it gets 4 failures, the whole job fails:
Job failed as tasks failed. failedMaps:1 failedReduces:0
I tried many times, but after every 4 failures my job fails and all tasks get killed.
Is there an option to change the number of failures allowed?
The job fails when the SAME map task fails 4 times.
You can change this failure threshold with these parameters in the mapred-site.xml file:
mapreduce.map.maxattempts
mapreduce.reduce.maxattempts
It is also possible to tolerate failed tasks. These parameters set the percentage of tasks that may fail without failing the job:
mapreduce.map.failures.maxpercent
mapreduce.reduce.failures.maxpercent
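As a rough sketch, the parameters above could be set in mapred-site.xml as follows. The values shown are illustrative assumptions, not recommendations; the stock defaults are 4 for the two maxattempts properties and 0 for the two maxpercent properties.

```xml
<!-- mapred-site.xml fragment; place inside the <configuration> element. -->
<!-- All values below are illustrative assumptions. -->
<property>
  <!-- Attempts allowed per map task before the task is marked failed (default 4) -->
  <name>mapreduce.map.maxattempts</name>
  <value>8</value>
</property>
<property>
  <!-- Attempts allowed per reduce task before the task is marked failed (default 4) -->
  <name>mapreduce.reduce.maxattempts</name>
  <value>8</value>
</property>
<property>
  <!-- Percentage of map tasks that may fail without failing the job (default 0) -->
  <name>mapreduce.map.failures.maxpercent</name>
  <value>10</value>
</property>
<property>
  <!-- Percentage of reduce tasks that may fail without failing the job (default 0) -->
  <name>mapreduce.reduce.failures.maxpercent</name>
  <value>10</value>
</property>
```

Raising the attempt limit only helps if the failures are transient; a map task that fails deterministically on the same input will simply fail more times before the job is killed.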

Spark Streaming: How to restart spark streaming job running on hdfs cleanly

We have a Spark Streaming job that reads data from Kafka, running on a 4-node cluster and using a checkpoint directory on HDFS. We had an I/O error when we ran out of space, and we had to delete a few HDFS folders to free some up. We now have bigger disks mounted and want to restart cleanly; there is no need to preserve checkpoint data or Kafka offsets. We are getting this error:
Application application_1482342493553_0077 failed 2 times due to AM Container for appattempt_1482342493553_0077_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://hdfs-name-node:8088/cluster/app/application_1482342493553_0077Then, click on links to logs of each attempt.
Diagnostics: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1266542908-96.118.179.119-1479844615420:blk_1073795938_55173 file=/user/hadoopuser/streaming_2.10-1.0.0-SNAPSHOT.jar
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1484420770001
final status: FAILED
tracking URL: http://hdfs-name-node:8088/cluster/app/application_1482342493553_0077
user: hadoopuser
From the error, what I can make out is that it's still looking for the old HDFS blocks which we deleted.
From research, I found that changing the checkpoint directory should help. I tried changing it to point to a new directory, but Spark still does not restart on a clean slate and gives the same block exception. Are we missing anything in the configuration changes? And how can we make sure that Spark starts on a clean slate?
This is how we are setting the checkpoint directory:
val ssc = new StreamingContext(sparkConf, Seconds(props.getProperty("spark.streaming.window.seconds").toInt))
ssc.checkpoint(props.getProperty("spark.checkpointdir"))
val sc = ssc.sparkContext
The current checkpoint directory in the properties file looks like this:
spark.checkpointdir:hdfs://hadoopuser#hdfs-name-node:8020/user/hadoopuser/.checkpointDir1
Previously it looked like this:
spark.checkpointdir:hdfs://hadoopuser#hdfs-name-node:8020/user/hadoopuser/.checkpointDir

Operation not permitted while launching MR job

I changed my Kerberos cluster to a non-Kerberized one.
After that, the exception below occurred while launching MR jobs.
Application application_1458558692044_0001 failed 1 times due to AM Container for appattempt_1458558692044_0001_000001 exited with exitCode: -1000
For more detailed output, check application tracking page:http://hdp1.impetus.co.in:8088/proxy/application_1458558692044_0001/Then, click on links to logs of each attempt.
Diagnostics: Operation not permitted
Failing this attempt. Failing the application.
I was able to continue my work after doing the following:
Delete the YARN folders on all nodes from the directories defined by the yarn.nodemanager.local-dirs property.
Then restart the YARN processes.
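The directories to clear are whatever yarn.nodemanager.local-dirs lists on each NodeManager. A sketch of how that property might appear in yarn-site.xml is below; the two paths are hypothetical, and if the property is not set at all it defaults to ${hadoop.tmp.dir}/nm-local-dir.

```xml
<!-- yarn-site.xml fragment; place inside the <configuration> element. -->
<!-- The paths below are hypothetical; check the actual value on each node. -->
<property>
  <!-- Comma-separated local directories where the NodeManager stores localized files -->
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/1/yarn/local,/data/2/yarn/local</value>
</property>
```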

Spark on Yarn with Jdk8

I am running a spark job on hadoop yarn (hadoop 2.7.0 but also tried 2.4.0, all on my box using the downloads from apache-hadoop web site and spark 1.3.1). My spark job is in scala but contains classes compiled with jdk8.
When I run hadoop on jdk8, I get
INFO yarn.Client:
client token: N/A
diagnostics: Shutdown hook called before final status was reported.
ApplicationMaster host: kostas-pc
ApplicationMaster RPC port: 0
queue: default
start time: 1431513335001
final status: SUCCEEDED
Even if the job is marked as SUCCEEDED, it actually didn't do anything due to "Shutdown hook called before final status was reported.". In fact no logging is visible from my spark job.
When I switch the JDK that Hadoop runs on to jdk7, my job starts running and I can see log entries from my Scala code, but when it gets to the code compiled with jdk8 it fails with an incompatible class error (as expected).
So it seems that hadoop+spark is not compatible with jdk8. Are there any solutions to this?
Thanks
Seems spark 1.4.0 is ok with jdk8

Can't run a MapReduce job with YARN

I'm taking my first steps in mastering Hadoop. I've set up CDH4.5 in distributed mode (on two virtual machines). I'm having problems running MapReduce jobs with YARN. I could successfully launch a DistributedShell application (from the CDH examples), but once I run a MapReduce job, it just hangs there forever.
This is what I'm trying to launch:
sudo -uhdfs yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 1 1
These are the last resource manager's log lines:
13/12/10 23:30:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1386714123362_0001
13/12/10 23:30:02 INFO client.YarnClientImpl: Submitted application application_1386714123362_0001 to ResourceManager at master/192.168.122.175:8032
13/12/10 23:30:02 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1386714123362_0001/
13/12/10 23:30:02 INFO mapreduce.Job: Running job: job_1386714123362_0001
The node manager's log doesn't get any new messages once I run the job.
This is what I see on resource manager's web page regarding the job:
State - ACCEPTED
FinalStatus - UNDEFINED
Progress - (progress bar in 0%)
Tracking UI - UNASSIGNED
Apps Submitted - 1
Apps Pending - 1
Apps Running - 0
I found this at http://hadoop.apache.org/docs/r2.0.6-alpha/hadoop-project-dist/hadoop-common/releasenotes.html:
YARN-300. Major bug reported by shenhong and fixed by Sandy Ryza (resourcemanager , scheduler)
After YARN-271, fair scheduler can infinite loop and not schedule any application.
After yarn-271, when yarn.scheduler.fair.max.assign<=0, when a node was been reserved, fairScheduler will infinite loop and not schedule any application.
Try a newer version, i.e. one above 2.0.
It was probably caused by a system resource issue; I fixed it by restarting my system.
