Running a distcp job, I encounter the following problem:
Almost all map tasks are marked as successful, but with a note saying Container killed.
On the web interface the log for the map tasks says:
Progress 100.00
State SUCCEEDED
but under Note it says for almost every attempt (~200)
Container killed by the ApplicationMaster.
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
In the log file associated with the attempt I can see a line saying Task 'attempt_xxxxxxxxx_0' done.
stderr output is empty for all jobs/attempts.
When looking at the application master log and following one of the successful (but killed) attempts I find the following logs:
2017-01-05 10:27:22,772 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1483370705805_4012_m_000000_0
2017-01-05 10:27:22,773 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1483370705805_4012_m_000000 Task Transitioned from RUNNING to SUCCEEDED
2017-01-05 10:27:22,775 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2017-01-05 10:27:22,775 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1483370705805_4012Job Transitioned from RUNNING to COMMITTING
2017-01-05 10:27:22,776 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
2017-01-05 10:27:23,118 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2017-01-05 10:27:24,125 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e116_1483370705805_4012_01_000002
2017-01-05 10:27:24,126 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2017-01-05 10:27:24,126 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1483370705805_4012_m_000000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
I have set "mapreduce.map.speculative=false"!
All map tasks are SUCCEEDED (a distcp job has no reduce phase), but the MapReduce job keeps running for a long time (several hours) before it finally succeeds and the distcp job is done.
I am running yarn version = Hadoop 2.5.0-cdh5.3.1.
Should I be worried about this? And what causes the containers to be killed? Any suggestions would be greatly appreciated!
Those killed attempts might be due to speculative execution. In this case there is nothing to worry about.
To make sure this is the case, try running your distcp like this:
hadoop distcp -Dmapreduce.map.speculative=false ...
You should stop seeing those killed attempts.
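Distcp itself has no reduce phase (as noted above), but passing both speculative-execution flags is harmless; a sketch with placeholder source and target paths:
hadoop distcp -Dmapreduce.map.speculative=false -Dmapreduce.reduce.speculative=false hdfs://source-nn:8020/src/path hdfs://target-nn:8020/dst/path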
Related
I am seeing the below error when I submit an Oozie job in an EMR Hadoop cluster. From what I can tell, a certain container is not finding the temporary output file that is produced by another job (?).
I verified the NameNode address, and also verified that there is enough space on it (I attached an additional EBS volume). I am using master instance type m5.2xlarge and core instance type r5a.2xlarge.
This is the error I see in the ApplicationMaster logs.
Application application_1632399471753_0051 failed 2 times due to AM Container for appattempt_1632399471753_0051_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-23-31-119.us-west-2.compute.internal:8020/tmp/crunch-1324412807/p2/MAP
For more detailed output, check the application tracking page: http://ip-10-23-31-119.us-west-2.compute.internal:8088/cluster/app/application_1632399471753_0051 Then click on links to logs of each attempt.
. Failing the application.
This is the error I see when I go into one of the task logs.
2021-09-23 13:19:43,280 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1632399471753_0051Job Transitioned from INITED to SETUP
2021-09-23 13:19:43,282 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP
2021-09-23 13:19:43,294 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1632399471753_0051Job Transitioned from SETUP to RUNNING
2021-09-23 13:19:43,309 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1632399471753_0051_m_000000 Task Transitioned from NEW to SCHEDULED
2021-09-23 13:19:43,311 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1632399471753_0051_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2021-09-23 13:19:43,311 INFO [Thread-85] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceRequest:<memory:7168, vCores:1>
2021-09-23 13:19:43,380 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1632399471753_0051, File: hdfs://ip-10-23-31-119.us-west-2.compute.internal:8020/tmp/hadoop-yarn/staging/hadoop/.staging/job_1632399471753_0051/job_1632399471753_0051_1.jhist
2021-09-23 13:19:44,270 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
2021-09-23 13:19:44,298 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1632399471753_0051: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:14336, vCores:10> knownNMs=2
2021-09-23 13:19:45,306 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
2021-09-23 13:19:45,308 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1632399471753_0051_01_000002 to attempt_1632399471753_0051_m_000000_0
2021-09-23 13:19:45,309 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2021-09-23 13:19:45,362 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-jar file on the remote FS is hdfs://ip-10-23-31-119.us-west-2.compute.internal:8020/tmp/hadoop-yarn/staging/hadoop/.staging/job_1632399471753_0051/job.jar
2021-09-23 13:19:45,364 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-conf file on the remote FS is /tmp/hadoop-yarn/staging/hadoop/.staging/job_1632399471753_0051/job.xml
2021-09-23 13:19:45,367 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-23-31-119.us-west-2.compute.internal:8020/tmp/crunch-1324412807/p2/MAP
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createCommonContainerLaunchContext(TaskAttemptImpl.java:902)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createContainerLaunchContext(TaskAttemptImpl.java:947)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$ContainerAssignedTransition.transition(TaskAttemptImpl.java:1714)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$ContainerAssignedTransition.transition(TaskAttemptImpl.java:1691)
at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1210)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:147)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1459)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1451)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-23-31-119.us-west-2.compute.internal:8020/tmp/crunch-1324412807/p2/MAP
at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1444)
at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1437)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1452)
at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:774)
at org.apache.hadoop.mapreduce.v2.util.MRApps.parseDistributedCacheArtifacts(MRApps.java:601)
at org.apache.hadoop.mapreduce.v2.util.MRApps.setupDistributedCache(MRApps.java:491)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createCommonContainerLaunchContext(TaskAttemptImpl.java:821)
... 14 more
2021-09-23 13:19:45,369 INFO [AsyncDispatcher ShutDown handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
End of LogType:syslog
I am not sure what else I need to check to troubleshoot this issue. I do not see any logs being printed from my application code.
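One quick check, based on the path in the stack trace (my suggestion, not from the original post), is to confirm whether that temporary Crunch output actually exists on HDFS at the time the job runs:
hdfs dfs -ls hdfs://ip-10-23-31-119.us-west-2.compute.internal:8020/tmp/crunch-1324412807/p2/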
While learning Sqoop, I executed a sqoop command to fetch all MySQL databases in Cloudera's CDH, and it returned all available databases correctly. The problem is that if I run the same command as a job in an Oozie workflow, it always fails.
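For reference, the command-line equivalent of the workflow's Sqoop action (reconstructed from the action below, so treat the exact flags as an assumption) would be something like:
sqoop list-databases --connect "jdbc:mysql://quickstart.cloudera:3306" --username retail_dba --password cloudera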
job.properties
nameNode=hdfs://quickstart.cloudera:8020
resourceManager=0.0.0.0:8032
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/oozie/pig_demo
workflow.xml
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.2">
<start to="sqoop-36c5"/>
<action name="sqoop-36c5">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${resourceManager}</job-tracker>
<name-node>${nameNode}</name-node>
<command>list-databases --m 1
--connect "jdbc:mysql://quickstart.cloudera:3306"
--username retail_dba
--password cloudera</command>
</sqoop>
<ok to="finish"/>
<error to="errorHalt"/>
</action>
<kill name="errorHalt">
<message>Input unavailable,error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="finish"/>
</workflow-app>
The following logs were generated:
2019-01-24 09:52:09,352 INFO [Thread-69] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://quickstart.cloudera:8020/tmp/hadoop-yarn/staging/history/done_intermediate/cloudera/job_1548311302916_0001-1548312666604-cloudera-oozie%3Alauncher%3AT%3Dsqoop%3AW%3Dfoo%2Dwf%3AA%3Dsqoop%2D36c5%3AID%3D00-1548312729060-1-0-SUCCEEDED-root.cloudera-1548312703464.jhist_tmp to hdfs://quickstart.cloudera:8020/tmp/hadoop-yarn/staging/history/done_intermediate/cloudera/job_1548311302916_0001-1548312666604-cloudera-oozie%3Alauncher%3AT%3Dsqoop%3AW%3Dfoo%2Dwf%3AA%3Dsqoop%2D36c5%3AID%3D00-1548312729060-1-0-SUCCEEDED-root.cloudera-1548312703464.jhist
2019-01-24 09:52:09,352 INFO [Thread-69] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2019-01-24 09:52:09,353 INFO [Thread-69] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1548311302916_0001_m_000000_0
2019-01-24 09:52:09,437 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1548311302916_0001_m_000000_0 TaskAttempt Transitioned from SUCCESS_FINISHING_CONTAINER to SUCCEEDED
2019-01-24 09:52:09,441 INFO [Thread-69] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Setting job diagnostics to
2019-01-24 09:52:09,442 INFO [Thread-69] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: History url is http://quickstart.cloudera:19888/jobhistory/job/job_1548311302916_0001
2019-01-24 09:52:09,480 INFO [Thread-69] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Waiting for application to be successfully unregistered.
2019-01-24 09:52:10,483 INFO [Thread-69] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Final Stats: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2019-01-24 09:52:10,488 INFO [Thread-69] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://quickstart.cloudera:8020 /tmp/hadoop-yarn/staging/cloudera/.staging/job_1548311302916_0001
2019-01-24 09:52:10,505 INFO [Thread-69] org.apache.hadoop.ipc.Server: Stopping server on 34049
2019-01-24 09:52:10,517 INFO [IPC Server listener on 34049] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 34049
2019-01-24 09:52:10,523 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2019-01-24 09:52:10,524 INFO [TaskHeartbeatHandler PingChecker] org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler thread interrupted
2019-01-24 09:52:10,531 INFO [Ping Checker] org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: TaskAttemptFinishingMonitor thread interrupted
2019-01-24 09:52:10,556 INFO [Thread-69] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Job end notification started for jobID : job_1548311302916_0001
2019-01-24 09:52:10,560 INFO [Thread-69] org.mortbay.log: Job end notification attempts left 0
2019-01-24 09:52:10,560 INFO [Thread-69] org.mortbay.log: Job end notification trying http://quickstart.cloudera:11000/oozie/callback?id=0000000-190124093049946-oozie-oozi-W@sqoop-36c5&status=SUCCEEDED
2019-01-24 09:52:10,590 INFO [Thread-69] org.mortbay.log: Job end notification to http://quickstart.cloudera:11000/oozie/callback?id=0000000-190124093049946-oozie-oozi-W@sqoop-36c5&status=SUCCEEDED succeeded
2019-01-24 09:52:10,590 INFO [Thread-69] org.mortbay.log: Job end notification succeeded for job_1548311302916_0001
2019-01-24 09:52:15,605 INFO [Thread-69] org.apache.hadoop.ipc.Server: Stopping server on 44688
2019-01-24 09:52:15,613 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2019-01-24 09:52:15,617 INFO [IPC Server listener on 44688] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 44688
2019-01-24 09:52:15,637 INFO [Thread-69] org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:0
The Sqoop job succeeds, but the task attempt that transitioned from SUCCESS_FINISHING_CONTAINER to SUCCEEDED is also shown as being KILLED. Why is that?
[cloudera@quickstart ~]$ yarn version
Hadoop 2.6.0-cdh5.13.0
Subversion http://github.com/cloudera/hadoop -r 42e8860b182e55321bd5f5605264da4adc8882be
Compiled by jenkins on 2017-10-04T18:08Z
Compiled with protoc 2.5.0
From source with checksum 5e84c185f8a22158e2b0e4b8f85311
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.13.0.jar
First of all, you are running a list-databases command; I am not sure why you have put that in the workflow.xml file.
In my experience the Cloudera VM does not behave very consistently, since it is generally run with limited memory: either the container is not allocated or it is killed. Restarting the entire VM does not help either.
Getting a fresh instance of the Cloudera VM from the image and trying again may solve your problem. It worked for us in the past.
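Before rebuilding the VM, it may also be worth pulling the YARN aggregated logs for the launcher application to see why the container was killed or never allocated; this is a generic check rather than part of the original answer (the application id comes from the logs above):
yarn logs -applicationId application_1548311302916_0001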
I have created a MapReduce job in Java using Eclipse. It is a wordcount job that reads data (approximately 7,453,215 records, about 670 MB) from SQL Server and stores the result back to SQL Server. I have created an HDInsight cluster on Azure with 2 head nodes and 3 worker nodes; each node has 4 cores and 14 GB RAM. The MapReduce job runs successfully locally, but when I submit the job's jar file to the HDInsight cluster on Azure, it gets stuck on the map task at 67%.
Here is the log:
17/12/01 13:23:20 INFO client.AHSProxy: Connecting to Application History server at headnodehost/10.0.0.20:10200
17/12/01 13:23:21 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
17/12/01 13:23:21 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
17/12/01 13:23:21 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/12/01 13:23:36 INFO mapreduce.JobSubmitter: number of splits:2
17/12/01 13:23:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512119994740_0011
17/12/01 13:23:37 INFO impl.YarnClientImpl: Submitted application application_1512119994740_0011
17/12/01 13:23:37 INFO mapreduce.Job: The url to track the job: http://hn1-hdpclu.53o3id15rwte5en44vyo02sv0h.dx.internal.cloudapp.net:8088/proxy/application_1512119994740_0011/
17/12/01 13:23:37 INFO mapreduce.Job: Running job: job_1512119994740_0011
17/12/01 13:23:47 INFO mapreduce.Job: Job job_1512119994740_0011 running in uber mode : false
17/12/01 13:23:47 INFO mapreduce.Job: map 0% reduce 0%
17/12/01 13:24:00 INFO mapreduce.Job: map 33% reduce 0%
17/12/01 13:24:06 INFO mapreduce.Job: map 67% reduce 0%
Error:
2017-12-02 07:09:17,697 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: TASK_ABORT
2017-12-02 07:09:17,697 WARN [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Output Path is null in abortTask()
2017-12-02 07:09:17,699 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1512191608534_0003_m_000000_0 TaskAttempt Transitioned from FAIL_TASK_CLEANUP to FAILED
2017-12-02 07:09:17,708 INFO [Thread-56] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 1 failures on node 10.0.0.11
2017-12-02 07:09:17,709 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1512191608534_0003_m_000000_1 TaskAttempt Transitioned from NEW to UNASSIGNED
2017-12-02 07:09:17,709 INFO [Thread-56] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Added attempt_1512191608534_0003_m_000000_1 to list of failed maps
2017-12-02 07:09:17,721 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1512191608534_0003_m_000001_0 TaskAttempt Transitioned from FAIL_CONTAINER_CLEANUP to FAIL_TASK_CLEANUP
2017-12-02 07:09:17,728 INFO [CommitterEvent Processor #2] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: TASK_ABORT
2017-12-02 07:09:17,728 WARN [CommitterEvent Processor #2] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Output Path is null in abortTask()
2017-12-02 07:09:17,728 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1512191608534_0003_m_000001_0 TaskAttempt Transitioned from FAIL_TASK_CLEANUP to FAILED
2017-12-02 07:09:17,729 INFO [Thread-56] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 2 failures on node 10.0.0.11
2017-12-02 07:09:17,729 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1512191608534_0003_m_000001_1 TaskAttempt Transitioned from NEW to UNASSIGNED
2017-12-02 07:09:17,729 INFO [Thread-56] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Added attempt_1512191608534_0003_m_000001_1 to list of failed maps
2017-12-02 07:09:18,234 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:1 ScheduledMaps:2 ScheduledReds:0 AssignedMaps:3 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:3 ContRel:0 HostLocal:0 RackLocal:0
2017-12-02 07:09:18,240 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1512191608534_0003: ask=1 release= 0 newContainers=0 finishedContainers=1 resourcelimit= knownNMs=1
2017-12-02 07:09:18,240 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1512191608534_0003_01_000002
2017-12-02 07:09:18,240 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=
2017-12-02 07:09:18,240 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 1
2017-12-02 07:09:18,240 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1512191608534_0003_m_000000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
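A hedged side note: the WARN near the top of the submission log says the driver does not implement the Tool interface. If it did (via ToolRunner), map memory could be raised at submit time with generic options; a sketch with placeholder jar, class, and argument names:
hadoop jar wordcount-job.jar com.example.WordCountDriver -Dmapreduce.map.memory.mb=4096 -Dmapreduce.map.java.opts=-Xmx3276m <input> <output>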
I wrote my own scheduler for Hadoop 2.6.0, inheriting from AbstractYarnScheduler.
It compiles successfully, but when I submit a job to Hadoop, the ResourceManager breaks down.
Here is the log on the master node:
2015-07-19 13:31:59,931 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1437327062733_0001 State change from NEW to NEW_SAVING
2015-07-19 13:31:59,932 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/proto/YarnServerResourceManagerServiceProtos$ApplicationStateDataProtoOrBuilder
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2013)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1978)
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:56)
at org.apache.hadoop.yarn.util.Records.newRecord(Records.java:36)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.records.ApplicationStateData.newInstance(ApplicationStateData.java:43)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.records.ApplicationStateData.newInstance(ApplicationStateData.java:56)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:131)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:1)
at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:787)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:839)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.proto.YarnServerResourceManagerServiceProtos$ApplicationStateDataProtoOrBuilder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 20 more
2015-07-19 13:31:59,934 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
2015-07-19 13:31:59,937 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2015-07-19 13:31:59,938 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8088
2015-07-19 13:31:59,938 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2015-07-19 13:31:59,938 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2015-07-19 13:31:59,944 WARN org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 129.10.58.155:50992 Call#42 Retry#0
java.lang.NoSuchMethodError: org.apache.hadoop.yarn.server.utils.BuilderUtils.newApplicationResourceUsageReport(IILorg/apache/hadoop/yarn/api/records/Resource;Lorg/apache/hadoop/yarn/api/records/Resource;Lorg/apache/hadoop/yarn/api/records/Resource;)Lorg/apache/hadoop/yarn/api/records/ApplicationResourceUsageReport;
at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.<clinit>(RMServerUtils.java:237)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:520)
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:296)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:170)
at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:401)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
2015-07-19 13:32:00,039 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
2015-07-19 13:32:00,040 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8032
2015-07-19 13:32:00,040 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033
2015-07-19 13:32:00,041 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8033
2015-07-19 13:32:00,041 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2015-07-19 13:32:00,041 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state
2015-07-19 13:32:00,041 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager metrics system...
2015-07-19 13:32:00,042 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2015-07-19 13:32:00,042 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system stopped.
2015-07-19 13:32:00,042 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete.
2015-07-19 13:32:00,042 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher is draining to stop, igonring any new events.
2015-07-19 13:32:01,042 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.
2015-07-19 13:32:02,043 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.
2015-07-19 13:32:03,043 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.
2015-07-19 13:32:04,043 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.
2015-07-19 13:32:05,044 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.
2015-07-19 13:32:06,044 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.
2015-07-19 13:32:07,044 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.
2015-07-19 13:32:08,044 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.
According to the error information, the system can't find the class org.apache.hadoop.yarn.proto.YarnServerResourceManagerServiceProtos$ApplicationStateDataProtoOrBuilder. Did you add the class (or the jar containing it) to the classpath?
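To verify, you could check whether a jar containing that class is actually on the ResourceManager's classpath; a rough sketch (the HADOOP_HOME jar layout below is an assumption, adjust to your installation):
yarn classpath
for j in $HADOOP_HOME/share/hadoop/yarn/*.jar; do
  jar tf "$j" | grep -q 'ApplicationStateDataProtoOrBuilder' && echo "$j"
done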
I think it could be some user session/permission issue; I got the same error on my standalone instance (Ubuntu Desktop 16 LTS, jdk1.8.92 & Hadoop 2.7.2). It works normally again if I restart my machine and start over, but the same error keeps popping up if I just re-login, restart the daemons, and resubmit the job in the same terminal session.
Were you able to fix this issue?
Steps to reproduce on my machine are:
(1) Start a terminal and log in (on the terminal) to the dedicated Hadoop user hduser (a sudo user) using the command: su hduser
(2) Start the Hadoop daemons using the commands: start-dfs.sh & start-yarn.sh
(3) I can see all processes with the jps command.
(4) A few MR jobs complete successfully. I submit the same job again after about 10-15 minutes.
(5) The hduser session is thrown out and I land in the regular desktop user session.
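A hedged extra check, not part of the original steps: if the whole desktop session is being thrown out, the kernel's OOM killer may be terminating processes, which you can look for on Ubuntu with:
dmesg | grep -iE "out of memory|killed process"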
I have a pseudo-distributed Hadoop cluster running CDH 5.0.2. I'm running a sqoop import command:
sudo -u sqoop sqoop import --connect "jdbc:sqlserver://x.x.x.x:1433;databaseName=yyyyy" --username x --password y --table table_name
I'm just importing a very small table that has 12 rows and 2 columns for testing. The job has been running for half an hour. In the ResourceManager UI, the mapper tasks' status is listed as NEW and their state as SCHEDULED. I don't think it ever runs!
When I list the job on YARN using:
yarn application -list
I'm getting:
14/07/01 15:55:06 INFO client.RMProxy: Connecting to ResourceManager at host/x.x.x.x:8032
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1404252440376_0001 ActivityType.jar MAPREDUCE sqoop root.sqoop RUNNING UNDEFINED 5% http://host:42583
This is the ApplicationMaster log I'm seeing. How do I fix this?
2014-07-01 15:14:12,880 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-07-01 15:14:12,885 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-07-01 15:14:12,888 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at host/x.x.x.x:8030
2014-07-01 15:14:12,973 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: maxContainerCapability: 1024
2014-07-01 15:14:12,973 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: queue: root.sqoop
2014-07-01 15:14:12,977 INFO [main] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Upper limit on the thread pool size is 500
2014-07-01 15:14:12,979 INFO [main] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
2014-07-01 15:14:12,985 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1404252440376_0001Job Transitioned from INITED to SETUP
2014-07-01 15:14:12,987 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP
2014-07-01 15:14:12,997 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1404252440376_0001Job Transitioned from SETUP to RUNNING
2014-07-01 15:14:13,018 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1404252440376_0001_m_000000 Task Transitioned from NEW to SCHEDULED
2014-07-01 15:14:13,019 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1404252440376_0001_m_000001 Task Transitioned from NEW to SCHEDULED
2014-07-01 15:14:13,019 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1404252440376_0001_m_000002 Task Transitioned from NEW to SCHEDULED
2014-07-01 15:14:13,019 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1404252440376_0001_m_000003 Task Transitioned from NEW to SCHEDULED
2014-07-01 15:14:13,021 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1404252440376_0001_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-01 15:14:13,021 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1404252440376_0001_m_000001_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-01 15:14:13,021 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1404252440376_0001_m_000002_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-01 15:14:13,021 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1404252440376_0001_m_000003_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-01 15:14:13,022 INFO [Thread-51] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceReqt:1024
2014-07-01 15:14:13,066 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1404252440376_0001, File: hdfs://host:8020/user/sqoop/.staging/job_1404252440376_0001/job_1404252440376_0001_1.jhist
2014-07-01 15:14:13,976 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
2014-07-01 15:14:14,054 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1404252440376_0001: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=1
The main problem I saw when this happened to me was that there were not enough resources to execute Sqoop.
When I was executing Sqoop alongside other YARN applications, it usually didn't have enough resources, so the map tasks were always stuck at 0%. I went to the driver logs, and the last lines looked something like this:
2017-02-01 14:54:48,638 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1484947659248_0500_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2017-02-01 14:54:48,638 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1484947659248_0500_m_000001_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2017-02-01 14:54:48,638 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1484947659248_0500_m_000002_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2017-02-01 14:54:48,638 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1484947659248_0500_m_000003_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2017-02-01 14:54:48,639 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1484947659248_0500_m_000004_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2017-02-01 14:54:48,639 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1484947659248_0500_m_000005_0 TaskAttempt Transitioned from NEW to UNASSIGNED
After this, nothing is logged and Sqoop is still at 0%.
When there was nothing else running on YARN, Sqoop executed without any problem.
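In that situation it can help to see what else is occupying the cluster and, if appropriate, free resources before rerunning the import; for example (the application id is a placeholder):
yarn application -list -appStates RUNNING
yarn application -kill <application_id>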
This is not the most helpful answer, but I ended up reinstalling the whole thing.