I'm new to Hadoop. I am trying to check whether my YARN/MapReduce setup is working correctly by running the example MapReduce program from http://hadooptutorial.info/run-example-mapreduce-program/, and I get this error message:
http://localhost:8088/cluster/app/application_1468349436383_0001
Application application_1468349436383_0001 failed 2 times due to AM Container for appattempt_1468349436383_0001_000002 exited with exitCode: 127
For more detailed output, check application tracking page:http://localhost:8088/cluster/app/application_1468349436383_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1468349436383_0001_02_000001
Exit code: 127
Stack trace: ExitCodeException exitCode=127:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 127
Failing this attempt. Failing the application.
I'm guessing my configuration isn't correct, because MapReduce doesn't even start up.
I'm really sorry if this is a trivial question. I'm very new to Hadoop and I couldn't find anything online to fix this.
Here are my Hadoop configuration files:
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>8</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Thank you so much.
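Update: from what I've read, exit code 127 from bash means "command not found", so my plan is to first verify that the shell the NodeManager uses to launch containers can actually resolve java. The sketch below is only what I intend to check; the paths assume the default $HADOOP_HOME layout and a macOS JDK, and are not taken from my logs.
# exit code 127 from bash means "command not found"; check that java resolves
# for the kind of shell the NodeManager spawns (paths below are assumptions)
which java || echo "java is not on PATH"
echo "JAVA_HOME=${JAVA_HOME}"
# if JAVA_HOME is unset for the daemons, export it explicitly in
# $HADOOP_HOME/etc/hadoop/hadoop-env.sh, e.g.:
#   export JAVA_HOME=$(/usr/libexec/java_home)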
NodeManager log error snapshot:
2016-07-13 10:37:42,145 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1468349436383_0001_01_000001 is : 127
2016-07-13 10:37:42,147 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1468349436383_0001_01_000001 and exit code: 127
ExitCodeException exitCode=127:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-07-13 10:37:42,149 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2016-07-13 10:37:42,149 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1468349436383_0001_01_000001
2016-07-13 10:37:42,149 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 127
2016-07-13 10:37:42,150 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=127:
2016-07-13 10:37:42,150 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
2016-07-13 10:37:42,150 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:456)
2016-07-13 10:37:42,150 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
2016-07-13 10:37:42,150 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
2016-07-13 10:37:42,150 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2016-07-13 10:37:42,150 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2016-07-13 10:37:42,150 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2016-07-13 10:37:42,150 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2016-07-13 10:37:42,150 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2016-07-13 10:37:42,150 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.lang.Thread.run(Thread.java:745)
2016-07-13 10:37:42,150 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 127
2016-07-13 10:37:42,151 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1468349436383_0001_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-07-13 10:37:42,151 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1468349436383_0001_01_000001
2016-07-13 10:37:42,183 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop-aditghosh/nm-local-dir/usercache/aditghosh/appcache/application_1468349436383_0001/container_1468349436383_0001_01_000001
2016-07-13 10:37:42,184 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=aditghosh OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1468349436383_0001 CONTAINERID=container_1468349436383_0001_01_000001
2016-07-13 10:37:42,188 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1468349436383_0001_01_000001 transitioned from EXITED_WITH_FAILURE to DONE
2016-07-13 10:37:42,188 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_1468349436383_0001_01_000001 from application application_1468349436383_0001
2016-07-13 10:37:42,188 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: ResourceCalculatorPlugin is unavailable on this system. org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is disabled.
2016-07-13 10:37:42,188 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1468349436383_0001
2016-07-13 10:37:43,970 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed containers from NM context: [container_1468349436383_0001_01_000001]
2016-07-13 10:37:43,981 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1468349436383_0001_000002 (auth:SIMPLE)
2016-07-13 10:37:43,986 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1468349436383_0001_02_000001 by user aditghosh
2016-07-13 10:37:43,987 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=aditghosh IP=127.0.0.1 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1468349436383_0001 CONTAINERID=container_1468349436383_0001_02_000001
2016-07-13 10:37:43,987 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Adding container_1468349436383_0001_02_000001 to application application_1468349436383_0001
2016-07-13 10:37:43,988 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1468349436383_0001_02_000001 transitioned from NEW to LOCALIZING
2016-07-13 10:37:43,988 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1468349436383_0001
2016-07-13 10:37:43,988 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1468349436383_0001_02_000001 transitioned from LOCALIZING to LOCALIZED
2016-07-13 10:37:44,037 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1468349436383_0001_02_000001 transitioned from LOCALIZED to RUNNING
2016-07-13 10:37:44,037 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: ResourceCalculatorPlugin is unavailable on this system. org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is disabled.
2016-07-13 10:37:44,100 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /tmp/hadoop-aditghosh/nm-local-dir/usercache/aditghosh/appcache/application_1468349436383_0001/container_1468349436383_0001_02_000001/default_container_executor.sh]
2016-07-13 10:37:44,132 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1468349436383_0001_02_000001 is : 127
2016-07-13 10:37:44,132 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1468349436383_0001_02_000001 and exit code: 127
ExitCodeException exitCode=127:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-07-13 10:37:44,132 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2016-07-13 10:37:44,132 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1468349436383_0001_02_000001
2016-07-13 10:37:44,132 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 127
2016-07-13 10:37:44,132 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=127:
2016-07-13 10:37:44,132 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
2016-07-13 10:37:44,132 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:456)
2016-07-13 10:37:44,133 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
2016-07-13 10:37:44,133 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
2016-07-13 10:37:44,133 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2016-07-13 10:37:44,133 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2016-07-13 10:37:44,133 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2016-07-13 10:37:44,133 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2016-07-13 10:37:44,133 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2016-07-13 10:37:44,133 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.lang.Thread.run(Thread.java:745)
2016-07-13 10:37:44,133 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 127
2016-07-13 10:37:44,133 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1468349436383_0001_02_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-07-13 10:37:44,133 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1468349436383_0001_02_000001
2016-07-13 10:37:44,160 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop-aditghosh/nm-local-dir/usercache/aditghosh/appcache/application_1468349436383_0001/container_1468349436383_0001_02_000001
2016-07-13 10:37:44,161 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=aditghosh OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1468349436383_0001 CONTAINERID=container_1468349436383_0001_02_000001
2016-07-13 10:37:44,161 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1468349436383_0001_02_000001 transitioned from EXITED_WITH_FAILURE to DONE
2016-07-13 10:37:44,161 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_1468349436383_0001_02_000001 from application application_1468349436383_0001
2016-07-13 10:37:44,161 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: ResourceCalculatorPlugin is unavailable on this system. org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is disabled.
2016-07-13 10:37:44,161 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1468349436383_0001
2016-07-13 10:37:45,983 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed containers from NM context: [container_1468349436383_0001_02_000001]
2016-07-13 10:37:45,987 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1468349436383_0001 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2016-07-13 10:37:45,988 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop-aditghosh/nm-local-dir/usercache/aditghosh/appcache/application_1468349436383_0001
2016-07-13 10:37:45,988 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1468349436383_0001
2016-07-13 10:37:45,992 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1468349436383_0001 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2016-07-13 10:37:45,992 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler: Scheduling Log Deletion for application: application_1468349436383_0001, with delay of 10800 seconds
2016-07-13 13:37:46,480 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : /usr/local/hadoop/logs/userlogs/application_1468349436383_0001
2016-07-14 08:57:21,802 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught exception in status-updater
java.io.IOException: Failed on local exception: java.io.IOException: Operation timed out; Host Details : local host is: "Adits-MacBook-Pro.local/127.0.0.1"; destination host is: "0.0.0.0":8031;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy72.nodeHeartbeat(Unknown Source)
at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy73.nodeHeartbeat(Unknown Source)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:596)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Operation timed out
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:520)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1084)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:979)
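Aside from the 127 failures, the last few entries above show the node heartbeat timing out against "0.0.0.0":8031, which looks like the default resource-tracker address. A rough sketch of the local checks I'd run to confirm the NodeManager can reach the ResourceManager; these assume everything runs on this one machine:
jps                   # ResourceManager and NodeManager should both be listed
nc -z localhost 8031 && echo "resource tracker port reachable"
yarn node -list       # the local node should show up as RUNNING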
ResourceManager log error snapshot:
2016-07-13 10:37:44,978 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1468349436383_0001_000002
2016-07-13 10:37:44,978 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished, removing password for appattempt_1468349436383_0001_000002
2016-07-13 10:37:44,978 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1468349436383_0001_000002 State change from FINAL_SAVING to FAILED
2016-07-13 10:37:44,978 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number of failed attempts is 2. The max attempts is 2
2016-07-13 10:37:44,979 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1468349436383_0001 with final state: FAILED
2016-07-13 10:37:44,987 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1468349436383_0001 State change from ACCEPTED to FINAL_SAVING
2016-07-13 10:37:44,987 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1468349436383_0001
2016-07-13 10:37:44,987 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Attempt appattempt_1468349436383_0001_000002 is done. finalState=FAILED
2016-07-13 10:37:44,987 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1468349436383_0001 requests cleared
2016-07-13 10:37:44,988 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1468349436383_0001 failed 2 times due to AM Container for appattempt_1468349436383_0001_000002 exited with exitCode: 127
For more detailed output, check application tracking page:http://localhost:8088/cluster/app/application_1468349436383_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1468349436383_0001_02_000001
Exit code: 127
Stack trace: ExitCodeException exitCode=127:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 127
Failing this attempt. Failing the application.
2016-07-13 10:37:44,988 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application removed - appId: application_1468349436383_0001 user: aditghosh queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2016-07-13 10:37:44,990 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1468349436383_0001 State change from FINAL_SAVING to FAILED
2016-07-13 10:37:44,991 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1468349436383_0001 user: aditghosh leaf-queue of parent: root #applications: 0
2016-07-13 10:37:44,992 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=aditghosh OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1468349436383_0001 failed 2 times due to AM Container for appattempt_1468349436383_0001_000002 exited with exitCode: 127
For more detailed output, check application tracking page:http://localhost:8088/cluster/app/application_1468349436383_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1468349436383_0001_02_000001
Exit code: 127
Stack trace: ExitCodeException exitCode=127:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 127
Failing this attempt. Failing the application. APPID=application_1468349436383_0001
2016-07-13 10:37:44,994 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1468349436383_0001,name=QuasiMonteCarlo,user=aditghosh,queue=default,state=FAILED,trackingUrl=http://localhost:8088/cluster/app/application_1468349436383_0001,appMasterHost=N/A,startTime=1468431458818,finishTime=1468431464979,finalStatus=FAILED,memorySeconds=8214,vcoreSeconds=4,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=MAPREDUCE
2016-07-13 10:47:24,771 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://localhost:8088/cluster/app/application_1468349436383_0001 which is the app master GUI of application_1468349436383_0001 owned by aditghosh
2016-07-13 11:50:38,782 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2016-07-13 11:50:38,782 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2016-07-13 11:50:38,792 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing master key with keyID 3
2016-07-13 11:50:38,793 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing RMDTMasterKey.
2016-07-14 08:56:27,045 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8031: readAndProcess from client 10.0.0.19 threw exception [java.io.IOException: Operation timed out]
java.io.IOException: Operation timed out
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.hadoop.ipc.Server.channelRead(Server.java:2603)
at org.apache.hadoop.ipc.Server.access$2800(Server.java:136)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1481)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:771)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:637)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:608)
2016-07-14 08:57:22,834 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Received duplicate heartbeat from node localhost:53667 responseId=65089
2016-07-14 11:50:41,081 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2016-07-14 11:50:41,081 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2016-07-14 11:50:41,094 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing master key with keyID 4
2016-07-14 11:50:41,096 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing RMDTMasterKey.
Related
While running the Oozie sample examples, the Oozie jobs get scheduled and show a status of RUNNING, but after some time they get KILLED. Digging through the Hadoop logs, I found the exceptions below.
I have set up Oozie 4.3.1 with Hadoop 2.7.3, and I have also updated job.properties with the proper nameNode and jobTracker configurations.
Please let me know what needs to change to fix this issue.
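For context, this is roughly how I sanity-check the setup before running the example. It is only a sketch: the Oozie URL, the HDFS paths, and the property values are assumptions about a default install, not my exact configuration.
# confirm the sharelib the NodeManager tries to localize is actually registered
oozie admin -oozie http://localhost:11000/oozie -shareliblist
# confirm the jars referenced in the NodeManager log below exist in HDFS
hdfs dfs -ls /user/root/share/lib
hdfs dfs -ls /user/root/examples/apps/java-main/lib
# job.properties keys I set (values here are placeholders):
#   nameNode=hdfs://10.32.193.39:9000
#   jobTracker=<resourcemanager-host>:8032
oozie job -oozie http://localhost:11000/oozie -config examples/apps/java-main/job.properties -run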
Oozie Web Console
User: root
Name: oozie:launcher:T=java:W=java-main-wf:A=java-node:ID=0000000-191231120255907-oozie-root-W
Application Type: MAPREDUCE
Application Tags:
YarnApplicationState: FAILED
Queue: default
FinalStatus Reported by AM: FAILED
Started: Tue Dec 31 12:04:15 +0530 2019
Elapsed: 2mins, 0sec
Tracking URL: History
Diagnostics:
Application application_1576228338940_0013 failed 2 times due to AM Container for appattempt_1576228338940_0013_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://f091403isdpbato05:8088/cluster/app/application_1576228338940_0013Then, click on links to logs of each attempt.
Diagnostics: java.lang.NoClassDefFoundError: Could not initialize class java.net.NetworkInterface
Failing this attempt. Failing the application.
Hadoop Application Overview
yarn-root-nodemanager-load-5.log
2019-12-31 12:04:45,264 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1576228338940_0013_000001 (auth:SIMPLE)
2019-12-31 12:04:45,267 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1576228338940_0013_01_000001 by user root
2019-12-31 12:04:45,267 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1576228338940_0013
2019-12-31 12:04:45,267 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root IP=10.32.193.39 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1576228338940_0013 CONTAINERID=container_1576228338940_0013_01_000001
2019-12-31 12:04:45,268 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1576228338940_0013 transitioned from NEW to INITING
2019-12-31 12:04:45,268 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Adding container_1576228338940_0013_01_000001 to application application_1576228338940_0013
2019-12-31 12:04:45,268 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1576228338940_0013 transitioned from INITING to RUNNING
2019-12-31 12:04:45,268 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1576228338940_0013_01_000001 transitioned from NEW to LOCALIZING
2019-12-31 12:04:45,268 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1576228338940_0013
2019-12-31 12:04:45,269 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://10.32.193.39:9000/user/root/examples/apps/java-main/lib/oozie-examples-4.3.1.jar transitioned from INIT to DOWNLOADING
2019-12-31 12:04:45,269 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://10.32.193.39:9000/user/root/share/lib/lib_20191231120005/oozie/oozie-sharelib-oozie-4.3.1.jar transitioned from INIT to DOWNLOADING
2019-12-31 12:04:45,269 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://10.32.193.39:9000/user/root/share/lib/lib_20191231120005/oozie/oozie-hadoop-utils-hadoop-2-4.3.1.jar transitioned from INIT to DOWNLOADING
2019-12-31 12:04:45,269 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://10.32.193.39:9000/user/root/share/lib/lib_20191231120005/oozie/json-simple-1.1.jar transitioned from INIT to DOWNLOADING
2019-12-31 12:04:45,269 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://10.32.193.39:9000/tmp/hadoop-yarn/staging/root/.staging/job_1576228338940_0013/job.splitmetainfo transitioned from INIT to DOWNLOADING
2019-12-31 12:04:45,269 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://10.32.193.39:9000/tmp/hadoop-yarn/staging/root/.staging/job_1576228338940_0013/job.split transitioned from INIT to DOWNLOADING
2019-12-31 12:04:45,269 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://10.32.193.39:9000/tmp/hadoop-yarn/staging/root/.staging/job_1576228338940_0013/job.xml transitioned from INIT to DOWNLOADING
2019-12-31 12:04:45,269 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://10.32.193.39:9000/user/root/examples/apps/java-main/lib/oozie-examples-4.3.1.jar, 1577773923766, FILE, null }
2019-12-31 12:04:45,278 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://10.32.193.39:9000/user/root/share/lib/lib_20191231120005/oozie/oozie-sharelib-oozie-4.3.1.jar, 1577773807011, FILE, null }
2019-12-31 12:04:45,288 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://10.32.193.39:9000/user/root/share/lib/lib_20191231120005/oozie/oozie-hadoop-utils-hadoop-2-4.3.1.jar, 1577773807007, FILE, null }
2019-12-31 12:04:45,297 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://10.32.193.39:9000/user/root/share/lib/lib_20191231120005/oozie/json-simple-1.1.jar, 1577773807004, FILE, null }
2019-12-31 12:04:45,304 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download resource { { hdfs://10.32.193.39:9000/user/root/examples/apps/java-main/lib/oozie-examples-4.3.1.jar, 1577773923766, FILE, null },pending,[(container_1576228338940_0013_01_000001)],14920685961731101,DOWNLOADING}
java.lang.NoClassDefFoundError: Could not initialize class java.net.NetworkInterface
at org.apache.hadoop.net.NetUtils.isLocalAddress(NetUtils.java:691)
at org.apache.hadoop.hdfs.DFSClient.isLocalAddress(DFSClient.java:1079)
at org.apache.hadoop.hdfs.RemoteBlockReader2.<init>(RemoteBlockReader2.java:296)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:441)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-12-31 12:04:45,304 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://10.32.193.39:9000/user/root/examples/apps/java-main/lib/oozie-examples-4.3.1.jar(->/tmp/hadoop-root/nm-local-dir/filecache/106/oozie-examples-4.3.1.jar) transitioned from DOWNLOADING to FAILED
2019-12-31 12:04:45,306 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download resource { { hdfs://10.32.193.39:9000/user/root/share/lib/lib_20191231120005/oozie/oozie-sharelib-oozie-4.3.1.jar, 1577773807011, FILE, null },pending,[(container_1576228338940_0013_01_000001)],14920685961866685,DOWNLOADING}
java.lang.NoClassDefFoundError: Could not initialize class java.net.NetworkInterface
at org.apache.hadoop.net.NetUtils.isLocalAddress(NetUtils.java:691)
at org.apache.hadoop.hdfs.DFSClient.isLocalAddress(DFSClient.java:1079)
at org.apache.hadoop.hdfs.RemoteBlockReader2.<init>(RemoteBlockReader2.java:296)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:441)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
yarn-root-resourcemanager-load-5.log
appattempt_1576228338940_0013_000002 with final state: FAILED, and exit status: -1000
2019-12-31 12:06:15,456 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1576228338940_0013_000002 State change from LAUNCHED to FINAL_SAVING
2019-12-31 12:06:15,456 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1576228338940_0013_000002
2019-12-31 12:06:15,456 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished, removing password for appattempt_1576228338940_0013_000002
2019-12-31 12:06:15,456 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1576228338940_0013_000002 State change from FINAL_SAVING to FAILED
2019-12-31 12:06:15,456 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number of failed attempts is 2. The max attempts is 2
2019-12-31 12:06:15,456 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1576228338940_0013 with final state: FAILED
2019-12-31 12:06:15,457 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1576228338940_0013 State change from ACCEPTED to FINAL_SAVING
2019-12-31 12:06:15,457 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1576228338940_0013
2019-12-31 12:06:15,457 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Attempt appattempt_1576228338940_0013_000002 is done. finalState=FAILED
2019-12-31 12:06:15,457 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1576228338940_0013 failed 2 times due to AM Container for appattempt_1576228338940_0013_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://f091403isdpbato05:8088/cluster/app/application_1576228338940_0013Then, click on links to logs of each attempt.
Diagnostics: java.lang.NoClassDefFoundError: Could not initialize class java.net.NetworkInterface
Failing this attempt. Failing the application.
2019-12-31 12:06:15,457 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1576228338940_0013 State change from FINAL_SAVING to FAILED
2019-12-31 12:06:15,457 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1576228338940_0013 requests cleared
2019-12-31 12:06:15,457 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1576228338940_0013 failed 2 times due to AM Container for appattempt_1576228338940_0013_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://f091403isdpbato05:8088/cluster/app/application_1576228338940_0013Then, click on links to logs of each attempt.
Diagnostics: java.lang.NoClassDefFoundError: Could not initialize class java.net.NetworkInterface
Failing this attempt. Failing the application. APPID=application_1576228338940_0013
2019-12-31 12:06:15,457 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application removed - appId: application_1576228338940_0013 user: root queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2019-12-31 12:06:15,457 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1576228338940_0013 user: root leaf-queue of parent: root #applications: 0
2019-12-31 12:06:15,457 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1576228338940_0013,name=oozie:launcher:T\=java:W\=java-main-wf:A\=java-node:ID\=0000000-191231120255907-oozie-root-W,user=root,queue=default,state=FAILED,trackingUrl=http://f091403isdpbato05:8088/cluster/app/application_1576228338940_0013,appMasterHost=N/A,startTime=1577774055215,finishTime=1577774175456,finalStatus=FAILED,memorySeconds=58978,vcoreSeconds=28,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=MAPREDUCE
I have a Spark job that keeps failing with exit code 1, and I am not able to figure out what this particular exit code means or why the application is returning it. This is what I see in the NodeManager logs:
2017-07-10 07:54:03,839 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1499673023544_0001_01_000001 and exit code: 1
ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1499673023544_0001_01_000001
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 1
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=1:
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:456)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.lang.Thread.run(Thread.java:745)
2017-07-10 07:54:03,844 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1
2017-07-10 07:54:03,846 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1499673023544_0001_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2017-07-10 07:54:03,846 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1499673023544_0001_01_000001
When I checked the logs for the particular application (and container), they didn't show any specific stack trace or error message. This is what I see in the container's log (stderr) when the job terminates:
INFO impl.ContainerManagementProtocolProxy: Opening proxy : myplayground:52311
17/07/10 07:54:02 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. myplayground:36322
17/07/10 07:54:03 INFO cluster.YarnClusterSchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor#myplayground:49562/user/Executor#509101946]) with ID 1
17/07/10 07:54:03 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
17/07/10 07:54:03 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
17/07/10 07:54:03 ERROR yarn.ApplicationMaster: User application exited with status 1
17/07/10 07:54:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
17/07/10 07:54:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
17/07/10 07:54:03 INFO ui.SparkUI: Stopped Spark web UI at http://x.x.x.x:37961
17/07/10 07:54:03 INFO scheduler.DAGScheduler: Stopping DAGScheduler
17/07/10 07:54:03 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
17/07/10 07:54:03 INFO cluster.YarnClusterSchedulerBackend: Asking each executor to shut down
17/07/10 07:54:03 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/07/10 07:54:03 INFO storage.MemoryStore: MemoryStore cleared
17/07/10 07:54:03 INFO storage.BlockManager: BlockManager stopped
17/07/10 07:54:03 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/07/10 07:54:03 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/07/10 07:54:03 INFO spark.SparkContext: Successfully stopped SparkContext
17/07/10 07:54:03 INFO util.ShutdownHookManager: Shutdown hook called
17/07/10 07:54:03 INFO util.ShutdownHookManager: Deleting directory /tmp/Hadoop-hadoop/nm-local-dir/usercache/myprdusr/appcache/application_1499673023544_0001/spark-2adeda9f-9244-4519-b87f-ec895a50cfcd
17/07/10 07:54:03 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/07/10 07:54:03 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
So, in both logs, all I can see is that the application exited with exit code 1. Can anyone tell me what this particular error code means and the possible reasons why YARN is throwing this exception?
I was finally able to fix the problem. My bash script that calls spark-submit was passing an invalid argument to it. When a job starts, a script called launch_container.sh executes org.apache.spark.deploy.yarn.ApplicationMaster with the arguments passed to spark-submit, and the ApplicationMaster returns exit code 1 when any of its arguments is invalid.
More information here
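For anyone hitting the same thing, this is roughly how I tracked down the offending argument. It is only a sketch; the local-dir path is an assumption based on the default yarn.nodemanager.local-dirs setting.
# aggregated AM stderr/stdout (only if log aggregation is enabled)
yarn logs -applicationId application_1499673023544_0001 | less
# or, on the NodeManager host while the failing container is still on disk,
# look at the command YARN actually generated; launch_container.sh shows the
# full ApplicationMaster invocation, including every argument the wrapper
# script passed to spark-submit
find /tmp/hadoop-*/nm-local-dir -name launch_container.sh 2>/dev/null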
My Spark version is 1.6.2 and it runs on YARN.
The log of the driver container reports SUCCEEDED, as follows:
16/11/17 17:25:56 INFO ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
16/11/17 17:25:56 INFO ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
However, the log of one executor container records an ERROR:
16/11/17 17:25:56 WARN CoarseGrainedExecutorBackend: An unknown (xxx-xxx-xxx-xxx:xxxx) driver disconnected.
16/11/17 17:25:56 ERROR CoarseGrainedExecutorBackend: Driver xxx-xxx-xxx-xxx:xxxx disassociated! Shutting down.
I think the job succeeded, because the output results are as expected.
But I want to know why the error was thrown and whether the job really succeeded.
I found more info in the log of the YARN NodeManager:
WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_xxxxxxxxx and exit code: 1
ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Has anyone encountered the same issue?
Thanks.
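One way to double-check whether YARN itself considered the job successful is to ask for the application's final status; this assumes the YARN CLI is available on the cluster, and the application ID below is a placeholder.
# the AM-reported final status is what YARN records for the application;
# an executor logging "Driver disassociated" during shutdown does not by
# itself change that recorded status
yarn application -status application_XXXXXXXXXXXXX_XXXX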
Hi, I have installed Cloudera Manager. While running my benchmarking tests, the MapReduce tasks are failing with the following error message:
16/04/01 12:42:40 INFO mapreduce.Job: Job job_1459494626924_0001 running in uber mode : false
16/04/01 12:42:40 INFO mapreduce.Job: map 0% reduce 0%
16/04/01 12:42:54 INFO mapreduce.Job: map 16% reduce 0%
16/04/01 12:42:55 INFO mapreduce.Job: map 29% reduce 0%
16/04/01 12:42:56 INFO mapreduce.Job: map 75% reduce 0%
16/04/01 12:42:57 INFO mapreduce.Job: map 83% reduce 0%
16/04/01 12:42:59 INFO mapreduce.Job: map 100% reduce 0%
16/04/01 12:43:01 INFO mapreduce.Job: Task Id : attempt_1459494626924_0001_r_000000_0, Status : FAILED
Exception from container-launch.
Container id: container_1459494626924_0001_01_000010
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
16/04/01 12:43:01 INFO mapreduce.Job: Task Id : attempt_1459494626924_0001_r_000005_0, Status : FAILED
Exception from container-launch.
Container id: container_1459494626924_0001_01_000015
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
16/04/01 12:43:01 INFO mapreduce.Job: Task Id : attempt_1459494626924_0001_r_000001_0, Status : FAILED
Exception from container-launch.
Container id: container_1459494626924_0001_01_000011
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
16/04/01 12:43:01 INFO mapreduce.Job: Task Id : attempt_1459494626924_0001_r_000006_0, Status : FAILED
Exception from container-launch.
Container id: container_1459494626924_0001_01_000016
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
16/04/01 12:43:01 INFO mapreduce.Job: Task Id : attempt_1459494626924_0001_r_000004_0, Status : FAILED
Exception from container-launch.
Container id: container_1459494626924_0001_01_000014
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
What could be the issue that needs to be checked? I have done YARN tuning using the Excel sheet provided by Cloudera, and I have also tried to adjust the NameNode's vcores and memory.
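Since the job output above only reports "exited with a non-zero exit code 1", the real failure reason is in the stderr/syslog of the failed reduce containers. A minimal sketch for pulling them, assuming log aggregation is enabled (yarn.log-aggregation-enable=true); if it is not, the same files live under the NodeManager's local yarn.nodemanager.log-dirs directory:
# Aggregated container logs for the failed application (run as the job's user)
yarn logs -applicationId application_1459494626924_0001 | less
# With aggregation disabled, look on the NodeManager that ran the container, e.g.
#   <yarn.nodemanager.log-dirs>/application_1459494626924_0001/container_1459494626924_0001_01_000010/stderr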
In the log file I can see the following error messages:
2016-03-30 22:04:27,404 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn IP=10.195.48.127 OPERATION=refreshNodes TARGET=AdminService RESULT=SUCCESS
2016-03-30 17:38:24,133 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 15: SIGTERM
2016-03-30 17:38:24,148 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2016-03-30 17:38:24,151 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@slave5:8088
2016-03-30 17:38:24,151 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2016-03-30 17:38:24,152 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2016-03-30 17:38:24,254 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
2016-03-30 17:38:24,257 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8032
2016-03-30 17:38:24,257 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2016-03-30 17:38:24,257 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033
2016-03-30 17:38:24,260 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8033
2016-03-30 17:38:24,260 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2016-03-30 17:38:24,263 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type STATUS_UPDATE for node slave2:8041
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException
at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:247)
at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:778)
at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:736)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:418)
at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:79)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:866)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:850)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:174)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:242)
... 13 more
2016-03-30 17:38:24,283 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state
2016-03-30 17:38:24,283 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager metrics system...
2016-03-30 17:38:24,284 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system stopped.
2016-03-30 17:38:24,285 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete.
2016-03-30 17:38:24,285 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher is draining to stop, igonring any new events.
Could anybody suggest what the problem is?
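The only ERROR lines above show the ResourceManager receiving SIGNAL 15 (SIGTERM), which means the process was stopped from outside (an operator or Cloudera Manager restart, or the kernel OOM killer) rather than crashing on its own. A hedged way to check the two usual suspects (the log directory below is the CDH default and may differ on your install):
# Did the kernel OOM killer take the ResourceManager down?
dmesg | grep -iE 'out of memory|killed process'
# What happened on this host around 2016-03-30 17:38? Check the newest YARN logs
ls -lt /var/log/hadoop-yarn/ | head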
I submitted a job to a cluster running Hadoop 2.7.1. 'jps' looks okay on the Master and the Slaves, and "hdfs dfsadmin -report" works fine, but when I run any grep or wordcount example it fails. Even with a small input file, the job sits for half an hour to an hour and then fails with the following errors:
15/12/09 08:42:55 INFO impl.YarnClientImpl: Submitted application application_1449645631518_0003
15/12/09 08:42:55 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1449645631518_0003/
15/12/09 08:42:55 INFO mapreduce.Job: Running job: job_1449645631518_0003
15/12/09 09:07:12 INFO mapreduce.Job: Job job_1449645631518_0003 running in uber mode : false
15/12/09 09:07:12 INFO mapreduce.Job: map 0% reduce 0%
15/12/09 09:07:12 INFO mapreduce.Job: Job job_1449645631518_0003 failed with state FAILED due to: Application application_1449645631518_0003 failed 2 times due to ApplicationMaster for attempt appattempt_1449645631518_0003_000002 timed out. Failing the application.
15/12/09 09:07:12 INFO mapreduce.Job: Counters: 0
15/12/09 09:07:13 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/12/09 09:07:13 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1449645631518_0004
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://Master:9000/user/hadoop/grep-temp-105897268
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at org.apache.hadoop.examples.Grep.run(Grep.java:94)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.examples.Grep.main(Grep.java:103)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
This is the ResourceManager log:
2015-12-09 12:37:11,661 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number of failed attempts is 2. The max attempts is 2
2015-12-09 12:37:11,661 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1449645631518_0005 with final state: FAILED
2015-12-09 12:37:11,661 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1449645631518_0005
2015-12-09 12:37:11,661 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1449645631518_0005 State change from ACCEPTED to FINAL_SAVING
2015-12-09 12:37:11,661 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Attempt appattempt_1449645631518_0005_000002 is done. finalState=FAILED
2015-12-09 12:37:11,662 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1449645631518_0005_02_000001 Container Transitioned from RUNNING to KILLED
2015-12-09 12:37:11,662 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_1449645631518_0005_02_000001 in state: KILLED event:KILL
2015-12-09 12:37:11,662 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1449645631518_0005 CONTAINERID=container_1449645631518_0005_02_000001
2015-12-09 12:37:11,662 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released container container_1449645631518_0005_02_000001 of capacity <memory:2048, vCores:1> on host Slave2:48352, which currently has 0 containers, <memory:0, vCores:0> used and <memory:8192, vCores:8> available, release resources=true
2015-12-09 12:37:11,662 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default used=<memory:0, vCores:0> numContainers=0 user=hadoop user-resources=<memory:0, vCores:0>
2015-12-09 12:37:11,662 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1449645631518_0005_02_000001, NodeId: Slave2:48352, NodeHttpAddress: Slave2:8042, Resource: <memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 11.11.1.3:48352 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=<memory:16384, vCores:16>
2015-12-09 12:37:11,663 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:16384, vCores:16>
2015-12-09 12:37:11,663 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0
2015-12-09 12:37:11,663 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1449645631518_0005_000002 released container container_1449645631518_0005_02_000001 on node: host: Slave2:48352 #containers=0 available=<memory:8192, vCores:8> used=<memory:0, vCores:0> with event: KILL
2015-12-09 12:37:11,663 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1449645631518_0005 requests cleared
2015-12-09 12:37:11,663 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application removed - appId: application_1449645631518_0005 user: hadoop queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2015-12-09 12:37:11,663 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1449645631518_0005 failed 2 times due to ApplicationMaster for attempt appattempt_1449645631518_0005_000002 timed out. Failing the application.
2015-12-09 12:37:11,667 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1449645631518_0005 State change from FINAL_SAVING to FAILED
2015-12-09 12:37:11,667 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1449645631518_0005 user: hadoop leaf-queue of parent: root #applications: 0
2015-12-09 12:37:11,667 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1449645631518_0005 failed 2 times due to ApplicationMaster for attempt appattempt_1449645631518_0005_000002 timed out. Failing the application. APPID=application_1449645631518_0005
2015-12-09 12:37:11,668 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1449645631518_0005,name=grep-search,user=hadoop,queue=default,state=FAILED,trackingUrl=http://Master:8088/cluster/app/application_1449645631518_0005,appMasterHost=N/A,startTime=1449663079331,finishTime=1449664631661,finalStatus=FAILED,memorySeconds=3177991,vcoreSeconds=1550,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=MAPREDUCE
2015-12-09 12:37:11,668 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning master appattempt_1449645631518_0005_000002
2015-12-09 12:37:12,366 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 6
2015-12-09 12:37:12,710 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Null container completed...
2015-12-09 12:37:12,711 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Null container completed...
What is wrong with it? It has been troubling me for several days. Thank you very much for helping me!
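One thing worth double-checking for the "ApplicationMaster ... timed out" failures: the client log shows "Connecting to ResourceManager at /0.0.0.0:8032", which usually means yarn.resourcemanager.hostname is not set, so an ApplicationMaster launched on a slave may try to register with the ResourceManager at 0.0.0.0 and never succeed. A hedged yarn-site.xml sketch to add on every node, assuming the ResourceManager really runs on the host named Master and that name resolves correctly from the slaves:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>
After restarting YARN, running "yarn node -list" from a slave should show the NodeManagers registered with the ResourceManager; if the AM still times out, also verify that /etc/hosts on every node maps Master to its real IP rather than 127.0.0.1.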