Yarn can't connect to Hadoop HDFS?

I am running one of the examples (pi) that ships with Hadoop. The job never makes progress; it looks as though it gets no response, possibly because of a problem connecting to HDFS.
yarn jar hadoop/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 10 100
16/07/27 06:32:38 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/07/27 06:32:38 INFO input.FileInputFormat: Total input paths to process : 10
16/07/27 06:32:38 INFO mapreduce.JobSubmitter: number of splits:10
16/07/27 06:32:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1469626018898_0001
16/07/27 06:32:39 INFO impl.YarnClientImpl: Submitted application application_1469626018898_0001
16/07/27 06:32:39 INFO mapreduce.Job: The url to track the job: http://IP_ADDRESS/proxy/application_14696260188001/
16/07/27 06:32:39 INFO mapreduce.Job: Running job: job_1469626018898_0001
I ran telnet IP_ADDRESS 9000 and the connection was successful.
I have already set up hdfs-site.xml with the following (to listen on both private and public addresses):
<property>
<name>dfs.namenode.rpc-bind-host</name>
<value>0.0.0.0</value>
</property>
And core-site.xml is set up with:
<property>
<name>fs.defaultFS</name>
<value>hdfs://IP_ADDRESS:9000</value>
</property>
Any ideas why the YARN job looks like it is not reaching the HDFS service and therefore not completing?
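The telnet test above covers only the NameNode port. A minimal Python sketch of the same kind of check, extended to more than one endpoint, might look like the following. The function names are hypothetical, and the ports (9000 for the NameNode RPC address, 8032 for the ResourceManager) are simply the ones that appear in the question. A successful TCP connect shows only that something is listening, not that the job will run.

```python
import socket
from typing import Dict, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_endpoints(
    endpoints: Dict[str, Tuple[str, int]], timeout: float = 3.0
) -> Dict[str, bool]:
    """Map each label to whether its (host, port) accepted a TCP connection."""
    return {
        label: can_connect(host, port, timeout)
        for label, (host, port) in endpoints.items()
    }


# Example call (replace IP_ADDRESS with the real address from the question):
# check_endpoints({"namenode": ("IP_ADDRESS", 9000),
#                  "resourcemanager": ("IP_ADDRESS", 8032)})
```

Running the check from the machine that submits the job and from each worker node can show whether an endpoint is reachable from one side only.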

Related

Mapreduce job ipc.Client retrying to connect

I am testing my hadoop cluster which consists of 4 docker containers:
Datanode
Secondary Namenode
Namenode
Resource Manager
When I submit a MapReduce job I notice connection issues once both map and reduce are at 100%. The client then reaches the maximum number of retries before erroring and printing a stack trace. The weird thing is that the job finishes and provides an answer. However, the node manager web interface shows a failed job. None of the questions and answers I have found so far fix my particular issue.
All my machines have exposed the port range 50100:50200 to comply with the 'yarn.app.mapreduce.am.job.client.port-range' property.
The job I submit is
sudo -u hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.1.jar pi 1 1
This is the output:
Number of Maps = 1
Samples per Map = 1
Wrote input for Map #0
Starting Job
16/06/18 19:14:07 INFO client.RMProxy: Connecting to ResourceManager at resource-manager/172.19.0.2:8032
16/06/18 19:14:08 INFO input.FileInputFormat: Total input paths to process : 1
16/06/18 19:14:08 INFO mapreduce.JobSubmitter: number of splits:1
16/06/18 19:14:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1466277178029_0001
16/06/18 19:14:08 INFO impl.YarnClientImpl: Submitted application application_1466277178029_0001
16/06/18 19:14:08 INFO mapreduce.Job: The url to track the job: http://resource-manager:8088/proxy/application_1466277178029_0001/
16/06/18 19:14:08 INFO mapreduce.Job: Running job: job_1466277178029_0001
16/06/18 19:14:15 INFO mapreduce.Job: Job job_1466277178029_0001 running in uber mode : false
16/06/18 19:14:15 INFO mapreduce.Job: map 0% reduce 0%
16/06/18 19:14:19 INFO mapreduce.Job: map 100% reduce 0%
16/06/18 19:14:26 INFO mapreduce.Job: map 100% reduce 100%
16/06/18 19:14:32 INFO ipc.Client: Retrying connect to server: 01d3c03f829a/172.19.0.4:50100. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/06/18 19:14:33 INFO ipc.Client: Retrying connect to server: 01d3c03f829a/172.19.0.4:50100. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/06/18 19:14:34 INFO ipc.Client: Retrying connect to server: 01d3c03f829a/172.19.0.4:50100. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/06/18 19:14:36 INFO mapreduce.Job: map 0% reduce 0%
16/06/18 19:14:36 INFO mapreduce.Job: Job job_1466277178029_0001 failed with state FAILED due to: Application application_1466277178029_0001 failed 2 times due to AM Container for appattempt_1466277178029_0001_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://resource-manager:8088/proxy/application_1466277178029_0001/AThen, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1466277178029_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
16/06/18 19:14:36 INFO mapreduce.Job: Counters: 0
Job Finished in 28.862 seconds
Estimated value of Pi is 4.00000000000000000000
The container log has the following:
2016-06-18 19:14:32,273 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1466277178029_0001_000002
2016-06-18 19:14:32,443 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-06-18 19:14:32,475 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2016-06-18 19:14:32,477 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@3514a4c0)
2016-06-18 19:14:32,515 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using mapred newApiCommitter.
2016-06-18 19:14:33,060 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Attempt num: 2 is last retry: true because a commit was started.
2016-06-18 19:14:33,061 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$NoopEventHandler
2016-06-18 19:14:33,067 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
2016-06-18 19:14:33,068 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
2016-06-18 19:14:33,118 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system is set solely by core-default.xml therefore - ignoring
2016-06-18 19:14:33,141 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system is set solely by core-default.xml therefore - ignoring
2016-06-18 19:14:33,162 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system is set solely by core-default.xml therefore - ignoring
2016-06-18 19:14:33,183 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Emitting job history data to the timeline server is not enabled
2016-06-18 19:14:33,185 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Will not try to recover. recoveryEnabled: true recoverySupportedByCommitter: false numReduceTasks: 1 shuffleKeyValidForRecovery: true ApplicationAttemptID: 2
2016-06-18 19:14:33,210 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system is set solely by core-default.xml therefore - ignoring
2016-06-18 19:14:33,212 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at hdfs://namenode:9000/user/hdfs/.staging/job_1466277178029_0001/job_1466277178029_0001_1.jhist
2016-06-18 19:14:33,621 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
2016-06-18 19:14:33,640 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-mrappmaster.properties,hadoop-metrics2.properties
2016-06-18 19:14:33,689 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-06-18 19:14:33,689 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system started
2016-06-18 19:14:33,708 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
2016-06-18 19:14:33,708 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3
2016-06-18 19:14:33,708 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33
2016-06-18 19:14:33,739 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at resource-manager/172.19.0.2:8030
2016-06-18 19:14:33,814 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: maxContainerCapability: <memory:4096, vCores:4>
2016-06-18 19:14:33,814 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: queue: root.hdfs
2016-06-18 19:14:33,837 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system is set solely by core-default.xml therefore - ignoring
2016-06-18 19:14:33,840 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryCopyService: History file is at hdfs://namenode:9000/user/hdfs/.staging/job_1466277178029_0001/job_1466277178029_0001_1.jhist
2016-06-18 19:14:33,894 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1466277178029_0001, File: hdfs://namenode:9000/user/hdfs/.staging/job_1466277178029_0001/job_1466277178029_0001_2.jhist
2016-06-18 19:14:33,959 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Was asked to shut down.
2016-06-18 19:14:33,959 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.io.IOException: Was asked to shut down.
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1546)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1540)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1473)
2016-06-18 19:14:33,962 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
A few times it says 'Cannot locate configuration' or 'Default file system is set solely by core-default.xml'. Is this significant? In case this changes anything I am using the cloudera repo to install various hadoop services instead of unpacking a .tar.gz.
My config files are:
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:9000</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.hosts</name>
<value>*</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>resource-manager</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>resource-manager:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>resource-manager:8030</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///data/1/yarn/local,file:///data/2/yarn/local,file:///data/3/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:///data/1/yarn/logs,file:///data/2/yarn/logs,file:///data/3/yarn/logs</value>
</property>
<property>
<name>yarn.log.aggregation-enable</name>
<value>true</value>
</property>
<property>
<description>Where to aggregate logs</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://namenode:8020/var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>resource-manager:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>resource-manager:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>resource-manager:8033</value>
</property>
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>600</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
<description>Amount of physical memory, in MB, that can be allocated for containers.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1000</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>namenode:8021</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>history-server:10020</value>
<description>Enter your JobHistoryServer hostname.</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>history-server:19888</value>
<description>Enter your JobHistoryServer hostname.</description>
</property>
<property>
<name>yarn.app.mapreduce.am.job.client.port-range</name>
<value>50100-50200</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.name.dir or dfs.namenode.name.dir</name>
<value>file:///data/1/dfs/nn,file:///nfsmount/dfs/nn</value>
</property>
<property>
<name>dfs.data.dir or dfs.datanode.data.dir</name>
<value>file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>namenode:50070</value>
<description>
The address and the base port on which the dfs NameNode Web UI will listen.
</description>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
Thanks for reading.
For anyone who has the same issue, the solution is to add the following to hdfs-site.xml:
<property>
<name>dfs.safemode.threshold.pct</name>
<value>0</value>
</property>
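Separately from the fix above, the configuration files in the question can be cross-checked mechanically. The sketch below, which uses hypothetical helper names and only Python's standard library, parses Hadoop-style <property> entries and lists any hdfs:// value whose host:port differs from fs.defaultFS. In the posted files, fs.defaultFS is hdfs://namenode:9000 while yarn.nodemanager.remote-app-log-dir points at hdfs://namenode:8020; whether that difference contributed to the failure is not established by the question or the answer.

```python
import xml.etree.ElementTree as ET
from typing import Dict
from urllib.parse import urlparse


def load_properties(xml_text: str) -> Dict[str, str]:
    """Read <property><name>/<value> pairs from a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def mismatched_hdfs_uris(default_fs: str, props: Dict[str, str]) -> Dict[str, str]:
    """Return properties whose hdfs:// host:port differs from default_fs."""
    expected = urlparse(default_fs).netloc
    mismatches = {}
    for name, value in props.items():
        parsed = urlparse(value)
        # Only fully qualified hdfs:// URIs are compared; other values are skipped.
        if parsed.scheme == "hdfs" and parsed.netloc and parsed.netloc != expected:
            mismatches[name] = value
    return mismatches
```

Loading core-site.xml and yarn-site.xml with load_properties and passing the fs.defaultFS value plus the yarn-site properties to mismatched_hdfs_uris would report the remote-app-log-dir entry for the files shown in the question.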

Hadoop 2.7.0 - MapReduce Jobs not Running - Failing with AM Container Error

I am using Hadoop 2.7.0 in pseudo-distributed mode on a Fedora 22 virtual machine. A few days ago MapReduce jobs ran fine, but after I installed Oozie and modified yarn-site.xml, I get the error below when running the Pi example job, and I have not been able to debug it.
EDITED - I am running the job from the command line and NOT through the Oozie workflow engine. Command: hadoop jar 10 100
Starting Job
15/12/17 15:22:05 INFO client.RMProxy: Connecting to ResourceManager at /192.168.122.1:8032
15/12/17 15:22:06 INFO input.FileInputFormat: Total input paths to process : 10
15/12/17 15:22:06 INFO mapreduce.JobSubmitter: number of splits:10
15/12/17 15:22:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1450326099697_0001
15/12/17 15:22:07 INFO impl.YarnClientImpl: Submitted application application_1450326099697_0001
15/12/17 15:22:07 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1450326099697_0001/
15/12/17 15:22:07 INFO mapreduce.Job: Running job: job_1450326099697_0001
15/12/17 15:22:17 INFO mapreduce.Job: Job job_1450326099697_0001 running in uber mode : false
15/12/17 15:22:17 INFO mapreduce.Job: map 0% reduce 0%
15/12/17 15:22:24 INFO mapreduce.Job: map 10% reduce 0%
15/12/17 15:22:30 INFO mapreduce.Job: map 20% reduce 0%
15/12/17 15:22:36 INFO mapreduce.Job: map 30% reduce 0%
15/12/17 15:22:42 INFO mapreduce.Job: map 40% reduce 0%
15/12/17 15:22:46 INFO mapreduce.Job: map 50% reduce 0%
15/12/17 15:22:51 INFO mapreduce.Job: map 60% reduce 0%
15/12/17 15:22:56 INFO mapreduce.Job: map 70% reduce 0%
15/12/17 15:23:01 INFO mapreduce.Job: map 80% reduce 0%
15/12/17 15:23:07 INFO mapreduce.Job: map 90% reduce 0%
15/12/17 15:23:13 INFO mapreduce.Job: map 100% reduce 0%
15/12/17 15:23:18 INFO mapreduce.Job: map 100% reduce 100%
15/12/17 15:23:23 INFO ipc.Client: Retrying connect to server: vlan722-rsvd-router.ddr.priv/192.168.122.1:34460. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/12/17 15:23:24 INFO ipc.Client: Retrying connect to server: vlan722-rsvd-router.ddr.priv/192.168.122.1:34460. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/12/17 15:23:25 INFO ipc.Client: Retrying connect to server: vlan722-rsvd-router.ddr.priv/192.168.122.1:34460. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/12/17 15:23:28 INFO mapreduce.Job: map 0% reduce 0%
15/12/17 15:23:28 INFO mapreduce.Job: Job job_1450326099697_0001 failed with state FAILED due to: Application application_1450326099697_0001 failed 2 times due to AM Container for appattempt_1450326099697_0001_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://hadoop:8088/cluster/app/application_1450326099697_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1450326099697_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
15/12/17 15:23:28 INFO mapreduce.Job: Counters: 0
Job Finished in 82.924 seconds
Estimated value of Pi is 3.14800000000000000000
yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>
/home/osboxes/hadoop/etc/hadoop,
/home/osboxes/hadoop/share/hadoop/common/*,
/home/osboxes/hadoop/share/hadoop/common/lib/*,
/home/osboxes/hadoop/share/hadoop/hdfs/*,
/home/osboxes/hadoop/share/hadoop/hdfs/lib/*,
/home/osboxes/hadoop/share/hadoop/yarn/*,
/home/osboxes/hadoop/share/hadoop/yarn/lib/*,
/home/osboxes/hadoop/share/hadoop/mapreduce/*,
/home/osboxes/hadoop/share/hadoop/mapreduce/lib/*
</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>5120</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>http://192.168.122.1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>http://192.168.122.1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>http://192.168.122.1:8031</value>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>http://192.168.122.1:8041</value>
</property>
Any help on this would be very much appreciated.
EDIT - yarn-site.xml before
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Finally, I solved the issue by making the following change to mapred-site.xml:
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>http://localhost:19888</value>
</property>
After that the jobs ran perfectly fine.
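The ipc.Client retry lines in the console output above name the endpoint the client could not reach. A small sketch, assuming the log format shown in the question (the regular expression and function name are illustrative, not part of Hadoop), can pull out the host, IP address and port so they can be compared against the ports that are actually open:

```python
import re
from collections import Counter

# Matches the start of lines such as:
#   ... INFO ipc.Client: Retrying connect to server: host/1.2.3.4:34460. Already tried 0 time(s); ...
RETRY_RE = re.compile(
    r"ipc\.Client: Retrying connect to server: "
    r"(?P<host>[^/\s]*)/(?P<ip>[\d.]+):(?P<port>\d+)\. "
    r"Already tried (?P<tried>\d+) time\(s\)"
)


def unreachable_endpoints(log_text: str) -> Counter:
    """Count retry messages per (host, ip, port) found in client log text."""
    counts = Counter()
    for match in RETRY_RE.finditer(log_text):
        key = (match.group("host"), match.group("ip"), int(match.group("port")))
        counts[key] += 1
    return counts
```

Applied to the output in this question, the only key would be the host vlan722-rsvd-router.ddr.priv at 192.168.122.1, port 34460, which is the endpoint worth testing for reachability.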

hadoop distcp not working, MR job in accepted state

I am trying to copy data from a CDH4 cluster to a CDH5 cluster. When I submit the distcp job from CDH5, the MR job goes into the ACCEPTED state and stays there (I have tried multiple times; it stayed there for more than 15 hours). The data I want to copy is less than 10 MB.
Below is the setup and steps I am using.
Source: CDH4, e.g. NodeName = cloudera4
Destination: CDH5, e.g. NodeName = Cloudera1
Command used on CDH5:
hadoop distcp hftp://Cloudera4:50070/ hdfs://Cloudera1/
Below is the console output:
[root@Cloudera1-RD opt]# sudo -u hdfs hadoop distcp hftp://Cloudera4:50070/ hdfs://Cloudera1/
15/03/05 10:51:23 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hftp://Cloudera4:50070/], targetPath=hdfs://Cloudera1/, targetPathExists=true, preserveRawXattrs=false}
15/03/05 10:51:23 INFO client.RMProxy: Connecting to ResourceManager at Cloudera1:8032
15/03/05 10:51:27 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
15/03/05 10:51:27 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
15/03/05 10:51:28 INFO client.RMProxy: Connecting to ResourceManager at Cloudera1:8032
15/03/05 10:51:29 INFO mapreduce.JobSubmitter: number of splits:18
15/03/05 10:51:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1425491750932_0010
15/03/05 10:51:30 INFO impl.YarnClientImpl: Submitted application application_1425491750932_0010
15/03/05 10:51:30 INFO mapreduce.Job: The url to track the job: http://Cloudera1:8088/proxy/application_1425491750932_0010/
15/03/05 10:51:30 INFO tools.DistCp: DistCp job-id: job_1425491750932_0010
15/03/05 10:51:30 INFO mapreduce.Job: Running job: job_1425491750932_0010
This MR job stays in the ACCEPTED state forever.
I have been stuck on this for many days now.
I would really appreciate your help.
Do not run distcp as the hdfs user, which is blacklisted for MapReduce jobs by default.
Refer to the link and run distcp.
I solved it by using:
hdfs dfs -cp s3://<path> hdfs:///user/livy/

Why Mapreduce with YARN stuck on CDH 5.3?

MapReduce with YARN fails to move past map 0% reduce 0%. I am using Cloudera CDH on a Google Compute Engine high-memory instance (13 GB RAM); 8 GB of RAM is free on the machine. Can you please help me fix it?
sunny@hadoop-m:~$ hadoop jar /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/jars/hadoop-mapreduce-examples-2.5.0-cdh5.3.0.jar grep input output 'dfs[a-z.]+'
14/12/24 00:13:53 INFO client.RMProxy: Connecting to ResourceManager at hadoop-m.c.sunny-hadoop-trial.internal/10.240.253.233:8032
14/12/24 00:13:53 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
14/12/24 00:13:54 INFO input.FileInputFormat: Total input paths to process : 5
14/12/24 00:13:54 INFO mapreduce.JobSubmitter: number of splits:5
14/12/24 00:13:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1419360146634_0001
14/12/24 00:13:54 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/12/24 00:13:54 INFO impl.YarnClientImpl: Submitted application application_1419360146634_0001
14/12/24 00:13:55 INFO mapreduce.Job: The url to track the job: http://hadoop-m.c.sunny-hadoop-trial.internal:8088/proxy/application_1419360146634_0001/
14/12/24 00:13:55 INFO mapreduce.Job: Running job: job_1419360146634_0001
Resource Manager Output
Some more info about job
yarn-site.xml: http://pastebin.mozilla.org/8113782
mapred-site.xml: http://pastebin.mozilla.org/8113813
The server's IP address changed because of the DHCP service, so the client configuration for HDFS and YARN became stale. I needed to update the client configuration; I did it with Cloudera Manager and now the cluster is running fine.
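A configuration that hard-codes IP addresses is one way client settings can go stale after a DHCP change like the one described above. As a rough sketch (the helper name is hypothetical, and the input is assumed to be a dict of property names to values already read from the *-site.xml files, which this question does not show), the following lists properties whose values contain a literal IPv4 address. These are candidates to re-check after the address changes:

```python
import ipaddress
import re
from typing import Dict, List

# Dotted-quad candidates that are not part of a longer word or number.
IPV4_RE = re.compile(r"(?<![\w.])(?:\d{1,3}\.){3}\d{1,3}(?![\w.])")


def literal_ip_properties(props: Dict[str, str]) -> Dict[str, List[str]]:
    """Return {property name: [IPv4 literals]} for values that embed an IP."""
    found = {}
    for name, value in props.items():
        ips = []
        for token in IPV4_RE.findall(value):
            try:
                ipaddress.IPv4Address(token)  # rejects e.g. 999.1.1.1
            except ValueError:
                continue
            ips.append(token)
        if ips:
            found[name] = ips
    return found
```

Properties that use hostnames instead are not reported, since they follow DNS or /etc/hosts rather than a fixed address.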

Hadoop error in shuffle in fetcher: Exceeded MAX_FAILED_UNIQUE_FETCHES

I am new to Hadoop. I have a Kerberos-secured Hadoop cluster (master and 1 slave) set up on VirtualBox. I am trying to run the 'pi' job from the Hadoop examples. The job terminates with the error Exceeded MAX_FAILED_UNIQUE_FETCHES. I tried searching for this error, but the solutions given on the internet do not seem to work for me. Perhaps I am missing something obvious. I even tried removing the slave from the etc/hadoop/slaves file to see if the job can run on the master alone, but that fails with the same error. Below is the log. I am running this on a 64-bit Ubuntu 14.04 VirtualBox VM. Any help appreciated.
montauk@montauk-vmaster:/usr/local/hadoop$ sudo -u yarn bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar pi 2 10
Number of Maps = 2
Samples per Map = 10
OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/06/05 12:04:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/06/05 12:04:49 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.29:8040
14/06/05 12:04:50 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 17 for yarn on 192.168.0.29:54310
14/06/05 12:04:50 INFO security.TokenCache: Got dt for hdfs://192.168.0.29:54310; Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.0.29:54310, Ident: (HDFS_DELEGATION_TOKEN token 17 for yarn)
14/06/05 12:04:50 INFO input.FileInputFormat: Total input paths to process : 2
14/06/05 12:04:51 INFO mapreduce.JobSubmitter: number of splits:2
14/06/05 12:04:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1401975262053_0007
14/06/05 12:04:51 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.0.29:54310, Ident: (HDFS_DELEGATION_TOKEN token 17 for yarn)
14/06/05 12:04:53 INFO impl.YarnClientImpl: Submitted application application_1401975262053_0007
14/06/05 12:04:53 INFO mapreduce.Job: The url to track the job: http://montauk-vmaster:8088/proxy/application_1401975262053_0007/
14/06/05 12:04:53 INFO mapreduce.Job: Running job: job_1401975262053_0007
14/06/05 12:05:29 INFO mapreduce.Job: Job job_1401975262053_0007 running in uber mode : false
14/06/05 12:05:29 INFO mapreduce.Job: map 0% reduce 0%
14/06/05 12:06:04 INFO mapreduce.Job: map 50% reduce 0%
14/06/05 12:06:06 INFO mapreduce.Job: map 100% reduce 0%
14/06/05 12:06:34 INFO mapreduce.Job: map 100% reduce 100%
14/06/05 12:06:34 INFO mapreduce.Job: Task Id : attempt_1401975262053_0007_r_000000_0, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#4
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)
I came across the same problem when installing CDH 5.1.0 with Kerberos security from a tarball. The solutions found via Google point to insufficient memory, but I don't think that was my situation since my input is very small (52K).
After digging for several days, I found the root cause in this link.
To sum up, the solutions in that link are:
Add the following property to yarn-site.xml even though it is the default in yarn-default.xml:
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Remove the property yarn.nodemanager.local-dirs so that the default value under /tmp is used. Then run the following commands:
mkdir -p /tmp/hadoop-yarn/nm-local-dir
chown yarn:yarn /tmp/hadoop-yarn/nm-local-dir
The problem can be summarized as follows:
After setting the yarn.nodemanager.local-dirs property, the default for yarn.nodemanager.aux-services.mapreduce_shuffle.class in yarn-default.xml does not take effect.
I have not found the root cause either.
I had the same issue. I had a MapReduce job without a reducer, and I solved it using job.setNumReduceTasks(0);
Change the property below in yarn-site.xml and create the directory:
yarn.nodemanager.local-dirs
/tmp
mkdir -p /tmp/hadoop-yarn/nm-local-dir
chown yarn:yarn /tmp/hadoop-yarn/nm-local-dir
Tune the resource properties in mapred-site.xml:
mapreduce.reduce.shuffle.input.buffer.percent=0.50
mapreduce.reduce.shuffle.memory.limit.percent=0.2
mapreduce.reduce.shuffle.parallelcopies=4
Restart the ResourceManager and NodeManager on their respective nodes.
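The name=value pairs above have to be written as <property> elements in mapred-site.xml. A minimal sketch of that conversion, using only Python's standard library and a hypothetical helper name, is shown below. The values are the ones suggested in the answer and are not independently verified tuning advice.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def render_properties(props: Dict[str, str]) -> str:
    """Render a name -> value mapping as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Values copied from the answer above; kept as strings so "0.50" is preserved.
shuffle_settings = {
    "mapreduce.reduce.shuffle.input.buffer.percent": "0.50",
    "mapreduce.reduce.shuffle.memory.limit.percent": "0.2",
    "mapreduce.reduce.shuffle.parallelcopies": "4",
}

print(render_properties(shuffle_settings))
```

The generated <property> elements would be merged into the existing mapred-site.xml rather than replacing the file.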
