I have a one-master, one-worker cluster. I am upgrading from classic Hadoop to YARN. The ResourceManager and HistoryServer started successfully, but the NodeManager is not starting; it gives this error:
java.lang.NumberFormatException: For input string: "${nodemanager.resource.memory-mb}"
I have kept the same yarn-site.xml.template on both servers.
I have replaced ${nodemanager.resource.memory-mb} with 8192.
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>__RM_IP__</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>${nodemanager.resource.memory-mb}</value>
</property>
</configuration>
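For reference, a sketch of the property after the replacement (the value has to be a literal number here: Hadoop only expands ${...} references to properties that are defined elsewhere in the configuration, so an undefined placeholder reaches the NodeManager as a raw string and fails integer parsing):
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<!-- literal value; the undefined ${nodemanager.resource.memory-mb} placeholder is never expanded -->
<value>8192</value>
</property>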
I created an HA cluster with one DataNode, an active NameNode, a standby NameNode, and three JournalNodes.
When I put a file to HDFS I get this error:
put: Operation category READ is not supported in state standby
The put command:
./hadoop fs -put golnaz.txt /user/input
NameNode log:
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1407)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy15.rollEditLog(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:148)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:273)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:315)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
2016-09-15 02:07:23,961 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000, call org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol.rollEditLog from 103.41.177.161:45797 Call#11403 Retry#0: org.apache.hadoop.ipc.StandbyException: Operation category JOURNAL is not supported in state standby
2016-09-15 02:07:30,547 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000, call org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol.rollEditLog from 103.41.177.160:39200 Call#11404 Retry#0: org.apache.hadoop.ipc.StandbyException: Operation category JOURNAL is not supported in state standby
Error in the SecondaryNameNode log:
ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category JOURNAL is not supported in state standby
And this is hdfs-site.xml:
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/root/hadoopstorage/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>/root/hadoopstorage/name</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>ha-cluster</value>
</property>
<property>
<name>dfs.ha.namenodes.ha-cluster</name>
<value>NameNode,Standby</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ha-cluster.NameNode</name>
<value>103.41.177.161:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ha-cluster.Standby</name>
<value>103.41.177.162:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ha-cluster.NameNode</name>
<value>103.41.177.161:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.ha-cluster.Standby</name>
<value>103.41.177.162:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://103.41.177.161:8485;103.41.177.162:8485;103.41.177.160:8485/ha-cluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.ha-cluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>3000</value>
</property>
</configuration>
You didn't mention anything about automatic failover (ZKFC & ZooKeeper). Without it, HDFS won't fail over automatically.
You may try the following: check whether both of your NameNodes are in the standby state, by looking at the NameNode web consoles (or by using the getServiceState administrative command). If both are standby, manually trigger the transition with the -transitionToActive command and tail the NameNode logs at the same time. If the transition fails, update your post with the NameNode logs.
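For example, assuming the service IDs NameNode and Standby from dfs.ha.namenodes.ha-cluster in your configuration above:
hdfs haadmin -getServiceState NameNode
hdfs haadmin -getServiceState Standby
# if both report "standby", promote one manually and watch the NameNode logs while it happens
hdfs haadmin -transitionToActive NameNode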
I want to test out a cluster of a few computers, each with 2 cores and 256 MB of RAM. Following Cloudera's tutorial, I've tried instructing Hadoop 2.6.0 about my low-resource NodeManagers (Ubuntu 14.04). I have the following configuration:
mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoop-master:54311</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop-master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop-master:19888</value>
</property>
<property>
<name>mapred.task.profile</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>200</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>200</value>
</property>
<property>
<name>mapreduce.map.java.opts.max.heap</name>
<value>160</value>
</property>
<property>
<name>mapreduce.reduce.java.opts.max.heap</name>
<value>160</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value> org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>200</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>200</value>
</property>
<property>
<name>yarn.scheduler.increment-allocation-mb</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop-master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop-master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop-master:8050</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/app-logs</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///usr/local/hadoop/local</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>200</value>
</property>
</configuration>
But when I try to run a small pi estimation example, I get this error:
yarn jar hadoop-mapreduce-examples-2.6.0.jar pi 1 1
Number of Maps = 1
Samples per Map = 1
Wrote input for Map #0
Starting Job
16/01/28 19:23:24 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/10.0.3.100:8050
16/01/28 19:23:25 INFO input.FileInputFormat: Total input paths to process : 1
16/01/28 19:23:25 INFO mapreduce.JobSubmitter: number of splits:1
16/01/28 19:23:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1454008935455_0001
16/01/28 19:23:26 INFO impl.YarnClientImpl: Submitted application application_1454008935455_0001
16/01/28 19:23:26 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1454008935455_0001/
16/01/28 19:23:26 INFO mapreduce.Job: Running job: job_1454008935455_0001
16/01/28 19:23:34 INFO mapreduce.Job: Job job_1454008935455_0001 running in uber mode : false
16/01/28 19:23:34 INFO mapreduce.Job: map 0% reduce 0%
16/01/28 19:23:34 INFO mapreduce.Job: Job job_1454008935455_0001 failed with state FAILED due to: Application application_1454008935455_0001 failed 2 times due to AM Container for appattempt_1454008935455_0001_000002 exited with exitCode: -103
For more detailed output, check application tracking page:http://hadoop-master:8088/proxy/application_1454008935455_0001/Then, click on links to logs of each attempt.
Diagnostics: Container [pid=847,containerID=container_1454008935455_0001_02_000001] is running beyond virtual memory limits. Current usage: 210.8 MB of 200 MB physical memory used; 1.3 GB of 420.0 MB virtual memory used. Killing container.
Dump of the process-tree for container_1454008935455_0001_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 855 847 847 847 (java) 466 16 1410424832 53695 /usr/lib/jvm/java-7-openjdk-i386/jre/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/usr/local/hadoop/logs/userlogs/application_1454008935455_0001/container_1454008935455_0001_02_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster
|- 847 845 847 847 (bash) 0 0 5431296 276 /bin/bash -c /usr/lib/jvm/java-7-openjdk-i386/jre/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/usr/local/hadoop/logs/userlogs/application_1454008935455_0001/container_1454008935455_0001_02_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/usr/local/hadoop/logs/userlogs/application_1454008935455_0001/container_1454008935455_0001_02_000001/stdout 2>/usr/local/hadoop/logs/userlogs/application_1454008935455_0001/container_1454008935455_0001_02_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
16/01/28 19:23:34 INFO mapreduce.Job: Counters: 0
Job Finished in 9.962 seconds
java.io.FileNotFoundException: File does not exist: hdfs://hadoop-master:9000/user/hduser/QuasiMonteCarlo_1454009003268_765740795/out/reduce-out
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1750)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1774)
at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Is there an error in this configuration? Or maybe Hadoop isn't made for such low resources. I'm just doing this for learning purposes.
Yeah, you'll get into trouble with low resources. For testing purposes, disable the memory checks:
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
For yarn.scheduler.minimum-allocation-mb you might go even lower, because the actual reserved memory is allocated in increments of that value; i.e. if you set it to 100 and a container requests 101 MB, YARN will round the allocation up to 200 MB.
The vmem check is unreliable and IMHO should really be disabled in YARN by default.
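Alternatively, if you want to keep the checks enabled, you could raise the virtual memory ratio instead; the 420.0 MB limit in the diagnostics above is 200 MB times the default yarn.nodemanager.vmem-pmem-ratio of 2.1. A sketch (the value 10 is just an illustrative loose limit):
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<!-- virtual memory allowed per MB of physical container memory; the default is 2.1 -->
<value>10</value>
</property>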
I tried to run a simple word count as a MapReduce job. Everything works fine when run locally (all work done on the NameNode), but when I try to run it on a cluster using YARN (adding mapreduce.framework.name=yarn to mapred-site.xml) the job hangs.
I came across a similar problem here:
MapReduce jobs get stuck in Accepted state
Output from job:
*** START ***
15/12/25 17:52:50 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/12/25 17:52:51 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/12/25 17:52:51 INFO input.FileInputFormat: Total input paths to process : 5
15/12/25 17:52:52 INFO mapreduce.JobSubmitter: number of splits:5
15/12/25 17:52:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1451083949804_0001
15/12/25 17:52:53 INFO impl.YarnClientImpl: Submitted application application_1451083949804_0001
15/12/25 17:52:53 INFO mapreduce.Job: The url to track the job: http://hadoop-droplet:8088/proxy/application_1451083949804_0001/
15/12/25 17:52:53 INFO mapreduce.Job: Running job: job_1451083949804_0001
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.job.tracker</name>
<value>localhost:54311</value>
</property>
<!--
<property>
<name>mapreduce.job.tracker.reserved.physicalmemory.mb</name>
<value></value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1024</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>3000</value>
<source>mapred-site.xml</source>
</property> -->
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!--
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>3000</value>
<source>yarn-site.xml</source>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>500</value>
</property>
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>3000</value>
</property>
-->
</configuration>
// I left the commented-out options in; they did not solve the problem.
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
What could the problem be?
EDIT:
I tried this configuration (the commented-out options) on other machines: NameNode (8 GB RAM) + 2x DataNode (4 GB RAM). I get the same effect: the job hangs in the ACCEPTED state.
EDIT2:
Changed the configuration (thanks @Manjunath Ballur) to:
yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-droplet</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop-droplet:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop-droplet:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop-droplet:8030</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop-droplet:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop-droplet:8088</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$YARN_HOME/*,$YARN_HOME/lib/*
</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/data/1/yarn/local,/data/2/yarn/local,/data/3/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/data/1/yarn/logs,/data/2/yarn/logs,/data/3/yarn/logs</value>
</property>
<property>
<description>Where to aggregate logs</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>50</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>390</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>390</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>50</value>
</property>
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Xmx40m</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>50</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>50</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx40m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx40m</value>
</property>
</configuration>
Still not working.
Additional info: I see no nodes in the cluster overview (similar problem here: Slave nodes not in Yarn ResourceManager).
You should check the status of the NodeManagers in your cluster. If the NM nodes are short on disk space, the RM will mark them "unhealthy" and those NMs cannot allocate new containers.
1) Check the Unhealthy nodes: http://<active_RM>:8088/cluster/nodes/unhealthy
If the "health report" tab says "local-dirs are bad" then it means you need to cleanup some disk space from these nodes.
2) Check the DFS dfs.data.dir property in hdfs-site.xml. It points the location on local file system where hdfs data is stored.
3) Login to those machines and use df -h & hadoop fs - du -h commands to measure the space occupied.
4) Verify hadoop trash and delete it if it's blocking you.
hadoop fs -du -h /user/user_name/.Trash and hadoop fs -rm -r /user/user_name/.Trash/*
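You can also check node health from the command line (a quick sketch; the -all and -states options are available on recent 2.x releases):
# list all NodeManagers the RM knows about, including unhealthy ones
yarn node -list -all
# or show only the unhealthy ones
yarn node -list -states UNHEALTHY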
I feel you are getting your memory settings wrong.
To understand the tuning of YARN configuration, I found this to be a very good source: http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_ig_yarn_tuning.html
I followed the instructions given in this blog and was able to get my jobs running. You should alter your settings in proportion to the physical memory you have on your nodes.
Key things to remember are:
The values of mapreduce.map.memory.mb and mapreduce.reduce.memory.mb should be at least yarn.scheduler.minimum-allocation-mb.
The values of mapreduce.map.java.opts and mapreduce.reduce.java.opts should be around 0.8 times the corresponding mapreduce.map.memory.mb and mapreduce.reduce.memory.mb settings (in my case, 983 MB ~ 0.8 * 1228 MB).
Similarly, the value of yarn.app.mapreduce.am.command-opts should be about 0.8 times yarn.app.mapreduce.am.resource.mb.
Following are the settings I use and they work perfectly for me:
yarn-site.xml:
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1228</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>9830</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>9830</value>
</property>
mapred-site.xml:
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>1228</value>
</property>
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Xmx983m</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1228</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>1228</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx983m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx983m</value>
</property>
You can also refer to the answer here: Yarn container understanding and tuning
You can also add vCore settings if you want your container allocation to take CPU into account. But for this to work, you need to use the CapacityScheduler with the DominantResourceCalculator. See the discussion about this here: How are containers created based on vcores and memory in MapReduce2?
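For example, a sketch of the capacity-scheduler.xml change that switches the CapacityScheduler to the DominantResourceCalculator:
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<!-- account for both memory and vcores; the default DefaultResourceCalculator considers memory only -->
<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>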
This solved this error in my case:
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>100</value>
</property>
Check your hosts file on the master and slave nodes. I had exactly this problem. For example, my hosts file on the master node looked like this:
127.0.0.0 localhost
127.0.1.1 master-virtualbox
192.168.15.101 master
I changed it to the following:
192.168.15.101 master master-virtualbox localhost
So it worked.
These lines
<property>
<name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
<value>100</value>
</property>
in yarn-site.xml solved my problem, since the node will be marked as unhealthy when disk usage is >= 95%. The solution is mainly suitable for pseudo-distributed mode.
You have 512 MB of RAM on each instance, and all of your memory settings in yarn-site.xml and mapred-site.xml are between 500 MB and 3 GB. You will not be able to run anything on the cluster. Change everything to ~256 MB.
Also, your mapred-site.xml sets the framework to YARN, yet you still have a JobTracker address, which is not correct. On a multi-node cluster you need the ResourceManager-related parameters in yarn-site.xml (including the ResourceManager web address); without them, the nodes do not know where the ResourceManager is.
You need to revisit both of your XML files.
Anyway, that works for me. Thank you a lot, @KaP!
That's my yarn-site.xml:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>MacdeMacBook-Pro.local</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
That's my mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
The first thing to do is check the YARN ResourceManager logs. I searched the Internet about this problem for a very long time, but nobody told me how to find out what was really happening. Checking the ResourceManager logs is straightforward and simple, and I am confused why people ignore logs.
For me, there was an error in the log:
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=172.16.0.167/172.16.0.167:55622]
That's because I had switched Wi-Fi networks at my workplace, so my computer's IP address changed.
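If you are unsure where those logs live, with the default start-yarn.sh setup they end up under the Hadoop log directory; the exact file name includes your user and hostname:
# tail the ResourceManager log; the path assumes the default $HADOOP_HOME/logs location
tail -f $HADOOP_HOME/logs/yarn-*-resourcemanager-*.log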
Old question, but I ran into the same issue recently, and in my case it was due to manually setting the master to local in the code.
Please search for conf.setMaster("local[*]") and remove it.
Hope it helps.
I am trying to test my Hadoop installation by running a wordcount job. My problem is that the job gets stuck in the ACCEPTED state and seems to run forever. I am using Hadoop 2.3.0 and tried to fix the problem by following an answer to this question here, but it didn't work for me.
This is what I have:
C:\hadoop-2.3.0>yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar wordcount /data/test.txt /data/output
15/03/15 15:36:07 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/03/15 15:36:09 INFO input.FileInputFormat: Total input paths to process : 1
15/03/15 15:36:10 INFO mapreduce.JobSubmitter: number of splits:1
15/03/15 15:36:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1426430101974_0001
15/03/15 15:36:11 INFO impl.YarnClientImpl: Submitted application application_1426430101974_0001
15/03/15 15:36:11 INFO mapreduce.Job: The url to track the job: http://Agata-PC:8088/proxy/application_1426430101974_0001/
15/03/15 15:36:11 INFO mapreduce.Job: Running job: job_1426430101974_0001
This is my mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>127.0.0.1:9001</value>
</property>
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
<value>/user</value>
</property>
<property>
<name>mapreduce.history.server.http.address</name>
<value>127.0.0.1:51111</value>
<description>Http address of the history server</description>
<final>false</final>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Xmx768m</value>
</property>
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>1</value>
<description>The number of virtual cores required for each map task.</description>
</property>
<property>
<name>mapreduce.reduce.cpu.vcores</name>
<value>1</value>
<description>The number of virtual cores required for each reduce task.</description>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1024</value>
<description>Larger resource limit for maps.</description>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx768m</value>
<description>Heap-size for child jvms of maps.</description>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>1024</value>
<description>Larger resource limit for reduces.</description>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx768m</value>
<description>Heap-size for child jvms of reduces.</description>
</property>
</configuration>
And this is my yarn-site.xml:
<configuration>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>128</value>
<description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
<description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
<description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
<description>Physical memory, in MB, to be made available to running containers</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
<description>Number of CPU cores that can be allocated for containers.</description>
</property>
</configuration>
Any help is much appreciated.
Did you try restarting your Hadoop processes or cluster? There might be some jobs still running.
It may also help to look at the logs by following the job's tracking URL or by going through the Hadoop web UI.
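To see whether older applications are still occupying the cluster, and to clear them out, a quick sketch with the yarn CLI (the application id shown is just the one from your output):
# list applications the ResourceManager still considers active
yarn application -list
# kill a stuck application by its id
yarn application -kill application_1426430101974_0001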
Cheers.
I have run into a similar issue before; you might have an infinite loop in your mapper or reducer. Check whether your reducer is properly handling the Iterable.