Yarn container launch failed exception and mapred-site.xml configuration - hadoop

I have 7 nodes in my Hadoop cluster [8 GB RAM and 4 vCPUs on each node]: 1 NameNode + 6 DataNodes.
EDIT-1#ARNON: I followed the link, made the calculations according to the hardware configuration of my nodes, and added the updated mapred-site.xml and yarn-site.xml files to my question. My application is still crashing with the same exception.
My MapReduce application has 34 input splits with a block size of 128 MB.
mapred-site.xml has the following properties:
mapreduce.framework.name = yarn
mapred.child.java.opts = -Xmx2048m
mapreduce.map.memory.mb = 4096
mapreduce.map.java.opts = -Xmx2048m
yarn-site.xml has the following properties:
yarn.resourcemanager.hostname = hadoop-master
yarn.nodemanager.aux-services = mapreduce_shuffle
yarn.nodemanager.resource.memory-mb = 6144
yarn.scheduler.minimum-allocation-mb = 2048
yarn.scheduler.maximum-allocation-mb = 6144
EDIT-2#ARNON: Setting yarn.scheduler.minimum-allocation-mb to 4096 puts all the map tasks in a suspended state, and setting it to 3072 crashes with the following:
Exception from container-launch: ExitCodeException exitCode=134: /bin/bash: line 1: 3876 Aborted (core dumped) /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx8192m -Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1424264025191_0002/container_1424264025191_0002_01_000011/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_000011
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 192.168.0.12 50842 attempt_1424264025191_0002_m_000005_0 11 >
/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_000011/stdout 2>
/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_000011/stderr
How can I avoid this? Any help is appreciated.
Is there an option to restrict the number of containers on Hadoop nodes?

It seems you are allocating too much memory to your tasks (even without looking at all the configurations): 8 GB of RAM per node and 8 GB per map task, all of it heap.
Try lower allocations, for example 2 GB containers with a 1 GB heap, or something like that.
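A minimal sketch of what that could look like in mapred-site.xml, assuming the 2 GB container / 1 GB heap suggestion above (the values are illustrative and should be tuned to your workload):
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value> <!-- container size for each map task -->
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024m</value> <!-- map heap, kept well below the container size -->
</property>
With yarn.nodemanager.resource.memory-mb = 6144 and yarn.scheduler.minimum-allocation-mb = 2048 as in your yarn-site.xml, a 2048 MB request fits the minimum allocation exactly and each node can run up to three such containers instead of one.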

Related

Container is running beyond physical memory limits

I have a MapReduce job that processes 1.4 TB of data.
While running it, I get the error below.
The number of splits is 6444.
Before starting the job I set the following settings:
conf.set("mapreduce.map.memory.mb", "8192");
conf.set("mapreduce.reduce.memory.mb", "8192");
conf.set("mapreduce.map.java.opts.max.heap", "8192");
conf.set("mapreduce.map.java.opts", "-Xmx8192m");
conf.set("mapreduce.reduce.java.opts", "-Xmx8192m");
conf.set("mapreduce.job.heap.memory-mb.ratio", "0.8");
conf.set("mapreduce.task.timeout", "21600000");
The error:
2018-05-18 00:50:36,595 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1524473936587_2969_m_004719_3: Container [pid=11510,containerID=container_1524473936587_2969_01_004894] is running beyond physical memory limits. Current usage: 8.1 GB of 8 GB physical memory used; 8.8 GB of 16.8 GB virtual memory used. Killing container.
Dump of the process-tree for container_1524473936587_2969_01_004894 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 11560 11510 11510 11510 (java) 14960 2833 9460879360 2133706 /usr/lib/jvm/java-7-oracle-cloudera/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx8192m -Djava.io.tmpdir=/sdk/7/yarn/nm/usercache/administrator/appcache/application_1524473936587_2969/container_1524473936587_2969_01_004894/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.106.79.75 41869 attempt_1524473936587_2969_m_004719_3 4894
|- 11510 11508 11510 11510 (bash) 0 0 11497472 679 /bin/bash -c /usr/lib/jvm/java-7-oracle-cloudera/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx8192m -Djava.io.tmpdir=/sdk/7/yarn/nm/usercache/administrator/appcache/application_1524473936587_2969/container_1524473936587_2969_01_004894/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.106.79.75 41869 attempt_1524473936587_2969_m_004719_3 4894 1>/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894/stdout 2>/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Any help would be really appreciated!
The setting mapreduce.map.memory.mb will set the physical memory size of the container running the mapper (mapreduce.reduce.memory.mb will do the same for the reducer container).
Be sure that you adjust the heap value as well. In newer versions of YARN/MRv2 the setting mapreduce.job.heap.memory-mb.ratio can be used to have it auto-adjusted. The default is 0.8, so 80% of whatever the container size is will be allocated as the heap. Otherwise, adjust it manually using the mapreduce.map.java.opts.max.heap and mapreduce.reduce.java.opts.max.heap settings.
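For example, a sketch of the manual route using the standard *.java.opts properties (the same ones set in the question), with the heap sized at roughly 80% of the container; the 8192/6553 numbers are purely illustrative:
<property>
<name>mapreduce.map.memory.mb</name>
<value>8192</value> <!-- physical memory limit for each map container -->
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx6553m</value> <!-- ~80% of 8192 MB, leaving headroom for non-heap memory -->
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>8192</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx6553m</value>
</property>
That headroom appears to be exactly what the configuration in the question lacks: with both mapreduce.map.memory.mb and -Xmx set to 8192, any native/off-heap usage pushes the process over the container limit and the NodeManager kills it.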
BTW, I believe that 1 GB is the default and it is quite low. I recommend reading the link below. It provides a good understanding of YARN and MR memory settings, how they relate, and how to set some baseline values based on the cluster node size (disk, memory, and cores).
Reference: http://community.cloudera.com/t5/Cloudera-Manager-Installation/ERROR-is-running-beyond-physical-memory-limits/td-p/55173
Try setting the YARN memory allocation limits:
SET yarn.scheduler.maximum-allocation-mb=16G;
SET yarn.scheduler.minimum-allocation-mb=8G;
You may look up other YARN settings here:
https://www.ibm.com/support/knowledgecenter/STXKQY_BDA_SHR/bl1bda_tuneyarn.htm
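If you are setting these cluster-wide rather than per session, the equivalent yarn-site.xml entries would look something like this (a sketch; the file takes plain megabyte values):
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>16384</value> <!-- 16 GB -->
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>8192</value> <!-- 8 GB -->
</property>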
Try with: set yarn.app.mapreduce.am.resource.mb=1000;
The explanation is here:
In Spark, spark.driver.memoryOverhead is considered when calculating the total memory required for the driver. By default it is 0.10 of the driver memory, with a minimum of 384 MB. In your case it will be 8192 MB + (8192 MB * 0.1) = 9011 MB ≈ 9 GB.
YARN allocates memory only in increments/multiples of yarn.scheduler.minimum-allocation-mb.
When yarn.scheduler.minimum-allocation-mb=4G, it can only allocate container sizes of 4 GB, 8 GB, 12 GB, etc. So if something like 9 GB is requested, it will round up to the next multiple and allocate a 12 GB container for the driver.
When yarn.scheduler.minimum-allocation-mb=1G, container sizes of 8 GB, 9 GB, and 10 GB are possible. The nearest rounded-up size of 9 GB will be used in this case.
https://community.cloudera.com/t5/Support-Questions/Yarn-Container-is-running-beyond-physical-memory-limits-but/m-p/199353#M161393
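To make the rounding rule above concrete, the minimum allocation is a yarn-site.xml setting; a sketch, assuming you want the finer 1 GB granularity described in the last case:
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value> <!-- containers are sized in multiples of this value -->
</property>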

RMAppMaster is running beyond physical memory limits

I am trying to troubleshoot this puzzling issue: RMAppMaster oversteps its allocated container memory and is then killed by the node manager, even though the heap size is much smaller than the container size.
NM logs:
2017-12-01 11:18:49,863 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 14191 for container-id container_1506599288376_62101_01_000001: 1.0 GB of 1 GB physical memory used; 3.1 GB of 2.1 GB virtual memory used
2017-12-01 11:18:49,863 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1506599288376_62101_01_000001 has processes older than 1 iteration running over the configured limit. Limit=1073741824, current usage = 1076969472
2017-12-01 11:18:49,863 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=14191,containerID=container_1506599288376_62101_01_000001] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 3.1 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1506599288376_62101_01_000001 :
|- 14279 14191 14191 14191 (java) 4915 235 3167825920 262632 /usr/java/default//bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1506599288376_62101/container_1506599288376_62101_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Djava.net.preferIPv4Stack=true -Xmx512m org.apache.hadoop.mapreduce.v2.app.MRAppMaster
|- 14191 14189 14191 14191 (bash) 0 1 108650496 300 /bin/bash -c /usr/java/default//bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1506599288376_62101/container_1506599288376_62101_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Djava.net.preferIPv4Stack=true -Xmx512m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/var/log/hadoop-yarn/container/application_1506599288376_62101/container_1506599288376_62101_01_000001/stdout 2>/var/log/hadoop-yarn/container/application_1506599288376_62101/container_1506599288376_62101_01_000001/stderr
You can observe that while the heap size is set to 512MB, physical memory observed by the NM grows up to 1GB.
The application is an Oozie launcher (Hive task), so it has only one mapper, which does mostly nothing, and no reducer.
What baffles me is that only this specific instance of MRAppMaster is killed, and I cannot explain the 500 MB overhead between the max heap size and the physical memory measured by the NM:
Other MRAppMaster instances run fine even with the default config (yarn.app.mapreduce.am.resource.mb = 1024 and yarn.app.mapreduce.am.command-opts = -Xmx825955249).
MRAppMaster does not run any application-specific code, so why is only this one having trouble? I expect MRAppMaster memory consumption to be roughly linear in the number of tasks/attempts, and this app has only one mapper.
-Xmx has been reduced to 512 MB to see if the issue still happens with ~500 MB of headroom. I expect MRAppMaster to consume very little native memory; what could those extra 500 MB be?
I will try to work around the issue by increasing yarn.app.mapreduce.am.resource.mb, but I'd really like to understand what is going on. Any idea?
config: cdh-5.4
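For reference, the workaround mentioned above (giving the MRAppMaster a larger container) would be a mapred-site.xml change along these lines; the 2048/1536 values are only an illustration, not a measured requirement:
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>2048</value> <!-- AM container size; 1024 in this cluster's default config -->
</property>
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Xmx1536m</value> <!-- keep the AM heap below the container size -->
</property>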

Why does Spark only use one executor on my 2 worker node cluster if I increase the executor memory past 5 GB?

I am using a 3-node cluster: 1 master node and 2 worker nodes, all T2.large EC2 instances.
The "free -m" command gives me the following info:
Master:
total used free shared buffers cached
Mem: 7733 6324 1409 0 221 4555
-/+ buffers/cache: 1547 6186
Swap: 1023 0 1023
Worker Node 1:
total used free shared buffers cached
Mem: 7733 3203 4530 0 185 2166
-/+ buffers/cache: 851 6881
Swap: 1023 0 1023
Worker Node 2:
total used free shared buffers cached
Mem: 7733 3402 4331 0 185 2399
-/+ buffers/cache: 817 6915
Swap: 1023 0 1023
In the yarn-site.xml file, I have the following properties set:
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>7733</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>7733</value>
</property>
In $SPARK_HOME/conf/spark-defaults.conf I am setting spark.executor.cores to 2 and spark.executor.instances to 2.
When looking at the spark-history UI after running my spark application, both executors (1 and 2) show up in the "Executors" tab along with the driver. In the cores column on that same page, it says 2 for both executors.
When I set the executor memory to 5 GB or lower, my Spark application runs fine with executors on both worker nodes. When I set the executor memory to 6 GB or more, only one worker node runs an executor. Why does this happen? Note: I have tried increasing yarn.nodemanager.resource.memory-mb and it does not change this behavior.

Yarn slave nodes are not communicating with master node?

I am not able to see my nodes when I do yarn node -list, even though I have configured /etc/hadoop/conf/yarn-site.xml with the correct properties (or so it seems to me, at least according to this question: Slave nodes not in Yarn ResourceManager).
Here's what I've done so far:
installed resourcemanager on the master
installed nodemanager on the slaves
checked yarn-site.xml for this on ALL the nodes:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master-node</value>
</property>
after modifying the config file, restarted resourcemanager and nodemanager on the master and slaves, respectively.
Yet when I do yarn node -list I only see:
Total Nodes: 0
Node-Id Node-state Node-Http-Address Number-of-Running-Containers
On my nodes, I looked at the .out files in /var/log/hadoop-yarn/ and I see this in them:
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 244592
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 32768
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 65536
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
EDIT:
When I look at the .log files I see the following, but I'm not sure how to fix it:
INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state STARTED; cause:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: <master node ip>:8020:8031 (configuration property 'yarn.resourcemanager.resource-tracker.address')
Caused by: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: <master node ip>:8020:8031 (configuration property 'yarn.resourcemanager.resource-tracker.address')
How do I connect my slave nodes to my master node?
The value set for yarn.resourcemanager.hostname acts as the base value for all the ResourceManager address properties. The property yarn.resourcemanager.resource-tracker.address defaults to ${yarn.resourcemanager.hostname}:8031. Refer to yarn-default.xml for the complete list of default YARN configurations.
And from the nodemanager ERROR log,
Caused by: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: <master node ip>:8020:8031 (configuration property 'yarn.resourcemanager.resource-tracker.address')
It looks like the yarn.resourcemanager.hostname property is configured incorrectly as <master node ip>:8020 instead of <master node ip> on the slave nodes.
Edit the yarn-site.xml on all the nodes to have
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master_node</value> <!-- IP address or hostname of the node where the ResourceManager runs; omit the port number -->
</property>
Finally, restart the YARN services.
Please set all of these properties and try again:
<property>
<name>yarn.resourcemanager.address</name>
<value>master_node:8032</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master_node:8033</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master_node:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master_node:8031</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master_node:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>master_node:8090</value>
</property>
You need to set an IP address for the yarn.resourcemanager.hostname property. If you want to use a hostname instead, your machines need to know which IP that hostname points to, so you need to add a host entry to the /etc/hosts file.
To do that,
Open terminal
Type vim /etc/hosts and hit enter
Add this line at the end of the file (press i to enter insert mode):
<your resourcemanager ip><space><your hostname>
example: `192.168.1.23 master-node`
Save the file by typing <Esc> + :wq
Restart the nodemanager
I recommend using a management tool like Ambari for this kind of task. It makes it easy to modify the configuration of a Hadoop environment at any time, whereas manual work always carries a higher chance of error.

How do we change the block size in hadoop

What is the difference between Cloudera CDH3 cluster and Cloudera CDH4 cluster
What is the default hdfs block size in CDH3
What is the default hdfs block size in CDH4
How to change the hdfs block size in cloudera CDH3 and CDH4 cluster
You can see the HDFS block size in the hdfs-site.xml file. The default is generally 64 MB or 128 MB, but you can change it in that file by editing the dfs.blocksize property:
<property>
<name>dfs.blocksize</name>
<value>SIZE_IN_BYTES</value>
</property>
Bear in mind that the value you write must be in bytes, so 128 MB, for example, would be 134217728 bytes.
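For instance, a sketch of hdfs-site.xml with the 128 MB example above filled in (note that a new block size only applies to files written after the change; existing files keep their original block size):
<property>
<name>dfs.blocksize</name>
<value>134217728</value> <!-- 128 MB = 128 * 1024 * 1024 bytes -->
</property>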
