I have a MapReduce job that processes 1.4 TB of data.
While running it, I get the error below.
The number of splits is 6444.
Before starting the job I set the following settings:
conf.set("mapreduce.map.memory.mb", "8192");
conf.set("mapreduce.reduce.memory.mb", "8192");
conf.set("mapreduce.map.java.opts.max.heap", "8192");
conf.set("mapreduce.map.java.opts", "-Xmx8192m");
conf.set("mapreduce.reduce.java.opts", "-Xmx8192m");
conf.set("mapreduce.job.heap.memory-mb.ratio", "0.8");
conf.set("mapreduce.task.timeout", "21600000");
The error:
2018-05-18 00:50:36,595 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1524473936587_2969_m_004719_3: Container [pid=11510,containerID=container_1524473936587_2969_01_004894] is running beyond physical memory limits. Current usage: 8.1 GB of 8 GB physical memory used; 8.8 GB of 16.8 GB virtual memory used. Killing container.
Dump of the process-tree for container_1524473936587_2969_01_004894 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 11560 11510 11510 11510 (java) 14960 2833 9460879360 2133706 /usr/lib/jvm/java-7-oracle-cloudera/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx8192m -Djava.io.tmpdir=/sdk/7/yarn/nm/usercache/administrator/appcache/application_1524473936587_2969/container_1524473936587_2969_01_004894/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.106.79.75 41869 attempt_1524473936587_2969_m_004719_3 4894
|- 11510 11508 11510 11510 (bash) 0 0 11497472 679 /bin/bash -c /usr/lib/jvm/java-7-oracle-cloudera/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx8192m -Djava.io.tmpdir=/sdk/7/yarn/nm/usercache/administrator/appcache/application_1524473936587_2969/container_1524473936587_2969_01_004894/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.106.79.75 41869 attempt_1524473936587_2969_m_004719_3 4894 1>/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894/stdout 2>/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Any help would be really appreciated!
The setting mapreduce.map.memory.mb will set the physical memory size of the container running the mapper (mapreduce.reduce.memory.mb will do the same for the reducer container).
Be sure that you adjust the heap value as well. In newer versions of YARN/MRv2 the setting mapreduce.job.heap.memory-mb.ratio can be used to have it auto-adjusted. The default is 0.8, so 80% of whatever the container size is will be allocated as the heap. Otherwise, adjust manually using the mapreduce.map.java.opts.max.heap and mapreduce.reduce.java.opts.max.heap settings.
BTW, I believe 1 GB is the default container size, and it is quite low. I recommend reading the link below. It provides a good understanding of YARN and MR memory settings, how they relate, and how to set some baseline values based on the cluster node size (disk, memory, and cores).
Reference: http://community.cloudera.com/t5/Cloudera-Manager-Installation/ERROR-is-running-beyond-physical-memory-limits/td-p/55173
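As a rough illustration (not part of the original answer), a job-side configuration that keeps the heap below the container size might look like the sketch below; the class name and the 6553 MB heap (about 80% of the 8192 MB container) are assumptions chosen for this example, not values from the question.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MemoryConfigSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Container sizes: the physical memory limit that the NodeManager enforces.
        conf.set("mapreduce.map.memory.mb", "8192");
        conf.set("mapreduce.reduce.memory.mb", "8192");
        // JVM heap: keep it at roughly 80% of the container so thread stacks,
        // native buffers and other off-heap memory still fit under the limit.
        conf.set("mapreduce.map.java.opts", "-Xmx6553m");
        conf.set("mapreduce.reduce.java.opts", "-Xmx6553m");
        Job job = Job.getInstance(conf, "memory-config-sketch");
        // ... set mapper, reducer, input and output paths here before job.waitForCompletion(true)
    }
}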
Try adjusting the YARN memory allocation limits (these are cluster-level scheduler settings, expressed in MB):
yarn.scheduler.maximum-allocation-mb = 16384
yarn.scheduler.minimum-allocation-mb = 8192
You may look up other YARN settings here:
https://www.ibm.com/support/knowledgecenter/STXKQY_BDA_SHR/bl1bda_tuneyarn.htm
Try with: set yarn.app.mapreduce.am.resource.mb=1000;
The explanation is here:
In Spark, spark.driver.memoryOverhead is taken into account when calculating the total memory required for the driver. By default it is 0.10 of the driver memory, with a minimum of 384 MB. In your case that gives 8192 MB + 819 MB = 9011 MB ≈ 9 GB.
YARN allocates memory only in increments/multiples of yarn.scheduler.minimum-allocation-mb.
When yarn.scheduler.minimum-allocation-mb=4G, it can only allocate container sizes of 4G, 8G, 12G, etc. So if something like 9G is requested, it will round up to the next multiple and allocate a 12G container for the driver.
When yarn.scheduler.minimum-allocation-mb=1G, container sizes of 8G, 9G, 10G are possible. The nearest rounded-up size of 9G will be used in this case.
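As a small illustration of that rounding (my own sketch, not part of the linked answer), the allocated container size is simply the request rounded up to the next multiple of the scheduler minimum:
public class ContainerRounding {
    // Round a request up to the next multiple of yarn.scheduler.minimum-allocation-mb.
    static long allocatedMb(long requestedMb, long minAllocationMb) {
        return ((requestedMb + minAllocationMb - 1) / minAllocationMb) * minAllocationMb;
    }

    public static void main(String[] args) {
        System.out.println(allocatedMb(9011, 4096)); // 12288 MB (~12G) when the minimum is 4G
        System.out.println(allocatedMb(9011, 1024)); // 9216 MB (~9G) when the minimum is 1G
    }
}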
https://community.cloudera.com/t5/Support-Questions/Yarn-Container-is-running-beyond-physical-memory-limits-but/m-p/199353#M161393
Related
I am trying to troubleshoot this puzzling issue: the MRAppMaster oversteps its allocated container memory and is then killed by the node manager, even though the heap size is much smaller than the container size.
NM logs:
2017-12-01 11:18:49,863 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 14191 for container-id container_1506599288376_62101_01_000001: 1.0 GB of 1 GB physical memory used; 3.1 GB of 2.1 GB virtual memory used
2017-12-01 11:18:49,863 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1506599288376_62101_01_000001 has processes older than 1 iteration running over the configured limit. Limit=1073741824, current usage = 1076969472
2017-12-01 11:18:49,863 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=14191,containerID=container_1506599288376_62101_01_000001] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 3.1 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1506599288376_62101_01_000001 :
|- 14279 14191 14191 14191 (java) 4915 235 3167825920 262632 /usr/java/default//bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1506599288376_62101/container_1506599288376_62101_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Djava.net.preferIPv4Stack=true -Xmx512m org.apache.hadoop.mapreduce.v2.app.MRAppMaster
|- 14191 14189 14191 14191 (bash) 0 1 108650496 300 /bin/bash -c /usr/java/default//bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1506599288376_62101/container_1506599288376_62101_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Djava.net.preferIPv4Stack=true -Xmx512m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/var/log/hadoop-yarn/container/application_1506599288376_62101/container_1506599288376_62101_01_000001/stdout 2>/var/log/hadoop-yarn/container/application_1506599288376_62101/container_1506599288376_62101_01_000001/stderr
You can observe that while the heap size is set to 512MB, physical memory observed by the NM grows up to 1GB.
The application is an Oozie launcher (Hive task), so it has only one mapper, which does mostly nothing, and no reducer.
What baffles me is that only this specific instance of MRAppMaster is killed, and I cannot explain the 500 MB overhead between the max heap size and the physical memory measured by the NM:
Other MRAppMaster instances run fine even with the default config (yarn.app.mapreduce.am.resource.mb = 1024 and yarn.app.mapreduce.am.command-opts = -Xmx825955249).
MRAppMaster does not run any application-specific code, so why is only this one having trouble? I expect MRAppMaster memory consumption to be somewhat linear in the number of tasks/attempts, and this app has only one mapper.
-Xmx has been reduced to 512 MB to see if the issue still happens with ~500 MB of headroom. I expect MRAppMaster to consume very little native memory; what could those extra 500 MB be?
I will try to work around the issue by increasing yarn.app.mapreduce.am.resource.mb, but I'd really like to understand what is going on. Any ideas?
config: cdh-5.4
I am running on a 4 node cluster.
Please suggest what size I should use for the container memory. If it should be more than 1 GB, then exactly what size?
What are the criteria for configuring container memory?
Error:
Diagnostic Messages for this Task:
Container [pid=46465,containerID=container_1503271937182_4757_01_000032] is
running beyond physical memory limits. Current usage: 1.0 GB of 1 GB
physical memory used; 2.7 GB of 2.1 GB virtual memory used. Killing
container.
Dump of the process-tree for container_1503271937182_4757_01_000032 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES)
FULL_CMD_LINE
|- 46465 46463 46465 46465 (bash) 0 0 108654592 308 /bin/bash -c /usr/java/jdk1.8.0_121/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx820m -Djava.io.tmpdir=/data3/yarn/nm/usercache/hdfs/appcache/application_1503271937182_4757/container_1503271937182_4757_01_000032/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.104.72.113 58079 attempt_1503271937182_4757_m_000015_3 32 1>/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032/stdout
2>/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032/stderr
|- 46483 46465 46465 46465 (java) 2929 1281 2828042240 262018 /usr/java/jdk1.8.0_121/bin/java -Djava.net.preferIPv4Stack=true
-Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx820m -Djava.io.tmpdir=/data3/yarn/nm/usercache/hdfs/appcache/application_1503271937182_4757/container_1503271937182_4757_01_000032/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.104.72.113 58079 attempt_1503271937182_4757_m_000015_3 32
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 18 Reduce: 72 Cumulative CPU: 1219.78 sec HDFS Read: 3412303867 HDFS Write: 3935714 SUCCESS
Stage-Stage-9: Map: 18 Reduce: 72 Cumulative CPU: 332.43 sec HDFS Read: 3321536722 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 25 minutes 52 seconds 210 msec
hive failed
I have 7 nodes in my Hadoop cluster [8 GB RAM and 4 vCPUs on each node]: 1 namenode + 6 datanodes.
EDIT-1 @ARNON: I followed the link, made the calculations according to the hardware configuration of my nodes, and added the updated mapred-site.xml and yarn-site.xml files to my question. Still my application is crashing with the same exception.
My mapreduce application has 34 input splits with a block size of 128MB.
mapred-site.xml has the following properties:
mapreduce.framework.name = yarn
mapred.child.java.opts = -Xmx2048m
mapreduce.map.memory.mb = 4096
mapreduce.map.java.opts = -Xmx2048m
yarn-site.xml has the following properties:
yarn.resourcemanager.hostname = hadoop-master
yarn.nodemanager.aux-services = mapreduce_shuffle
yarn.nodemanager.resource.memory-mb = 6144
yarn.scheduler.minimum-allocation-mb = 2048
yarn.scheduler.maximum-allocation-mb = 6144
EDIT-2 @ARNON: Setting yarn.scheduler.minimum-allocation-mb to 4096 puts all the map tasks in a suspended state, and setting it to 3072 crashes with the following:
Exception from container-launch: ExitCodeException exitCode=134: /bin/bash: line 1: 3876 Aborted (core dumped) /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx8192m -Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1424264025191_0002/container_1424264025191_0002_01_000011/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_000011
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 192.168.0.12 50842 attempt_1424264025191_0002_m_000005_0 11 >
/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_000011/stdout 2>
/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_000011/stderr
How can I avoid this? Any help is appreciated.
Is there an option to restrict the number of containers on Hadoop nodes?
It seems you are allocating too much memory to your tasks (even without looking at all the configurations): 8 GB of RAM per node and 8 GB per map task, all of which is heap.
Try lower allocations, for example a 2 GB container with a 1 GB heap, or something like that.
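A minimal sketch of that lower allocation, written in the same property style as the question's mapred-site.xml (the 2048 MB / 1024 MB split is just the suggestion above, not a verified recommendation for this cluster):
mapreduce.map.memory.mb = 2048
mapreduce.reduce.memory.mb = 2048
mapreduce.map.java.opts = -Xmx1024m
mapreduce.reduce.java.opts = -Xmx1024m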
Version: spring-xd-1.0.1
Distributed mode: yarn
Hadoop version: cdh5
I have modified config/servers.yml to point to the right applicationDir, zookeeper, hdfs, resourcemanager, redis, and mysqldb.
However, after the push, when I start admin, it is killed by YARN after some time.
I do not understand why the container would consume 31 GB of memory.
Please point me in the right direction to debug this problem. Also, how do I increase the log level?
Following error is observed in logs:
Got ContainerStatus=[container_id { app_attempt_id { application_id { id: 432 cluster_timestamp: 1415816376410 } attemptId: 1 } id: 2 } state: C_COMPLETE diagnostics: "Container [pid=19374,containerID=container_1415816376410_0432_01_000002] is running beyond physical memory limits. Current usage: 1.2 GB of 1 GB physical memory used; 31.7 GB of 2.1 GB virtual memory used. Killing container.\nDump of the process-tree for container_1415816376410_0432_01_000002 :\n\t|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE\n\t|- 19381 19374 19374 19374 (java) 3903 121 33911242752 303743 /usr/java/jdk1.7.0_45-cloudera/bin/java -DxdHomeDir=./spring-xd-yarn-1.0.1.RELEASE.zip -Dxd.module.config.location=file:./modules-config.zip/ -Dspring.application.name=admin -Dspring.config.location=./servers.yml org.springframework.xd.dirt.server.AdminServerApplication \n\t|- 19374 24125 19374 19374 (bash) 0 0 110804992 331 /bin/bash -c /usr/java/jdk1.7.0_45-cloudera/bin/java -DxdHomeDir=./spring-xd-yarn-1.0.1.RELEASE.zip -Dxd.module.config.location=file:./modules-config.zip/ -Dspring.application.name=admin -Dspring.config.location=./servers.yml org.springframework.xd.dirt.server.AdminServerApplication 1>/var/log/hadoop-yarn/container/application_1415816376410_0432/container_1415816376410_0432_01_000002/Container.stdout 2>/var/log/hadoop-yarn/container/application_1415816376410_0432/container_1415816376410_0432_01_000002/Container.stderr \n\nContainer killed on request. Exit code is 143\nContainer exited with a non-zero exit code 143\n" exit_status: 143
Yes, with the current version 1.1.0/1.1.1 you don't need to run the admin explicitly. The containers and admin will be instantiated by YARN when you submit the application.
I am running an oozie workflow consisting of a shell action that runs this script.
java -classpath my.jar my.package.Main
The shell action configuration looks like this
<action name="run-test-script">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>run.sh</exec>
<file>${wf:conf("oozie.wf.application.path")}/run.sh</file>
</shell>
<ok to="end"/>
<error to="report_failure"/>
</action>
In YARN (Cloudera Manager GUI) I have the following set:
mapreduce.map.memory.mb=4GiB
When I run this, it fails and in YARN I get the following error message:
Container [pid=8340,containerID=container_1397556756420_5519_01_000002] is running beyond virtual memory limits. Current usage: 327.6 MB of 4 GB physical memory used; 33.2 GB of 8.4 GB virtual memory used. Killing container. Dump of the process-tree for container_1397556756420_5519_01_000002 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 8467 8466 8340 8340 (java) 21 2 34039132160 15889 java -classpath my-jar.jar my.package.Main |- 8466 8340 8340 8340 (run.sh) 0 0 108654592 304 /bin/bash ./run.sh |- 8340 31401 8340 8340 (java) 396 21 1476808704 67682 /usr/lib/jvm/default-java/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx825955249 -Djava.io.tmpdir=/data24/yarn/nm/usercache/thomas.larsson/appcache/application_1397556756420_5519/container_1397556756420_5519_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/var/log/hadoop-yarn/container/application_1397556756420_5519/container_1397556756420_5519_01_000002 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 10.4.4.108 52668 attempt_1397556756420_5519_m_000000_0 2 Container killed on request. Exit code is 143
My question is: why is this task trying to allocate 33.2 GB of virtual memory?
Update: Sorry, I forgot to post the Java class...
public class Main {
public static void main(String[] args) throws InterruptedException {
System.out.println("Running Main.");
Thread.sleep(1000*60);
System.out.println("Completed.");
}
}
The map memory size you're looking at is irrelevant, since you're not starting a map/reduce job.
Check all the memory-related YARN settings; it might be that your minimum allocation for containers is too high.
It's failing because you must have the vmem check enabled, and it's crossing the virtual memory limit:
Current usage: 327.6 MB of 4 GB physical memory used; 33.2 GB of 8.4 GB virtual memory used.
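If it is indeed the virtual memory check killing the container, the relevant NodeManager-side properties in yarn-site.xml are the following, shown with their stock defaults as a hedged example (note that the 8.4 GB limit in the message is 4 GB × the default ratio of 2.1):
yarn.nodemanager.vmem-check-enabled = true   (set to false to disable the virtual memory check)
yarn.nodemanager.vmem-pmem-ratio = 2.1       (allowed ratio of virtual to physical memory per container)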