spring-xd yarn admin yarn-container fails

Version: spring-xd-1.0.1
Distributed mode: yarn
Hadoop version: cdh5
I have modified config/servers.yml to point to the right applicationDir, ZooKeeper, HDFS, ResourceManager, Redis, and MySQL DB.
However, after the push, when I start the admin, it is killed by YARN after some time.
I do not understand why the container would consume 31 GB of memory.
Please point me in the right direction to debug this problem. Also, how do I increase the log level?
The following error is observed in the logs:
Got ContainerStatus=[container_id { app_attempt_id { application_id { id: 432 cluster_timestamp: 1415816376410 } attemptId: 1 } id: 2 } state: C_COMPLETE diagnostics: "Container [pid=19374,containerID=container_1415816376410_0432_01_000002] is running beyond physical memory limits. Current usage: 1.2 GB of 1 GB physical memory used; 31.7 GB of 2.1 GB virtual memory used. Killing container.\nDump of the process-tree for container_1415816376410_0432_01_000002 :\n\t|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE\n\t|- 19381 19374 19374 19374 (java) 3903 121 33911242752 303743 /usr/java/jdk1.7.0_45-cloudera/bin/java -DxdHomeDir=./spring-xd-yarn-1.0.1.RELEASE.zip -Dxd.module.config.location=file:./modules-config.zip/ -Dspring.application.name=admin -Dspring.config.location=./servers.yml org.springframework.xd.dirt.server.AdminServerApplication \n\t|- 19374 24125 19374 19374 (bash) 0 0 110804992 331 /bin/bash -c /usr/java/jdk1.7.0_45-cloudera/bin/java -DxdHomeDir=./spring-xd-yarn-1.0.1.RELEASE.zip -Dxd.module.config.location=file:./modules-config.zip/ -Dspring.application.name=admin -Dspring.config.location=./servers.yml org.springframework.xd.dirt.server.AdminServerApplication 1>/var/log/hadoop-yarn/container/application_1415816376410_0432/container_1415816376410_0432_01_000002/Container.stdout 2>/var/log/hadoop-yarn/container/application_1415816376410_0432/container_1415816376410_0432_01_000002/Container.stderr \n\nContainer killed on request. Exit code is 143\nContainer exited with a non-zero exit code 143\n" exit_status: 143

Yes, with the current version (1.1.0/1.1.1) you don't need to run the admin explicitly. The containers and the admin are instantiated by YARN when you submit the application.
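As for the 31 GB figure: that is virtual memory, not physical. The JVM reserves large amounts of virtual address space up front, and the NodeManager caps virtual memory at yarn.nodemanager.vmem-pmem-ratio (default 2.1) times the container's physical memory, which is where the "2.1 GB of virtual memory" limit for a 1 GB container comes from. Note that the kill itself was for physical memory (1.2 GB of 1 GB used), so the admin container also needs a larger memory allocation in servers.yml (the exact key depends on the XD release). A hedged sketch of the yarn-site.xml properties that relax the virtual-memory check (standard YARN property names; the values here are illustrative):
yarn.nodemanager.vmem-check-enabled = false
yarn.nodemanager.vmem-pmem-ratio = 10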

Related

The Oozie job does not run, with the message [AM container is launched, waiting for AM container to Register with RM]

I ran a shell job from the Oozie examples.
However, the YARN application is not executed.
Detailed information from the YARN UI & logs:
https://docs.google.com/document/d/1N8LBXZGttY3rhRTwv8cUEfK3WkWtvWJ-YV1q_fh_kks/edit
The YARN application status is:
Application Priority: 0 (Higher Integer value indicates higher priority)
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
Queue: default
FinalStatus Reported by AM: Application has not completed yet.
Finished: N/A
Elapsed: 20mins, 30sec
Tracking URL: ApplicationMaster
Log Aggregation Status: DISABLED
Application Timeout (Remaining Time): Unlimited
Diagnostics: AM container is launched, waiting for AM container to Register with RM
The application attempt status is:
Application Attempt State: FAILED
Elapsed: 13mins, 19sec
AM Container: container_1607273090037_0001_02_000001
Node: N/A
Tracking URL: History
Diagnostics Info: ApplicationMaster for attempt appattempt_1607273090037_0001_000002 timed out
                                          Node Local Request  Rack Local Request  Off Switch Request
Num Node Local Containers (satisfied by)  0
Num Rack Local Containers (satisfied by)  0                   0
Num Off Switch Containers (satisfied by)  0                   0                   1
The NodeManager log:
2020-12-07 01:45:16,237 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Starting container [container_1607273090037_0001_01_000001]
2020-12-07 01:45:16,267 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1607273090037_0001_01_000001 transitioned from SCHEDULED to RUNNING
2020-12-07 01:45:16,267 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1607273090037_0001_01_000001
2020-12-07 01:45:16,272 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /tmp/hadoop-oozie/nm-local-dir/usercache/oozie/appcache/application_1607273090037_0001/container_1607273090037_0001_01_000001/default_container_executor.sh]
2020-12-07 01:45:17,301 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: container_1607273090037_0001_01_000001's ip = 127.0.0.1, and hostname = localhost.localdomain
2020-12-07 01:45:17,345 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Skipping monitoring container container_1607273090037_0001_01_000001 since CPU usage is not yet available.
2020-12-07 01:45:48,274 INFO logs: Aliases are enabled
2020-12-07 01:54:50,242 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 496756, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
2020-12-07 01:58:10,071 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1607273090037_0001_000001 (auth:SIMPLE)
2020-12-07 01:58:10,078 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1607273090037_0001_01_000001
What is the problem?

Container is running beyond physical memory limits

I have a MapReduce job that processes 1.4 TB of data.
While running it, I get the error below.
The number of splits is 6444.
Before starting the job, I set the following:
conf.set("mapreduce.map.memory.mb", "8192");
conf.set("mapreduce.reduce.memory.mb", "8192");
conf.set("mapreduce.map.java.opts.max.heap", "8192");
conf.set("mapreduce.map.java.opts", "-Xmx8192m");
conf.set("mapreduce.reduce.java.opts", "-Xmx8192m");
conf.set("mapreduce.job.heap.memory-mb.ratio", "0.8");
conf.set("mapreduce.task.timeout", "21600000");
The error:
2018-05-18 00:50:36,595 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1524473936587_2969_m_004719_3: Container [pid=11510,containerID=container_1524473936587_2969_01_004894] is running beyond physical memory limits. Current usage: 8.1 GB of 8 GB physical memory used; 8.8 GB of 16.8 GB virtual memory used. Killing container.
Dump of the process-tree for container_1524473936587_2969_01_004894 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 11560 11510 11510 11510 (java) 14960 2833 9460879360 2133706 /usr/lib/jvm/java-7-oracle-cloudera/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx8192m -Djava.io.tmpdir=/sdk/7/yarn/nm/usercache/administrator/appcache/application_1524473936587_2969/container_1524473936587_2969_01_004894/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.106.79.75 41869 attempt_1524473936587_2969_m_004719_3 4894
|- 11510 11508 11510 11510 (bash) 0 0 11497472 679 /bin/bash -c /usr/lib/jvm/java-7-oracle-cloudera/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx8192m -Djava.io.tmpdir=/sdk/7/yarn/nm/usercache/administrator/appcache/application_1524473936587_2969/container_1524473936587_2969_01_004894/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.106.79.75 41869 attempt_1524473936587_2969_m_004719_3 4894 1>/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894/stdout 2>/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Any help would be really appreciated!
The setting mapreduce.map.memory.mb sets the physical memory size of the container running the mapper (mapreduce.reduce.memory.mb does the same for the reducer container).
Be sure that you adjust the heap value as well. In newer versions of YARN/MRv2 the setting mapreduce.job.heap.memory-mb.ratio can be used to auto-adjust it. The default is 0.8, so 80% of whatever the container size is will be allocated as the heap. Otherwise, adjust manually using the mapreduce.map.java.opts.max.heap and mapreduce.reduce.java.opts.max.heap settings.
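For example, a sketch of that sizing with explicit values (the numbers are illustrative, following the 80% guideline; note that your posted configuration sets the heap equal to the container size, which leaves no headroom for off-heap JVM overhead and matches the "8.1 GB of 8 GB" kill in the error):
// Container sizes (physical memory enforced by YARN, in MB)
conf.set("mapreduce.map.memory.mb", "8192");
conf.set("mapreduce.reduce.memory.mb", "8192");
// Heap at ~80% of the container so JVM overhead (thread stacks,
// metaspace, native buffers) stays under the physical-memory cap
conf.set("mapreduce.map.java.opts", "-Xmx6553m");
conf.set("mapreduce.reduce.java.opts", "-Xmx6553m");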
BTW, I believe 1 GB is the default, and it is quite low. I recommend reading the link below. It provides a good understanding of YARN and MR memory settings, how they relate, and how to set some baseline values based on the cluster node size (disk, memory, and cores).
Reference: http://community.cloudera.com/t5/Cloudera-Manager-Installation/ERROR-is-running-beyond-physical-memory-limits/td-p/55173
Try setting the YARN memory allocation limits (these properties take integer values in MB):
SET yarn.scheduler.maximum-allocation-mb=16384;
SET yarn.scheduler.minimum-allocation-mb=8192;
You can look up other YARN settings here:
https://www.ibm.com/support/knowledgecenter/STXKQY_BDA_SHR/bl1bda_tuneyarn.htm
Try with: set yarn.app.mapreduce.am.resource.mb=1000;
The explanation is here:
In Spark, spark.driver.memoryOverhead is included when calculating the total memory required for the driver. By default it is 0.10 of the driver memory, with a minimum of 384 MB. In your case the total is 8192 MB + (8192 MB * 0.1) = 9011 MB ≈ 9 GB.
YARN allocates memory only in increments/multiples of yarn.scheduler.minimum-allocation-mb.
When yarn.scheduler.minimum-allocation-mb=4G, it can only allocate container sizes of 4 GB, 8 GB, 12 GB, etc. So if something like 9 GB is requested, it is rounded up to the next multiple and a 12 GB container is allocated for the driver.
When yarn.scheduler.minimum-allocation-mb=1G, container sizes of 8 GB, 9 GB, and 10 GB are possible. The nearest rounded-up size of 9 GB is used in this case.
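A minimal sketch of that rounding rule (the method name is hypothetical, for illustration only):
// YARN rounds each container request up to the next multiple of
// yarn.scheduler.minimum-allocation-mb (both values in MB here)
static int roundUpToMinAllocation(int requestMb, int minAllocationMb) {
    return ((requestMb + minAllocationMb - 1) / minAllocationMb) * minAllocationMb;
}
// roundUpToMinAllocation(9011, 4096) -> 12288 (the 12 GB case above)
// roundUpToMinAllocation(9011, 1024) -> 9216 (the ~9 GB case above)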
https://community.cloudera.com/t5/Support-Questions/Yarn-Container-is-running-beyond-physical-memory-limits-but/m-p/199353#M161393

While running a Hive script I get a "container memory limit exceeded" error and the query fails. Currently it is set to the default of 1 GB

I am running on a 4-node cluster.
Please suggest what size I should set the container memory to. If it should be more than 1 GB, then exactly what size?
What are the criteria for configuring container memory?
Error:
Diagnostic Messages for this Task:
Container [pid=46465,containerID=container_1503271937182_4757_01_000032] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 2.7 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1503271937182_4757_01_000032 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 46465 46463 46465 46465 (bash) 0 0 108654592 308 /bin/bash -c /usr/java/jdk1.8.0_121/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx820m -Djava.io.tmpdir=/data3/yarn/nm/usercache/hdfs/appcache/application_1503271937182_4757/container_1503271937182_4757_01_000032/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.104.72.113 58079 attempt_1503271937182_4757_m_000015_3 32 1>/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032/stdout
2>/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032/stderr
|- 46483 46465 46465 46465 (java) 2929 1281 2828042240 262018 /usr/java/jdk1.8.0_121/bin/java -Djava.net.preferIPv4Stack=true
-Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx820m -Djava.io.tmpdir=/data3/yarn/nm/usercache/hdfs/appcache/application_1503271937182_4757/container_1503271937182_4757_01_000032/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.104.72.113 58079 attempt_1503271937182_4757_m_000015_3 32
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 18 Reduce: 72 Cumulative CPU: 1219.78 sec HDFS Read: 3412303867 HDFS Write: 3935714 SUCCESS
Stage-Stage-9: Map: 18 Reduce: 72 Cumulative CPU: 332.43 sec HDFS Read: 3321536722 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 25 minutes 52 seconds 210 msec
The Hive query failed.

WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1494943588964_0010_01_000001 is : 143

I have set up Hadoop 2.7.3 on Ubuntu 16.04 in standalone mode.
I have installed Hive 2.1.1 and am working with HQL.
Most of the queries trigger MR jobs.
When I run queries that trigger MR jobs, the system automatically logs out, terminating all the processes.
When I check the NodeManager log, I can see that the statement which leads to the problem is:
WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1494943588964_0010_01_000001 is : 143
2017-05-16 19:48:08,263 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
2017-05-16 19:48:08,297 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1494943588964_0010_01_000002 is : 143
2017-05-16 19:48:08,304 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1494943588964_0010_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
And the log files commonly contain the statement:
RECEIVED SIGNAL 15: SIGTERM
Please find the properties set in yarn-site.xml:
yarn.nodemanager.aux-services = mapreduce_shuffle
yarn.nodemanager.resource.memory-mb = 6144
yarn.scheduler.minimum-allocation-mb = 2048
yarn.scheduler.maximum-allocation-mb = 6144
yarn.app.mapreduce.am.resource.mb = 1024
yarn.app.mapreduce.am.command-opts = -Xmx819m
Can anybody help with this?
RECEIVED SIGNAL 15: SIGTERM
could be due to a couple of reasons: a user might have killed the application, or the container might have reached its resource limits.
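To see which it was, the application's diagnostics message is a good starting point. A hedged sketch using the standard YarnClient API (the application id below is taken from the container id in the log above; adjust it to the actual failed application):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class PrintDiagnostics {
    public static void main(String[] args) throws Exception {
        // Connects to the ResourceManager configured in yarn-site.xml
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new Configuration());
        yarn.start();
        // application_1494943588964_0010 from the NodeManager log above
        ApplicationId appId = ApplicationId.newInstance(1494943588964L, 10);
        ApplicationReport report = yarn.getApplicationReport(appId);
        // The diagnostics string usually says who or what killed the app
        System.out.println(report.getDiagnostics());
        yarn.stop();
    }
}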

YARN virtual memory usage

I am running an Oozie workflow consisting of a shell action that runs this script:
java -classpath my.jar my.package.Main
The shell action configuration looks like this:
<action name="run-test-script">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>run.sh</exec>
        <file>${wf:conf("oozie.wf.application.path")}/run.sh</file>
    </shell>
    <ok to="end"/>
    <error to="report_failure"/>
</action>
In YARN (Cloudera Manager GUI) I have the following set:
mapreduce.map.memory.mb=4GiB
When I run this, it fails, and in YARN I get the following error message:
Container [pid=8340,containerID=container_1397556756420_5519_01_000002] is running beyond virtual memory limits. Current usage: 327.6 MB of 4 GB physical memory used; 33.2 GB of 8.4 GB virtual memory used. Killing container.
Dump of the process-tree for container_1397556756420_5519_01_000002 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 8467 8466 8340 8340 (java) 21 2 34039132160 15889 java -classpath my-jar.jar my.package.Main
|- 8466 8340 8340 8340 (run.sh) 0 0 108654592 304 /bin/bash ./run.sh
|- 8340 31401 8340 8340 (java) 396 21 1476808704 67682 /usr/lib/jvm/default-java/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx825955249 -Djava.io.tmpdir=/data24/yarn/nm/usercache/thomas.larsson/appcache/application_1397556756420_5519/container_1397556756420_5519_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/var/log/hadoop-yarn/container/application_1397556756420_5519/container_1397556756420_5519_01_000002 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 10.4.4.108 52668 attempt_1397556756420_5519_m_000000_0 2
Container killed on request. Exit code is 143
My question is, why is this task trying to allocate 33.2 GB of virtual memory?
Update: Sorry, I forgot to post the Java class...
public class Main {
    public static void main(String[] args) throws InterruptedException {
        System.out.println("Running Main.");
        Thread.sleep(1000 * 60);
        System.out.println("Completed.");
    }
}
The map memory size you're looking at is irrelevant, since you're not starting a map/reduce job.
Check all the memory-related YARN settings; it might be that your minimum allocation for containers is too high.
It's failing because you must have the vmem check enabled, and it's crossing the virtual memory limit:
Current usage: 327.6 MB of 4 GB physical memory used; 33.2 GB of 8.4 GB
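If that is the case, the relevant knobs are the NodeManager's virtual-memory settings in yarn-site.xml (standard YARN properties; the default ratio of 2.1 is what turns the 4 GB physical container into the 8.4 GB virtual cap seen in the error, and the values below are illustrative):
yarn.nodemanager.vmem-check-enabled = false
or, to keep the check but allow more virtual address space per unit of physical memory:
yarn.nodemanager.vmem-pmem-ratio = 10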
