SonarQube WebServer process spikes CPU after a while - sonarqube

We're running SonarQube 5.1.2 on an AWS node. After a short period of use, typically a day or two, the Sonar web server becomes unresponsive and spikes the server's CPUs:
top - 01:59:47 up 2 days, 3:43, 1 user, load average: 1.89, 1.76, 1.11
Tasks: 93 total, 1 running, 92 sleeping, 0 stopped, 0 zombie
Cpu(s): 94.5%us, 0.0%sy, 0.0%ni, 5.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 7514056k total, 2828772k used, 4685284k free, 155372k buffers
Swap: 0k total, 0k used, 0k free, 872440k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2328 root 20 0 3260m 1.1g 19m S 188.3 15.5 62:51.79 java
11 root 20 0 0 0 0 S 0.3 0.0 0:07.90 events/0
2284 root 20 0 3426m 407m 19m S 0.3 5.5 9:51.04 java
1 root 20 0 19356 1536 1224 S 0.0 0.0 0:00.23 init
The 188% CPU load is coming from the WebServer process:
$ ps -eF|grep "root *2328"
root 2328 2262 2 834562 1162384 0 Mar01 ? 01:06:24 /usr/java/jre1.8.0_25/bin/java -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djruby.management.enabled=false -Djruby.compile.invokedynamic=false -Xmx768m -XX:MaxPermSize=160m -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=/opt/sonar/temp -cp ./lib/common/*:./lib/server/*:/opt/sonar/lib/jdbc/mysql/mysql-connector-java-5.1.34.jar org.sonar.server.app.WebServer /tmp/sq-process615754070383971531properties
We initially thought the node was simply too small and recently upgraded to an m3.large instance, but we're seeing the same problem (except now it spikes two CPUs instead of one).
The only interesting info in the log is this:
2016.03.04 01:52:38 WARN web[o.e.transport] [sonar-1456875684135] Received response for a request that has timed out, sent [39974ms] ago, timed out [25635ms] ago, action [cluster:monitor/nodes/info], node [[#transport#-1][xxxxxxxx-build02-us-west-2b][inet[/127.0.0.1:9001]]], id [43817]
2016.03.04 01:53:19 INFO web[o.e.client.transport] [sonar-1456875684135] failed to get node info for [#transport#-1][xxxxxxxx-build02-us-west-2b][inet[/127.0.0.1:9001]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[/127.0.0.1:9001]][cluster:monitor/nodes/info] request_id [43817] timed out after [14339ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:366) ~[elasticsearch-1.4.4.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_25]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_25]
at java.lang.Thread.run(Unknown Source) [na:1.8.0_25]
Does anyone know what might be going on here, or have ideas on how to diagnose this problem further?
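One general way to dig further (a sketch, not something from this thread; it assumes a JDK, and therefore jstack, is available on the box) is to map the hottest threads of the WebServer process to Java stack traces:
$ top -H -b -n 1 -p 2328 | head -20     # list the busiest threads (the PID column here is a thread id)
$ printf '0x%x\n' 2382                  # 2382 is a hypothetical hot thread id taken from the output above
$ jstack 2328 > /tmp/sq-web.tdump       # search the dump for nid=<that hex value> to see what the thread is doing
If the hot threads sit in Elasticsearch transport or JRuby code, that at least narrows down which part of the web server is spinning.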

Related

Container is running beyond physical memory limits

I have a MapReduce job that processes 1.4 TB of data.
While running it, I get the error below.
The number of splits is 6444.
Before starting the job, I set the following configuration:
conf.set("mapreduce.map.memory.mb", "8192");
conf.set("mapreduce.reduce.memory.mb", "8192");
conf.set("mapreduce.map.java.opts.max.heap", "8192");
conf.set("mapreduce.map.java.opts", "-Xmx8192m");
conf.set("mapreduce.reduce.java.opts", "-Xmx8192m");
conf.set("mapreduce.job.heap.memory-mb.ratio", "0.8");
conf.set("mapreduce.task.timeout", "21600000");
The error:
2018-05-18 00:50:36,595 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1524473936587_2969_m_004719_3: Container [pid=11510,containerID=container_1524473936587_2969_01_004894] is running beyond physical memory limits. Current usage: 8.1 GB of 8 GB physical memory used; 8.8 GB of 16.8 GB virtual memory used. Killing container.
Dump of the process-tree for container_1524473936587_2969_01_004894 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 11560 11510 11510 11510 (java) 14960 2833 9460879360 2133706 /usr/lib/jvm/java-7-oracle-cloudera/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx8192m -Djava.io.tmpdir=/sdk/7/yarn/nm/usercache/administrator/appcache/application_1524473936587_2969/container_1524473936587_2969_01_004894/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.106.79.75 41869 attempt_1524473936587_2969_m_004719_3 4894
|- 11510 11508 11510 11510 (bash) 0 0 11497472 679 /bin/bash -c /usr/lib/jvm/java-7-oracle-cloudera/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx8192m -Djava.io.tmpdir=/sdk/7/yarn/nm/usercache/administrator/appcache/application_1524473936587_2969/container_1524473936587_2969_01_004894/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.106.79.75 41869 attempt_1524473936587_2969_m_004719_3 4894 1>/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894/stdout 2>/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Any help would be really appreciated!
The setting mapreduce.map.memory.mb will set the physical memory size of the container running the mapper (mapreduce.reduce.memory.mb will do the same for the reducer container).
Be sure to adjust the heap value as well. In newer versions of YARN/MRv2 the setting mapreduce.job.heap.memory-mb.ratio can be used to auto-adjust it. The default is 0.8, so 80% of the container size is allocated as heap. Otherwise, adjust it manually using the mapreduce.map.java.opts.max.heap and mapreduce.reduce.java.opts.max.heap settings.
By the way, I believe 1 GB is the default container size, and it is quite low. I recommend reading the link below; it provides a good understanding of YARN and MR memory settings, how they relate, and how to set some baseline values based on the cluster node size (disk, memory, and cores).
Reference: http://community.cloudera.com/t5/Cloudera-Manager-Installation/ERROR-is-running-beyond-physical-memory-limits/td-p/55173
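For example, a minimal sketch of keeping the heap at roughly 80% of the container size when submitting a job from the command line (my-job.jar, MyDriver, and the paths are placeholders, the values are illustrative, and the -D options only take effect if the driver goes through ToolRunner):
$ hadoop jar my-job.jar MyDriver \
    -Dmapreduce.map.memory.mb=8192 \
    -Dmapreduce.map.java.opts=-Xmx6553m \
    -Dmapreduce.reduce.memory.mb=8192 \
    -Dmapreduce.reduce.java.opts=-Xmx6553m \
    /input /output                          # 6553m is ~80% of the 8192 MB container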
Try setting the YARN memory allocation limits (values are in MB):
SET yarn.scheduler.maximum-allocation-mb=16384;
SET yarn.scheduler.minimum-allocation-mb=8192;
You can look up other YARN settings here:
https://www.ibm.com/support/knowledgecenter/STXKQY_BDA_SHR/bl1bda_tuneyarn.htm
Try with: set yarn.app.mapreduce.am.resource.mb=1000;
The explanation is here:
In Spark, spark.driver.memoryOverhead is included when calculating the total memory required for the driver. By default it is 0.10 of the driver memory, with a minimum of 384 MB. In your case that is 8 GB + (8 GB * 0.10) = 9011 MB ≈ 9 GB.
YARN allocates memory only in increments/multiples of yarn.scheduler.minimum-allocation-mb.
When yarn.scheduler.minimum-allocation-mb=4G, it can only allocate container sizes of 4 GB, 8 GB, 12 GB, and so on. So when something like 9 GB is requested, YARN rounds up to the next multiple and allocates a 12 GB container for the driver.
When yarn.scheduler.minimum-allocation-mb=1G, container sizes of 8 GB, 9 GB, 10 GB, and so on are possible, and the nearest rounded-up size of 9 GB is used in this case.
https://community.cloudera.com/t5/Support-Questions/Yarn-Container-is-running-beyond-physical-memory-limits-but/m-p/199353#M161393
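As a worked example of that rounding (a sketch; the numbers mirror the case above):
# ~8192 MB of driver memory plus ~10% overhead is a request of roughly 9011 MB
request_mb=9011
min_alloc_mb=4096      # yarn.scheduler.minimum-allocation-mb = 4 GB
granted_mb=$(( (request_mb + min_alloc_mb - 1) / min_alloc_mb * min_alloc_mb ))
echo "$granted_mb MB"  # prints 12288 MB, i.e. the 12 GB container mentioned above
With min_alloc_mb=1024 the same calculation yields 9216 MB, i.e. the request is rounded up to the nearest gigabyte.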

Why does Spark only use one executor on my 2 worker node cluster if I increase the executor memory past 5 GB?

I am using a 3-node cluster (1 master node and 2 worker nodes) of t2.large EC2 instances.
The "free -m" command gives me the following info:
Master:
total used free shared buffers cached
Mem: 7733 6324 1409 0 221 4555
-/+ buffers/cache: 1547 6186
Swap: 1023 0 1023
Worker Node 1:
total used free shared buffers cached
Mem: 7733 3203 4530 0 185 2166
-/+ buffers/cache: 851 6881
Swap: 1023 0 1023
Worker Node 2:
total used free shared buffers cached
Mem: 7733 3402 4331 0 185 2399
-/+ buffers/cache: 817 6915
Swap: 1023 0 1023
In the yarn-site.xml file, I have the following properties set:
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>7733</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>7733</value>
</property>
In $SPARK_HOME/conf/spark-defaults.conf I set spark.executor.cores to 2 and spark.executor.instances to 2.
When looking at the Spark history UI after running my Spark application, both executors (1 and 2) show up in the "Executors" tab along with the driver. In the cores column on that same page, it says 2 for both executors.
When I set executor-memory to 5 GB or lower, my Spark application runs fine with an executor on each worker node. When I set the executor memory to 6 GB or more, only one worker node runs an executor. Why does this happen? Note: I have tried increasing yarn.nodemanager.resource.memory-mb and it doesn't change this behavior.
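Not an answer from this thread, but one way to narrow this down, as a sketch, is to ask YARN what was actually requested and granted for the run (the application id below is a placeholder):
$ yarn application -list -appStates ALL                    # find the application id of the run
$ yarn application -status application_1234567890123_0001  # shows the application's state, queue, and resource usage
$ yarn node -list                                          # shows how many containers landed on each node
Comparing the per-executor request (executor memory plus overhead) against what each NodeManager has left after the ApplicationMaster is placed may show whether the second executor simply no longer fits.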

While running a Hive script I am getting a "container memory limit exceeded" error and the query fails. Currently it is set to the default of 1 GB

I am running on a 4-node cluster.
Please suggest what size I should set for the container memory. If it should be more than 1 GB, then exactly what size?
What are the criteria for configuring container memory?
Error:
Diagnostic Messages for this Task:
Container [pid=46465,containerID=container_1503271937182_4757_01_000032] is
running beyond physical memory limits. Current usage: 1.0 GB of 1 GB
physical memory used; 2.7 GB of 2.1 GB virtual memory used. Killing
container.
Dump of the process-tree for container_1503271937182_4757_01_000032 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES)
FULL_CMD_LINE
|- 46465 46463 46465 46465 (bash) 0 0 108654592 308 /bin/bash -c /usr/java/jdk1.8.0_121/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx820m -Djava.io.tmpdir=/data3/yarn/nm/usercache/hdfs/appcache/application_1503271937182_4757/container_1503271937182_4757_01_000032/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.104.72.113 58079 attempt_1503271937182_4757_m_000015_3 32 1>/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032/stdout
2>/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032/stderr
|- 46483 46465 46465 46465 (java) 2929 1281 2828042240 262018 /usr/java/jdk1.8.0_121/bin/java -Djava.net.preferIPv4Stack=true
-Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx820m -Djava.io.tmpdir=/data3/yarn/nm/usercache/hdfs/appcache/application_1503271937182_4757/container_1503271937182_4757_01_000032/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.104.72.113 58079 attempt_1503271937182_4757_m_000015_3 32
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 18 Reduce: 72 Cumulative CPU: 1219.78 sec HDFS Read: 3412303867 HDFS Write: 3935714 SUCCESS
Stage-Stage-9: Map: 18 Reduce: 72 Cumulative CPU: 332.43 sec HDFS Read: 3321536722 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 25 minutes 52 seconds 210 msec
hive failed
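Not from the original thread, but as a rough sketch of the kind of change usually tried here (the values are illustrative, not a recommendation for this cluster): raise the map/reduce container size and keep the heap at roughly 80% of it, for example from the hive command line; my_query.hql is a placeholder for the failing script:
$ hive --hiveconf mapreduce.map.memory.mb=4096 \
       --hiveconf mapreduce.map.java.opts=-Xmx3276m \
       --hiveconf mapreduce.reduce.memory.mb=4096 \
       --hiveconf mapreduce.reduce.java.opts=-Xmx3276m \
       -f my_query.hql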

Hadoop TeraSort not using all cluster nodes

Question
Regarding the TeraSort demo in Hadoop, please advise whether the symptom below is expected or whether the workload should be distributed across the nodes.
Background
I started Hadoop (a 3-node cluster) and ran the TeraSort benchmark as shown below under Executions.
I expected all 3 nodes to get busy and all CPUs to be fully utilized (400% in top). However, only the node on which the job started got busy, and even its CPU was not fully utilized. For example, if the job is started on sydspark02, top shows the output below.
I wonder whether this is expected or whether there is a configuration issue that prevents the workload from being distributed among the nodes.
sydspark02
top - 13:37:12 up 5 days, 2:58, 2 users, load average: 0.22, 0.06, 0.12
Tasks: 134 total, 1 running, 133 sleeping, 0 stopped, 0 zombie
%Cpu(s): 27.5 us, 2.7 sy, 0.0 ni, 69.8 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 8175980 total, 1781888 used, 6394092 free, 68 buffers
KiB Swap: 0 total, 0 used, 0 free. 532116 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2602 hadoop 20 0 2191288 601352 22988 S 120.7 7.4 0:15.52 java
1197 hadoop 20 0 105644 976 0 S 0.3 0.0 0:00.16 sshd
2359 hadoop 20 0 2756336 270332 23280 S 0.3 3.3 0:08.87 java
sydspark01
top - 13:38:32 up 2 days, 19:28, 2 users, load average: 0.15, 0.07, 0.11
Tasks: 141 total, 1 running, 140 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 1.2 sy, 0.0 ni, 96.6 id, 2.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 10240364 total, 10092352 used, 148012 free, 648 buffers
KiB Swap: 0 total, 0 used, 0 free. 8527904 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10635 hadoop 20 0 2766264 238540 23160 S 3.0 2.3 0:11.15 java
11353 hadoop 20 0 2770680 287504 22956 S 1.0 2.8 0:08.97 java
11057 hadoop 20 0 2888396 327260 23068 S 0.7 3.2 0:12.42 java
sydspark03
top - 13:44:21 up 5 days, 1:01, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 124 total, 1 running, 123 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.7 us, 0.0 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 8175980 total, 4552876 used, 3623104 free, 1156 buffers
KiB Swap: 0 total, 0 used, 0 free. 3818884 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29374 hadoop 20 0 2729012 204180 22952 S 3.0 2.5 0:07.47 java
Executions
> sbin/start-dfs.sh
Starting namenodes on [sydspark01]
sydspark01: starting namenode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-namenode-sydspark01.out
sydspark03: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-datanode-sydspark03.out
sydspark02: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-datanode-sydspark02.out
sydspark01: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-datanode-sydspark01.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-secondarynamenode-sydspark01.out
> sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-resourcemanager-sydspark01.out
sydspark01: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-sydspark01.out
sydspark03: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-sydspark03.out
sydspark02: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-sydspark02.out
> hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar teragen 100000000 /user/hadoop/terasort-input
> hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar terasort /user/hadoop/terasort-input /user/hadoop/terasort-output
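One thing worth checking, as a sketch and not part of the original question: teragen takes its map count from mapreduce.job.maps, which defaults to a small number, so the generate step alone can leave most nodes idle. The map count can be passed explicitly (the output path below is a placeholder):
> hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar teragen -Dmapreduce.job.maps=12 100000000 /user/hadoop/terasort-input-12maps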
Configuration files
slaves
sydspark01
sydspark02
sydspark03
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://sydspark01:9000</value>
</property>
</configuration>
Environment
Ubuntu 14.04.4 LTS, 4 CPUs, on VMware
Hadoop 2.7.3
Java 8
Monitoring with JMC
The following was observed when running JMC against the DataNode on the node where the job is executed.
CPU
Only about 25% of CPU resource (1 CPU out of 4) is used.
Memory
Yarn
$ yarn node -list
16/10/03 15:36:03 INFO client.RMProxy: Connecting to ResourceManager at sydspark01/143.96.102.161:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
sydspark03:57249 RUNNING sydspark03:8042 0
sydspark02:42220 RUNNING sydspark02:8042 0
sydspark01:50445 RUNNING sydspark01:8042 0
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>sydspark01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>sydspark01:8032</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>sydspark01:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>sydspark01:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>sydspark01:8033</value>
</property>
</configuration>
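Also as a sketch, not from the original post: a quick way to check what memory and vcores each NodeManager actually advertises to the scheduler is to query it directly, using the node ids printed by yarn node -list above:
$ yarn node -status sydspark02:42220    # prints the node's memory/vCore capacity and current usage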

spring-xd yarn admin yarn-container fails

Version: spring-xd-1.0.1
Distributed mode: yarn
Hadoop version: cdh5
I have modified config/servers.yml to point to the right applicationDir, zookeeper, hdfs, resourcemanager, redis, and mysqldb.
However, after the push, when I start the admin it is killed by YARN after some time.
I do not understand why the container would consume 31 GB of (virtual) memory.
Please point me in the right direction to debug this problem. Also, how do I increase the log level?
The following error is observed in the logs:
Got ContainerStatus=[container_id { app_attempt_id { application_id { id: 432 cluster_timestamp: 1415816376410 } attemptId: 1 } id: 2 } state: C_COMPLETE diagnostics: "Container [pid=19374,containerID=container_1415816376410_0432_01_000002] is running beyond physical memory limits. Current usage: 1.2 GB of 1 GB physical memory used; 31.7 GB of 2.1 GB virtual memory used. Killing container.\nDump of the process-tree for container_1415816376410_0432_01_000002 :\n\t|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE\n\t|- 19381 19374 19374 19374 (java) 3903 121 33911242752 303743 /usr/java/jdk1.7.0_45-cloudera/bin/java -DxdHomeDir=./spring-xd-yarn-1.0.1.RELEASE.zip -Dxd.module.config.location=file:./modules-config.zip/ -Dspring.application.name=admin -Dspring.config.location=./servers.yml org.springframework.xd.dirt.server.AdminServerApplication \n\t|- 19374 24125 19374 19374 (bash) 0 0 110804992 331 /bin/bash -c /usr/java/jdk1.7.0_45-cloudera/bin/java -DxdHomeDir=./spring-xd-yarn-1.0.1.RELEASE.zip -Dxd.module.config.location=file:./modules-config.zip/ -Dspring.application.name=admin -Dspring.config.location=./servers.yml org.springframework.xd.dirt.server.AdminServerApplication 1>/var/log/hadoop-yarn/container/application_1415816376410_0432/container_1415816376410_0432_01_000002/Container.stdout 2>/var/log/hadoop-yarn/container/application_1415816376410_0432/container_1415816376410_0432_01_000002/Container.stderr \n\nContainer killed on request. Exit code is 143\nContainer exited with a non-zero exit code 143\n" exit_status: 143
Yes, with the current versions 1.1.0/1.1.1 you don't need to run the admin explicitly. The containers and the admin will be instantiated by YARN when you submit the application.
