Why does Spark only use one executor on my 2 worker node cluster if I increase the executor memory past 5 GB? - hadoop

I am using a 3 node cluster: 1 master node and 2 worker nodes, using T2.large EC2 instances.
The "free -m" command gives me the following info:
Master:
total used free shared buffers cached
Mem: 7733 6324 1409 0 221 4555
-/+ buffers/cache: 1547 6186
Swap: 1023 0 1023
Worker Node 1:
total used free shared buffers cached
Mem: 7733 3203 4530 0 185 2166
-/+ buffers/cache: 851 6881
Swap: 1023 0 1023
Worker Node 2:
total used free shared buffers cached
Mem: 7733 3402 4331 0 185 2399
-/+ buffers/cache: 817 6915
Swap: 1023 0 1023
In the yarn-site.xml file, I have the following properties set:
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>7733</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>7733</value>
</property>
In $SPARK_HOME/conf/spark-defaults.conf I am setting the spark.executor.cores at 2 and spark.executor.instances at 2.
When looking at the spark-history UI after running my spark application, both executors (1 and 2) show up in the "Executors" tab along with the driver. In the cores column on that same page, it says 2 for both executors.
When I set the executor-memory at 5G and lower, my spark application runs fine with both worker node executors running. When I set the executor memory at 6G or more, only one worker node runs an executor. Why does this happen? Note: I have tried increasing the yarn.nodemanager.resource.memory-mb and it doesn't change this behavior.

Related

Container is running beyond physical memory limits

I have a MapReduce Job that process 1.4 Tb of data.
While doing it, I am getting the error as below.
The number of splits is 6444.
Before starting the job I set the following settings:
conf.set("mapreduce.map.memory.mb", "8192");
conf.set("mapreduce.reduce.memory.mb", "8192");
conf.set("mapreduce.map.java.opts.max.heap", "8192");
conf.set("mapreduce.map.java.opts", "-Xmx8192m");
conf.set("mapreduce.reduce.java.opts", "-Xmx8192m");
conf.set("mapreduce.job.heap.memory-mb.ratio", "0.8");
conf.set("mapreduce.task.timeout", "21600000");
The error:
2018-05-18 00:50:36,595 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1524473936587_2969_m_004719_3: Container [pid=11510,containerID=container_1524473936587_2969_01_004894] is running beyond physical memory limits. Current usage: 8.1 GB of 8 GB physical memory used; 8.8 GB of 16.8 GB virtual memory used. Killing container.
Dump of the process-tree for container_1524473936587_2969_01_004894 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 11560 11510 11510 11510 (java) 14960 2833 9460879360 2133706 /usr/lib/jvm/java-7-oracle-cloudera/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx8192m -Djava.io.tmpdir=/sdk/7/yarn/nm/usercache/administrator/appcache/application_1524473936587_2969/container_1524473936587_2969_01_004894/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.106.79.75 41869 attempt_1524473936587_2969_m_004719_3 4894
|- 11510 11508 11510 11510 (bash) 0 0 11497472 679 /bin/bash -c /usr/lib/jvm/java-7-oracle-cloudera/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx8192m -Djava.io.tmpdir=/sdk/7/yarn/nm/usercache/administrator/appcache/application_1524473936587_2969/container_1524473936587_2969_01_004894/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.106.79.75 41869 attempt_1524473936587_2969_m_004719_3 4894 1>/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894/stdout 2>/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Any help would be really appreciated!
The setting mapreduce.map.memory.mb will set the physical memory size of the container running the mapper (mapreduce.reduce.memory.mb will do the same for the reducer container).
Besure that you adjust the heap value as well. In newer version of YARN/MRv2 the setting mapreduce.job.heap.memory-mb.ratio can be used to have it auto-adjust. The default is .8, so 80% of whatever the container size is will be allocated as the heap. Otherwise, adjust manually using mapreduce.map.java.opts.max.heap and mapreduce.reduce.java.opts.max.heap settings.
BTW, I believe that 1 GB is the default and it is quite low. I recommend reading the below link. It provides a good understanding of YARN and MR memory setting, how they relate, and how to set some baseline settings based on the cluster node size (disk, memory, and cores).
Reference: http://community.cloudera.com/t5/Cloudera-Manager-Installation/ERROR-is-running-beyond-physical-memory-limits/td-p/55173
Try to set yarn memory allocation limits:
SET yarn.scheduler.maximum-allocation-mb=16G;
SET yarn.scheduler.minimum-allocation-mb=8G;
You may lookup other Yarn settings here:
https://www.ibm.com/support/knowledgecenter/STXKQY_BDA_SHR/bl1bda_tuneyarn.htm
Try with : set yarn.app.mapreduce.am.resource.mb=1000;
Explanation is here :
In spark, spark.driver.memoryOverhead is considered in calculating the total memory required for the driver. By default it is 0.10 of the driver-memory or minimum 384MB. In your case it will be 8GB * 0.1 = 9011MB ~= 9G
YARN allocates memory only in increments/multiples of yarn.scheduler.minimum-allocation-mb .
When yarn.scheduler.minimum-allocation-mb=4G, it can only allocate container sizes of 4G,8G,12G etc. So if something like 9G is requested it will round up to the next multiple and will allocate 12G of container size for the driver.
When yarn.scheduler.minimum-allocation-mb=1G, then container sizes of 8G, 9G, 10G are possible. The nearest rounded up size of 9G will be used in this case.
https://community.cloudera.com/t5/Support-Questions/Yarn-Container-is-running-beyond-physical-memory-limits-but/m-p/199353#M161393

Hadoop TeraSort not using all cluster nodes

Question
Regarding the TeraSort demo in hadoop, please suggest if the symptom is as expected or the workload should be distributed.
Background
Started Hadoop (3 nodes in a cluster) and run the TeraSort benchmark as below in Executions.
I expected all 3 nodes would get busy and all CPU would be utilized (400% in top). However only the node on which the job started got busy and the CPU was not fully utilized. For example if it is started on sydspark02, top shows as below.
I wonder this is as expected or if there is a configuration issue by which the workload is not distributed among the nodes.
sydspark02
top - 13:37:12 up 5 days, 2:58, 2 users, load average: 0.22, 0.06, 0.12
Tasks: 134 total, 1 running, 133 sleeping, 0 stopped, 0 zombie
%Cpu(s): 27.5 us, 2.7 sy, 0.0 ni, 69.8 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 8175980 total, 1781888 used, 6394092 free, 68 buffers
KiB Swap: 0 total, 0 used, 0 free. 532116 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2602 hadoop 20 0 2191288 601352 22988 S 120.7 7.4 0:15.52 java
1197 hadoop 20 0 105644 976 0 S 0.3 0.0 0:00.16 sshd
2359 hadoop 20 0 2756336 270332 23280 S 0.3 3.3 0:08.87 java
sydspark01
top - 13:38:32 up 2 days, 19:28, 2 users, load average: 0.15, 0.07, 0.11
Tasks: 141 total, 1 running, 140 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 1.2 sy, 0.0 ni, 96.6 id, 2.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 10240364 total, 10092352 used, 148012 free, 648 buffers
KiB Swap: 0 total, 0 used, 0 free. 8527904 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10635 hadoop 20 0 2766264 238540 23160 S 3.0 2.3 0:11.15 java
11353 hadoop 20 0 2770680 287504 22956 S 1.0 2.8 0:08.97 java
11057 hadoop 20 0 2888396 327260 23068 S 0.7 3.2 0:12.42 java
sydspark03
top - 13:44:21 up 5 days, 1:01, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 124 total, 1 running, 123 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.7 us, 0.0 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 8175980 total, 4552876 used, 3623104 free, 1156 buffers
KiB Swap: 0 total, 0 used, 0 free. 3818884 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29374 hadoop 20 0 2729012 204180 22952 S 3.0 2.5 0:07.47 java
Executions
> sbin/start-dfs.sh
Starting namenodes on [sydspark01]
sydspark01: starting namenode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-namenode-sydspark01.out
sydspark03: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-datanode-sydspark03.out
sydspark02: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-datanode-sydspark02.out
sydspark01: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-datanode-sydspark01.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-secondarynamenode-sydspark01.out
> sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-resourcemanager-sydspark01.out
sydspark01: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-sydspark01.out
sydspark03: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-sydspark03.out
sydspark02: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-sydspark02.out
> hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar teragen 100000000 /user/hadoop/terasort-input
> hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar terasort /user/hadoop/terasort-input /user/hadoop/terasort-output
Configuration files
slaves
sydspark01
sydspark02
sydspark03
core-sites.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://sydspark01:9000</value>
</property>
</configuration>
Environment
Ubuntu 14.04.4 LTS 4CPU on VMWare
Hadoop 2.7.3
Java 8
Monitoring with JMC
When running the JMC on the data node in the node where the job is executed.
CPU
Only about 25% of CPU resource (1 CPU out of 4) is used.
Memory
Yarn
$ yarn node -list
16/10/03 15:36:03 INFO client.RMProxy: Connecting to ResourceManager at sydspark01/143.96.102.161:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
sydspark03:57249 RUNNING sydspark03:8042 0
sydspark02:42220 RUNNING sydspark02:8042 0
sydspark01:50445 RUNNING sydspark01:8042 0
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>sydspark01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>sydspark01:8032</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>sydspark01:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>sydspark01:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>sydspark01:8033</value>
</property>
</configuration>

SonarQube WebServer process spikes CPU after a while

We're running SonarQube 5.1.2 on an AWS node. After a short period of use, typically a day or two, the Sonar web server becomes unresponsive and spikes the server's CPUs:
top - 01:59:47 up 2 days, 3:43, 1 user, load average: 1.89, 1.76, 1.11
Tasks: 93 total, 1 running, 92 sleeping, 0 stopped, 0 zombie
Cpu(s): 94.5%us, 0.0%sy, 0.0%ni, 5.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 7514056k total, 2828772k used, 4685284k free, 155372k buffers
Swap: 0k total, 0k used, 0k free, 872440k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2328 root 20 0 3260m 1.1g 19m S 188.3 15.5 62:51.79 java
11 root 20 0 0 0 0 S 0.3 0.0 0:07.90 events/0
2284 root 20 0 3426m 407m 19m S 0.3 5.5 9:51.04 java
1 root 20 0 19356 1536 1224 S 0.0 0.0 0:00.23 init
The 188% CPU load is coming from the WebServer process:
$ ps -eF|grep "root *2328"
root 2328 2262 2 834562 1162384 0 Mar01 ? 01:06:24 /usr/java/jre1.8.0_25/bin/java -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djruby.management.enabled=false -Djruby.compile.invokedynamic=false -Xmx768m -XX:MaxPermSize=160m -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=/opt/sonar/temp -cp ./lib/common/*:./lib/server/*:/opt/sonar/lib/jdbc/mysql/mysql-connector-java-5.1.34.jar org.sonar.server.app.WebServer /tmp/sq-process615754070383971531properties
We initially thought that we were running on way too small of a node and recently upgraded to an m3-large instance, but we're seeing the same problem (except now it's spiking 2 CPUs instead of one).
The only interesting info in the log is this:
2016.03.04 01:52:38 WARN web[o.e.transport] [sonar-1456875684135] Received response for a request that has timed out, sent [39974ms] ago, timed out [25635ms] ago, action [cluster:monitor/nodes/info], node [[#transport#-1][xxxxxxxx-build02-us-west-2b][inet[/127.0.0.1:9001]]], id [43817]
2016.03.04 01:53:19 INFO web[o.e.client.transport] [sonar-1456875684135] failed to get node info for [#transport#-1][xxxxxxxx-build02-us-west-2b][inet[/127.0.0.1:9001]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[/127.0.0.1:9001]][cluster:monitor/nodes/info] request_id [43817] timed out after [14339ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:366) ~[elasticsearch-1.4.4.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_25]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_25]
at java.lang.Thread.run(Unknown Source) [na:1.8.0_25]
Does anyone know what might be going on here or has some ideas how to further diagnose this problem?

Yarn container lauch failed exception and mapred-site.xml configuration

I have 7 nodes in my Hadoop cluster [8GB RAM and 4VCPUs to each nodes], 1 Namenode + 6 datanodes.
EDIT-1#ARNON: I followed the link, mad calculation according to the hardware configruation on my nodes and have added the update mapred-site and yarn-site.xml files in my question. Still my application is crashing with the same exection
My mapreduce application has 34 input splits with a block size of 128MB.
mapred-site.xml has the following properties:
mapreduce.framework.name = yarn
mapred.child.java.opts = -Xmx2048m
mapreduce.map.memory.mb = 4096
mapreduce.map.java.opts = -Xmx2048m
yarn-site.xml has the following properties:
yarn.resourcemanager.hostname = hadoop-master
yarn.nodemanager.aux-services = mapreduce_shuffle
yarn.nodemanager.resource.memory-mb = 6144
yarn.scheduler.minimum-allocation-mb = 2048
yarn.scheduler.maximum-allocation-mb = 6144
EDIT-2#ARNON: Setting yarn.scheduler.minimum-allocation-mb to 4096 puts all the map task in suspended state and assigning it as 3072 crashes with the follwoing
Exception from container-launch: ExitCodeException exitCode=134: /bin/bash: line 1: 3876 Aborted (core dumped) /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx8192m -Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1424264025191_0002/container_1424264025191_0002_01_000011/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_000011
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 192.168.0.12 50842 attempt_1424264025191_0002_m_000005_0 11 >
/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_000011/stdout 2>
/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_000011/stderr
How can avoid this?any help is appreciated
Is there an option to restrict number of containers on hadoop ndoes?
It seems you are allocating too much memory your tasks (even without looking at all the configurations) 8GB RAM and 8GB per map task and all of which is heap
Try to use lower allocations 2Gb with 1GB heap or something like that

hbase hdfs goes to safemode on process restart ( probably under replication reported? )

UPDATE
You need to give hdfs-site.xml to hbase/conf so hbase can use the correct target replica, else it uses default 3.
That fixes the message. But my namenode is always in safemode during every process restart.
The fsck is all fine with no errors, no under replicated etc.
I see no logs after:
2012-10-17 13:15:13,278 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON.
The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode
will be turned off automatically.
2012-10-17 13:15:14,228 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node:
/default-rack/127.0.0.1:50010
2012-10-17 13:15:14,238 INFO org.apache.hadoop.hdfs.StateChange: BLOCK
NameSystem.processReport: from 127.0.0.1:50010, blocks: 20, processing time: 0 msecs
Any suggestions ?
I have dfs.replication set to 1.
hbase is in distributed mode.
First write goes through, but when I restart namenode always reports blocks as under reported.
Output from hadoop fsck /hbase
/hbase/tb1/.tableinfo.0000000003: Under replicated blk_-6315989673141511716_1029. Target Replicas is 3 but found 1 replica(s).
.
/hbase/tb1/83859abf0f46016485814a5941b16de5/.oldlogs/hlog.1350414672838: Under replicated blk_-7364606700173135939_1027. Target Replicas is 3 but found 1 replica(s).
.
/hbase/tb1/83859abf0f46016485814a5941b16de5/.regioninfo: Under replicated blk_788178851601564156_1027. Target Replicas is 3 but found 1 replica(s).
Status: HEALTHY
Total size: 8731 B
Total dirs: 34
Total files: 25 (Files currently being written: 1)
Total blocks (validated): 25 (avg. block size 349 B) (Total open file blocks (not validated): 1)
Minimally replicated blocks: 25 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 25 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 50 (200.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Tue Oct 16 13:23:55 PDT 2012 in 0 milliseconds
Why does it say target replica is 3 but default replication factor is clearly 1.
Anyone please advice.
My versions are hadoop 1.0.3 and hbase 0.94.1
Thanks!
To force the Hdfs to exit from safemode.
Type this:
hadoop dfsadmin -safemode leave

Resources