Heap Size vs HADOOP_NAMENODE_OPTS at namenode - hadoop

I am using Apache Hadoop 2.7.1 in an HA cluster.
I needed to increase the heap memory for both NameNodes, so I updated
the property HADOOP_NAMENODE_OPTS in hadoop-env.sh to 8 GB:
export HADOOP_NAMENODE_OPTS="-Xmx8192m $HADOOP_NAMENODE_OPTS"
So the heap size on my NameNodes is now 8 GB.
But then I noticed the parameter HADOOP_HEAPSIZE in hadoop-env.sh,
which I have not given any value.
Is setting HADOOP_NAMENODE_OPTS to 8 GB enough, or should HADOOP_HEAPSIZE be set to 8 GB as well?
In other words, does the value in HADOOP_NAMENODE_OPTS override the value in HADOOP_HEAPSIZE,
or should both be configured, each with its own specific job?

does the value HADOOP_NAMENODE_OPTS override the value HADOOP_HEAPSIZE

Yes, it does. HADOOP_HEAPSIZE sets the generic maximum heap passed to Hadoop daemons, while the daemon-specific HADOOP_NAMENODE_OPTS is appended later on the Java command line, so its -Xmx takes precedence for the NameNode. https://www.cloudera.com/documentation/enterprise/latest/topics/admin_nn_memory_config.html
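As a minimal sketch of how the two variables coexist (assuming Hadoop 2.x with hadoop-env.sh in the usual configuration directory; the sizes are illustrative), the daemon-specific -Xmx wins for the NameNode, while HADOOP_HEAPSIZE still covers daemons that set no -Xmx of their own:

# hadoop-env.sh
# Generic default heap (in MB) for daemons that do not set their own -Xmx
export HADOOP_HEAPSIZE=1024
# NameNode-specific options; this -Xmx is appended after the generic heap flag,
# so it takes precedence for the NameNode process
export HADOOP_NAMENODE_OPTS="-Xmx8192m $HADOOP_NAMENODE_OPTS"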

Related

Setting JVM options when configuring elastic search

I'm configuring JVM options for an Elasticsearch cluster, and I wonder which JVM heap size
would be best for my use case.
The machine has 16 GB of memory and will be dedicated to a single Elasticsearch node.
The default value is 1 GB, and I'm not familiar with Java/JVM, but I feel like this is too small.
Any help would be appreciated.
If you use Windows, press Windows + R, run systempropertiesadvanced, and add an environment variable, for example:
ES_JAVA_OPTS
-Xms2g -Xmx2g
(You can increase the value as needed; the number is the size, g means gigabytes, m means megabytes.)
Reference document: https://www.elastic.co/guide/en/elasticsearch/reference/master/advanced-configuration.html#set-jvm-options
https://www.javadevjournal.com/java/jvm-parameters/
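On Linux or macOS, a rough equivalent (a sketch; the install path depends on how Elasticsearch was set up) is to export the same variable before starting the node:

# Fix minimum and maximum heap to the same value, then start Elasticsearch
export ES_JAVA_OPTS="-Xms2g -Xmx2g"
./bin/elasticsearch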

How do I increase Tez's container physical memory?

I've been running some Hive scripts on an AWS EMR 4.8 cluster with Hive 1.0 and Tez 0.8.
My configurations look like this:
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
set hive.execution.engine=tez;
set hive.merge.mapfiles=false;
SET hive.default.fileformat=Orc;
set tez.task.resource.memory.mb=5000;
SET hive.tez.container.size=6656;
SET hive.tez.java.opts=-Xmx5120m;
set hive.optimize.ppd=true;
And my global configs are:
hadoop-env.export HADOOP_HEAPSIZE 4750
hadoop-env.export HADOOP_DATANODE_HEAPSIZE 4750
hive-env.export HADOOP_HEAPSIZE 4750
While running my script, I get the following error:
Container [pid=19027,containerID=container_1477393351192_0007_02_000001] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 1.9 GB of 5 GB virtual memory used. Killing container.
On googling this error, I read that setting tez.task.resource.memory.mb would change the physical memory limit, but clearly I was mistaken. What am I missing?
I have had this problem a lot. Changing
set hive.tez.container.size=6656;
set hive.tez.java.opts=-Xmx4g;
does not fix the problem for me, but this does:
set tez.am.resource.memory.mb=4096;
Set the Tez container size to be a larger multiple of the YARN minimum container size, here 4 GB (the value is an integer number of MB):
SET hive.tez.container.size=4096;
"hive.tez.container.size" and "hive.tez.java.opts" are the parameters that alter Tez memory settings in Hive. If "hive.tez.container.size" is set to "-1" (default value), it picks the value of "mapreduce.map.memory.mb". If "hive.tez.java.opts" is not specified, it relies on the "mapreduce.map.java.opts" setting. Thus, if Tez specific memory settings are left as default values, memory sizes are picked from mapreduce mapper memory settings "mapreduce.map.memory.mb".
https://documentation.altiscale.com/memory-settings-for-tez
For more info Tez configuration and Tez memory tuning
Note: when setting this through Ambari, the value is entered in MB.
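As a rough illustration (the sizes here are assumptions, not values from the question), the two Hive-side settings are normally sized together, with the -Xmx at roughly 80% of the container size:

-- Tez container size for Hive tasks, in MB
SET hive.tez.container.size=4096;
-- JVM heap for those tasks, commonly about 80% of the container size
SET hive.tez.java.opts=-Xmx3276m;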
In case anyone else stumbles upon this thread trying to solve the problem above, here is a link to a real solution that worked for me when all the other solutions did not.
http://moi.vonos.net/bigdata/hive-cli-memory/
TL;DR: add these to your hive call:
--hiveconf tez.am.resource.memory.mb=<size as int>
--hiveconf tez.am.launch.cmd-opts=""
set hive.tez.container.size=6656
set hive.tez.java.opts=-Xmx4g
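Putting that together, a sketch of a full invocation (the script name and the 4 GB sizing are made up for illustration) might look like:

hive --hiveconf tez.am.resource.memory.mb=4096 \
     --hiveconf tez.am.launch.cmd-opts="" \
     --hiveconf hive.tez.container.size=4096 \
     --hiveconf hive.tez.java.opts=-Xmx3276m \
     -f my_script.hql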

correct way to increase hdfs java heap memory

I'm getting the following errors in my hadoop namenode log:
2015-12-20 06:15:40,717 WARN [IPC Server handler 21 on 9000] ipc.Server
(Server.java:run(2029)) - IPC Server handler 21 on 9000, call
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport
from 172.31.21.110:46999 Call#163559 Retry#0:
error: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
2015-12-20 06:15:42,710 WARN [IPC Server handler 22 on 9000] ipc.Server
(Server.java:run(2029)) - IPC Server handler 22 on 9000, call
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from
172.31.24.250:45624 Call#164898 Retry#0:
error: java.lang.OutOfMemoryError: Java heap space
which results in all the nodes being listed as dead.
I have checked other Stack Overflow questions, and the most useful suggestion seems to be that I need to set the mapred.child.java.opts option in conf/mapred-site.xml to something higher than 2048 MB,
but I'm concerned that might not be enough.
I'm launching my cluster using Spark with the --hadoop-major-version=yarn option, so all MapReduce jobs are run through YARN if I understand correctly, including jobs created by HDFS.
My question is: what other settings, if any, do I need to modify (and how do I determine their values, given that I want to use, say, 4 GB for the mapreduce.child.java.opts setting) to increase the memory available to HDFS's MapReduce jobs?
Hadoop daemons control their JVM arguments, including heap size settings, through the use of environment variables that have names suffixed with _OPTS. These environment variables are defined in various *-env.sh files in the configuration directory.
Using the NameNode as an example, you can set a line like this in your hadoop-env.sh file.
export HADOOP_NAMENODE_OPTS="-Xms4G -Xmx4G $HADOOP_NAMENODE_OPTS"
This sets a minimum/maximum heap size of 4 GB for the NameNode and also preserves any other arguments that were placed into HADOOP_NAMENODE_OPTS earlier in the script.
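The same pattern applies to the other HDFS daemons; here is a sketch using the variable names defined in the stock Hadoop 2.x hadoop-env.sh (the sizes are illustrative only):

# hadoop-env.sh
export HADOOP_NAMENODE_OPTS="-Xms4G -Xmx4G $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Xms4G -Xmx4G $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xmx2G $HADOOP_DATANODE_OPTS"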

elasticsearch index getting reset

I have a single-node Elasticsearch instance (version 0.90) running on a single machine (8 GB RAM, dual-core CPU) with RHEL 5.6.
After indexing close to 2 million documents, it runs fine for a few hours and then restarts on its own, wiping out the index in the process. I then need to reindex all the documents again.
Any ideas on why this happens? The maximum number of file descriptors is set to 32k, and the number of open file descriptors at any time does not even come close to that, so it can't be the cause.
Here are the modifications I made to the default elasticsearch.yml file:
index.number_of_shards: 5
index.cache.field.type: soft
index.fielddata.cache: soft
index.cache.field.expire: 5m
indices.fielddata.cache.size: 10%
indices.fielddata.cache.expire : 5m
index.store.type: mmapfs
bootstrap.mlockall: true
discovery.zen.ping.multicast.enabled: false
action.disable_delete_all_indices: true
script.disable_dynamic: true
I use the elasticsearch service wrapper to start and stop the instance. In the elasticsearch.conf file, I have set the heap size to 2 GB:
set.default.ES_HEAP_SIZE=2048
Any help in diagnosing the problem will be appreciated.
Thanks guys!

JMeter issues when running large number of threads

I'm testing with Apache JMeter: I'm simply accessing one page of my company's website and turning up the number of users until it reaches a threshold. The problem is that when I get to around 3000 threads, JMeter doesn't run all of them. Looking at the Aggregate Graph,
it only runs about 2,536 of them (this number varies but is always around there).
The partial run comes with the following exception in the logs:
01:16 ERROR - jmeter.JMeter: Uncaught exception:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Unknown Source)
at org.apache.jmeter.threads.ThreadGroup.start(ThreadGroup.java:293)
at org.apache.jmeter.engine.StandardJMeterEngine.startThreadGroup(StandardJMeterEngine.java:476)
at org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:395)
at java.lang.Thread.run(Unknown Source)
This behavior is consistent. In addition, one of the times JMeter crashed partway through, producing a file that said:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 32756 bytes for ChunkPool::allocate
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (allocation.cpp:211), pid=10748, tid=11652
#
# JRE version: 6.0_31-b05
# Java VM: Java HotSpot(TM) Client VM (20.6-b01 mixed mode, sharing windows-x86 )
Any ideas?
I tried changing the heap size in jmeter.bat, but that didn't seem to help at all.
The JVM is simply not capable of running so many threads. And even if it were, JMeter would consume a lot of CPU resources purely on context switching. In other words, beyond some point you are no longer benchmarking your web application but the client computer hosting JMeter.
You have a few choices:
experiment with JVM options, e.g. decrease the default -Xss512K to something smaller (see the sketch after this list)
run JMeter in a cluster
use tools that take a radically different approach, like Gatling
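A minimal sketch of the first option, assuming the test plan is saved as testplan.jmx (JVM_ARGS is honored by jmeter.sh, as one of the answers below also shows):

# Shrink the per-thread stack so more native threads fit in the process
JVM_ARGS="-Xss256k" ./jmeter.sh -n -t testplan.jmx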
I had a similar issue and increased the heap size in jmeter.bat to 1024M and that fixed the issue.
set HEAP=-Xms1024m -Xmx1024m
For the JVM, if you read the crash log (the error file quoted above), it gives you some solutions, among which are:
switch to a 64-bit JVM (> 6u25); with this you will be able to allocate a larger heap (-Xmx), provided you have the RAM
reduce the thread stack size with:
-Xss256k
Then for JMeter, follow best-practices:
http://jmeter.apache.org/usermanual/best-practices.html
http://www.ubik-ingenierie.com/blog/jmeter_performance_tuning_tips/
Finally, ensure you use the latest JMeter version.
Preferably use a Linux OS.
Tune the TCP stack and system limits.
Success will depend on your machine's power (CPU and memory) and your test plan.
If this is not enough (for 3000 threads it should be OK), you may need to use distributed testing.
Increasing the heap size in jmeter.bat works fine:
set HEAP=-Xms1024m -Xmx1024m
OR
you can do something like the following if you are using jmeter.sh:
JVM_ARGS="-Xms512m -Xmx1024m" ./jmeter.sh etc.
I ran into this same problem and the only solution that helped me is: https://stackoverflow.com/a/26190804/5796780
To properly run on the order of 100k threads on Linux:
ulimit -s 256
ulimit -i 120000
echo 120000 > /proc/sys/kernel/threads-max
echo 600000 > /proc/sys/vm/max_map_count
echo 200000 > /proc/sys/kernel/pid_max
If you don't have root access:
echo 200000 | sudo dd of=/proc/sys/kernel/pid_max
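To make those kernel settings survive a reboot, a hedged sketch (requires root; the file name is arbitrary) is to put them in a sysctl drop-in and reload:

# /etc/sysctl.d/99-jmeter-threads.conf
kernel.threads-max = 120000
vm.max_map_count = 600000
kernel.pid_max = 200000

# apply without rebooting
sudo sysctl --system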
After increasing the Xms and Xmx heap sizes, I had to make my Java run in 64-bit mode. In jmeter.bat:
set JM_LAUNCH=java.exe -d64
Obviously, you need to run a 64-bit OS and have 64-bit Java installed (see https://www.java.com/en/download/manual.jsp).
