correct way to increase hdfs java heap memory - hadoop

I'm getting the following errors in my hadoop namenode log:
2015-12-20 06:15:40,717 WARN [IPC Server handler 21 on 9000] ipc.Server
(Server.java:run(2029)) - IPC Server handler 21 on 9000, call
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport
from 172.31.21.110:46999 Call#163559 Retry#0:
error: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
2015-12-20 06:15:42,710 WARN [IPC Server handler 22 on 9000] ipc.Server
(Server.java:run(2029)) - IPC Server handler 22 on 9000, call
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from
172.31.24.250:45624 Call#164898 Retry#0:
error: java.lang.OutOfMemoryError: Java heap space
which results in all the nodes being listed as dead.
I have checked other Stack Overflow questions, and the most useful suggestion seems to be setting mapred.child.java.opts in conf/mapred-site.xml to something higher than 2048 MB,
but I'm concerned that might not be enough.
I'm launching my cluster with Spark using the --hadoop-major-version=yarn option, so if I understand correctly all MapReduce jobs run through YARN, including jobs created by HDFS.
My question is: what other settings, if any, do I need to modify (and how do I determine their values, given that I want to use, say, 4 GB for mapred.child.java.opts) to increase the memory available to HDFS's MapReduce jobs?

Hadoop daemons control their JVM arguments, including heap size settings, through the use of environment variables that have names suffixed with _OPTS. These environment variables are defined in various *-env.sh files in the configuration directory.
Using the NameNode as an example, you can set a line like this in your hadoop-env.sh file.
export HADOOP_NAMENODE_OPTS="-Xms4G -Xmx4G $HADOOP_NAMENODE_OPTS"
This sets a minimum/maximum heap size of 4 GB for the NameNode and also preserves any other arguments that were placed into HADOOP_NAMENODE_OPTS earlier in the script.
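The same pattern applies to the other daemons. As a hedged sketch (the sizes below are illustrative assumptions, not sizing recommendations):

```shell
# hadoop-env.sh -- illustrative sketch only; choose sizes for your own workload.
# Each daemon-specific _OPTS variable follows the same convention.
export HADOOP_NAMENODE_OPTS="-Xms4G -Xmx4G $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xms1G -Xmx1G $HADOOP_DATANODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Xms4G -Xmx4G $HADOOP_SECONDARYNAMENODE_OPTS"
```

NameNode heap scales with the number of blocks and files it tracks, so the DataNode typically needs far less than the NameNode.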

Related

Setting JVM options when configuring Elasticsearch

I'm configuring JVM options for an Elasticsearch cluster, and I wonder which JVM heap size
would be best for my use case.
The machine has 16 GB of memory and will be dedicated to a single Elasticsearch node.
The default value is 1 GB, and while I'm not familiar with Java/JVM internals, I feel this is too small.
Any help would be appreciated.
If you use Windows, you can press Win + R, run systempropertiesadvanced, and then set an environment variable, for example:
ES_JAVA_OPTS
-Xms2g -Xmx2g
(You can increase the value as needed; g means gigabytes and m means megabytes.)
Reference document: https://www.elastic.co/guide/en/elasticsearch/reference/master/advanced-configuration.html#set-jvm-options
https://www.javadevjournal.com/java/jvm-parameters/
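On Linux or macOS, the same variable can be exported in the shell that launches Elasticsearch. A hedged sketch, where 8 GB follows the common rule of thumb of roughly half the RAM of a dedicated 16 GB machine (not an official sizing):

```shell
# Sketch only: export ES_JAVA_OPTS before starting Elasticsearch from this shell.
# 8g assumes a dedicated 16 GB machine; adjust for your own hardware.
export ES_JAVA_OPTS="-Xms8g -Xmx8g"
```

Setting -Xms equal to -Xmx avoids heap resizing pauses at runtime.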

memory usage grows until VM crashes while running Wildfly 9 with Java 8

We are having an issue with virtual servers (VMs) running out of native memory. These VMs are running:
Red Hat Enterprise Linux 7.2 (Maipo)
Wildfly 9.0.1
Java 1.8.0_151 (different JVMs have different heap sizes, ranging from 0.5 GB to 2 GB)
The JVM args are:
-XX:+UseG1GC
-XX:SurvivorRatio=1
-XX:NewRatio=2
-XX:MaxTenuringThreshold=15
-XX:-UseAdaptiveSizePolicy
-XX:G1HeapRegionSize=16m
-XX:MaxMetaspaceSize=256m
-XX:CompressedClassSpaceSize=64m
-javaagent:/<path to new relic.jar>
After about a month, sometimes longer, the VMs start to consume all of their swap space, and eventually the OOM killer decides the java process is using too much memory and kills one of our JVMs.
The amount of memory used by the java process is larger than heap + metaspace + compressed class space, as revealed by -XX:NativeMemoryTracking=detail.
Are there tools that could tell me what is in this native memory (like a heap dump, but for memory outside the heap)?
Are there any tools other than jemalloc that can map Java heap usage to native memory usage outside the heap? I have used jemalloc to try to achieve this, but the graph it draws contains only hex values rather than human-readable class names, so I can't get anything out of it. Maybe I'm doing something wrong, or perhaps I need another tool.
Any suggestions would be greatly appreciated.
You can use jcmd.
Start the application with -XX:NativeMemoryTracking=summary or -XX:NativeMemoryTracking=detail.
Then use jcmd to monitor the NMT (native memory tracker):
jcmd <pid> VM.native_memory baseline // take the baseline
jcmd <pid> VM.native_memory detail.diff // show what has changed in native memory relative to the baseline

Heap Size vs HADOOP_NAMENODE_OPTS at namenode

I am using Apache Hadoop 2.7.1 in an HA cluster.
I needed to increase the heap memory on both NameNodes, so I updated
the HADOOP_NAMENODE_OPTS property in hadoop-env.sh to 8 GB:
export HADOOP_NAMENODE_OPTS="-Xmx8192m $HADOOP_NAMENODE_OPTS"
so the heap size on my NameNodes is now 8 GB.
But then I noticed the HADOOP_HEAPSIZE parameter in hadoop-env.sh,
which I haven't given any value.
Is setting HADOOP_NAMENODE_OPTS to 8 GB enough, or should I set HADOOP_HEAPSIZE to 8 GB too?
In other words, does HADOOP_NAMENODE_OPTS override HADOOP_HEAPSIZE,
or should both be configured, each with its own specific job?
Does the value of HADOOP_NAMENODE_OPTS override the value of HADOOP_HEAPSIZE?
Yes, it does: HADOOP_HEAPSIZE only supplies the default maximum heap for the Hadoop daemons, and a daemon-specific -Xmx in HADOOP_NAMENODE_OPTS takes precedence. https://www.cloudera.com/documentation/enterprise/latest/topics/admin_nn_memory_config.html
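To make the precedence concrete, here is a hedged sketch of how the launch script effectively composes the java command line (simplified; the variable names match hadoop-env.sh, but the exact assembly logic in bin/hadoop differs):

```shell
# Simplified sketch, not the real bin/hadoop logic: HADOOP_HEAPSIZE
# supplies a default -Xmx for every daemon, and the daemon-specific
# _OPTS string is appended afterwards.
HADOOP_HEAPSIZE=1000                      # default max heap, in MB
HADOOP_NAMENODE_OPTS="-Xmx8192m"          # daemon-specific override
JAVA_HEAP_MAX="-Xmx${HADOOP_HEAPSIZE}m"
CMDLINE="java ${JAVA_HEAP_MAX} ${HADOOP_NAMENODE_OPTS} NameNode"
# When the JVM sees -Xmx twice, the last occurrence wins, so the
# NameNode runs with 8 GB here.
echo "$CMDLINE"
```

Because the daemon-specific flag comes last on the command line, it wins regardless of what HADOOP_HEAPSIZE is set to.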

JMeter issues when running large number of threads

I'm testing with Apache JMeter: I'm simply accessing one page of my company's website and increasing the number of users until it reaches a threshold. The problem is that when I get to around 3000 threads, JMeter doesn't run all of them. Looking at the Aggregate Graph,
it only runs about 2,536 (this number varies but is always around there) of them.
The partial run comes with the following exception in the logs:
01:16 ERROR - jmeter.JMeter: Uncaught exception:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Unknown Source)
at org.apache.jmeter.threads.ThreadGroup.start(ThreadGroup.java:293)
at org.apache.jmeter.engine.StandardJMeterEngine.startThreadGroup(StandardJMeterEngine.java:476)
at org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:395)
at java.lang.Thread.run(Unknown Source)
This behavior is consistent. In addition one of the times JMeter crashed in the middle outputting a file that said:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 32756 bytes for ChunkPool::allocate
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (allocation.cpp:211), pid=10748, tid=11652
#
# JRE version: 6.0_31-b05
# Java VM: Java HotSpot(TM) Client VM (20.6-b01 mixed mode, sharing windows-x86 )
Any ideas?
I tried changing the heap size in jmeter.bat, but that didn't seem to help at all.
The JVM is simply not capable of running that many threads. And even if it were, JMeter would consume a lot of CPU purely on context switching. In other words, above some point you are no longer benchmarking your web application but the client computer hosting JMeter.
You have a few choices:
experiment with JVM options, e.g. decrease the default -Xss512K to something smaller
run JMeter in a cluster
use tools that take a radically different approach, like Gatling
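A back-of-envelope calculation shows why the stack size matters here (512 KB per thread is just an illustrative figure; the actual default depends on the platform and JVM):

```shell
# Each Java thread reserves native (off-heap) memory for its stack,
# so thread count times -Xss bounds the stack memory alone.
THREADS=3000
XSS_KB=512
STACK_MB=$(( THREADS * XSS_KB / 1024 ))
echo "${STACK_MB} MB of native memory reserved for thread stacks"
# prints: 1500 MB of native memory reserved for thread stacks
```

On a 32-bit JVM with a process limit around 2 GB, 1.5 GB of stacks plus the heap easily exhausts the address space, which matches the "unable to create new native thread" error above.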
I had a similar problem; increasing the heap size in jmeter.bat to 1024 MB fixed it.
set HEAP=-Xms1024m -Xmx1024m
For the JVM, the crash log quoted above already lists some solutions, among which are:
switch to a 64-bit JVM (newer than 6u25)
with this you will be able to allocate a larger heap (-Xmx); ensure you have the RAM for it
reduce Xss with:
-Xss256k
Then for JMeter, follow the best practices:
http://jmeter.apache.org/usermanual/best-practices.html
http://www.ubik-ingenierie.com/blog/jmeter_performance_tuning_tips/
Finally, ensure you use the latest JMeter version.
Prefer a Linux OS.
Tune the TCP stack and system limits.
Success will depend on your machine's power (CPU and memory) and your test plan.
If this is not enough (for 3000 threads it should be OK), you may need to use distributed testing.
Increasing the heap size in jmeter.bat works fine
set HEAP=-Xms1024m -Xmx1024m
OR
if you are using jmeter.sh, you can do something like:
JVM_ARGS="-Xms512m -Xmx1024m" jmeter.sh etc.
I ran into this same problem and the only solution that helped me is: https://stackoverflow.com/a/26190804/5796780
To run 100k threads properly on Linux:
ulimit -s 256
ulimit -i 120000
echo 120000 > /proc/sys/kernel/threads-max
echo 600000 > /proc/sys/vm/max_map_count
echo 200000 > /proc/sys/kernel/pid_max
If you don't have root access:
echo 200000 | sudo dd of=/proc/sys/kernel/pid_max
After increasing the Xms and Xmx heap sizes, I had to make my Java run in 64-bit mode. In jmeter.bat:
set JM_LAUNCH=java.exe -d64
Obviously, you need to run a 64-bit OS and have 64-bit Java installed (see https://www.java.com/en/download/manual.jsp).

Hadoop verification block

I have a problem when starting Hadoop:
DataBlockScanner consumes up to 100% of one CPU.
Master log is:
2012-04-02 11:25:49,793 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK NameSystem.processReport: from 192.168.33.44:50010, blocks: 16148, processing time: 13 msecs
Slave log is:
2012-04-02 11:09:34,109 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_-1757906724564777881_10532084
I checked hadoop fsck and found no error or corrupt block.
Why is the CPU usage so high, and how to stop the block verification?
Without digging through the source to confirm, this is probably only a problem on startup, as the DataNode has to walk the data directory (or directories) to discover all the blocks and then report them to the NameNode. Again, without the source I'm unable to confirm whether the checksums of each block are also verified on startup, which could be the cause of the 100% CPU.
Thanks. I think my CPU usage was so high because of the leap second; I think the problem is in Java. When I start Hadoop, CPU usage spikes.
http://en.wikipedia.org/wiki/Leap_second