HeapDumpOnOutOfMemoryError ElasticSearch

I am seeing this when I do ps -aef | grep elasticsearch (note the -XX:+HeapDumpOnOutOfMemoryError flag):
501 37347 1 0 2:29PM ttys004 0:04.14 /usr/bin/java -Xms4g
-Xmx4g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.path.home=/Users/abdullahmuhammad/elasticsearch -cp :/Users/abdullahmuhammad/elasticsearch/lib/elasticsearch-0.20.6.jar:/Users/abdullahmuhammad/elasticsearch/lib/:/Users/abdullahmuhammad/elasticsearch/lib/sigar/
org.elasticsearch.bootstrap.ElasticSearch
I have tried a few things: playing with the initial heap size, increasing and decreasing it.
I have also deleted my whole index, but still no luck.
I used the following to delete the index:
curl -XDELETE 'http://localhost:9200/_all/'
Any help would be appreciated.

If you use plugins like Marvel, you should check the number of indices and their size, because some plugins create a large number of indices and they can eat all your memory.
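A quick way to see how many indices you have and how big they are is the cat indices API (assuming ES 1.x or later, where _cat exists; on the 0.20.x version from the question you would use the older indices status API, _status, instead):
curl 'http://localhost:9200/_cat/indices?v'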

For the heap, Elasticsearch recommends 50% of your available memory.
In general, Elasticsearch's recommendations for machine memory are: max. 64GB, min. 8GB.
Important documentation:
https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html

A few recommendations:
- Adjust your ES_HEAP_SIZE environment variable.
- Set the mlockall option in the ES config file to true. This locks the heap in RAM so it cannot be swapped out (see the sketch after this list).
- If your system is not very powerful, decrease your shard count. Note that increasing the number of shards improves indexing performance, while increasing the number of replicas improves query performance.
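A minimal sketch of the first two settings, assuming a classic 0.x/1.x-era install where the heap is controlled by the ES_HEAP_SIZE environment variable and memory locking by elasticsearch.yml (the values are placeholders, not recommendations):
# environment, e.g. in /etc/default/elasticsearch or before starting the node
export ES_HEAP_SIZE=2g            # roughly 50% of available RAM, never above ~31g
# config/elasticsearch.yml
bootstrap.mlockall: true          # lock the heap in RAM so it cannot be swapped out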

Related

How to increase heap memory of elasticsearch in Centos 7?

When we run Elasticsearch on the server, we face a broken pipe issue in Elasticsearch:
"org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe"
We just increased the heap memory of Elasticsearch with the following steps.
First, check the current heap size of Elasticsearch:
ps aux | grep elasticsearch
"-Xms1g -Xmx1g"
Increase the heap size:
vi /etc/sysconfig/elasticsearch
# Heap size defaults to 256m min, 1g max
# Set ES_HEAP_SIZE to 50% of available RAM, but no more than 31g
ES_HEAP_SIZE=3g
Check the new heap size:
ps aux | grep elasticsearch
"-Xms3g -Xmx3g"

Hive process memory size

Can you please help me understand what the 512m (the -Xmx512m highlighted in the output below) is in the Hive context?
Which memory setting is it?
I have set export HADOOP_HEAPSIZE=4192 in my hive-site.xml file
hadoop#master:~/hive/conf$ ps -ef | grep 'hive'
hadoop 5587 1 0 Feb14 ? 00:05:27
/usr/lib/jvm/default-jdk/bin/java -Xmx4192m
-Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.3/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.3 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/hadoop-2.7.3/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar
/home/hadoop/hive/lib/hive-metastore-2.3.2.jar
org.apache.hadoop.hive.metastore.HiveMetaStore
hadoop 9903 8034 0 10:54 pts/0 00:00:00 grep --color=auto hive
hadoop 21646 15918 1 07:37 pts/3 00:03:02
/usr/lib/jvm/default-jdk/bin/java -Xmx4192m
-Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.3/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.3 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/hadoop-2.7.3/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dproc_hivecli -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.file=/home/hadoop/hive/conf/parquet-logging.properties
-Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /home/hadoop/hive/lib/hive-cli-2.3.2.jar
org.apache.hadoop.hive.cli.CliDriver
The variable you edited is for the clients, not the servers, and you don't export anything from the XML files.
To configure the heap size for HiveServer2 and the Hive metastore, set the -Xmx parameter in the HADOOP_OPTS variable to the desired maximum heap size in /etc/hive/hive-env.sh
vs.
To configure the heap size for the Beeline CLI, set the HADOOP_HEAPSIZE environment variable in /etc/hive/hive-env.sh.
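As a rough sketch (the values below are placeholders, not recommendations), /etc/hive/hive-env.sh could end up containing both settings:
# heap for HiveServer2 and the Hive metastore
export HADOOP_OPTS="$HADOOP_OPTS -Xmx4g"
# heap for the Beeline CLI, value in MB
export HADOOP_HEAPSIZE=4096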
Besides that, both values made it through, so you might want to look at:
Duplicated Java runtime options : what is the order of preference?

Issue with HDFS command taking 100% cpu

I have an HDFS server to which I am currently streaming data.
I also hit this server regularly with the following command to check for certain conditions: hdfs dfs -find /user/cdh/streameddata/ -name *_processed
However, I have started to see this command taking a massive portion of my CPU when monitoring it in top:
cdh 16919 1 99 13:03 ? 00:43:45 /opt/jdk/bin/java -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/hadoop -Dhadoop.id.str=cdh -Dhadoop.root.logger=ERROR,DRFA -Djava.library.path=/opt/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.fs.FsShell -find /user/cdh/streameddata/ -name *_processed
This is causing other applications to stall, and it is having a massive impact on my application as a whole.
My server is a 48-core server, so I did not expect this to be an issue.
Currently, I have not set any additional heap in hadoop, so it is using the 1000MB default.
If you think your heap is too small, you can run:
jstat -gcutil 16919 # process ID of the hdfs dfs -find command
and look at the value under GCT (garbage collection time) to see how much time you are spending in garbage collection relative to your total run time.
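If a single snapshot is not enough, jstat can also sample repeatedly (interval in milliseconds followed by a sample count), which makes it easier to see whether GC time keeps growing:
jstat -gcutil 16919 1000 10   # ten samples, one per second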
However, if the directory /user/cdh/streameddata/ has hundreds of thousands or millions of files, you probably are legitimately crippling your system.

hadoop ulimit open files name

I have a Hadoop cluster that we assume is performing pretty badly. The nodes are pretty beefy: 24 cores, 60+ GB RAM, etc. And we are wondering whether there are some basic Linux/Hadoop default configurations that prevent Hadoop from fully utilizing our hardware.
There is a post here that described a few possibilities that I think might be true.
I tried logging in to the namenode as root, as hdfs, and as myself, and looking at the output of lsof as well as the ulimit settings. Here is the output; can anyone help me understand why the settings don't match the number of open files?
For example, when I logged in as root, the lsof output looks like this:
[root@box ~]# lsof | awk '{print $3}' | sort | uniq -c | sort -nr
7256 cloudera-scm
3910 root
2173 oracle
1886 hbase
1575 hue
1180 hive
801 mapred
470 oozie
427 yarn
418 hdfs
244 oragrid
241 zookeeper
94 postfix
87 httpfs
...
But when I check out the ulimit output, it looks like this:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 806018
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I am assuming there should be no more than 1024 files opened by one user; however, when you look at the output of lsof, there are 7000+ files opened by one user. Can anyone help explain what is going on here?
Correct me if I have made any mistake in understanding the relation between ulimit and lsof.
Many thanks!
You need to check limits for the process. It may be different from your shell session:
Ex:
[root@ADWEB_HAPROXY3 ~]# cat /proc/$(pidof haproxy)/limits | grep open
Max open files 65536 65536 files
[root@ADWEB_HAPROXY3 ~]# ulimit -n
4096
In my case, HAProxy has a directive in its config file to change the maximum number of open files; there should be something similar for Hadoop as well.
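If the limit really is too low, the usual place to raise it is /etc/security/limits.conf (or a file under /etc/security/limits.d/) for the user that runs the daemons; a sketch with placeholder user and value:
hdfs    soft    nofile    32768
hdfs    hard    nofile    32768
The affected processes have to be restarted afterwards, and daemons that are not started through a login shell may need the limit set in their init script or service unit instead.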
I had a very similar issue, which caused one of the cluster's YARN TimeLine servers to stop after reaching the magical 1024 open files limit and crashing with "too many open files" errors.
After some investigation it turned out that it had serious issues dealing with too many files in TimeLine's LevelDB. For some reason YARN ignored the yarn.timeline-service.entity-group-fs-store.retain-seconds setting (by default it's set to 7 days, 604800 seconds). We had LevelDB files dating back over a month.
What seriously helped was applying a fix described in here: https://community.hortonworks.com/articles/48735/application-timeline-server-manage-the-size-of-the.html
Basically, there are a couple of options I tried.
The first is to shrink the TTL (time to live) settings. First, enable TTL:
<property>
<description>Enable age off of timeline store data.</description>
<name>yarn.timeline-service.ttl-enable</name>
<value>true</value>
</property>
Then set yarn.timeline-service.ttl-ms (set it to some low value for a period of time):
<property>
<description>Time to live for timeline store data in milliseconds.</description>
<name>yarn.timeline-service.ttl-ms</name>
<value>604800000</value>
</property>
The second option, as described, is to stop the TimeLine server, delete the whole LevelDB, and restart the server. This will start the ATS database from scratch. It works fine if the other options failed.
To do it, find the database location from yarn.timeline-service.leveldb-timeline-store.path, back it up and remove all subfolders from it. This operation will require root access to the server where TimeLine is located.
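A rough sketch of that second option, assuming a Hadoop 2.x layout where the ATS is controlled by yarn-daemon.sh and using /hadoop/yarn/timeline as an example value of yarn.timeline-service.leveldb-timeline-store.path (check your own configuration for the real path):
yarn-daemon.sh stop timelineserver
cp -r /hadoop/yarn/timeline /hadoop/yarn/timeline.bak   # back it up first
rm -rf /hadoop/yarn/timeline/*                          # remove all subfolders
yarn-daemon.sh start timelineserver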
Hope it helps.

hadoop storage directory uses space more than total data on HDFS

I have a three-node Hadoop cluster with replication factor = 3.
The storage directory is /app/hadoop/tmp/dfs/ on each system.
Each datanode system has a hard-disk capacity of 221 GB.
The effective data on HDFS is 62 GB, which with replication is 62*3 = 186 GB.
Now the problem is that I am falling short of storage even though I have only 186 GB of data on a 660 GB cluster.
HDFS shows a huge difference in the space available for use:
datanode1 =7.47 GB
datanode2 =17.7 GB
datanode3 =143 GB
To make sure that this space is used by Hadoop local storage, I ran this command on each datanode.
for datanode1
du -h --max-depth=1 /app/hadoop/tmp/
63G /app/hadoop/tmp/dfs
139G /app/hadoop/tmp/mapred
201G /app/hadoop/tmp/
for datanode2
du -h --max-depth=1 /app/hadoop/tmp/
126G /app/hadoop/tmp/mapred
62G /app/hadoop/tmp/dfs
188G /app/hadoop/tmp/
for datanode3
du -h --max-depth=1 /app/hadoop/tmp/dfs/
62G /app/hadoop/tmp/dfs/data
62G /app/hadoop/tmp/dfs/
Here datanode1 has used 201 GB of space for storage.
I tried the load balancer, but it shows the cluster is balanced.
Here is the output:
start-balancer.sh
starting balancer, logging to /usr/lib/hadoop-0.20/logs/hadoop-ocpe-balancer-blrkec241933d.out
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
The cluster is balanced. Exiting...
Balancing took 622.0 milliseconds
Recently one of my datanodes went down for a few days; after fixing it, this problem arose.
How do I balance the load?
Your analysis of disk space includes the MapReduce scratch directory space (the /app/hadoop/tmp/mapred paths), which lies outside of HDFS and is mostly temporary data cleared upon job completion.
The DFS space (/app/hadoop/tmp/dfs) seems to be consistent with your expected usage size.
Therefore, your disk space isn't being hogged by the DataNodes, but rather by the TaskTrackers - and restarting them forces a clearing of those directories.
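If you want to reclaim the space right away, a rough sketch for a Hadoop 0.20-era cluster would be to bounce the TaskTracker on each datanode (the install path is taken from the balancer log above and may differ on your nodes):
/usr/lib/hadoop-0.20/bin/hadoop-daemon.sh stop tasktracker
/usr/lib/hadoop-0.20/bin/hadoop-daemon.sh start tasktracker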