Hive process memory size - hadoop

Can you please help me understand what the 512 MB (the -Xmx512m highlighted in bold in the output below) means in the Hive context?
Which memory size is it?
I have set export HADOOP_HEAPSIZE=4192 in my hive-site.xml file.
hadoop@master:~/hive/conf$ ps -ef | grep 'hive'
hadoop 5587 1 0 Feb14 ? 00:05:27
/usr/lib/jvm/default-jdk/bin/java -Xmx4192m
-Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.3/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.3 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/hadoop-2.7.3/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar
/home/hadoop/hive/lib/hive-metastore-2.3.2.jar
org.apache.hadoop.hive.metastore.HiveMetaStore
hadoop 9903 8034 0 10:54 pts/0 00:00:00 grep --color=auto hive
hadoop 21646 15918 1 07:37 pts/3 00:03:02
/usr/lib/jvm/default-jdk/bin/java -Xmx4192m
-Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.3/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.3 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/hadoop-2.7.3/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dproc_hivecli -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.file=/home/hadoop/hive/conf/parquet-logging.properties
-Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /home/hadoop/hive/lib/hive-cli-2.3.2.jar
org.apache.hadoop.hive.cli.CliDriver

The variable you edited is for the clients, not the servers, and you can't export anything from an XML file anyway.
To configure the heap size for HiveServer2 and Hive metastore, set the -Xmx parameter in the HADOOP_OPTS variable to the desired maximum heap size in /etc/hive/hive-env.sh.
vs.
To configure the heap size for the Beeline CLI, set the HADOOP_HEAPSIZE environment variable in /etc/hive/hive-env.sh.
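For example, a minimal sketch of what that looks like in hive-env.sh (on this particular setup the file would live under ~/hive/conf/ rather than /etc/hive/; the 4096 values are just placeholders):
# server-side heap for HiveServer2 / the metastore: append -Xmx to HADOOP_OPTS
export HADOOP_OPTS="$HADOOP_OPTS -Xmx4096m"
# client-side heap for the hive CLI / Beeline, in megabytes (no unit suffix)
export HADOOP_HEAPSIZE=4096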
Besides that, both values made it through to the command line, so you might want to look at:
Duplicated Java runtime options: what is the order of preference?
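If you want to see which of the duplicated -Xmx flags HotSpot actually honours (generally the last one on the command line wins), you can print the final flag values; a quick sketch:
java -Xmx4192m -Xmx512m -XX:+PrintFlagsFinal -version 2>/dev/null | grep -w MaxHeapSize
# should report MaxHeapSize = 536870912 (i.e. 512 MB) if the last flag wins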

Related

Where is the core-default.xml file?

I'm interested in the value of the fs.s3a.connection.ssl.enabled parameter in my MapR cluster.
I know the value is set in core-default.xml (if not overridden by core-site.xml), but I cannot find the core-default.xml file. Any suggestions where it can be?
Is there any way to check the current value of the parameter?
Where is the core-default.xml file?
It is in the resources of hadoop-common: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
So in this case you will be able to find it inside the hadoop-common jar; the jar can be found at /opt/mapr/hadoop/hadoop-<version>/share/hadoop/common/hadoop-common-<version>.jar
I have extracted the jar and listed the files:
[... ~]$ jar xf ./hadoop-common-<version>.jar
[... ~]$ ll
-rw-rw-r-- 1 mapr mapr 1041 Mar 15 18:36 common-version-info.properties
-rw-rw-r-- 1 mapr mapr 64287 Mar 15 18:06 core-default.xml
...
Is there any way to check the current value of the parameter?
Yes there is; run the following command to see the property:
hadoop org.apache.hadoop.conf.Configuration | grep "fs.s3a.connection.ssl.enabled"
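If you only need the effective value of that one key, hdfs getconf is a shorter alternative (assuming a reasonably recent Hadoop client), and you can also peek at the bundled default without extracting the whole jar:
hdfs getconf -confKey fs.s3a.connection.ssl.enabled
unzip -p /opt/mapr/hadoop/hadoop-<version>/share/hadoop/common/hadoop-common-<version>.jar core-default.xml | grep -A 1 fs.s3a.connection.ssl.enabled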

Issue with HDFS command taking 100% CPU

I have an HDFS server to which I am currently streaming data.
I also hit this server regularly with the following type of command to check for certain conditions: hdfs dfs -find /user/cdh/streameddata/ -name *_processed
However, I have started to see this command taking a massive portion of my CPU when monitoring it in top:
cdh 16919 1 99 13:03 ? 00:43:45 /opt/jdk/bin/java -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/hadoop -Dhadoop.id.str=cdh -Dhadoop.root.logger=ERROR,DRFA -Djava.library.path=/opt/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.fs.FsShell -find /user/cdh/streameddata/ -name *_processed
This is causing other applications to stall, and is having a massive impact on my application on the whole.
My server has 48 cores, so I did not expect this to be an issue.
Currently, I have not set any additional heap in Hadoop, so it is using the 1000 MB default.
If you think your heap is probably too small, you can run:
jstat -gcutil 16919 # process ID of the hdfs dfs find command
And look at the value under GCT (Garbage Collection Time) to see how much time you're spending in garbage collection relative to your total run time.
However, if directory /user/cdh/streameddata/ has hundreds of thousands of files or millions of files, you probably are legitimately crippling your system.
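If GC does turn out to be the bottleneck, the FsShell client heap can be raised for this one command via HADOOP_CLIENT_OPTS; a minimal sketch (the 2g value is an arbitrary example):
export HADOOP_CLIENT_OPTS="-Xmx2g"
# quoting the pattern also stops the local shell from expanding *_processed itself
hdfs dfs -find /user/cdh/streameddata/ -name '*_processed'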

Write a multi-line string in a Spring Boot .conf file

For my Spring Boot application, I have a .conf file that is used to run the application.
In this file, I put some JVM options.
Currently it contains this:
JAVA_OPTS="-Xms256m -Xmx512m -Dvisualvm.display.name=ApplicationWs -Dcom.sun.management.jmxremote.port=3333 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
In the future I will certainly add other options, and the line will keep growing.
I want to make it more readable by writing one or two options per line, but I can't find the proper syntax for this.
I want to do something like this:
# Heap Size
JAVA_OPTS="-Xms256m -Xmx512m"
# JVisualVM Name in VisualVM
JAVA_OPTS="$JAVA_OPTS -Dvisualvm.display.name=ApplicationWs"
# Jmx Configuration
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.port=3333 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
I already tried:

JAVA_OPTS="-Xms256m -Xmx512m"
JAVA_OPTS="$JAVA_OPTS -Dvisualvm.display.name=ApplicationWs"
export JAVA_OPTS

JAVA_OPTS="-Xms256m -Xmx512m"
JAVA_OPTS="${JAVA_OPTS} -Dvisualvm.display.name=ApplicationWs"
export JAVA_OPTS

JAVA_OPTS="-Xms256m -Xmx512m
-Dvisualvm.display.name=ApplicationWs"

JAVA_OPTS="-Xms256m -Xmx512m "
+ " -Dvisualvm.display.name=ApplicationWs"
What is the proper syntax for a multi-line string in a Spring Boot .conf file?
The Spring Boot launch script sources the .conf file with the shell, so you can use any shell syntax to write the configuration. In your case I would prefer to use variables to format the options, such as the following:
MEM_OPTS='-Xms256m -Xmx512m'
DISPLAY_NAME='visualvm.display.name=ApplicationWs'
JMXREMOTE_PORT='com.sun.management.jmxremote.port=3333'
JMXREMOTE_SSL='com.sun.management.jmxremote.ssl=false'
JMXREMOTE_AUTH='com.sun.management.jmxremote.authenticate=false'
JAVA_OPTS="${MEM_OPTS} -D${DISPLAY_NAME} -D${JMXREMOTE_PORT} -D${JMXREMOTE_SSL} -D${JMXREMOTE_AUTH}"
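A quick way to sanity-check the assembled value is to source the .conf file in a shell and echo it (myapp.conf is a placeholder for your actual <jarname>.conf next to the jar):
source ./myapp.conf && echo "$JAVA_OPTS"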
Try multiple lines like this:
primes = 2,\
3,\
5,\
7,\
11
from: https://stackoverflow.com/a/8978515/404145
The only way that actually works is to pass a one-line command; note the semicolons and backslashes at the end:
MEMORY_PARAMS=' -Xms512M -Xmx512M '; \
JMX_MONITORING='-Dcom.sun.management.jmxremote.port=8890 -Dcom.sun.management.jmxremote.rmi.port=8890 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote -Djava.rmi.server.hostname=13.55.666.7777'; \
REMOTE_DEBUG='-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:8889'; \
JAVA_OPTS=" -Dfile.encoding=UTF-8 ${MEMORY_PARAMS} ${REMOTE_DEBUG} ${JMX_MONITORING} "

Centralized Cache failed in hadoop-2.3

I want to use Centralized Cache in Hadoop 2.3.
Here are my steps (10 nodes, 6 GB of memory per node).
1. My file (45 MB) to be cached:
[hadoop@Master ~]$ hadoop fs -ls /input/pics/bundle
Found 1 items
-rw-r--r-- 1 hadoop supergroup 47185920 2014-03-09 19:10 /input/pics/bundle/bundle.chq
2. Create a cache pool:
[hadoop@Master ~]$ hdfs cacheadmin -addPool myPool -owner hadoop -group supergroup
Successfully added cache pool myPool.
[hadoop@Master ~]$ hdfs cacheadmin -listPools -stats
Found 1 result.
NAME OWNER GROUP MODE LIMIT MAXTTL BYTES_NEEDED BYTES_CACHED BYTES_OVERLIMIT FILES_NEEDED FILES_CACHED
myPool hadoop supergroup rwxr-xr-x unlimited never 0 0 0 0 0
3. Add a cache directive:
[hadoop@Master ~]$ hdfs cacheadmin -addDirective -path /input/pics/bundle/bundle.chq -pool myPool -force -replication 3
Added cache directive 2
4. List the directives:
[hadoop@Master ~]$ hdfs cacheadmin -listDirectives -stats -path /input/pics/bundle/bundle.chq -pool myPool
Found 1 entry
ID POOL REPL EXPIRY PATH BYTES_NEEDED BYTES_CACHED FILES_NEEDED FILES_CACHED
2 myPool 3 never /input/pics/bundle/bundle.chq 141557760 0 1 0
The BYTES_NEEDED is right, but BYTES_CACHED is zero. It seems that the size has been calculated, but the cache action that puts the file into memory has not been done. So how do I get my file cached into memory?
Thank you very much.
There were a bunch of bugs we fixed in Hadoop 2.3. I would recommend using at least Hadoop 2.4 to use HDFS caching.
To get into more detail, I would need to see the log messages.
Including the output of hdfs dfsadmin -report would also be useful, as well as ensuring that you have followed the setup instructions here (namely, increasing the memlock ulimit and setting dfs.datanode.max.locked.memory):
http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html
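For reference, the two prerequisites that page describes look roughly like this; the 1 GB figure is only an illustration and must fit within both the DataNode user's memlock ulimit and the 6 GB of RAM per node:
ulimit -l      # memlock limit for the user running the DataNode; must be at least the value below
# hdfs-site.xml on every DataNode (value is in bytes)
<property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>1073741824</value>
</property>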

HeapDumpOnOutOfMemoryError ElasticSearch

I am seeing this when I do ps -aef | grep elasticsearch:
HeapDumpOnOutOfMemoryError
501 37347 1 0 2:29PM ttys004 0:04.14 /usr/bin/java -Xms4g
-Xmx4g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.path.home=/Users/abdullahmuhammad/elasticsearch -cp :/Users/abdullahmuhammad/elasticsearch/lib/elasticsearch-0.20.6.jar:/Users/abdullahmuhammad/elasticsearch/lib/:/Users/abdullahmuhammad/elasticsearch/lib/sigar/
org.elasticsearch.bootstrap.ElasticSearch
I have tried a few things: playing with the size of the initial heap, increasing and decreasing it.
I have also deleted my whole index, but still no luck.
I used the following to delete the index:
curl -XDELETE 'http://localhost:9200/_all/'
Any help would be appreciated.
If you use plugins like Marvel, you should check the number of indices and their sizes, because some plugins create a large number of indices and they can eat all your memory.
For the heap, Elasticsearch recommends 50% of your available memory.
In general, the Elasticsearch recommendations for machine memory are: max. 64 GB, min. 8 GB.
Important documentation:
https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html
A few recommendations:
- Adjust your ES_HEAP_SIZE environment variable (see the sketch after this list).
- Set the mlockall option of ES to true (in the config file). This locks the heap in memory so it cannot be swapped out.
- If your system is not very strong, decrease your shard count. Note that while increasing the number of shards improves insert performance, increasing the number of replicas improves query performance.
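For the 0.20.x/1.x line used here, the first two points translate roughly to the following (the 4g value is a placeholder; in Elasticsearch 5.x+ the setting was renamed bootstrap.memory_lock):
# environment variable read by bin/elasticsearch: heap of roughly 50% of RAM
export ES_HEAP_SIZE=4g
# elasticsearch.yml: lock the heap in RAM so it cannot be swapped out
bootstrap.mlockall: true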
