How do I increase Tez's container physical memory? - hadoop

I've been running some Hive scripts on an AWS EMR 4.8 cluster with Hive 1.0 and Tez 0.8.
My configurations look like this:
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
set hive.execution.engine=tez;
set hive.merge.mapfiles=false;
SET hive.default.fileformat=Orc;
set tez.task.resource.memory.mb=5000;
SET hive.tez.container.size=6656;
SET hive.tez.java.opts=-Xmx5120m;
set hive.optimize.ppd=true;
And my global configs are:
hadoop-env.export HADOOP_HEAPSIZE 4750
hadoop-env.export HADOOP_DATANODE_HEAPSIZE 4750
hive-env.export HADOOP_HEAPSIZE 4750
While running my script, I get the following error:
Container [pid=19027,containerID=container_1477393351192_0007_02_000001] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 1.9 GB of 5 GB virtual memory used. Killing container.
On googling this error, I read that setting tez.task.resource.memory.mb would change the physical memory limit, but clearly I was mistaken. What am I missing?

I have had this problem a lot. Changing
set hive.tez.container.size=6656;
set hive.tez.java.opts=-Xmx4g;
did not fix the problem for me, but this did:
set tez.am.resource.memory.mb=4096;

Set the Tez container size to be a larger multiple of the YARN minimum container size (e.g. 4 GB):
SET hive.tez.container.size=4096;
Note that the value is a plain integer number of MB (4096), not a string like 4096MB.
"hive.tez.container.size" and "hive.tez.java.opts" are the parameters that alter Tez memory settings in Hive. If "hive.tez.container.size" is set to "-1" (default value), it picks the value of "mapreduce.map.memory.mb". If "hive.tez.java.opts" is not specified, it relies on the "mapreduce.map.java.opts" setting. Thus, if Tez specific memory settings are left as default values, memory sizes are picked from mapreduce mapper memory settings "mapreduce.map.memory.mb".
https://documentation.altiscale.com/memory-settings-for-tez
For more info, see Tez configuration and Tez memory tuning.
Note: when setting this through Ambari, specify the value in MB.
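A rough sketch of making both settings explicit for one Hive invocation (the 4096 MB container, the -Xmx of roughly 80% of it, and the table name are assumptions; size them for your own cluster):

hive -e "
SET hive.tez.container.size=4096;
SET hive.tez.java.opts=-Xmx3276m;
SELECT count(*) FROM my_table;  -- hypothetical query
"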

In case anyone else stumbles upon this thread trying to solve the above, here is a link to a solution that worked for me when all the others did not.
http://moi.vonos.net/bigdata/hive-cli-memory/
TL;DR: add these to your hive call:
--hiveconf tez.am.resource.memory.mb=<size as int>
--hiveconf tez.am.launch.cmd-opts=""
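For example, a minimal sketch of such a call (the 4096 value and my_script.hql are placeholders, not values from the linked post):

hive \
  --hiveconf tez.am.resource.memory.mb=4096 \
  --hiveconf tez.am.launch.cmd-opts="" \
  -f my_script.hql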

set hive.tez.container.size=6656;
set hive.tez.java.opts=-Xmx4g;

Related

janusgraph-0.5.3 memory configuration

I am using janusgraph-0.5.3 (with Cassandra) and I want to know how to increase the default memory allocated to the gremlin-server process to 2 GB.
I am trying to load bulk data into my gremlin-server, but it is failing with an error. I would like to know if there is a way to check and increase the default memory allocation.
I also need help locating the .yaml configuration files and the values in those files that would need to change.
Thanks
I changed the gremlin-server.sh file to allow more memory:
# Set Java options
if [ "$JAVA_OPTIONS" = "" ] ; then
    echo "Setting xmx and xss"
    JAVA_OPTIONS="-Xms1024m -Xmx3074m -Xss2048k -javaagent:$JANUSGRAPH_LIB/jamm-0.3.0.jar -Dgremlin.io.kryoShimService=org.janusgraph.hadoop.serialize.JanusGraphKryoShimService"
fi
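Since that if-guard only applies the defaults when JAVA_OPTIONS is empty, an alternative (a sketch, assuming a standard JanusGraph layout; adjust the lib path, jamm version, and yaml path to your install) is to export the options before starting the server instead of editing the script:

export JANUSGRAPH_LIB=/path/to/janusgraph/lib    # adjust to your install
export JAVA_OPTIONS="-Xms1024m -Xmx3074m -Xss2048k -javaagent:$JANUSGRAPH_LIB/jamm-0.3.0.jar -Dgremlin.io.kryoShimService=org.janusgraph.hadoop.serialize.JanusGraphKryoShimService"
./bin/gremlin-server.sh ./conf/gremlin-server/gremlin-server.yaml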

Setting JVM options when configuring Elasticsearch

I'm configuring JVM options for an Elasticsearch cluster, and I wonder which JVM heap size
would be best for my use case.
The machine has 16 GB of memory and will be dedicated to a single Elasticsearch node.
The default value is 1 GB, and I'm not familiar with Java/the JVM, but I feel like this is too small.
Any help would be appreciated.
If you use Windows, you can press Windows + R, run systempropertiesadvanced, and then set an environment variable, for example:
ES_JAVA_OPTS
-Xms2g -Xmx2g
(You can increase the value as you like; the number is the size, g means gigabytes, m means megabytes.)
Reference document: https://www.elastic.co/guide/en/elasticsearch/reference/master/advanced-configuration.html#set-jvm-options
https://www.javadevjournal.com/java/jvm-parameters/
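On Linux, a common rule of thumb for a dedicated node is to give the heap about half of the machine's RAM (so roughly 8 GB on this 16 GB box) and to keep Xms equal to Xmx. A minimal sketch, assuming a recent Elasticsearch (7.7+) and a package install under /etc/elasticsearch; adjust paths to your setup:

# Option 1: environment variable, read when Elasticsearch starts
export ES_JAVA_OPTS="-Xms8g -Xmx8g"

# Option 2: a drop-in file picked up at startup
sudo tee /etc/elasticsearch/jvm.options.d/heap.options <<'EOF'
-Xms8g
-Xmx8g
EOF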

Heap Size vs HADOOP_NAMENODE_OPTS at namenode

I am using Apache Hadoop 2.7.1 in an HA cluster.
I needed to increase the heap memory for both NameNodes, so I updated
the HADOOP_NAMENODE_OPTS property in hadoop-env.sh to 8 GB:
export HADOOP_NAMENODE_OPTS="-Xmx8192m $HADOOP_NAMENODE_OPTS"
so the heap size on my NameNodes is now 8 GB.
But then I noticed the HADOOP_HEAPSIZE parameter in hadoop-env.sh,
which I haven't given any value.
Is setting HADOOP_NAMENODE_OPTS to 8 GB enough, or should HADOOP_HEAPSIZE be set to 8 GB too?
In other words, does HADOOP_NAMENODE_OPTS override HADOOP_HEAPSIZE,
or should both be configured, each with its own job?
Does the value of HADOOP_NAMENODE_OPTS override the value of HADOOP_HEAPSIZE?
Yes, it does. https://www.cloudera.com/documentation/enterprise/latest/topics/admin_nn_memory_config.html
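A minimal hadoop-env.sh sketch of how the two interact (Hadoop 2.x; the values are illustrative): HADOOP_HEAPSIZE sets the default -Xmx for every daemon, while HADOOP_NAMENODE_OPTS is appended later on the NameNode's command line, so its -Xmx wins for that process.

# hadoop-env.sh
export HADOOP_HEAPSIZE=1024                                    # default heap in MB for all daemons
export HADOOP_NAMENODE_OPTS="-Xmx8192m $HADOOP_NAMENODE_OPTS"  # NameNode-specific; this -Xmx takes precedence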

This isn't normal, right? Required AM memory (471859200+47185920 MB) is above the max threshold (2048 MB)

I have read a lot about solving this kind of problem by setting yarn.scheduler.maximum-allocation-mb, which I have set to 2 GB, since I am only running select count(*) from <table>, which isn't a heavy computation, I guess. But what is Required AM memory (471859200+47185920 MB) supposed to mean? Other questions mention values more like (1024+2048).
I am setting this up on a single machine, i.e. my desktop, which has 4 GB of RAM and 2 cores. Is that too low a spec to run Spark as Hive's execution engine?
Currently I am running this job from Java and my setup is:
Connection connect = DriverManager.getConnection("jdbc:hive2://saurab:10000/default", "hiveuser", "hivepassword");
Statement state = connect.createStatement();
state.execute("SET hive.execution.engine=spark");
state.execute("SET spark.executor.memory=1g");
state.execute("SET spark.yarn.executor.memoryOverhead=512m");
yarn-site.xml
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>3g</value>
</property>
And a simple query
String query = "select count(*) from sales_txt";
ResultSet res = state.executeQuery(query);
if (res.next()) {
    System.out.println(res.getString(1));
}
Also, what are those two memory numbers (A+B)?
AM stands for the Application Master used when running Spark on YARN. A good explanation is here:
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/yarn/spark-yarn-applicationmaster.html
It's not clear why you need to run YARN on your single machine to test this out. You could run in standalone mode to remove the YARN overhead and test your Spark application code.
https://spark.apache.org/docs/latest/
The spark.*.memory and spark.yarn.executor.memoryOverhead settings need to be set when you deploy the Spark application; they cannot be set in those statements.
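A sketch of setting them up front instead, e.g. by appending to spark-defaults.conf (the path below is an assumption and depends on your install; spark.yarn.executor.memoryOverhead takes a plain number of MB):

# append executor memory settings to spark-defaults.conf
cat >> /etc/spark/conf/spark-defaults.conf <<'EOF'
spark.executor.memory              1g
spark.yarn.executor.memoryOverhead 512
EOF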

JMeter issues when running large number of threads

I'm testing with Apache JMeter: I'm simply accessing one page of my company's website and turning up the number of users until it reaches a threshold. The problem is that when I get to around 3000 threads, JMeter doesn't run all of them. Looking at the Aggregate Graph,
it only runs about 2,536 of them (this number varies but is always in that region).
The partial run comes with the following exception in the logs:
01:16 ERROR - jmeter.JMeter: Uncaught exception:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Unknown Source)
at org.apache.jmeter.threads.ThreadGroup.start(ThreadGroup.java:293)
at org.apache.jmeter.engine.StandardJMeterEngine.startThreadGroup(StandardJMeterEngine.java:476)
at org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:395)
at java.lang.Thread.run(Unknown Source)
This behavior is consistent. In addition, one of the times JMeter crashed midway through, writing out a file that said:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 32756 bytes for ChunkPool::allocate
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (allocation.cpp:211), pid=10748, tid=11652
#
# JRE version: 6.0_31-b05
# Java VM: Java HotSpot(TM) Client VM (20.6-b01 mixed mode, sharing windows-x86 )
Any ideas?
I tried changing the heap size in jmeter.bat, but that didn't seem to help at all.
The JVM is simply not capable of running that many threads. And even if it were, JMeter would consume a lot of CPU purely on context switching. In other words, above some point you are no longer benchmarking your web application but the client computer hosting JMeter.
You have a few choices:
experiment with JVM options, e.g. decrease the default -Xss512K to something smaller
run JMeter in a cluster
use tools taking radically different approach like Gatling
I had a similar issue; increasing the heap size in jmeter.bat to 1024M fixed it for me.
set HEAP=-Xms1024m -Xmx1024m
For the JVM, if you read the crash file above (the hs_err output), it gives you some solutions, among which are:
switch to a 64-bit JVM (> 6u25)
with this you will be able to allocate a larger heap (-Xmx); make sure you have that much RAM
reduce Xss with:
-Xss256k
Then for JMeter, follow best-practices:
http://jmeter.apache.org/usermanual/best-practices.html
http://www.ubik-ingenierie.com/blog/jmeter_performance_tuning_tips/
Finally, make sure you use the latest JMeter version.
Preferably use a Linux OS.
Tune the TCP stack and OS limits.
Success will depend on your machine's power (CPU and memory) and your test plan.
If this is not enough (for 3000 threads it should be OK), you may need to use distributed testing
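A sketch of putting these tips together on the command line (plan.jmx, the results file, and the host names are placeholders; -n runs JMeter in non-GUI mode and -R drives previously started remote jmeter-server instances):

# non-GUI run with a bigger heap and smaller per-thread stacks
JVM_ARGS="-Xms1g -Xmx1g -Xss256k" ./jmeter.sh -n -t plan.jmx -l results.jtl

# distributed run across remote engines
./jmeter.sh -n -t plan.jmx -R host1,host2 -l results.jtl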
Increasing the heap size in jmeter.bat works fine
set HEAP=-Xms1024m -Xmx1024m
or, if you are using jmeter.sh, you can do something like this:
JVM_ARGS="-Xms512m -Xmx1024m" jmeter.sh etc.
I ran into this same problem and the only solution that helped me is: https://stackoverflow.com/a/26190804/5796780
To run 100k threads properly on Linux:
ulimit -s 256
ulimit -i 120000
echo 120000 > /proc/sys/kernel/threads-max
echo 600000 > /proc/sys/vm/max_map_count
echo 200000 > /proc/sys/kernel/pid_max
If you don't have root access:
echo 200000 | sudo dd of=/proc/sys/kernel/pid_max
After increasing the Xms and Xmx heap sizes, I had to make Java run in 64-bit mode. In jmeter.bat:
set JM_LAUNCH=java.exe -d64
Obviously, you need to run a 64-bit OS and have 64-bit Java installed (see https://www.java.com/en/download/manual.jsp).
