Hive memory setting for local task during map join - hadoop

I'm using an HDInsight cluster (Hive version 0.13) to run some Hive queries. One of the queries (query 7 from the TPC-H suite), which launches a local task during a map join, fails due to insufficient memory (Hive aborts it because the hash table has reached the configured limit).
Hive seems to be allocating 1 GB to the local task. Where is this size picked up from, and how can I increase it?
2015-05-03 05:38:19 Starting to launch local task to process map join; maximum memory = 932184064
I assumed the local task would use the same heap size as the mapper, but that does not seem to be the case. Any help is appreciated.

Quite late on this thread, but just for others who face the same issue:
Although the documentation states that the local (child) JVM will have the same size as that of the map task (https://cwiki.apache.org/confluence/display/Hive/MapJoinOptimization), that does not seem to be the case. Instead, the JVM size is governed by the HADOOP_HEAPSIZE setting from hive-env.sh. So, in the case of the original post from Shradha, I suspect HADOOP_HEAPSIZE is set to 1 GB.
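For example, the local task's heap could be raised by bumping that value in hive-env.sh; a minimal sketch (HADOOP_HEAPSIZE is interpreted in MB, and 2048 is only an illustrative value):
# hive-env.sh
export HADOOP_HEAPSIZE=2048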

This property controls it:
yarn.app.mapreduce.am.command-opts
These are the Application Master JVM opts; the local task runs on the AM.
Can you also try this property:
set hive.mapjoin.localtask.max.memory.usage = 0.999;
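Putting the two together, a hedged sketch of what you could try in the Hive session (the -Xmx value is purely illustrative):
set yarn.app.mapreduce.am.command-opts=-Xmx2048m;
set hive.mapjoin.localtask.max.memory.usage=0.999;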

You can use HADOOP_HEAPSIZE=512 or HADOOP_CLIENT_OPTS=-Xmx512m, both of which can be tweaked in hadoop-env.sh.
Note, however, that this might lead to unexpected behavior for some jobs, and you will probably have to play with
mapreduce.map.memory.mb and mapreduce.map.java.opts
as well as
mapreduce.reduce.memory.mb and mapreduce.reduce.java.opts
in the mapred-site config file in order to make sure that jobs remain stable.
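For illustration, a mapred-site.xml fragment along those lines (the sizes are placeholders, assuming 2 GB containers are reasonable on your nodes, with the heap kept below the container size):
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value> <!-- container size for map tasks -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value> <!-- heap kept below the container size -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1638m</value>
</property>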

Related

Why is Spark setting partitions to the file size in bytes?

I have a very simple pyspark program that is supposed to read CSV files from S3:
r = sc.textFile('s3a://some-bucket/some-file.csv')
.map(etc... you know the drill...)
This was failing when running a local Spark node (it works in EMR). I was getting OOM errors and GC crashes. Upon further inspection, I realized that the number of partitions was insanely high. In this particular case r.getNumPartitions() would return 2358041.
I realized that that's exactly the size of my file in bytes. This, of course, makes Spark crash miserably.
I've tried different configurations, like changing mapred.min.split.size:
conf = SparkConf()
conf.setAppName('iRank {}'.format(datetime.now()))
conf.set("mapred.min.split.size", "536870912")
conf.set("mapred.max.split.size", "536870912")
conf.set("mapreduce.input.fileinputformat.split.minsize", "536870912")
I've also tried using repartition or passing a partitions argument to textFile, to no avail.
I would love to know what makes Spark think that it's a good idea to derive the number of partitions from the file size.
In general it doesn't. As nicely explained by eliasah in his answer to Spark RDD default number of partitions, it uses the max of minPartitions (2 if not provided) and the number of splits computed by the Hadoop input format.
The latter will be unreasonably high only if instructed by the configuration, which suggests that some configuration file is interfering with your program.
The other possible problem with your code is that you are using the wrong configuration. Hadoop options should be set using hadoopConfiguration, not the Spark configuration. Since you are using Python, you have to go through the private JavaSparkContext instance:
sc = ... # type: SparkContext
sc._jsc.hadoopConfiguration().setInt("mapred.min.split.size", min_value)
sc._jsc.hadoopConfiguration().setInt("mapred.max.split.size", max_value)
There was actually a bug in Hadoop 2.6 which would do this; the initial S3A release didn't provide a block size for Spark to split on, and the default of "0" meant one-byte-per-job.
Later versions should all take fs.s3a.block.size as the config option specifying the block size... something like 33554432 (= 32 MB) would be a start.
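If you are on a version that honors it, a minimal PySpark sketch of setting that option (mirroring the hadoopConfiguration approach shown above; 32 MB is just a starting point):
# assumes an existing SparkContext `sc`
sc._jsc.hadoopConfiguration().set("fs.s3a.block.size", "33554432")  # 32 MB
r = sc.textFile('s3a://some-bucket/some-file.csv')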
If you are using Hadoop 2.6.x, don't use S3A. That's my recommendation.

How to configure Hadoop parameters on Amazon EMR?

I run an MR job with one master and two slaves on Amazon EMR, but get lots of error messages like running beyond physical memory limits. Current usage: 3.0 GB of 3 GB physical memory used; 3.7 GB of 15 GB virtual memory used. Killing container after the job reaches map 100% reduce 35%.
I modified my code by adding the following lines to the Hadoop 2.6.0 MR configuration, but I still get the same error messages.
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "jobtest2");
//conf.set("mapreduce.input.fileinputformat.split.minsize","3073741824");
conf.set("mapreduce.map.memory.mb", "8192");
conf.set("mapreduce.map.java.opts", "-Xmx8192m");
conf.set("mapreduce.reduce.memory.mb", "8192");
conf.set("mapreduce.reduce.java.opts", "-Xmx8192m");
What is the correct way to configure those parameters(mapreduce.map.memory.mb, mapreduce.map.java.opts, mapreduce.reduce.memory.mb, mapreduce.reduce.java.opts) on Amazon EMR? Thank you!
Hadoop 2.x allows you to set the map and reduce settings per job, so you are setting them in the right place. The problem is that the Java opts Xmx memory must be less than map/reduce.memory.mb, which represents the total memory for heap and off-heap usage. Take a look at the defaults as an example: http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-hadoop-task-config.html. If YARN was killing off the containers for exceeding the memory when using the default settings, then you need to give more memory to the off-heap portion, i.e. increase the gap between Xmx and the total map/reduce.memory.mb.
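As an illustration of that gap (continuing from the Configuration in the question; keeping the heap around 80% of the container is only a rule of thumb, not official guidance):
conf.set("mapreduce.map.memory.mb", "8192");      // total container size
conf.set("mapreduce.map.java.opts", "-Xmx6553m"); // heap below the container, leaving room for off-heap
conf.set("mapreduce.reduce.memory.mb", "8192");
conf.set("mapreduce.reduce.java.opts", "-Xmx6553m");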
Take a look at the documentation for the AWS CLI. There is a section on Hadoop and on how settings map to the specific XML config files at EMR instance creation. I have found this to be the best approach available on EMR.
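On EMR releases that support configuration classifications, that mapping looks roughly like the JSON below, passed to aws emr create-cluster via --configurations file://./config.json (a sketch; the values are the ones from the question):
[
  {
    "Classification": "mapred-site",
    "Properties": {
      "mapreduce.map.memory.mb": "8192",
      "mapreduce.map.java.opts": "-Xmx6553m",
      "mapreduce.reduce.memory.mb": "8192",
      "mapreduce.reduce.java.opts": "-Xmx6553m"
    }
  }
]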

Hadoop Error: Java heap space

So, after the job runs for a percent or so, I get an error that says "Error: Java heap space" and then something along the lines of "Application container killed".
I am literally running an empty map and reduce job. However, the job does take in an input that is roughly 100 gigs. For whatever reason, I run out of heap space, even though the job does nothing.
I am using the default configuration on a single machine. It is running Hadoop version 2.2 on Ubuntu. The machine has 4 gigs of RAM.
Thanks!
Note:
Got it figured out.
It turns out I was setting the configuration to use a different terminating token/string for records. The format of the data had changed, so that token/string no longer existed, and the job was trying to pull all 100 gigs into RAM for one key.
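The post doesn't show the exact property, but the usual place such a terminating token lives is Hadoop's textinputformat.record.delimiter; if it no longer matches anything in the data, TextInputFormat reads the whole input as one record. A minimal sketch (the "<END>" delimiter is hypothetical):
Configuration conf = new Configuration();
// must be a string that actually occurs in the data, otherwise one record spans the whole file
conf.set("textinputformat.record.delimiter", "<END>");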

Setting mapred.child.java.opts in Hive script results in MR job getting 'killed' right away

I have been having a few jobs fail due to OutOfMemory and GC overhead limit exceeded errors. To counter the former, I tried setting SET mapred.child.java.opts="-Xmx3G"; at the start of the hive script**.
Basically, any time I add this option to the script, the MR jobs that get scheduled (for the first of several queries in the script) are 'killed' right away.
Any thoughts on how to rectify this? Are there any other params that need to be tinkered with in conjunction with max heap space(eg. io.sort.mb)? Any help would be most appreciated.
FWIW, I am using hive-0.7.0 with hadoop-0.20.2. The default setting for max heap size in our cluster is 1200M.
TIA.
** - Some other alternatives that were tried (with comical effect but no discernible change in outcome):
SET mapred.child.java.opts="-Xmx3G";
SET mapred.child.java.opts="-server -Xmx3072M";
SET mapred.map.child.java.opts ="-server -Xmx3072M";
SET mapred.reduce.child.java.opts ="-server -Xmx3072M";
SET mapred.child.java.opts="-Xmx2G";
Update 1: It's possible this doesn't necessarily have anything to do with setting heap size. Tinkering with mapred.child.java.opts in any way causes the same outcome. For example, setting it thusly, SET mapred.child.java.opts="-XX:+UseConcMarkSweepGC";, has the same result of MR jobs getting killed right away. Even setting it explicitly in the script to the 'cluster default' causes this.
Update 2: Added a pastebin of a grep of JobTracker logs here.
Figured this would end up being something trivial/inane and it was in the end.
Setting mapred.child.java.opts thusly:
SET mapred.child.java.opts="-Xmx4G -XX:+UseConcMarkSweepGC";
is unacceptable. But this seems to go through fine:
SET mapred.child.java.opts=-Xmx4G -XX:+UseConcMarkSweepGC; (minus the double-quotes)
sigh. Having better debug options/error messages would have been nice.
Two other guards can restrict task memory usage. Both are designed for admins to enforce QoS, so if you're not one of the admins on the cluster, you may be unable to change them.
The first is the ulimit, which can be set directly in the node OS, or by setting mapred.child.ulimit.
The second is a pair of cluster-wide mapred.cluster.max.*.memory.mb properties that enforce memory usage by comparing job settings mapred.job.map.memory.mb and mapred.job.reduce.memory.mb against those cluster-wide limits.
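For completeness, a sketch of what those admin-side knobs might look like in mapred-site.xml (the numbers are placeholders, not recommendations):
<property>
  <name>mapred.child.ulimit</name>
  <value>4194304</value> <!-- per-task virtual memory limit, in KB -->
</property>
<property>
  <name>mapred.cluster.max.map.memory.mb</name>
  <value>4096</value> <!-- cap on what jobs may request via mapred.job.map.memory.mb -->
</property>
<property>
  <name>mapred.cluster.max.reduce.memory.mb</name>
  <value>4096</value> <!-- cap on mapred.job.reduce.memory.mb -->
</property>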

Does Map and Reduce runs in separate JVM's?

Hi, I have a MapReduce task, say AverageScoreCalculator, which has a mapper and a reducer.
The question is: if I statically initialize a few fields in AverageScoreCalculator, will they be available to both the mapper and the reducer?
By default, each map and reduce task runs in a different JVM and there can be multiple JVMs running at any particular instance on a node.
Set the following properties
mapred.job.reuse.jvm.num.tasks = -1
mapreduce.tasktracker.map.tasks.maximum = 1
mapreduce.tasktracker.reduce.tasks.maximum = 1
mapreduce.job.reduce.slowstart.completedmaps = 1
and there will be only a single mapper/reducer running on a given node with JVM reuse and the reducers won't start until all the mappers have completed processing.
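A sketch of setting these cluster-wide in mapred-site.xml (MR1-style properties, as noted below; the TaskTracker slot maximums only take effect after a restart):
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value> <!-- reuse one JVM for an unlimited number of a job's tasks -->
</property>
<property>
  <name>mapreduce.tasktracker.map.tasks.maximum</name>
  <value>1</value> <!-- one map slot per node -->
</property>
<property>
  <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
  <value>1</value> <!-- one reduce slot per node -->
</property>
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>1</value> <!-- reducers wait until all maps finish -->
</property>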
Couple of things to note
The above approach works with the MapReduce 1.x release and is not an efficient approach.
JVM reuse is not supported in the MapReduce 2.x release.
Static fields will create problems if they are updated dynamically in either the map or the reduce program. Standalone and pseudo-distributed modes are for beginners and should only be used while learning Hadoop; they won't help when processing huge volumes of data, which is the primary objective of MapReduce programming.
When jobs are distributed across nodes, static information will be lost. Reconsider the use of static variables.
If you can, paste the map and reduce programs and explain the need for the static fields; we can then suggest a better solution.
You should first know which configuration/mode your job is going to run in.
For instance, if you run in local (standalone) mode, there will be only one JVM running your job.
If you run it in pseudo-distributed mode, the job will be run using multiple JVMs on your machine.
If you run it in fully distributed mode, the tasks will run on different machines and, of course, in different JVMs (with JVM reuse).
