As you can see, this node is running 3 containers (using 6 GB of memory and 3 vCores). I would like it to use the rest of the vCores (the 5 vCores shown in the vCores Avail column). I have not done any configuration in yarn-site.xml yet.
YARN uses only the resources it needs
What you are currently looking at appears to be the resources used by the currently running jobs.
YARN knows that it is allowed to use 2 GB more memory and 5 more vCores, but by the nature of the job these simply cannot be utilized.
Hence this is likely not a problem or something that needs to be fixed, but simply a consequence of you running a job of this nature.
Example
When I say a job of this nature, I mean a job that does not need more than 3 containers of 2 GB each.
The simplest example of such a job would be a count over 3 comparatively small files with a default container size of 2 GB.
If you really wanted to run this kind of job with more parallelism, you would need to resort to workarounds (like setting a tiny maximum container size, or splitting all the files in half). However, I would not recommend this.
Related
I'm trying to squeeze every last bit out of my cluster when configuring the Spark application, but it seems I'm not understanding everything quite right. I'm running the application on an AWS EMR cluster with 1 master and 2 core nodes of type m3.xlarge (15 GB RAM and 4 vCPUs per node). This means that by default 11.25 GB is reserved on every node for applications scheduled by YARN. The master node is used only by the resource manager (YARN), which means the remaining 2 core nodes will be used to schedule applications (so we have 22.5 GB for that purpose). So far so good. But here comes the part which I don't get. I'm starting the Spark application with the following parameters:
--driver-memory 4G --num-executors 4 --executor-cores 7 --executor-memory 4G
As I understand it (from what I've read), this means 4 GB will be allocated for the driver and 4 executors will be launched with 4 GB each. A rough estimate makes it 5*4 = 20 GB (let's call it 21 GB with the expected memory reserves), which should be fine as we have 22.5 GB for applications. Here's a screenshot from the Hadoop YARN UI after the launch:
We can see that 17.63 GB is used by the application, but this is a little less than the expected ~21 GB, and this triggers the first question: what happened here?
Then I go to the Spark UI's executors page. Here comes the bigger question:
There are 3 executors (not 4), and the memory allocated for them and the driver is 2.1 GB (not the specified 4 GB). So Hadoop YARN says 17.63 GB is used, but Spark says 8.4 GB is allocated. What is happening here? Is this related to the Capacity Scheduler (I couldn't reach that conclusion from the documentation)?
Can you check whether spark.dynamicAllocation.enabled is turned on? If that is the case, then your Spark application may give resources back to the cluster when they are no longer used. The minimum number of executors to be launched at startup is decided by spark.executor.instances.
If that is not the case, what is the source for your Spark application and how many partitions does it have? Spark will essentially map the number of partitions to the Spark cores: if your source has only 10 partitions and you try to allocate 15 cores, it will only use 10 cores because that is all it needs. I suspect this is why Spark launched 3 executors instead of 4. Regarding memory, I would recommend revisiting your numbers: you are asking for 4 executors and 1 driver with 4 GB each, which is roughly 5*4 GB + 5*384 MB ≈ 22 GB. You are trying to use up everything, leaving little for the OS and the NodeManager to run, which would not be ideal.
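If dynamic allocation does turn out to be enabled, a minimal sketch of pinning it off at submit time might look like the following (the flags are reused from the question; the application jar name is just a placeholder):

spark-submit --conf spark.dynamicAllocation.enabled=false \
  --driver-memory 4G --num-executors 4 --executor-cores 7 --executor-memory 4G \
  your-application.jar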
So I'm trying to run some Hadoop jobs on AWS r3.4xlarge machines. They have 16 vcores and 122 GB of RAM available.
Each of my mappers requires about 8 gigs of ram and one thread, so these machines are very nearly perfect for the job.
I have mapreduce.map.memory.mb set to 8192,
and mapreduce.map.java.opts set to -Xmx6144m.
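(For reference, these are set along these lines in mapred-site.xml:)

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>8192</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx6144m</value>
</property>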
This should result in approximately 14 mappers (in practice nearer to 12) running on each machine.
This is in fact the case for a 2 slave setup, where the scheduler shows 90 percent utilization of the cluster.
When scaling to, say, 4 slaves, however, it seems that Hadoop simply doesn't create more mappers. In fact it creates FEWER.
On my 2-slave setup I had just under 30 mappers running at any one time; on four slaves I had about 20. The machines were sitting at just under 50 percent utilization.
The vcores are there, the physical memory is there. What the heck is missing? Why is Hadoop not creating more containers?
So it turns out that this is one of those Hadoop things that never makes sense, no matter how hard you try to figure it out.
There is a setting in yarn-default.xml called yarn.nodemanager.heartbeat.interval-ms.
It defaults to 1000. Apparently it controls the minimum period, in milliseconds, between container assignments.
This means only one new map task is assigned per second, so the number of containers I have running is limited by the assignment rate multiplied by the time it takes for a container to finish.
By setting this value to 50, or better yet 1, I was able to get the kind of scaling that is expected from a Hadoop cluster. Honestly, this should be documented better.
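For reference, a minimal sketch of that override in yarn-site.xml (using the value 50 mentioned above; pick whatever suits your cluster):

<property>
  <name>yarn.nodemanager.heartbeat.interval-ms</name>
  <value>50</value>
</property>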
I want to ask: why does configuring mapreduce.map/reduce.memory.mb and mapreduce.map/reduce.java.opts in mapred-site.xml to values bigger than the defaults make my job slower?
But if I configure them too low, then tasks fail. And under that condition, I start to think my memory configuration on Hadoop is unnecessary...
Can you give me an explanation?
What might be happening in your environment is that when you increase the values of the mapreduce.map/reduce.memory.mb and mapreduce.map/reduce.java.opts configurations towards their upper bound, it reduces the number of containers allowed to execute Map/Reduce tasks on every node, which eventually causes slowness in the overall job time.
If you have 2 nodes, each with 25 GB of free RAM, and you configure mapreduce.map/reduce.memory.mb as 4 GB, then you might get at least 6 containers on every node, 12 in total. So you would get a chance to run 12 mapper/reducer tasks in parallel.
If instead you configure mapreduce.map/reduce.memory.mb as 10 GB, then you might get only 2 containers on every node, 4 in total, to execute your mapper/reducer tasks. So the mapper/reducer tasks would mostly run in sequence due to the lack of free containers, causing a delay in the overall job completion time.
You should choose an appropriate value for the configuration by considering the resources available and the amount of resources required by the Map/Reduce containers in your environment. Hope this makes sense.
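As a rough sketch of the 4 GB case from the example above (the -Xmx values are assumptions; heap is commonly set to roughly 80% of the container size):

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3276m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3276m</value>
</property>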
You can allocate memory for map/reduce containers based on two factors:
available memory on each DataNode
total number of cores (vcores) you have.
Try to create a number of containers equivalent to the number of cores you have in each data node (including hyper-threading).
For example, if you have 10 physical cores (20 cores including hyper-threading),
the total number of containers you can plan for is 19 (leaving 1 core for other processes).
Assume that you have X GB of RAM in each data node. Then:
leave some memory (say Y GB) for other processes (heap) like the DataNode, NodeManager, RegionServer, etc.
Now the memory available for YARN is X - Y = Z
Memory for a map container = Z / number of containers per node
Memory for a reduce container = Z / (2 * number of containers per node)
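As a purely illustrative sketch, assume X = 100 GB per data node and Y = 5 GB reserved, so Z = 95 GB, with 19 containers per node as in the core example above. Plugging these numbers into the formulas gives roughly:

<!-- mapred-site.xml, illustrative values only -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <!-- Z / containers per node = 95 GB / 19 = 5 GB -->
  <value>5120</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <!-- Z / (2 * containers per node) = 95 GB / 38 = 2.5 GB -->
  <value>2560</value>
</property>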
I want to run MapReduce tasks on a single machine, and I want to use all the cores of my machine. What is the best approach? If I install Hadoop in pseudo-distributed mode, is it possible to use all the cores?
You can make use of the properties mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to increase the number of Mappers/Reducers spawned simultaneously on a TaskTracker as per your hardware specs. By default, both are set to 2, so a maximum of 2 maps and 2 reduces will run at any given instant. One thing to keep in mind, though, is that if your input is very small, the framework will decide it is not worth parallelizing the execution. In such a case you need to handle it by tweaking the default split size through mapred.max.split.size.
Having said that, based on my personal experience I have noticed that MR jobs are normally I/O bound (and perhaps sometimes memory bound). So CPU does not really become a bottleneck under normal circumstances. As a result you might find it difficult to fully utilize all the cores on one machine at a time for a job.
I would suggest devising a strategy to decide the proper number of Mappers/Reducers to carry out the processing efficiently, making sure that you are properly utilizing the CPU, since Mappers/Reducers take up slots on each node. One approach could be to take the number of cores, multiply it by 0.75, and then set the number of Mappers and Reducers as per your needs. For example, if you have 12 physical cores or 24 virtual cores, you could have 24 * 0.75 = 18 slots. Based on your needs you can then decide whether to use 9 Mappers + 9 Reducers, 12 Mappers + 6 Reducers, or something else.
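For example, a minimal mapred-site.xml sketch for the 12 Mappers + 6 Reducers split from the 24-virtual-core example above (these are per-TaskTracker maximums and purely illustrative):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>12</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>6</value>
</property>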
I'm reposting my answer from this question: Hadoop and map-reduce on multicore machines
For Apache Hadoop 2.7.3, my experience has been that enabling YARN will also enable multi-core support. Here is a simple guide for enabling YARN on a single node:
https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_a_Single_Node
The default configuration seems to work pretty well. If you want to tune your core usage, then perhaps look into setting 'yarn.scheduler.minimum-allocation-vcores' and 'yarn.scheduler.maximum-allocation-vcores' within yarn-site.xml (https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml)
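For example, a hedged sketch of that tuning in yarn-site.xml (the maximum of 8 is an assumption for an 8-core machine; check the yarn-default.xml link above for the actual defaults):

<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>8</value>
</property>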
Also, see here for instructions on how to configure a simple Hadoop sandbox with multicore support: https://bitbucket.org/aperezrathke/hadoop-aee
I'm having trouble figuring out the best way to configure my Hadoop cluster (CDH4), running MapReduce 1. I'm in a situation where I need to run mappers that require such a large amount of Java heap space that I couldn't possibly run more than 1 mapper per node - but at the same time I want to be able to run jobs that can benefit from many mappers per node.
I'm configuring the cluster through the Cloudera management UI, and the Max Map Tasks and mapred.map.child.java.opts appear to be quite static settings.
What I would like to have is something like a heap space pool with X GB available, that would accommodate both kinds of jobs without having to reconfigure the MapReduce service each time. If I run 1 mapper, it should assign X GB heap - if I run 8 mappers, it should assign X/8 GB heap.
I have considered both the Maximum Virtual Memory and the Cgroup Memory Soft/Hard limits, but neither will get me exactly what I want. Maximum Virtual Memory is not effective, since it is still a per-task setting. The Cgroup setting is problematic because it does not seem to actually restrict the individual tasks to a lower amount of heap if there are more of them, but rather allows a task to use too much memory and then kills the process when it does.
Can the behavior I want to achieve be configured?
(PS you should use the newer name of this property with Hadoop 2 / CDH4: mapreduce.map.java.opts. But both should still be recognized.)
The value you configure in your cluster is merely a default. It can be overridden on a per-job basis. You should leave the default value from CDH, or configure it to something reasonable for normal mappers.
For your high-memory job only, in your client code, set mapreduce.map.java.opts in your Configuration object for the Job before you submit it.
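A minimal sketch of that per-job override in the driver code (the -Xmx value and job name are placeholders, not recommendations):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Inside your driver's main()/run() method:
Configuration conf = new Configuration();
// Override the cluster-wide default heap size for this job only (illustrative value).
conf.set("mapreduce.map.java.opts", "-Xmx8g");
Job job = Job.getInstance(conf, "high-memory-job");
// ... set mapper/reducer classes, input/output paths, etc., then:
job.waitForCompletion(true);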
The answer gets more complex if you are running MR2/YARN since it no longer schedules by 'slots' but by container memory. So memory enters the picture in a new, different way with new, different properties. (It confuses me, and I'm even at Cloudera.)
In a way it would be better, because you express your resource requirement in terms of memory, which is good here. You would also set mapreduce.map.memory.mb to a size about 30% larger than your JVM heap size, since this is the memory allowed to the whole process. You would set it higher for high-memory jobs in the same way. Then Hadoop can decide how many mappers to run and where to place the workers for you, and use as much of the cluster as possible per your configuration. No fussing with your own imaginary resource pool.
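Continuing the hedged sketch above for MR2/YARN, the same per-job override would also size the container about 30% above the heap (values illustrative):

// 8 GB heap plus ~30% headroom for the whole process: 8 GB * 1.3 ≈ 10.5 GB
conf.set("mapreduce.map.java.opts", "-Xmx8g");
conf.set("mapreduce.map.memory.mb", "10752");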
In MR1, this is harder to get right. Conceptually you want to set the maximum number of mappers per worker to 1 via mapreduce.tasktracker.map.tasks.maximum, along with your heap setting, but just for the high-memory job. I don't know whether the client can request or set this on a per-job basis; I doubt it, as it wouldn't quite make sense. You can't really approach this by controlling the number of mappers, if only because you have to hack around just to find out, let alone control, how many mappers it will run.
I don't think OS-level settings will help. In a way they more closely resemble how MR2 / YARN thinks about resource scheduling. Your best bet may be to (move to MR2 and) use MR2's resource controls and let it figure the rest out.