Increasing assigned memory for a topology in Storm - performance

I have a 10-node cluster where each machine has 8 GB of RAM. When I run my topology, the assigned memory is always proportional to the number of workers, and each worker takes roughly 1 GB of memory. I want to allot 2 GB to each worker, so I tried setting worker.childopts: "-Xmx6g -Xms6g" in storm.yaml, since I am running three workers on each node. But the assigned memory decreased to below 1 GB.
How can I tune my topology better?
I am getting the following error in one of my bolts:
java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) at org.apache.kafka.common.memory.MemoryPo

You can reference this link:
https://github.com/apache/storm/blob/master/conf/defaults.yaml
I configured two parameters:
worker.heap.memory.mb: 768
supervisor.memory.capacity.mb: 4096.0
If you have 10 workers, you have to configure supervisor.memory.capacity.mb = 768 x 10; that is the capacity of the supervisor.
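For the setup in the question (8 GB nodes, three workers each), a sketch of those two settings along these lines would request about 2 GB per worker; the exact numbers are illustrative, not recommendations:
# storm.yaml on each supervisor node (sketch; values are illustrative)
worker.heap.memory.mb: 2048            # on-heap memory requested per worker
supervisor.memory.capacity.mb: 6144.0  # 3 workers x 2048 MB offered by this supervisor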

I think your configuration is slightly wrong. The worker.childopts setting is passed to each worker JVM, so when you set -Xmx6g -Xms6g you are giving each of your three workers 6 GB of heap, which is 18 GB total for the node and more than its 8 GB of physical RAM.
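To give each worker roughly 2 GB instead, a sketch of the storm.yaml entry (assuming three workers per 8 GB node, leaving around 2 GB for the OS and the supervisor daemon) could be:
# storm.yaml (sketch; illustrative)
worker.childopts: "-Xmx2g -Xms2g"    # per-worker JVM heap; 3 workers x 2 GB = 6 GB per node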


Mulesoft vCore metric

I would like to get data about our APIs, specifically how many vCores are assigned and what the load on those vCores is. For example, if I assign 1 vCore to an API but it is barely used, that would be a waste.
So I want to build an API that gets me this data and transforms it into a suitable format. Can someone tell me how I can get this data about the APIs, and whether that is even possible?
What I want to return is something like this:
api-name, vCoreUsed, Load%
appOne, 2, 50%
(What I mean by load: if the load is 100%, all vCores are used and the service might be slow; if it is 10%, then it is a waste of vCores.)
Thank you for all replies (and I hope what I said makes sense).
This is more related to CloudHub architecture. Please refer to the CloudHub architecture and CloudHub fabric and features documentation for the details.
(As per the Mule documentation:)
CloudHub Workers
Applications on CloudHub are run by one or more instances of Mule, called workers. These have the following characteristics:
Capacity: Each worker has a specific amount of capacity to process data; you can select the size of your workers when configuring an application.
Isolation: Each worker runs in a separate container from every other application.
Manageability: Each worker is deployed and monitored independently.
Locality: Each worker runs in a specific worker cloud, e.g. the US, EU, or Asia-Pacific.
Each worker is a dedicated instance of Mule that runs your integration application. Workers may have different memory capacity and processing power depending on how you configure them at the application level. Workers can be scaled vertically by selecting one of the available worker sizes:
Worker Sizes:
0.1 vCores + 500 MB Heap Memory
0.2 vCores + 1 GB Heap Memory
1 vCore + 1.5 GB Heap Memory
2 vCores + 3.5 GB Heap Memory
4 vCores + 7.5 GB Heap Memory
8 vCores + 15 GB Heap Memory
16 vCores + 32 GB Heap Memory

How to increase hive concurrent mappers to more than 4?

Summary
When I run a simple select count(*) from table query in hive only two nodes in my large cluster are being used for mapping. I would like to use the whole cluster.
Details
I am using a somewhat large cluster (tens of nodes, each with more than 200 GB of RAM) running HDFS and Hive 1.2.1 (IBM-12).
I have a table of several billion rows. When I perform a simple
select count(*) from mytable;
Hive creates hundreds of map tasks, but only 4 run simultaneously.
This means that my cluster is mostly idle during the query, which seems wasteful. I have tried ssh'ing to the nodes in use, and they are not fully utilizing CPU or memory. Our cluster is backed by InfiniBand networking and Isilon file storage, neither of which seems very loaded at all.
We are using MapReduce as the execution engine. I have tried removing any resource limits I could find, but it does not change the fact that only two nodes are being used (4 concurrent mappers).
The memory settings are as follows:
yarn.nodemanager.resource.memory-mb 188928 MB
yarn.scheduler.minimum-allocation-mb 20992 MB
yarn.scheduler.maximum-allocation-mb 188928 MB
yarn.app.mapreduce.am.resource.mb 20992 MB
mapreduce.map.memory.mb 20992 MB
mapreduce.reduce.memory.mb 20992 MB
and we are running on 41 nodes. By my calculation I should be able to get 41*188928/20992 = 369 map/reduce tasks. Instead I get 4.
Vcore settings:
yarn.nodemanager.resource.cpu-vcores 24
yarn.scheduler.minimum-allocation-vcores 1
yarn.scheduler.maximum-allocation-vcores 24
yarn.app.mapreduce.am.resource.cpu-vcores 1
mapreduce.map.cpu.vcores 1
mapreduce.reduce.cpu.vcores 1
Is there a way to get Hive/MapReduce to use more of my cluster?
How would I go about figuring out the bottleneck?
Could it be that YARN is not assigning tasks fast enough?
I guess that using Tez would improve performance, but I am still interested in why resource utilization is so limited (and we do not have it installed at the moment).
The number of parallel tasks you can run depends on your YARN memory settings.
For example, if you have 4 data nodes and your YARN memory properties are defined as below:
yarn.nodemanager.resource.memory-mb 1 GB
yarn.scheduler.minimum-allocation-mb 1 GB
yarn.scheduler.maximum-allocation-mb 1 GB
yarn.app.mapreduce.am.resource.mb 1 GB
mapreduce.map.memory.mb 1 GB
mapreduce.reduce.memory.mb 1 GB
With these settings you have 4 data nodes, so the total yarn.nodemanager.resource.memory-mb across the cluster is 4 GB, which is what you can use to launch containers.
Since each container takes 1 GB of memory, you can launch 4 containers at any given point in time. One of them will be used by the application master, so you can have at most 3 mapper or reducer tasks running at any given time, since the application master, mappers, and reducers each use 1 GB of memory.
So you need to increase yarn.nodemanager.resource.memory-mb to increase the number of map/reduce tasks.
P.S. - Here we are talking about the maximum number of tasks that can be launched; the actual number may be somewhat less than that.
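As a concrete illustration of the example above (the 8 GB figure is arbitrary, chosen only to show the arithmetic), raising the NodeManager memory in yarn-site.xml while keeping 1 GB containers would look roughly like this:
<!-- yarn-site.xml (sketch; value is illustrative) -->
<property><name>yarn.nodemanager.resource.memory-mb</name><value>8192</value></property>
With 1 GB containers this gives 4 x 8 = 32 containers across the 4 data nodes: one for the application master and up to 31 for map and reduce tasks.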

Yarn: How to utilize full cluster resources?

So I have a Cloudera cluster with 7 worker nodes, each with:
30GB RAM
4 vCPUs
Here are some of the configurations that I found (from Google) to be important in tuning the performance of my cluster. I am running with:
yarn.nodemanager.resource.cpu-vcores => 4
yarn.nodemanager.resource.memory-mb => 17GB (Rest reserved for OS and other processes)
mapreduce.map.memory.mb => 2GB
mapreduce.reduce.memory.mb => 2GB
Running nproc => 4 (Number of processing units available)
Now my concern is: when I look at my ResourceManager, I see Available Memory as 119 GB, which is fine. But when I run a heavy sqoop job and my cluster is at its peak, it uses only ~59 GB of memory, leaving ~60 GB unused.
One way I can see to fix this unused-memory issue is increasing map/reduce memory to 4 GB, so that we can use up to 16 GB per node.
The other way is to increase the number of containers, which I am not sure how to do.
4 cores x 7 nodes = 28 possible containers. With 3 being used by other processes, only 5 are currently available for the sqoop job.
What would be the right config to improve cluster performance in this case? Can I increase the number of containers, say 2 containers per core? And is that recommended?
Any help or suggestions on the cluster configuration would be highly appreciated. Thanks.
If your input data is in 26 splits, YARN will create 26 mappers to process those splits in parallel.
If you have 7 nodes with 2 GB mappers for 26 splits, the distribution should be something like:
Node1 : 4 mappers => 8 GB
Node2 : 4 mappers => 8 GB
Node3 : 4 mappers => 8 GB
Node4 : 4 mappers => 8 GB
Node5 : 4 mappers => 8 GB
Node6 : 3 mappers => 6 GB
Node7 : 3 mappers => 6 GB
Total : 26 mappers => 52 GB
So the total memory used by your MapReduce job, if all mappers run at the same time, will be 26 x 2 = 52 GB. Maybe if you add the memory used by the reducer(s) and the ApplicationMaster container, you reach your 59 GB at some point, as you said.
If this is the behaviour you are witnessing, and the job is finished after those 26 mappers, then there is nothing wrong. You only need around 60 GB to complete your job by spreading tasks across all your nodes without needing to wait for container slots to free themselves. The other free 60 GB are just waiting around, because you don't need them. Increasing heap size just to use all the memory won't necessarily improve performance.
Edit:
However, if you still have lots of mappers waiting to be scheduled, then maybe it is because your installation is configured to calculate container allocation using vcores as well. This is not the default in Apache Hadoop, but it can be configured:
yarn.scheduler.capacity.resource-calculator:
The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default, org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, only uses memory, while DominantResourceCalculator uses the dominant resource to compare multi-dimensional resources such as memory, CPU, etc. A Java ResourceCalculator class name is expected.
Since you defined yarn.nodemanager.resource.cpu-vcores as 4, and since each mapper uses 1 vcore by default, you can only run 4 mappers per node at a time.
In that case you can double your value of yarn.nodemanager.resource.cpu-vcores to 8. It's just an arbitrary value, but it should double the number of mappers.
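For reference, a sketch of the two options discussed above (property and class names as in the quoted documentation; the values are illustrative):
<!-- capacity-scheduler.xml: schedule on memory only (the Apache default) -->
<property><name>yarn.scheduler.capacity.resource-calculator</name><value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value></property>
<!-- or, if you keep DominantResourceCalculator, give YARN more vcores per node in yarn-site.xml -->
<property><name>yarn.nodemanager.resource.cpu-vcores</name><value>8</value></property>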

Swapping of the RAM while Shuffling in HADOOP

I work with Hadoop 1.1.1. My project is processing more than 6000 documents. My cluster contains 2 nodes: a master (CPU: Core i7, RAM: 6 GB) and a slave (CPU: Core i3, RAM: 12 GB). The number of mappers is 16. When I assign more than 1 reducer (i.e. 2, ..., 16), during the shuffle phase the RAM begins to swap, and this causes a significant reduction in my system's speed.
How can I stop the RAM from swapping?
What is kept in RAM in the process between MAP and REDUCE?
Is there any reference?
Thanks a lot.
So on the master:
6G physical RAM;
2G allocated per process;
8 mappers and 8 reducers can run concurrently;
8x2 + 8x2 = 32G of memory required if all tasks are maxed out - over 5x your physical amount.
On the slave:
12G physical RAM;
2G allocated per task;
4 mappers, 4 reducers;
4x2 + 4x2 = 16G of memory required - about a third more than physical.
Now if you're only running a single job at a time, you can set the slowstart configuration property to 1.0 to ensure that the mappers and reducers don't run concurrently, and that will help, but you're still maxed out on the master.
I suggest you either reduce the memory allocation to 1G (if you really want that many map/reduce slots on each node), or reduce the maximum number of tasks on both nodes, so that you're closer to the physical amount (if running maxed out).
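As a sketch of what that could look like in mapred-site.xml on the 6 GB master (the slot counts and heap size here are illustrative assumptions; the point is keeping slots x heap well under physical RAM):
<property><name>mapred.tasktracker.map.tasks.maximum</name><value>2</value></property>
<property><name>mapred.tasktracker.reduce.tasks.maximum</name><value>1</value></property>
<property><name>mapred.child.java.opts</name><value>-Xmx1024m</value></property>
<property><name>mapred.reduce.slowstart.completed.maps</name><value>1.0</value></property>
That is at most 3 x 1 GB of child heap, leaving roughly half of the master's 6 GB for the TaskTracker, DataNode and OS; the slowstart setting is the property mentioned above that keeps reducers from starting until the maps finish.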

How to decide on the number of parallel mappers/reducers along with heap memory?

Say I have an EMR job running on an 11-node cluster: an m1.small master node and 10 m1.xlarge slave nodes.
Now one m1.xlarge node has 15 GB of RAM.
How do I then decide on the number of parallel mappers and reducers to set?
My jobs are memory intensive and I would like as much heap as possible allotted to the JVM.
Another related question:
If we set the following parameter:
<property><name>mapred.child.java.opts</name><value>-Xmx4096m</value></property>
<property><name>mapred.job.reuse.jvm.num.tasks</name><value>1</value></property>
<property><name>mapred.tasktracker.map.tasks.maximum</name><value>2</value></property>
<property><name>mapred.tasktracker.reduce.tasks.maximum</name><value>2</value></property>
So will this 4 GB be shared by 4 processes (2 mappers and 2 reducers), or will they each get 4 GB?
They will each get 4 GB.
You should check what your heap setting is for the TaskTrackers and the DataNodes; then you'll have an idea of how much memory you have left over to allocate to children (the actual mappers/reducers).
Then it's just a balancing act. If you need more memory, you'll want fewer mappers/reducers, and vice versa.
Also try to keep in mind how many cores your CPU has; you don't want 100 map tasks on a single core. To tweak, it's best to monitor both heap usage and CPU utilization over time so you can fiddle with the knobs.
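As an illustrative sketch for a 15 GB m1.xlarge slave (these slot counts are assumptions, not EMR defaults), keeping the 4 GB child heap from the question:
<property><name>mapred.tasktracker.map.tasks.maximum</name><value>2</value></property>
<property><name>mapred.tasktracker.reduce.tasks.maximum</name><value>1</value></property>
<property><name>mapred.child.java.opts</name><value>-Xmx4096m</value></property>
That is at most 3 child JVMs x 4 GB = 12 GB, which leaves roughly 3 GB for the TaskTracker, DataNode and OS, and keeps the task count in line with the node's 4 cores.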
