I would like to get data about my APIs, specifically how many vCores are assigned to each one and what the load on those vCores is. For example, if I assign 1 vCore to an API that is barely used, that vCore is wasted.
So I want to build an API that retrieves this data and transforms it into a suitable format. Can someone tell me how I can get this data about the APIs, and whether that is even possible?
So what I want to return is something like this:
api-name, vCoreUsed, Load%
appOne, 2, 50%
(By load I mean: if the load is 100%, all assigned vCores are fully used and the service might be slow; if it is 10%, then most of the assigned vCores are wasted.)
Thank you for all replies (and I hope what I said makes sense).
This is more related to CloudHub architecture. Please refer to the CloudHub architecture and CloudHub Fabric and features documentation for the details.
(As per Mule documentation)
CloudHub Workers
Applications on CloudHub are run by one or more instances of Mule, called workers. These have the following characteristics:
Capacity: Each worker has a specific amount of capacity to process data; you select the size of your workers when configuring an application.
Isolation: Each worker runs in a separate container from every other application.
Manageability: Each worker is deployed and monitored independently.
Locality: Each worker runs in a specific worker cloud, such as the US, the EU, or Asia-Pacific.
Each worker is a dedicated instance of Mule that runs your integration application. Workers may have different memory capacity and processing power depending on how you configure them at the application level. Workers can be scaled vertically by selecting one of the available worker sizes (a sketch of how these sizes map to the table in the question follows the size list):
Worker Sizes:
0.1 vCores + 500 MB Heap Memory
0.2 vCores + 1 GB Heap Memory
1 vCore + 1.5 GB Heap Memory
2 vCores + 3.5 GB Heap Memory
4 vCores + 7.5 GB Heap Memory
8 vCores + 15 GB Heap Memory
16 vCores + 32 GB Heap Memory
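To turn those worker sizes into the table from the question, a minimal Python sketch could look like the following. Where the per-application numbers come from is an assumption here (for example, the application list and dashboard statistics exposed by the Anypoint Platform / CloudHub REST API), so the input records are hard-coded and the field names are illustrative only.

# Hypothetical input: one record per CloudHub application, with the worker
# size (vCores per worker), the worker count, and an average CPU% per worker.
# How you fetch these (e.g. from the CloudHub REST API) is an assumption --
# check the Anypoint Platform documentation for the real endpoints.
apps = [
    {"domain": "appOne", "vcores_per_worker": 1.0, "workers": 2, "avg_cpu_pct": 50.0},
    {"domain": "appTwo", "vcores_per_worker": 0.1, "workers": 1, "avg_cpu_pct": 5.0},
]

def to_row(app):
    vcores_used = app["vcores_per_worker"] * app["workers"]
    # "Load" as defined in the question: 100% means the assigned vCores are
    # fully busy, 10% means most of the assigned capacity is wasted.
    return f'{app["domain"]}, {vcores_used:g}, {app["avg_cpu_pct"]:.0f}%'

print("api-name, vCoreUsed, Load%")
for app in apps:
    print(to_row(app))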
Related
Current Setup
We have a 10-node discovery cluster.
Each node of this cluster has 24 cores and 264 GB of RAM. Keeping some memory and CPU aside for background processes, we plan to use 240 GB of memory.
Now, when it comes to container setup, since each container may need 1 core, we can have at most 24 containers, each with 10 GB of memory.
Usually clusters have containers with 1-2 GB of memory, but we are restricted by the cores available to us, or maybe I am missing something.
Problem statement
As our cluster is extensively used by data scientists and analysts, having just 24 containers does not suffice, which leads to heavy resource contention.
Is there any way we can increase the number of containers?
Options we are considering
If we ask the team to run many Tez queries together in a file, rather than separately, then at most we will keep one container.
Requests
Is there any other way to manage our discovery cluster?
Is there any possibility of reducing the container size?
Can a vCore (as it's a logical concept) be shared by multiple containers?
vCores are just a logical unit and are not in any way related to a physical CPU core unless you are using YARN with CGroups and have yarn.nodemanager.resource.percentage-physical-cpu-limit enabled. Most tasks are rarely CPU-bound but more typically network I/O bound. So if you were to look at your cluster's overall CPU and memory utilization, you should be able to resize your containers based on the wasted (spare) capacity.
You can measure utilization with a host of tools; sar, Ganglia, and Grafana are the obvious ones, but you can also look at Brendan Gregg's Linux performance tools for more ideas.
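As a rough illustration of that resizing exercise, here is the arithmetic for one node of the cluster in the question (24 cores, 240 GB usable); the 30% average CPU utilization is an assumed example value, not a measurement, so substitute whatever sar/Ganglia/Grafana report.

# Back-of-the-envelope container sizing for one node: 24 cores, 240 GB usable.
cores = 24
usable_memory_gb = 240

# With tasks mostly I/O-bound and average CPU at, say, 30%, roughly 70% of
# the CPU capacity is spare, which is the signal to shrink containers and
# oversubscribe vcores rather than stay at one container per physical core.
avg_cpu_utilization = 0.30   # assumed example value, replace with real data
print(f"spare CPU capacity: {(1 - avg_cpu_utilization) * 100:.0f}%")

for container_gb in (10, 4, 2):
    max_by_memory = usable_memory_gb // container_gb
    print(f"{container_gb} GB containers -> up to {max_by_memory} per node "
          f"(vs. {cores} if limited to one per physical core)")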
I have a 10-node cluster where each machine has 8 GB of RAM. When I run my topology, the assigned memory is always proportional to the number of workers, and each worker takes approximately 1 GB of memory. I want to allot 2 GB to each worker, so I tried setting worker.childopts: "-Xmx6g -Xms6g" in storm.yaml, since I am running three workers on each node. But the assigned memory decreased to below 1 GB.
How can I tune my topology better?
I am getting the following error in one of my bolts
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.kafka.common.memory.MemoryPo
You can reference this link
https://github.com/apache/storm/blob/master/conf/defaults.yaml
I configured 2 params:
worker.heap.memory.mb: 768
supervisor.memory.capacity.mb: 4096.0
If you have 10 workers, you have to set supervisor.memory.capacity.mb = 768 x 10 = 7680; it is the capacity of the supervisor.
I think your configuration is slightly wrong. The worker.childopts setting is passed to each worker JVM, so when you set -Xmx6g -Xms6g you are giving each of your three workers 6 GB of memory (18 GB total for the node, more than the 8 GB each machine has).
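A quick sanity check of that arithmetic against the cluster in the question (8 GB per machine, three workers per node); only numbers stated in the thread are used, plus the asker's 2 GB-per-worker target.

# Per-node memory budget for the Storm cluster described in the question.
node_ram_gb = 8
workers_per_node = 3

for xmx_gb in (6, 2):   # 6g = what was tried, 2g = the stated goal
    total = workers_per_node * xmx_gb
    fits = "fits" if total <= node_ram_gb else "does NOT fit"
    print(f"-Xmx{xmx_gb}g x {workers_per_node} workers = {total} GB: {fits} in {node_ram_gb} GB")

# With 2 GB per worker, worker.heap.memory.mb would be 2048 and, following
# the supervisor-capacity rule from the earlier answer,
# supervisor.memory.capacity.mb should cover workers_per_node * 2048 = 6144.
print("supervisor.memory.capacity.mb >=", workers_per_node * 2048)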
I have dynamic allocation enabled on a 10-worker-node cluster. I am running tests to better understand how Spark manages memory.
Each worker node has 8 vCores and 12g of available memory according to the Spark Resource Manager. I decided to turn spark.executor.memory down to 1g and set the number of executor cores to 4.
From this configuration I would expect each worker node to get 2 executors since 8 vCores / 4 executor cores = 2 executors.
However, I get the following distribution from the Resource Manager:
[Screenshot: resources used per node]
As you can see different nodes use different numbers of executors from 2 all the way to 8, totalling 39 executors. How can you reliably predict how many executors Spark will give you when dynamic allocation is enabled?
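For reference, here is the configuration described above written out in PySpark form. Everything except spark.dynamicAllocation.maxExecutors comes from the question; that cap is an added example setting (value chosen arbitrarily), included because bounding the pool is one way to keep the executor count predictable.

from pyspark.sql import SparkSession

# Executor memory/cores are the values from the question; the external shuffle
# service is typically required for dynamic allocation on YARN; the
# maxExecutors cap is an example value, not something the asker set.
spark = (
    SparkSession.builder
    .appName("dynamic-allocation-test")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")
    .config("spark.executor.memory", "1g")
    .config("spark.executor.cores", "4")
    .config("spark.dynamicAllocation.maxExecutors", "20")
    .getOrCreate()
)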
I have to reduce the RAM size of my VirtualBox VM from 4 GB to 1 GB. I tried reducing it, but it is unchangeable, so please suggest the right way to do it. I am attaching a screenshot.
The same error occurred when I tried this for Hadoop; you can use the following settings.
Configuring YARN
In a Hadoop cluster, it’s vital to balance the usage of RAM, CPU and disk so that processing is not constrained by any one of these cluster resources. As a general recommendation, we’ve found that allowing for 1-2 Containers per disk and per core gives the best balance for cluster utilization. So with our example cluster node with 12 disks and 12 cores, we will allow for 20 maximum Containers to be allocated to each node.
Each machine in our cluster has 48 GB of RAM. Some of this RAM should be reserved for Operating System usage. On each node, we’ll assign 40 GB RAM for YARN to use and keep 8 GB for the Operating System. The following property sets the maximum memory YARN can utilize on the node:
In yarn-site.xml
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>40960</value>
</property>
The next step is to provide YARN guidance on how to break up the total resources available into Containers. You do this by specifying the minimum unit of RAM to allocate for a Container. We want to allow for a maximum of 20 Containers, and thus need (40 GB total RAM) / (20 # of Containers) = 2 GB minimum per container:
In yarn-site.xml
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
YARN will allocate Containers with RAM amounts no smaller than yarn.scheduler.minimum-allocation-mb.
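The same arithmetic spelled out, using only the numbers from the worked example above:

# Worked example from the text: 48 GB node, 8 GB reserved for the OS,
# at most 20 containers per node.
node_ram_gb = 48
reserved_for_os_gb = 8
max_containers = 20

yarn_memory_mb = (node_ram_gb - reserved_for_os_gb) * 1024   # 40960
min_allocation_mb = yarn_memory_mb // max_containers         # 2048

print("yarn.nodemanager.resource.memory-mb =", yarn_memory_mb)
print("yarn.scheduler.minimum-allocation-mb =", min_allocation_mb)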
For more information you can visit hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
Say I have an EMR job running on an 11-node cluster: an m1.small master node and 10 m1.xlarge slave nodes.
Now one m1.xlarge node has 15 GB of RAM.
How do I then decide the number of parallel mappers and reducers that can be set?
My jobs are memory intensive, and I would like to have as much heap as possible allotted to the JVM.
Another related question:
If we set the following parameters:
<property><name>mapred.child.java.opts</name><value>-Xmx4096m</value></property>
<property><name>mapred.job.reuse.jvm.num.tasks</name><value>1</value></property>
<property><name>mapred.tasktracker.map.tasks.maximum</name><value>2</value></property>
<property><name>mapred.tasktracker.reduce.tasks.maximum</name><value>2</value></property>
So will this 4 GB be shared by the 4 processes (2 mappers and 2 reducers), or will they each get 4 GB?
They will each get 4 GB.
You should check what your heap setting is for the task trackers and the data nodes; then you'll have an idea of how much memory you have left over to allocate to the children (the actual mappers/reducers).
Then it's just a balancing act: if you need more memory, you'll want fewer mappers/reducers, and vice versa.
Also keep in mind how many cores your CPU has; you don't want 100 map tasks on a single core. To tune this, it's best to monitor both heap usage and CPU utilization over time so you can fiddle with the knobs.
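To make that balancing act concrete for the node size in the question (an m1.xlarge with 15 GB of RAM and the settings shown above); the 1 GB set aside for the TaskTracker and DataNode daemon heaps is an assumed figure, so adjust it to your actual settings.

# Child-task memory budget on one m1.xlarge slave with the settings above.
node_ram_gb = 15
map_slots = 2        # mapred.tasktracker.map.tasks.maximum
reduce_slots = 2     # mapred.tasktracker.reduce.tasks.maximum
child_heap_gb = 4    # -Xmx4096m per child JVM (each child gets its own heap)
daemons_gb = 1       # assumed TaskTracker + DataNode heap, adjust as needed

needed = (map_slots + reduce_slots) * child_heap_gb + daemons_gb
print(f"worst case: {needed} GB needed vs {node_ram_gb} GB available")
# 17 GB > 15 GB, so with -Xmx4096m you need fewer slots (or a smaller -Xmx)
# to stay inside the node's memory.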