How to modify the RAM requirement of an Aurora task in a Heron cluster deployed on the Aurora scheduler? - mesos

I deployed a Heron cluster using the Aurora scheduler and Mesos. When I ran the default WordCountTopology on this cluster, I found that the Aurora task demands 4 GB of RAM. However, the WordCountTopology's configuration is as follows:
componentRam: 1G
containerRamRequested: 1G
containerCpuRequested: 2 cores
containerDiskRequested: 2G
Aurora task.json content is:
It shows that this Aurora task needs 4 GB of RAM, but I don't know why it requests 4 GB. How can I modify this RAM requirement?
In addition, there are two slave hosts in my Heron cluster, and the resources of these hosts are:

In addition to the RAM requested by the topology's components, some additional resources (CPU, memory) are requested for Heron's daemon processes, e.g. the stream manager (see "Packing additional CPU in RR").
A second cause of the larger resource request is that Aurora only allows homogeneous containers. The packing algorithm picks the maximum container resources as the resource request for all the containers. For example, if a topology has two containers, one requesting 2 CPUs and the other requesting 3 CPUs, then all containers will eventually request 3 CPUs.
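As a rough illustration of the homogeneous-container rule described above, here is a minimal Python sketch. It is not Heron's actual packing code: the `ContainerRequest` type and the `homogeneous_request` function are hypothetical names, and taking the maximum per resource (CPU, RAM, disk separately) is an assumption about how "maximum container resources" is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerRequest:
    """Hypothetical per-container resource request."""
    cpu: float
    ram_gb: float
    disk_gb: float


def homogeneous_request(containers):
    """Return the one request applied to every container.

    Assumption: the maximum is taken independently for each resource.
    """
    return ContainerRequest(
        cpu=max(c.cpu for c in containers),
        ram_gb=max(c.ram_gb for c in containers),
        disk_gb=max(c.disk_gb for c in containers),
    )


# Two containers with different needs (illustrative numbers only).
containers = [
    ContainerRequest(cpu=2, ram_gb=1, disk_gb=2),
    ContainerRequest(cpu=3, ram_gb=2, disk_gb=2),
]
uniform = homogeneous_request(containers)
print(uniform.cpu, uniform.ram_gb, uniform.disk_gb)
```

With these inputs every container ends up requesting 3 CPUs, matching the example in the answer, even though one of them only needed 2.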

Related

Creating Elasticsearch cluster from three servers

We have three physical servers. Each server has 2 CPUs (32 cores), 96 TB HDD, and 768 GB RAM. We would like to use these servers in an Elasticsearch cluster.
Each server will be located in a different data center, and the servers will be connected over a private connection.
How can we optimize our configuration for high performance? Also, how should we best run Elasticsearch on these machines? For example, should we use virtualization to create multiple nodes per machine, or not?
Since you have a huge amount of RAM (768 GB) available on each physical server, and the ES documentation on heap settings says the heap shouldn't exceed 32 GB, you will have to use virtualization to create multiple nodes per physical server to make better use of your infrastructure.
Apart from this, there are various cluster settings and node settings that you can optimize, but as you have not provided them, it's difficult to make recommendations on them.
Another thing to note is that you have a huge amount of RAM and disk, but the CPU is not in proportion to it, so if you can increase the CPU as well, that would be good.
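To make the "multiple nodes per server" point concrete, here is a back-of-the-envelope sketch in Python. It rests on two assumptions that are not stated in the answer: each node gets a 31 GB heap (just under the 32 GB ceiling), and the heap is about half of the RAM given to a node, with the rest left for the file system cache. The function name is hypothetical.

```python
def nodes_per_server(server_ram_gb, heap_gb=31, heap_fraction=0.5):
    """Estimate how many Elasticsearch nodes fit on one server by RAM alone.

    Assumptions: heap_gb stays below 32 GB, and the heap is heap_fraction
    of each node's RAM. CPU and disk are ignored here.
    """
    ram_per_node_gb = heap_gb / heap_fraction
    return int(server_ram_gb // ram_per_node_gb)


print(nodes_per_server(768))
```

Under these assumptions a 768 GB server has room for about 12 nodes of 62 GB each. With only 32 cores per server that is fewer than 3 cores per node, which is the CPU imbalance the answer points out.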

What are the resource requirements to run Logstash in a k8s pod?

I am running an ELK stack on a Raspberry Pi running a Kubernetes cluster, and I noticed that it didn't have the resources to run all three containers. I read that with Kubernetes you can put limits and requests on CPU and memory, and it got me thinking: what are the minimum requirements? To me, applications are greedy, so is there a way to cut down the requirements for Logstash in order to prioritize resources for Elasticsearch?
Right now, I am running a Raspberry Pi 4 with 4 GB RAM and a 32 GB disk.
If I can put minimum and maximum requirements on the containers, it will let me manage the resources better. The thing I noticed, though, is that as far as I can tell there is no guidance on the minimum requirements for the different containers.
https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-managing-compute-resources.html
The above link, I believe, tells me that CPU consumption is greedy, but that the default memory for Elasticsearch and Kibana is 2Gi and 1Gi respectively. It mentions nothing about Logstash, though, or whether there is a minimum requirement for CPUs.
I wasn't sure whether I should set each ELK container to 1 CPU and 1Gi RAM. I can try it to see if it works, but the idea of throttling it down makes me curious what the happy medium would be.
Logstash is not part of the Elastic Cloud, that is why there is no mention of it in the Elastic Cloud on Kubernetes documentation link that you shared.
Logstash is far more CPU-bound than memory-bound, but how much memory it needs depends entirely on your pipelines.
In Logstash the memory depends on the pipelines, the batch size, the filters used, the number of events per second, the queue type, etc. If you are running a dev or lab environment, I think you can try giving Logstash 1 CPU and 512 MB of RAM and see if it fits your use case.
But I would say that 4 GB is pretty small for a full stack, since you need memory for the applications and still need some memory left for the system.
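The "4 GB is pretty small" remark can be checked with simple arithmetic. The sketch below adds up the memory figures mentioned in this thread (2Gi for Elasticsearch, 1Gi for Kibana, 512 MB for Logstash) against a 4 GB node. Treating all of these as MiB values and the node as exactly 4096 MiB is a simplifying assumption, and the dictionary keys are just labels.

```python
# Node memory, assuming the Pi's "4 GB" is roughly 4096 MiB.
NODE_RAM_MIB = 4 * 1024

# Per-container memory, taken from the figures quoted in this thread.
memory_mib = {
    "elasticsearch": 2048,  # 2Gi default mentioned in the question
    "kibana": 1024,         # 1Gi default mentioned in the question
    "logstash": 512,        # starting point suggested in the answer
}

total_mib = sum(memory_mib.values())
headroom_mib = NODE_RAM_MIB - total_mib

print(f"containers: {total_mib} MiB, left for the system: {headroom_mib} MiB")
```

Only about 512 MiB would remain for the operating system and the Kubernetes components themselves, which is why the full stack is a tight fit on this hardware.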

What is the relationship between Elasticsearch ES_JAVA_OPTS and Kubernetes resource limits?

I have an Elasticsearch cluster inside Kubernetes.
The machine it is running on has 30 GB RAM and 8 cores.
According to the rule of thumb, we set 50% of the RAM as the heap in ES_JAVA_OPTS, and the remainder is used for file caching.
Here that would be 15 GB.
Also, in the Helm chart we have resource requirements specified as below:
resources:
  limits:
    cpu: 8
    memory: 15Gi
  requests:
    cpu: 8
    memory: 15Gi
My question is whether the 50% of RAM refers to the host machine (which has 30 GB) or to the limit specified in the Helm chart (15 GB).
Can someone explain how Kubernetes utilises the RAM?
If it is with respect to the host, and file caching is not counted as memory usage of the deployed application, we are OK. But if it is counted within the resource limits, I need to increase the limit to 30 GB.
Edit:
The question is this: if one Elasticsearch node uses 50% of the RAM as heap and 50% for file caching, and I set the heap to 15 GB (50% of the RAM) on a 30 GB machine, should I set the resource limits in the deployment template to around 15 GB, which the heap requires, or to about 30 GB (say 28 GB), so that Elasticsearch is able to cache files as the rule requires?
This is a concern because if a pod exceeds the limit specified in the template at any moment, Kubernetes restarts the pod.
In other words, I want to know whether the RAM used for file caching counts towards the overall memory usage of the pod.
Note: I am using instance storage as the primary storage for the ES data, as it is extremely fast compared to EBS.
Conclusion:
Keep the heap at half of the RAM in the system, or half of the value given in the resource limits (if any).
I am not an expert in k8s and Docker, but as I understand it, a Docker container uses the host's resources, and with a resource limit you can put a hard limit on the resources it can consume.
If you set a resource limit of 15 GB, then overall your Docker container can consume 15 GB of host RAM. Whether it shares the file system cache with the host depends on how you have configured your Docker volume.
A Docker container can either share the file system with the host using a bind volume or have its own data volume (which is ephemeral and not suited to ES, as it is a stateful application). With the first option it should share the file system cache with the host, and you should not increase the resource limit further (recommended, as ES is stateful). With the second option it uses its own file system, so you have to allocate RAM for its file system cache and increase the RAM to 30 GB, but you have to leave some space for the host OS as well.
A container will always see the node's memory instead of the container's. In Kubernetes, even though you set a memory limit on a container, the container itself is not aware of this limit.
This affects applications that look up the memory available on the system and use that information to decide how much memory to reserve.
This is why you set the JVM heap size. Without it, the JVM will set the maximum heap size based on the host/node total memory instead of the memory available to the container (the limit you've declared).
Check out this article about how limits work in k8s.
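As a sketch of the heap sizing discussed in this thread, the Python snippet below derives explicit `-Xms`/`-Xmx` values from a container memory limit rather than from the node's memory. Taking half of the limit follows the rule of thumb from the question; the 31 GB cap and the function name `es_java_opts` are assumptions added for illustration.

```python
def es_java_opts(memory_limit_gib, heap_fraction=0.5, max_heap_gib=31):
    """Build a heap setting from the container limit, not the node's RAM.

    Assumptions: the heap is heap_fraction of the limit, rounded down to
    whole GiB, and is capped at max_heap_gib.
    """
    heap_gib = min(int(memory_limit_gib * heap_fraction), max_heap_gib)
    return f"-Xms{heap_gib}g -Xmx{heap_gib}g"


print(es_java_opts(15))  # limit from the Helm chart in the question
print(es_java_opts(30))  # if the limit were raised to the full node size
```

With the 15Gi limit from the Helm chart this gives a 7 GB heap, not 15 GB. A 15 GB heap would only match the rule of thumb if the limit were raised to about 30Gi.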

Does Mesos really treat all your resources as a single pool?

Mesos is advertised as a system that lets you program against your datacenter like it's a single pool of resources (See the Mesos Website). But is this really true that you don't need to consider the configuration of the individual machines? Using Mesos, can you request more resources for a task than are available on a single machine?
For example, if you have 10 machines each with 2 cores and 2g of RAM and 20g HD, can you really request 10 cores, 15g of RAM and 100g of disk space for a single task?
If so, how does this work? Is Mesos able to address memory across machines for you, and use other CPUs as local threads and create a single filesystem from a number of distributed nodes?
How does it accomplish this without suffering from the Fallacies of distributed computing, especially those related to network latency and transport cost?
According to the Mesos architecture, you can't aggregate resources from different slaves (agents/machines) and use them for one task.
As you can see, there is a strict "task per agent" relationship.
Their example says pretty much the same:
Let’s walk through the events in the figure.
Agent 1 reports to the master that it has 4 CPUs and 4 GB of memory
free. The master then invokes the allocation policy module, which
tells it that framework 1 should be offered all available resources.
The master sends a resource offer describing what is available on
agent 1 to framework 1. The framework’s scheduler replies to the
master with information about two tasks to run on the agent, using <2
CPUs, 1 GB RAM> for the first task, and <1 CPUs, 2 GB RAM> for the
second task. Finally, the master sends the tasks to the agent, which
allocates appropriate resources to the framework’s executor, which in
turn launches the two tasks (depicted with dotted-line borders in the
figure). Because 1 CPU and 1 GB of RAM are still unallocated, the
allocation module may now offer them to framework 2.
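The walkthrough above can be replayed with a small Python sketch. It only models the bookkeeping and is not the Mesos API: `Resources`, `fits`, and `launch` are hypothetical names, and the offer is reduced to CPUs and memory.

```python
from typing import NamedTuple


class Resources(NamedTuple):
    """Hypothetical stand-in for the resources of one agent or one task."""
    cpus: float
    mem_gb: float


def fits(task, offer):
    """A task fits only if a single offer covers every resource it needs."""
    return task.cpus <= offer.cpus and task.mem_gb <= offer.mem_gb


def launch(offer, tasks):
    """Subtract each task from one agent's offer; return what is left."""
    remaining = offer
    for task in tasks:
        if not fits(task, remaining):
            raise ValueError(f"{task} does not fit in {remaining}")
        remaining = Resources(remaining.cpus - task.cpus,
                              remaining.mem_gb - task.mem_gb)
    return remaining


# The documented example: agent 1 offers 4 CPUs and 4 GB.
leftover = launch(Resources(cpus=4, mem_gb=4),
                  [Resources(2, 1), Resources(1, 2)])
print(leftover)  # Resources(cpus=1, mem_gb=1)

# The question's cluster: ten agents with 2 cores and 2 GB each.
agents = [Resources(cpus=2, mem_gb=2)] * 10
big_task = Resources(cpus=10, mem_gb=15)
print(any(fits(big_task, agent) for agent in agents))  # False
```

Because each offer describes a single agent, the 10-core task never fits, even though the cluster as a whole has 20 cores.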

Resource offer showing less memory than added in Mesos

I am currently exploring Mesos. I have set up a Mesos cluster with one slave node added. The hardware added is 1 CPU core and 2 GB RAM, but the Mesos UI shows 1 CPU core and 1001 MB RAM, which is approximately 1 GB less. Does anyone know where the remaining 1 GB of RAM is being used?
If you don't specify how much RAM a Mesos slave (now: agent) is supposed to use via the --resources flag, the default kicks in; see the Mesos containerizer for details.
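For illustration, here is a sketch of what that default is commonly understood to do. The rule below is an assumption on my part and is not stated in the answer: when the machine has more than 2 GB, the agent leaves 1 GB free, otherwise it offers half of the memory. The exact behaviour at the 2 GB boundary may differ, so check the containerizer source for your Mesos version. The function name is hypothetical.

```python
def default_agent_mem_mb(total_mb):
    """Approximate the agent's default memory offer (assumed rule).

    Assumption: above 2 GB, leave 1 GB free for the system;
    otherwise offer 50% of the total.
    """
    if total_mb > 2048:
        return total_mb - 1024
    return total_mb // 2


# A "2 GB" machine often reports slightly less than 2048 MB to the OS.
# 2002 MB is a hypothetical figure chosen to match the question.
print(default_agent_mem_mb(2002))
print(default_agent_mem_mb(4096))
```

Under this assumed rule, a machine reporting 2002 MB would be offered as 1001 MB, which matches the value shown in the UI. Passing an explicit memory value through `--resources` overrides the default.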
