What are the resource requirements to run Logstash in a k8s pod? - elasticsearch

I was trying to run an ELK stack on a Raspberry Pi running a Kubernetes cluster, and I noticed that it didn't have the resources to run all three containers. I read that with Kubernetes you can put limits and requests on your CPU and memory resources, and that got me thinking: what are the minimum requirements? To me, applications are greedy, so is there a way to cut down the requirements for Logstash and leave more resources for Elasticsearch?
Right now I am running a Raspberry Pi 4 with 4 GB RAM and a 32 GB disk.
If I can put minimum and maximum requirements on each container it will allow me to better manage the resources. The thing I noticed, though, is that as far as I can tell there is no guidance on the minimum requirements for the different containers.
https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-managing-compute-resources.html
The above link, I believe, tells me that CPU consumption is greedy, and that the default memory for Elasticsearch and Kibana is 2Gi and 1Gi respectively. It mentions nothing about Logstash, though, or whether there is a minimum CPU requirement.
I wasn't sure if I should just set each ELK container to 1 CPU and 1Gi RAM. I can try that and see if it functions, but the thought of things being throttled makes me curious what the happy medium would be.
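For example, something along these lines is what I had in mind, based on the ECK docs above (the version and the numbers are just placeholders I was considering, not recommendations):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.10.0              # placeholder version
  nodeSets:
  - name: default
    count: 1
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              cpu: 500m
              memory: 1Gi      # below the 2Gi default, to fit on a 4 GB Pi
            limits:
              cpu: "1"
              memory: 1Gi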

Logstash is not part of the Elastic Cloud, that is why there is no mention of it in the Elastic Cloud on Kubernetes documentation link that you shared.
Logstash is much more CPU bound than memory bound, but how much memory it needs depends entirely on your pipelines.
In Logstash the memory usage depends on the pipelines, the batch size, the filters used, the number of events per second, the queue type, etc. If you are running a dev or lab environment, I think you can try giving Logstash 1 CPU and 512 MB of RAM and see if that fits your use case (see the sketch below).
But I would say that 4 GB is pretty small for the full stack, since you need memory for the applications and still have to leave some memory for the system.
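As a starting point, a container spec along these lines reflects that suggestion; the image tag, the 1 CPU / 512Mi sizing, and the 256m heap are only a sketch for a lab setup, not official minimums:

containers:
- name: logstash
  image: docker.elastic.co/logstash/logstash:7.10.0   # placeholder version
  env:
  - name: LS_JAVA_OPTS
    value: "-Xms256m -Xmx256m"   # keep the JVM heap well below the container limit
  resources:
    requests:
      cpu: "1"
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 512Mi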

Related

k8s tasks slowdown with no excess CPU or RAM usage

I have a small virtualised k8s cluster running on top of KVM on 2 physical machines. After deploying Ceph (a storage framework), all the k8s tasks like creating or starting containers became insufferably slow, taking over a minute to go from creating to starting a container.
I checked the nodes for excess CPU or RAM usage; both the worker nodes and the master node are consuming well below half of the assigned resources. I have about 10-20 pods running on each node at the moment.
I am not sure what to google and, given my level of k8s knowledge, I am completely out of ideas. Anyone with similar experience, or who could point me in the right direction, would be much appreciated!

Creating Elasticsearch cluster from three servers

We have three physical servers. Each server has 2 CPUs (32 cores), 96 TB HDD, and 768 GB RAM. We would like to use these servers in an Elasticsearch cluster.
Each server will be located in a different data center, with the servers connected over a private connection.
How can we optimize our configuration for high performance? Also, how should we best run Elasticsearch on these machines? For example, should we use virtualization to create multiple nodes per machine, or not?
Since you have a huge amount of RAM (768 GB) available on each physical server, and according to the ES documentation on heap sizing the heap shouldn't cross 32 GB, you will have to use virtualization to create multiple nodes per physical server for better utilization of your infrastructure (see the sketch below).
Apart from that, there are various cluster and node settings you can optimize, but as you have not provided them, it is difficult to give recommendations on those.
Another thing to note is that you have huge RAM and disk but the CPU is not in proportion to them, so if you can increase that as well, it would be good.
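Concretely, each node (one per VM) would pin its heap just below that threshold in its jvm.options, leaving the rest of the VM's RAM to the OS file cache; the 31g figure here is the usual "stay under compressed oops" choice, not something taken from the question:

# config/jvm.options on each Elasticsearch node
-Xms31g
-Xmx31g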

What is the relationship between Elasticsearch ES_JAVA_OPTS and Kubernetes resource limits

So I have an Elasticsearch cluster inside Kubernetes.
The machine it is running on has 30 GB RAM and 8 cores.
Now, according to the rule of thumb, 50% of the RAM is what we set as the heap via ES_JAVA_OPTS and the rest is used for file caching.
Here that would be 15 GB.
Also, in the Helm chart we have the resource requirements specified like below:
resources:
  limits:
    cpu: 8
    memory: 15Gi
  requests:
    cpu: 8
    memory: 15Gi
My question is whether the 50% of RAM refers to the host machine (which has 30 GB) or to the limit specified in the Helm chart (15Gi).
Can someone explain how Kubernetes accounts for the RAM here?
Because if it is relative to the host, and file caching is not counted as usage of the deployed application, we are OK. But if it counts within the resource limits, I need to increase the limit to 30Gi.
Edit:
The question here is: if one Elasticsearch node uses 50% of RAM as heap and 50% as file cache, and I set the heap to 15 GB (50% of the RAM) on a 30 GB machine, should I set the resource limit in the deployment template to around 15Gi, which is what the heap needs, or closer to 30Gi (say 28Gi), since by the same rule Elasticsearch also needs to be able to cache files?
This is a concern because if the pod exceeds the limit mentioned in the template at any given moment, Kubernetes restarts the pod.
So, in other words, I want to know whether the RAM used for file caching counts toward the overall memory usage of the pod or not.
Note: I am using instance storage as the primary storage for the ES data, as it is extremely fast compared to EBS.
Conclusion:
Keep the heap at half of the RAM available to the container, i.e. half of the resource limit (if any).
I am not an expert in k8s and Docker, but what I understand is that a Docker container uses the host's resources, and with a resource limit you can put a hard cap on the resources it can consume.
If you put a resource limit of 15 GB, then overall your Docker container can consume 15 GB of host RAM. Now, whether it shares the filesystem cache with the host or not depends on how you have configured your Docker volume.
A Docker container can either share the filesystem with the host using a bind volume, or have its own data volume (which is ephemeral and not suited for ES, since it is a stateful application). In the first option it should share the filesystem cache with the host and you should not need to increase the resource limit further (recommended, since ES is stateful). In the second option, as it uses its own filesystem, you have to allocate RAM for its filesystem cache and increase the limit toward 30 GB, though you still have to leave some room for the host OS.
A container will always see the node's memory instead of its own. In Kubernetes, even though you set a memory limit for a container, the container itself is not aware of this limit.
This affects applications that look up the memory available on the system and use that information to decide how much memory they want to reserve.
This is why you set the JVM heap size explicitly. Without it, the JVM would pick the maximum heap size based on the host/node's total memory instead of the memory actually available to the container (the amount you declared as the limit).
Check out this article about how limits work in k8s.
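Putting the answers above together, one common pattern is to size the heap as roughly half of the container's memory limit rather than half of the host's RAM. A sketch with numbers in the spirit of the question (treat them as an illustration, not a tuned recommendation):

containers:
- name: elasticsearch
  env:
  - name: ES_JAVA_OPTS
    value: "-Xms14g -Xmx14g"   # heap = ~50% of the container memory limit
  resources:
    requests:
      cpu: 8
      memory: 28Gi             # the other ~14Gi stays available for the OS file cache
    limits:
      cpu: 8
      memory: 28Gi             # 28Gi on a 30 GB machine leaves some headroom for the host OS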

Running Hadoop in virtual environment

I would like to know whether I should expect problems when running a Hadoop cluster on virtual instead of physical machines.
I'm mostly worried about sharing the same hard drive; I read that I should count on 1-2 containers per drive, but in my case only one drive will exist. Could that be a problem?
I think it depends on how much capacity you allocate to the containers. Of course, there will be a limit on the number of containers if memory is restricted.
I can highlight a few points to consider when running a Hadoop cluster in a virtual environment:
Network configuration, in the case of a multi-node cluster
The performance of the application, obviously
The effect on scalability, since resources are limited if you plan to run the cluster on a host with low-end hardware

The best memory configuration for ElasticSearch

I have one Linux server with 128 GB of memory and 32 CPU cores. I want to run Elasticsearch on this server, and the server will be used exclusively for ES. So how much memory should I configure for ES? How can I get the best performance out of ES? Is the server overkill for ES? Thanks!
I suggest you run two ES instances on the server. Since your Linux server is pretty powerful, setting the ES heap to 60 GB or 80 GB may run into GC problems. Try running two or three ES instances on the one server and monitor the CPU and memory usage; by the way, change the HTTP port of ES when running multiple nodes on one server.
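A rough sketch of what that could look like, with two instances on the same host using separate data paths and ports; the cluster name, node names, paths, and port numbers are illustrative, and discovery settings are omitted:

# elasticsearch.yml for instance 1
cluster.name: my-cluster
node.name: node-1
path.data: /data/es-node-1
http.port: 9200
transport.port: 9300

# elasticsearch.yml for instance 2
cluster.name: my-cluster
node.name: node-2
path.data: /data/es-node-2
http.port: 9201
transport.port: 9301

Each instance would then keep its own heap at or below roughly 30 GB in its jvm.options, so the two together still leave plenty of memory for the OS file cache.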
