memory management for elasticsearch - elasticsearch

I am trying to calculate good balance of total memory in three node es cluster.
If I have three node e.s cluster each with 32G memory, 8 vcpu. Which combination would be more suitable for balancing memory between all the components? I know there will be no fixed answers but just trying to get as accurate as I can.
different elasticsearch components will be used are beats (filebeat, metricbeat,heartbeat), logstash, elasticsearch, kibana.
most use case for this cluster will be, application logs getting indexed and running query on them like fetch average response time for 7 days,30 days, how many are different status codes for last 24 hrs, 7 days etc through curl calls, so aggregation will be used and other use case is monitoring, seeing logs through kibana but no ML jobs or dashboard creation etc..
After going through below official docs, its recommended to set heap size as below,
logstash -
https://www.elastic.co/guide/en/logstash/current/jvm-settings.html#heap-size
The recommended heap size for typical ingestion scenarios should be no less than 4GB and no more than 8GB.
elasticsearch -
https://www.elastic.co/guide/en/elasticsearch/reference/current/advanced-configuration.html#set-jvm-heap-size
Set Xms and Xmx to no more than 50% of your total memory. Elasticsearch requires memory for purposes other than the JVM heap
Kibana -
I have't found default or recommended memory for kibana but in our test cluster of single node of 8G memory it is taking 1.4G as total (256 MB/1.4 GB)
beats -
not found what is the default or recommended memory for beats but they will also consume more or less.
What should the ideal combination from below?
32G = 16G for OS + 16G for Elasticsearch heap.
for logstash 4G from 16G of OS, say three beats will consume 4G, kibana 2G
this leaves OS with 6G and if any new component has to be install in future like say APM or any other OS related then they all will have only 6G with OS.
Above is, per offical recommendation for all components. (i.e 50% for OS and 50% for es)
32G = 8G for elasticsearch heap. (25% for elasticsearch)
4G for logstash + beats 4G + kibana 2G
this leaves 14G for OS and for any future component.
I am missing to cover something that can change this memory combination?
Any suggestion by changing in above combination or any new combination is appreciated.
Thanks,

Related

Elasticsearch and Fluentd optimisation for log cluster

we are using Elasticsearch and Fluentd for Central logging platform. below is our Config details:
Elasticsearch Cluster:
Master Nodes: 64Gb Ram, 8 CPU, 9 instances
Data Nodes: 64Gb Ram, 8 CPU, 40 instances
Coordinator Nodes: 64Gb Ram, 8Cpu, 20 instances
Fluentd: at any given time we have around 1000+ fluentd instances writing logs to Elasticsearch coordinator nodes.
and on daily basis we create around 700-800 indices and which total to 4K shards on daily basis. and we keep maximum 40K shards on cluster.
we started facing performance issue on Fluentd side, where fluentd instances fails to write logs. common issues are :
1. read time out
2. request time out
3. {"time":"2021-07-02","level":"warn","message":"failed to flush the buffer. retry_time=9 next_retry_seconds=2021-07-02 07:23:08 265795215088800420057/274877906944000000000 +0000 chunk=\"5c61e5fa4909c276a58b2efd158b832d\" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error=\"could not push logs to Elasticsearch cluster ({:host=>\\\"logs-es-data.internal.tech\\\", :port=>9200, :scheme=>\\\"http\\\"}): [429] {\\\"error\\\":{\\\"root_cause\\\":[{\\\"type\\\":\\\"circuit_breaking_exception\\\",\\\"reason\\\":\\\"[parent] Data too large, data for [<http_request>] would be [32274168710/30gb], which is larger than the limit of [31621696716/29.4gb], real usage: [32268504992/30gb], new bytes reserved: [5663718/5.4mb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=17598408008/16.3gb, model_inference=0/0b, accounting=0/0b]\\\",\\\"bytes_wanted\\\":32274168710,\\\"bytes_limit\\\":31621696716,\\\"durability\\\":\\\"TRANSIENT\\\"}],\\\"type\\\":\\\"circuit_breaking_exception\\\",\\\"reason\\\":\\\"[parent] Data too large, data for [<http_request>] would be [32274168710/30gb], which is larger than the limit of [31621696716/29.4gb], real usage: [32268504992/30gb], new bytes reserved: [5663718/5.4mb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=17598408008/16.3gb, model_inference=0/0b, accounting=0/0b]\\\",\\\"bytes_wanted\\\":32274168710,\\\"bytes_limit\\\":31621696716,\\\"durability\\\":\\\"TRANSIENT\\\"},\\\"status\\\":429}\"","worker_id":0}
looking for guidance on this, how we can optimise our Logs cluster?
Well, by the looks of it, you have exhausted your parent circuit breaker limit of 95% of Heap Memory.
The error you mentioned has been mentioned in the elasticsearch docs -
[1]: https://www.elastic.co/guide/en/elasticsearch/reference/current/fix-common-cluster-issues.html#diagnose-circuit-breaker-errors
. The page also refers to a few steps you can take to Reduce JVM memory pressure, which can be helpful to reduce this error.
You can also try increasing this limit to 98%, using the dynamic command -
PUT /_cluster/settings
{
"persistent" : {
"indices.breaker.total.limit" : "98%"
}
}
But I would suggest this be performance tested before applying in production.
Since your request is 30GB, which is a bit too much, for a more reliable solution, I would suggest increasing your log scrapers frequency, so that it makes more frequent posts to ES with smaller-sized data blocks.

High CPU usage on elasticsearch nodes

we have been using a 3 node Elasticsearch(7.6v) cluster running in docker container. I have been experiencing very high cpu usage on 2 nodes(97%) and moderate CPU load on the other node(55%). Hardware used are m5 xlarge servers.
There are 5 indices with 6 shards and 1 replica. The update operations take around 10 seconds even for updating a single field. similar case is with delete. however querying is quite fast. Is this because of high CPU load?
2 out of 5 indices, continuously undergo a update and write operations as they listen from a kafka stream. size of the indices are 15GB, 2Gb and the rest are around 100MB.
You need to provide more information to find the root cause:
All the ES nodes are running on different docker containers on the same host or different host?
Do you have resource limit on your ES docker containers?
How much heap size of ES and is it 50% of host machine RAM?
Node which have high CPU, holds the 2 write heavy indices which you mentioned?
what is the refresh interval of your indices which receives high indexing requests.
what is the segment size of your 15 GB indices, use https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-segments.html to get this info.
What all you have debugged so far and is there is any interesting info you want to share to find the issue?

how to limit memory usage of elasticsearch in ubuntu 17.10?

My elasticsearch service is consuming around 1 gb.
My total memory is 2gb. The elasticsearch service keeps getting shut down. I guess the reason is because of the high memory consumption. How can i limit the usage to just 512 MB?
This is the memory before starting elastic search
After running sudo service elasticsearch start the memory consumption jumps
I appreciate any help! Thanks!
From the official doc
The default installation of Elasticsearch is configured with a 1 GB heap. For just about every deployment, this number is usually too small. If you are using the default heap values, your cluster is probably configured incorrectly.
So you can change it like this
There are two ways to change the heap size in Elasticsearch. The easiest is to set an environment variable called ES_HEAP_SIZE. When the server process starts, it will read this environment variable and set the heap accordingly. As an example, you can set it via the command line as follows: export ES_HEAP_SIZE=512m
But it's not recommended. You just can't run an Elasticsearch in the optimal way with so few RAM available.

Elasticsearch config tweaking with limited memory

I have following scenario:
A single machine with 32GB of ram runs Elasticsearch 2.4, there is one index with 5 shards that is 25gb in size.
On that index we are constantly indexing new data, plus doing full-text search queries that check about 95% documents - no aggregations. The instance generates a lot of CPU load - there is no swapping.
My question is: how should I tweak elasticsearch memory usage? (I don't have an option to add another machine at this moment)
Should I assign more memory to ES HEAP like 25GB (going over 50% memory that readme advises to not do do), or should I assign minimal HEAP like 1GB-2GB and assume Lucene will cache all the index in memory since its full-text searches?
Right now 50% of server memory so 16GB in this case seems to work best for us.

Slow index speed of Elasticsearch

We deployed ES 2.0 on 3 EC2 c4.4xlarge(16 cores, 32gb memory) nodes, allocating 16G for ES, attached 500GB with io1/4000 IOPS on each.
Problem : We are expecting great performance from this hardware config, however a very slow indexing speed is observed.
Our document is about 10-50k in size, we are using Java transport client to insert. The speed was alright for the first 50,000 at roughly 1000/second, and dramatically slow down to 100-200/second.
In the meanwhile we are looking at the low resource consumption:
CPU is about 1-20% only (16 Core CPU)
IO write is about 4-10Mb/second only
Memory consumption is about 20-30% only
Requirements :So I cannot understand why it is so slow while all the recourses are so free, what can I do to enhance the efficiency? Thanks.
Here is the config file we are using:
cluster.name: {{ env }}-{{ app }}
path.data: /data/es
path.logs: /data/es-logs
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["xxxx"]
bootstrap.mlockall: true
threadpool.search.queue_size: 300
threadpool.index.type: fixed
threadpool.index.size: 16
threadpool.index.queue_size: 250000
index.refresh_interval: 1s
index.translog.flush_threshold_ops: 50000
indices.memory.index_buffer_size: 30%
indices.memory.min_shard_index_buffer_size: 12mb
indices.memory.min_index_buffer_size: 96mb
script.inline: on
script.indexed: on
http.cors.enabled: true
http.cors.allow-origin: /https?:\/\/localhost(:[0-9]+)?/
Here is htop and iostat while running the job:
Upgrade your ES to latest version, because in recent releases they have made it more production friendly and most stable release now is the latest one 2.3
You can try following things to make indexing go faster:
Make some master nodes, separate from Data nodes as it will reduce load on all your cluster.
Disable OS swapping, ES takes care of that and Check your heap size on all your machines Heap Sizing
Check your documents are of similar size always, you can make use of bulk indexing and tweak you settings in there like chunk_size in number of records or in memory size
If you are using script try to optimize that as they make the indexing slow, you can store the scripted value if possible as preprocessing, as ES is not designed to handle scripting.
Check number of shards per node and try to balance that out across nodes using Routing
Read more on how ES guys suggest production ready system to work Elasticsearch in Production
One more blog on increasing Elasticsearch Indexing performance Performance Considerations for Elasticsearch Indexing
Check this answer for optimal way to setup ELK Stack on three servers. Optimal way to set up ELK stack on three servers

Resources