Elasticsearch indexing performance issues - elasticsearch

We have been facing performance issues with Elasticsearch for the last couple of days. As you can see in the screenshot, the indexing rate drops significantly after the index reaches a certain size. At normal speed we index around 3,000 logs per second. When the index we write to reaches a size of about ~10 GB, the rate drops.
We are using time-based indices, and around 00:00, when a new index is created by Logstash, the rate climbs back to ~3,000 logs per second (that's why we think it is somehow related to the size of the index).
Server stats show nothing unusual for CPU or memory (they look the same during the drop phases), but one of the servers shows a lot of I/O wait. Our Elasticsearch config is fairly standard, with some adjustments for indexing performance (taken from the ES guide):
# If your index is on spinning platter drives, decrease this to one
# Reference / index-modules-merge
index.merge.scheduler.max_thread_count: 1
# allows larger segments to flush and decrease merge pressure
index.refresh_interval: 5s
# increase flush_threshold_size from the default when you are on ES > 1.3.2
index.translog.flush_threshold_size: 1000mb
# JVM settings
bootstrap.mlockall: true  # ES_HEAP_SIZE is set to 50% of RAM
We use two nodes, each with 8 GB of RAM, 2 CPU cores, and a 300 GB HDD (dev environment).
I have already seen clusters with much bigger indices than ours. Do you have any idea what we could do to fix these issues?
BR
Edit:
Just ran into the performance issues again. top sometimes shows around 60% wa (I/O wait), but iotop only reports about 1000 K/s read and write at most. I have no idea where these waits are coming from.
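For reference, a quick way to check whether segment merging is what keeps the disk busy is to look at hot threads and segment counts while the rate is dropping; a minimal sketch (the logstash-* index pattern is just an assumption based on our time-based naming):
GET /_nodes/hot_threads
GET /_cat/segments/logstash-*?v
If the hot threads are dominated by Lucene merge threads and the segment count keeps climbing as the index grows, the single spinning disk is probably falling behind on merges.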

Related

Elasticsearch down after several months

I have an Elasticsearch cluster with 2 nodes running on a 2-core, 8 GB RAM instance. Each node is started with "ES_JAVA_OPTS=-Xms3g -Xmx3g". I have 4 indices, each with 2 shards and 1 replica. After 2 months the cluster went down. Instance monitoring shows no CPU or memory spikes, and the disk has plenty of free space. The only thing I see in the ES log is
[gc][2845340] overhead, spent [339ms] collecting in the last [1s]
Any idea why?
When the garbage collector starts reporting that it spends ~30% of the time collecting, it usually means that there's not enough heap anymore.
You should increase the heap a little until the GC stops reporting this. You can increase the heap up to half of the available memory, but not beyond ~30 GB (to keep compressed object pointers).
To this end, change the setting below, and make sure that Xms is always equal to Xmx and never goes over 30 GB.
ES_JAVA_OPTS=-Xms4g -Xmx4g
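To confirm that heap pressure really is the problem (and that the bigger heap helps), heap usage per node can be watched with the cat nodes API; a minimal check:
GET /_cat/nodes?v&h=name,heap.percent,heap.max
If heap.percent constantly sits above ~75%, the old generation is being collected over and over and the [gc] overhead warnings will keep appearing.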

Elasticsearch Indexing Speed Degrades over Time

I am doing some performance tuning in Elasticsearch for my project and need some help improving the indexing speed. I am using ES 5.1.1 and have a 2-node setup with 8 shards for the index. Each node's server has 16 GB of RAM and 12 CPUs at 2.2 GHz. I need to index around 25,000,000 documents within 1.5 hours, which currently takes around 4 hours. I have made the following config changes to improve the indexing time (the index-level ones are sketched as an API call after the list):
Setting ‘indices.store.throttle.type’ to ‘none’
Setting ‘refresh_interval’ to ‘-1’
Increasing ‘translog.flush_threshold_size’ to 1GB
Setting ‘number_of_replicas’ to ‘0’
Using 8 shards for the index
Setting VM Options -Xms8g -Xmx8g (Half of the RAM size)
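For reference, the index-level settings above can be applied dynamically through the index settings API; a minimal sketch (the index name my_index is only a placeholder):
PUT /my_index/_settings
{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 0,
    "translog.flush_threshold_size": "1gb"
  }
}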
I generate the documents in my Java application and use the bulk processor with the following configuration:
Bulk Actions Count : 10000
Bulk Size in MB : 100
Concurrent Requests : 100
Flush Interval : 30
Initially I can index around 356,167 documents in the first minute, but over time this decreases; after around 1 hour it is down to about 121,280 docs per minute.
How can I keep the indexing rate steady over time? Are there any other ways to improve the performance?
I strongly encourage you not to change configuration parameters like the translog flush size or the throttling unless you know what you are doing (and that does not mean having read some blog post on the internet :-)
Try a single shard per server, and above all reduce the bulk size to something like 10 MB. 100 MB * 100 concurrent requests means you need 10 GB of heap just to hold those requests (before doing anything else). I suspect not all of your documents are getting indexed because of rejected tasks in your thread pools.
Start small and grow from there, instead of starting big without any insight into your indexing.
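A quick way to confirm the rejection theory is to look at the thread pool statistics on each node; for example:
GET /_cat/thread_pool?v&h=name,active,queue,rejected
A non-zero, growing rejected count for the bulk pool means the cluster is pushing back, and the bulk processor listener should be seeing item-level failures in its BulkResponse.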

Elasticsearch: High CPU usage by Lucene Merge Thread

I have an ES 2.4.1 cluster with 3 master and 18 data nodes that collects log data, with a new index created every day. Over a day the index grows to about 2 TB. Indices older than 7 days are deleted. Very few searches are performed on the cluster, so the main goal is to increase indexing throughput.
I see a lot of the following exceptions, which is another symptom of what I describe next:
EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4#5a7d8a24 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor#5f9ef44f[Running, pool size = 8, active threads = 8, queued tasks = 50, completed tasks = 68888704]]]
The nodes in the cluster are constantly pegging CPU. I increased index refresh interval to 30s but that had little effect. When I check hot threads I see multiple "Lucene Merge Thread" per node using 100% CPU. I also noticed that segment count is constantly around 1000 per shard, which seems like a lot. The following is an example of a segment stat:
"_2zo5": {
"generation": 139541,
"num_docs": 5206661,
"deleted_docs": 123023,
"size_in_bytes": 5423948035,
"memory_in_bytes": 7393758,
"committed": true,
"search": true,
"version": "5.5.2",
"compound": false
}
Extremely high "generation" number worries me and I'd like to optimize segment creation and merge to reduce CPU load on the nodes.
Details about indexing and cluster configuration:
Each node is an i2.2xlarge AWS instance with 8 CPU cores and 1.6 TB of SSD storage
Documents are indexed constantly by 6 client threads with bulk size 1000
Each index has 30 shards with 1 replica
It takes about 25 sec per batch of 1000 documents
/_cat/thread_pool?h=bulk*&v shows that bulk.completed are equally spread out across nodes
Index buffer size and transaction durability are left at default
_all is disabled, but dynamic mappings are enabled
The number of merge threads is left at default, which should be OK given that I am using SSDs
What's the best way to go about it?
Thanks!
Here are the optimizations I made to the cluster to increase indexing throughput (the first two are sketched after the list):
Increased threadpool.bulk.queue_size to 500 because index requests were frequently overloading the queues
Increased disk watermarks, because default settings were too aggressive for the large SSDs that we were using. I set "cluster.routing.allocation.disk.watermark.low": "100gb" and "cluster.routing.allocation.disk.watermark.high": "10gb"
Deleted unused indexes to free up resources ES uses to manage their shards
Increased the number of primary shards to 175, with the goal of keeping shard size under 50 GB and having approximately one shard per processor
Set client index batch size to 10MB, which seemed to work very well for us because the size of documents indexed varied drastically (from KBs to MBs)
Hope this helps others
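For reference, a sketch of how the first two changes can be applied on ES 2.x (values as above; the thread pool queue size normally goes into elasticsearch.yml, while the disk watermarks are dynamic cluster settings):
# elasticsearch.yml on each node
threadpool.bulk.queue_size: 500

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "100gb",
    "cluster.routing.allocation.disk.watermark.high": "10gb"
  }
}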
I have run similar workloads and your best bet is to run hourly indices and run optimize on older indices to keep segments in check.
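A sketch of that optimize step on an index that is no longer being written to (index name is hypothetical; on ES 2.x the API is called _forcemerge, on older versions _optimize):
POST /logs-2016.11.01-00/_forcemerge?max_num_segments=1
Running it only against read-only (older) indices matters, because force-merging an index that is still being written to just creates more merge work.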

Logstash/Elasticsearch/Kibana resource planning

How do I plan resources (I suspect, Elasticsearch instances) according to load?
By load I mean ≈500K events/min, each containing 8-10 fields.
What are the configuration knobs I should turn?
I'm new to this stack.
500,000 events per minute is 8,333 events per second, which should be pretty easy for a small cluster (3-5 machines) to handle.
The problem will come with keeping 720M daily documents (500K/min × 60 × 24) open for 60 days, about 43B documents. If each of the 10 fields is 32 bytes, that's 13.8 TB of disk space (nearly 28 TB with a single replica).
For comparison, I have 5 maxed-out nodes (64 GB of RAM, 31 GB heap) with 1.2B documents consuming 1.2 TB of disk space (double that with a replica). This cluster could not handle the load with only 32 GB of RAM per machine, but it's happy now with 64 GB. This is 10 days of data for us.
Roughly, you're expecting to have 40x the number of documents consuming 10x the disk space compared to my cluster.
I don't have the exact numbers in front of me, but our pilot project for using doc_values is giving us something like a 90% heap savings.
If all of that math holds, and doc_values is that good, you could be OK with a similar cluster as far as the actual bytes indexed are concerned. I would solicit additional information on the overhead of having so many individual documents.
We've done some amount of Elasticsearch tuning, but there's probably more that could be done as well.
I would advise you to start with a handful of 64GB machines. You can add more as needed. Toss in a couple of (smaller) client nodes as the front-end for index and search requests.
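On the doc_values point: on pre-2.0 Elasticsearch they have to be enabled per field in the mapping (they became the default for not_analyzed fields in 2.0). A minimal sketch with hypothetical index, type, and field names:
PUT /logs-2015.01.01
{
  "mappings": {
    "event": {
      "properties": {
        "status": { "type": "string", "index": "not_analyzed", "doc_values": true }
      }
    }
  }
}
The heap savings come from keeping fielddata for sorting and aggregations on disk instead of on the JVM heap.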

How much load can cassandra handle on m1.xlarge instance?

I set up a 3-node Cassandra (1.2.10) cluster on 3 EC2 m1.xlarge instances.
I used the default configuration with several guidelines applied, such as:
datastax_clustering_ami_2.4
not using EBS; RAID 0 with xfs on ephemeral disks instead,
commit logs on separate disk,
RF=3,
6GB heap, 200MB new size (also tested with greater new size/heap values),
enhanced limits.conf.
With 500 writes per second, the cluster works for only a couple of hours. After that it seems unable to respond because of CPU overload (mainly GC + compactions).
Nodes remain up, but their load is huge and the logs are full of GC info and messages like:
ERROR [Native-Transport-Requests:186] 2013-12-10 18:38:12,412 ErrorMessage.java (line 210) Unexpected exception during request java.io.IOException: Broken pipe
nodetool shows many dropped mutations on each node:
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 7
BINARY 0
READ 2
MUTATION 4072827
_TRACE 0
REQUEST_RESPONSE 1769
Is 500 wps too much for a 3-node cluster of m1.xlarge instances, and should I add nodes? Or is it possible to tune GC further somehow? What load are you able to serve with 3 m1.xlarge nodes? What are your GC configs?
Cassandra is perfectly able to handle tens of thousands of small writes per second on a single node. I just checked on my laptop and got about 29,000 writes/second from cassandra-stress on Cassandra 1.2. So 500 writes per second is not really an impressive number, even for a single node.
However, beware that there is also a limit on how fast data can be flushed to disk, and you definitely don't want your incoming data rate to be close to the physical capabilities of your HDDs. Therefore 500 writes per second can be too much if those writes are big enough.
So first: what is the average size of a write, and what is your replication factor? Multiply the number of writes by the replication factor and by the average write size, and you'll know approximately what write throughput the cluster needs (for example, 500 writes/s at RF=3 with a hypothetical 10 KB per write would already be ~15 MB/s before compaction). You should also keep a safety margin for other I/O-related tasks like compaction. Various benchmarks on the Internet suggest a single m1.xlarge instance should be able to write anywhere between 20 MB/s and 100 MB/s...
If your cluster has sufficient I/O throughput (e.g. 3x more than needed), yet you observe OOM problems, you should try the following (a config sketch follows the list):
reduce memtable_total_space_in_mb (this will cause C* to flush smaller memtables more often, freeing heap earlier)
lower write_request_timeout_in_ms to e.g. 2 seconds instead of 10 (if you have big writes, you don't want to keep too many of them in the incoming queues, which reside on the heap)
turn off the row_cache (if you ever enabled it)
lower the size of the key_cache
consider upgrading to Cassandra 2.0, which moved quite a lot of things off-heap (e.g. bloom filters and index-summaries); this is especially important if you just store lots of data per node
add more HDDs and set multiple data directories, to improve flush performance
set a larger new generation size; I usually set it to about 800M for a 6 GB heap, to avoid pressure on the tenured generation
if you're sure memtable flushing lags behind, make sure sstable compression is enabled - this will reduce the amount of data physically written to disk, at the cost of additional CPU cycles
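A sketch of where these knobs live for Cassandra 1.2 (the values are illustrative only, not recommendations):
# conf/cassandra.yaml
memtable_total_space_in_mb: 1024
write_request_timeout_in_ms: 2000
row_cache_size_in_mb: 0
key_cache_size_in_mb: 50
# conf/cassandra-env.sh
MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="800M"
Additional data directories go under data_file_directories in cassandra.yaml, one entry per disk.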
