Elasticsearch and Logstash Performance Tuning

In a single-node Elasticsearch setup running alongside Logstash, we tested parsing 20 MB and 200 MB files into Elasticsearch on different AWS instance types (Medium, Large, and XLarge).
Environment details: Medium instance, 3.75 GB RAM, 1 core, 4 GB SSD storage, 64-bit, network performance: Moderate
Instance running: Logstash, Elasticsearch
Scenario 1
**With default settings**
Result:
- 20 MB logfile: 23 min, ~175 events/sec
- 200 MB logfile: 3 hr 3 min, ~175 events/sec
Added the following settings:
```yaml
# Java heap size: 2 GB (set via ES_HEAP_SIZE)
bootstrap.mlockall: true
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
index.translog.flush_threshold_ops: 50000
indices.memory.index_buffer_size: 50%
# Search thread pool
threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100
```
**With added settings**
Result:
- 20 MB logfile: 22 min, ~180 events/sec
- 200 MB logfile: 3 hr 7 min, ~180 events/sec
Scenario 2
Environment details: r3.large instance, 15.25 GB RAM, 2 cores, 32 GB SSD storage, 64-bit, network performance: Moderate
Instance running: Logstash, Elasticsearch
**With default settings**
Result:
- 20 MB logfile: 7 min, ~750 events/sec
- 200 MB logfile: 65 min, ~800 events/sec
Added the following settings:
Java heap size: 7 GB
(other parameters same as above)
**With added settings**
Result:
- 20 MB logfile: 7 min, ~800 events/sec
- 200 MB logfile: 55 min, ~800 events/sec
Scenario 3
Environment details:
r3.xlarge (High-Memory Extra Large) instance, 30.5 GB RAM, 4 cores, 32 GB SSD storage, 64-bit, network performance: Moderate
Instance running: Logstash, Elasticsearch
**With default settings**
Result:
- 20 MB logfile: 7 min, ~1200 events/sec
- 200 MB logfile: 34 min, ~1200 events/sec
Added the following settings:
Java heap size: 15 GB
(other parameters same as above)
**With added settings**
Result:
- 20 MB logfile: 7 min, ~1200 events/sec
- 200 MB logfile: 34 min, ~1200 events/sec
I would like to know:
- What is the benchmark for this performance?
- Does this performance meet the benchmark, or is it below it?
- Why don't I see any difference even after increasing the Elasticsearch JVM heap?
- How do I monitor Logstash and improve its performance?
I would appreciate any help with this, as I am new to Logstash and Elasticsearch.

I think this situation is related to the fact that Logstash uses fixed-size queues (see The Logstash event processing pipeline).
Logstash sets the size of each queue to 20, which means a maximum of 20 events can be pending for the next stage. These small queue sizes mean that Logstash simply blocks and stalls safely under heavy load or temporary pipeline problems. The alternatives would be to either have an unbounded queue or drop messages when there is a problem. An unbounded queue can grow without limit and eventually exceed available memory, causing a crash that loses all of the queued messages.
What I think you should try is increasing the worker count with the '-w' flag.
On the other hand, many people say that Logstash should be scaled horizontally rather than by adding more cores and GB of RAM (see How to improve Logstash performance).
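The bounded-queue behavior described above can be sketched in a few lines of Python, with `queue.Queue` standing in for Logstash's internal pipeline queues (the size 20 mirrors the fixed queue size mentioned; this is an illustration, not Logstash's actual implementation):

```python
import queue

# A bounded queue, like the fixed-size queues between Logstash pipeline stages
q = queue.Queue(maxsize=20)

for event in range(20):
    q.put(event)  # fills the queue to capacity

# Once full, a producer cannot enqueue more work: it blocks (or here, fails
# fast) instead of growing memory without bound or dropping events.
try:
    q.put("one more", block=False)
except queue.Full:
    print("queue full: the upstream stage stalls instead of dropping events")
```

This back-pressure is why a single slow filter or output stalls the whole pipeline, and why adding workers (or more Logstash instances) helps more than adding RAM.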

You have set the Java heap size correctly with respect to your total memory, but I think you are not utilizing it properly. I hope you know what the fielddata size is: the default is 60% of the heap size, and you are reducing it to 30%.
I don't know why you are doing this, and my perception might be wrong for your use case, but it is a good habit to allocate indices.fielddata.cache.size: "70%" or even 75%. With that setting, however, you must also set something like indices.breaker.total.limit: "80%" to avoid OutOfMemory (OOM) errors. See Limiting Memory Usage for further details.
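Concretely, the suggestion above would look something like this in elasticsearch.yml (the percentages are illustrative starting points, not universal recommendations; tune them for your workload):

```yaml
# Let fielddata use most of the heap...
indices.fielddata.cache.size: "70%"
# ...but cap the parent circuit breaker below 100% so oversized requests
# are rejected with an exception instead of triggering an OutOfMemoryError
indices.breaker.total.limit: "80%"
```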

Related

Elasticsearch Indexing Speed Degrades over Time

I am doing some performance tuning of Elasticsearch for my project, and I need some help improving the indexing speed. I am using ES 5.1.1 and have a 2-node setup with 8 shards for the index. Each node's server has 16 GB RAM and 12 CPUs allocated, with a 2.2 GHz clock speed. I need to index around 25,000,000 documents within 1.5 hours, which currently takes around 4 hours. I have made the following config changes to improve the indexing time:
Setting ‘indices.store.throttle.type’ to ‘none’
Setting ‘refresh_interval’ to ‘-1’
Increasing ‘translog.flush_threshold_size’ to 1GB
Setting ‘number_of_replicas’ to ‘0’
Using 8 shards for the index
Setting VM Options -Xms8g -Xmx8g (Half of the RAM size)
I am using the bulk processor to generate the documents in my java application and I’m using the following configurations to setup the bulk processor.
Bulk Actions Count : 10000
Bulk Size in MB : 100
Concurrent Requests : 100
Flush Interval : 30
Initially I can index around 356,167 documents in the first minute. But over time this decreases, and after around 1 hour it is about 121,280 docs per minute.
How can I keep the indexing rate steady over the time? Is there any other ways to improve the performance?
I highly encourage you not to change configuration parameters like the translog flush size or the throttling unless you know what you are doing (and that does not mean just reading some blog post on the internet :-)
Try a single shard per server, and especially reduce the bulk size to something like 10 MB. 100 MB × 100 concurrent requests means you need 10 GB of heap just to hold those requests (before doing anything else). I suppose not all of the documents get indexed, because of rejected tasks in your thread pools.
Start small and grow, instead of starting big without any insight into your indexing.
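A quick back-of-the-envelope check of the in-flight bulk memory (numbers taken from the question; decimal GB used for simplicity):

```python
# Worst case: every concurrent bulk request is full-sized and held in heap
bulk_size_mb = 100         # bulk size per request (from the question)
concurrent_requests = 100  # concurrent requests (from the question)

in_flight_gb = bulk_size_mb * concurrent_requests / 1000  # current settings
suggested_gb = 10 * concurrent_requests / 1000            # with 10 MB bulks

print(f"current: {in_flight_gb:.0f} GB in flight; suggested: {suggested_gb:.0f} GB")
```

With an 8 GB heap (-Xmx8g), 10 GB of in-flight bulk data alone already exceeds the heap, which is consistent with the rejected tasks mentioned above.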

Elasticsearch: High CPU usage by Lucene Merge Thread

I have an ES 2.4.1 cluster with 3 master and 18 data nodes that collects log data, with a new index being created every day. In a day, the index size grows to about 2 TB. Indices older than 7 days get deleted. Very few searches are performed on the cluster, so the main goal is to increase indexing throughput.
I see a lot of the following exceptions, which is another symptom of what I am going to describe next:
```
EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4@5a7d8a24 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5f9ef44f[Running, pool size = 8, active threads = 8, queued tasks = 50, completed tasks = 68888704]]]
```
The nodes in the cluster are constantly pegging CPU. I increased index refresh interval to 30s but that had little effect. When I check hot threads I see multiple "Lucene Merge Thread" per node using 100% CPU. I also noticed that segment count is constantly around 1000 per shard, which seems like a lot. The following is an example of a segment stat:
"_2zo5": {
"generation": 139541,
"num_docs": 5206661,
"deleted_docs": 123023,
"size_in_bytes": 5423948035,
"memory_in_bytes": 7393758,
"committed": true,
"search": true,
"version": "5.5.2",
"compound": false
}
Extremely high "generation" number worries me and I'd like to optimize segment creation and merge to reduce CPU load on the nodes.
Details about indexing and cluster configuration:
Each node is an i2.2xlarge AWS instance with 8 CPU cores and 1.6 TB of SSD storage
Documents are indexed constantly by 6 client threads with bulk size 1000
Each index has 30 shards with 1 replica
It takes about 25 sec per batch of 1000 documents
/_cat/thread_pool?h=bulk*&v shows that bulk.completed are equally spread out across nodes
Index buffer size and transaction durability are left at default
_all is disabled, but dynamic mappings are enabled
The number of merge threads is left at default, which should be OK given that I am using SSDs
What's the best way to go about it?
Thanks!
Here are the optimizations I made to the cluster to increase indexing throughput:
Increased threadpool.bulk.queue_size to 500 because index requests were frequently overloading the queues
Increased disk watermarks, because default settings were too aggressive for the large SSDs that we were using. I set "cluster.routing.allocation.disk.watermark.low": "100gb" and "cluster.routing.allocation.disk.watermark.high": "10gb"
Deleted unused indexes to free up resources ES uses to manage their shards
Increased number of primary shards to 175 with the goal of keeping shard size under 50GB and have approximately a shard per processor
Set client index batch size to 10MB, which seemed to work very well for us because the size of documents indexed varied drastically (from KBs to MBs)
Hope this helps others
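The shard-count choice above can be sanity-checked with rough arithmetic using the numbers from this thread (~2 TB of data per daily index, 18 data nodes with 8 cores each); this is an illustration of the two stated goals, not a sizing rule:

```python
daily_index_gb = 2 * 1024  # ~2 TB of data per daily index
primary_shards = 175       # the value chosen above
data_nodes, cores_per_node = 18, 8

# Goal 1: keep shard size under 50 GB
shard_size_gb = daily_index_gb / primary_shards

# Goal 2: roughly one shard per processor
shards_per_core = primary_shards / (data_nodes * cores_per_node)

print(f"~{shard_size_gb:.1f} GB per shard, {shards_per_core:.2f} shards per core")
```

Both goals are met: each primary shard stays well under 50 GB, and the shard count is close to the total core count of the data tier.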
I have run similar workloads, and your best bet is to use hourly indices and run optimize on older indices to keep segment counts in check.

Aggregate Resource Allocation for a job in YARN

I am new to Hadoop. When I run a job, I see the aggregate resource allocation for that job as 251248654 MB-seconds and 24462 vcore-seconds. However, when I look at the cluster details, it shows 888 VCores total and 15.90 TB memory total. Can anyone tell me how these are related? What do MB-seconds and vcore-seconds refer to for a job?
Is there any material online explaining these? I tried searching but did not find a proper answer.
VCores-Total: Indicates the total number of VCores available in the cluster
Memory-Total: Indicates the total memory available in the cluster.
For e.g. I have a single node cluster, where, I have set memory requirements per container to be: 1228 MB (determined by config: yarn.scheduler.minimum-allocation-mb) and vCores per container to 1 vCore (determined by config: yarn.scheduler.minimum-allocation-vcores).
I have set: yarn.nodemanager.resource.memory-mb to 9830 MB. So, there can be totally 8 containers per node (9830 / 1228 = 8).
So, for my cluster:
VCores-Total = 1 (node) * 8 (containers) * 1 (vCore per container) = 8
Memory-Total = 1 (node) * 8 (containers) * 1228 MB (memory per container) = 9824 MB = 9.59375 GB = 9.6 GB
The figure below shows my cluster metrics:
Now let's see "MB-seconds" and "vcore-seconds".
As per the description in the code (ApplicationResourceUsageReport.java):
MB-seconds: The aggregated amount of memory (in megabytes) the application has allocated times the number of seconds the application has been running.
vcore-seconds: The aggregated number of vcores that the application has allocated times the number of seconds the application has been running.
The description is self-explanatory (remember the keyword: Aggregated).
Let me explain this with an example.
I ran a DistCp job (which spawned 25 containers), for which I got the following:
Aggregate Resource Allocation: 10361661 MB-seconds, 8424 vcore-seconds
Now, let's do some rough calculation on how much time each container took:
For memory:
10361661 MB-seconds = 10361661 / 25 (containers) / 1228 MB (memory per container) = 337.51 seconds = 5.62 minutes
For CPU
8424 vcore-seconds = 8424 / 25 (containers) / 1 (vCore per container) = 336.96 seconds = 5.616 minutes
This indicates that, on average, each container took about 5.62 minutes to execute.
I hope this makes it clear. You can execute a job and confirm it yourself.
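The rough calculation above can be reproduced in a few lines (numbers from the DistCp example; container sizes as configured earlier in the answer):

```python
# Aggregate resource allocation reported by YARN for the DistCp job
mb_seconds = 10361661
vcore_seconds = 8424
containers = 25
mb_per_container = 1228    # yarn.scheduler.minimum-allocation-mb
vcores_per_container = 1   # yarn.scheduler.minimum-allocation-vcores

# Divide out the container count and per-container allocation to recover
# the average wall-clock runtime of one container
mem_secs = mb_seconds / containers / mb_per_container
cpu_secs = vcore_seconds / containers / vcores_per_container

print(f"memory view: {mem_secs:.2f} s, cpu view: {cpu_secs:.2f} s")
```

Both views agree at roughly 337 seconds (~5.6 minutes) per container, which is expected since the memory and vcore allocations accrue over the same running time.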

Elasticsearch indexing performance issues

We have been facing some performance issues with Elasticsearch over the last couple of days. As you can see in the screenshot, the indexing rate drops significantly after the index reaches a certain size. At normal speed, we index around 3000 logs per second. When the index we write to reaches a size of about ~10 GB, the rate drops.
We are using time-based indices, and around 00:00, when a new index is created by Logstash, the rate climbs back to ~3000 logs per second (that is why we think it is somehow related to the size of the index).
Server stats show nothing unusual in CPU or memory (they are the same during the drop phases), but one of the servers has a lot of I/O waits. Our Elasticsearch config is quite standard, with some adjustments for indexing performance (taken from the ES guide):
```yaml
# If your index is on spinning platter drives, decrease this to one
# Reference / index-modules-merge
index.merge.scheduler.max_thread_count: 1
# allows larger segments to flush and decreases merge pressure
index.refresh_interval: 5s
# increase threshold_size from the default when you are on ES > 1.3.2
index.translog.flush_threshold_size: 1000mb
# JVM settings
bootstrap.mlockall: true  # ES_HEAP_SIZE is 50% of RAM
```
We use two nodes, both with 8 GB of RAM, 2 CPU cores, and 300 GB of HDD (dev environment).
I have already seen clusters with much bigger indices than ours. Do you have any idea what we could do to fix the issue?
BR
Edit:
Just ran into the performance issues again. top sometimes shows around 60% wa (I/O wait), but iotop only reports about 1000 K/s read and write at most. I have no idea where these waits are coming from.

What is Elasticsearch bounded by? Is it CPU, memory, etc.?

I am running Elasticsearch on my personal machine.
Memory: 6 GB
Processor: Intel® Core™ i3-3120M CPU @ 2.50GHz × 4
OS: Ubuntu 12.04 - 64-bit
Elasticsearch settings (only running locally):
Version: 1.2.2
```
ES_MIN_MEM=3g
ES_MAX_MEM=3g
threadpool.bulk.queue_size: 3000
indices.fielddata.cache.size: 25%
http.compression: true
bootstrap.mlockall: true
script.disable_dynamic: true
cluster.name: elasticsearch
```
Index size: 252 MB
Scenario:
I am trying to test the performance of my bulk queries/aggregations. The test case makes asynchronous HTTP requests to Node.js, which in turn calls Elasticsearch. The tests are run from a Java method, starting with 50 requests at a time. In Node.js, each request is split and parallelized into two asynchronous (async.parallel) bulk queries. I am using the node-elasticsearch API (which uses the elasticsearch 1.3 API). The two bulk queries contain 13 and 10 queries respectively, and both are sent asynchronously to Elasticsearch from Node.js. When Elasticsearch returns, the query results are combined and sent back to the test case.
Observations:
I see that all the CPU cores are utilized at 100%, and memory utilization is around 90%. The response time for all 50 requests combined is 30 seconds. If I run each of the individual queries from the bulk requests on its own, each returns in less than 100 milliseconds. Node.js takes negligible time to forward requests to Elasticsearch and combine the responses.
Even if I run the test case synchronously from Java, the response time does not change. From this I might conclude that Elasticsearch is not doing parallel processing. Is this because I am CPU- or memory-bound? One more observation: changing the Elasticsearch heap size from 1 to 3 GB does not change the response time.
Here is the top command output:
```
top - 18:04:12 up 4:29, 5 users, load average: 5.93, 5.16, 4.15
Tasks: 224 total, 3 running, 221 sleeping, 0 stopped, 0 zombie
Cpu(s): 98.2%us, 1.0%sy, 0.0%ni, 0.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 5955796k total, 5801920k used, 153876k free, 1548k buffers
Swap: 6133756k total, 708336k used, 5425420k free, 460436k cached

  PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
17410 root  20  0 7495m 3.3g  27m S  366 58.6 5:09.57 java
15356 rmadd 20  0 1015m 125m 3636 S   19  2.2 1:14.03 node
```
Questions:
Is this expected because I am running Elasticsearch on my local machine and not in a cluster? Can I improve performance on my local machine? I will definitely start a cluster eventually, but I want to know how to improve performance scalably. What is Elasticsearch bound by?
I was not able to find this in the forums, and I am sure it would help others. Thanks for your help.
